Original Articles

Is deep dreaming the new collage?

Pages 268-275 | Received 25 Jul 2016, Accepted 12 Jun 2017, Published online: 02 Nov 2017

ABSTRACT

Deep dreaming (DD) can combine and transform images in surprising ways. But, being based in deep learning (DL), it is not analytically understood. Collage is an art form that is constrained along various dimensions. DD will not be able to generate collages until DL can be guided in a disciplined fashion.

1. Introduction

Deep dreaming (DD) is a computer technique for generating novel images that was released by Google in July 2015 as the open-source DeepDream software. (It is based on GoogLeNet, the winner of the annual image-recognition competition in 2014: Szegedy et al., 2014.) It typically generates highly unusual, and often “hallucinatory”, images. Some people, accordingly, believe it to have interesting aesthetic potential.

It has been used to modify a number of well-known artworks. It has been applied, for instance, to C. M. Coolidge's oil paintings of “Dogs Playing Poker” (commissioned in 1903 to advertise cigars). The DD versions have more dogs in the picture, with “malformed” heads of various types. Coolidge is not alone. DD images have also been based on works by van Gogh, Hieronymus Bosch, and Leonardo da Vinci, among many others. And they have been generated in their thousands from snapshots of people's relatives and friends. Clearly, then, any visual image can be grist to the DD mill.

Whether that “mill” can reasonably be expected to generate anything of genuine aesthetic interest – perhaps even a new artistic genre – is another matter. That, in a nutshell, is the question addressed by this paper.

2. What is DD?

DD is an application of deep learning, or DL.

In essence, the DD researcher feeds an image into a multilayer DL network (or utilises a randomly chosen image that is already there), and asks the system to focus on a certain feature, selected either deliberately or at random. The system is next required to modify the image so as to emphasise that feature. This process is then iterated – perhaps very many times.
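
The loop just described can be sketched in a few lines of Python. This is a toy illustration only, not the DeepDream code: the single “layer” (a small bank of linear filters followed by a rectifier), the measure of feature strength (half the squared norm of the layer's activations), and every function name are assumptions made for the example. What it shows is the key idea that the image, not the network, is what gets modified at each iteration.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def layer_activations(weights, image):
    """One toy 'layer': a bank of linear filters followed by a rectifier."""
    return relu([sum(w * x for w, x in zip(row, image)) for row in weights])


def feature_strength(weights, image):
    """How strongly the layer responds: half the squared norm of its activations."""
    return 0.5 * sum(a * a for a in layer_activations(weights, image))


def dream_step(weights, image, step_size=0.1):
    """One gradient-ascent step on the image; the weights are left untouched."""
    acts = layer_activations(weights, image)
    # Gradient of feature_strength with respect to pixel i: sum_j acts[j] * weights[j][i].
    grad = [sum(a * row[i] for a, row in zip(acts, weights)) for i in range(len(image))]
    scale = sum(abs(g) for g in grad) / len(grad)
    if scale == 0.0:  # no filter responds, so there is nothing to emphasise
        return list(image)
    return [x + step_size * g / scale for x, g in zip(image, grad)]


def dream(weights, image, iterations=20, step_size=0.1):
    """Iterate the emphasis step, as in the description above."""
    for _ in range(iterations):
        image = dream_step(weights, image, step_size)
    return image


# Hypothetical data: 4 random filters over a flattened 16-pixel "image".
rng = random.Random(0)
toy_weights = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(4)]
toy_image = [rng.uniform(0, 1) for _ in range(16)]
dreamed = dream(toy_weights, toy_image)
print(feature_strength(toy_weights, toy_image), feature_strength(toy_weights, dreamed))
```

Even in this toy, the user chooses only the starting image and the layer to emphasise. Which pixels change, and how, is settled by the gradient rather than by any aesthetic rule.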

In straightforward DL, multilayer neural networks find patterns on several hierarchical levels (LeCun, Bengio, & Hinton, 2015). The output of one level is used as the input to the one above it, and so on for (sometimes) very many levels. Finally, the last one or two levels are tuned by backprop.

As a technique for machine learning, DL has already been hugely successful. The DeepMind team, for instance, led by Demis Hassabis and now attached to Google, have used it to learn to play the classic suite of 49 Atari games – in 22 cases, even better than professional game testers (Mnih et al., 2015). The same team later produced a DL system, AlphaGo, that beat the European Go champion 5–0 in October 2015 (Silver et al., 2016), and defeated the world's leading player, Lee Sedol, in March 2016. Since Go has long been recognised as a much more difficult game than chess, it is not surprising that these feats gained huge attention in the media worldwide.

So far, so good. And, presumably, soon to be better still. (DeepMind are currently experimenting with artificial neural networks of over 100 layers.)

But DL suffers from two grave problems. Namely, the patterns found at each level cannot be easily identified. And the next step (e.g. the Go-player's next move) cannot be reliably predicted.

DD, being an application of DL, suffers from the same problems. As we will see in Section 5, this prevents it from being hailed as “the new collage”.

3. What is collage?

Collage is a genre lying between realism and abstraction. It is a form of combinational creativity (Boden, 2004), for the picture is made up of (usually) seemingly unrelated items. It is also, often, a form of multi-media, where paint is combined with paper, fabric, and even wood and metal. It may be purely decorative, or it may have some identifiable theme.

Collage was pioneered by Pablo Picasso and Georges Braque, who soon gained many followers. For example, most of their many works (in 1912/1913) depicting violins or guitars were not paintings but visual representations made up of various sorts of paper (art paper, wallpapers, newspaper, etc.); a few also involved bits of wood, string, and/or wire. Although the mixed-media approach was unusual, the general aim was not. These works (and many others produced at that time) were relatively realistic representations, instantly recognisable as a violin or a guitar.

With Kurt Schwitters (1887–1948), the media became even more mixed, and the realism waned (Elderfield, 1985). Schwitters used found objects such as bus and theatre tickets, newsprint, wire, wood, and nails. Indeed, in 1918 he introduced himself to the artist Raoul Hausmann by saying “I'm a painter, and I nail my pictures together!” (Elderfield, 1985, p. 35).

Schwitters' early art – until the age of 32 – had been naturalistic landscapes and portraits. But from 1919 onwards, he was one of the most influential practitioners of collage – which he termed Merzwerk. The word “Merz”, he said, “denotes the combination of all conceivable materials for artistic purposes … a perambulator wheel, wire netting, string and cotton wool are factors having equal rights with paint”.

Later, he expanded the term to include his poetry and other artistic activities (Elderfield, 1985, p. 12f.). For Schwitters was a true aesthetic adventurer (Cardinal & Webster, 2012). He explored abstract drama and poetry, cabaret and performance art, body painting, typography, architecture and installation art, and other experimental genres. Today, he is seen as a forerunner of movements such as Pop art, conceptual art, installation art, and happenings. But he is famous above all for his collages.

The foremost examples of Merzwerk were done in 1919–1923. They expressed a disturbing (although optimistic) political vision, wherein “new art forms [were made] out of the remains of a former culture” (Elderfield, 1985, p. 12f.). Schwitters went through the 1914–1918 war as a technical draughtsman, and experienced the ensuing chaos in his native Germany. “In the war,” he said,

things were in terrible turmoil. What I had learned at the Academy was of no use to me and the useful new ideas were still unready … . Everything had broken down and new things had to be made out of the fragments; and this is Merz.

These “new things” were aesthetically subversive, as well as being politically unsettling. The aesthetic shock lay not merely in the fact that the objects in his collages (like Marcel Duchamp's urinal: Fountain, 1917) usually had not been made by him. Worse: they were rubbish, often broken fragments, discarded by others and picked up by Schwitters from café-tables or on the streets. As he put it: “One can even shout out through refuse, and this is what I did, nailing and gluing it together” (Elderfield, 1985, p. 35).

This highly unorthodox art was denounced by the Nazis, and included in their infamous “Degenerate Art” exhibitions of 1933–1937. After fleeing in 1937 to Norway and (when the Nazis invaded) in 1940 to England, Schwitters occasionally painted naturalistic landscapes and portraits as a way to earn a living. But he continued to work on collage throughout his life. Living in London from 1941 to 1945, he drew black inspiration – and tempting detritus – from the bombed-out landscape surrounding him. (He moved to the Lake District at the end of World War II, where he lived until his death.)

Among his best-known assemblages are The Cherry Picture (1921: Merzbild 32A), currently in New York's Museum of Modern Art, and Picture with Flywheel (1920 and 1939: Merzbild 29A), now in Hannover's Sprengel Museum.

The Cherry Picture combines paint (a base of medium-to-dark blues) with various found objects glued or nailed to the board. These include fragments of fabric, sweet wrappers, newspaper clippings, an advertising label, and a tiny snapshot of two fluffy white kittens – plus various bits of wood, and a short-stemmed tobacco pipe. Near the top is a scarlet square overlapping an orange rectangle, and near the bottom is a scarlet triangle. And at the centre is a printed flashcard showing a twig bearing four cherries.

The cherries have given the piece its title. But why? They are geometrically central, and (with their plain white background) colourifically prominent too. But they are not thematically central. For, besides Schwitters' very general background message about cultural fragmentation (see above), The Cherry Picture has no theme. There is no conceptual link between these oddly assorted items.

That is not to say, however, that the picture is random. Quite apart from speculating on the reasons behind Schwitters' choice of individual items (the kitsch kittens, for instance), one can appreciate his placement of them in relation to each other. This placement is governed by considerations of colour, size, shape, balance, and line. To be sure, were one to shake all these items together and place them down wherever they fell, one might sometimes arrive at a new image no less aesthetically compelling than the original. But more often, the resultant combination would make no sense in purely visual terms. In short, for all its abstraction and bizarreness, The Cherry Picture – like collage in general – is no random melange.

That is not to say that collage cannot deal with conceptual themes. Schwitters himself often used tools (e.g. hammers), machines, or machine parts as metaphors for people and/or politics. Wheels, for example, are fairly often featured in his Merzwerk. (See, e.g. Small Sailors' Home, Bild 1926, 12; Revolving, 1919; Construction for Noble Ladies, 1919; The Alienist, Merzbild 1A, 1919; Santa Claus, 1922; Cicero, Bild 1926, 3; New Merz Picture, 1931; Merzpicture with Rainbow, 1939; and Heavy Relief, 1945.) Sometimes these are painted wheels, sometimes real wheels (from perambulators, perhaps); some have cogs, while others do not; and some are fragments or fragmentary intimations of (real or painted) wheels, recognisable as such if one is familiar with Schwitters' oeuvre.

Picture with Flywheel, mentioned above, is well known partly because of (contested) claims that its seeming abstraction carries political meaning. Schwitters amended the picture 20 years after first making it. In the second version, lighter colours are prominent, added over the dark blues/greens of the original. Also, a real cogwheel was added which can be turned only to the right (and Schwitters wrote instructions on the back including “It is forbidden to turn the wheel to the left”). Some art critics have interpreted this as a wry comment on the political forces surrounding Norway in 1939, and set to invade it in 1940. Others, to the contrary, see it as an ironical relic of the relatively innocent past days of farming (the wheels being seen, despite their squared-off cogs, as rural rather than industrial).

The point of relevance, here, is that a reasoned case can be made, by reference to this collage and the Merzwerk in general, for either interpretation. Much as the non-thematic works could be criticised in terms of the achievement – or the challenging (see Cicero: Bild 1926, 3) – of balance, so this possibly thematic work could be judged in terms (for instance) of its aptness for referencing what was happening in contemporary German politics.

Now, in the twenty-first century, collage is a widely accepted art form. Let us consider a recent example: a wall hanging designed in 1995 for a client whose “passions” were music and Venice (see the item Music on www.jehane.com/embroideries/textile-embroideries).

Like Schwitters, the textile-artist has employed mixed media: embroidery and silk-screen printing. Like Schwitters, too, the maker uses multiple types of content. These include naturalistic images (of architecture and musical instruments), decorative imagery (the fleur-de-lis), printed texts (both blank verse and prose), and musical notation (single symbols and snippets of musical scores).

The choice of contents is far from random, even given the fact that all fall within the two overall themes. “Music”, for instance, is represented by images of ancient stringed instruments such as lutes, lyres, viols, and harps, plus some old wind instruments, too. The instruments depicted were all popular in the thirteenth to fifteenth centuries, when Venice's power and influence were at their peak (see the remark about thematic integration, below). To have added a piano or an electric guitar alongside the other stringed instruments would have been worse than clumsy: it would have been an aesthetic mistake.

As for “Venice”, this theme prompts screen-printed images of St Mark's Basilica and the Campanile – but not of Venice's railway station. Moreover, careful examination of the whole wall hanging makes it obvious that, no matter how visually attractive the city's railway station may actually be, to have included it in the piece would have been a mistake. That is because of the strong integration of the two themes, such that many (though not quite all) items remind the viewer of the historical period when Venice's power was at its height, in both political and aesthetic terms.

One snippet of prose references the biography of Antonio Vivaldi, a prominent (seventeenth century) Venetian composer – who was not only born there, but famously trained the choir in Venice's girls' orphanage. A reference to Johann Sebastian Bach or Richard Wagner would have been a mistake, and one to Paul McCartney an absurdity. The familiar line of poetry – “If music be the food of love, play on” – was written by a (sixteenth century) successor of Venice's Golden Age, who located some of his plays in that very city: William Shakespeare. A line about music from Gerard Manley Hopkins would have been less fitting. The fleur-de-lis was a familiar heraldic symbol across Europe at that time, and represents political power as well as ancient times. And, much as the musical instruments recall that same period, so the fragments of musical scores (being drawn from the native son Vivaldi, not from Bach or Wagner) reference Venice too.

In short, this collage is a carefully crafted artwork with specific, and closely co-relevant, cultural connotations. A viewer from the Amazon jungle might, perhaps, appreciate its colours, placement, and lines. But they would be blind to its interlocking cultural aspects.

4. AI and collage – the story so far

Collage has not been entirely neglected by artificial intelligence (AI). There is even an app on the Internet (the BeFunky Collage Maker) that enables users to compose collages. But a more interesting example is a module within Simon Colton's The Painting Fool software (Colton, 2012; Krzeczkowska, El-Hage, Colton, & Clark, 2010; Cook & Colton, 2011).

This module is designed, in effect, to illustrate newspaper articles. It picks headlines from The Guardian website, and uses text analysis to extract the most important nouns from the news articles concerned. These are then used as keywords to search tagged image-banks, such as Google Images or Flickr. Finally, the chosen images are juxtaposed, and then rendered in a more “painterly” (i.e. less photorealistic) way by another module within the Fool.
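
The first three stages of that pipeline (text, keywords, tagged images) can be sketched as follows. This is not The Painting Fool's code, whose text analysis is not detailed here: word frequency stands in for the extraction of “important nouns”, a small dictionary stands in for an online image-bank, and all names and file names are hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real system would use proper linguistic analysis.
STOPWORDS = {
    "the", "a", "an", "of", "in", "and", "to", "on", "for", "with", "by",
    "from", "is", "are", "was", "were", "has", "have", "that", "as", "at",
}


def extract_keywords(text, limit=3):
    """Return the most frequent non-stopword tokens (a crude stand-in for noun extraction)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(limit)]


def search_image_bank(image_bank, keyword):
    """Return the names of images whose tag set contains the keyword, sorted by name."""
    return sorted(name for name, tags in image_bank.items() if keyword in tags)


def collage_plan(text, image_bank, limit=3):
    """Pick at most one image per keyword; keywords with no matching image are skipped."""
    plan = []
    for keyword in extract_keywords(text, limit):
        matches = search_image_bank(image_bank, keyword)
        if matches:
            plan.append((keyword, matches[0]))
    return plan


# Hypothetical article text and image-bank, for illustration only.
article = "Troops leave as troops arrive: forces and troops, forces and speech"
bank = {
    "graves.jpg": {"war", "forces"},
    "plane.jpg": {"troops", "plane"},
    "convoy.jpg": {"troops"},
}
print(collage_plan(article, bank, limit=2))
```

Note what the sketch leaves out: nothing in it decides where the selected images should be placed relative to one another.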

Thematic interest is near-guaranteed, because each collage is prompted by something that had previously been considered important enough to be mentioned in a newspaper headline. Thematic coherence is near-guaranteed too, because of the co-occurrence of the image-selecting keywords in the original text. (The team plan to employ sentiment analysis and emotional image-tagging to generate collages suggesting “positive” or “negative” opinions – even including subversive examples, which turn the usual emotional bias upside down.)

Thematic content, then, is reasonably well served. But visual interest and coherence are not.

A collage on the theme of War is regarded by the authors as “one of the most impressive” pieces generated by this system (Cook & Colton, 2011). It was kicked off by a Guardian piece about the fighting in Afghanistan, whose keywords were automatically identified as: Afghanistan, brown [??], forces, troops, Nato, British, speech, country, and Afghan. The collage combines images of a fighter plane, an explosion, a family with a small baby, a girl in ethnic headwear, and (“most poignantly of all”, say the authors) a field of war graves (see Figure 1.11 of Colton, 2012; or Figure 1 of Cook & Colton, 2011). The authors admit that it was “not rendered to a particularly high aesthetic standard”, but say that “the contents are striking” nevertheless. After all, juxtapositions of death and destruction with children will interest, even emotionally engage, most human audiences.

No details are given in these papers about precisely how the chosen images are spatially juxtaposed. But the (coloured) reproduction of this collage does not suggest any interesting visual concerns. The constituent images are not clearly separable. If they do not merely sit alongside each other, they blend into or overlap each other in ways that have (for me) no particular aesthetic rationale. And there is no clear background space (as there is in The Cherry Picture, for example). There is a central “stripe” of “hot” colours (yellow, orange, and reds), but such colours occur also at top-right and bottom-right. I suspect that the colour-placements may be coincidental.

However, it is clear that future work on The Painting Fool could, in principle, add various criteria for guiding the placement of images. So its collages could be visually improved in a number of ways.

It is much less clear that DD, which involves a radically different type of software, holds any promise for AI-based collage.

5. Does DD measure up?

There are some similarities between DD and collage. For DD can provide, and even “explore”, novel combinations of images. Having produced one such combination, the DD system can generate countless derivatives of it, by emphasising different features in differing ways.

Moreover, the technique can combine images that are not necessarily similar. Indeed, much of the fascination (not always “attraction”) of DD is that hugely different, and perhaps also mutually irrelevant, images can be combined. For example, one can generate images that blend multiple dogs' noses with a human face, or with a basket of apples.

As noted in Section 2, the choice of initial images can be done (deliberately) by the human user or (randomly) by the machine. The option of deliberate choice may suggest that someone might use DD to build a collage of interest to them. That suggestion, however, would be misleading. For the differences between DD and collage are far more important than the similarities.

These differences (setting aside the fact that DD has no connection with mixed media) can be traced to the two problems with DL that were identified in Section 2: the difficulty of identifying particular patterns, and the lack of predictability. Both these, as remarked above, are due to researchers' lack of analytical understanding of DL in general.

Section 3 showed that artists' collages are very carefully considered. They are all composed according to various principles of theme, colour, balance, and shape.

That is clearly true of the least adventurous examples, such as Picasso's multi-papered “violins” – which are obviously violin-shaped, and instantly recognisable as such. To be sure, Picasso made many collages depicting violins, some of which are much less orthodox. But they, too, are recognisable as images representing those instruments. The thematic rationale linking the multifarious items in the Music wall hanging is not so instantly obvious, but soon becomes clear on examination. The Cherry Picture presents more of a challenge, because of the lack of thematic coherence between the constituent items. Nevertheless, although Schwitters' rationale for locating this item in that location is not always evident, there is no doubt that he had one.

Given the DL problems mentioned above, how could such aesthetic rationales be applied in DD? At present, they could not. Even if someone carefully chooses the initial image(s), they cannot guide the modifications carried out by the DD system.

Some modifications would require additional advances in AI, such as progress in computer vision. The composition (and appreciation) of Picasso's violins, for instance, requires understanding of the overall and the component shapes of a violin. The pictures are indeed mixed media (e.g. paint plus several different types of paper), but they are visually not far distant from naturalistic paintings of a violin. In that sense, they resemble visual puns.

The most prominent composer of visual puns is Giuseppe Arcimboldo (1527–1593). He painted many highly bizarre portraits (both full-faced and in profile), made up entirely of (for instance) fruits and/or vegetables, or marine creatures such as fish and crabs.

This required careful matching of facial body-parts to image-parts drawn from a very different class of objects. His Portrait of Eve shows a normal hand and apple, but Eve's face is composed of many naked people, in various anatomical poses (the tip of her nose is the heel of someone's extended leg). And his Vertumnus, the Roman god of seasons, has a pear for a nose, ears of wheat for eyebrows, and pea pods for the flesh above the eyes.

The visual sensitivity to analogous shapes that was required to paint these extraordinary pictures, and that is required to appreciate them today, ranges far beyond anything that current computer vision – including visual DL – can achieve. And the disciplined attention to visual detail that was required far outstrips anything that could be done by DD. Arcimboldo's pictures are not hallucinatory in the sense that DD images (often) are hallucinatory. For DD's “hallucinations” remind us of the loss of control, whereas Arcimboldo's are controlled to the nth degree.

“Pure” DL involves hierarchical pattern-matching, to be sure. But the art-examples described in this paper all require more than that. Even if DL were able to recognise visual analogies (so mimicking Arcimboldo), it would not be able to recognise or manipulate thematic matters, such as those mentioned above. That would require that DL's multilayer pattern-matching be both analytically understood and computationally integrated with complex reasoning of various kinds.

This is not in principle impossible. Presumably, the human mind-brain does it. But how? As yet, we do not know. The triumvirate originators of DL admit that, to combine DL with complex reasoning, “new paradigms are needed” – scholarly code for We haven't got a clue! (LeCun et al., 2015, p. 442).

6. Conclusion

Arguably, one small clue has already been provided by the developers of AlphaGo (see Section 2). For AlphaGo combines logical-symbolic tree-search with probabilistic pattern matching. In other words, it is a hybrid system, with computational strengths drawn from both parallel distributed processing and symbolic AI.
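
The general shape of such a hybrid can be suggested by a toy sketch. This is emphatically not AlphaGo, which couples Monte Carlo tree search with deep neural networks: here a depth-limited look-ahead (the symbolic part) consults a hand-written scoring function (a stand-in for a learned pattern-matcher) whenever it reaches its depth limit. The game (a simple take-away game in which players remove one to three stones and whoever takes the last stone wins), the scoring rule, and all names are assumptions chosen for brevity.

```python
def pattern_value(stones):
    """Stand-in for a learned evaluator: estimated chance that the player to move wins."""
    # Hypothetical 'pattern': piles that are multiples of 4 look bad for the mover.
    return 0.1 if stones % 4 == 0 else 0.9


def search_value(stones, depth):
    """Symbolic look-ahead: exact at terminal positions, estimated at the depth limit."""
    if stones == 0:
        return 0.0  # the opponent took the last stone, so the player to move has lost
    if depth == 0:
        return pattern_value(stones)
    # The mover's value is one minus the opponent's value after the mover's best reply.
    return max(
        1.0 - search_value(stones - take, depth - 1)
        for take in (1, 2, 3)
        if take <= stones
    )


def best_move(stones, depth=2):
    """Choose how many stones to take (expects stones >= 1 and depth >= 1)."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: 1.0 - search_value(stones - take, depth - 1))


print(best_move(5), best_move(7), best_move(3))
```

The search supplies explicit, inspectable reasoning about consequences, while the evaluator supplies a quick judgement where explicit reasoning runs out. Neither half, on its own terms, knows anything about theme or balance.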

If and when our analytical understanding of DL (and of its hybridisation) improves, it may become possible to apply DD in a carefully controlled fashion so as to generate novel images – including collages – having aesthetic interest. But without such understanding, DD can be no more than simply undisciplined playing around.

Play can be creative, of course. It can feature serendipity, and can enable tentative exploration of still half-baked ideas. Collage in general is regarded as a relatively playful genre.

Certainly, DD could serendipitously generate images that might inspire someone with artistic imagination and skill to compose interesting collages. Perhaps it has already done so. But that is a very far cry from saying that DD “is” the new collage.

Disclosure statement

No potential conflict of interest was reported by the author.

References

  • Cardinal, R., & Webster, G. (2012). Kurt Schwitters: A journey through art. Berlin: Hatje Cantz.
  • Colton, S. (2012). The painting fool: Stories from building an automated painter. In J. McCormack & M. d’Inverno (Eds.), Computers and creativity (pp. 3–38). Berlin: Springer.
  • Cook, M., & Colton, S. (2011). Automated collage generation—with more intent. Proceedings of the second international conference on computational creativity, Mexico City.
  • Elderfield, J. (1985). Kurt Schwitters (pp. 1–3). London: Thames and Hudson.
  • Krzeczkowska, A., El-Hage, J., Colton, S., & Clark, S. (2010). Automated collage generation—with intent. Proceedings of the first international conference on computational creativity, Lisbon, Portugal (pp. 36–40).
  • LeCun, Y., Bengio, Y., & Hinton, G. E. (2015). Deep learning. Nature, 521, 436–444. doi: 10.1038/nature14539
  • Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518, 529–533. doi: 10.1038/nature14236
  • Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., … Hassabis, D. (2016). Mastering the game of go with deep neural networks and tree search. Nature, 529, 484–489. doi: 10.1038/nature16961
  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A. (2014). Going deeper with convolutions. IEEE conference on computer vision and pattern recognition, Columbus Convention Center, Columbus, OH, USA (pp. 1–9).
