Research Article

Artificial intelligence and visual discourse: a multimodal critical discourse analysis of AI-generated images of “Dementia”

Received 28 Nov 2023, Accepted 29 Nov 2023, Published online: 14 Dec 2023

Figures & data

Figure 1. A screen capture of our interface for Stable Diffusion, version 1.4.

Image shows the display screen for Stable Diffusion, version 1.4. This includes the textual prompt box, into which we have typed “dementia”, and the following settings: Euler a sampling method, 20 sampling steps, -1 seed (which randomly generates seeds), batch count of 3, batch size of 1, image height and width as 512 × 512.
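The settings in this screen capture can be written down as a small configuration record. The following is a minimal sketch only: `GenerationSettings` and `resolve_seed` are hypothetical names introduced for illustration and are not part of Stable Diffusion or its interface. The sketch assumes, as the description above states, that a seed of -1 means a seed is generated at random, and it further assumes that seeds are unsigned 32-bit integers.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical record of the settings visible in Figure 1."""
    prompt: str = "dementia"
    sampler: str = "Euler a"
    steps: int = 20
    seed: int = -1          # -1 requests a randomly generated seed
    batch_count: int = 3
    batch_size: int = 1
    width: int = 512
    height: int = 512

    def images_per_run(self) -> int:
        # Total images produced by one click of "generate".
        return self.batch_count * self.batch_size


def resolve_seed(settings: GenerationSettings, rng: random.Random) -> GenerationSettings:
    """Return a copy with a concrete seed; an explicit seed is kept as-is."""
    if settings.seed != -1:
        return settings
    # Assumption: seeds are drawn from the unsigned 32-bit range.
    return replace(settings, seed=rng.randrange(2**32))


settings = resolve_seed(GenerationSettings(), random.Random())
```

With a batch count of 3 and a batch size of 1, `images_per_run()` returns 3, so each run with these settings yields three images.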

Figure 2. The results for the textual prompt “frog,” according to the number of sampling steps used*. *Sampler: Euler a, CFG scale: 15, Seed: 4193228899, Size: 512 × 512.

Figure shows 10 different images of frogs, each produced by a different number of steps (1, 3, 5, 7, 9, 12, 15, 20, 50 and 90). The first seven images (steps 1 to 15) show a marked improvement in the photorealistic accuracy of their representation of a frog. For the final three images (steps 20, 50 and 90) there is less of a perceptible change in the quality of the frogs produced. Across the ten different images, there are shifts in the angle of the frog and details such as colour and background.
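The comparison in Figure 2 holds the seed, sampler, CFG scale and image size fixed and changes only the number of sampling steps. A minimal sketch of that design follows. `step_sweep` is a hypothetical helper, and the dictionaries it returns only describe the runs; they do not call any image-generation library.

```python
STEP_COUNTS = [1, 3, 5, 7, 9, 12, 15, 20, 50, 90]  # step values listed for Figure 2


def step_sweep(prompt, seed, step_counts=STEP_COUNTS):
    """Describe one run per step count, keeping every other setting fixed."""
    base = {
        "prompt": prompt,
        "sampler": "Euler a",
        "cfg_scale": 15,
        "seed": seed,
        "width": 512,
        "height": 512,
    }
    # Only "steps" varies between runs, so differences between images
    # can be attributed to the step count rather than to the seed.
    return [{**base, "steps": n} for n in step_counts]


runs = step_sweep("frog", seed=4193228899)
```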

Figure 3. The first generated result for each diffusion sampler, using the same image seed*. *Seed: 3463148386, Steps: 20, CFG scale: 15, Size: 512 × 704. Diffusion samplers displayed left-right in order of appearance in the option menu.

Figure shows the first generated result using the same seed for each of the 19 available diffusion samplers. All show a dog, but the style and appearance of the dogs differ. Some of the images closely resemble one another, with only small differences in details such as the ears or face shape.

Figure 4. An example of the type of metadata saved for each generated image.

Figure shows a generated image of a cartoon green frog set against a pink background. The corresponding metadata for this image is reported next to it, which is: frog Steps: 20, Sampler: Euler a, CFG scale: 15, Seed: 3812278636, Size: 512 × 704, Model hash: fe4efff1e1.
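A metadata line of this kind can be split into named fields so that images can later be filtered or regenerated by seed. The sketch below is illustrative only: `parse_metadata` is a hypothetical helper, and it assumes the layout quoted above, namely the prompt followed by comma-separated "Key: value" pairs beginning with "Steps:".

```python
import re


def parse_metadata(text: str) -> dict:
    """Split a metadata line into the prompt and its key/value settings."""
    prompt, sep, settings = text.partition("Steps:")
    if not sep:
        raise ValueError("expected a 'Steps:' field")
    fields = {"prompt": prompt.strip()}
    for item in ("Steps:" + settings).split(","):
        key, _, value = item.partition(":")
        fields[key.strip()] = value.strip()
    # Size appears as "width × height"; accept "×" or "x" as the separator.
    width, height = re.split(r"\s*[×x]\s*", fields["Size"])
    fields["Size"] = (int(width), int(height))
    for key in ("Steps", "CFG scale", "Seed"):
        fields[key] = int(fields[key])
    return fields


example = ("frog Steps: 20, Sampler: Euler a, CFG scale: 15, "
           "Seed: 3812278636, Size: 512 × 704, Model hash: fe4efff1e1")
meta = parse_metadata(example)
```

Numeric settings are converted to integers, while the sampler name and model hash are kept as strings.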

Figure 5. The individual living with dementia.

Figure shows 11 images of an individual living with dementia. These are close-ups, showing the head and shoulders of the people pictured. Many of the individuals look away or have their eyes closed. Everyone has light skin and shows signs of older age, such as grey or white hair, and wrinkles. Black and white or blues and browns are the main colour palettes used.

Figure 6. Variants of the “head clutcher” image for individuals living with dementia.

Figure shows three images, each of an older person with their hands or fingers touching their head, with a pained facial expression.

Figure 7. The brain of people with dementia.

Figure shows seven images that each make the brain visible, either as an organ or through representing its neurones. Some of them are accompanied by words, most of which bear a resemblance to “dementia”, for example, “demnaiiaa”.

Figure 8. Cellular level visualisations of the brain of someone living with dementia.

Figure shows three images that each appear to show cells in the body. Two of these images closely resemble images of neurones in the brain.

Figure 9. Metaphorical visualisations of the brain of someone living with dementia.

Figure shows three images, each of which visually blends the outline of a human head with a tree that has bare branches.

Figure 10. Images with multiple people in different visual styles.

Figure shows eight images, each featuring multiple people. Four of the images show multiple faces or people alongside each other but without any clear sense of engagement between them. Four images show people looking at each other, gathered around an object, or touching. One of these interactional images uses silhouettes of faces with their neurones foregrounded. The visual styles of these images vary greatly, and include pencil sketches, comics and film poster styles.

Figure 11. Interactions between people with (and without) dementia.

Figure shows eight images of interactions between people. These images share many of the stock image conventions, including non-specific backgrounds. Three images show another person's hand touching the forehead and/or eyes of an individual, whose facial expression conveys distress, confusion or suffering. The other five images show two or more people in the frame. These individuals tend to interact in some way, and three images appear to be of heterosexual couples who hold hands or are otherwise close. One image shows a person who appears to be unresponsive to the two people around her. One image shows six people walking past one another, no one making eye contact.

Figure 12. Three visual anomalies in the data.

Figure shows three images that diverge in some way from the broader patterns discussed in relation to the previous images. One image shows a man smiling and gazing directly ahead, imitating eye contact with viewers, with a bright blue and green colour scheme. One image is formatted like a comic or graphic novel and shows multiple stages of a conversation between people who are smartly dressed and socially engaged, demonstrating a range of facial expressions throughout their exchange. The other image uses vibrant, rainbow-like colours, which swirl around the face of a man who is visually coded as younger through full, dark facial hair and a lack of wrinkles.