Scientific and Technical

Representation in medical illustration: the impact of model bias in a dermatology pilot study

Pages 253-262 | Received 25 Feb 2022, Accepted 30 May 2022, Published online: 01 Aug 2022

Abstract

As greater attention is paid to representation and the ‘decolonizing’ of education and media, the field of medical illustration must stay current. Multiple previous studies have concluded that the majority of medical textbooks depict primarily ‘default’ young, white men. Many have expressed that this lack of representation resulted in feelings of alienation; others posited it is a contributing factor for the disparity of care for marginalised groups. This research took arguably the most identifiable feature, skin colour, to explore this disparity – the variation of dermatological symptom expression on melanin-dense skin for four conditions. To evaluate the impact of having a diverse range of models, a study was devised to demonstrate identification rates of melanin-dependent dermatological symptoms in a quantifiable, though non-statistically significant manner. Participants were split into two groups and asked to review four different skin conditions (Group-A receiving illustrations of homogeneous pale skin tones, and Group-B receiving illustrations depicting diverse skin tones) before identifying clinical photographs. While the group with a diverse reference pool performed marginally better overall, they performed better identifying specific conditions in which melanin levels impact the appearance of the condition. This pilot study serves as a strong base for a more developed future study.

Introduction

Disparities in representation exist in virtually every capacity, from popular media to professional positions, to the quality of care in medicine. The field of medical illustration is no exception and plays host to some inequities, particularly in the choice of models used to convey information to professionals, students and the public. Be it race, gender, ability or any other intersectional facet of identity, a large portion of the population will not be depicted when using the current standard. The ‘default’ model in the majority of media is a young, fit, white man, and medical illustration is no different. A host of recent studies have found that a majority of medical textbooks show a distinct lack of intersectional representation in identifiable models (Bellicoso et al., 2021; Louie & Wilkes, 2018; Parker, Larkin, & Cockburn, 2017).

While it is clear that appropriate representation will not be the sole solution for medical inequity, it is valuable to highlight where individuals and companies can make an actionable change. This serves as the motivation for trying to find quantifiable data to demonstrate the impact that a biased reference pool can have. It is important to note that this pursuit of evidence in no way seeks to speak over the lived experience of those that express feelings of alienation or inequity, but rather to bolster the impassioned personal arguments that are made for action.

Beyond the general desires for representation, such as feeling included and reflecting on a history of inequity, a lack of representation in medical diagrams could have very tangible and damaging impacts, including the under- or misdiagnosis of time-sensitive conditions. This is particularly distinct in the case of dermatological examples. Not only is this one of the few disciplines of medicine where skin pigmentation has a direct impact on the conditions, but it is also one of the least representative disciplines, with approximately 3% of dermatologists in the USA being people of colour (Pandya, Alexis, Berger, & Wintroub, 2016). Even in the time-honoured pursuit of ‘googling’ conditions for more information, many have expressed difficulties in accessing reliable information on how a given condition presents on a variety of skin colours. This has given rise to a number of community-led projects that try to compile and disseminate information from private submissions (Mukwende, Tamony, & Turner, 2020). However, without suitable standards of clinical photography, these have limited practical applications beyond personal self-identification.

The limitations of self-identification and the relative lack of available information reflect a statistical trend in which people of colour experience a lower incidence but higher mortality rate of various dermatological conditions like skin cancer and Rocky Mountain Spotted Fever (Gupta et al., 2016; Patel, 2021). It is also common for people of colour, despite lower incidence rates, to be treated for psoriasis and atopic dermatitis at a much later stage and in a more severe state when compared to general population statistics (Kurd & Gelfand, 2009; Shaw et al., 2011). While there are a number of factors that contribute to these statistics, such as socio-economic disparity and access to healthcare, creating inclusive resources has the potential to mitigate harm at the educational level (Monk, 2015).

For many, the identification of a given skin condition borders on axiomatic, as is the case with Lyme disease. The distinctive ‘bullseye’ rash of erythema migrans is a well-known warning sign in regions where this condition is prevalent (Bhate & Schwartz, 2011). However, it is relatively uncommon to find examples of how darker skin reacts to this tick-borne infection, as the melanin of the skin obscures the telltale redness. Not having an awareness of the concurrent symptoms can lead to people being unsure of the cause and thus delaying an accurate diagnosis. As the condition progresses, a delay of a few days can lead to the development of serious health complications like Bell’s palsy, arthritis and chronic nerve pain. When viewing case statistics, people of colour are again shown to have a lower rate of incidence, but a greater severity by the time they receive treatment (Dennison et al., 2019).

From these statistical trends, it follows that a greater prevalence of inclusive healthcare resources might mitigate the harm caused by inaction through unawareness.

Therefore, the hypothesis of this pilot study was:

Participants studying using a diverse skin colour resource will be more adept at identifying a range of conditions on a greater spectrum of potential patients. Conversely: Participants studying using a homogeneous resource will have greater difficulty identifying a range of conditions on a spectrum of potential patients.

The null hypothesis would therefore be:

Participants studying using a diverse resource will not be more adept at identifying a range of conditions on a greater spectrum of potential patients than participants studying using a homogeneous resource.

Materials and methods

In order to create a study that would reflect the real-world conditions of trying to appraise a skin condition, and how disparate examples can influence that goal, participants were split into two groups. Group A received a collection of resources that only depicted pale skin, serving as the general analogue to the current state of representation, while Group B received a separate collection of more diverse examples representing a hypothetical inclusive set of resources. When tasked with identifying a skin condition, the use of these different resources mimics the course of action a person would take to find information about their symptoms. This aims to show any discrepancies in the ‘information’ link of the participant’s decision chain. Where the presentation of a skin condition varies with the type and density of melanin, a person may have a clear mental image of what chickenpox looks like, for example, but may not be able to visualise those symptoms on skin of a different melanin density than their mental image. This, in turn, can lead to misdiagnosis or an underreaction to the potential harm caused by the illness.

In order to present this concept in a tangible and quantifiable manner, a comparison of outlooks was evaluated. Additionally, a level of control over the quality of the relevant information needed to be exerted. This manifested as a series of questions asking participants to identify the condition present in a given clinical photograph. Participants studied the causes and symptoms of four skin conditions, with the intention to quickly identify a series of photographs of the various afflictions. The test groups consisted of Variable Group A, which used study materials illustrated in the general standard pale skin tone (HOM, as in homogeneous), and Variable Group B, which used illustrations showing a range of skin tones (HET, as in heterogeneous). After a period of study, the participants were shown a random selection of photographs of patients, derived from reputable and accurate sources (see Note 1).

The clinical photographs were sourced from a larger pool of examples from DermNetNZ (see Note 2) and various disparate studies looking at melanin-dependent symptoms of the given range of skin conditions, resulting in two distinct collections of clinical photos. For a list of all the image sources used in the pilot study, please see the image source appendix. This measure was taken in the interest of fairness and was made to reflect realistic population distributions. The conditions that the participants were asked to identify were chosen based on a number of criteria: prevalence (general and demographic specific), severity, variation between demographics, and ease of identification. Additionally, efforts were made to take examples from a variety of different types of conditions. With that in mind, the conditions on display were atopic dermatitis (eczema), erythema migrans (Lyme disease), herpes zoster (shingles) and psoriasis. Several others were considered but were excluded due to one or more complicating factors.

The resources were created using descriptions provided by DermNetNZ to ensure bias was not expressed in the quality of descriptions (see Figures 1 and 2). The resources consisted of a central image depicting a detailed rendering of the condition in full effect, as well as smaller supplementary images and a textual description of the symptoms. The creation of the bespoke study resources for each condition came after extensive study and experimentation to find the most efficient way to represent the natural variation of human skin tone and the textures displayed by the various dermatological conditions. This process began with the author/illustrator appraising their own skill level and determining that opaque watercolour (gouache) was able to create the desired texture with relative ease compared to other mediums. The process of creating the study materials consisted of painting a large circle with gradations between four quadrants of skin tone (pale pink, yellow dominant brown, red dominant brown and blue dominant brown) and ‘applying’ the symptoms of the given condition, corresponding to which quadrant the affected skin is painted over. This process was replicated for an exclusively pale toned reference (as opposed to the four-quadrant model) in order to create the homogeneous variable group. This process was repeated, and the resulting illustrations, along with a smaller, more impressionistic view of the condition, were scanned into Adobe Photoshop CC to create a final infographic. These infographics were incorporated into the pilot study so the variables would be as consistent as possible between participants and variable groups.

Figure 1. Erythema migrans bespoke resources for the HET and HOM variable groups.


Figure 2. Atopic dermatitis bespoke resources for the HET and HOM variable groups.


This research was conducted with the full knowledge and approval of the University of Dundee ethics board. Additional aspects of the methodology such as the participant selection process, the objectives of the study and the statistical processing of the results are presented individually for clarity.

Selection and description of participants

The participants that engaged with this study were self-selecting and there were no factors of identity that precluded an individual from participating, other than a requirement of being a legal adult over 18. As the study was anonymised, no data regarding the participants’ sex, gender, age, etc., was recorded. The only factors that were collected from the participants were their level and subject of education. This served to separate the participants and analyse the results between those with relevant experience and those without. The first efforts to recruit participants took the form of a call to action posted on the author’s professional social media and published in a university newsletter. Additionally, participants found the study through the direction of sympathetic groups (Brown Skin Matters, for instance).

Objectives of the study

The primary objective of this pilot study was to record and reflect on the responses of the participants in relation to the resources from which they studied, and to compare the two variable groups’ responses. Ideally, this would make it possible to identify any trends in the responses given. A secondary objective of this pilot study was to identify which questions and clinical photographs were not fit for purpose, and to determine what aspects of the resources needed to be improved for future iterations of the study.

Four separate questionnaires were created (two different clinical photograph reference pools for each variable group) in the online survey administrator JISC. Participants, once they decided to take part in the study, followed a hyperlink to an HTML page whose random redirect algorithm sent them to one of the four questionnaires without researcher oversight. This measure was taken to prevent the selection of participants, or their background knowledge, from being used to push for a specific result. In an effort to ensure double blinding, the researcher did not know which variable group of study resources a participant would use, or which selection of images was used, until after the study was complete. This was intended to mitigate bias and to prevent the compelling of specific responses or the intentional creation of a result that matches the hypothesis.
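The random allocation described above can be illustrated with a minimal sketch. It is written in Python rather than as the HTML redirect page used in the study, and the questionnaire links are hypothetical placeholders, since the real survey addresses are not given in this article.

```python
import random

# Hypothetical placeholder links: the real survey URLs are not published here.
# Keys are (variable group, clinical photograph set).
QUESTIONNAIRES = {
    ("HOM", 1): "https://example.org/survey/hom-1",
    ("HOM", 2): "https://example.org/survey/hom-2",
    ("HET", 1): "https://example.org/survey/het-1",
    ("HET", 2): "https://example.org/survey/het-2",
}


def assign_questionnaire(rng=None):
    """Pick one of the four questionnaires uniformly at random.

    Returns the (group, photo_set) key and the link to redirect to.
    """
    rng = rng or random.Random()
    key = rng.choice(sorted(QUESTIONNAIRES))
    return key, QUESTIONNAIRES[key]


if __name__ == "__main__":
    key, url = assign_questionnaire()
    print(f"Redirecting participant to {key}: {url}")
```

Simple randomisation of this kind keeps the allocation out of the researcher's hands, but it does not guarantee equal group sizes, particularly with a small number of participants.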

Data analysis

The individual responses were graded for correct responses, recorded, and then the results were averaged by their variable group to determine an average accuracy of each group, after studying the given resource. This resulted in two distinct data sets, which were used to determine significance.
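As a rough sketch of the grading step, assuming each participant's answers are stored as a list in slide order alongside their variable group, the per-group averages could be computed as follows. The function names and the example data are illustrative assumptions, not the study's records.

```python
from statistics import mean


def score(responses, answer_key):
    """Fraction of slides one participant identified correctly."""
    correct = sum(1 for given, expected in zip(responses, answer_key)
                  if given == expected)
    return correct / len(answer_key)


def group_accuracy(participants, answer_key):
    """Mean accuracy per variable group.

    `participants` is a list of (group, responses) pairs, where group is
    'HOM' or 'HET' and responses follows the slide order of answer_key.
    """
    by_group = {}
    for group, responses in participants:
        by_group.setdefault(group, []).append(score(responses, answer_key))
    return {group: mean(scores) for group, scores in by_group.items()}


if __name__ == "__main__":
    # Invented example data for illustration only.
    key = ["eczema", "lyme", "shingles", "psoriasis"]
    people = [
        ("HOM", ["eczema", "lyme", "none", "none"]),
        ("HET", ["eczema", "lyme", "shingles", "psoriasis"]),
    ]
    print(group_accuracy(people, key))
```

The per-participant scores collected for each group, rather than the two averages alone, form the two data sets that a rank-based test compares.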

Due to the small sample size, the non-parametric Mann–Whitney U test was used. As the study is concerned with only one direction of change (whether the HET group were more accurate), a one-tailed hypothesis was used, and the significance level was set at .05. IBM® SPSS® Statistics was used to determine the level of significance.
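For readers unfamiliar with the test, the sketch below shows how a Mann–Whitney U statistic and a one-tailed p value can be computed from two lists of participant scores. It is a plain Python illustration, not the SPSS procedure used in the study: it assumes a normal approximation with a tie correction and no continuity correction, and that the scores are not all identical, so its p values may differ slightly from those of a statistics package.

```python
from math import erf, sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U for sample x: pairs where x exceeds y, with ties counting 0.5."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def one_tailed_p(x, y):
    """Approximate p value for the alternative 'x tends to exceed y'."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = {}
    for value in list(x) + list(y):
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (mann_whitney_u(x, y) - n1 * n2 / 2) / sqrt(variance)
    return 0.5 * (1 - erf(z / sqrt(2)))  # upper-tail normal probability
```

Calling `one_tailed_p(het_scores, hom_scores)`, where both arguments are hypothetical lists of per-participant accuracies, tests the directional hypothesis that the HET group scored higher; a result below .05 would count as significant at the level set above.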

Results

After the pilot study was posted online, thirty participants took part (Tables 1 and 2). Their responses were collected into two tables, one for each version of the clinical photograph groups, and were further demarcated by which variable group they were in, either HOM or HET. It is important to note that a small sample size like that used in this study is less likely to generate statistically significant data. However, this has been useful in demonstrating what changes need to be made for a full-scale project, as well as determining any trends that are visible.

Table 1. Results of the pilot study (first set of clinical photographs), with responses, question order and level of education.

Table 2. Results of the pilot study (second set of clinical photographs), with responses, question order and level of education.

In initially appraising the results, there is no clearly defined trend that is readily visible. In a large-scale breakdown of the results, the HOM groups scored 54.2% correct while the HET groups scored 60.4% correct. With regard to melanin-specific conditions, the HOM group scored 53% and the HET group scored 58.5%.

However, in viewing the slides that appraise the ability to identify the condition on melanistic skin, there is a greater contrast in results (see Figure 3). In Table 1 (Slide 3) and Table 2 (Slide 10), depicting erythema migrans on dark skin, all sixteen participants exposed to diverse resources (HET) were able to correctly identify the condition, while only eleven of sixteen, or 68.7%, of participants exposed to exclusively pale resources (HOM) were able to do so. Additionally, in Table 1 (Slide 7) and Table 2 (Slide 12), only five of fourteen, or 35.7%, of HOM participants were able to correctly identify psoriasis with grey and purple plaques on dark skin, as opposed to ten of sixteen, or 62.5%, of HET participants who were able to correctly identify the melanin-dependent symptom.

Figure 3. Noteworthy examples of specific response trends and their associated clinical photographs.


Using more diverse reference material may also mitigate false positives, as is the case with (Slide 4), which depicts vitiligo, a condition that is not one of the four ailments being assessed. While six of nine, or 66.6%, of HOM participants assumed it was another condition entirely, every HET participant correctly recognised this as ‘None of the Above’. The inverse, in which a diverse reference does not hinder recognition on pale skin, is also demonstrated. There are no instances of a distinctly worse response rate from the HET group when viewing a pale skin tone clinical photograph, or of a distinctly better response rate from the HOM group when appraising a clinical photograph depicting a darker skin tone. In the case of (Slide 10), an example of atopic dermatitis, every HOM participant responded incorrectly, while four of seven, or 57%, of HET participants responded correctly. Ultimately, the resources presented seem to have a more pronounced influence for certain conditions, and the specific factors behind that will be a course of future research.

When accounting for the difference in background, ten of the thirty participants had a background that was deemed a biological or medical specialty. The average specialist correct response rate was 66%, as opposed to a non-specialist correct response rate of 49.5%. Specialist response rates were consistent with the general distribution of correct answers when comparing different resources: HOM 64.5% versus HET 68%.

When performing a Mann–Whitney U test in IBM® SPSS® Statistics, the U value was 78.5 (the critical value of U for significance is 71 in this case) and the p value was .085, which is above .05 and therefore not significant. This does not come as a surprise, given the sample size and the mitigating factors present in the study design.

Discussion

These results suggest a slight improvement in recognition from being exposed to the diverse learning resources. When focussed on key areas of distinction, as is the case with the previously highlighted results, there is data to suggest that the more diverse resources resulted in a marked improvement in recognition rates. The impact of representation being exhibited has not been widely recorded or quantified and can be a vital piece of supporting data in future explorations of the consequences of unequal representation. While the hypothesis for the study may appear intuitive, it can be valuable to have a specific piece of data to appraise and to reference in future discussions. As of now, this is one of the only lines of research exploring the impact that model bias has outside of qualitative evaluations of personal experience. The promising results of improved identification rates of key examples encourage the pursuit of a full-scale study. Because the study in question was a pilot to identify any potential limitations or oversights on the part of the researcher, it will inform and improve a full-scale study using this framework and methodology.

One potential limitation is in viewing the results as a whole, divorced from specific intention, as several questions do not show a relevant comparison of the materials. The slides which depict a symptom on pale skin are only testing the individual’s personal ability to identify conditions, and not the impact of inclusive resources. This is evident as certain aspects that were absent from the HOM resources proved to be a deciding factor in the identification of specific clinical photographs.

Because the selection was between five options, this result would not be achievable by random selection. It may be influenced by the structure of the study, as it is multiple choice, which allows for some element of the process of elimination. While this result does support the hypothesis, the small sample size and relative unevenness of the quality of the clinical images used may contribute to this inconclusive result, and there are multiple courses of action that will be taken to hopefully clarify that support in the full-scale study. One point that is immediately notable is that certain clinical photographs were simply not suited for their purpose, as participants uniformly could not recognise them. This is the case with (Slide 9) and (Slide 3). Both instances, psoriasis and erythema migrans, respectively, show visually confusing examples with unclear expressions of their symptoms, in a way that is not melanin dependent. Therefore, these slides are noted outliers. When the image quality is poor and the specific subject is unclear, it complicates the evaluation process. If participants, regardless of variable group, are unable to ascertain the correct answer, that may be seen to undermine the hypothesis. In the other regard, (Slide 2), showing textbook atopic dermatitis on pale skin, was recognised by every participant in both groups; this may again be seen to undermine the hypothesis. However, this shows that a more diverse range of study materials does not need to interfere with general education. These outliers illustrate the issue that the variable quality of different clinical images, relying on different sources from different periods, leaves an uneven difficulty level for participants and muddies the results further.

When analysing the impact of a specialist background, participants with a biology or medical education scored substantially better, by 16.5% in total. This is understandable, as general concepts of skin conditions will likely have been taught at some point in their academic history. Additionally, the ability to quickly read and retain information from a diagram is a skill that is honed through the higher education structure. Specialists made up one third of the responses overall, and there was an even distribution of specialists in group 2 (22% in both HET and HOM groups), effectively equalising their contribution. However, in group HOM1, 60% of the participants were specialists, which may contribute to a less decisive result. With a response group this small, it is difficult to make evaluations of trends, but it is preferable for these issues to become apparent before releasing the full-scale study. Additionally, this full-scale study will be distributed directly to dermatological professionals, through professional bodies like The Skin of Colour Society and LivDerm. These professional responses will serve as a useful control when comparing the responses of people with a bio-medical background and the general population. Ultimately, the limited number of responses (while not statistically significant) is acceptable, as the full-scale study will be distributed to a much larger population and the lessons learned through making this pilot will result in a more cohesive final product.

Given the outcome of the pilot study, there are a number of revisions that can be implemented in order to create a more cohesive and effective study in the future. Among these, a more discerning appraisal of the photographs used will hopefully reduce the likelihood of an image being too difficult for any participant to correctly identify. Previous examples of poor-quality images created outliers which interfere with the ability to properly appraise the results. Broadly, this inconsistent quality of images is a primary factor that complicates both the academic study of various conditions, as well as the practical use of self-identification pre-medical examination. Additionally, the nature of a clinical photograph means it is a case study of a specific individual at a specific point in their condition, which can result in a non-representative sample. These aspects serve to highlight some of the weaknesses of photography in medical education and stress the importance of the general population having access to high-quality medical images. However, the consistent quality between images and the ability to create a wider representative example serves to demonstrate the strengths of illustrated resources.

In the revisions to the resource that was created for the study, steps will be taken to give a more comprehensive understanding of the condition being presented. In addition to showing the condition in various stages of severity, more intuitive identification methods will be explored. This can take the form of visual metaphors to describe the texture of a given condition, like drawing comparisons to the texture and size of common objects, like fruit. The illustration may also take the form of an animated gif, showing the stages of condition as it progresses. This can maximise the information being presented in a single image. Supplementary elements like these are a unique aspect of illustration, as opposed to pure photography, as the space for creative understanding can provide a point of connection for a user that is not receptive to exacting clinical detail.

The scaling of the full study will include an updating of the supplementary artwork presented to be more cohesive and directly helpful to identification. Also, a fifth condition, melanoma, will be added to the list of conditions that a participant must identify. Given the lack of resources, the high mortality rate, and the misinformation about how uncommon skin cancer is for people of colour, it is an aspect that would benefit from greater exploration. Not only is this study about garnering data regarding model bias, but it also serves as self-reflection and practice for the researcher and artist in the context of being a professional medical illustrator.

It is important to note that regardless of the outcome of the study data itself, the impact of representation in medical art reaches beyond the purely quantifiable and impacts the lives of those that interact with it in less tangible ways. While there is the desire to see quantifiable data to demonstrate the impact of disparity, the emotional and societal influences that led to this inequity are still present. The model bias present is not necessarily the root cause, but rather a symptom of an unequal history.

Upon reflection of this pilot study, the primary goal of recording and analysing two variable groups in an effort to look for trends has been successful. Though the pilot study had limitations which prevented a statistically significant result, several examples that support the hypothesis were identified. In looking at specific cases within the responses, there is evidence to suggest a positive impact on identification rates with inclusive resources. The secondary goal of determining what aspects need to be revised for a more effective final study has made a number of revisions apparent. An updated version of the study materials at the start of the questionnaire will be made in order to better reflect the conditions being depicted. An effort will be made to make use of the practical identifiers participants used to identify the various dermatological conditions. This full-scale study will also be distributed to a larger population, in an attempt to alleviate the issue of the data being non-statistically significant. Additionally, student and professional dermatologists specifically will be sought out, to serve as a sort of professional control to evaluate the quality of the bespoke resources and the clinical photo quality.

This update should take place over the next year, in order to allow enough time for a new set of responses to be gathered and analysed. Future studies will include the space for participants to respond regarding how representation has affected them personally, in order to gain some insight into the general consensus. Ideally, this full-scale study will generate a data set that will serve to support the calls for action to update the curriculum of medical programmes and create inclusive resources for professionals and the public alike.

Disclaimer

The views expressed in the submitted article are the author’s own and not an official position of the institution or funder.

Acknowledgements

The author acknowledges their supervisors, Dr. Alan Prescott, Dr. Caroline Erolin, and Dr. Mick Peter.

Disclosure statement

The author reports there are no competing interests to declare.

Additional information

Funding

This work was supported by the US Federal Student Aid Direct Graduate Plus Loan.

Notes

1 Even using a primary source for clinical dermatological photography, a number of darker skin tone examples had to be retrieved from disparate studies and sources, in order to make up the difference. This aspect alone serves to highlight the issues facing a large portion of the population.

2 DermNetNZ is a New Zealand based database of dermatological conditions which are free to use in academic endeavours.

References

  • Bellicoso, E., Quick, S., Ayoo, K., Beach, R., Joseph, M., & Dahlke, E. (2021). Diversity in dermatology? An assessment of undergraduate medical education. Journal of Cutaneous Medicine and Surgery, 25, 409–417. doi:10.1177/12034754211007430
  • Bhate, C., & Schwartz, R. A. (2011). Lyme disease: Part I. Advances and perspectives. Journal of the American Academy of Dermatology, 64, 619–636. doi:10.1016/j.jaad.2010.03.046
  • Dennison, R., Novak, C., Rebman, A., Venkatesan, A., & Aucott, J. (2019). Lyme disease with erythema migrans and seventh nerve palsy in an African-American man. Cureus, 11, e6509. doi:10.7759/cureus.6509
  • Gupta, A.K., Bharadwaj, M., & Mehrotra, R. (2016). Skin cancer concerns in people of color: Risk factors and prevention. Asian Pacific Journal of Cancer Prevention, 17, 5257–5264. doi:10.22034/APJCP.2016.17.12.5257
  • Kurd, S.K., & Gelfand, J.M. (2009). The prevalence of previously diagnosed and undiagnosed psoriasis in US adults: Results from NHANES 2003–2004. Journal of the American Academy of Dermatology, 60, 218–224. doi:10.1016/j.jaad.2008.09.022
  • Louie, P., & Wilkes, R. (2018). Representations of race and skin tone in medical textbook imagery. Social Science & Medicine, 202, 38–42. doi:10.1016/j.socscimed.2018.02.023
  • Monk, E. (2015). The cost of color: Skin color, discrimination, and health among African-Americans. American Journal of Sociology, 121, 396–444. doi:10.1086/682162
  • Mukwende, M., Tamony, P., & Turner, M. (2020). Mind the gap: A handbook of clinical signs in black and brown skin. London: St George's, University of London.
  • Pandya, A.G., Alexis, A.F., Berger, T.G., & Wintroub, B.U. (2016). Increasing racial and ethnic diversity in dermatology: A call to action. Journal of the American Academy of Dermatology, 74, 584–587. doi:10.1016/j.jaad.2015.10.044
  • Parker, R., Larkin, T., & Cockburn, J. (2017). A visual analysis of gender bias in contemporary anatomy textbooks. Social Science & Medicine, 180, 106–113. doi:10.1016/j.socscimed.2017.03.032
  • Patel, S. (2021). Rocky Mountain spotted fever (RMSF). Infectious diseases. Retrieved from https://emedicine.medscape.com/article/228042-overview#showall
  • Shaw, T.E., Currie, G.P., Koudelka, C.W., & Simpson, E.L. (2011). Eczema prevalence in the United States: Data from the 2003 National Survey of Children’s Health. The Journal of Investigative Dermatology, 131, 67–73. doi:10.1038/jid.2010.251

Appendix

Additional references for clinical photographs used in pilot study.