Deep learning model to improve melanoma detection in people of color


Abstract

Melanoma is a type of skin cancer that is particularly dangerous to people with dark skin, because the disease is often diagnosed and detected late in this population. By the time melanoma is detected, the prognosis is often poor. Advances in Artificial Intelligence (AI) technology and image classification have brought about tremendous progress and applications in medicine, including the diagnosis of skin cancer. However, these techniques continue to produce unsatisfactory results when applied to people with dark skin. This work considered the trend in cancer detection using AI techniques. Clinical images of melanoma in dark skin were acquired and pre-processed to correct illumination and remove noise, which aids the other stages such as segmentation and data augmentation. The acquired images were combined with a curated and augmented version of the Human Against Machines 10000 image (HAM10000) dataset and split into two classes: melanoma and non-melanoma. A DenseNet121 pre-trained on ImageNet was used as the base model, and a transfer learning process was performed to exclude the top layer and fine-tune all the layers in the final Dense Block. The model achieved an accuracy of 99% for melanoma detection in white skin and 98% in dark skin. The results show that the proposed model is effective in detecting melanoma in both white and dark skin.


1. Introduction

Skin cancer is less common in people with darker skin, but when it does occur, the morbidity and mortality rates are higher than in Caucasians (Abdulhak & Moiin, Citation2020). This is due to the lack of diagnosis and treatment in the earlier stages of its development. It is usually discovered in the late stages of development in people of color (Gloster & Neal, Citation2006; Ramnani, Citation2020). Authors in (Gloster & Neal, Citation2006) recommended that clinicians focus on the preventive measures and examinations that reduce the significant threat posed by these cancers.

Basal cell carcinoma (BCC), Kaposi sarcoma, squamous cell carcinoma (SCC), and malignant melanoma (MM) are some of the cancer types that are most common in people with dark skin tone, especially in black people and people of African descent (Nthumba, Cavadas, & Landin, Citation2011). Also, they are some of the most prevalent skin problems in sub‐Saharan Africa (Gohara, Citation2015; Nthumba et al., Citation2011).

This work is primarily concerned with melanoma in black skin. Melanoma has the potential to spread at an alarming rate. This makes it a very disturbing form of cancer (Khan et al., Citation2019). Melanoma can occur anywhere on the body but it most commonly develops on the soles of the feet.

Melanoma incidence has increased steadily over the past 30 years amongst Africans, Asian Indians, Chinese Asians, and African Americans. The melanoma case analysis carried out in South Africa on dark-skinned people revealed that a majority of the cases were acral lentiginous melanoma (ALM), with the foot and hand being the most frequent location of occurrence (de Giorgi et al., Citation2006; Mikhail, Citation2020; Swerdlow, Citation1990). Although ALM accounts for less than 5% of all melanomas, making it extremely rare among white people, it is disproportionately represented in black people (de Giorgi et al., Citation2006; Mahendraraj et al., Citation2017). The diagnosis of melanoma is usually delayed, and hence the prognosis is worse than in people with lighter skin.

Artificial intelligence (AI) is a field in computer science in which machines are designed to imitate human intelligence and thinking (Jeffcock, Citation2018). Deep Learning (DL) is a technique in artificial intelligence that involves mimicking how the brain works to make machines process data, interpret patterns, and make decisions from perceived patterns. Deep learning models are nonlinear and can be trained either with labelled data (supervised learning) or with unlabelled data (unsupervised learning); image classification tasks such as the one considered in this work rely on labelled data.

In the past, DL models have generated huge successes in skin cancer diagnosis for people with lighter skin color. This is a result of the huge amount of computer vision studies in the area of dermatology.

Models have been known to outperform dermatology experts and professionals in the area of skin cancer classification which used datasets or controlled settings (Codella et al., Citation2017; Esteva et al., Citation2017; Haenssle et al., Citation2018).

Concerns have been raised over the lack of representation and data availability for clinical research, and in the field of dermatology with regards to darker skin (Nelson, Citation2020). In (Wong, Lin, Hocker, & Burgin, Citation2019), there were discussions about the potential for artificial intelligence to wrongly detect or classify skin cancer types in darker skin tone.

As with other forms of cancer, early detection of melanoma is essential to fast recovery. According to literature, the rate of death from melanoma can be reduced by 90% if it is detected at its initial stages. Therefore, it is of importance to carry out research to improve the artificial intelligence classification and detection of melanoma cancer in skin of color.

For this study, it is noteworthy to state that there are few differences between the dermatoscopic melanoma characteristics in the skin of white and dark people (Wong et al., Citation2019). Therefore, we included dark skin images of melanoma cancer with white skin images while training our model.

Advancement in artificial intelligence has equipped dermatologists and scientists with the tools for the early detection of cancer. However, this has not translated across all ethnicities or races. A lack of technological development to address the threat that skin cancer poses to the darker-skinned population could jeopardize AI's achievements in the fight against cancer.

There is a need to proffer techniques that will address the inadequacies of these technologies for the detection of skin cancer. For example, a network that would detect melanoma in white skin should have the same accuracy and correctness when detecting the same stage of cancer in a darker-skinned patient. People of color (POC) are less likely to develop skin cancer, but they are much more likely to die from it due to delay in detection or prevention. In POC, skin cancer is frequently discovered at a more advanced stage, making treatment challenging (Gupta et al., Citation2016).

The results of medical AI technologies should reflect robustness across ethnicity in order to be considered complete. Early detection of cancer improves the likelihood of recovery from the disease. In this work, the proposed model aims to improve the prognosis of melanoma among patients with darker skin tones using medical technology that does not discriminate based on ethnicity.

The remaining parts of this paper are organized as follows. Section 2 presents a review of related works. Section 3 describes the methodology used in the development of the proposed DenseNet121 model, while Section 4 presents the results and discussion. Finally the paper is concluded in Section 5.

2. Related works

In (Brinker et al., Citation2019), a ResNet50 CNN model was trained with clinical images. The authors aimed to classify melanoma and atypical nevi. Their work involved training the model with 402 melanomas and 402 nevi, as the sample size of 20 melanomas and 100 nevi initially used was too small. Their model showed remarkable results when compared with the classification performed by 157 board-certified dermatologists. Tests carried out in (Brinker et al., Citation2019) with the trained model show that it has the same accuracy as 145 dermatologists, but with an improved sensitivity. In (Haenssle et al., Citation2018, Citation2020), an Inception V4 model was trained on a dataset of more than a hundred thousand melanoma and benign skin lesions. The results of the DL model, when compared to dermatologists, show that the model achieved a comparably high mean specificity without large amounts of information available.

The ResNet-152 deep learning technique was used for the classification of twelve skin malignancies from clinical images belonging to the Asan dataset (Han et al., Citation2018). Their model performs very well in the classification of BCC, a common cancer type in darker skin. Their training dataset covers the skin of people of Asian descent only. A similar approach was adopted in (Fujisawa et al., Citation2019), whose deep learning method was used to classify 140 clinical images. Authors in (Tschandl et al., Citation2019) trained the ResNet50 and the InceptionV3 DL architectures on a dataset consisting of 13,724 images. Their models achieved impressive performance in diagnosing non-pigmented skin cancer types. Authors in (Codella et al., Citation2017) trained an ensemble network for the recognition of melanoma using the ISIC-2016 dataset.

Authors in (Goyal et al., Citation2020) investigated the challenges found in the transferability of learned features across races, noting that authors in (Han et al., Citation2018) tested their model, trained on predominantly Asian skin, and received poorer results on white skin. Authors in (Sagar & Dheeba, Citation2020) utilized the concept of transfer learning in the classification of melanoma cancer. Their research shows that retraining a pre-trained model is not only more convenient and time-efficient, but also results in better performance. In their experiment, various off-the-shelf ConvNets were compared to establish the best for melanoma cancer detection. In (Adegun & Viriri, Citation2021), it was determined that ensemble learning, in which ensemble models are trained with the best performing DL models such as SENet, Xception, and InceptionV3, provides excellent results in classifying skin lesions and detecting cancer. However, the technique is computationally demanding. The research performed in (Akkoca-Gazioğlu & Kamasak, Citation2020) shows that of the four models ResNet50, DenseNet121, VGG16, and AlexNet, DenseNet121 gives the best accuracy for melanoma classification, while ResNet50 performed relatively close to DenseNet121. This information was useful in selecting a model for our research.

Most of the previous works used deep learning techniques to detect and classify cancer, and achieved high performance. However, only the authors in (Han et al., Citation2018) trained their model to detect cancer in skin types other than white skin, and none of the other techniques considered skin of color during training and evaluation. The method described in (Jojoa Acosta et al., Citation2021) consists of two stages: the first automatically crops the region of interest within a dermatoscopic image using the Mask and Region-based Convolutional Neural Network technique, and the second, which is based on a ResNet152 structure, classifies skin lesions as either "benign" or "malignant." In (Dildar et al., Citation2021), the authors presented a thorough analysis of deep learning methods for early detection of skin cancer. In order to enhance medical professionals’ visual perception and diagnostic abilities to distinguish benign from malignant lesions, the authors in (Thapar, Rakhra, Cazzato, & Hossain, Citation2022) presented a trustworthy method for diagnosing skin cancer using dermoscopy images. Segmentation of the skin lesion region of interest (RoI) from dermoscopy images was achieved using swarm intelligence algorithms, followed by feature extraction from the RoI. The best segmentation result was achieved using the Grasshopper Optimization Algorithm.

The work of authors in (Brinker et al., Citation2019) is related to ours in the sense that we trained our model with most of the images obtained from the HAM10000 dataset. In this paper, knowledge obtained from the concepts outlined in (Sagar & Dheeba, Citation2020) and (Akkoca-Gazioğlu & Kamasak, Citation2020) was harnessed to develop a model with high accuracy in detecting and classifying melanoma in people of color. A DenseNet121 model pre-trained on ImageNet was used, and a transfer learning process was applied to fine-tune all the layers in its final Dense Block while excluding the top layer. In order to close the current gap in the representation of darker skin tones, this work is a first step toward increasing skin tone diversity in existing image databases. In (Rezk et al., Citation2022a) and (Rezk et al., Citation2022b), the authors worked on dark skin color and white skin color and proposed deep learning methods. The proposed work differs from these relatively similar studies as follows. In (Rezk et al., Citation2022a), the authors used two deep learning methods, style transfer (ST) and deep blending (DB), and used metrics such as accuracy and area under the curve (AUC), while we used a pretrained DenseNet121 and metrics such as accuracy, precision, specificity, recall, and F1 score. In (Rezk et al., Citation2022b), the authors presented a segmented model and used metrics such as accuracy, AUC, precision, and specificity for the evaluation of their model. The proposed pretrained Dense Convolutional Network 121 (DenseNet121) offers a number of compelling benefits, including alleviation of the vanishing-gradient problem, improved feature propagation, feature reuse, and significantly fewer parameters. DenseNets make it possible to keep increasing the depth of convolutional networks, whereas conventional CNNs run into problems as they become deeper. DenseNet was not used in (Rezk et al., Citation2022a, Citation2022b), while in our study DenseNet121 was used and the accuracy of the proposed model was increased by using the concept of early stopping.

3. Methodology

In this section, we describe how we generated our model and the procedure performed in each stage of our work. Figure 1 shows the block diagram of the entire setup. Details of the stages are explained in Sections 3.1 to 3.4.

Figure 1. Block diagram of proposed system.

Firstly, we acquired data to create our dataset. This was aimed at generating a unique dataset of dark skin melanoma cancer clinical cases. It included data from different online medical sources which is discussed in section 3.1.

Next, we used pre-processing techniques to reduce noise and unnecessary detail in the acquired clinical images. The dataset acquired from Kaggle, an online data resource platform, consisted of already curated and augmented dermatoscopic images. Hence, we did not include this data in the pre-processing stage.

After the basic pre-processing processes were performed, we then performed data augmentation on the generated data. Data Augmentation is a technique in deep learning whereby copies of original images are slightly modified or new data is synthesized from the original. This creates variations of the original images which increases the dataset and also helps DL models to generalize better.

Finally, we combined the images from both datasets and fed them into our pre-trained convolutional neural network. We applied the concepts of transfer learning in this stage to fine-tune our network.

3.1. Data acquisition

As a result of the challenges associated with melanoma in darker skin which were discussed earlier, a dataset containing melanoma in dark skin types is extremely difficult to come by. Public image datasets available for cancer research such as the acquired dataset from Kaggle, are primarily filled with white skin dermoscopic images. So, we needed to manually acquire a more diverse dataset to avoid the limitations highlighted in (Goyal et al., Citation2020). Consequently, we began our task by performing a data collection to acquire certified melanoma cancer clinical images of dark skin cases.

The data used for this study was obtained from journal publications such as (Gloster & Neal, Citation2006; Gohara, Citation2015; Tschandl et al., Citation2018), medical documents (Dildar et al., Citation2021; Rezk et al., Citation2022a; Thapar et al., Citation2022), a newspaper publication (Rezk et al., Citation2022b), and an online medical image gallery (Cancer A-Z, Citation2022). About 100 clinical images were acquired from various certified health sources. These consist of 37 melanoma cancer images and 63 images of other common skin lesions (e.g. basal, and squamous cell carcinoma, cutaneous T-cell lymphoma, etc.) associated with darker-skinned people.

3.1.1. HAM10000

Some of the data used consisted of curated and normalized images from the HAM10k dermoscopic image dataset (Tschandl et al., Citation2018). Half of the diagnoses were validated by histopathology and the other half was based on a consensus by dermatologists. Instead of classifying seven different skin lesions, the dataset was split into two classes, melanoma and non-melanoma.

The groups contained 1,113 melanoma images and 8,902 images not affected by melanoma.

Data augmentation techniques were performed on the melanoma group to mitigate the effects of highly imbalanced data. The augmentation operations used were brightness range, rotation range, zoom range, width shift range, height shift range, horizontal flip, and vertical flip. The result was a balanced dataset of 17,805 images consisting of 8,903 melanoma images and 8,902 non-melanoma images.
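As a rough illustration of how augmentation can rebalance the two classes, the sketch below applies two of the listed operations (horizontal and vertical flips) to a tiny image stored as nested lists, and computes how many synthetic melanoma images are needed to grow 1,113 originals to 8,903. The helper names and the toy image are hypothetical; the actual augmentation pipeline and its parameter ranges are not specified in the text.

```python
def horizontal_flip(image):
    """Mirror each row of a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the row order of a 2-D image (top to bottom)."""
    return [row[:] for row in image[::-1]]


def augmented_needed(original_count, target_count):
    """Number of synthetic images required to grow a class to target_count."""
    return max(0, target_count - original_count)


# Toy 2x3 "image"; real inputs would be dermoscopic images.
toy = [[1, 2, 3],
       [4, 5, 6]]
flipped_h = horizontal_flip(toy)
flipped_v = vertical_flip(toy)

# Counts reported in the text: 1,113 original melanoma images, 8,903 after augmentation.
extra = augmented_needed(1113, 8903)
per_original = extra / 1113  # average number of synthetic images per original
```

Under these counts, roughly seven synthetic variants per original melanoma image are required, which is why several different operations are combined rather than a single flip.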

The data collected needed to be pre-processed to achieve an optimal performance. Clinical images in some cases are taken from low-quality cameras, different resolutions, poor and/or varying illumination, etc. For this reason, it was beneficial to perform some processes to segment lesion edges, remove unnecessary artifacts, remove noise, etc., from our images. The various pre-processing steps performed are explained in Section 3.2.

3.2. Data pre-processing

To ensure that our model receives correct information for the training stage, we performed the following pre-processing stages: cropping, denoising, illumination correction, segmentation, and data augmentation. These stages are discussed in Sections 3.2.1 to 3.2.5.

3.2.1. Cropping

In this stage, we eliminated unnecessary artifacts such as tapes, clothing items, and hospital materials from our acquired images. For our task, the information we require in the images is the skin lesion region and the skin surrounding the lesion. Hence, we ensured that all the irrelevant information in the images was removed without affecting the important parts.

3.2.2. Denoising

Gaussian blur is an image processing operation that blurs an image to reduce noise. We took the cropped images and smoothed them using a Gaussian filter (sigma = 2).
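The smoothing step can be sketched in plain Python as a separable Gaussian filter. This is a minimal illustration, not the implementation used in the study: only sigma = 2 comes from the text, while the truncation of the kernel at three standard deviations, the edge-replication border handling, and all function names are assumptions.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (weights sum to 1)."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # assumed truncation at 3 sigma
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), n - 1)  # clamp to the border
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    """Swap rows and columns of a 2-D image."""
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma=2.0):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    rows_done = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(rows_done), kernel))


# A single bright pixel (impulse noise) on a dark background is spread out.
noisy = [[0.0] * 7 for _ in range(7)]
noisy[3][3] = 1.0
smoothed = gaussian_blur(noisy, sigma=2.0)
```

Filtering rows and columns separately gives the same result as a full 2-D Gaussian kernel at lower cost, which is why the separable form is commonly used.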

3.2.3. Illumination correction

Illumination correction was performed to reduce the effect of lighting sources on the images acquired. Clinical images are taken in different lighting conditions, and with different camera types. As a result, some of these images have light reflecting off their surfaces, and some areas of the images are exposed to more light than other areas. Therefore, it was necessary to correct the illumination, contrast, and sharpness of the images. To extract these features, the Hue, Saturation, and Value (HSV) percentages of the images were determined. Afterwards, the images were converted to HSV color space and sharp changes in the S and V channels were detected, similar to the procedure in (Giotis et al., Citation2015). We performed edge correction to ensure that the lesion edges are not lost during pre-processing.

3.2.4. Segmentation

Adaptive thresholding was used to segment the images, especially to detect and remove objects on the skin. Thereafter, the images were converted back to the RGB color space and the output was superimposed bitwise on the original image to provide the CNN with images that have reduced noise and better-segmented lesion areas. With this, our network could better discriminate between normal skin areas and cancer-affected areas in the images.
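The idea behind adaptive thresholding and the subsequent masking can be sketched as follows. The text does not state which thresholding variant, window size, or offset was used, so this sketch assumes a mean-based local threshold on a grayscale patch; the function names, `radius`, and `offset` are hypothetical.

```python
def local_mean(image, y, x, radius):
    """Mean of the (2*radius+1)^2 window around (y, x), clipped at borders."""
    h, w = len(image), len(image[0])
    values = [image[j][i]
              for j in range(max(0, y - radius), min(h, y + radius + 1))
              for i in range(max(0, x - radius), min(w, x + radius + 1))]
    return sum(values) / len(values)


def adaptive_threshold(image, radius=1, offset=0):
    """Binary mask: 1 where a pixel is darker than its local mean minus offset."""
    return [[1 if image[y][x] < local_mean(image, y, x, radius) - offset else 0
             for x in range(len(image[0]))]
            for y in range(len(image))]


def apply_mask(image, mask):
    """Keep original pixels where the mask is 1 and zero out the rest."""
    return [[p if m else 0 for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


# Toy grayscale patch: a dark "lesion" (20) on brighter skin,
# with illumination increasing from left to right.
patch = [
    [200, 205, 210, 215, 220],
    [200, 205,  20, 215, 220],
    [200, 205,  20, 215, 220],
    [200, 205, 210, 215, 220],
]
mask = adaptive_threshold(patch, radius=1, offset=10)
lesion_only = apply_mask(patch, mask)
```

Because each pixel is compared with its own neighborhood rather than one global value, the gradual left-to-right brightening in the toy patch does not produce false detections, while the dark region is still isolated.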

3.2.5. Augmentation

Data augmentation was performed on the pre-processed images to generate similar images to assist the network during training. At the end of this stage, we had a total of 569 pre-processed clinical images consisting of 282 melanoma and 287 non-melanoma images.

Therefore, a total of 18,374 images were used, consisting of 9,185 melanoma images (8,903 light skin, 282 colored skin) and 9,189 non-melanoma images (8,902 light skin, 287 colored skin). Of these, 11,081 randomly selected images were used for training the proposed model, while 3,647 and 3,646 images were used for validating and testing the model, respectively. The combined dataset used for training the proposed model is presented in Table 1.
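A random split with the reported subset sizes can be sketched as below. The shuffling procedure and the seed are assumptions made for reproducibility of the example; the text only states that the images were selected at random.

```python
import random


def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train/val/test lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # whatever remains is the test set
    return train, val, test


# Counts reported in the text: 18,374 images in total.
train_idx, val_idx, test_idx = split_indices(18374, n_train=11081, n_val=3647)
```

The three subsets are disjoint by construction and together cover the whole dataset, which corresponds to roughly a 60/20/20 split.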

Table 1. Combined dataset used for training the proposed model.


3.3. Proposed CNN architecture

In this section, we present the proposed DenseNet, a convolutional network architecture that was improved upon and deployed in this work. Extraction of a useful and discriminative feature set is a challenging task (Huang et al., Citation2017; Salau, Citation2021; Salau & Jain, Citation2019). For this reason, we used the strength of CNNs for feature extraction and for automatic melanoma detection. A CNN examines various structures of images and extracts the appropriate aspects of input images for classification. In this work, a pretrained DenseNet121 model was proposed for the classification task due to its superior performance in blurred and noisy image detection as compared to ResNet50. Authors in (Huang et al., Citation2017) developed the Dense Convolutional Network (DenseNet). One of the major characteristics of this ConvNet is that it does not need as many feature maps as other networks because it does not relearn redundant feature maps. DenseNets are narrow and reuse features learned in previous layers of the network to maximize the potential of the trained network. A major benefit of DenseNet is its improved flow of gradients and information through the network, which makes training easier. All layers have direct access to the gradients from the loss function and to the original input, which results in an implicit deep supervision that helps the training of deeper network architectures. The proposed DenseNet121 architecture, including the Dense Layers, Transition Blocks, and Dense Blocks, is shown in Figure 2.

Figure 2. Architecture of the proposed DenseNet121 containing Dense Block (Dx), Transition Block (Tx), Dense Layer (DLx).

The DenseNet consists of 4 Dense Blocks with a varying number of layers. DenseNet improves the flow of information between the layers, and the connectivity is such that each layer receives feature maps from all previous layers as input. Consequently, the $l^{th}$ layer in the network receives input from all the previous layers, $x_{0}, \dots, x_{l - 1}$:

$x_{l} = H_{l}([x_{0}, x_{1}, \dots, x_{l - 1}])$ (1)

where $[x_{0}, x_{1}, \dots, x_{l - 1}]$ is the channel-wise concatenation of the feature maps produced in the previous layers, $0, \dots, l - 1$. $H_{l}(\cdot)$ is a composite function consisting of batch normalization (BN), a Rectified Linear Unit (ReLU), and a 3×3 convolution (Conv). Because the concatenation in Equation (1) is not viable when the size of the feature maps changes, down-sampling was performed by splitting the network into Dense Blocks separated by intermediary layers in which convolution and average pooling are performed. These intermediary layers are known as transition layers.

3.3.1. Growth rate

The growth rate $k$ is a hyperparameter of the DenseNet. It represents the number of feature maps generated by each operation $H_{l}$. This means that the $l^{th}$ layer of a dense block receives $k_{0} + k \times (l - 1)$ feature maps as input, where $k_{0}$ is the number of channels in the input to the block. This can require heavy computation in deep blocks, although DenseNets have very narrow layers (small $k$) compared to other network architectures.
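As a worked instance of this expression, take a growth rate of $k = 32$ (the value given for the ImageNet models in Table 2) and assume, purely for illustration, $k_{0} = 64$ input channels to the first dense block. The sixth layer of that block then receives

$k_{0} + k \times (l - 1) = 64 + 32 \times (6 - 1) = 224$

feature maps as input, while itself contributing only $k = 32$ new feature maps.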

3.3.2. DenseNet-B

A 1×1 convolution was introduced as a bottleneck layer before each 3×3 Conv so that the number of feature maps fed to the 3×3 Conv is reduced to $4k$. The variant of the network with this bottleneck is referred to as a DenseNet-B. In a DenseNet-B, the function [BN-ReLU-Conv (1×1)-BN-ReLU-Conv (3×3)] replaces the composite function $H_{l}$.

3.3.3. DenseNet-C

For model compactness, DenseNet-C reduces the number of feature maps in the transition layer. A transition layer directly following a dense block that contains $m$ feature maps generates $\lfloor \theta m \rfloor$ output feature maps, where $0 < \theta \leq 1$ is the compression factor. Authors in (Huang et al., Citation2017; Saraiya, Citation2022) set $\theta = 0.5$ in their experiments.

3.3.4. DenseNet-BC

When both bottlenecking and compression are combined, the model is referred to as a DenseNet-BC. The DenseNet models trained on ImageNet use the DenseNet-BC variant. The proposed model is built from the DenseNet121 pre-trained on ImageNet, so our implementation essentially consists of a DenseNet-BC architecture with the classification layer removed and replaced with an additional custom layer. The parameters of the proposed architecture are presented in Table 2.
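To make the effect of growth rate and compression concrete, the sketch below traces the number of feature maps through the dense blocks and transition layers of a DenseNet-BC. The growth rate of 32 and compression of 0.5 follow the text; the block sizes (6, 12, 24, 16) and the 64 initial feature maps are taken from the 121-layer configuration described by Huang et al. (2017) and should be read as assumptions here. The function name is hypothetical.

```python
def densenet_channels(block_layers, growth_rate=32, initial=64, compression=0.5):
    """Trace the feature-map count through dense blocks and transition layers.

    Returns a list of (stage_name, channels_out) tuples.
    """
    trace = []
    channels = initial
    for index, n_layers in enumerate(block_layers, start=1):
        # Each dense layer appends growth_rate new feature maps.
        channels += n_layers * growth_rate
        trace.append((f"dense_block_{index}", channels))
        if index < len(block_layers):
            # A transition layer keeps floor(compression * channels) maps.
            channels = int(compression * channels)
            trace.append((f"transition_{index}", channels))
    return trace


# Block sizes of the 121-layer configuration (assumed, see lead-in).
for name, channels in densenet_channels([6, 12, 24, 16]):
    print(f"{name}: {channels} feature maps")
```

Without compression the count would only ever grow; halving it at each transition keeps the network compact while the last block still ends with 1,024 feature maps for the classifier.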

Table 2. The architecture of the proposed DenseNet model used for ImageNet, all with a fixed growth rate of 32. The “conv” shows the BN-ReLU-Conv sequence in the layer.


3.4. Transfer learning

Transfer learning is a method of learning in which the knowledge gained from training one model is applied to the training of another model. In this widely used deep learning technique, models that have already been trained on one task are used as the starting point for new models for tasks involving vision or language. Transfer learning makes it possible to reuse knowledge learned in one domain to solve new problems more quickly or more effectively. The common deep learning strategies for transfer learning are:

The use of pre-trained models as feature extractors.
Fine-tuning the pre-trained models.

In the first strategy, the final layer of the pre-trained model, which is used for prediction, is removed and replaced with a new layer for the task. The weights of the pre-trained model are used as feature extractors, and the layers in the pre-trained model are frozen (i.e. they are not allowed to learn) during training.

In the second strategy, the final layer is also replaced with a new layer, and the network is then trained with some of the initial layers frozen and the others left free to learn during training.

The reason for this is that the lower layers within the pre-trained network learn the features which are not task-specific from the pre-trained model, so we don’t update their weights. On the other hand, higher layers in the network learn more task-specific features, so we unfreeze these higher layers to ensure they update their weights and learn during training. This is called fine-tuning the pre-trained model.

We performed transfer learning on the DenseNet121 model pre-trained on ImageNet, a dataset consisting of 14 million images. We loaded the pre-trained weights into our base model (DenseNet121), froze them, and connected a global average pooling layer, a dropout layer, and a dense layer to the output. Next, we trained this model on our dataset, monitoring the validation losses during training to avoid overfitting. After training the new layers, we fine-tuned the network by unfreezing the final 113 (out of 426) layers of the DenseNet121 and training the model on the dataset again. We trained the network with a learning rate of 1e-2. The final classifier in the network decides the class of the input based on the feature maps of the network.
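The two-phase schedule described above (train the new head on a frozen base, then unfreeze the top of the base) can be illustrated with a framework-independent sketch. The `Layer` class and both helper functions are hypothetical stand-ins, not the API of any deep learning library; only the layer counts (426 base layers, final 113 unfrozen) and the three added layers come from the text.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for a network layer with a trainable flag."""
    name: str
    trainable: bool = True


def freeze_all(layers):
    """Phase 1: freeze the whole pre-trained base (feature extraction)."""
    for layer in layers:
        layer.trainable = False


def unfreeze_last(layers, n_last):
    """Phase 2: make only the final n_last base layers trainable."""
    for layer in layers[-n_last:]:
        layer.trainable = True


# Layer counts taken from the text: 426 base layers, last 113 fine-tuned.
base = [Layer(f"base_{i}") for i in range(426)]
head = [Layer("global_average_pooling"), Layer("dropout"), Layer("dense")]

freeze_all(base)                      # phase 1: only the new head learns
phase1_trainable = sum(layer.trainable for layer in base + head)

unfreeze_last(base, 113)              # phase 2: fine-tune the top of the base
phase2_trainable = sum(layer.trainable for layer in base + head)
```

In this sketch the first 313 base layers stay frozen throughout, so the generic low-level features learned on ImageNet are preserved while the later, more task-specific layers adapt to the melanoma data.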

3.4.1. Training hyperparameters

The training hyperparameters are discussed in this section and presented in Table 3.

Table 3. Hyperparameters used in the proposed model.


Batch size: The batch size is defined as the number of samples worked through before updating the internal model parameters during training. We performed our training using a batch size of 20.
Epoch: An epoch is one complete pass of the training process over the whole training set. It consists of one or more batches. For training the final layer, we set the number of epochs to 10, and for fine-tuning the entire model we set it to 15. We achieved the best performance at epochs 8 and 12, respectively.
Learning rate: The learning rate, also referred to as step size, controls how much the training weights change at each update. Smaller rates mean the weights are updated more slowly and the model requires more epochs. On the other hand, a higher learning rate means faster convergence but could cause the model to converge to a suboptimal solution.
Momentum: Momentum can be used to smooth the progression of a learning algorithm, which in turn can accelerate the learning process. If momentum is not used, more epochs are typically needed to achieve similar results. We used a momentum of 0.9 in our training stage.
Adaptive optimizers: Adaptive optimizers are algorithms that adjust the learning rate while the model is training. This helps the model to train in a more balanced way and to generalize better. Some examples are Adagrad, Adadelta, Adam, and RMSprop. We used RMSprop for training.
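Since the best performance was reached before the last scheduled epoch, early stopping on the validation loss is a natural companion to these hyperparameters. The following is a minimal sketch of that logic; the `patience` and `min_delta` parameters and the loss values are made up for illustration and are not the settings or results of this study.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Made-up validation losses for a 15-epoch run (illustration only).
losses = [0.60, 0.45, 0.38, 0.33, 0.30, 0.28, 0.27, 0.25,
          0.26, 0.27, 0.26, 0.28, 0.29, 0.30, 0.31]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
```

In practice the weights from `best_epoch` are the ones kept, so the remaining scheduled epochs are skipped without losing the best model.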

3.5. Model evaluation parameters

The proposed model was evaluated using metrics such as accuracy, precision, recall, specificity, and f-measure.

Precision: This is the ratio of true positives to all positives predicted by the model. It is a measure of how precise a model is.

$\text{Precision} = \frac{TP}{TP + FP}$ (2)

where $TP$ is the number of true positives and $FP$ is the number of false positives.

Recall: This is the measure of how well the model can identify all the relevant cases in the dataset. It is the number of true positives divided by the total number of actual positives.

$\text{Recall} = \frac{TP}{TP + FN}$ (3)

where $FN$ is the number of false negatives.

F-measure (F1-score): This metric is used to check the balance between precision and recall. It is a function of the other two metrics.

$F_{1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ (4)

Specificity (Spec): This represents the proportion of the Normal class (non-melanoma) that is predicted correctly, as expressed in Equation (5).

$\text{Spec} = \frac{TN}{TN + FP}$ (5)

where $TN$ is the number of true negatives.
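These metrics follow directly from the four confusion-matrix counts, as the short sketch below shows. The function name and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for illustration only (not results from this study).
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=890)
```

Note that whenever the number of false positives equals the number of false negatives, as in this example, precision, recall, and F1-score coincide.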

4. Results and discussion

Figure 3 shows the result of the illumination correction step during pre-processing. It was observed that the melanoma edges are more defined as a result of this correction, which suggests that the model performs better on the pre-processed images than it would on raw images. The pre-processed image of melanoma is shown on the right in Figure 4, with the original clinical image on the left. Our model performed very well, with an accuracy of 99%. We trained the new model up to an epoch of 9. From the graph in Figure 5, we observe that our model fit the data well: the training losses and validation losses stabilized just beyond an epoch of 8.

Figure 3. Adaptive threshold output for performing illumination correction.

Figure 4. Clinical image of melanoma before (left) and after (right) the preprocessing stage.

Figure 5. Training and testing losses during the development of the custom melanoma classification model.

4.1. Model evaluation

The proposed method was tested through simulation using the MATLAB Programming Language and toolboxes for image processing, neural networks, and optimization on an Intel Core i5 3.4 GHz processor.

The CNN models were evaluated using accuracy together with precision, recall, specificity, and F-measure, as given by Equations (1)–(5).

We evaluated our model using 3646 images of size 124 × 124, which were randomly split from the dataset before training the model. For white skin color, the proposed model achieved the same value of 0.92 for precision, F1-score, and recall for melanoma detection, and a specificity of 0.98. For dark skin color, it achieved 0.96 for precision, 0.93 for F1-score, 0.90 for recall, 0.99 for specificity, and an accuracy of 98%. Because precision and recall are equal for white skin color, the number of false positives the model predicts in the test data is the same as the number of false negatives. The proposed DenseNet model's accuracy was increased by using early stopping. The results for the model's performance are presented in Table 4.
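As a quick consistency check on the reported figures, using only the rounded values above together with Equations (2)–(4), equal precision and recall imply equal false-positive and false-negative counts (assuming TP > 0):

$\frac{TP}{TP + FP} = \frac{TP}{TP + FN} \;\Rightarrow\; TP + FP = TP + FN \;\Rightarrow\; FP = FN$

Likewise, the dark-skin F1-score follows from the reported precision and recall:

$F1\text{-}score = 2 \times \frac{0.96 \times 0.90}{0.96 + 0.90} = \frac{1.728}{1.86} \approx 0.93$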

Table 4. Model's performance on white and black skin.


We used the International Skin Imaging Collaboration (ISIC) archive, where the majority of our initial data was acquired, to compare our model to other melanoma detection works, such as (Hasan et al., 2019; Hosny et al., 2019; Sarkar et al., 2019). The results show that our model performed similarly to the work presented in (Hosny et al., 2019), which also used a custom model based on fine-tuning AlexNet. The model in (Hosny et al., 2019) was trained on white skin images only. Thus, the goal of creating a model that performs better than or similarly to existing models while predicting melanoma in dark skin was accomplished.

Transfer learning is a method in computer vision that enables us to build accurate models faster. We leveraged this to reuse the features learned by the DenseNet121 model when training our model. This is more efficient than starting the learning process from scratch. Also, with a smaller dataset, we were able to achieve excellent results with transfer learning. It was observed that having a larger dataset can improve the performance of AI in detecting melanoma and other skin cancer types in darker skin. In (Salau & Jain, 2019; Wei et al., 2019; Yi, Walia, & Babyn, 2019), the authors stated that Generative Adversarial Networks (GANs) can be used to generate high-quality replicas of an image and can also be used to create imaging data that appear real. This suggests that GANs can be beneficial for generating synthetic melanoma images that look original, to address the lack of data (Bissoto et al., 2018; Brennard & Buster, 2021). In Table 5, a comparative analysis is presented with existing related works. It is noteworthy that our suggested strategy outperforms earlier work in numerous performance metrics. The proposed model was able to correctly categorize melanoma in people with white skin faster than in people with black skin, taking 1.2 s and 2.3 s respectively.

Table 5. Proposed model’s performance compared to other related works.


The dataset used for training the model in this work consists of images obtained as secondary data. The source of the data is a custom-generated dataset that includes low-quality clinical images from online medical sources. As a result, this experiment, if reproduced with higher-quality images or a standard dataset, may achieve a different result.

5. Conclusion

Skin cancer is one of the most prevalent diseases that can be initially identified by visual observation and further confirmed with the aid of dermoscopic examination and other tests. Melanoma detection in people with darker skin tones requires attention due to the identified gap in the literature. In this paper, we used cutting-edge computer vision and artificial intelligence methods for skin cancer image classification. We leveraged the knowledge from previous works to create a robust AI solution for classifying melanoma, and developed a model that can correctly categorize melanoma in people with darker skin. With the suggested model, an accuracy of 99% was achieved. Data augmentation was used to increase the number of images needed for the experiment, as deep learning methods require large amounts of data. As is widely known, having enough data helps improve a deep learning model's performance. In the future, interested researchers can investigate how other data augmentation techniques, such as generative adversarial networks, can enhance model performance to make up for the lack of clinical data for cancer in people of color.

Competing interests

The authors declare that they have no competing interests.

Funding

The authors declare no funding for this research.

Disclosure statement

No potential conflict of interest was reported by the authors.

Availability of data

The datasets generated and/or analysed during the current study are not publicly available but are available from the corresponding author on reasonable request.

References

Abbes, W., & Sellami, D. (2019). Deep neural network for fuzzy automatic melanoma diagnosis. In VISIGRAPP; Setubal, Portugal: SCITEPRESS.
Abdulhak, A., & Moiin, A. (2020). Cancers arising in black skin. In A. Moiin (Ed.), Atlas of black skin. Cham: Springer. doi:10.1007/978-3-030-31485-9_10
Adegun, A., & Viriri, S. (2021). Deep learning techniques for skin lesion analysis and melanoma cancer detection: A survey of state-of-the-art. Artificial Intelligence Review, 54(2), 811–841. doi:10.1007/s10462-020-09865-y
Akkoca-Gazioğlu, B. S., & Kamasak, M. (2020). Effects of objects and image quality on melanoma classification using deep neural networks. Biomedical Signal Processing and Control, 67, 102530. doi:10.21203/rs.3.rs-35907/v1
Bi, L., Kim, J., Ahn, E., Feng, D., & Fulham, M. (2016). Automatic melanoma detection via multi-scale lesion-biased representation and joint reverse classification. In Proceedings of the 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI) (pp. 1055–1058). Prague, Czech Republic.
Bissoto, A., Perez, F., Valle, E., & Avila, S. (2018). Skin lesion synthesis with generative adversarial networks. In OR 2.0 context-aware operating theaters, computer assisted robotic endoscopy, clinical image-based procedures, and skin image analysis (pp. 294–302). Springer. https://www.springerprofessional.de/en/skin-lesion-synthesis-with-generative-adversarial-networks/16165628
Brennard, M., & Buster, K. (2021). Dermatology education melanoma. Retrieved from http://skinofcolorsociety.org/dermatology-education/melanoma/.
Brinker, T. J., Hekler, A., Enk, A. H., Klode, J., Hauschild, A., Berking, C., … Frohling, S. (2019). A convolutional neural network trained with dermoscopic images performed on par with 145 dermatologists in a clinical melanoma image classification task. European Journal of Cancer, 111, 148–154. doi:10.1016/j.ejca.2019.02.005
Brinker, T. J., Hekler, A., Enk, A. H., Klode, J., Hauschild, A., Berking, C., … Holland-Letz, T. (2019). Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task. European Journal of Cancer, 113, 47–54. doi:10.1016/j.ejca.2019.04.001
Cancer A-Z. (2022). Skin cancer image gallery. American Cancer Society. Retrieved from https://www.cancer.org/cancer/skin-cancer/skin-cancer-image-gallery.html.
Codella, N. C. F., Nguyen, Q.-B., Pankanti, S., Gutman, D. A., Helba, B., Halpern, A. C., & Smith, J. R. (2017). Deep learning ensembles for melanoma recognition in dermoscopy images. IBM Journal of Research and Development, 61(4–5), 5:1–5:15. doi:10.1147/JRD.2017.2708299
de Giorgi, V., Trez, E., Salvini, C., Duquia, R., De Villa, D., Sestini, S., … Lotti, T. (2006). Dermoscopy in black people. The British Journal of Dermatology, 155(4), 695–699. doi:10.1111/j.1365-2133.2006.07415.x
Dildar, M., Akram, S., Irfan, M., Khan, H. U., Ramzan, M., Mahmood, A. R., … Mahnashi, M. H. (2021). Skin cancer detection: A review using deep learning techniques. International Journal of Environmental Research and Public Health, 18(10), 5479. doi:10.3390/ijerph18105479
Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., & Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118. doi:10.1038/nature21056
Fujisawa, Y., Otomo, Y., Ogata, Y., Nakamura, Y., Fujita, R., Ishitsuka, Y., … Fujimoto, M. (2019). Deep-learning based, computer-aided classifier developed with a small dataset of clinical images surpasses board-certified dermatologists in skin tumour diagnosis. The British Journal of Dermatology, 180(2), 373–381. doi:10.1111/bjd.16924
Giotis, I., Molders, N., Land, S., Biehl, M., Jonkman, M. F., & Petkov, N. (2015). MED-NODE: A computer-assisted melanoma diagnosis system using non-dermoscopic images. Expert Systems with Applications, 42(19), 6578–6585. doi:10.1016/j.eswa.2015.04.034
Gloster, H. M., & Neal, K. (2006). Skin cancer in skin of color. Journal of the American Academy of Dermatology, 55(5), 741–760. doi:10.1016/j.jaad.2005.08.063
Gohara, M. (2015). Skin cancer: An African perspective. British Journal of Dermatology, 173(2), 17–21. doi:10.1111/bjd.13380
Goyal, M., Knackstedt, T., Yan, S., & Hassanpour, S. (2020). Artificial intelligence-based image classification methods for diagnosis of skin cancer: Challenges and opportunities. Computers in Biology and Medicine, 127, 104065. doi:10.1016/j.compbiomed.2020.104065
Gupta, A. K., Bharadwaj, M., & Mehrotra, R. (2016). Skin cancer concerns in people of color: Risk factors and prevention. Asian Pacific Journal of Cancer Prevention : APJCP, 17(12), 5257–5264. doi:10.22034/APJCP.2016.17.12.5257
Haenssle, H. A., Fink, C., Schneiderbauer, R., Toberer, F., Buhl, T., Blum, A., … Uhlmann, L. (2018). Man against machine: Diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Annals of Oncology : Official Journal of the European Society for Medical Oncology, 29(8), 1836–1842. doi:10.1093/annonc/mdy166
Haenssle, H. A., Fink, C., Toberer, F., Winkler, J., Stolz, W., Deinlein, T., … Zutt, M. (2020). Man against machine reloaded: Performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Annals of Oncology : Official Journal of the European Society for Medical Oncology, 31(1), 137–143. doi:10.1016/j.annonc.2019.10.013
Han, S. S., Kim, M. S., Lim, M. S., Park, G. H., Park, I., & Chang, S. E. (2018). Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. The Journal of Investigative Dermatology, 138(7), 1529–1538. doi:10.1016/j.jid.2018.01.028
Hasan, M., Barman, S. D., Islam, S., & Reza, A. W. (2019). Skin cancer detection using convolutional neural network. In Proceedings of the 2019 5th International Conference on Computing and Artificial Intelligence (pp. 254–258).
Hosny, K. M., Kassem, M. A., & Foaud, M. M. (2019). Classification of skin lesions using transfer learning and augmentation with Alex-net. PloS One, 14(5), e0217293. doi:10.1371/journal.pone.0217293
Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4700–4708).
Jain, S., Singhania, U., Tripathy, B., Nasr, E. A., Aboudaif, M. K., & Kamrani, A. K. (2021). Deep learning-based transfer learning for classification of skin cancer. Sensors, 21(23), 8142. doi:10.3390/s21238142
Jeffcock, P. (2018). What’s the difference between AI, machine learning, and deep learning? Oracle Big Data Blog. Retrieved from https://blogs.oracle.com/bigdata/difference-ai-machine-learning-deep-learning.
Jojoa Acosta, M. F., Caballero Tovar, L. Y., Garcia-Zapirain, M. B., & Percybrooks, W. S. (2021). Melanoma diagnosis using deep learning techniques on dermatoscopic images. BMC Medical Imaging, 21(1), 6. doi:10.1186/s12880-020-00534-8
Khan, M. Q., Hussain, A., Rehman, S. U., Khan, U., Maqsood, M., Mehmood, K., & Khan, M. A. (2019). Classification of melanoma and nevus in digital images for diagnosis of skin cancer. IEEE Access, 7, 90132–90144. doi:10.1109/ACCESS.2019.2926837
Mahendraraj, K., Sidhu, K., Lau, C., McRoy, G. J., Chamberlain, R. S., & Smith, F. O. (2017). Malignant melanoma in African–Americans. Medicine, 96(15), e6258. doi:10.1097/MD.0000000000006258
Mikhail, M. (2020). Skin cancer in people of color: Statistics, pictures, and prevention. Retrieved from https://www.goodrx.com/conditions/skin-cancer/skin-cancer-people-of-color-pictures-prevention.
Nelson, B. (2020). How dermatology is failing melanoma patients with skin of color. Cancer Cytopathology, 128(1), 7–8. doi:10.1002/cncy.22229
Nthumba, P. M., Cavadas, P. C., & Landin, L. (2011). Primary cutaneous malignancies in sub‐Saharan Africa. Annals of Plastic Surgery, 66(3), 313–320. doi:10.1097/SAP.0b013e3181e7db9a
Ramnani, D. (2020). Melanoma - clinico-pathologic subtypes. Retrieved from https://www.webpathology.com/case.asp?case=704.
Rezk, E., Eltorki, M., & El-Dakhakhni, W. (2022a). Improving skin color diversity in cancer detection: Deep learning approach. JMIR Dermatology, 5(3), e39143. doi:10.2196/39143
Rezk, E., Eltorki, M., & El-Dakhakhni, W. (2022b). Leveraging artificial intelligence to improve the diversity of dermatological skin color pathology: Protocol for an algorithm development and validation study. JMIR Research Protocols, 11(3), e34896. doi:10.2196/34896
Sagar, A., & Dheeba, J. (2020). Convolutional neural networks for classifying melanoma images. Cold Spring Harbor Laboratory. bioRxiv. doi:10.1101/2020.05.22.110973
Salau, A. O. (2021). Detection of corona virus disease using a novel machine learning approach. In 2021 International Conference on Decision Aid Sciences and Application (DASA) (pp. 587–590). doi:10.1109/DASA53625.2021.9682267
Salau, A. O., & Jain, S. (2019). Feature extraction: A survey of the types, techniques, and applications. In 5th IEEE International Conference on Signal Processing and Communication (ICSC), Noida, India, pp. 158–164. doi:10.1109/ICSC45622.2019.8938371
Saraiya, M. (2022). Melanoma research alliance. Retrieved from https://www.curemelanoma.org/about-melanoma/types/acral-melanoma/.
Sarkar, R., Chatterjee, C. C., & Hazra, A. (2019). Diagnosis of melanoma from dermoscopic images using a deep depthwise separable residual convolutional network. IET Image Processing, 13(12), 2130–2142. doi:10.1049/iet-ipr.2018.6669
Swerdlow, A. J. (1990). International trends in cutaneous melanoma. In D. L. Davis & D. Hoel (Eds.), Trends in cancer mortality in industrial countries (pp. 235–251). New York, NY: New York Academy of Sciences. doi:10.1111/j.1749-6632.1990.tb32071.x
Thapar, P., Rakhra, M., Cazzato, G., & Hossain, S. M. (2022). A novel hybrid deep learning approach for skin lesion segmentation and classification. Journal of Healthcare Engineering, 2022, 1709842. doi:10.1155/2022/1709842
Tschandl, P., Rosendahl, C., Akay, B. N., Argenziano, G., Blum, A., Braun, R. P., … Lallas, A. (2019). Expert-level diagnosis of nonpigmented skin cancer by combined convolutional neural networks. JAMA Dermatology, 155(1), 58–65. doi:10.1001/jamadermatol.2018.4378
Tschandl, P., Rosendahl, C., & Kittler, H. (2018). The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 5(1), 180161. doi:10.1038/sdata.2018.161
Wei, J., Suriawinata, A., Vaickus, L., Ren, B., Liu, X., Wei, J., & Hassanpour, S. (2019). Generative image translation for data augmentation in colorectal histopathology images. In Proceedings of Machine Learning for Health Workshop at NeurIPS.
Wong, V., Lin, W. M., Hocker, S., & Burgin, S. (2019). Acral ME. Acral lentiginous melanoma. Retrieved from https://www.visualdx.com/visualdx/diagnosis/acral+lentiginous+melanoma?diagnosisId=53033&moduleId=101.
Yi, X., Walia, E., & Babyn, P. (2019). Generative adversarial network in medical imaging: A review. Medical Image Analysis, 58, 101552. doi:10.1016/j.media.2019.101552
