Few-shot learning for defect detection in manufacturing

Received 20 Feb 2023, Accepted 22 Jan 2024, Published online: 27 Feb 2024

Abstract

Quality control is increasingly being automated in the context of Industry 4.0. Its automation reduces inspection times and ensures that the same criteria are used to evaluate all products. One of the challenges when developing supervised machine learning models is the availability of labelled data. Few-shot learning promises to learn from only a few samples and, therefore, to reduce the labelling effort. In this work, we combine this approach with unsupervised methods that learn anomaly maps on unlabelled data, providing additional information to the model and enhancing the classification models' discriminative capability. Our results show that the few-shot learning models achieve results competitive with those trained in a classical supervised classification setting. Furthermore, we develop novel active learning data sampling strategies to label an initial support set. The results show that using sampling strategies to create and label the initial support set yields better results than selecting samples at random. We performed the experiments on four datasets considering real-world data provided by Philips Consumer Lifestyle BV and Iber-Oleff - Componentes Tecnicos Em Plástico, S.A.

1. Introduction

The increasing connectivity capabilities and adoption of digital technologies have enabled the digitalisation of manufacturing and originated new manufacturing paradigms known as Industry 4.0 and Industry 5.0 (Benbarrad et al. 2021; Lenka, Parida, and Wincent 2017). While Industry 4.0 is concerned with leveraging new technologies (e.g. the internet of things, cloud computing, and artificial intelligence, among others) to increase productivity across the value chain and enable the efficient production of goods (Lim, Zheng, and Chen 2020), Industry 5.0 is concerned with how such technologies can be used and applied to achieve a human-centric workspace, thus changing the role of the operator (EC2 n.d.; Lied, Mogos, and Powell 2020; Nahavandi 2019).

Quality control is a key phase of the manufacturing process, ensuring the products conform to specific requirements and specifications (Yang et al. 2020); it is therefore a precondition to building a brand's reputation, consumer trust, and loyalty. While such inspection has frequently been performed manually, there is an increasing trend to automate the quality inspection process. Some of the advantages of such automation are increased scalability (Chin and Harlow 1982; Chouchene et al. 2020), homogeneous defect inspection criteria (See 2012), and the ability to trace defects' root causes to solve issues in the production process proactively. Research shows that Industry 4.0 technologies applied to defect inspection have the potential to realise a substantial increase in productivity (Tortorella et al. 2023). Automated visual quality inspection is one such approach (Abd Al Rahman and Mousavi 2020).

Many approaches have been tried to build automated visual inspection models. Most recent approaches leverage advances in machine learning to determine whether a defect exists and, if so, the type of defect. While unsupervised approaches detect whether a manufactured piece is defective, they do not provide information on the defect detected. Supervised methods can provide such information, but they require manually labelled samples of good and defective manufactured pieces. Data labelling is a costly operation. While certain approaches (e.g. active learning) can alleviate the labelling effort, annotating a few hundred images per defect is usually necessary to ensure the machine learning model learns appropriately. Few-shot learning is a recent approach that aims to reduce the number of labelled instances required to train a classifier. Using such an approach, we extend the experiments performed in our work described in ‘Towards a Comprehensive Visual Quality Inspection for Industry 4.0’ (Rožanec, Zajec, Trajkova et al. 2022).

This research aims to determine how few-shot learning can be used and enhanced in the context of defect detection, ensuring the least amount of data is used while maximising the models' discriminative performance. This would reduce the effort required to develop new defect detection models and the time to train them, increasing agility while reducing development costs in manufacturing and conforming to the required quality levels for multiple products. In particular, the goals pursued were:

  • based on the findings by Rožanec, Zajec, Theodoropoulos et al. (2022), assess how combining images and DRAEM (Discriminatively trained Reconstruction Anomaly Embedding Model) (Zavrtanik, Kristan, and Skočaj 2021) anomaly maps (which signal potential defects) enhances the classification quality and generalises to few-shot learning scenarios;

  • contrast traditional supervised learning under artificially induced class imbalance with few-shot learning, to assess the effectiveness and shortcomings of both approaches;

  • study how dataset impurity levels affect DRAEM models (to avoid labelling data, which would defeat the few-shot learning purpose);

  • develop novel active learning strategies that can assist in creating better support sets.

The main innovation points of this research are:

  • the use of the information extracted from unsupervised methods to enhance supervised few-shot classification learning and performance;

  • the development of a novel active learning strategy that leverages explainable artificial intelligence insights for data selection.

The experiments were performed with real-world data provided by Philips Consumer Lifestyle BV and Iber-Oleff - Componentes Tecnicos Em Plástico, S.A. The machine learning models were evaluated with the AUC ROC metric, which reflects their discriminative power.

This paper is organised as follows. Section 2 describes related work, and Section 3 describes the Philips Consumer Lifestyle BV and Iber-Oleff - Componentes Tecnicos Em Plástico, S.A. use cases and datasets. Section 4 describes the experiments we performed, and Section 5 reports the results we obtained. Section 6 discusses the findings. Finally, Section 7 concludes and describes future work.

2. Related work

2.1. Automated visual inspection

Traditional visual inspection involves human operators inspecting the manufactured pieces to determine whether they are defective. Many drawbacks to this approach have been observed. Among them, manufacturing companies are concerned about the limited scalability of the approach (e.g. given that human inspectors can work for a limited amount of time, and the resources required to train new human inspectors usually grow proportionally to the production scale). Furthermore, the inspection performed by each human inspector is subjective. Therefore, an inherent inspector-to-inspector inconsistency exists regardless of the human inspectors' proficiency in the process. Such discrepancies are influenced and magnified by factors related to the task (e.g. defect rate, the complexity of visual inspection), the inspector (e.g. visual acuity or experience), the environment in which the inspection is performed (e.g. lighting, shift duration and time of the day), and organisational (e.g. management support or incentives) and social aspects (e.g. isolation or opportunity for consultation) (Cullinane et al. 2013; Kujawińska, Vogt, and Hamrol 2016; See 2012).

2.1.1. Background

Automated visual inspection aims to address the abovementioned issues. It guarantees scalability by creating software capable of inspecting manufactured products and determining whether they are defective. Furthermore, inspector-to-inspector inconsistency is eliminated, given that a single criterion for product quality is established. Automated visual inspection enables non-destructive testing in quality control to identify functional and cosmetic defects (Chin and Harlow 1982). Cameras provide visual input, which can be processed with different techniques (Czimmermann et al. 2020). State-of-the-art (SOTA) automated visual inspection techniques leverage deep learning (Aggour et al. 2019; Pouyanfar et al. 2018), which has demonstrated super-human performance on many machine vision tasks (O'Mahony et al. 2020). Such models can be either supervised or unsupervised. Unsupervised methods allow for the discrimination of defective manufactured pieces without any labelled data. While such an approach is attractive given that no data labelling is required, it does not provide information on the defect type and is, therefore, not suitable for every manufacturing process. On the other hand, supervised models can discriminate between different types of defects and, therefore, can be helpful in production when different levels of quality must be satisfied. For example, some imperfections can be cosmetic, while others may affect product functionalities. Therefore, different thresholds can be used for them. Furthermore, information on the type of defect can be used in many settings to determine the root causes of such defects and take appropriate action. Nevertheless, supervised models require data labelling, which is a time-consuming and error-prone task that must be performed by humans (Y. Wang et al. 2018).

Multiple artificial intelligence approaches have been researched to reduce the labelling effort. One such approach is the active learning paradigm, which assumes a constrained capacity to provide learning samples to a machine learning model and that the learning process can be improved by carefully selecting the data instances to maximise learning towards a given objective (Settles 2009). Such data instances can be either sampled from actual data or artificially generated. A second paradigm is transfer learning, which aims to transfer knowledge acquired from a source task or domain where data is abundant and apply it in a different setting where data is scarce. Third, domain adaptation is a variation of this approach, where the source and target tasks are the same, but the source and target domains differ. Fourth, meta-learning aims to learn meta-knowledge across tasks and apply it to a concrete task based on task-specific information. Finally, few-shot learning compensates for the lack of supervised data by reframing the classification problem and learning how close the data instances of different classes are. Furthermore, it addresses the lack of data by using meta-learning, generating synthetic samples, or resorting to transfer learning (using a data representation learned on a different dataset and training a new classifier) (Parnami and Lee 2022).

2.1.2. Machine learning approaches to automated visual inspection

There have been many works from various industrial sectors on the automation of visual quality inspection relying on machine learning and deep learning methods. For instance, in an early example of an inspection of Printed Circuit Boards (Duan et al. 2012), statistical shape models of micro-drill bit defects were combined with dimensionality reduction techniques (Principal Component Analysis and Linear Discriminant Analysis) to create input features for various models, including Support Vector Machines and shallow multi-layer perceptrons. The promising results of Support Vector Machines on custom extracted features were identified even earlier in the inspection of rolled steel (Jia et al. 2004), which managed to integrate them in a fast (six seconds per 1 MB image) real-time system. More recently, Support Vector Machines and genetic algorithms were successfully used to detect porosity defects in the welding process of aluminium by combining extracted features from various sources such as spectral and X-ray data (Huang et al. 2017). Gobert et al. (2018) examined the metallic powder bed fusion process in additive manufacturing, also through the SVM-based classification of features originating from a digital single-lens reflex (DSLR) camera and labelled in a semi-automatic way with the help of CT scans. Despite the success of methods based on custom feature extraction combined with a traditional machine learning classifier (such as SVMs), later approaches use deep learning, especially Convolutional Neural Networks (CNNs), which operate directly on images, adaptively extracting features during their training process. While many CNNs pre-trained on large datasets can be used off the shelf and fine-tuned to a specific use case, Villalba-Diez et al. (2019) found it more advantageous to train a custom shallow CNN from scratch, specifically tailored to their printing industry use case. What appeared challenging to them was the standardisation of input image conditions, especially regarding controlling image brightness. Liqun, Jiansheng, and Dingjin (2020), on the other hand, followed the path of transfer learning and found that the classification of vehicle parts via fine-tuning a pre-trained VGG16 model produced higher accuracy than a Support Vector Machine over Histogram of Gradients (HoG) features. It could well be the case that more complex products, such as vehicle parts, need these more complicated but versatile solutions to conform to modern inspection application requirements such as scalability, agnosticity to different inputs, and a quick retraining process, as outlined by Chouchene et al. (2020). Yu et al. (2023) proposed the Cascaded Adaptive Global Location Network. This novel deep neural network combines residual, feature pyramid, and cascade adaptive tree-structure region proposal networks for feature extraction and uses a global localisation regression to perform defect detection. The authors applied the model to defect detection on steel surfaces. Zhao et al. (2023) proposed a multi-surface defect detection method that performs region segmentation, feature extraction, and defect detection, enabling an efficient quality control of universal joint bearings. Beltrán-González, Bustreo, and Del Bue (2020) successfully combined CNNs with Long Short-Term Memory networks (LSTMs) to identify the presence of debris in avionic component ducts. Finally, Shahin et al. (2023) reported how the YOLO v7 model was used to discriminate defective packages and prevent them from moving into shipping operations. For detailed systematic literature reviews on this field, we encourage the readers to consult the excellent works by Ren et al. (2022), Konstantinidis et al. (2023), and Abd Al Rahman and Mousavi (2020).

2.2. Few-shot learning

Few-shot learning is a machine learning approach where the learner aims to acquire experience to solve a specific task with only a few data samples. As for any machine learning approach, the success of such learning is measured with a particular metric suitable to the specific goal (Y. Wang et al. 2020).

2.2.1. Background

When considering how previous experience is deemed to enable learning from a few data instances, few-shot learning approaches can adjust either the data (e.g. augment the data set with samples from other datasets or use unlabelled data), the model (e.g. acquire knowledge on another dataset), or the algorithm (e.g. adapt hyperparameters based on prior meta-learned knowledge) (Y. Wang et al. 2020).

Parnami and Lee (2022) categorise few-shot learning approaches into meta-learning-based and non-meta-learning-based approaches. The non-meta-learning-based approaches are derived from transfer learning. On the other hand, meta-learning-based approaches are divided into two categories: hybrid and main approaches. Among the main approaches, we find metric-based, optimisation-based, and model-based meta-learning. In a classification setting, metric-based approaches attempt to learn a mapping from input data to an embedding space, ensuring that data instances from the same class remain close to each other and distant from those of different classes. Therefore, the distance to the nearest neighbours can be used to determine the class of a particular instance at test time. Optimisation-based few-shot learning techniques aim to make the best use of the limited training data while still achieving good generalisation. They usually do so by learning in two stages: a task-specific learner is used to solve a specific task, and a non-task-specific meta-learner is used to learn from the experience acquired through multiple tasks and direct further learning. In episodic training, the meta-learner updates the learner model's parameters based on the experience acquired through the many tasks it trained on. Finally, model-based meta-learning does not make any assumptions on priors but focuses on architectures tailored for fast learning. Among such architectures, we find memory-based architectures (Cai et al. 2018), rapid-adaptation architectures (Munkhdalai and Yu 2017), and other approaches (Mishra et al. 2017).
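To make the optimisation-based family more concrete, the following sketch illustrates an episodic inner/outer training loop in the style of model-agnostic meta-learning. It is a minimal illustration only: the tiny fully connected learner, the random task sampler, and the learning rates are hypothetical placeholders and do not correspond to the models or hyperparameters used in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical learner: a tiny fully connected network used purely for illustration.
learner = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
meta_optimizer = torch.optim.Adam(learner.parameters(), lr=1e-3)

def sample_task():
    """Placeholder task sampler: returns random support/query tensors for one episode."""
    xs, ys = torch.randn(10, 64), torch.randint(0, 2, (10,))
    xq, yq = torch.randn(10, 64), torch.randint(0, 2, (10,))
    return xs, ys, xq, yq

for meta_step in range(100):
    xs, ys, xq, yq = sample_task()
    # Inner loop: task-specific adaptation on the support set (one gradient step).
    fast_weights = dict(learner.named_parameters())
    support_loss = F.cross_entropy(learner(xs), ys)
    grads = torch.autograd.grad(support_loss, tuple(fast_weights.values()), create_graph=True)
    fast_weights = {n: p - 0.01 * g for (n, p), g in zip(fast_weights.items(), grads)}
    # Outer loop: evaluate the adapted weights on the query set and update the meta-learner.
    logits = torch.func.functional_call(learner, fast_weights, (xq,))
    query_loss = F.cross_entropy(logits, yq)
    meta_optimizer.zero_grad()
    query_loss.backward()
    meta_optimizer.step()
```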

When dealing with classification, few-shot learning tries to compensate for the lack of data by framing the learning problem as learning similarities and differences between classes. This approach is fundamentally different from traditional machine learning approaches, where the algorithm is trained to learn what constitutes a particular class. While the classification outcomes are the same (decide whether a data instance corresponds to a given class), the learning process is not. Furthermore, few-shot learning classification requires a slightly different training setup. The train set (a.k.a. support set) comprises M·K data instances, corresponding to M classes with K examples per class. The classes present in the support set are usually referred to as base classes. Furthermore, a query set contains the images to be classified, which can correspond to base classes and novel classes (not seen in the support set). Nevertheless, this task definition does not consider class-imbalance scenarios, which are frequent in the real world. How to mitigate performance drops when a class imbalance is present remains an open challenge (Ochal et al. 2021).
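The following sketch shows how such an M-way K-shot episode could be assembled from a pool of labelled images. It is a generic illustration of the setup described above; the file names and class labels are hypothetical and do not refer to the actual datasets.

```python
import random
from collections import defaultdict

def sample_episode(labelled_pool, m_ways=2, k_shots=5, n_queries=5):
    """Sample an M-way K-shot episode (support and query sets) from (path, label) pairs."""
    by_class = defaultdict(list)
    for path, label in labelled_pool:
        by_class[label].append(path)
    classes = random.sample(sorted(by_class), m_ways)          # base classes for this episode
    support, query = [], []
    for cls in classes:
        picks = random.sample(by_class[cls], k_shots + n_queries)
        support += [(p, cls) for p in picks[:k_shots]]          # M*K support instances
        query += [(p, cls) for p in picks[k_shots:]]            # held-out query instances
    return support, query

# Hypothetical pool with 100 images per class.
labels = ["good", "double_print", "interrupted_print"]
pool = [(f"img_{i}.png", labels[i % 3]) for i in range(300)]
support_set, query_set = sample_episode(pool, m_ways=3, k_shots=5)
```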

2.2.2. Few-shot learning for automated visual inspection

The advantages of few-shot learning have made it an attractive approach when developing machine learning models for automated visual inspection. Lv and Song (2019) developed a few-shot learning approach to detect defects on bar surfaces. The model used a convolutional neural network (CNN) to extract image features and a relation network to compute a similarity score between pairs of images. The authors used a Squeeze-and-Excitation Network as an attention module to enhance features describing defects. They employed mean-pooling to preserve background information and better distinguish between pseudo and real defects. A similar approach was developed later by Takimoto et al. (2022), who performed anomaly detection using a convolutional Siamese network with an attention mechanism to detect defects on the MVTec dataset. They proposed using a pair-balanced contrastive loss to account for the effect of data imbalance. Furthermore, the attention mechanism aimed to increase the distance between data instances of different classes in the embedding space. The Siamese network was used to perform metric learning and learn to discriminate defective and non-defective products based on the learned metrics. H. Wang, Li, and Wang (2021) proposed an incremental few-shot learning framework and executed experiments using Faster R-CNN as a backbone model. The authors aimed to detect defects on steel surfaces. To enhance the model's training performance, the authors considered that a diverse set of input images must be used and, therefore, resorted to data augmentation, applying image transformations to guarantee such diversity. Wu et al. (2021) described a few-shot learning approach for defect detection in lithium batteries. In particular, they considered exposure fusion to capture batteries' reflectivity and convey 3D information in a 2D image. Furthermore, they used data augmentation to enrich the datasets and label propagation to overcome the shortage of labelled data. The few-shot learning model was based on a ResNet-10 feature extractor and a fully connected layer to perform classification. Zhan, Zhou, and Xu (2022) described how prototypical networks were used to perform automated fabric defect classification. In addition, the authors used class activation mapping to visualise and interpret the regions relevant to the classification of a particular defect class. Zhang et al. (2020) described using few-shot learning with model-agnostic meta-learning to detect defects on bearings. The implementation used convolutional neural networks, treating the identification of the various types of defects as different tasks, and a meta-learner to learn the best parameters across the classification tasks and accelerate learning. Xu and Ma (2022) applied few-shot learning to auto parts defect detection. The authors compared ProtoNet, FEAT, and a custom network based on ProtoNet and ECA-Net with an attention mechanism.

2.3. Research gap

While few-shot learning has been applied to defect detection, research has mainly focused on developing novel deep-learning architectures that yield better classification results. Furthermore, little research has inquired into how few-shot learning models can benefit from carefully selected samples used to train each episode and from input data enriched with cues about possible defects. This research aims to bridge this gap while researching active learning strategies that consider explainable artificial intelligence insights to select relevant data instances. In the context of current research, our studied approaches follow the trend towards data-centric instead of model-centric solutions. As explained in Singh (2023), as models gain in sophistication and their implementations become readily available through different machine learning libraries, the predictive performance returns on model optimisation diminish, while the costs of developing and improving such models increase in terms of both development effort and computational resources. This has led researchers to seek more impactful improvements in techniques that improve the quality or saliency of the input data. The use of heatmap-enhanced image inputs and active learning follows this trend in the context of few-shot visual quality inspection.

3. Use cases and datasets

We performed the experiments on real-world data provided by Philips Consumer Lifestyle BV (The Netherlands) and Iber-Oleff - Componentes Tecnicos Em Plástico, S.A. (Portugal). The Philips Consumer Lifestyle BV manufacturing plant in Drachten is considered one of Philips' most important development centres in Europe and produces many household appliances. The three datasets provided by them correspond to different products: (a) logo prints on shavers (see Figure 1), (b) deco cap (it covers the centre of the metal shaving head and leaves room for a print that distinguishes it from other types; see Figure 2), and (c) shaft (the toothbrush part that transfers the motion from the handle to the actual brush; see Figure 3).

Figure 1. Sample from the Philips Consumer Lifestyle BV shavers dataset.

Three images showing samples of logos printed on shavers. The three examples correspond to the Philips Consumer Lifestyle BV shavers dataset's good, double print, and interrupted print cases.

Figure 2. Sample from the Philips Consumer Lifestyle BV deco cap dataset.

Three images showing a well-manufactured product and two defective ones. The three examples correspond to good, flow-lines, and marks cases from the Philips Consumer Lifestyle BV deco cap dataset.

Figure 3. Sample from the Philips Consumer Lifestyle BV shaft dataset.

Four images showing a well-manufactured product and three defective ones (big dents, small dents, and stripes). The examples correspond to cases from the Philips Consumer Lifestyle BV shaft dataset.

The shavers dataset contains 3,518 images and has the heaviest imbalance among the datasets (defective products account for almost 24% of the dataset). Two defects were labelled: double-printed logos and interrupted prints. The deco cap dataset contains 592 images and was labelled for two imperfections: flowlines and marks. The defects account for almost two-thirds of the dataset. Finally, the shaft dataset has 4,249 images and was labelled for three kinds of defects: big dents, small dents, and stripes. The images of defective items account for 38% of the dataset's images. We provide a more detailed description of the datasets in Table 1. Regardless of the product inspected, manual inspection requires inspectors to spend several seconds handling and inspecting each product to determine whether it is defective.

Table 1. Description of the datasets, listing label types and the number of data instances per label for each dataset.

Iber-Oleff - Componentes Tecnicos Em Plástico, S.A., on the other hand, provided a dataset (which we named IBER) related to automobile air vents they manufacture. The air vents have three components of interest: housing, lamellas (used to direct the air), and plastic links (which keep the lamellas tied together). A visual inspection is performed to determine whether (a) the fork is leaning against the support and correctly positioned, (b) the plastic link is present, (c) lamella 1 is present and its link is correctly assembled, and (d) lamella 3 is present and its link is correctly assembled. We describe the datasets in detail in Table 1.

Among the expected benefits of automating the visual inspection are savings regarding manual work, increased process scalability, and assurance that the same criteria are used to determine whether a product is faulty. Furthermore, this research aims to provide insights enabling a solution requiring few labelled samples to train a machine learning model while satisfying the required product quality levels. By doing so, the labelling effort is minimised, and greater flexibility is provided to the manufacturing plant to address the visual quality inspection of other existing and new products.

4. Experiments

For this research, we conducted a series of experiments (see Table 2) to understand how few-shot learning could be applied to visual inspection with two purposes: (a) automating the visual inspection of manufactured products and (b) minimising data labelling requirements. We performed four experiments with the following objectives: (i) compare multiple few-shot learning approaches on the given image datasets, (ii) understand whether anomaly maps can help machine learning models learn better, (iii) find out whether a particular unsupervised technique used to create anomaly maps can be trained on all data (images corresponding to good and defective products) without degrading the quality of the resulting anomaly maps and the classifiers' performance, and (iv) compare multiple active learning techniques for selecting the support set (rather than selecting it at random) and determine whether this leads to better results.

Table 2. Brief description of experiments performed: their aim, method utilised, and relevant metrics.

Table 3. Description of machine learning models used across the experiments.

The few-shot learning models were trained considering one or five labelled images per class and the few-shot learning pipeline proposed by Hu et al. (2022a), which consists of three steps: pre-training, meta-training, and fine-tuning. The pre-training stage is devoted to training a backbone model used as a feature extractor in the few-shot learning setting. We did not perform the pre-training ourselves but opted for models pre-trained on the ImageNet dataset (Russakovsky et al. 2015), either using the cross-entropy loss in the supervised setting or using the self-supervised DINO objective (Caron et al. 2021). Depending on the configuration, meta-training was performed on the Meta-Dataset (Triantafillou et al. 2019) or MVTec-Capsule (Bergmann et al. 2019). Fine-tuning was performed on the support set. We considered ResNet-50 (He et al. 2016) and the Vision Transformer (ViT) (Dosovitskiy et al. 2021) as backbone models, and ProtoNet (Snell, Swersky, and Zemel 2017) as a few-shot learner. Furthermore, we also considered the model described in Takimoto et al. (2022), which consisted of a Siamese network backbone, pre-trained on ImageNet with a cross-entropy loss and meta-trained on MVTec-Capsule; and a supervised classification model used for defect detection, described in Rožanec, Zajec, Theodoropoulos et al. (2022). We provide a detailed description of the models in Table 3.
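As an illustration of the metric-based classification step performed by a ProtoNet-style learner, the sketch below computes class prototypes from the support-set embeddings and assigns each query image to the nearest prototype. It is a simplified sketch only: the torchvision ResNet-50 stands in for the actual backbones, and the preprocessing and distance function are assumptions rather than the exact pipeline configuration.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# A pre-trained ResNet-50 with its classification head removed acts as the feature extractor.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

def embed(paths):
    """Map a list of image paths to an (N, 2048) tensor of backbone features."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    with torch.no_grad():
        return backbone(batch)

def protonet_predict(support_paths, support_labels, query_paths):
    """Classify query images by the Euclidean distance to per-class prototypes."""
    z_support, z_query = embed(support_paths), embed(query_paths)
    classes = sorted(set(support_labels))
    prototypes = torch.stack([
        z_support[[i for i, y in enumerate(support_labels) if y == c]].mean(dim=0)
        for c in classes])                          # one mean embedding (prototype) per class
    dists = torch.cdist(z_query, prototypes)        # (n_query, n_classes) distances
    return [classes[i] for i in dists.argmin(dim=1).tolist()]
```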

The same test set was used across all the experiments to ensure the results are comparable. We measured the models' performance with the AUC ROC metric (Bradley 1997). This metric was chosen because it is not sensitive to class imbalance and provides a threshold-independent estimate of the models' discriminative capabilities. We did so in two different settings: binary classification and multiclass classification. While binary classification helps us understand how well the models discriminate whether there is a defect, the multiclass setting allows us to assess how accurately the models learn to discriminate between specific types of defects.
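For reference, both evaluation settings can be reproduced with scikit-learn roughly as follows; the predicted probabilities below are purely illustrative and not taken from our experiments.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical predictions for a three-class problem (0 = good, 1 and 2 = defect types).
y_true = np.array([0, 0, 1, 2, 1, 0, 2, 2])
y_proba = np.array([[0.9, 0.05, 0.05], [0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7],
                    [0.3, 0.6, 0.1], [0.8, 0.1, 0.1], [0.2, 0.1, 0.7], [0.3, 0.3, 0.4]])

# Binary setting: defective (classes 1 and 2) vs. non-defective (class 0).
binary_true = (y_true != 0).astype(int)
binary_score = 1.0 - y_proba[:, 0]                 # probability of "any defect"
print(roc_auc_score(binary_true, binary_score))

# Multiclass setting: one-vs-rest AUC ROC averaged over classes.
print(roc_auc_score(y_true, y_proba, multi_class="ovr", average="macro"))
```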

We executed the experiments on two different machines: (a) a machine with four Intel Xeon Silver 4215R CPUs with a 3.20 GHz base frequency, an NVIDIA Tesla V100S-PCIE-32GB GPU, and 31.4 GB of RAM; and (b) a machine with two Intel Xeon CPUs with a 2.3 GHz base frequency, a Tesla P100 16 GB GPU, and 13 GB of RAM.

4.1. Experiment 1: few-shot learning on product images

The experiment compared how different models performed on the defect detection tasks. In particular, we were interested in comparing few-shot learning models against the classical supervised machine learning approach developed in Rožanec, Zajec, Theodoropoulos et al. (2022) and understanding the performance gap between both approaches. We also considered a SOTA few-shot learning model applied to defect detection and described in Takimoto et al. (2022). Furthermore, we were interested in how different pre-training and meta-training regimes influenced the few-shot models' performance. When training few-shot models, we used the PMF few-shot learning pipeline (Hu et al. 2022b), which achieved state-of-the-art results in various benchmarks. The experiment was performed on all the datasets listed in Section 3. Due to a lack of labelled samples, the experiments on the IBER dataset were only performed using one labelled image per class. We executed and compared the models described in Table 3.

4.2. Experiment 2: do DRAEM anomaly maps improve few-shot learning classifiers' performance?

Research by Rožanec, Zajec, Trajkova et al. (2022) found that DRAEM anomaly maps boosted the performance of the classifiers. Therefore, we were interested in whether learning from anomaly maps could enhance the performance of few-shot learning models. To that end, we executed the same setup as in Experiment 1 but considered two different inputs: (i) anomaly maps and (ii) the original product images with the corresponding anomaly maps. To combine images and anomaly maps into a single input, we concatenated feature vectors computed separately by the backbone model for the image and the anomaly map. The experiment was run assuming a clean set of images of non-defective items existed to train a DRAEM model and generate the anomaly maps. Given that a large labelled set of such images contradicts the premises of few-shot learning, we devoted Experiment 3 to studying the effect of training DRAEM on noisy datasets.
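A minimal sketch of this fusion step is shown below, assuming the anomaly maps are provided as image-like tensors that the same backbone can process; the exact preprocessing used in our pipeline may differ.

```python
import torch

def combined_features(backbone, image_batch, anomaly_map_batch):
    """Concatenate backbone features computed separately for images and anomaly maps.

    `backbone` is any feature extractor returning (N, D) embeddings; the anomaly maps
    are assumed to be replicated to three channels so the same backbone can process them.
    """
    with torch.no_grad():
        z_image = backbone(image_batch)             # (N, D) image features
        z_map = backbone(anomaly_map_batch)         # (N, D) anomaly-map features
    return torch.cat([z_image, z_map], dim=1)       # (N, 2D) input to the few-shot classifier
```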

4.3. Experiment 3: can we train DRAEM on impure datasets to generate anomaly maps without degrading the classifiers' performance?

The DRAEM model was developed assuming that only images of non-defective products are provided for training. The model can, therefore, learn what a non-defective product looks like and quickly identify whether an image differs from it and where the discrepancies lie. While Experiment 2 assumed such a setting, the requirement to train a DRAEM model with only images of non-defective products contradicts one of the premises of the few-shot learning paradigm: only a few labelled examples exist for each class. Therefore, we were interested in whether the DRAEM model could be trained on all images and still produce valuable output. In particular, we assumed the model could learn an average representation of the images and hint at any discrepancies in the anomaly map. While such discrepancies could no longer be identified as anomalies, they could still hint at how images from different classes differ, providing valuable information to determine their class. When training the DRAEM model, we considered three datasets (deco cap, shaft, and shavers, provided by Philips Consumer Lifestyle BV) and different degrees of imbalance (see Table 4). We trained the DRAEM models with default parameters until convergence. To understand how the increasing impurity of the dataset on which the DRAEM models were trained affected the classification performance, we performed multiclass classification with the ViT backbone model pre-trained with DINO and without meta-training.

Table 4. Dataset composition for different degrees of imbalance. The rate describes the share of defective samples we consider w.r.t. the original dataset.
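The following sketch shows one way such impure training sets could be assembled for a given mix rate. It is an illustrative reading of the mix rate defined in Table 4 (the share of the originally available defective images injected into the otherwise defect-free training pool), not the exact data preparation code used in our experiments.

```python
import random

def build_impure_train_set(good_images, defective_images, mix_rate, seed=0):
    """Build a DRAEM training set containing a controlled share of defective samples.

    `mix_rate` is the fraction of the available defective images added to the
    defect-free training pool (e.g. 0.0, 0.05, 0.10).
    """
    rng = random.Random(seed)
    n_defective = int(round(mix_rate * len(defective_images)))
    injected = rng.sample(list(defective_images), n_defective)
    train_set = list(good_images) + injected
    rng.shuffle(train_set)
    return train_set

# Example with hypothetical file lists and a 10% mix rate.
train_set = build_impure_train_set([f"good_{i}.png" for i in range(500)],
                                   [f"defect_{i}.png" for i in range(100)],
                                   mix_rate=0.10)
```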

We trained few-shot classifiers considering the best model from Experiment 1 (ViT backbone pre-trained on ImageNet with DINO objective), the DRAEM anomaly maps as input, and trained with five samples per class.

4.4. Experiment 4: how can we construct a support set that maximises models' learning?

The selection of the support set in Experiment 1 was random. Few-shot learning aims to reduce the labelling effort by learning from only a few images shown to the model at training time. Therefore, this experiment aimed to understand how data selection can enhance the models' learning and the consequent classification results. To that end, we compared several well-known active learning strategies and also developed novel ones. In particular, we considered random sampling as a baseline data acquisition method. Among well-known approaches, we considered margin sampling (sample data instances where the difference between the top two most confident predictions is highest) and uncertainty sampling (sample data instances where the difference between the most confident prediction and absolute confidence is highest). We also developed three novel active learning techniques: (i) EMBResNet-50, (ii) EMBPMF, and (iii) EMBGradCAM. EMBResNet-50 computes the image embeddings using a ResNet-50 model pre-trained with a cross-entropy loss. Given a set of seed images of each class, it sources the images furthest from them according to the cosine distance. EMBPMF computes the image embeddings considering the PMF pipeline. The intuition behind this approach is that the PMF embeddings could be more discriminative for a classification task than those obtained from a pre-trained ResNet model. We considered a random set of seed images for each class and sourced the unlabelled images that were furthest away, considering the cosine distance between embeddings. Finally, the EMBGradCAM technique followed a similar approach, considering the GradCAM heat map of the few-shot classification model trained on some random seed set. The GradCAM heat maps were computed for all the unlabelled images, and the cosine distance between their PMF embeddings was calculated, sourcing the images whose cosine distance was highest w.r.t. the seed image GradCAM heatmaps.
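A minimal sketch of the embedding-based selection shared by the EMB strategies is given below, assuming the embeddings have already been computed by the respective backbone (ResNet-50, the PMF pipeline, or PMF embeddings of GradCAM heat maps). How distances to multiple seed images are aggregated is an assumption here (distance to the nearest seed), not a detail reported above.

```python
import torch
import torch.nn.functional as F

def select_furthest(seed_embeddings, unlabelled_embeddings, n_samples):
    """Pick the unlabelled items whose embeddings are furthest (cosine distance) from the seeds.

    seed_embeddings: (S, D) tensor, one row per labelled seed image.
    unlabelled_embeddings: (U, D) tensor for the unlabelled pool.
    Returns the indices of the `n_samples` most distant unlabelled items.
    """
    seeds = F.normalize(seed_embeddings, dim=1)
    pool = F.normalize(unlabelled_embeddings, dim=1)
    cosine_sim = pool @ seeds.T                      # (U, S) cosine similarities
    distance = 1.0 - cosine_sim.max(dim=1).values    # distance to the closest seed
    return distance.topk(n_samples).indices.tolist()

# Example with random stand-in embeddings.
picked = select_furthest(torch.randn(3, 384), torch.randn(200, 384), n_samples=5)
```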

We trained few-shot classifiers considering the best model from Experiment 1 (ViT backbone pre-trained with DINO and no meta-training), with images as input, and trained them with one, five, ten, and twenty samples per class. We measured their performance in a multiclass setting.

5. Results

5.1. Experiment 1: few-shot learning outperformed a classical supervised machine learning model

We present the results of Experiment 1 in Tables 5 and 6 (binary classification) and Tables 7 and 8 (multiclass setting).

Table 5. AUC ROC measured for classified images in a binary classification setting (defective vs. non-defective) for one-shot learning. The best results for each input type are bolded, and the second-best results are highlighted in italics. We report the 95% confidence intervals.

Table 6. AUC ROC measured for classified images in a binary classification setting (defective vs. non-defective) for five-shot learning. The best results for each input type are bolded, and the second-best results are highlighted in italics. We report the 95% confidence intervals.

Table 7. AUC ROC (one vs. rest) measured for classified images in a multiclass classification setting for one-shot learning. The best results for each input type are bolded, and the second-best results are highlighted in italics. We report the 95% confidence intervals.

Table 8. AUC ROC (one vs. rest) measured for classified images in a multiclass classification setting for five-shot learning. The best results for each input type are bolded, and the second-best results are highlighted in italics. We report the 95% confidence intervals.

When performing binary classification with an image input, we observed that the best results were obtained in most cases with the ViT model pre-trained on ImageNet with the DINO loss and without any meta-training. In particular, for one-shot learning, it displayed the best performance for the deco cap, shaft, and shavers datasets. The second-best performance for the shaft and shavers datasets was achieved with a ViT model pre-trained on ImageNet with the DINO loss and meta-trained on the Meta-Dataset. On the other hand, the second-best performance for the deco cap dataset was achieved with a ViT model pre-trained on ImageNet with cross-entropy loss and meta-trained on MVTec-Capsule. This last model was the best when considering the IBER dataset, while the second-best performance there was achieved with the Siamese backbone model pre-trained on ImageNet with cross-entropy loss and meta-trained on MVTec-Capsule. Five-shot learning increased the discriminative power of the models. The ViT model pre-trained on ImageNet with the DINO loss without any meta-training was the best classifier for the shaft and shavers datasets and the second-best for the deco cap dataset. For this model, we measured perfect classification on the deco cap dataset in the one-shot learning setting but slightly worse performance with five-shot learning. Among the second-best models, we found the ViT model pre-trained with DINO without any meta-training (deco cap dataset), the Siamese backbone model pre-trained on ImageNet with cross-entropy loss and meta-trained on MVTec-Capsule (shaft dataset), and the ResNet-50 backbone model pre-trained on ImageNet with cross-entropy loss without meta-training (shavers dataset). Comparing five-shot learning against one-shot learning, five-shot learning achieved a performance increase of 0.0587 and 0.0326 AUC ROC points on the shaft and shavers datasets, respectively, when comparing the best models in each setting.

From the analysis above, we consider that the best performance was consistently delivered by the ViT backbone model pre-trained on ImageNet with DINO loss without meta-training. While the ResNet-18+MLP model always achieved competitive results, it never achieved the best performance and, in a few cases, was considered the second-best model for a given dataset. The Siamese network, pre-trained on ImageNet with cross-entropy loss and meta-trained on MVTec-Capsule, achieved the best and second-best performance in a few cases and remained competitive.

When switching to a multiclass setting, we observed that the best and second-best performance was consistently delivered by the ViT backbone model pre-trained on ImageNet with cross-entropy loss without any meta-training. In particular, it achieved the best performance on the deco cap and shavers datasets when performing one-shot classification with images as the model's input. In five-shot learning, it achieved the best performance for the deco cap dataset (AUC ROC of 0.9947) and second-best for the shavers dataset (AUC ROC of 0.7936, increasing the discriminative performance by 0.0619 AUC ROC points w.r.t. the one-shot learning setting).

From the results and analysis above, we conclude that few-shot learning achieved better results than the model we compared to in a classical supervised machine learning setting. Furthermore, the ViT backbone model pre-trained on ImageNet with cross-entropy loss without any meta-training displayed a better performance than the Siamese network model described in Takimoto et al. (2022).

5.1.1. How are these results relevant to production systems?

Few-shot learning models achieved better performance than classical supervised machine learning models, confirming that good defect detection results can be achieved in a supervised setting using a small number of labelled instances. The fact that only a few labelled instances are required to train and test the model is of particular relevance, given it reduces the costs associated with searching and annotating such samples, reducing the time and effort required to start training a machine learning model for defect detection.

5.2. Experiment 2: DRAEM anomaly maps improve few-shot learning classifiers' performance

We present the results of Experiment 2 in Tables 9 and 10 (binary classification) and Tables 11 and 12 (multiclass setting).

Table 9. AUC ROC measured for models in a binary classification setting (defective vs. non-defective) for one-shot learning. The best results are bolded, and the second-best results are highlighted in italics. We report the 95% confidence intervals.

Table 10. AUC ROC measured for models in a binary classification setting (defective vs. non-defective) for five-shot learning. The best results are bolded, and the second-best results are highlighted in italics. We report the 95% confidence intervals.

Table 11. AUC ROC (one-vs-rest) measured for models in a multiclass classification setting for one-shot learning. The best results are bolded, and the second-best results are highlighted in italics. We report the 95% confidence intervals.

Table 12. AUC ROC (one-vs-rest) measured for models in a multiclass classification setting for five-shot learning. The best results are bolded, and the second-best results are highlighted in italics. We report the 95% confidence intervals.

In most cases, training the models on DRAEM anomaly maps achieved better results than those obtained with the image input data (Experiment 1). In particular, for one-shot learning, the ViT model pre-trained on ImageNet with the DINO loss without any meta-training was the best classifier for the shavers and IBER datasets. Nevertheless, four models achieved a perfect classification score for the IBER dataset. On the other hand, the ViT backbone model pre-trained on ImageNet with the DINO loss and meta-trained on the Meta-Dataset was best for the deco cap and shaft datasets. The second-best classification model for the deco cap and shaft datasets was the ResNet-18+MLP pre-trained on ImageNet with cross-entropy loss, which achieved almost the same performance as the best model for the deco cap dataset and lagged less than 0.03 AUC ROC points behind the best classifier for the shaft dataset. The second-best models for the shavers and IBER datasets were the ResNet-50 backbone model pre-trained on ImageNet with cross-entropy loss and the ViT backbone model pre-trained on ImageNet with cross-entropy loss and meta-trained on MVTec-Capsule, respectively. Five-shot learning increased the models' discriminative performance. In particular, the ViT backbone model pre-trained on ImageNet with the DINO loss and without any meta-training was the best classifier for the deco cap and shavers datasets. A perfect classification performance on the deco cap dataset was achieved by two additional models: the ViT backbone model pre-trained on ImageNet with the DINO loss and meta-trained on the Meta-Dataset, and the ResNet-18+MLP model. The second-best performance was achieved by the ViT backbone model pre-trained on the ImageNet dataset with cross-entropy loss without meta-training. On the other hand, the second-best model on the shaft dataset was the ViT backbone model pre-trained on ImageNet with the DINO loss and without any meta-training. The best and second-best models on the shavers dataset remained the same as for one-shot learning. Comparing five-shot learning against one-shot learning, five-shot learning achieved a performance increase of 0.1064 and 0.1191 AUC ROC points on the shaft and shavers datasets, respectively, when comparing the best models in each setting.

Consistent with the findings of Rožanec, Zajec, Theodoropoulos et al. (2022), the best results were achieved when considering the image and anomaly map as inputs to the machine learning model. In particular, the best classifier for the shaft, shavers, and IBER datasets in one-shot learning was the ViT model pre-trained on ImageNet with the DINO loss without any meta-training. Furthermore, this model was the second-best on the deco cap dataset. On the other hand, the ViT model pre-trained on ImageNet with the DINO loss and meta-trained on the Meta-Dataset achieved the best performance on the deco cap and IBER datasets and the second-best performance on the shaft and shavers datasets. Five-shot learning also led to better results in this case. The ViT model pre-trained on ImageNet with the DINO loss without any meta-training achieved the best performance on the deco cap and shavers datasets, while the ViT model pre-trained on ImageNet with the DINO loss and meta-trained on the Meta-Dataset achieved the best performance on the deco cap and shaft datasets. Comparing five-shot learning against one-shot learning, five-shot learning achieved a performance increase of 0.1059 and 0.1151 AUC ROC points on the shaft and shavers datasets, respectively, when comparing the best models in each setting.

In the multiclass one-shot setting, when the models' input consisted of anomaly maps, the ViT backbone model pre-trained on ImageNet with cross-entropy loss without any meta-training achieved the best performance among the models for the IBER dataset and the second-best among the models developed for the deco cap and shavers datasets. The ResNet-50 model pre-trained on ImageNet with cross-entropy loss achieved the best performance among the models for the shaft and shavers datasets. This remained true for the five-shot learning models. The ViT backbone model pre-trained on ImageNet with cross-entropy loss and meta-trained on the Meta-Dataset was best for the deco cap dataset in the one-shot learning setting. Nevertheless, in the five-shot learning setting, the Siamese model pre-trained on ImageNet with cross-entropy loss and meta-trained on MVTec-Capsule yielded the best results while securing its position as the second-best model for the shaft dataset.

Considering image and anomaly map inputs to the classification model, we noticed that the discriminative power in some cases increased w.r.t. the models whose input was only an anomaly map. Nevertheless, this was not always the case, as observed in the binary classification setting. The ViT backbone model pre-trained on ImageNet with cross-entropy loss and without meta-training was the best or second-best model in most cases, either for one-shot or five-shot settings. The only exception was the shaft dataset in the one-shot setting, where the best performance was achieved by the ViT backbone model pre-trained on ImageNet with cross-entropy loss and meta-trained on the Meta-Dataset, while the second-best performance was attributed to the ResNet-18+MLP model. The ViT backbone model pre-trained on ImageNet with cross-entropy loss and meta-trained on the Meta-Dataset achieved the best or second-best performance in all but one case (the shavers dataset in the five-shot learning setting).

In general, we observed that training the model in a five-shot setting achieves better results than in a one-shot setting. For example, the ResNet-50 backbone model trained on ImageNet with cross-entropy loss without meta-training achieved the best performance in one- and five-shot learning when trained with anomaly maps. Changing from a one-shot to a five-shot setting increased its performance by 0.099 AUC ROC points when predicting the shaft dataset and 0.0954 AUC ROC points for the shavers dataset. Similarly, the ViT backbone model pre-trained on ImageNet with cross-entropy loss without meta-training increased its performance by 0.1118 AUC ROC points for the shavers dataset when trained with images and anomaly maps. On the other hand, the ViT backbone model pre-trained on ImageNet with cross-entropy loss and meta-trained on the Meta-Dataset increased its performance by 0.1201 AUC ROC points on the shaft dataset and went on to achieve perfect classification for the deco cap dataset.

When comparing the results obtained for best models across Experiment 1 and Experiment 2 for the multiclass setting, the models trained with anomaly maps or anomaly maps and images achieved superior results with two exceptions: the shaft dataset with one and five-shot learning.

From the results obtained in Experiment 1 and Experiment 2, we conclude that few-shot learning provides the best results when leveraging anomaly maps. Anomaly maps provide richer information to the classifier by highlighting where potential defects exist, easing the learning process. Whether to use images and anomaly maps or only anomaly maps depends on the classification setting. Experiment 1 and Experiment 2 show that using anomaly maps and images always yielded better results for binary classification, which was not always true in multiclass settings. The best performance was consistently delivered by the ViT backbone model pre-trained on ImageNet with DINO without meta-training, regardless of the input features used to classify the images. While the ResNet-18+MLP model always achieved competitive results, it never reached the best performance and only in a few cases was considered the second-best model for a particular dataset. The Siamese network, pre-trained on ImageNet with cross-entropy loss and meta-trained on MVTec-Capsule, achieved the best or second-best performance in a few cases but significantly lagged behind the best models in many others, lacking consistent performance across datasets and experimental settings.

5.2.1. How are these results relevant to production systems?

The anomaly maps are expected to display regions where potential defects could be located. Such a representation seems to favourably affect the learning of machine learning models, which achieved better outcomes when leveraging anomaly maps, or images and anomaly maps, for defect detection. Therefore, when developing machine learning models for defect detection, machine learning engineers should consider how such anomaly maps could be created and leveraged to enhance the defect detection outcomes beyond those that could be achieved by leveraging the product images only.

5.3. Experiment 3: training DRAEM anomaly maps on impure datasets can affect the classifier's performance

We summarise the results of this experiment in Table 13. When training the DRAEM model with different impurity levels, we observed that increasing the impurity level degraded the few-shot learning model's discriminative performance. While for the deco cap and shaft datasets the performance achieved by the few-shot learning model on anomaly maps was lower than that achieved when trained on images, this was not the case for the shavers dataset. In this last case, we observed that training a DRAEM model on a dataset with an increasing impurity level still provided an advantage over the models trained directly on the images. In particular, a few-shot learning model trained on DRAEM anomaly maps outperformed those trained on images, even when the dataset used to train the DRAEM model contained 10% of the defective samples. Since the results do not provide conclusive evidence, further research is required to understand which image characteristics allow DRAEM models to be trained on impure datasets while still providing insights that help outperform models trained solely on images.

Table 13. AUC ROC measured for multiclass classification, considering DRAEM anomaly maps as inputs. The anomaly maps are obtained from a DRAEM model trained with all available images. We specify different degrees of artificial imbalance on top of the given one (mix rate). The exact composition of the datasets for each mix rate is described in Table 4. We report the 95% confidence intervals. The models using anomaly map features are bolded when performing better than those trained solely on images.

5.3.1. How are these results relevant to production systems?

The experiment does not provide conclusive results on when the use of impure datasets affects the quality of the DRAEM model outcomes. Nevertheless, when impure datasets allow for good-quality outcomes, little or no manual annotation is required to create such datasets, preserving the benefits of few-shot learning while enabling superior results compared to training the few-shot learning models only with images.

5.4. Experiment 4: active learning techniques provide effective means to construct support sets that maximise the models' learning

We compared six active learning techniques for this experiment and applied them only to select the images on which the few-shot models were trained. We present the results in Table 14 and Figure 4. The margin sampling technique achieved the best performance for the deco cap dataset in most cases. The only exception was when twenty samples were shown to the few-shot learning algorithm, where uncertainty sampling was best. Uncertainty sampling was also second-best in the rest of the cases. In most cases, the custom active learning techniques beat at least one widely adopted sampling method (random, uncertainty, or margin sampling) but never achieved the best or second-best performance. Uncertainty sampling also performed best when selecting five or ten samples from the shavers dataset and was second-best when selecting twenty. Nevertheless, margin sampling was best when selecting twenty samples, and random sampling outperformed the other methods when selecting a single sample, while being second-best when considering five or ten data samples. It is worth mentioning that the EMBGradCAM technique achieved the second-best performance when selecting a single sample. The shaft dataset displayed different dynamics. The best sampling method for one and five samples was uncertainty sampling, followed by random sampling (when selecting a single sample) and EMBPMF (when selecting five samples). EMBPMF was also second-best when selecting ten and twenty samples, second only to random sampling (when selecting ten samples) and EMBResNet-50 (when sampling twenty data samples).

Figure 4. Plots showing AUC ROC performance based on sampling iterations for six active learning techniques and three datasets. The following naming conventions are used for the series: random (random sampling), margin (margin sampling), uncertain (uncertainty sampling), emb_resnet (EMBResNet-50), emb_pmf (EMBPMF), and GradCAM (EMBGradCAM).


Table 14. AUC ROC measured for models in a multiclass classification setting. The best results are bolded, and the second-best results are highlighted in italics. We report the 95% confidence intervals.

The results presented in Table 14 and the analysis above indicate that following a principled sampling technique rather than sampling data at random can benefit the model's learning process and lead to higher discriminative power for the same number of data instances shown to the few-shot learning model.

5.4.1. How are these results relevant to production systems?

Active learning techniques provide means to select the unlabelled data that is expected to contribute most to a machine learning model's learning. Using such techniques reduces the data labelling cost while achieving higher model performance. The proposed novel active learning techniques show promising performance and could help reduce the manual annotation effort when developing a machine learning defect detection solution.

6. Discussion

One of the main innovations of this research is using heatmaps generated with unsupervised machine learning models to enhance the performance of supervised models. In particular, we explored how few-shot learning classification models can be enhanced by enriching their input (product images) with anomaly maps obtained for those products from unsupervised machine learning models. We consider the results supporting this hypothesis (see Tables 15 and 16) one of the main contributions of this research. These tables show that the models' performance is enhanced in most one-shot learning cases and consistently in five-shot learning scenarios.
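A minimal sketch of this enrichment step is given below, under the assumption that the anomaly map is simply stacked as an additional input channel on the product image (one possible realisation for illustration, not necessarily the exact pipeline used):

    # Sketch: enriching the few-shot classifier's input with an anomaly map.
    import numpy as np

    def enrich_input(image, anomaly_map):
        """image: (H, W, 3) float array in [0, 1]; anomaly_map: (H, W) float array."""
        # Normalise the anomaly map to [0, 1] so it is on the same scale as the image.
        amin, amax = anomaly_map.min(), anomaly_map.max()
        heatmap = (anomaly_map - amin) / (amax - amin + 1e-8)
        # Concatenate along the channel axis -> (H, W, 4) input for the few-shot model.
        return np.concatenate([image, heatmap[..., None]], axis=-1)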

Table 15. Comparison of the best models, considering Image (I) and Image+Heatmap (I+H) input features. The results are taken from Table  and Table  for the binary classification case.

Table 16. Comparison of the best models, considering Image (I) and Image+Heatmap (I+H) input features. The results are taken from Table  and Table  for the multiclass classification case.

Furthermore, we studied whether the anomaly maps built with the DRAEM method require defect-free samples to train the model, or whether a similar performance could be achieved in the few-shot learning setting if the anomaly maps are produced by a model trained on impure datasets, regardless of the class to which the data instances belong. Our results show that impure datasets may lead to degraded classifier performance. Nevertheless, the impurity level at which the DRAEM anomaly maps, and consequently the classifier, are affected can vary among datasets. This remains an open question, and further experiments are required to determine whether similar shortcomings and behaviours are observed across a broader range of unsupervised defect detection methods. While the need for a curated set of defect-free images could be considered a drawback of the few-shot learning approach, the impact may not be critical: most manufactured products are defect-free, which facilitates the acquisition of a dataset of images of defect-free items.

Finally, we have shown that active learning techniques applied to few-shot learning can boost the models' learning and lead to better outcomes. Three novel active learning methods were proposed: EMBResNet-50, EMBPMF, and EMBGradCAM. The three methods showed promising results, beating widely adopted active learning sampling methods. Nevertheless, in all but one case, the proposed methods did not lead to the best performance of the few-shot learning models. Further research is required to understand how these methods can be evolved and enhanced. Furthermore, little research has been performed at the intersection of explainable artificial intelligence and active learning. For example, Ghai et al. (Citation2021) leveraged local explanations to help annotators label unlabelled data instances, and Ciravegna et al. (Citation2023) studied how rule-based (domain or explainable artificial intelligence) knowledge can be converted into logic constraints whose violation is checked to guide sample selection. Another active learning approach leveraging explainable artificial intelligence on images has been developed by Križnar et al. (Citation2023). EMBGradCAM therefore joins the small number of active learning methods for images that leverage insights from explainable artificial intelligence to guide the data sampling process. Having performed our research on four different datasets and observed consistent behaviour of our models, we are confident our findings can be extrapolated to other visual inspection settings.
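As an illustration of the embedding-based family of strategies (EMBResNet-50, EMBPMF, and EMBGradCAM), the sketch below shows one plausible diversity-driven instantiation: cluster the embeddings of the unlabelled pool and query the sample closest to each cluster centre. The embeddings could come from a ResNet-50 backbone, a PMF model, or flattened GradCAM heatmaps; the procedure shown is an assumption for illustration, not the exact method proposed in this work.

    # Hedged sketch of an embedding-based, diversity-driven sampling strategy.
    import numpy as np
    from sklearn.cluster import KMeans

    def embedding_diversity_sampling(embeddings, k, seed=0):
        """Pick k diverse samples: cluster the (N, D) embeddings and return the
        index of the sample closest to each cluster centre."""
        km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(embeddings)
        selected = []
        for c in range(k):
            distances = np.linalg.norm(embeddings - km.cluster_centers_[c], axis=1)
            selected.append(int(np.argmin(distances)))
        return selected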

7. Conclusions and future work

This research explored using few-shot learning for automated visual inspection across four real-world datasets. The results show that few-shot learning models outperformed a regular machine learning classifier. Furthermore, few-shot learning models whose input comprises images and anomaly maps achieve stronger discriminative performance on defect classification than few-shot learning models trained only on images. Nevertheless, such anomaly maps may require annotating a dataset of good samples, defeating the purpose of few-shot learning. Our results show that training DRAEM unsupervised models on impure datasets to generate anomaly maps without prior data annotation does not guarantee informative anomaly maps. On the other hand, active learning strategies can be used instead of random sampling to obtain an annotated dataset that maximises the models' learning and discriminative performance. The results also confirmed that the models' discriminative capability greatly improved in five-shot learning compared with one-shot learning. While the classification models could discriminate whether a manufactured piece was defective, the performance usually decreased when determining the defect type. We consider the results promising. Nevertheless, in some cases, further effort is required to enhance them and ensure they satisfy manufacturing quality acceptance levels. Future work will focus on (i) new sampling techniques that allow for better reuse of the few labelled samples across episodes in the training set, (ii) new sampling techniques that consider characteristics of images and anomaly maps to create an initial support set in few-shot learning settings, and (iii) using few-shot generative adversarial networks to increase the amount of data in the support set (which we expect would enhance the models' performance with no additional labelling effort).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The data is not available due to restrictions.

Additional information

Funding

This work was supported by the Slovenian Research Agency and the European Union's Horizon 2020 project STAR under grant agreement H2020-956573.

Notes on contributors

Patrik Zajec

Patrik Zajec is a Ph.D. candidate at the Jožef Stefan International Postgraduate School. His research is mainly in the field of natural language processing; more specifically, he develops methods for unsupervised and semi-supervised information extraction and retrieval. He is also involved in the European Union Horizon 2020 project STAR, where he works on synthetic data augmentation techniques and a human-machine collaboration system using active learning, XAI, and semantic technologies.

Jože M. Rožanec

Jože M. Rožanec Ph.D. is a post-doc researcher at the Artificial Intelligence Laboratory (Jožef Stefan Institute), and a machine learning engineer at Qlector d.o.o. (developing intelligent solutions for smart factories). He collaborates with the American Slovenian Education Foundation, where he leads multiple activities for Fellows and Alumni. Over more than ten years, he worked for several companies (e.g. Mercado Libre, Navent, Globant) in software engineering and machine learning-related roles. His research interests include machine learning methods for recommendations, fraud detection, demand forecasting, active learning, and explainable artificial intelligence (XAI).

Spyros Theodoropoulos

Spyros Theodoropoulos is a Ph.D. candidate at the School of Electrical and Computer Engineering of the National Technical University of Athens. He graduated from the same school and holds an MSc in Machine Learning from Imperial College London. He has worked in industry as a Software and Big Data Engineer and is now a member of the Data & Cloud Research Group at the University of Piraeus. His research focuses on deep learning, reinforcement learning, and the use of simulation for their deployment in real-life dynamic environments.

Mihail Fontul

Mihail Fontul holds a degree in mechanical engineering from the Faculty of Mechanics of the Technical University of Cluj-Napoca, Romania, and a PhD in mechanical engineering from Instituto Superior Técnico of the Technical University of Lisbon, Portugal, and has extensive and balanced industrial and academic experience. He served as Assistant Professor at the Department of Mechanical Engineering at Instituto Superior Técnico, where he taught and conducted research in the areas of Mechanical Vibrations and Noise, Product Development, and Design and Materials in Engineering. He is the author or co-author of several articles in journals and conferences and has twice been awarded for academic achievement. He is currently responsible for the Research, Development and Innovation Department of the company IBER-OLEFF Componentes Técnicos em Plástico SA and is a managing partner of FONTUL LDA, dedicated to technical-scientific engineering consultancy activities.

Erik Koehorst

Erik Koehorst (Ir.) is a project manager currently working for Philips. He studied industrial engineering management at the University of Twente and mechanical engineering at HZ University of Applied Sciences. Over more than 15 years, he has worked in several fields of industry, such as maintenance, logistics, and project management, gaining broad experience. His latest projects focus on shopfloor automation, such as MES implementation, and on bringing Industry 4.0 developments into practice.

Blaž Fortuna

Blaz Fortuna is the founder and CEO of extrakt.AI, a startup developing artificial intelligence-based solutions for process automation, and a senior researcher at the Jožef Stefan Institute. He is the initiator and primary contributor to QMiner, the open-source data analytics platform for processing large-scale real-time streams containing structured and unstructured data, and a co-contributor to Event Registry. He did his Ph.D. at the Jožef Stefan Institute. He was a research consultant for Bloomberg L.P., a Marie Curie Fellow at Stanford University, a postdoc at IBCN (Ghent University, Belgium), and the project manager for the XLike project.

Dunja Mladenić

Prof. Dr. Dunja Mladenić (http://ailab.ijs.si/dunja_mladenic/) works as a researcher and project leader at the Jožef Stefan Institute, Slovenia, leading the Artificial Intelligence Department and teaching at the Jožef Stefan International Postgraduate School, the University of Ljubljana, and the University of Zagreb. She has extensive research experience in studying and developing Machine Learning, Big Data/Text Mining, the Internet of Things, Data Science, and Semantic Technology techniques, and their application to real-world problems. She has published papers in refereed journals and conferences, co-edited several books, served on programme committees of international conferences, and organised international events. She serves as a project evaluator of proposals for the European Commission and the USA National Science Foundation. From 2013 to 2017, she served on the Institute's Scientific Council, as vice president from 2015 to 2017. She serves on the Executive Board of the Slovenian Artificial Intelligence Society (SLAIS), of which she was president from 2010 to 2014, and on the Advisory Board of ACM Slovenija.

References

  • Abd Al Rahman, M., and Alireza Mousavi. 2020. “A Review and Analysis of Automatic Optical Inspection and Quality Monitoring Methods in Electronics Industry.” IEEE Access 8:183192–183271. https://doi.org/10.1109/Access.6287639.
  • Aggour, Kareem S., Vipul K. Gupta, Daniel Ruscitto, Leonardo Ajdelsztajn, Xiao Bian, Kristen H. Brosnan, Natarajan Chennimalai Kumar, and Rajkumar K. 2019. “Artificial Intelligence/machine Learning in Manufacturing and Inspection: A GE Perspective.” MRS Bulletin 44 (7): 545–558. https://doi.org/10.1557/mrs.2019.157.
  • Beltrán-González, Carlos, Matteo Bustreo, and Alessio Del Bue. 2020. “External and Internal Quality Inspection of Aerospace Components.” In 2020 IEEE 7th International Workshop on Metrology for AeroSpace (MetroAeroSpace), 351–355. IEEE.
  • Benbarrad, Tajeddine, Marouane Salhaoui, Soukaina Bakhat Kenitar, and Mounir Arioua. 2021. “Intelligent Machine Vision Model for Defective Product Inspection Based on Machine Learning.” Journal of Sensor and Actuator Networks 10 (1): 7. https://doi.org/10.3390/jsan10010007.
  • Bergmann, Paul, Michael Fauser, David Sattlegger, and Carsten Steger. 2019. “MVTec AD–A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9592–9600.
  • Bradley, Andrew P. 1997. “The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms.” Pattern Recognition 30 (7): 1145–1159. https://doi.org/10.1016/S0031-3203(96)00142-2.
  • Cai, Qi, Yingwei Pan, Ting Yao, Chenggang Yan, and Tao Mei. 2018. “Memory Matching Networks for One-Shot Image Recognition.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4080–4088.
  • Caron, Mathilde, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. 2021. “Emerging Properties in Self-Supervised Vision Transformers.” In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 9630–9640.
  • Chin, Roland T., and Charles A. Harlow. 1982. “Automated Visual Inspection: A Survey.” IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-4 (6): 557–573. https://doi.org/10.1109/TPAMI.1982.4767309.
  • Chouchene, Amal, Adriana Carvalho, Tânia M. Lima, Fernando Charrua-Santos, Gerardo J. Osório, and Walid Barhoumi. 2020. “Artificial Intelligence for Product Quality Inspection Toward Smart Industries: Quality Control of Vehicle Non-Conformities.” In 2020 9th International Conference on Industrial Technology and Management (ICITM), 127–131. IEEE.
  • Ciravegna, Gabriele, Frédéric Precioso, Alessandro Betti, Kevin Mottin, and Marco Gori. 2023. “Knowledge-Driven Active Learning.” In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 38–54. Springer.
  • Cullinane, Sarah-Jane, Janine Bosak, Patrick C Flood, and Evangelia Demerouti. 2013. “Job Design Under Lean Manufacturing and Its Impact on Employee Outcomes.” Organizational Psychology Review 3 (1): 41–61. https://doi.org/10.1177/2041386612456412.
  • Czimmermann, Tamás, Gastone Ciuti, Mario Milazzo, Marcello Chiurazzi, Stefano Roccella, Calogero Maria Oddo, and Paolo Dario. 2020. “Visual-based Defect Detection and Classification Approaches for Industrial Applications—A Survey.” Sensors 20 (5): 1459. https://doi.org/10.3390/s20051459.
  • Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, et al. 2021. “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale.” In International Conference on Learning Representations, https://openreview.net/forum?id=YicbFdNTTy.
  • Duan, Guifang, Hongcui Wang, Zhenyu Liu, and Yen-Wei Chen. 2012. “A Machine Learning-based Framework for Automatic Visual Inspection of Microdrill Bits in PCB Production.” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42 (6): 1679–1689. https://doi.org/10.1109/TSMCC.2012.2216260.
  • EC2. n.d. “European Commission, Enabling Technologies for Industry 5.0, Results of a Workshop with Europe's Technology Leaders.” https://op.europa.eu/en/publication-detail/-/publication/8e5de100-2a1c-11eb-9d7e-01aa75ed71a1/language-en. September 2020.
  • Ghai, Bhavya, Q. Vera Liao, Yunfeng Zhang, Rachel Bellamy, and Klaus Mueller. 2021. “Explainable Active Learning (xal) Toward Ai Explanations As Interfaces for Machine Teachers.” Proceedings of the ACM on Human-Computer Interaction 4 (CSCW3): 1–28. https://doi.org/10.1145/3432934.
  • Gobert, Christian, Edward W. Reutzel, Jan Petrich, Abdalla R. Nassar, and Shashi Phoha. 2018. “Application of Supervised Machine Learning for Defect Detection During Metallic Powder Bed Fusion Additive Manufacturing Using High Resolution Imaging.” Additive Manufacturing 21:517–528. https://doi.org/10.1016/j.addma.2018.04.005.
  • He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
  • Hu, Shell Xu, Da Li, Jan Stühmer, Minyoung Kim, and Timothy M. Hospedales. 2022a. “Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference.” In CVPR.
  • Hu, Shell Xu, Da Li, Jan Stühmer, Minyoung Kim, and Timothy M. Hospedales. 2022b. “Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9068–9077.
  • Huang, Yiming, Di Wu, Zhifen Zhang, Huabin Chen, and Shanben Chen. 2017. “EMD-based Pulsed TIG Welding Process Porosity Defect Detection and Defect Diagnosis Using GA-SVM.” Journal of Materials Processing Technology 239:92–102. https://doi.org/10.1016/j.jmatprotec.2016.07.015.
  • Jia, Hongbin, Yi Lu Murphey, Jinajun Shi, and Tzyy-Shuh Chang. 2004. “An Intelligent Real-Time Vision System for Surface Defect Detection.” In Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., Vol. 3, 239–242. IEEE.
  • Konstantinidis, Fotios K., Nikolaos Myrillas, Konstantinos A. Tsintotas, Spyridon G. Mouroutsos, and Antonios Gasteratos. 2023. “A Technology Maturity Assessment Framework for Industry 5.0 Machine Vision Systems Based on Systematic Literature Review in Automotive Manufacturing.” International Journal of Production Research 1–37. https://doi.org/10.1080/00207543.2023.2270588.
  • Križnar, Karel, Jože M. Rožanec, Blaž Fortuna, and Dunja Mladenić. 2023. “Explainable Artificial Intelligence Meets Active Learning: A Novel GradCAM-Based Active Learning Strategy.” Submitted.
  • Kujawińska, Agnieszka, Katarzyna Vogt, and Adam Hamrol. 2016. “The Role of Human Motivation in Quality Inspection of Production Processes.” In Advances in Ergonomics of Manufacturing: Managing the Enterprise of the Future, 569–579. Springer.
  • Lenka, Sambit, Vinit Parida, and Joakim Wincent. 2017. “Digitalization Capabilities As Enablers of Value Co-creation in Servitizing Firms.” Psychology and Marketing 34 (1): 92–100. https://doi.org/10.1002/mar.2016.34.issue-1.
  • Lied, Lars Harald, Maria Flavia Mogos, and Daryl John Powell. 2020. “Organizational Enablers for Digitalization in Norwegian Industry.” In IFIP International Conference on Advances in Production Management Systems, 83–90. Springer.
  • Lim, Kendrik Yan Hong, Pai Zheng, and Chun-Hsien Chen. 2020. “A State-of-the-art Survey of Digital Twin: Techniques, Engineering Product Lifecycle Management and Business Innovation Perspectives.” Journal of Intelligent Manufacturing 31 (6): 1313–1337. https://doi.org/10.1007/s10845-019-01512-w.
  • Liqun, Wang, Wu Jiansheng, and Wu Dingjin. 2020. “Research on Vehicle Parts Defect Detection Based on Deep Learning.” In Journal of Physics: Conference Series, Vol. 1437, 012004. IOP Publishing.
  • Lv, Qianwen, and Yonghong Song. 2019. “Few-shot Learning Combine Attention Mechanism-based Defect Detection in Bar Surface.” ISIJ International 59 (6): 1089–1097. https://doi.org/10.2355/isijinternational.ISIJINT-2018-722.
  • Mishra, Nikhil, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. 2017. “A Simple Neural Attentive Meta-Learner.” arXiv preprint arXiv:1707.03141.
  • Munkhdalai, Tsendsuren, and Hong Yu. 2017. “Meta Networks.” In International Conference on Machine Learning, 2554–2563. PMLR.
  • Nahavandi, Saeid. 2019. “Industry 5.0–A Human-centric Solution.” Sustainability 11 (16): 4371. https://doi.org/10.3390/su11164371.
  • Ochal, Mateusz, Massimiliano Patacchiola, Amos Storkey, Jose Vazquez, and Sen Wang. 2021. “Few-Shot Learning with Class Imbalance.” arXiv preprint arXiv:2101.02523.
  • O'Mahony, Niall, Sean Campbell, Anderson Carvalho, Suman Harapanahalli, Gustavo Velasco Hernandez, Lenka Krpalkova, Daniel Riordan, and Joseph Walsh. 2020. “Deep Learning vs. Traditional Computer Vision.” In Advances in Computer Vision: Proceedings of the 2019 Computer Vision Conference (CVC), Volume 1, 128–144. Springer.
  • Parnami, Archit, and Minwoo Lee. 2022. “Learning from Few Examples: A Summary of Approaches to Few-Shot Learning.” arXiv preprint arXiv:2203.04291.
  • Pouyanfar, Samira, Saad Sadiq, Yilin Yan, Haiman Tian, Yudong Tao, Maria Presa Reyes, Mei-Ling Shyu, Shu-Ching Chen, and Sundaraja S. Iyengar. 2018. “A Survey on Deep Learning: Algorithms, Techniques, and Applications.” ACM Computing Surveys (CSUR) 51 (5): 1–36. https://doi.org/10.1145/3234150.
  • Ren, Zhonghe, Fengzhou Fang, Ning Yan, and You Wu. 2022. “State of the Art in Defect Detection Based on Machine Vision.” International Journal of Precision Engineering and Manufacturing-Green Technology 9 (2): 661–691. https://doi.org/10.1007/s40684-021-00343-6.
  • Rožanec, Jože M., Patrik Zajec, Spyros Theodoropoulos, Erik Koehorst, Blaž Fortuna, and Dunja Mladenić. 2022. “Robust Anomaly Map Assisted Multiple Defect Detection with Supervised Classification Techniques.” arXiv preprint arXiv:2212.09352.
  • Rožanec, Jože M., Patrik Zajec, Elena Trajkova, Beno Šircelj, Bor Brecelj, Inna Novalija, Paulien Dam, Blaž Fortuna, and Dunja Mladenić. 2022. “Towards a Comprehensive Visual Quality Inspection for Industry 4.0.” IFAC-PapersOnLine 55 (10): 690–695. https://doi.org/10.1016/j.ifacol.2022.09.486.
  • Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang. 2015. “Imagenet Large Scale Visual Recognition Challenge.” International Journal of Computer Vision 115 (3): 211–252. https://doi.org/10.1007/s11263-015-0816-y.
  • See, Judi E. 2012. “Visual Inspection: A Review of the Literature.”
  • Settles, Burr. 2009. “Active Learning Literature Survey.”
  • Shahin, Mohammad, F. Frank Chen, Ali Hosseinzadeh, Hamed Bouzary, and Awni Shahin. 2023. “Waste Reduction Via Image Classification Algorithms: Beyond the Human Eye with An AI-based Vision.” International Journal of Production Research 1–19. https://doi.org/10.1080/00207543.2023.2225652.
  • Singh, Prerna. 2023. “Systematic Review of Data-centric Approaches in Artificial Intelligence and Machine Learning.” Data Science and Management 6 (3): 144–157. https://doi.org/10.1016/j.dsm.2023.06.001.
  • Snell, Jake, Kevin Swersky, and Richard S. Zemel. 2017. “Prototypical Networks for Few-Shot Learning.” ArXiv abs/1703.05175.
  • Takimoto, Hironori, Junya Seki, Sulfayanti F. Situju, and Akihiro Kanagawa. 2022. “Anomaly Detection Using Siamese Network with Attention Mechanism for Few-shot Learning.” Applied Artificial Intelligence 36 (1): 2094885. https://doi.org/10.1080/08839514.2022.2094885.
  • Tortorella, Guilherme Luz, Michel J. Anzanello, Flavio S. Fogliatto, Jiju Antony, and Daniel Nascimento. 2023. “Effect of Industry 4.0 Technologies Adoption on the Learning Process of Workers in a Quality Inspection Operation.” International Journal of Production Research 61 (22): 7592–7607. https://doi.org/10.1080/00207543.2022.2153943.
  • Triantafillou, Eleni, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, et al. 2019. “Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples.” arXiv preprint arXiv:1903.03096.
  • Villalba-Diez, Javier, Daniel Schmidt, Roman Gevers, Joaquín Ordieres-Meré, Martin Buchwitz, and Wanja Wellbrock. 2019. “Deep Learning for Industrial Computer Vision Quality Control in the Printing Industry 4.0.” Sensors 19 (18): 3987. https://doi.org/10.3390/s19183987.
  • Wang, Haohan, Zhuoling Li, and Haoqian Wang. 2021. “Few-shot Steel Surface Defect Detection.” IEEE Transactions on Instrumentation and Measurement 71:1–12.
  • Wang, Yisen, Weiyang Liu, Xingjun Ma, James Bailey, Hongyuan Zha, Le Song, and Shu-Tao Xia. 2018. “Iterative Learning with Open-Set Noisy Labels.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8688–8696.
  • Wang, Yaqing, Quanming Yao, James T. Kwok, and Lionel M. Ni. 2020. “Generalizing From a Few Examples: A Survey on Few-shot Learning.” ACM Computing Surveys (CSUR) 53 (3): 1–34. https://doi.org/10.1145/3386252.
  • Wu, Ke, Jie Tan, Jingwei Li, and Chengbao Liu. 2021. “Few-Shot Learning Approach for 3D Defect Detection in Lithium Battery.” In Journal of Physics: Conference Series, Vol. 1884, 012024. IOP Publishing.
  • Xu, Jiancheng, and Jialei Ma. 2022. “Auto Parts Defect Detection Based on Few-Shot Learning.” In 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), 943–946. IEEE.
  • Yang, Jing, Shaobo Li, Zheng Wang, Hao Dong, Jun Wang, and Shihao Tang. 2020. “Using Deep Learning to Detect Defects in Manufacturing: a Comprehensive Survey and Current Challenges.” Materials 13 (24): 5755. https://doi.org/10.3390/ma13245755.
  • Yu, Jianbo, Yanshu Wang, Qingfeng Li, Hao Li, Mingyan Ma, and Peilun Liu. 2023. “Cascaded Adaptive Global Localisation Network for Steel Defect Detection.” International Journal of Production Research 1–18. https://doi.org/10.1080/00207543.2023.2281664.
  • Zavrtanik, Vitjan, Matej Kristan, and Danijel Skočaj. 2021. “DRAEM-A Discriminatively Trained Reconstruction Embedding for Surface Anomaly Detection.” In Proceedings of the IEEE/CVF International Conference on Computer Vision, 8330–8339.
  • Zhan, Zhu, Jinfeng Zhou, and Bugao Xu. 2022. “Fabric Defect Classification Using Prototypical Network of Few-shot Learning Algorithm.” Computers in Industry 138:103628. https://doi.org/10.1016/j.compind.2022.103628.
  • Zhang, Shen, Fei Ye, Bingnan Wang, and Thomas G. Habetler. 2020. “Few-Shot Bearing Anomaly Detection via Model-Agnostic Meta-Learning.” In 2020 23rd International Conference on Electrical Machines and Systems (ICEMS), 1341–1346. IEEE.
  • Zhao, Zetian, Bingtao Hu, Yixiong Feng, Bin Zhao, Chen Yang, Zhaoxi Hong, and Jianrong Tan. 2023. “Multi-surface Defect Detection for Universal Joint Bearings Via Multimodal Feature and Deep Transfer Learning.” International Journal of Production Research 61 (13): 4402–4418. https://doi.org/10.1080/00207543.2022.2138613.