Research Article

CPW-DICE: a novel center and pixel-based weighting for damage segmentation

Yunus Abdi, Omer Küllü, Mehmet Kıvılcım Keleş & Berk Gökberk
Article: 2259115 | Received 07 Mar 2023, Accepted 10 Sep 2023, Published online: 26 Sep 2023

Abstract

Reliable evaluation of damage in vehicles is a primary concern in the insurance industry. Consequently, solutions enhanced with Artificial Intelligence (AI) have become the norm. During the assessment, precise damage segmentation plays a crucial role. Dents are a common type of vehicle damage; they are difficult to pinpoint and tend to blend into the background. This paper proposes a novel loss function to improve dent segmentation accuracy in vehicle insurance claims. Centre and Pixel-based Weighted DICE (CPW-DICE) is a loss function that performs pixel-based weighting. CPW-DICE concentrates on the centre of the dent damage to lessen faulty segmentations. It generates a weight mask during training by employing the ground truth (GT) and prediction masks, and this weight mask is incorporated into the DICE loss. Experiments conducted on our comprehensive internal dataset show a 3% improvement in Intersection over Union (IoU) score for three state-of-the-art (SOTA) approaches compared to DICE loss. Finally, CPW-DICE is evaluated on similar tasks to demonstrate its benefits beyond car damage segmentation.

Introduction

The insurance industry is a prime application area for artificial-intelligence-based solutions. Solutions proposed so far have focused on fraud detection (Dimri et al., Citation2022; Lu et al., Citation2023; Waqas et al., Citation2020), document management (Guha et al., Citation2022), car damage classification (Patil et al., Citation2017; Waqas et al., Citation2020), and car damage segmentation (Mallios et al., Citation2023; Pasupa et al., Citation2022).

Car insurance is the leading business for insurance companies. The damage evaluation process is generally complex and time-consuming (Pasupa et al., Citation2022). AI-based intelligent damage assessment systems benefit insurance companies from various perspectives. Thus, this paper focuses mainly on car dent damage segmentation.

Image segmentation is one of the main objectives of computer vision. The task can be defined as assigning specific class labels to sub-partitions of the image. Image segmentation can be categorised as semantic (Hatamizadeh et al., Citation2022; Huang et al., Citation2023; Wang et al., Citation2022), instance (Dong & Wang, Citation2022; Gu et al., Citation2022b; Mao et al., Citation2023), and panoptic segmentation (Li & Chen, Citation2022; Mohan & Valada, Citation2021). Semantic segmentation aims to assign every pixel in an image to a label. Instance segmentation, unlike semantic segmentation, evaluates each instance of an object separately (Minaee et al., Citation2021). Panoptic segmentation is a mix of the two, combining category-wise labelling from semantic segmentation with instance-wise labelling from instance segmentation. The problem in this study is the segmentation of the dented region, where every instance of a dent is treated as the same class. Consequently, semantic segmentation is determined to be the most suitable technique for this problem.

Image segmentation models are used in various fields of computer vision. These models use different loss functions depending on the problem's nature and the target object's shape. Loss functions can be categorised as distribution-based, region-based, boundary-based, and compounded (Jadon, Citation2020). Loss functions frequently employed in segmentation tasks are cross-entropy (Yi-de et al., Citation2004), Focal loss (Lin et al., Citation2018), DICE loss, and Tversky loss (Salehi et al., Citation2017). Cross-entropy and Focal loss are distribution-based loss functions. Because the dent's shape is crucial for segmentation performance, a region-based solution is needed in this paper. DICE loss and Tversky loss are region-based loss functions. However, DICE loss does not apply any weighting, and the Tversky loss weighting technique ignores the object's shape; it is therefore unsuitable for problems where object shape is crucial. Consequently, researchers have experimented with various domain-specific loss functions to improve their results (Jadon, Citation2020). MRI and CT segmentation (Fang et al., Citation2023), thyroid tumour segmentation (Yang et al., Citation2023), lumbar spine segmentation (He et al., Citation2023), microscopy cell segmentation (Y. Zhu et al., Citation2023), and point cloud segmentation (J. Zhang et al., Citation2023) are examples of studies proposing new domain-specific loss functions. Although domain-specific loss functions differ in execution, they generally aim to add significance to imbalanced classes. However, there are also tasks, such as dent segmentation in vehicles, where importance is unequal among pixels. In this paper, the CPW-DICE loss function is proposed to address such problems, and it is examined on the task of dent segmentation in cars.

There is no definite shape or pattern of dents in cars. Therefore, factors such as the size of the impact area and the magnitude of the damage lead to dissimilar instances of damaged regions. Additionally, due to the aggressive design of current vehicle models, undamaged parts of the vehicle may be perceived as dented. These problems negatively affect the performance of segmentation algorithms and lead to the generation of many false positives (FP). Furthermore, the deepest point of the dent is generally expected to occur in the central region, and the size of this region can express the extent of the damage. Consequently, the importance of correctly detecting a pixel in a dent decreases with its distance from the centre. Moreover, the uncertain shape of the target object means that the IoU score (Rezatofighi et al., Citation2019) alone is inadequate for evaluating the task. Therefore, the predictions were analysed individually alongside the IoU score for a better assessment.

The main principle of the proposed method is the weighting of the dented region. The loss functions commonly used in image segmentation lack pixel-wise weighting. At the border of a dent, it is difficult to distinguish the damaged region from the background, while the central region of a dent contains valuable information about the damage. Accordingly, this paper proposes a novel loss function that performs pixel-wise weighting to reduce FPs at the borders of dent damage. In the proposed method, a weight matrix is first calculated to add importance to the central region. Subsequently, the calculated matrix is added to DICE and presented as a novel variant of DICE loss. The additional matrix weights the GT and prediction results obtained during training on a pixel basis.

The idea of the proposed method originated from the need for a pixel-wise weighting loss function. Originally, the task was treated as a binary segmentation problem. First, various CNN architectures were trained with popular segmentation loss functions. DeepLabV3+, LinkNet, and U-Net were selected for their success in other domains, and binary cross-entropy and DICE loss were used as the loss functions. Although certain method combinations yielded better results than others, the abundance of FPs could not be resolved with these existing techniques. FP reduction is exceptionally valuable in the insurance industry because high FP counts have a detrimental influence on damage cost calculation. The need for a method that resists generating many FPs while adding extra weight to critical regions inspired the creation of the CPW-DICE loss function. Results presented in Figure 6 show that CPW-DICE can produce more precise segmentations than other DICE loss-based contenders. The novel contribution of this paper is an adaptive loss function for objects with no well-defined edges. The pixel-wise weighting of the proposed method helps reduce FPs, expanding on prior weighting techniques. The contributions of this paper are as follows:

  • A novel loss function that performs pixel-wise weighting has been developed.

  • The proposed loss function improves the segmentation performance of ambiguous-shaped objects such as dents.

  • The proposed loss function is robust to FPs by readapting itself in each batch.

The remainder of the article is organised as follows. The “Related Work” section presents popular segmentation approaches and loss functions. Furthermore, similar work in the car damage analysis domain is presented. The “Methodology” section presents information about the datasets, augmentation techniques, the implemented segmentation models, training, and evaluation metrics. Finally, the “Results” are illustrated, followed by a “Discussion” section and a “Conclusion” section summarising the paper and presenting possible future work.

Related work

Segmentation techniques with deep learning

Due to their ability to extract and process more features, Deep Learning (DL) approaches have recently surpassed classic segmentation techniques. Long et al. (Citation2015) proposed Fully Convolutional Networks (FCNs) to adapt image segmentation tasks to DL. With FCNs, a structure capable of generating a segmentation map with the same dimensions as the input image was presented. The approach attained SOTA performance on benchmark datasets upon its publication. However, its failure to benefit from shallow features renders the architecture insufficient compared to more recent, more complex DL techniques.

Following FCNs, Badrinarayanan et al. (Citation2016) proposed the SegNet architecture, combining the first 13 convolutional layers of VGG16 as the encoder with decoder layers corresponding to these layers. Ronneberger et al. (Citation2015) proposed the U-Net architecture. The success of U-Net led to its use in both medical and non-medical tasks, and other variants of U-Net have been proposed in the literature. Rearranging the skip pathways of the U-Net architecture, Zhou et al. (Citation2018) proposed the U-Net++ architecture. Fare Garnot and Landrieu (Citation2021) implemented spatio-temporal encoding to add temporal attention to the U-Net architecture, allowing image sequences to be processed in parallel in the encoder step. Similar to the U-Net model, Chaurasia and Culurciello (Citation2017) proposed the LinkNet architecture, in which the convolutional structure is replaced with residual blocks and the concatenations used in U-Net are replaced with additions. Using dilated convolutional layers, Chen et al. (Citation2017) proposed the DeepLabV3 architecture. Expanding on DeepLabV3, Chen et al. (Citation2018) proposed DeepLabV3+, adding a simple but effective decoder module to improve the segmentation results across object boundaries.

Transformer architectures were first introduced in the natural language processing (NLP) domain by Vaswani et al. (Citation2017). Although the attention mechanism in transformers is suitable for NLP tasks, it cannot be adapted to images in the same way: images can contain millions of pixels, and computing attention between all pixels requires high computational power. In computer vision, attention mechanisms have therefore been implemented through CNN blocks. Dosovitskiy et al. (Citation2021) proposed the Vision Transformer (ViT), independent of CNNs. In this work, the image is divided into 16 × 16 patches that are embedded into vectors. Similar to NLP, the embeddings are then fed to the transformer model as inputs. Zheng et al. (Citation2021) expand on the ViT architecture by adding new CNN-based decoder structures. W. Wang et al. (Citation2021) introduced the Pyramid Vision Transformer (PVT), which adds a pyramid structure after the encoder step of ViT, allowing PVT to be ported to various dense prediction tasks. Gu et al. (Citation2022a) proposed HrViT, a structure that combines HrNet's (J. Wang, Sun, et al., Citation2020) ability to extract multi-scale high-resolution features with the ViT architecture. Liu et al. (Citation2021) applied hierarchical merging to the divided patches (4 × 4, 8 × 8, and 16 × 16) across stages and proposed the Swin Transformer architecture by applying shifted windows to these patches. Regardless of their performance, employing a transformer model for computer vision tasks still requires considerable processing power and a large amount of data (Dosovitskiy et al., Citation2021).

Car damage analysis

Detection of vehicle damage can be a complex problem. One of the main difficulties in vehicle damage studies is the dataset: the lack of a publicly available dataset poses a challenge for future studies. Several studies on car damage segmentation exist in the literature. Patel et al. (Citation2020) attempted to segment damaged regions in a binary dataset (damaged and undamaged images); however, there is no information about the types of damage. Parhizkar and Amirfakhrian (Citation2022) proposed a deep learning-based approach for vehicle part detection and damage segmentation within the detected parts. First, texture features are extracted with the Local Binary Pattern (LBP) (Ojala et al., Citation1996) and Local Directional Number (LDN) (Ramirez Rivera et al., Citation2012). Then, both parts and damages are estimated from the LBP, LDN, and the original image. In the proposed approach, two different CNN structures are created, one for part detection and one for damage segmentation. For part detection, 70 × 70 patches from the LBP, LDN, and the original image are fed separately to the CNN for feature extraction; the extracted features are then flattened, concatenated, and fed into fully connected layers. For damage segmentation, a similar process is followed, the only difference between the CNN structures being the input dimensions. However, the authors did not compare their work with up-to-date methods to analyse the effectiveness of the proposed method. Balci et al. (Citation2019) studied the classification of damaged and undamaged vehicles. First, the SSD object detection model is used to detect the car's position; then, deep features are extracted with InceptionV3 and fed into an SVM for classification. van Ruitenbeek and Bhulai (Citation2022) developed a damage assessment model to detect and classify vehicle damage into twelve categories.

Li et al. (Citation2018) proposed a system to detect damage with YOLO to prevent fraudulent claims in the insurance area. Zhang et al. (Citation2020) improved the backbone of the Mask RCNN architecture for damage detection; the results show that the improved Mask RCNN outperformed the original architecture. Singh et al. (Citation2019) proposed an end-to-end damage insurance system. An ensemble of Mask RCNN and PANet models is used to segment car parts, reaching an improved result compared to the individual models. Furthermore, another Mask RCNN model was trained for the segmentation of damaged areas. The car-part results provided by the ensemble model are fed into a trained VGG16 model for damage classification, and the parts predicted to be damaged are then joined with the damage Mask RCNN output to provide the final results.

Loss functions

Various loss functions have been proposed over time for image segmentation with DL. Jadon (Citation2020) proposed the Log-Cosh DICE loss, a variant of DICE loss that applies the Log-Cosh approach for better optimisation. L. Wang et al. (Citation2020) proposed WS DICE for mining information from weighted background areas. Sudre et al. (Citation2017) proposed a robust and accurate deep-learning loss function named GDL (Generalised Dice Loss) for unbalanced tasks. DICE loss measures how well the prediction and GT overlap and is useful for datasets with imbalance problems. However, DICE loss ignores background zones. To address this issue, WS DICE aims to preserve background information while calculating the DICE loss by weighting the background class; the paper explains that although negative images do not contain object regions, losing their information is undesirable since background regions contain characteristic information. Because our study focuses mainly on dent damage, it was not necessary to add any weight to the background class. In addition, DICE loss may not achieve optimum results due to its non-convex structure (Jadon, Citation2020); thus, Log-Cosh DICE incorporates the regression Log-Cosh approach so that it can be applied to skewed datasets. However, since our study did not experience the same dataset problem, that approach might not be suitable for our work. In contrast, our study emphasises that even positive pixels can have different weights, whereas GDL, Log-Cosh, and WS DICE are all class-based loss functions.

Combining the DICE and cross-entropy loss functions, Taghanaki et al. (Citation2021) proposed the Combo loss to handle data imbalance problems. Wong et al. (Citation2018) proposed the Exponential Logarithmic loss function, fusing the DICE and cross-entropy losses to handle highly unbalanced object sizes in 3D segmentation. Yeung et al. (Citation2021) proposed a loss function that generalises various DICE and cross-entropy-based loss functions to address the class imbalance problem. Xie et al. (Citation2023) proposed a balanced loss function to address label imbalance, easy-hard example imbalance, and imbalance of label sizes for surface defect segmentation; however, they specified that it is an area-specific loss function and may not be suitable for every problem. Yuan and Xu (Citation2021) proposed a pixel-based weighting loss function that considers neighbouring pixels, calculating the weight of each pixel from its eight neighbours. Our method differs in its weighting technique, which evaluates a wider area and considers the object's orientation.

Methodology

DICE loss

The DICE coefficient is a metric widely used in computer vision to evaluate segmentation performance. It measures how well the prediction and GT overlap. After being adapted as a loss function in 2016, it has been used frequently for segmentation tasks. A distinct feature of DICE loss is its ability to handle imbalanced data. The binary-class DICE loss is presented in Equation 1:

$$\text{DICE}_{\text{LOSS}} = 1 - \frac{2\sum_{i=1}^{n} y_i\, p_i}{\sum_{i=1}^{n} (y_i + p_i)} \tag{1}$$

In the formula, $y$ and $p$ denote the GT and prediction, respectively, and the sum runs over the pixels of the $n \times n$ image. The fraction after the minus sign is the DICE coefficient: twice the intersection divided by the sum of the two regions. Its value lies between zero and one, so DICE loss, expressed as one minus the DICE coefficient, also ranges from zero to one.
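For concreteness, a minimal sketch of Equation 1 in TensorFlow (the framework used in our experiments) follows; the per-image flattening and the smoothing constant are implementation assumptions, not part of the formula.

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    """Binary DICE loss (Equation 1) over a batch of masks."""
    # Flatten each mask so the sums run over all pixels.
    y_true = tf.reshape(y_true, [tf.shape(y_true)[0], -1])
    y_pred = tf.reshape(y_pred, [tf.shape(y_pred)[0], -1])
    intersection = tf.reduce_sum(y_true * y_pred, axis=1)
    denom = tf.reduce_sum(y_true + y_pred, axis=1)
    dice = (2.0 * intersection + smooth) / (denom + smooth)  # DICE coefficient
    return 1.0 - tf.reduce_mean(dice)                        # loss in [0, 1]
```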

Proposed method

Dents in vehicles generally do not have a specific form. Furthermore, there is an imbalance problem between the background and the dented area. Depending on the nature of the dent, one section may be damaged more deeply than others, and detecting such areas can provide a general idea of the damage size. In addition, attempting to detect shallower regions may cause non-damaged pixels to be estimated as damaged. To alleviate these problems, CPW-DICE is proposed, aiming to magnify the importance of the central region during training. First, a weight mask is created based on the GT masks of the images. For each dented region, elliptical weights that increase from the contour of the dented area towards the centre are assigned to the weight mask. Thus, focus on the centre of the labelled region is ensured.

A distinct advantage of CPW-DICE is that it reduces the weight of the outer area, thus decreasing the expected FP count. The steps for generating the weight mask are as follows:

  1. The number of ellipses to be drawn and the minimum weight, which are the hyperparameters of the proposed loss function, are determined experimentally.

  2. The contours of the labelled areas are calculated from the GT mask.

  3. For each calculated region, the parameters of an ellipse surrounding the labelled region (centre point, angle, and x- and y-axis lengths) are computed according to the orientation and size of the region.

  4. The specified minimum weight value is assigned to the outermost ellipse, and the weight value is increased in equal steps. The increment is calculated from the chosen number of ellipses and the initial weight value (see the worked example after this list). The weights range from the minimum value at the outermost region to 1 at the central region.

  5. For each new ellipse to be drawn, the last ellipse drawn is shrunk towards the centre and the newly calculated weight value is assigned.

  6. Step 5 is repeated until the specified number of ellipses is reached.
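The paper does not state the increment formula explicitly; one reading consistent with steps 1 and 4, given here as an assumption, is a linear schedule from the minimum weight to 1:

$$\Delta w = \frac{1 - w_{\min}}{k - 1}, \qquad w_j = w_{\min} + j\,\Delta w, \quad j = 0, 1, \ldots, k - 1,$$

where $k$ is the ellipse count and $w_{\min}$ the minimum weight. For example, with the optimal values reported later ($k = 15$, $w_{\min} = 0.5$), $\Delta w = 0.5/14 \approx 0.036$, so the ring weights run $0.5, 0.536, \ldots, 1.0$ from the outermost ellipse to the centre.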

The elliptical shape of the weights is chosen in consideration of the dent areas, and care was taken to draw each ellipse according to the orientation of the dent, so the weighting considers the extent of the damage. The ratio by which the ellipse shrinks as it approaches the centre is computed dynamically from the x- and y-axis lengths of the first generated ellipse; each new ellipse is created with the currently calculated axis lengths, centre point, and angle. A minimal sketch of this procedure is given below, and an illustration of the weight mask is shown in Figure 1.
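The sketch below walks through the six steps, assuming OpenCV's contour and ellipse utilities; the function and parameter names are illustrative rather than the authors' implementation, and `gt_mask` is assumed to be a binary uint8 array.

```python
import cv2
import numpy as np

def build_weight_mask(gt_mask, num_ellipses=15, min_weight=0.5):
    """Concentric elliptical weights rising from min_weight to 1 (steps 1-6)."""
    weight_mask = np.zeros(gt_mask.shape, dtype=np.float32)
    # Step 2: contours of the labelled regions.
    contours, _ = cv2.findContours(gt_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    step = (1.0 - min_weight) / max(num_ellipses - 1, 1)
    for cnt in contours:
        if len(cnt) < 5:                      # fitEllipse needs >= 5 points
            continue
        # Step 3: centre, full axis lengths, and orientation of the region.
        (cx, cy), (ax, ay), angle = cv2.fitEllipse(cnt)
        # Per-ring shrink amounts derived from the outermost axes (steps 5-6).
        dax, day = ax / num_ellipses, ay / num_ellipses
        for j in range(num_ellipses):         # outermost ring drawn first
            # Step 4: weight grows linearly towards the centre.
            half_axes = (max(int((ax - j * dax) / 2), 1),
                         max(int((ay - j * day) / 2), 1))
            cv2.ellipse(weight_mask, (int(cx), int(cy)), half_axes, angle,
                        0, 360, float(min_weight + j * step), thickness=-1)
    return weight_mask
```

Because the inner ellipses are drawn last, each ring overwrites the centre of the previous one, leaving the highest weight at the core.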

Figure 1. Two examples of the GT mask, weight mask and the multiplication of GT and weight mask from left to right, respectively.


After the weight mask is created, it is added as a new parameter to the DICE loss function to obtain the proposed loss function. The CPW-DICE loss function can be defined as:

$$\text{CPW-DICE} = 1 - \frac{2\sum_{i=1}^{n} y_i\, p_i\, w_i}{\sum_{i=1}^{n} (y_i w_i + p_i w_i)} \tag{2}$$

Here, $y$ and $p$ denote the GT and predictions, respectively, and $w$ refers to the created weight mask. Weighting is done by multiplying the weights with both the GT and the predictions. However, since the weight mask is created from the GT mask, all pixels outside the calculated ellipse regions are weighted with 0; when multiplied by the predictions, unwanted FPs outside the ellipse regions would therefore be multiplied by 0. To prevent this, FPs are assigned a value of 1 in the weight mask: in the training phase, after the weights are calculated from the GT, prediction probabilities greater than a certain threshold are evaluated as FPs. Algorithm 1 further illustrates the equation given in Formula 2.

$$w_i = \begin{cases} 1, & y_i = 0 \ \text{and}\ p_i \geq th \\ \text{original weight}, & y_i = 1 \ \text{or}\ p_i < th \end{cases} \tag{3}$$

In the equation, $y$, $p$, and $th$ represent the GT, prediction, and threshold value, respectively. For labelled areas in the GT mask, and for prediction probabilities below the threshold, the original weight-mask value is kept. For regions with no object in the GT mask whose prediction probabilities exceed the threshold, a weight of 1 is assigned.
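A sketch of Equations 2 and 3 in TensorFlow, assuming a weight mask such as the one built above has already been computed per image (e.g. via `tf.numpy_function`); the names and the smoothing constant are our assumptions.

```python
import tensorflow as tf

def cpw_dice_loss(y_true, y_pred, weight_mask, fp_threshold=0.8, smooth=1e-6):
    """CPW-DICE (Equation 2) with the FP re-weighting rule (Equation 3)."""
    # Equation 3: outside the GT, predictions at or above the threshold
    # are FPs and keep a weight of 1 so they remain penalised.
    fp_region = tf.logical_and(tf.equal(y_true, 0.0), y_pred >= fp_threshold)
    w = tf.where(fp_region, tf.ones_like(weight_mask), weight_mask)
    # Equation 2: DICE with pixel-wise weights on both GT and prediction.
    intersection = tf.reduce_sum(y_true * y_pred * w)
    denom = tf.reduce_sum(y_true * w + y_pred * w)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)
```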

The general training diagram of the deep learning model is shown in Figure 2(a). To calculate CPW-DICE, the prediction and GT are used to compute the weight mask. The calculated weight mask, model prediction, and GT are then provided as inputs to CPW-DICE.

Figure 2. (a) Overall training diagram. (b) Weight mask calculation. The process of weight mask calculation in (a) has been detailed in (b).


In Figure 2(b), the formation flow of the weight mask is shown using a randomly generated prediction. First, the prediction is obtained from the DL model. Then, weights are created from the GT mask as described in the six steps above. The final version of the weight mask is obtained by comparing the GT and predictions according to Equation 3. In this illustration, all probabilities in the prediction mask are taken as 0 or 1; thus, regardless of the chosen threshold value, all predicted pixels outside the GT are evaluated as FPs.

During training, the model's predictions become sharper. The FPs shown in Figure 2(b), for instance, are expected to disappear in later epochs, and in this case the weight mask is updated according to the FPs. The proposed loss function thus has a dynamic structure that adapts to the model's results. Examples of combined weight masks and prediction results from training are given in Figure 3.

Figure 3. Changes in the weight mask and predictions across different epochs.


Databases

The proposed method has been tested on an internal car dent dataset. Furthermore, the method was tested on the public ISBDA (X. Zhu et al., Citation2020) and FLAME (Shamsoshoara et al., Citation2020) datasets to evaluate its wider applicability. ISBDA is a dataset of damaged buildings after disasters, and FLAME is a dataset constructed from forest fires. ISBDA and FLAME were deemed appropriate due to their vague label shapes.

Car damage dataset

In the first stage of our experiments, Anadolu Insurance's internal vehicle data was employed. The dataset consists of 16,155 images of vehicles taken by clients and experts after accidents. The average image resolution is 1024 × 768. The dataset was divided into three parts: 12,462 images for training, 2449 for validation, and 1244 for testing. Masks are labelled as binary. Sample images of the dataset are shown in Figure 4.

Figure 4. Example images from the internal dataset.


ISBDA dataset

The ISBDA dataset consists of ten videos collected from social media platforms recording severe hurricane and tornado disasters. The recordings were acquired during Hurricane Harvey in 2017, Hurricanes Michael and Florence in 2018, and three other hurricanes in 2017, 2018, and 2019 (X. Zhu et al., Citation2020).

The dataset consists of a total of 1029 images at 1920 × 1080 resolution. It was originally labelled with three classes: mild, severe, and residual. In our study, however, the classes were merged into a single damage class to handle the problem as a binary segmentation task. For training, the dataset was split into three parts: 789 images for training, 120 for validation, and 120 for testing.

FLAME dataset

The FLAME dataset consists of footage from pile burning operations in a forest on Observatory Mesa (Shamsoshoara et al., Citation2020). It consists of a total of 2003 images at 3840 × 2160 resolution. For training, the dataset was split into three parts: 1602 images for training, 201 for validation, and 200 for testing.

Data augmentation

DL methods generally require a significant amount of data to be effective. Consequently, a set of data augmentation approaches was employed in our study to improve robustness. The augmentations applied during training are random brightness, random gamma, Gaussian noise, blurring, sharpening, horizontal flip, shift-scale-rotate, and random crop.
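A sketch of such a pipeline using the Albumentations library; the library choice and all probabilities and ranges are our assumptions, since the paper names only the transform types.

```python
import albumentations as A

# One possible composition of the listed augmentations; mask-aware
# transforms are applied identically to the image and its GT mask.
train_transform = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.0, p=0.5),
    A.RandomGamma(gamma_limit=(80, 120), p=0.5),
    A.GaussNoise(p=0.3),
    A.Blur(blur_limit=3, p=0.3),
    A.Sharpen(p=0.3),
    A.HorizontalFlip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, p=0.5),
    A.RandomCrop(height=480, width=480),
])

# Usage: augmented = train_transform(image=image, mask=mask)
```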

Results

Experimental setup

The configuration of the computer used for training and testing the DL models is as follows: Intel Core i9-9900KF (3.60 GHz) processor, 64 GB DDR4 RAM, and a single RTX 2080 Ti graphics card. Experiments were conducted using the TensorFlow framework.

Evaluation metrics

In our experiments, IoU (Rezatofighi et al., Citation2019) and DICE (Eelbode et al., Citation2020) were chosen as the main evaluation metrics. Furthermore, the following pixel-level definitions, used in the evaluation, are presented:

  • TP: The pixel belongs to a desired object and the model predicted it correctly.

  • FP: The pixel does not belong to a desired object, but the model predicted it as the object.

  • FN: The pixel belongs to a desired object, but the model failed to predict it.

  • TN: The pixel does not belong to a desired object and the model predicted it correctly.

Formulas of precision, recall, and F1 score are given in Equations 4–6, respectively:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$

$$\text{F1 Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$
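These definitions translate directly into code; a short sketch over binary masks follows (the names are ours):

```python
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-9):
    """Pixel-based precision, recall, F1 (Equations 4-6), and IoU."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    return precision, recall, f1, iou
```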

Performance evaluation

The SOTA architectures U-Net, LinkNet, and DeepLabV3+ were chosen for the evaluation. The selection was made based on the computational capacity of our hardware and the expected inference time (Krestenitis et al., Citation2019). The proposed loss function was tested on these three architectures, and the results were evaluated. First, a search was conducted to find the best-performing hyperparameters on the dataset. Accordingly, the three SOTA models were compared with base hyperparameters (ellipse count: 10, minimum weight: 0.1) to decide which was most successful. In these tests, U-Net outperformed the others, so the ablation study was conducted with the U-Net architecture. The optimal values of the minimum weight and the total ellipse count are discussed, and the results are presented in Table 1.

Table 1. A comparison over the U-Net architecture of different ellipse counts and minimum weight threshold values used while creating the weight matrix.

As presented in Table 1, the optimal ellipse count is observed to be 15, while the optimal minimum weight is 0.5. Moreover, no clear pattern was observed between the parameters and the IoU or DICE results; increasing the number of ellipses or the minimum weight value may decrease performance.

For the comparison of methods, the ellipse count, minimum weight value, and the prediction's FP threshold were set to 15, 0.5, and 0.8, respectively, following the ablation study. A score threshold of 0.8 was also set for all models in the inference phase. The results are listed in Table 2, and the training and validation graphs are provided in Figure 5.

Figure 5. Training and validation graphs. Epoch-loss plot on the right, Epoch-IoU plot on the left.


Table 2. Comparison of DICE loss and CPW-DICE loss over different architectures.

All models were trained with an input size of 480 × 480. The starting learning rate was initialised as 1e-4, and the Adam optimiser was used with epsilon 1e-07, beta_1 0.9, and beta_2 0.999. A noticeable improvement in precision and IoU can be observed across all architectures. Furthermore, the proposed method preserves the recall scores, leading to an increase in F1 scores as well. Additionally, because of the extra processing requirements of CPW-DICE, the training time was also compared; the results show that CPW-DICE calculates the loss of a batch in effectively the same time as DICE loss. Considering both seconds per batch and IoU scores, U-Net was the superior model among the three architectures. Moreover, visual comparison showed that the proposed approach produced more accurate results than DICE loss, as shown in Figure 6.
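The reported training configuration maps onto TensorFlow/Keras as in the sketch below; the one-layer stand-in model and the use of plain DICE loss here (instead of a CPW-DICE wrapper that builds the weight mask per batch) keep the example self-contained and are our assumptions.

```python
import tensorflow as tf

# Stand-in for the U-Net/LinkNet/DeepLabV3+ backbones at the 480 x 480 input size.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(480, 480, 3)),
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
])

def dice_loss(y_true, y_pred, smooth=1e-6):
    inter = tf.reduce_sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + smooth) / (tf.reduce_sum(y_true + y_pred) + smooth)

# Adam with the reported hyperparameters and starting learning rate.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-7)

model.compile(optimizer=optimizer, loss=dice_loss,
              metrics=[tf.keras.metrics.BinaryIoU(threshold=0.8)])
```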

Figure 6. A comparison between loss functions.


Comparison with other methods in different datasets

The impact of CPW-DICE was also compared with other DICE loss variants on different public datasets. In the training phase, the optimal values (ellipse count 15 and minimum weight 0.5) were used. The results are presented in Table 3.

Table 3. A comparison of DICE, Log-Cosh DICE, WS DICE and CPW-DICE over various datasets.

The proposed method produces a higher IoU score than the other contenders on all datasets, and CPW-DICE outperforms all other loss functions in F1 score as well. The results show that CPW-DICE is suitable for problems where the objective is decreasing FPs while maintaining true positive counts. Moreover, a comparison of U-Net models trained with various loss functions is presented in Figure 6. The first example in Figure 6 shows that CPW-DICE provides tighter predictions than the other loss functions without compromising true positive pixels. The second example shows that DICE loss tends to predict a larger mask than the GT; conversely, CPW-DICE addresses this problem by making predictions that match the GT masks more closely. Log-Cosh DICE and WS DICE provide better predictions than plain DICE but are still insufficient compared to CPW-DICE. Furthermore, as shown in the last three rows of Figure 6, CPW-DICE is also capable of making better predictions than the other losses.

The proposed method is challenged in situations where the dent area has more than one deep centre point. This case is difficult for CPW-DICE since the weighting process is based on a single centre point. An example of this case is presented in Figure 7. The problem will be explored further in our follow-up work.

Figure 7. An example of a multi-centre dent case.


Discussion

Damage assessment is a crucial part of the insurance field; therefore, AI is being incorporated to provide efficient and accurate results. The problem of dent damage segmentation is one of the main pillars of damage assessment. Dent damage generally grows deeper towards the centre; consequently, pixels near the centre are more important than outer pixels.

In this paper, we introduce the CPW-DICE loss, a custom loss function specialised for segmentation tasks with unequal pixel importance. The main advantage of CPW-DICE is that it weights the centre region of the object in an elliptic manner, allowing for better results and reduced FP counts. CPW-DICE has proven beneficial in real-life scenarios, including but not limited to dent damage segmentation. AI-enhanced solutions require high accuracy to be cost-effective, and CPW-DICE fulfils this objective by minimising FPs.

Furthermore, a comparison between CPW-DICE and conventional DICE loss was conducted. The main difference is pixel-based weighting as opposed to class-based weighting, which is advantageous when the target object has no well-defined shape. In our experiments, CPW-DICE surpassed all contenders on multiple datasets in IoU and F1 score. The results are further illustrated in Table 3.

For objects with more than one centre point, the proposed method is challenged when creating the weight mask. An example of this problem can be seen in car dents where the dented area has more than one deep centre. Segmentations generated on objects with multiple centre points are shown in Figure 7. Our future work aims to improve multi-centre object segmentation by assigning different classes to centre points.

Conclusion

Vehicle damage evaluation is crucial for applied AI in the insurance industry. This paper investigated the task of efficient dent segmentation in vehicles. Unfortunately, dent damage tends to blend in with the background, making the task naturally challenging. Valuable information generally clusters at the centre of the dent, so the pixels around the centre carry high importance. It is difficult to pinpoint a loss function in the literature for this particular scenario. To address this shortage, we propose a novel centre-based, pixel-wise weighting loss function that adds an additional weight parameter to the conventional DICE loss to improve accuracy in challenging segmentation objectives.

During training, CPW-DICE weights the pixels in the GT by fitting elliptical rings around the object's centre, with pixel importance decreasing with distance from the centre. Optimum values for the two hyperparameters, the number of ellipses to fit and the minimum weight value, are determined experimentally.

To demonstrate the capabilities of CPW-DICE, we compared it with conventional DICE across three SOTA segmentation approaches. During the experiments, the precision score was chosen as the primary evaluation metric due to the cost of incorrectly estimating positive samples. Our results show that CPW-DICE outperforms conventional DICE loss in precision by 6%, while keeping recall high, as shown in Table 2. Moreover, the ablation study concluded that the best-suited segmentation method for CPW-DICE was U-Net, achieving an IoU score of 0.69. The optimal hyperparameters for CPW-DICE were 15 ellipses and a minimum weight value of 0.5.

Finally, we inspected examples in the test set visually. The segmentations illustrated in Figure 6 show that CPW-DICE is more precise than the other loss functions. CPW-DICE aims to segment challenging objects efficiently; however, its current iteration is suited to objects with a single centre point. Segmentations generated on objects with multiple centre points are shown in Figure 7. In the future, we aim to improve multi-centre object segmentation by assigning different classes to centre points.

Contributorship

Yunus Abdi: Co-conceptualization of the idea, co-development of the idea, data collection, coding, experiments, results analysis, initial draft of the manuscript. Omer Küllü: Co-conceptualization of the idea, co-development of the idea, supervision of the project, finalizing the manuscript. Mehmet Kıvılcım Keleş: Supervision of the project, finalizing the manuscript. Berk Gökberk: Assistance in the revision process, both conceptually and linguistically.

Acknowledgements

The authors would like to thank the Anadolu Insurance Company of Turkey for providing the data and equipment for this research.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The participants of this study did not give written consent for their data to be shared publicly; due to the sensitive nature of the research, the supporting data is not available. As for the FLAME and ISBDA datasets, the datasets are available in Fire-Detection-UAV-Aerial-Image-Classification-Segmentation-UnmannedAerialVehicle and MSNET at FLAME and ISBDA, respectively.

Additional information

Funding

This work was supported by Anadolu Sigorta.

References

  • Badrinarayanan, V., Kendall, A., & Cipolla, R. (2016). SegNet: A deep convolutional encoder-decoder architecture for image segmentation (arXiv:1511.00561). arXiv. http://arxiv.org/abs/1511.00561
  • Balci, B., Artan, Y., Alkan, B., & Elihos, A. (2019). Fraud (p. 198). https://doi.org/10.5220/0007724600002179
  • Chaurasia, A., & Culurciello, E. (2017). Linknet: Exploiting encoder representations for efficient semantic segmentation. 2017 IEEE Visual Communications and Image Processing (VCIP), 1–4. https://doi.org/10.1109/VCIP.2017.8305148
  • Chen, L.-C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation (arXiv:1706.05587). arXiv. http://arxiv.org/abs/1706.05587
  • Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation (arXiv:1802.02611). arXiv. http://arxiv.org/abs/1802.02611
  • Dimri, A., Paul, A., Girish, D., Lee, P., Afra, S., & Jakubowski, A. (2022). A multi-input multi-label claims channeling system using insurance-based language models. Expert Systems with Applications, 202, 117166. https://doi.org/10.1016/j.eswa.2022.117166
  • Dong, H., & Wang, G. (2022). Disf: Dynamic instance segmentation with semantic features. In 2022 26th international conference on pattern recognition (ICPR) (pp. 3772–3778). https://doi.org/10.1109/ICPR56361.2022.9956531
  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale (arXiv:2010.11929). arXiv. http://arxiv.org/abs/2010.11929
  • Eelbode, T., Bertels, J., Berman, M., Vandermeulen, D., Maes, F., Bisschops, R., & Blaschko, M. B. (2020). Optimization for medical image segmentation: Theory and practice when evaluating with dice score or jaccard index. IEEE Transactions on Medical Imaging, 39(11), 3679–3690. https://doi.org/10.1109/TMI.2020.3002417
  • Fang, X., Xu, X., Xia, J. J., Sanford, T., Turkbey, B., Xu, S., Wood, B. J., & Yan, P. (2023). Shape description losses for medical image segmentation. Machine Vision and Applications, 34(4), 57. https://doi.org/10.1007/s00138-023-01407-0
  • Fare Garnot, V. S., & Landrieu, L. (2021). Panoptic segmentation of satellite image time series with convolutional temporal attention networks. In 2021 IEEE/CVF International conference on computer vision (ICCV) (pp. 4852–4861). https://doi.org/10.1109/ICCV48922.2021.00483
  • Gu, J., Kwon, H., Wang, D., Ye, W., Li, M., Chen, Y.-H., Lai, L., Chandra, V., & Pan, D. Z. (2022a). Multi-scale high-resolution vision transformer for semantic segmentation. In 2022 IEEE/CVF conference on computer vision and pattern recognition (CVPR) (pp. 12084–12093). https://doi.org/10.1109/CVPR52688.2022.01178.
  • Gu, W., Bai, S., & Kong, L. (2022b). A review on 2D instance segmentation based on deep neural networks. Image and Vision Computing, 120, 104401. http://dx.doi.org/10.1016/j.imavis.2022.104401
  • Guha, A., Alahmadi, A., Samanta, D., Khan, M. Z., & Alahmadi, A. H. (2022). A multi-modal approach to digital document stream segmentation for title insurance domain. IEEE Access, 10, 11341–11353. https://doi.org/10.1109/ACCESS.2022.3144185
  • Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H. R., & Xu, D. (2022). UNETR: Transformers for 3D medical image segmentation. In 2022 IEEE/CVF winter conference on applications of computer vision (WACV) (pp. 1748–1758). https://doi.org/10.1109/WACV51458.2022.00181.
  • He, S., Li, Q., Li, X., & Zhang, M. (2023). An optimized segmentation convolutional neural network with dynamic energy loss function for 3D reconstruction of lumbar spine MR images. Computers in Biology and Medicine, 160, 106839. https://doi.org/10.1016/j.compbiomed.2023.106839
  • Huang, Z., Wang, X., Wei, Y., Huang, L., Shi, H., Liu, W., & Huang, T. S. (2023). Ccnet: Criss-cross attention for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(6), 6896–6908. https://doi.org/10.1109/TPAMI.2020.3007032
  • Jadon, S. (2020). A survey of loss functions for semantic segmentation. In 2020 IEEE conference on computational intelligence in bioinformatics and computational biology (CIBCB) (pp. 1–7). https://doi.org/10.1109/CIBCB48159.2020.9277638
  • Krestenitis, M., Orfanidis, G., Ioannidis, K., Avgerinakis, K., Vrochidis, S., & Kompatsiaris, I. (2019). Oil spill identification from satellite images using deep neural networks. Remote Sensing, 11(15), 1762. https://doi.org/10.3390/rs11151762
  • Li, P., Shen, B., & Dong, W. (2018). An anti-fraud system for car insurance claim based on visual evidence (arXiv:1804.11207). arXiv. http://arxiv.org/abs/1804.11207
  • Li, X., & Chen, D. (2022). A survey on deep learning-based panoptic segmentation. Digital Signal Processing, 120, 103283. https://doi.org/10.1016/j.dsp.2021.103283
  • Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2018). Focal loss for dense object detection (arXiv:1708.02002). arXiv. http://arxiv.org/abs/1708.02002
  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows (arXiv:2103.14030). arXiv. http://arxiv.org/abs/2103.14030
  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation (arXiv:1411.4038). arXiv. http://arxiv.org/abs/1411.4038
  • Lu, J., Lin, K., Chen, R., Lin, M., Chen, X., & Lu, P. (2023). Health insurance fraud detection by using an attributed heterogeneous information network with a hierarchical attention mechanism. BMC Medical Informatics and Decision Making, 23(1), 62. https://doi.org/10.1186/s12911-023-02152-0
  • Mallios, D., Xiaofei, L., McLaughlin, N., Rincon, J. M. D., Galbraith, C., & Garland, R. (2023). Vehicle damage severity estimation for insurance operations using in-the-wild mobile images. IEEE Access, 11, 78644–78655. https://doi.org/10.1109/ACCESS.2023.3299223
  • Mao, L., Ren, F., Yang, D., & Zhang, R. (2023). Chainnet: Deep chain instance segmentation network for panoptic segmentation. Neural Processing Letters, 55(1), 615–630. https://doi.org/10.1007/s11063-022-10899-2
  • Minaee, S., Boykov, Y. Y., Porikli, F., Plaza, A. J., Kehtarnavaz, N., & Terzopoulos, D. (2021). Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1–1. https://doi.org/10.1109/TPAMI.2021.3059968
  • Mohan, R., & Valada, A. (2021). Efficientps: Efficient panoptic segmentation. International Journal of Computer Vision, 129(5), 1551–1579. https://doi.org/10.1007/s11263-021-01445-z
  • Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1), 51–59. https://doi.org/10.1016/0031-3203(95)00067-4
  • Parhizkar, M., & Amirfakhrian, M. (2022). Car detection and damage segmentation in the real scene using a deep learning approach. International Journal of Intelligent Robotics and Applications, 6(2), 231–245. https://doi.org/10.1007/s41315-022-00231-5
  • Pasupa, K., Kittiworapanya, P., Hongngern, N., & Woraratpanya, K. (2022). Evaluation of deep learning algorithms for semantic segmentation of car parts. Complex & Intelligent Systems, 8(5), 3613–3625. https://doi.org/10.1007/s40747-021-00397-8
  • Patel, N., Shinde, S., & Poly, F. (2020). Automated damage detection in operational vehicles using mask R-CNN (pp. 563–571). https://doi.org/10.1007/978-981-15-3242-9_54
  • Patil, K., Kulkarni, M., Sriraman, A., & Karande, S. (2017). Deep learning based Car damage classification. In 2017 16th IEEE international conference on machine learning and applications (ICMLA) (pp. 50–54). https://doi.org/10.1109/ICMLA.2017.0-179
  • Ramirez Rivera, A., Castillo, J., & Chae, O. (2012). Local directional number pattern for face analysis: Face and expression recognition. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, 22, 1740–1752. https://doi.org/10.1109/TIP.2012.2235848
  • Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., & Savarese, S. (2019). Generalized intersection over union: A metric and a loss for bounding box regression (arXiv:1902.09630). arXiv. http://arxiv.org/abs/1902.09630
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In N. Navab, J. Hornegger, W. M. Wells, & A. F. Frangi (Eds.), Medical image computing and computer-assisted intervention – MICCAI 2015 (Vol. 9351, pp. 234–241). Springer International Publishing. https://doi.org/10.1007/978-3-319-24574-4_28
  • Salehi, S. S. M., Erdogmus, D., & Gholipour, A. (2017). Tversky loss function for image segmentation using 3D fully convolutional deep networks (arXiv:1706.05721). arXiv. http://arxiv.org/abs/1706.05721
  • Shamsoshoara, A., Afghah, F., Razi, A., Zheng, L., Fulé, P. Z., & Blasch, E. (2020). Aerial imagery pile burn detection using deep learning: The FLAME dataset (arXiv:2012.14036). arXiv. http://arxiv.org/abs/2012.14036
  • Singh, R., Ayyar, M. P., Sri Pavan, T. V., Gosain, S., & Shah, R. R. (2019). Automating car insurance claims using deep learning techniques. In 2019 IEEE fifth international conference on multimedia big data (BigMM) (pp. 199–207). https://doi.org/10.1109/BigMM.2019.00-25
  • Sudre, C. H., Li, W., Vercauteren, T., Ourselin, S., & Cardoso, M. J. (2017). Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations (Vol. 10553, pp. 240–248). https://doi.org/10.1007/978-3-319-67558-9_28
  • Taghanaki, S. A., Zheng, Y., Zhou, S. K., Georgescu, B., Sharma, P., Xu, D., Comaniciu, D., & Hamarneh, G. (2021). Combo loss: Handling input and output imbalance in multi-organ segmentation (arXiv:1805.02798). arXiv. http://arxiv.org/abs/1805.02798
  • van Ruitenbeek, R. E., & Bhulai, S. (2022). Convolutional neural networks for vehicle damage detection. Machine Learning with Applications, 9, 100332. https://doi.org/10.1016/j.mlwa.2022.100332
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need (arXiv:1706.03762). arXiv. http://arxiv.org/abs/1706.03762
  • Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., & Xiao, B. (2020). Deep high-resolution representation learning for visual recognition (arXiv:1908.07919). arXiv. http://arxiv.org/abs/1908.07919
  • Wang, L., Li, R., Zhang, C., Fang, S., Duan, C., Meng, X., & Atkinson, P. M. (2022). UNetformer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 190, 196–214. https://doi.org/10.1016/j.isprsjprs.2022.06.008
  • Wang, L., Wang, C., Sun, Z., & Chen, S. (2020). An improved dice loss for pneumothorax segmentation by mining the information of negative areas. IEEE Access, 8, 167939–167949. http://dx.doi.org/10.1109/Access.6287639
  • Wang, W., Xie, E., Li, X., Fan, D.-P., Song, K., Liang, D., Lu, T., Luo, P., & Shao, L. (2021). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 548–558. https://doi.org/10.1109/ICCV48922.2021.00061
  • Waqas, U., Akram, N., Kim, S., Lee, D., & Jeon, J. (2020). Vehicle damage classification and fraudulent image detection including moiré effect using deep learning. In 2020 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE) (pp. 1–5). https://doi.org/10.1109/CCECE47787.2020.9255806
  • Wong, K. C. L., Moradi, M., Tang, H., & Syeda-Mahmood, T. (2018). 3D segmentation with exponential logarithmic loss for highly unbalanced object sizes (Vol. 11072, pp. 612–619). https://doi.org/10.1007/978-3-030-00931-1_70
  • Xie, Z., Shu, C., Fu, Y., Zhou, J., & Chen, D. (2023). Balanced loss function for accurate surface defect segmentation. Applied Sciences, 13(2), 826. https://doi.org/10.3390/app13020826
  • Yang, D., Li, Y., & Yu, J. (2023). Multi-task thyroid tumor segmentation based on the joint loss function. Biomedical Signal Processing and Control, 79, 104249. https://doi.org/10.1016/j.bspc.2022.104249
  • Yeung, M., Sala, E., Schönlieb, C.-B., & Rundo, L. (2021). Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation (arXiv:2102.04525). arXiv. http://arxiv.org/abs/2102.04525
  • Yi-de, M., Qing, L., & Qian, Z. (2004). Automated image segmentation using improved PCNN model based on cross-entropy. Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, 2004, 743–746. https://doi.org/10.1109/ISIMP.2004.1434171
  • Yuan, W., & Xu, W. (2021). Neighborloss: A loss function considering spatial correlation for semantic segmentation of remote sensing image. IEEE Access, 9, 75641–75649. https://doi.org/10.1109/ACCESS.2021.3082076
  • Zhang, J., Jiang, H., Shao, H., Song, Q., Wang, X., & Zong, D. (2023). Semantic segmentation of in-vehicle point cloud with improved RANGENET++ loss function. IEEE Access, 11, 8569–8580. https://doi.org/10.1109/ACCESS.2023.3238415
  • Zhang, Q., Chang, X., & Bian, S. B. (2020). Vehicle-damage-detection segmentation algorithm based on improved mask RCNN. IEEE Access, 8, 6997–7004. https://doi.org/10.1109/ACCESS.2020.2964055
  • Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P. H. S., & Zhang, L. (2021). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers (arXiv:2012.15840). arXiv. http://arxiv.org/abs/2012.15840
  • Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N., & Liang, J. (2018). UNet++: A nested U-net architecture for medical image segmentation (arXiv:1807.10165). arXiv. http://arxiv.org/abs/1807.10165
  • Zhu, X., Liang, J., & Hauptmann, A. (2020). MSNet: A multilevel instance segmentation network for natural disaster damage assessment in aerial videos (arXiv:2006.16479). arXiv. http://arxiv.org/abs/2006.16479
  • Zhu, Y., Yin, X., & Meijering, E. (2023). A compound loss function with shape aware weight map for microscopy cell segmentation. IEEE Transactions on Medical Imaging, 42(5), 1278–1288. https://doi.org/10.1109/TMI.2022.3226226