
Uni-temporal Sentinel-2 imagery for wildfire detection using deep learning semantic segmentation models

Article: 2196370 | Received 29 Sep 2022, Accepted 21 Feb 2023, Published online: 05 Apr 2023

Abstract

Wildfires are increasingly common disasters with long-lasting climate impacts and serious ecological, social, and economic consequences. Since Earth observation (EO) satellites were launched into space, remote sensing (RS) has become an efficient technique used in agriculture, environmental protection, geological exploration, and wildfire monitoring. The growing number of EO satellites orbiting the Earth provides huge amounts of data, such as Sentinel-2 imagery from its Multi Spectral Instrument (MSI) sensor. Using uni-temporal Sentinel-2 imagery, we propose a workflow based on deep learning (DL) semantic segmentation models to detect wildfires. In particular, we created a new large wildfire dataset suitable for semantic segmentation models. We tested our dataset using DL models such as U-Net, LinkNet, DeepLabV3+, U-Net++, and Attention ResU-Net. The results are analysed and compared in terms of the F1 score, the intersection over union (IoU) score, the precision and recall metrics, and the training time each model requires. The best results were achieved using U-Net with the ResNet50 encoder, with an F1-score of 98.78% and an IoU of 97.98%, and we developed it into a pre-trained DL Package (DLPK) model that can detect and monitor wildfires from Sentinel-2 images automatically.

Introduction

Wildfire seasons have become longer and more widespread due to climate change, creating new dynamic scenarios. Knowing where, how large, and how often wildfires occur is vital for managing emergency response activities, determining economic and ecological damage, and assessing recovery. Since Earth observation (EO) satellites were launched into space, remote sensing has become the most effective way to monitor wildfires in a timely manner, locally and globally (Chuvieco et al. Citation2019). The European Commission’s new data policy, established in partnership with the European Space Agency (ESA), offers unrestricted access to high-resolution, multitemporal, and multispectral data collected by the Sentinel-2 satellites (Drusch et al. Citation2012). On a global scale, satellite imagery offers valuable data about the Earth at minimal cost and time. In addition to better data quality, the temporal resolution of available datasets has significantly improved with the Sentinel-2 satellite imagery series (Drusch et al. Citation2012). As a result, remote sensing has been used in many applications, including wildfire monitoring (Wang et al. Citation2022), flood mapping (Kalantar et al. Citation2021), and damage mapping (ElGharbawi and Zarzoura Citation2021). High-resolution datasets have spawned numerous methods for wildfire mapping in the past few years (Barboza Castillo et al. Citation2020). These methods concentrate primarily on change detection using curated features. Wildfire detection based on deep learning (DL) has also been considered and has become a trending topic (Zhao et al. Citation2022).

The main objectives of this work are:

  • Creating a big dataset for Turkey’s wildfires using Sentinel-2 multiband images suitable for DL semantic segmentation models.

  • Conducting a series of experiments to determine the loss function best suited to training DL models on our dataset.

  • Testing our dataset with a series of experiments on semantic segmentation models.

  • Developing a DL model package (DLPK) that can detect wildfires from Sentinel-2 imagery to support decision-making.

Related work

Since EO satellites were launched into space, remote sensing (RS) has become an efficient technique used in agriculture, environmental protection, geological exploration, and wildfire monitoring. Spectral indices are the most commonly used remote sensing approach to characterize wildfire and burn severity (Key and Benson Citation2006), alongside spectral unmixing and radiative transfer models (Chuvieco et al. Citation2007), which are used with multispectral or hyperspectral data. Wildfires can be detected by sensors such as the MSI on the Sentinel-2 satellite, whose multispectral bands include band 2 (visible), band 8 (near-infrared [NIR]), and band 12 (short-wave infrared [SWIR]). Burnt areas absorb more NIR radiation than unburnt areas, whereas they reflect more radiation in the visible and SWIR bands (Quintano et al. Citation2011). Therefore, many spectral indices have been proposed to detect burnt areas, such as the Normalized Burn Ratio (NBR) (Key and Benson Citation2006), the Relative Differenced NBR (RdNBR) (Cardil et al. Citation2019), and the Burned Area Index for Sentinel-2 (BAIS2) (Filipponi Citation2018). These indices can be differenced between pre-wildfire and post-wildfire satellite images to delineate burned areas.

However, these methods require cloud-free satellite images, and threshold levels are often set based on appearance, land type, and the amount of tree cover (Loboda et al. Citation2007), so they do not generalize well across the different weather conditions under which satellite images are taken. Index-based damage detection also usually requires manual or semi-manual threshold setting that depends on the soil type and cannot be set easily. Further issues must be resolved when using satellite imagery for wildfire mapping, such as atmospheric opacity (Nolde et al. Citation2020): fire smoke and clouds make it impossible to observe burned areas, and cloud shadows may even lead to false detections. Several problems also arise from the characteristics of the sensors. For instance, coarse-resolution sensors tend to underestimate the size of burned areas, especially when fires are small and sporadic (Chuvieco et al. Citation2019). Typically, burned areas do not cover entire pixels; they are therefore mixed with other land cover types within single pixels in terms of spatial and spectral aggregation (Laris Citation2005). We used Sentinel-2 satellite imagery, whose high spatial resolution (10 m) compared with other satellite imagery such as Landsat (30 m) and MODIS (250–500 m) reveals more area detail and allows wildfire borders to be detected easily. Still, Landsat and MODIS have the advantage of thermal bands, which are important for measuring land surface temperature.
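For illustration, here is a minimal Python sketch of such an index-based approach, assuming NBR is computed from Sentinel-2 band 8 (NIR) and band 12 (SWIR) arrays; the array names and the 0.27 threshold are placeholders for illustration, not values used in this work:

```python
import numpy as np

def nbr(nir: np.ndarray, swir: np.ndarray) -> np.ndarray:
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    nir = nir.astype(np.float32)
    swir = swir.astype(np.float32)
    return (nir - swir) / (nir + swir + 1e-6)  # epsilon avoids division by zero

# dNBR is the pre-fire NBR minus the post-fire NBR; burned vegetation shows
# a strong NBR drop, so large positive dNBR suggests burning.
pre_nir, pre_swir = np.random.rand(128, 128), np.random.rand(128, 128)
post_nir, post_swir = np.random.rand(128, 128), np.random.rand(128, 128)
dnbr = nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)

# Illustrative threshold; in practice it depends on land cover, soil type,
# and tree cover, which is exactly the weakness discussed above.
burned_mask = dnbr > 0.27
```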

DL algorithms can automatically detect object properties at various scales without requiring additional human input for certain hyper-parameters (Reichstein et al. Citation2019). When convolutional neural networks (CNNs) are used for semantic segmentation, spatial information must be preserved so that each image pixel can be classified. Image-processing tasks are becoming critical in autonomous cars, social media, robotic systems, health research, and precision farming (Hu et al. Citation2021). Semantic segmentation algorithms have also been applied to 2D and 3D satellite image scenes (Ma et al. Citation2019). Fully convolutional networks have been trained on very high-resolution (VHR) optical satellite imagery together with Sentinel-2 and SAR data (Wurm et al. Citation2019). Using EO images with low spatial resolution, a DL model was able to map burnt areas (Pinto et al. Citation2020). Deep convolutional autoencoders (U-Net and ResUnet) were applied to uni-temporal Landsat images, with a proposed sample window size of 256 × 256 pixels for DL model training (Langford et al. Citation2018). Sentinel-2 satellite images have been widely used in a variety of remote sensing applications, including cloud masking (Kristollari and Karathanassi Citation2020), urban change detection (Papadomanolaki et al. Citation2019), land use and land cover classification (Helber et al. Citation2019), marine debris detection (Kikaki et al. Citation2022), human settlement mapping (Corbane et al. Citation2021), dam detection (Balaniuk et al. Citation2020), smoke classification (Wang et al. Citation2022), and wildfire detection and monitoring (Alencar et al. Citation2022; Seydi et al. Citation2022). The availability of training data is a significant obstacle in developing a DL segmentation model for burned areas. DL models improve continuously from data, but they require high-quality data to work efficiently; the accuracy of the features used to solve a specific problem is critical to the learning outcome. Datasets can be created automatically, semi-automatically, or manually (with human intervention). However, automatically generated labels suffer from limited precision: even though many models are available for automatic creation, they cannot achieve a high level of accuracy. As a result, we created a manual dataset for Turkey’s wildfires in this work.

The main contributions of this work are:

  • Proposing a complete workflow to detect and monitor wildfires.

  • Creating a manual big dataset of Turkey’s wildfires using Sentinel-2 multiband images suitable for DL semantic segmentation models.

  • Conducting a series of experiments to evaluate the efficiency of DL semantic segmentation models for monitoring and detecting wildfires.

  • Developing a pre-trained DLPK that can detect wildfire from Sentinel-2 imagery.

Dataset

In this work, our dataset consists of two parts: images and masks. Table 1 gives a general overview of the dataset’s specifications, which are explained in more detail in each subsection.

Table 1. Dataset description.

Sentinel-2 multispectral data

Sentinel-2 is a wide-swath, high-resolution, multispectral imaging mission supporting land monitoring studies. It is based on two identical satellites (Sentinel-2A and Sentinel-2B) in a sun-synchronous orbit at an average altitude of 786 km. Sentinel-2A was launched in 2015 carrying the MSI sensor. Sentinel-2 data come in two processing levels (Level-1C and Level-2A). The Level-1C product includes top-of-atmosphere (TOA) reflectance measurements, the parameters for converting them into radiances, and multispectral registration at the sub-pixel level. The Level-2A product provides orthorectified bottom-of-atmosphere reflectance, also with sub-pixel multispectral registration (Gascon et al. Citation2017).

Sentinel-2 imagery for wildfire

In the solar domain, the spectral wavelength ranges from 0.4 to 2.5 μm, covering visible light (red, green, and blue), NIR, and SWIR. Numerous studies have shown that the NIR and SWIR spectral bands are more sensitive to fire-induced changes in vegetation and soil, while the visible bands are less sensitive to fire effects (Roy et al. Citation2019), as shown in Figure 1. The reduction in moisture leads to an increase in SWIR reflectance, while the reduction in leaf area index and chlorophyll leads to a decrease in NIR reflectance after burning (Chuvieco et al. Citation2019); these bands are thus well suited to false-colour images that emphasize wildfire areas.

Figure 1. Sentinel-2 images for monitoring wildfires (a) true colour and (b) false colour.

Preparing dataset

Study area

In July and August 2021, more than 200 forest fires burned 1700 square kilometres in the Mediterranean Region of Turkey (Kiliçaslan Citation2022), the country’s worst wildfire season in history. On 28 July 2021, wildfires broke out in Manavgat, Antalya Province, with temperatures of approximately 37 °C (99 °F). As the fires affected forests and residential areas, several neighbourhoods and villages were evacuated. According to data from the Disaster and Emergency Management Presidency (AFAD) (Turkish Red Crescent Society, https://www.kizilay.org.tr), many animals died in the wildfires. On 31 July 2021, Sentinel-2 imaged the wildfires near the coastal towns of Alanya and Manavgat (Eke et al. Citation2022).

Preparing Sentinel-2 images

The Sentinel-2 images were obtained from the official website of the United States Geological Survey (USGS) (https://earthexplorer.usgs.gov/). The scenes were selected one by one from Sentinel-2 Level-1C imagery so that each image was taken when the affected area had less than 1% cloud cover, and each image was pre-processed using SNAP software. As shown in Table 2, five images from September 2021 were selected from several provinces in Turkey with various climatic zones and ecoregion backgrounds, including grasslands, settlements, water bodies, and shrubs. Figure 2 shows the study area locations.

Figure 2. Turkey’s provinces location.

Table 2. Information on the Sentinel-2 satellite images for the study area.

Wildfire labelling

The wildfire polygons were labelled using ArcGIS Pro 3.1 software. The labelling was done manually on Sentinel-2 images displayed as a false-colour composite of band 12 (SWIR), band 8 (NIR), and band 2 (blue), in which wildfires appear red. Polygons were drawn as vector data along the wildfire borders without labelling any background class, such as grasslands, settlements, water bodies, or shrubs. Finally, small patches of 128 × 128 pixels were extracted from the Sentinel-2 multiband images and the label data, and the dataset was cleaned by removing any empty or broken patches, as sketched below.
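As a rough illustration of this tiling step, the following sketch cuts a multiband scene and its rasterized label into 128 × 128 patches and drops empty ones; the function and array names are hypothetical, and the actual extraction in this work was done with GIS tooling:

```python
import numpy as np

def extract_patches(image: np.ndarray, mask: np.ndarray, size: int = 128):
    """Tile a (bands, H, W) image and an (H, W) binary mask into size x size
    patches, skipping patches whose mask contains no burned pixels."""
    _, h, w = image.shape
    patches = []
    for row in range(0, h - size + 1, size):
        for col in range(0, w - size + 1, size):
            img_patch = image[:, row:row + size, col:col + size]
            msk_patch = mask[row:row + size, col:col + size]
            if msk_patch.sum() == 0:        # drop empty patches
                continue
            patches.append((img_patch, msk_patch))
    return patches

# Random data standing in for a 13-band Sentinel-2 scene and its labels.
scene = np.random.rand(13, 1024, 1024).astype(np.float32)
labels = (np.random.rand(1024, 1024) > 0.9).astype(np.uint8)
pairs = extract_patches(scene, labels)
```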

Sentinel-2 Turkey’s wildfire dataset

In our wildfire dataset, each image has a spatial resolution of 10 m and consists of thirteen bands. Images are saved in GeoTiff format with the Universal Transverse Mercator (UTM) coordinate system. The dataset has 21,690 images containing burned-area pixels. Each mask is a binary image of the burned area with two categories: the burned area in the foreground and the non-burned area in the background. Pixel values are saved as 8-bit unsigned integers, with a value of 1 for the burned area and 0 for the non-burned area. Tables 3 and 4 show the distribution of images and masks by wildfire area, and Figure 3 depicts sample images and masks from the dataset.

Figure 3. Samples of four images and masks from the dataset: (a–d) images in false-colour composite (Red = B12, Green = B08, Blue = B02); (e–h) binary wildfire masks in red on a black background.

Table 3. Distribution of dataset images.

Table 4. Distribution of dataset masks.

Materials and methods

A complete workflow for wildfire mapping using Sentinel-2 multispectral data is demonstrated from the DL perspective, as shown in Figure 4. Training data from Sentinel-2 are fed into the DL models (U-Net, LinkNet, Attention ResU-Net, U-Net++, and DeepLabV3+). The best-trained DL model is then developed into a DLPK file to be used with ArcGIS Pro software for detecting wildfires automatically.

Figure 4. Workflow for wildfire mapping.

Table 5 lists the five models compared for wildfire detection: LinkNet with interchangeable encoders, U-Net++, U-Net with interchangeable encoders, DeepLabV3+, and Attention ResU-Net.

Table 5. Deep learning model parameters.

U-Net model architecture

The U-Net (Ronneberger et al. Citation2015) architecture, shown in Figure 5, is a U-shaped network consisting of a contracting (encoder) path on the left side and an expansive (decoder) path on the right side, each consisting of four blocks connected via a bridge. The U-Net model is a typical model based on upsampling and deconvolution. The encoder follows the typical architecture of a convolutional network, repeatedly applying blocks of two 3 × 3 convolutions, each followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling operation with stride 2 to downsample the input. The decoder consists of upsampling layers and convolutions; it upsamples the encoder output and regenerates it to the input image size. Each decoder step consists of an upsampling of the feature map followed by a 2 × 2 convolution that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3 × 3 convolutions, each followed by a ReLU. Each decoder block thus concatenates feature maps from the corresponding encoder block. A final 1 × 1 convolution layer computes the probability that each pixel belongs to the burned area, as sketched below.
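The following compressed Keras sketch shows this structure with two encoder/decoder levels instead of the four used here, and without the ResNet50 encoder backbone; it is an illustration of the pattern, not the trained network:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    """Two 3x3 convolutions, each followed by ReLU (the basic U-Net block)."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

inputs = layers.Input((128, 128, 3))
# Encoder: conv block, then 2x2 max pooling with stride 2 to downsample.
c1 = conv_block(inputs, 64)
p1 = layers.MaxPooling2D(2)(c1)
c2 = conv_block(p1, 128)
p2 = layers.MaxPooling2D(2)(c2)
# Bridge between the contracting and expansive paths.
b = conv_block(p2, 256)
# Decoder: upsample, halve the channels, concatenate the encoder skip, convolve.
u2 = layers.Conv2DTranspose(128, 2, strides=2, padding="same")(b)
d2 = conv_block(layers.Concatenate()([u2, c2]), 128)
u1 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(d2)
d1 = conv_block(layers.Concatenate()([u1, c1]), 64)
# Final 1x1 convolution gives the per-pixel burned-area probability.
outputs = layers.Conv2D(1, 1, activation="sigmoid")(d1)
model = tf.keras.Model(inputs, outputs)
```

The skip concatenations are what let the decoder recover the fine spatial detail lost during pooling.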

Figure 5. U-Net model architecture.

LinkNet model architecture

The LinkNet architecture (Chaurasia and Culurciello Citation2017) is an encoder-decoder network that focuses on fast prediction. It is a U-shaped architecture with two differences from U-Net. First, it uses residual modules (res-blocks) instead of the ordinary convolution blocks that U-Net uses. Second, it fuses deep and shallow features by element-wise addition instead of the concatenation that U-Net uses (a sketch of this additive fusion follows below). This design maintains high accuracy while keeping forward propagation efficient. The encoder of LinkNet can be replaced by ResNets of different depths and representations, so the number of encoder layers can be varied to trade off accuracy against efficiency; here, ResNet18, one of the lightest ResNets, is used as the encoder. In Figure 6, ‘conv’ refers to convolution and ‘full-conv’ to full convolution (Long et al. Citation2015); the notation /2 indicates downsampling of a signal by a factor of 2, achieved by strided convolution, and *2 indicates upsampling by a factor of 2. Batch normalization is used between each convolutional layer, followed by a ReLU non-linearity (Ioffe and Szegedy Citation2015). In Figure 6, the encoder is on the left side of the network and the decoder on the right. The encoder begins with a convolution on the input image with a stride of 2 and a kernel size of 7 × 7, followed by spatial max pooling over a 3 × 3 area with a stride of 2.
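A minimal sketch of the additive skip fusion that distinguishes LinkNet from U-Net; the layer sizes are illustrative, and the skip tensor must already match the upsampled shape:

```python
import tensorflow as tf
from tensorflow.keras import layers

def linknet_decoder_step(x, skip, filters):
    """LinkNet-style decoder step: upsample by 2, then ADD the encoder
    feature map instead of concatenating it (as U-Net does)."""
    x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    return layers.Add()([x, skip])  # additive fusion of deep and shallow features
```

Adding instead of concatenating keeps the channel count constant, which is part of why LinkNet is lighter and faster than a concatenation-based decoder.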

Figure 6. LinkNet model architecture.

DeepLabV3+ model architecture

The DeepLab family of network architectures is a series of incremental improvements on the original DeepLab, first published in 2014 (Chen et al. Citation2014), followed by its second version in 2017 (Chen, Papandreou, Kokkinos, et al. Citation2017) and its third version, also in 2017 (Chen, Papandreou, Schroff, et al. Citation2017). The latest version, DeepLabV3+, arrived in 2018 (Chen et al. Citation2018). DeepLabV3+, shown in Figure 7, uses an encoder-decoder structure to combine information from different scales and keeps the atrous spatial pyramid pooling (ASPP) layers used in earlier versions. It extends DeepLabV3 with a decoder module that is both straightforward and efficient, which helps enhance the segmentation results, particularly at object boundaries. The encoder’s primary functions are feature extraction and reduction of the feature map’s dimensionality; the decoder’s primary purpose is to recover the edge information and resolution of the feature map to obtain the semantic segmentation result. The final few convolutional layers of the encoder use atrous (dilated) convolution to expand the receptive field while preserving the resolution of the feature map, as sketched below.
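The following reduced Keras sketch shows the idea behind atrous convolution and ASPP; the global-pooling branch of the full ASPP is omitted, and the filter counts and dilation rates are common defaults assumed here, not values from this work:

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp(x, filters=256, rates=(6, 12, 18)):
    """Reduced ASPP head: parallel atrous (dilated) convolutions sample the
    feature map at several rates, enlarging the receptive field without
    shrinking the resolution, then the branches are fused by 1x1 convolution."""
    branches = [layers.Conv2D(filters, 1, padding="same", activation="relu")(x)]
    for rate in rates:
        branches.append(
            layers.Conv2D(filters, 3, padding="same", dilation_rate=rate,
                          activation="relu")(x))
    merged = layers.Concatenate()(branches)
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(merged)
```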

Figure 7. DeepLabV3+ model architecture.

Attention ResU-Net model architecture

The Attention ResU-Net (Zhang et al. Citation2020) is a segmentation model that extends the U-Net architecture by building attention mechanisms and residual blocks into the network, which makes the model suited to stable training on a limited dataset. The Attention ResU-Net uses a 2D convolutional neural network to obtain classification at the pixel level. The network input is a 128 × 128 × 3 Sentinel-2 image, and the output is a 128 × 128 × 1 mask. As shown in Figure 8, the architecture has a classical encoder-decoder structure designed to combine shallow and deep features so that high-level information is maintained in the deep layers. The encoder part (feature learning) contains multiple residual blocks and a bottom convolutional layer equipped with a dropout function to acquire high-level contextual semantic features. The decoder part (feature recovery) uses three similar upsampling residual blocks (SRes) to achieve accurate positioning and feature recovery. To acquire more comprehensive low-level and high-level information, an attention and squeeze-excitation block (ASE) is included as a horizontal connection, which improves how the downsampled features and upsampled information are represented (a simplified attention-gate sketch follows below). Finally, a softmax layer generates the segmentation result.
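A simplified additive attention gate, in the spirit of the ASE connection described above; this is a generic sketch rather than the exact block of Zhang et al., and it assumes the skip and gating tensors share spatial dimensions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def attention_gate(skip, gate, inter_channels):
    """Additive attention gate: the decoder (gating) signal re-weights the
    encoder skip features so that regions relevant to the burned class
    are emphasized before the skip is fused into the decoder."""
    theta = layers.Conv2D(inter_channels, 1)(skip)   # project skip features
    phi = layers.Conv2D(inter_channels, 1)(gate)     # project gating signal
    att = layers.Activation("relu")(layers.Add()([theta, phi]))
    att = layers.Conv2D(1, 1, activation="sigmoid")(att)  # per-pixel weights
    return layers.Multiply()([skip, att])            # re-weighted skip features
```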

Figure 8. Attention ResU-Net model architecture (a) feature learning module, (b) contextual fusion module, and (c) feature recovery module.

U-Net++ model architecture

U-Net++ (Zhou et al. Citation2018), also known as Nested U-Net, is an extension of the U-Net architecture designed to improve segmentation accuracy. As shown in Figure 9, U-Net++ with its nested dense skip pathways is an effective way to obtain multi-scale feature maps from multi-level convolution pathways. The standard U-Net++ architecture consists of downsampling and upsampling modules, convolution units, and skip connections between the convolution units. The main difference between U-Net++ and U-Net is the skip pathways: U-Net++ uses the dense connection method (Huang et al. Citation2017). In U-Net, the encoder’s feature maps are sent directly to the decoder, while in U-Net++ the feature maps pass through a dense convolution block whose number of convolution layers depends on the pyramid level, as sketched below.
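A sketch of one nested skip node may make the dense pathway concrete: the node X(0,1) is computed from the same-level feature map X(0,0) and the upsampled deeper map X(1,0). The tensor shapes are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

# X(0,1) combines the level-0 encoder map X(0,0) with the upsampled
# level-1 map X(1,0): the first node of the dense skip pathway.
x00 = layers.Input((128, 128, 32))   # encoder feature map, level 0
x10 = layers.Input((64, 64, 64))     # encoder feature map, level 1
up10 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(x10)
x01 = layers.Conv2D(32, 3, padding="same", activation="relu")(
    layers.Concatenate()([x00, up10]))
node = tf.keras.Model([x00, x10], x01)
```

Deeper pyramid levels repeat this pattern, so the decoder receives densely connected intermediate feature maps rather than the raw encoder output alone.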

Figure 9. U-Net++ model architecture.

DLPK model

A DLPK is a pre-trained model package used for image classification and object detection. The model file it wraps has a different extension depending on the framework used for training, such as .h5 for Keras or .pb for TensorFlow, and the package can be saved locally or stored on a portal. The DLPK model can detect wildfires from Sentinel-2 images automatically using standard DL environments or any GIS software that supports DL models. In this work, the DLPK model is pre-trained using Keras and based on the model that achieves the highest results on our dataset, as sketched below.
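A minimal sketch of the Keras side of this packaging, with a placeholder model standing in for the trained network; the final bundling of the .h5 file into a .dlpk archive with its model definition file follows Esri's documented workflow and is only noted in comments:

```python
import tensorflow as tf

# Placeholder standing in for the trained U-Net (ResNet50 encoder).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid",
                           input_shape=(128, 128, 3)),
])

# Save in HDF5 format; this .h5 file is what the DLPK package wraps
# for a Keras-trained model.
model.save("wildfire_unet_resnet50.h5")

# Packaging into a .dlpk for ArcGIS Pro then amounts to bundling this .h5
# with a model definition file, per Esri's documented workflow.
```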

DLPK model data

To evaluate the performance of the DLPK model, several wildfires were chosen in climate conditions close to those of Turkey, as shown in Table 6. Greece’s 2021 wildfire season was its worst in 13 years: 130,000 ha of land burned, and five wildfires started in early August (Giannaros et al. Citation2022). In 2022, many other countries were also affected by wildfires, including Spain, Croatia, and the United States, and a single wildfire event was selected from each country as a case study.

Table 6. Case studies used to evaluate the DLPK model.

Accuracy assessment

Assessing the detection’s accuracy is a crucial part of mapping wildfire areas. It can be analysed by comparing the result to a reference mask with standard measurement indices, both visually and numerically. Two metrics often used to judge the accuracy of segmentation results are the F1 score, also known as the Sørensen dice coefficient (SDC), shown in Equation (1), and the intersection over union (IoU), also known as the Jaccard index, shown in Equation (2):

$$\mathrm{F1\ score} = \frac{2\,TP}{2\,TP + FP + FN} \tag{1}$$

$$IoU = \frac{TP}{TP + FP + FN} \tag{2}$$

We also calculated the precision, shown in Equation (3), which describes how many of the pixels detected as burned are truly burned, and the recall, shown in Equation (4), which describes how many of the truly burned pixels have been detected:

$$Precision = \frac{TP}{TP + FP} \tag{3}$$

$$Recall = \frac{TP}{TP + FN} \tag{4}$$

where TP is true positive, FP is false positive, and FN is false negative.
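For reference, Equations (1)-(4) reduce to a few lines of NumPy on binary prediction and ground-truth masks; this sketch assumes non-degenerate masks, so no zero-division guards are included:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, truth: np.ndarray):
    """Pixel-wise metrics from Equations (1)-(4) for binary masks (1 = burned)."""
    tp = np.sum((pred == 1) & (truth == 1))  # true positives
    fp = np.sum((pred == 1) & (truth == 0))  # false positives
    fn = np.sum((pred == 0) & (truth == 1))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou
```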

We did not calculate the overall accuracy (OA) in this work because it is dominated by the number of unburned pixels. When the dataset classes are imbalanced, the unburned class dominates while the burned class occupies only a small portion of the image. For example, if the burned area covers 5% of an image and the unburned area 95%, a model that predicts every pixel as unburned is 95% accurate while detecting no burned area at all.

Loss function

The loss is the sum of the errors made on each batch of the training or validation sets and shows how well or poorly a trained model performs after each optimization step. The wildfire dataset consists of images whose burned areas are foreground pixels; we therefore chose loss functions that prioritize foreground pixels and samples that are difficult to segment. Experiments were conducted using the binary cross-entropy (BCE) loss (Pihur et al. Citation2007), dice loss (Sudre et al. Citation2017), focal loss (Lin et al. Citation2017), and a hybrid loss combining dice loss and focal loss (Zhu et al. Citation2019).

The BCE loss is implemented as Equation (5):

$$BCE = -\frac{1}{n}\sum_{i=1}^{n}\left(y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right) \tag{5}$$

where $y_i$ is the label and $\hat{y}_i$ is the predicted output.

The dice loss is implemented as Equation (6):

$$\mathrm{Dice\ Loss} = 1 - \frac{2\sum_{i}^{N} p_i\, g_i}{\sum_{i}^{N} p_i^{2} + \sum_{i}^{N} g_i^{2}} \tag{6}$$

where $p_i$ and $g_i$ are the predicted mask and the corresponding ground-truth mask at pixel $i$, over $N$ pixels.

The focal loss is implemented as Equation (7):

$$p_t = \begin{cases} \hat{y}, & \text{if } y = 1 \\ 1 - \hat{y}, & \text{otherwise} \end{cases} \qquad \mathrm{Focal} = -\alpha_t\,(1 - p_t)^{\gamma}\,\log(p_t) \tag{7}$$

where $\gamma = 2$.

The hybrid loss is implemented as Equation (8):

$$\mathrm{Hybrid\ Loss} = \mathrm{Dice\ Loss} + (1 \times \mathrm{Focal\ Loss}) \tag{8}$$
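The three non-BCE losses can be sketched in TensorFlow as follows; the smoothing constants and α = 0.25 are common defaults assumed here rather than values reported in this work, while γ = 2 is stated above:

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    """Dice loss (Equation 6): 1 minus the soft Dice coefficient."""
    y_true = tf.cast(y_true, tf.float32)
    intersection = tf.reduce_sum(y_true * y_pred)
    denom = tf.reduce_sum(tf.square(y_true)) + tf.reduce_sum(tf.square(y_pred))
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss (Equation 7): (1 - p_t)^gamma down-weights easy pixels."""
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    p_t = tf.where(tf.equal(y_true, 1.0), y_pred, 1.0 - y_pred)
    alpha_t = tf.where(tf.equal(y_true, 1.0), alpha, 1.0 - alpha)
    return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))

def hybrid_loss(y_true, y_pred):
    """Hybrid loss (Equation 8): dice loss plus focal loss."""
    return dice_loss(y_true, y_pred) + focal_loss(y_true, y_pred)
```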

Models training

We used Keras with the TensorFlow backend as the framework, with the adaptive moment estimation (Adam) optimization algorithm (Kingma and Ba Citation2014). The dataset was split into 17,352 training tiles, 2169 validation tiles, and 2169 testing tiles. Training ran for up to 300 epochs; after each epoch, the validation data was passed through the network and its loss estimated and monitored. The network was trained with a batch size of 32 and an initial learning rate of 1e-5 until it reached convergence. Three techniques were used during training: reducing the learning rate, early stopping, and saving the best model. To avoid overfitting, the learning rate was reduced by a factor of 0.5 if the validation loss did not improve for three epochs, training was stopped if it did not improve for five epochs, and the model with the lowest validation loss was saved. A sketch of this setup follows below.
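In Keras, this training setup maps directly onto three standard callbacks; this is a sketch, with placeholder names such as train_ds, val_ds, and the checkpoint file name:

```python
import tensorflow as tf

callbacks = [
    # Halve the learning rate when validation loss stalls for three epochs.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                         patience=3),
    # Stop training after five epochs without improvement.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5),
    # Keep only the weights with the lowest validation loss seen so far.
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True),
]

# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
#               loss=dice_loss)  # dice loss was selected in the experiments
# model.fit(train_ds, validation_data=val_ds, epochs=300,
#           callbacks=callbacks)  # batches of 32 come from the dataset
```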

Experiments and results

This section summarizes the performance of the different DL semantic segmentation models for mapping wildfires with the Sentinel-2 imagery dataset. The experiments were run with Python version 3.9.12 and Jupyter Notebook version 6.4.11 on Windows. The hardware was an Intel i7-8700 central processing unit (CPU) @3.20 GHz and an NVIDIA GTX 1660 Ti, with 32 GB of memory.

Testing model performance

Testing model performance using different loss functions

We performed initial experiments using four selected loss functions with one DL model, then selected the best loss function for further experiments based on the results. We trained U-Net with the ResNet50 encoder over 300 epochs using each of the loss functions: BCE, focal loss, dice loss, and hybrid loss.

As shown in Table 7, the dice loss function achieves the best F1 score and IoU for U-Net with the ResNet50 encoder, indicating that it suits DL segmentation models on our dataset. The dice loss function was therefore used in all further experiments.

Table 7. Results of U-Net with ResNet50 encoder using different loss functions.

Testing model performance using GPU and CPU

We also performed initial experiments with the Attention ResU-Net model, trained over 300 epochs, to compare performance on the CPU and on the graphics processing unit (GPU).

The Attention ResU-Net model with the GPU achieved a precision of 99.31%, a recall of 97.71%, an F1-score of 98.54%, and an IoU of 97.76%. Training averaged 693 s per epoch, for a total of 1 d, 10 h, 5 min, and 47 s over 177 epochs, as shown in Figure 10 and Table 8.

Figure 10. Attention ResU-Net (a) performance with GPU; (b) performance with CPU.

Table 8. Attention ResU-Net model results using GPU and CPU.

The Attention ResU-Net model with the CPU achieved a precision of 97.25%, a recall of 96.62%, an F1-score of 97.47%, and an IoU of 97.61%. Training averaged 9941 s per epoch, for a total of 2 d, 51 min, and 13 s over 26 epochs, as shown in Figure 10 and Table 8. Thus, the GPU hardware was used in all further experiments.

Compared with the reference mask of the burned area, the Attention ResU-Net prediction with the GPU closely matches the reference mask. The model detects the burned area well, but in a few images, such as the first image on the left in Figure 11, partially affected areas were treated by the model as not completely burned and were ignored. The model’s performance with the CPU was lower than with the GPU but still excellent, confirmed by an IoU score of 97.61% and an F1 score of 97.47%, as shown in Table 8.

Figure 11. Attention ResU-Net prediction results.

Segmentation model results

As shown in Table 9, the U-Net with the ResNet50 encoder model has the best results among all tested models, with a precision of 98.91%, a recall of 98.55%, an F1-score of 98.78%, and an IoU of 97.98%, at an average training speed of 165 s per epoch and a total time of 10 h, 35 min, and 7 s over 231 epochs. The DeepLabV3+ model’s results are lower than those of the other models on the corresponding metrics, with a precision of 88.98%, a recall of 82.76%, an F1 score of 85.91%, and an IoU of 79.13%, at an average training speed of 317 s per epoch and a total time of 5 h, 44 min, and 23 s over 62 epochs.

Table 9. DL models results.

DLPK model results

Our experiments achieved the highest F1-score of 98.78% and the highest IoU of 97.98% using U-Net with the ResNet50 encoder. We therefore chose this model to develop into a pre-trained DLPK model based on our dataset. The DLPK model achieved a high level of detection accuracy, as shown in Figure 12 and Table 10.

Figure 12. DLPK model results (a) Istiaia, Greece; (b) Vitoria-Gasteiz, Spain; (c) Sibenik, Croatia; (d) California, United States; (e, f, g, h) model prediction.

Table 10. DLPK model results.

Discussion

The DL segmentation models require reference data to detect wildfires with Sentinel-2 imagery. Knopp et al. (Citation2020) used reference data from three sources: the Portuguese Institute for Nature Conservation and Forests (ICNF), the California Department of Forestry and Fire Protection (CAL FIRE), and the German Aerospace Centre (DLR). Florath and Keller (Citation2022) relied on OSM data to generate reference data. Since wildfire data for Turkey were lacking, we created a manual dataset using post-wildfire images from the Sentinel-2 satellite. Manual creation can take a long time, but it provides high accuracy, as confirmed in Section 5.2. Prabowo et al. (Citation2022) created a manual dataset for wildfires in Indonesia that contains 227 images of 512 × 512 pixels. Our dataset, by contrast, contains 21,690 images of 128 × 128 pixels, which is more suitable because the number of images in a dataset is significant for segmentation models. Sentinel-2 offers the highest spatial resolution among publicly available multispectral optical remote sensing data, and there is a great need to use such data to produce wildfire perimeter data at higher resolution, especially for countries and regions that are less investigated in wildfire ecology (e.g. Turkey).

DL addresses the challenging needs of satellite image processing; the interest of the RS community in DL methods is growing fast, and many architectures have been proposed recently to address RS problems. Most of these methods detect wildfires using two images, pre- and post-event (El Mendili et al. Citation2020), which takes more time than using uni-temporal post-event images only. In our work, we used uni-temporal post-wildfire images to train 14 deep-learning models. As shown in Table 11, U-Net with the ResNet50 encoder, Attention ResU-Net, and U-Net with the ResNet101 encoder were the three best of all tested models. In general, the trend is that false-positive pixels can be reduced slightly by using encoders with deeper architectures, but in our work the ResNet50 encoder achieved a higher result than deeper encoders such as ResNet101 and ResNet152 with the U-Net model. Attention ResU-Net performed slightly better on the F1-score and IoU metrics than U-Net with a ResNet101 encoder, but its training took roughly twice as long. U-Net with ResNet50 achieved high performance on both the corresponding metrics and the training time.

Table 11. Model comparison.

Conclusions

One of the biggest challenges for DL models is the need for publicly available training datasets from which to identify and extract features for accurate decisions, especially for countries and regions that are less investigated in wildfire ecology (e.g. Turkey). In this work, a large dataset of Turkey’s wildfires was created from Sentinel-2 multiband images to aid the development of remote sensing and computer vision models for image segmentation, object detection, and classification concerning wildfires. The dataset supports binary classification of burned-area and non-burned-area detections. Various experiments were conducted to evaluate wildfire monitoring and detection performance on our dataset, and the DL segmentation models achieved strong results. We compared 14 deep-learning models based on combinations of five architectures (U-Net, U-Net++, Attention ResU-Net, LinkNet, and DeepLabV3+) and four encoders (ResNet101, ResNet50, ResNet152, and MobileNet) for U-Net and LinkNet. U-Net with a ResNet50 encoder, Attention ResU-Net, and U-Net with a ResNet101 encoder achieved the best results on the IoU and F1-score metrics. We developed a pre-trained DLPK model that can detect wildfires automatically from Sentinel-2 imagery, which can be used by organizations or wildfire protection associations and will decrease decision-making time. Our DLPK model is based on U-Net with a ResNet50 encoder, which achieved highly accurate detection of burned areas. In general, the proposed DLPK model offers significant benefits:

  • It is effective and straightforward compared with other state-of-the-art techniques.

  • It works for retrospective analysis after a wildfire is contained, to determine damage.

  • It can be a promising approach for producing operational rapid-mapping products X hours after Sentinel-2 data acquisition.

Future work will use multitemporal series of satellite images to detect more details about post-wildfire conditions and to provide information about the land cover types that have low or high effects on wildfire spread.

Disclosure statement

The authors report no conflict of interest.

Data availability statement

The data that support the findings of this study are available from the corresponding author, [Al-Dabbagh], upon reasonable request.

References

  • Alencar AAC, Arruda VLS, Silva WVd, Conciani DE, Costa DP, Crusco N, Duverger SG, Ferreira NC, Franca-Rocha W, Hasenack H, et al. 2022. Long-term landsat-based monthly burned area dataset for the Brazilian biomes using deep learning. Remote Sens. 14(11):2510.
  • Balaniuk R, Isupova O, Reece S. 2020. Mining and tailings dam detection in satellite imagery using deep learning. Sensors. 20(23):6936.
  • Barboza Castillo E, Turpo Cayo EY, de Almeida CM, Salas López R, Rojas Briceño NB, Silva López JO, Barrena Gurbillón MÁ, Oliva M, Espinoza-Villar R. 2020. Monitoring wildfires in the Northeastern Peruvian amazon using landsat-8 and sentinel-2 imagery in the GEE platform. IJGI. 9(10):564.
  • Cardil A, Mola-Yudego B, Blázquez-Casado Á, González-Olabarria JR. 2019. Fire and burn severity assessment: calibration of Relative Differenced Normalized Burn Ratio (RdNBR) with field data. J Environ Manage. 235:342–349.
  • Chaurasia A, Culurciello E, editors. 2017. Linknet: exploiting encoder representations for efficient semantic segmentation. 2017 IEEE Visual Communications and Image Processing (VCIP); Dec; Piscataway (NJ):IEEE. p. 1–4.
  • Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. 2014. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:14127062. https://arxiv.org/abs/1412.7062
  • Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. 2017. Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell. 40(4):834–848.
  • Chen LC, Papandreou G, Schroff F, Adam H. 2017. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:170605587. https://arxiv.org/abs/1706.05587
  • Chen LC, Zhu Y, Papandreou G, Schroff F, Adam H. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV). p. 801–818.
  • Chuvieco E, De Santis A, Riaño D, Halligan K. 2007. Simulation approaches for burn severity estimation using remotely sensed images. Fire Ecol. 3(1):129–150.
  • Chuvieco E, Mouillot F, van der Werf GR, San Miguel J, Tanase M, Koutsias N, García M, Yebra M, Padilla M, Gitas I, et al. 2019. Historical background and current developments for mapping burned area from satellite Earth observation. Remote Sens Environ. 225:45–64.
  • Corbane C, Syrris V, Sabo F, Politis P, Melchiorri M, Pesaresi M, Soille P, Kemper T. 2021. Convolutional neural networks for global human settlements mapping from Sentinel-2 satellite imagery. Neural Comput Applic. 33(12):6697–6720.
  • Drusch M, Del Bello U, Carlier S, Colin O, Fernandez V, Gascon F, Hoersch B, Isola C, Laberinti P, Martimort P, et al. 2012. Sentinel-2: ESA's optical high-resolution mission for GMES operational services. Remote Sens Environ. 120:25–36.
  • Eke M, Cingiroglu F, Kaynak B. 2022. Impacts of summer 2021 wildfire events in Southwestern Turkey on air quality with multi-pollutant satellite retrievals. EGU General Assembly Conference Abstracts. p. EGU22–12134.
  • El Mendili L, Puissant A, Chougrad M, Sebari I. 2020. Towards a multi-temporal deep learning approach for mapping urban fabric using sentinel 2 images. Remote Sens. 12(3):423.
  • ElGharbawi T, Zarzoura F. 2021. Damage detection using SAR coherence statistical analysis, application to Beirut, Lebanon. ISPRS J Photogramm Remote Sens. 173:1–9.
  • Filipponi F. 2018. BAIS2: burned area index for sentinel-2. Proceedings. 2:364.
  • Florath J, Keller S. 2022. Supervised machine learning approaches on multispectral remote sensing data for a combined detection of fire and burned area. Remote Sens. 14(3):657.
  • Gascon F, Bouzinac C, Thépaut O, Jung M, Francesconi B, Louis J, Lonjou V, Lafrance B, Massera S, Gaudel-Vacaresse A, et al. 2017. Copernicus sentinel-2A calibration and products validation status. Remote Sens. 9(6):584.
  • Giannaros TM, Papavasileiou G, Lagouvardos K, Kotroni V, Dafis S, Karagiannidis A, Dragozi E. 2022. Meteorological analysis of the 2021 extreme wildfires in Greece: lessons learned and implications for early warning of the potential for pyroconvection. Atmosphere. 13(3):475.
  • Helber P, Bischke B, Dengel A, Borth D. 2019. Eurosat: a novel dataset and deep learning benchmark for land use and land cover classification. IEEE J Sel Top Appl Earth Observ Remote Sens. 12(7):2217–2226.
  • Hu X, Ban Y, Nascetti A. 2021. Uni-temporal multispectral imagery for burned area mapping with deep learning. Remote Sens. 13(8):1509.
  • Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. 2017. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. p. 4700–4708.
  • Ioffe S, Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift. International conference on machine learning; Proceedings of Machine Learning Research. p. 448–456.
  • Kalantar B, Ueda N, Saeidi V, Janizadeh S, Shabani F, Ahmadi K, Shabani F. 2021. Deep neural network utilizing remote sensing datasets for flood hazard susceptibility mapping in Brisbane, Australia. Remote Sens. 13(13):2638.
  • Key CH, Benson NC. 2006. Landscape assessment (LA). In: Lutes DC, Keane RE, Caratti JF, Key CH, Benson NC, Sutherland S, Gangi LJ, editors. FIREMON: fire effects monitoring and inventory system. Gen Tech Rep RMRS-GTR-164-CD. Fort Collins (CO): US Department of Agriculture, Forest Service, Rocky Mountain Research Station; p. LA 155–164.
  • Kikaki K, Kakogeorgiou I, Mikeli P, Raitsos DE, Karantzalos K. 2022. MARIDA: a benchmark for Marine Debris detection from Sentinel-2 remote sensing data. PLoS One. 17(1):e0262247.
  • Kiliçaslan E. 2022. Wildfire resilient village of the future. https://repository.tudelft.nl/islandora/object/uuid:302583c2-1d88-476d-ab4d-7cc2fd577f48
  • Kingma DP, Ba J. 2014. Adam: a method for stochastic optimization. arXiv preprint arXiv:14126980. https://arxiv.org/abs/1412.6980
  • Knopp L, Wieland M, Rättich M, Martinis S. 2020. A deep learning approach for burned area segmentation with sentinel-2 data. Remote Sens. 12(15):2422.
  • Kristollari V, Karathanassi V. 2020. Artificial neural networks for cloud masking of Sentinel-2 ocean images with noise and sunglint. Int J Remote Sens. 41(11):4102–4135.
  • Langford Z, Kumar J, Hoffman F. 2018. Wildfire mapping in interior Alaska using deep neural networks on imbalanced datasets. 2018 IEEE International Conference on Data Mining Workshops (ICDMW); Nov 17–20, 2018. IEEE; p. 770–778.
  • Laris PS. 2005. Spatiotemporal problems with detecting and mapping mosaic fire regimes with coarse-resolution satellite data in savanna environments. Remote Sens Environ. 99(4):412–424.
  • Lin TY, Goyal P, Girshick R, He K, Dollár P, editors. 2017. Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision. p. 2980–2988.
  • Loboda T, O'Neal KJ, Csiszar I. 2007. Regionally adaptable dNBR-based algorithm for burned area mapping from MODIS data. Remote Sens Environ. 109(4):429–442.
  • Long J, Shelhamer E, Darrell T. 2015. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. p. 3431–3440.
  • Ma L, Liu Y, Zhang X, Ye Y, Yin G, Johnson BA. 2019. Deep learning in remote sensing applications: a meta-analysis and review. ISPRS J Photogramm Remote Sens. 152:166–177.
  • Nolde M, Plank S, Riedlinger T. 2020. An adaptive and extensible system for satellite-based, large scale burnt area monitoring in near-real time. Remote Sens. 12(13):2162.
  • Papadomanolaki M, Verma S, Vakalopoulou M, Gupta S, Karantzalos K. 2019. Detecting urban changes with recurrent neural networks from multitemporal sentinel-2 data. IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium; July 28, Aug 2, 2019.
  • Pihur V, Datta S, Datta S. 2007. Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach. Bioinformatics. 23(13):1607–1615.
  • Pinto MM, Libonati R, Trigo RM, Trigo IF, DaCamara CC. 2020. A deep learning approach for mapping and dating burned areas using temporal sequences of satellite images. ISPRS J Photogramm Remote Sens. 160:260–274.
  • Prabowo Y, Sakti AD, Pradono KA, Amriyah Q, Rasyidy FH, Bengkulah I, Ulfa K, Candra DS, Imdad MT, Ali S. 2022. Deep learning dataset for estimating burned areas: case study, Indonesia. Data. 7(6):78.
  • Quintano C, Fernández-Manso A, Stein A, Bijker W. 2011. Estimation of area burned by forest fires in Mediterranean countries: a remote sensing data mining perspective. For Ecol Manage. 262(8):1597–1607.
  • Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N, Prabhat. 2019. Deep learning and process understanding for data-driven Earth system science. Nature. 566(7743):195–204.
  • Ronneberger O, Fischer P, Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, editors. Medical image computing and computer-assisted intervention – MICCAI 2015. 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III. Cham, Switzerland: Springer International Publishing; p. 234–241.
  • Roy DP, Huang H, Boschetti L, Giglio L, Yan L, Zhang HH, Li Z. 2019. Landsat-8 and Sentinel-2 burned area mapping - A combined sensor multi-temporal change detection approach. Remote Sens Environ. 231:111254.
  • Seydi ST, Hasanlou M, Chanussot J. 2022. Burnt-Net: wildfire burned area mapping with single post-fire Sentinel-2 data and deep learning morphological neural network. Ecol Indic. 140:108999.
  • Sudre CH, Li W, Vercauteren T, Ourselin S, Jorge Cardoso M. 2017. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. Deep learning in medical image analysis and multimodal learning for clinical decision support. DLMIA ML-CDS 2017. Lecture notes in computer science. Vol. 10553. Cham, Switzerland: Springer.
  • Wang Z, Yang P, Liang H, Zheng C, Yin J, Tian Y, Cui W. 2022. Semantic segmentation and analysis on sensitive parameters of forest fire smoke using smoke-unet and landsat-8 imagery. Remote Sens. 14:45.
  • Wurm M, Stark T, Zhu XX, Weigand M, Taubenböck H. 2019. Semantic segmentation of slums in satellite images using transfer learning on fully convolutional neural networks. ISPRS J Photogramm Remote Sens. 150:59–69.
  • Zhang J, Lv X, Zhang H, Liu B. 2020. AResU-Net: attention residual u-net for brain tumor segmentation. Symmetry. 12(5):721.
  • Zhao F, Sun R, Zhong L, Meng R, Huang C, Zeng X, Wang M, Li Y, Wang Z. 2022. Monthly mapping of forest harvesting using dense time series Sentinel-1 SAR imagery and deep learning. Remote Sens Environ. 269:112822.
  • Zhou Z, Rahman Siddiquee MM, Tajbakhsh N, Liang J. 2018. UNet++: a nested u-net architecture for medical image segmentation. Deep learning in medical image analysis and multimodal learning for clinical decision support. DLMIA ML-CDS 2018. Lecture notes in computer science. Vol. 11045. Cham, Switzerland: Springer.
  • Zhu W, Huang Y, Zeng L, Chen X, Liu Y, Qian Z, Du N, Fan W, Xie X. 2019. AnatomyNet: deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Med Phys. 46(2):576–589.