Canadian Journal of Remote Sensing
Journal canadien de télédétection
Volume 49, 2023 - Issue 1
Research Article

Black and Odorous Water Detection of Remote Sensing Images Based on Improved Deep Learning


Article: 2237591 | Received 20 Mar 2023, Accepted 11 Jul 2023, Published online: 11 Aug 2023

Abstract

Black and odorous water seriously affects the ecological balance of rivers and the health of people living nearby. Satellite remote sensing technology, with its advantages of wide coverage, long time series, low cost, and high efficiency, has opened a new avenue for water quality detection. Much archived remote sensing satellite data can be further processed and used as a data source for black and odorous water detection. In this paper, Gaofen-2 remote sensing data with a spatial resolution of 1 m is leveraged as the data source. To enrich data resources for the northern coastal zone of China, we have built a high-quality remote sensing dataset, called the remote sensing images for black and odorous water detection (RSBD) dataset, collected from the Gaofen-2 satellite over Yantai, China. In addition, we propose a network with an encoder-decoder structure for black and odorous water detection. In the network, an augmented attention module is designed to capture a more comprehensive semantic feature representation, and the median balancing loss function is adopted to address class imbalance. Experimental results demonstrate that the network is superior to other state-of-the-art semantic segmentation methods on our dataset.


This article is part of the following collections:
Technological Advancements in Urban Remote Sensing

Introduction

Black and odorous water (BOW) has two main manifestations: the color is typically black, dark green, or brown, and the smell is usually described as metallic, fishy, or sewage-like (Duan et al. 2014). BOW can be extremely harmful to both the environment and human health. High levels of organic material deplete the dissolved oxygen in the water, which can kill aquatic organisms. Additionally, BOW can contain bacteria and other microorganisms that cause water-borne illnesses if ingested by humans (Meng et al. 2020; Saha et al. 2017). Pollution sources in the external environment and the production of internal sediments together aggravate the adverse effects of BOW. The "Great Stink" on the River Thames in London in 1858 is a typical historical case: thousands of people died of cholera after drinking from the sewage-filled river (Luckin 2006). Since then, the prevention and detection of BOW have attracted the attention of many experts and scholars. Horbe and Santos (2009) analyzed the water quality components of the main blackwater tributaries of the western Amazon rivers of Brazil at low tide. Yu et al. (2009) studied the occurrence pattern of odor in numerous rivers in China. Effective BOW detection is therefore of great importance for economic and social development and for ecological progress.

In previous research, BOW was identified mainly through site visits and field tests. Evaluation indices for BOW were constructed by combining organic pollutant indicators to grade the degree of black odor in water bodies (C. Liu et al. 2011; G.-H. Lu et al. 2011). However, when monitoring a large-scale water body, these traditional detection methods require substantial human and financial resources and are prone to missed or erroneous detections. Remote sensing technology has since achieved milestone breakthroughs that shed new light on BOW detection: it can observe water bodies widely and continuously and obtain comprehensive information about them. This progress compensates for the difficulty conventional methods face in obtaining information on large-scale water bodies (Kutser et al. 2016; J. Zhao et al. 2013).

However, few image datasets are suitable for centimeter-to-meter resolution water pollution detection. Existing methods (Olmanson et al. 2011) based on low- and medium-resolution remote sensing data cannot accurately detect BOW in narrow rivers due to their limited resolution. With the continuous launch of high-resolution satellites, sub-meter and meter-level spatial resolution remote sensing has provided data resources for accurate BOW detection (Moortgat et al. 2022). On August 19, 2014, China launched the Gaofen-2 (GF-2) satellite, the first Chinese sub-meter earth observation satellite (Tong et al. 2016). Several studies have used GF-2 data to assess river water quality. Yao et al. (2019) designed a BOW index (BOI) combining the red and green bands, developed from the remote sensing reflectance characteristics of BOW in Shenyang, China. Shen et al. (2019) proposed a color purity index derived from remotely sensed reflectance and analyzed the BOW of Shenyang, China using GF-2 data. In this paper, GF-2 satellite images are used to detect BOW.

Deep convolutional neural networks (DCNNs) have developed rapidly in recent years (Z. Liu et al. 2023; Wambugu et al. 2021) and are widely applied to extract information about ground objects in the remote sensing field. A DCNN utilizes the spectral properties and texture information of remote sensing images to detect ground objects. Shao et al. (2022) adopted DCNN-based semantic segmentation models to detect BOW and incorporated attention blocks into the network. Many studies have shown that DCNNs are effective for BOW detection (Pu et al. 2019; Wang et al. 2022). However, DCNNs require a substantial amount of accurately labeled training data, which can be both difficult and time-consuming to obtain, and open research data resources for BOW detection remain scarce (Nambiar et al. 2022).

In recent years, China's northern coastal areas have experienced increasing water and energy shortages and offshore environmental pollution, which has limited the sustainable development of the regional economy and society. Yantai City, a major port city in the Bohai Sea region, is bordered by the Yellow Sea to the south and the Bohai Sea to the north. BOW in the urban area has become a major obstacle to the development of Yantai City. Unfortunately, there is a lack of research data resources for detecting BOW in Yantai City (R. Liu et al. 2021; Y. Lu et al. 2020; Xu et al. 2016). In this study, we explore BOW detection using a DCNN model and establish a new dataset based on GF-2 remote sensing data of Yantai City. This dataset serves as a benchmark resource for evaluating and improving water pollution detection. BOW is usually distributed in small rivers, which makes its detection more difficult. To improve BOW detection performance, we design an improved U-Net semantic segmentation network (Ronneberger et al. 2015) that adds an augmented attention module to capture global context information. The three main contributions are summarized as follows:

  1. A GF-2 remote sensing image dataset, called the RSBD dataset, is built to detect BOW. All GF-2 images are collected in Yantai, China and cover multiple rivers. It is the first dataset applicable to BOW detection in Yantai City.

  2. To increase the detection accuracy of BOW, we propose a novel BOW detection network (BDNet). In this network, the encoder introduces an augmented attention module to focus on the spatial and channel features of BOW, and the decoder fuses low-resolution and high-resolution semantic features. Additionally, the median balancing loss (MBL) function is adopted to address the imbalance between BOW features and other features.

  3. To demonstrate the segmentation performance of BDNet, several verification experiments are conducted. The experimental results show that, compared with several common deep learning methods, our proposed method achieves better segmentation effectiveness on the RSBD dataset.

Study area and materials

Study area

Yantai City, located in the northeast of the Shandong Peninsula, China, is bordered by the Yellow Sea to the south and the Bohai Sea to the north. It is one of the first 14 coastal cities in China and one of the top three core development cities in Shandong Province; its Gross Domestic Product ranked third in the province in 2022. The city has a well-developed river network with many small and medium-sized rivers. Industrialization and urbanization have led to serious pollution in many rivers, resulting in complex pollution sources and black odor phenomena. As a result, these rivers are suitable data sources for detecting BOW (Wang et al. 2013). The study area is shown in Figures 1 and 2: Figure 1 marks all the rivers and BOW in the Laishan District, and Figure 2 marks all the rivers and BOW in the Muping District.

Figure 1. Geographic distribution of black and odorous water sampling points in Laishan District, Yantai City, China.


Figure 2. Geographic distribution of black and odorous water sampling points in Muping District, Yantai City, China.


Yantai City has taken a proactive approach to BOW treatment and environmental protection. To better protect its water resources, the city has released a list of urban rivers with black odor characteristics. According to this list, published by the Yantai Urban Administration Bureau (http://cgj.yantai.gov.cn/art/2016/1/11/art_160_451908.html) in 2016, there were 22 rivers with black odor characteristics in Yantai City. We selected the Laishan and Muping Districts as the study area, which contain a total of 6 such rivers: the Dongdu River, Dongfeng River, Yuniao River, Xiaozhang River, Sanba River, and Furong River. These rivers are polluted for various reasons, such as industrial discharge, river obstruction, and garbage dumping, and they vary in length from 0.75 km to over 6.36 km. The rivers with black odor characteristics in the Laishan and Muping Districts are representative, and hence we have chosen them as the research objects. Table 1 shows detailed information on the study area.

Table 1. Detailed information on the distribution of rivers.

BOW has the following characteristics: abnormal water color, river blockage, and a harsh shore environment. Polluted rivers are typically gray-black or dark green, while normal water bodies are usually pure green or blue, with a clean, impurity-free surface. BOW often occurs in small branches of rivers, whose narrow channels easily become blocked on both sides, aggravating water pollution. Furthermore, BOW is often located near factories or garbage dumps (Zhang et al. 2022). These features provide the basis for the visual interpretation used to annotate BOW in this paper.

Dataset and analysis

Data acquisition and refinement

GF-2 was launched on August 19, 2014. It carries a high-resolution camera with a panchromatic (PAN) resolution of 1 m and a multispectral (MS) resolution of 4 m. At the sub-satellite point, the spatial resolution reaches 0.8 m, and the swath width is 45 km. Its sub-meter spatial resolution and more accurate positioning capability significantly improve the satellite's comprehensive observation effectiveness. GF-2 provides data support for a variety of research fields, including mineral resource development and monitoring, air environment monitoring, and water environment monitoring (L. Chen et al. 2022; Sun et al. 2020; Tong et al. 2016; Wei et al. 2021). Table 2 shows the parameters of GF-2.

Table 2. Typical parameters of Gaofen-2 satellite.

GF-2 captures PAN images with a single band, while its MS images have four bands: blue, green, red, and near-infrared, denoted by B1, B2, B3, and B4, respectively.

In this work, the GF-2 remote sensing images were collected from 2015 to 2016 in Yantai, China, for a total of 10 raw images. Cloud cover in these images is below 30%, which meets the requirements of the application. The GF-2 data were downloaded from the China Center for Resources Satellite Data and Application. The information on the remote sensing data is listed in Table 3.

Table 3. Image information of Gaofen-2 satellite used in this research.

Remote sensing satellites are affected by both external and internal factors during imaging, so the obtained images differ somewhat from the real objects. External factors include image blur caused by atmospheric interference, and uneven illumination and artifacts caused by radiation scattering and lighting differences. Internal factors include image distortion and position deviation caused by satellite attitude, orbit deviation, or the sensor system. To reduce these gaps, the obtained remote sensing images must be preprocessed. GF-2 remote sensing image preprocessing has four steps: radiometric calibration, atmospheric correction, orthorectification, and fusion of the PAN and MS images.

Radiometric calibration is the process of converting the digital number of an image into a radiance brightness value, expressed as Equation (1): (1) L = Gain × DN + Offset, where L is the radiance brightness value after conversion, Gain is the calibration slope, DN is the digital number of the image element, and Offset is the absolute calibration coefficient offset. Atmospheric correction aims to eliminate the influence of factors such as the atmosphere and illumination on the reflectance of features. The Quick Atmospheric Correction (QUAC) tool in ENVI software is used to perform atmospheric correction of the high-resolution remote sensing images. The central wavelengths of the input image bands are 0.514 μm, 0.546 μm, 0.656 μm, and 0.822 μm. To eliminate geometric distortion, the rational polynomial coefficient file that accompanies the GF-2 PAN and MS images is used to perform orthorectification, during which the MS and PAN images are resampled to 4 m and 1 m resolution, respectively. Image fusion resamples the MS and PAN images to create a remote sensing image with both multispectral features and high spatial resolution; the nearest neighbor diffusion pan sharpening tool is used to fuse the data and generate a 1 m spatial resolution fusion image. Figure 3 shows the preprocessing of the PAN and MS images.
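As a simple illustration, the DN-to-radiance conversion of Equation (1) can be applied per pixel to an array of digital numbers. The gain and offset values below are hypothetical placeholders, not GF-2's published calibration coefficients:

```python
import numpy as np

def radiometric_calibration(dn, gain, offset):
    """Convert raw digital numbers (DN) to radiance: L = Gain * DN + Offset."""
    return gain * dn.astype(np.float64) + offset

# Hypothetical 2 x 2 band of digital numbers and calibration coefficients
dn = np.array([[120, 340], [560, 780]], dtype=np.uint16)
radiance = radiometric_calibration(dn, gain=0.1748, offset=0.0)
```

In practice the gain and offset are band-specific and published yearly for each GF-2 camera, so each band would be calibrated with its own pair of coefficients.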

Figure 3. Preprocessing operations are performed on MS and PAN images produced by Gaofen-2 satellite, respectively. The information in the example image is shown in the lower left corner.


Dataset production

Each raw image is 29,200 × 27,620 pixels. Due to hardware limitations, the existing server cannot train and predict on such large high-resolution remote sensing images directly, so the raw images must be cut into smaller patches. The raw images are cut into 256 × 256 pixel patches using the ROI tool in ENVI software. To reduce the impact of positive and negative sample imbalance on classifier performance, we remove samples whose background pixels account for more than 90%. Finally, we select 329 original images to build the dataset.
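The patch-cutting and background-filtering step can be sketched as follows. This is a minimal numpy version under stated assumptions (the paper uses the ENVI ROI tool; only the 256 × 256 patch size and the 90% background threshold come from the text, and the toy scene is illustrative):

```python
import numpy as np

def cut_patches(image, mask, patch=256, bg_ratio=0.9):
    """Tile a raw scene into patch x patch chips; drop chips whose
    background ('others') fraction exceeds bg_ratio."""
    h, w = mask.shape
    chips = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            m = mask[r:r + patch, c:c + patch]
            if (m == 0).mean() <= bg_ratio:  # 0 = background pixel
                chips.append((image[r:r + patch, c:c + patch], m))
    return chips

# Toy 512 x 512 four-band scene: left half is BOW (1), right half background (0)
img = np.zeros((512, 512, 4), dtype=np.uint8)
msk = np.zeros((512, 512), dtype=np.uint8)
msk[:, :256] = 1
chips = cut_patches(img, msk)  # the two all-background chips are dropped
```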

The location of each BOW label is obtained from the list of rivers with black odor characteristics in Yantai City published by the Yantai Urban Administration Bureau, combined with the visual interpretation signs of BOW and general water bodies. Using the Labelme tool (Torralba et al. 2010), each pixel is labeled into two categories, BOW and others, marked at the pixel level with two colors: white and black. The 329 original images are split into a training set and a test set at a ratio of 7:3. To compensate for the limited size of the dataset, we apply data augmentation, including brightness adjustment, color adjustment, and flipping (Q. Chen et al. 2019; Singh et al. 2020; Xia et al. 2017); both horizontal and vertical flipping are used. The training set consists of 1,155 images and their labels, and the test set consists of 490 images and their labels. Some typical images and their ground truth are shown in Figure 4.
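The augmentation pipeline described above (brightness adjustment, per-band color adjustment, and horizontal/vertical flips) can be sketched as follows; the jitter ranges are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, label):
    """Return the original sample plus four augmented variants."""
    out = [(image, label)]
    # Brightness adjustment: scale all bands by one random factor
    bright = np.clip(image * rng.uniform(0.8, 1.2), 0, 255).astype(image.dtype)
    out.append((bright, label))
    # Color adjustment: scale each band independently
    gains = rng.uniform(0.9, 1.1, size=image.shape[-1])
    color = np.clip(image * gains, 0, 255).astype(image.dtype)
    out.append((color, label))
    # Flips are applied to image and label together so they stay aligned
    out.append((image[:, ::-1], label[:, ::-1]))  # horizontal flip
    out.append((image[::-1, :], label[::-1, :]))  # vertical flip
    return out

img = rng.integers(0, 255, size=(256, 256, 4), dtype=np.uint8)
lbl = rng.integers(0, 2, size=(256, 256), dtype=np.uint8)
samples = augment(img, lbl)
```

Note that only photometric changes leave the label untouched; the geometric flips must transform the label mask identically, or the pixel-level annotation would no longer match the image.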

Figure 4. Some typical black and odorous water images and their corresponding annotation images. The first and third rows show the Gaofen-2 remote sensing images, and the second and fourth rows show the corresponding label images.


Given the high spatial resolution of GF-2, the geometry of the captured river scenes is much clearer and finer, which poses additional challenges for image classification. Because of variations in surface elevation, the BOW in different scenes may appear in different sizes and orientations, and the flight altitude and shooting direction of the satellite and the sun elevation angle also strongly influence its appearance. The BOW in each sample image differs in shape, size, and proportion, and the images were acquired at different times and seasons. These factors give the RSBD dataset high intra-class variation: samples from the same class vary widely in their attributes. At the same time, the dataset has low inter-class dissimilarity: samples from different classes vary little in their attributes, since BOW and general water bodies mostly show similar shapes and often belong to the same river. These variabilities may be closely related to geographical factors, pollution sources, environmental factors, etc., and they help in developing a BOW detection method with stronger generalization ability.

The proposed method

The main challenge in detecting BOW is that it is often located in hard-to-detect areas such as river branches and small rivers. U-Net has proven to be an effective architecture for semantic segmentation tasks. It has a typical encoder-decoder structure, and its skip connections and symmetric design make it effective at capturing both local and global information, which is crucial for accurate segmentation. Additionally, U-Net trains effectively with limited data and patch-based augmentation techniques, making it suitable for different segmentation applications.

Network architecture

In this paper, we propose an encoder-decoder network based on U-Net, named BDNet, designed to improve the accuracy of BOW detection. BDNet consists of two main components: the encoder and the decoder. In the encoder, four augmented attention modules are introduced to focus on the spatial and channel features of BOW, emphasizing the BOW semantic information and enriching the feature representation. The decoder fuses low-resolution and high-resolution semantic features. Additionally, the MBL function, a weighted variant of cross-entropy, is adopted to address the imbalance between BOW features and other features.

As illustrated in Figure 5, the encoder consists of convolution blocks, max pooling layers, and augmented attention blocks. Each convolution block comprises two 3 × 3 convolutions with rectified linear unit (ReLU) activations. The input image is first fed into a convolution block to obtain the basic feature map A1; three further rounds of max pooling and convolution yield the feature maps A2, A3, and A4. The feature maps A1, A2, A3, and A4 are concatenated with the features produced by the convolution blocks and serve as input to the next layer of convolution blocks. The features from the last layer are passed to the decoder. In the feature fusion stage, the features from the previous layer are concatenated with the corresponding encoder features, and then deconvolution and upsampling operations recover the size of the feature map. Finally, a 1 × 1 convolution performs dimensionality reduction.

Figure 5. The architecture of BDNet. It mainly contains two branches: the encoder part in the red box and the decoder part in the blue box. The "Augmented Attention Module" contains the channel attention module and the spatial attention module; the operation is repeated four times to obtain the feature maps A1–A4. The median balancing loss function is introduced in the decoder part.


Augmented attention module

To address missed detections caused by BOW regions of tiny shape, the augmented attention module is introduced to optimize the network. It contains a channel attention module (CAM) and a spatial attention module (SAM) (Woo et al. 2018). CAM strengthens BOW feature extraction by integrating the channel information of the input image and generating global associations between channels. SAM makes the convolutional network pay more attention to which positions in the image play a crucial role in the final output, i.e., which locations have the greatest impact on the final prediction. CAM and SAM are combined sequentially to form the complete attention module.

Figure 6 shows the structure of CAM. Max pooling and average pooling are performed separately on the input features F, which have dimensions H × W × C, where H and W are the height and width of the feature map and C is the number of channels. The resulting max-pooled and average-pooled features are fed into a shared network, a multi-layer perceptron (MLP) with one hidden layer. The two outputs are summed, and a sigmoid activation generates a channel attention matrix Mc of dimension 1 × 1 × C. Finally, Mc is multiplied with the input feature map to generate the output feature F′.
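The CAM computation can be sketched in framework-agnostic numpy as follows. The MLP weights, the reduction ratio of 4, and the omission of biases are illustrative assumptions; in the network these weights are learned:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """Channel attention: spatial max- and average-pooling, a shared
    two-layer MLP, summation, then a per-channel sigmoid gate Mc."""
    max_pool = F.max(axis=(0, 1))   # (C,)
    avg_pool = F.mean(axis=(0, 1))  # (C,)
    def mlp(v):                     # shared MLP with a ReLU hidden layer
        return np.maximum(v @ W1, 0) @ W2
    Mc = sigmoid(mlp(max_pool) + mlp(avg_pool))  # (C,) channel weights
    return F * Mc                   # broadcast the gate over H and W

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 8, 16))
W1 = rng.standard_normal((16, 4))   # hidden layer: reduction ratio 4
W2 = rng.standard_normal((4, 16))
F_prime = channel_attention(F, W1, W2)
```

Because Mc lies in (0, 1), the module can only suppress or preserve channels, never amplify them; the gated output F′ is then passed on to the spatial attention module.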

Figure 6. Diagram of the channel attention module.


Figure 7 presents the structure of SAM. F′, of dimension H × W × C, is passed through max pooling and average pooling along the channel axis, and the resulting maps are concatenated to obtain an H × W × 2 feature map. A 7 × 7 convolution then reduces this to an H × W × 1 map, and the sigmoid function produces the spatial attention matrix Ms. Finally, Ms and the input feature map F′ are multiplied to obtain the final spatial feature map F″.
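A corresponding numpy sketch of SAM; the 7 × 7 kernel here is random for illustration, whereas in the network it is learned:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(F, kernel):
    """Spatial attention: channel-wise max- and average-pooling, a 7x7
    convolution down to one channel, then a per-position sigmoid gate Ms."""
    H, W, _ = F.shape
    pooled = np.stack([F.max(axis=-1), F.mean(axis=-1)], axis=-1)  # (H, W, 2)
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(pooled, ((pad, pad), (pad, pad), (0, 0)))
    Ms = np.empty((H, W))
    for i in range(H):              # naive same-padded 7x7 convolution
        for j in range(W):
            Ms[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    Ms = sigmoid(Ms)
    return F * Ms[:, :, None]       # gate every channel at each position

rng = np.random.default_rng(2)
F_prime = rng.standard_normal((8, 8, 16))
kernel = rng.standard_normal((7, 7, 2)) * 0.1
F_double_prime = spatial_attention(F_prime, kernel)
```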

Figure 7. Diagram of the spatial attention module.


Loss function

The MBL function replaces the cross-entropy loss function in the decoder. Cross-entropy assigns the same weight to every category; when categories are unbalanced, training is dominated by the classes with more pixels, the features of rarer objects are hard to learn, and network effectiveness drops. The MBL function is a variant of cross-entropy that sets a corresponding weight for each category, effectively solving the problems caused by the imbalance of object categories (Kampffmeyer et al. 2016). We therefore adopt the MBL function to supervise the training of BDNet.

Median frequency balancing (Eigen and Fergus 2015) is applied to calculate the weight of each class. Let P = {1, 2, …, C} be the set of C classes. The weights are given by Equation (2): (2) w_i = median({f_1, f_2, …, f_C}) / f_i, where w_i denotes the weight of the ith class, the median is taken over the class frequencies, and f_i is the frequency of the ith class, i.e., the proportion of pixels belonging to the ith class among all pixels. The loss is presented in Equation (3): (3) L = −∑_{i=1}^{C} w_i × x_i × log(s(x_i)), where L is the MBL function, x_i denotes the target label of the ith class, and s(x_i) is the softmax output for the ith class.
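The weight computation of Equation (2) can be sketched as follows; the toy label image is illustrative:

```python
import numpy as np

def median_balancing_weights(label_images, num_classes=2):
    """Median frequency balancing: w_i = median(f_1, ..., f_C) / f_i,
    where f_i is class i's share of all labeled pixels."""
    counts = np.zeros(num_classes)
    for lab in label_images:
        counts += np.bincount(lab.ravel(), minlength=num_classes)
    freqs = counts / counts.sum()
    return np.median(freqs) / freqs

# Toy label image: class 0 ('others') covers 90% of pixels, class 1 (BOW) 10%
lab = np.zeros((100, 100), dtype=np.int64)
lab[:10, :] = 1
w = median_balancing_weights([lab])
```

The rare BOW class receives the larger weight, so the weighted cross-entropy of Equation (3) penalizes errors on BOW pixels more heavily and they are not drowned out by the background.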

Experiment and evaluation

In this section, we introduce the experimental details and common evaluation metrics. Several popular deep learning models, including PSPNet (H. Zhao et al. 2017), U-Net, FCN8s (Long et al. 2015), Deeplabv3 (L.-C. Chen et al. 2018), and LinkNet (Chaurasia and Culurciello 2017), were compared with BDNet on the RSBD dataset, and ablation experiments were then performed.

Implementation details

The experiments were implemented in PyTorch. We set the learning rate to 10−4 and used the Adam optimizer. The weight decay, momentum, and power were set to 2 × 10−5, 0.9, and 0.99, respectively. The loss value converged gradually after 1,000 training rounds, so the number of iterations was set to 1,000. The RSBD dataset contained 1,645 images, divided 7:3 into training and test sets. The training set contains 1,155 images of 256 × 256 pixels, covering two classes: BOW and others.
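The hyperparameters above can be collected into a small configuration sketch. The listed "power" value suggests a polynomial learning-rate decay schedule, which is a common pairing but an assumption here, not stated explicitly in the text:

```python
def poly_lr(base_lr, step, max_steps, power=0.99):
    """Polynomial learning-rate decay: lr = base_lr * (1 - step/max_steps)^power.
    Whether the authors used this schedule is an assumption."""
    return base_lr * (1.0 - step / max_steps) ** power

config = {              # hyperparameters reported in the paper
    "lr": 1e-4,
    "weight_decay": 2e-5,
    "momentum": 0.9,
    "power": 0.99,
    "iterations": 1000,
    "patch_size": 256,
}
lr_at_500 = poly_lr(config["lr"], 500, config["iterations"], config["power"])
```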

Evaluation metrics

In this work, we treated the extraction of BOW as a binary classification problem, classifying predictions into BOW and others, and evaluated the results with a pixel-based confusion matrix. The BOW category is called positive and the others category negative; a correct prediction is recorded as true and an incorrect one as false. Combining these terms gives four elements: true positive (TP), false positive (FP), false negative (FN), and true negative (TN). Four evaluation indexes were applied to assess the accuracy of the test results: intersection over union (IoU), mean IoU (MIoU), Accuracy, and F1-score (Shrestha and Vanneschi 2018; Xue et al. 2021).

IoU is the intersection of the predicted and actual values divided by their union, as shown in Equation (4): (4) IoU = TP / (TP + FP + FN).

MIoU is the average of the IoU values over all categories, as shown in Equation (5): (5) MIoU = (IoU_BOW + IoU_others) / 2, where IoU_BOW is the IoU of the BOW category and IoU_others is the IoU of the others category.

Accuracy is the proportion of correctly predicted pixels among all pixels, as shown in Equation (6): (6) Accuracy = (TP + TN) / (TP + TN + FP + FN).

Precision is the percentage of detected positive samples that are truly positive, Precision = TP / (TP + FP), and Recall is the percentage of actually positive samples that are detected, Recall = TP / (TP + FN). The F1-score is the harmonic mean of these two indexes, as shown in Equation (7): (7) F1-score = 2 × Precision × Recall / (Precision + Recall).
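Equations (4)–(7) can all be computed directly from a pixel-wise confusion matrix; a minimal sketch for binary masks (1 = BOW, 0 = others), with a toy example for checking the arithmetic:

```python
import numpy as np

def metrics(pred, truth):
    """Compute IoU of the BOW class, MIoU, Accuracy, and F1-score
    from two binary masks (1 = BOW / positive, 0 = others / negative)."""
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    iou_bow = tp / (tp + fp + fn)              # Eq. (4)
    iou_others = tn / (tn + fn + fp)           # IoU with roles swapped
    miou = (iou_bow + iou_others) / 2          # Eq. (5)
    accuracy = (tp + tn) / (tp + tn + fp + fn) # Eq. (6)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (7)
    return iou_bow, miou, accuracy, f1

truth = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 0, 0]])   # one BOW pixel missed
iou_bow, miou, accuracy, f1 = metrics(pred, truth)
```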

Comparison with advanced methods

We adopted several state-of-the-art segmentation networks to comprehensively assess the effect of BDNet on the RSBD dataset: PSPNet, U-Net, FCN8s, Deeplabv3, and LinkNet. Table 4 shows the quantitative results. Among the models, BDNet achieved the highest IoU, MIoU, and F1-score. PSPNet performed the worst, misclassifying many BOW pixels. In F1-score, BDNet improved on FCN8s and its variant U-Net by 2.66% and 2.11%, respectively. Although BDNet is 0.22% lower than Deeplabv3 in Accuracy, it outperforms Deeplabv3 by 1.4% in BOW IoU, 0.58% in MIoU, and 1.07% in F1-score, indicating better BOW detection capability. Additionally, BDNet showed a 0.79% improvement in F1-score over LinkNet. Overall, BDNet performed the best: it integrates the spatial and channel features of images, addresses the imbalance between BOW and other features, distinguishes BOW from other categories more accurately, and has the smallest misclassification rate. As shown in Table 4, the IoU of the BOW category was lower than that of the others category, probably because BOW pixels are far fewer than others pixels, which leads to imbalanced learning in these networks.

Table 4. Quantitative results on the RSBD dataset comparing our BDNet with common deep learning networks.

Figure 8 presents some visualized segmentation results, which show more intuitively that BDNet extracts BOW better. PSPNet had the worst performance in extracting BOW. FCN8s and LinkNet easily misclassified others as BOW in the fifth and seventh columns. U-Net and Deeplabv3 struggled to recognize BOW in narrow river channels, as seen in the fourth and sixth columns. In contrast, BDNet achieved segmentation results closest to the ground truth, demonstrating better detail segmentation and effectively alleviating the class imbalance problem. Overall, BDNet outperformed the other segmentation networks on the RSBD dataset.

Figure 8. The visualization results of different networks on the RSBD dataset. (a) The original Gaofen-2 images. (b) The corresponding ground truth. (c–h) The result of PSPNet, U-Net, FCN8s, Deeplabv3, LinkNet, and BDNet, respectively.

Ablation experiments

To evaluate and analyze the effect of the augmented attention module and the MBL function, ablation experiments were performed. The baseline model was U-Net trained with the cross-entropy loss function; CAM and SAM were then appended to U-Net simultaneously, and, separately, the MBL function was added to U-Net to assess its effect. The effectiveness of each component was verified through this series of experiments. Table 5 shows the quantitative results of the ablation experiments.

Table 5. Ablation experiments on the channel attention module, the spatial attention module, and the median balancing loss function.

In Table 5, “Baseline” denotes U-Net, “CAM” and “SAM” denote that the channel and spatial attention modules are simultaneously added to the encoder, and “Loss” means that the MBL function is used in U-Net. The experimental results showed that the IoU of the category BOW, MIoU, and F1-score of BDNet improved by 3.43%, 1.65%, and 2.65%, respectively, over adding CAM and SAM to the baseline network, and by 1.94%, 1.06%, and 1.49%, respectively, over adding the MBL function to the baseline network. The qualitative results, shown in Figure 9, intuitively indicate the effectiveness of adding the augmented attention module and the MBL function to the baseline network. As shown in Figure 9, the baseline U-Net performed poorly on BOW in narrow rivers, while introducing the augmented attention module yielded more accurate BOW detection. The baseline U-Net with the MBL function still suffered from misclassification, whereas our BDNet overcame this challenge.
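The exact formulation of the MBL function is not reproduced in this section. Assuming it follows standard median frequency balancing (Eigen and Fergus 2015, cited in the references), a class weight w_c = median_freq / freq_c down-weights the dominant others class and up-weights the rare BOW class in the cross-entropy loss. A minimal numpy sketch under that assumption (all names here are illustrative):

```python
import numpy as np

def median_balancing_weights(masks, num_classes=2):
    """Per-class weights w_c = median_freq / freq_c, where freq_c is the
    pixel frequency of class c over the training masks."""
    counts = np.zeros(num_classes)
    for m in masks:
        for c in range(num_classes):
            counts[c] += np.sum(m == c)
    freq = counts / counts.sum()
    return np.median(freq) / freq

def weighted_cross_entropy(probs, gt, weights):
    """Mean over pixels of -w_y * log p_y.

    probs: (num_classes, H, W) softmax probabilities; gt: (H, W) labels.
    """
    h, w = gt.shape
    # Advanced indexing picks the predicted probability of the true class.
    logp = np.log(probs[gt, np.arange(h)[:, None], np.arange(w)[None, :]])
    return float(np.mean(-weights[gt] * logp))
```

With only two classes the median frequency equals the mean of the two frequencies, so the rarer BOW class receives a weight greater than 1 while others receives a weight below 1.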

Figure 9. The visualization results of ablation experiment. (a) The original Gaofen-2 images. (b) The corresponding ground truth. (c) The results of baseline model. (d) The results of adding both channel attention module and spatial attention module in the baseline model. (e) The results of using the median balancing loss function in the baseline model. (f) The results of BDNet.

The augmented attention module improved the model’s ability to focus on important BOW features, and the integration of the MBL function contributed further gains in BOW detection accuracy. Overall, the experimental results validated the effectiveness of the augmented attention module and the MBL function in improving the accuracy of BOW detection.
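This excerpt does not give the internals of the augmented attention module, but CBAM (Woo et al. 2018, cited in the references) is the canonical channel-plus-spatial design, so a CBAM-style stand-in illustrates the idea: channel attention gates each feature map using pooled global descriptors, and spatial attention then gates each location. This is a hypothetical sketch in numpy, not the paper's implementation (w1 and w2 stand for the learned MLP weights, and the 7×7 convolution of CBAM's spatial branch is replaced by a simple sum for brevity):

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Gate each channel of feat (C, H, W) by a weight computed from global
    average- and max-pooled descriptors passed through a shared 2-layer MLP."""
    avg = feat.mean(axis=(1, 2))                    # (C,) average descriptor
    mx = feat.max(axis=(1, 2))                      # (C,) max descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)    # shared bottleneck MLP
    gate = _sigmoid(mlp(avg) + mlp(mx))             # (C,) channel gates in (0, 1)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """Gate each spatial location by channel-wise average and max statistics
    (CBAM applies a 7x7 conv over their concatenation; summed here)."""
    gate = _sigmoid(feat.mean(axis=0) + feat.max(axis=0))  # (H, W)
    return feat * gate[None, :, :]

def augmented_attention(feat, w1, w2):
    """Sequential channel-then-spatial refinement, as in CBAM."""
    return spatial_attention(channel_attention(feat, w1, w2))
```

Because every gate lies in (0, 1), the refined features are element-wise attenuations of the input, emphasizing informative channels and locations.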

Conclusion

In this work, we built a remote sensing dataset for BOW detection, named the RSBD dataset. It comprises GF-2 remote sensing data with a high spatial resolution of 1 m and covers representative polluted rivers in Yantai, China. The dataset is the first of its kind for detecting BOW in Yantai and contains 1645 images: 1155 in the training set and 490 in the test set. We then designed a novel network based on U-Net, referred to as BDNet, to identify BOW in GF-2 remote sensing images; it incorporates an augmented attention module to emphasize BOW feature information. We selected several of the most common semantic segmentation methods to evaluate the effectiveness of BDNet on the RSBD dataset. The experimental results indicated that the segmentation accuracy of BDNet exceeded that of the other existing networks and that BDNet also performed better on segmentation details.

In practice, the dataset is still insufficient to cover all situations. As a next step, we plan to add other signal sources, such as thermal infrared remote sensing images, to optimize and enrich our dataset. A better BOW detection method will then be proposed and its feasibility verified.

Acknowledgments

We appreciate the China Centre for Resources Satellite Data and Application providing the Gaofen-2 images.

Disclosure statement

No conflict of interest was reported by the author(s).

Additional information

Funding

This research was supported by the National Natural Science Foundation of China under Grants [62072391 and 62066013].

References

  • Chaurasia, A., and Culurciello, E. 2017. "LinkNet: Exploiting encoder representations for efficient semantic segmentation." IEEE Visual Communications and Image Processing, St. Petersburg, FL, USA.
  • Chen, L., Letu, H., Fan, M., Shang, H., Tao, J., Wu, L., Zhang, Y., et al. 2022. “An Introduction to the Chinese High-Resolution Earth Observation System: Gaofen-1 ∼ 7 Civilian Satellites.” Journal of Remote Sensing, Vol. 2022: pp. 9769536. doi:10.34133/2022/9769536.
  • Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. 2018. “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 40(No. 4): pp. 834–848. doi:10.1109/TPAMI.2017.2699184.
  • Chen, Q., Wang, L., Wu, Y.F., Wu, G.M., Guo, Z.L., and Waslander, S.L. 2019. “Aerial imagery for roof segmentation: A large-scale dataset towards automatic mapping of buildings.” ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 147: pp. 42–55. doi:10.1016/j.isprsjprs.2018.11.011.
  • Duan, H., Ma, R., Loiselle, S.A., Shen, Q., Yin, H., and Zhang, Y. 2014. “Optical characterization of black water blooms in eutrophic waters.” The Science of the Total Environment, Vol. 482–483: pp. 174–183. doi:10.1016/j.scitotenv.2014.02.113.
  • Eigen, D., and Fergus, R. 2015. "Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture." IEEE International Conference on Computer Vision, Santiago, Chile.
  • Horbe, A.M.C., and Santos, A.G.d.S. 2009. “Chemical composition of black-watered rivers in the western Amazon Region (Brazil).” Journal of the Brazilian Chemical Society, Vol. 20(No. 6): pp. 1119–1126. doi:10.1590/S0103-50532009000600018.
  • Kampffmeyer, M., Salberg, A.-B., and Jenssen, R. 2016. "Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks." IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA.
  • Kutser, T., Paavel, B., Verpoorter, C., Ligi, M., Soomets, T., Toming, K., and Casal, G. 2016. “Remote sensing of black lakes and using 810 nm reflectance peak for retrieving water quality parameters of optically complex waters.” Remote Sensing, Vol. 8(No. 6): pp. 497. doi:10.3390/rs8060497.
  • Liu, C., Hu, Z., Hao, X., and Bai, Y. 2011. “Progress in the development of black-odour prediction models for urban rivers.” Journal of East China Normal University (Natural Science), Vol. 1(No. 1): pp. 43–54. doi:10.3969/j.issn.1000-5641.2011.01.005.
  • Liu, R., Jiang, W., Li, F., Pan, Y., Wang, C., and Tian, H. 2021. “Occurrence, partition, and risk of seven heavy metals in sediments, seawater, and organisms from the eastern sea area of Shandong Peninsula, Yellow Sea, China.” Journal of Environmental Management, Vol. 279(No. 1): pp. 111771. doi:10.1016/j.jenvman.2020.111771.
  • Liu, Z., Yang, D., Wang, Y., Lu, M., and Li, R. 2023. “EGNN: Graph structure learning based on evolutionary computation helps more in graph neural networks.” Applied Soft Computing, Vol. 135: pp. 110040. doi:10.1016/j.asoc.2023.110040.
  • Long, J., Shelhamer, E., and Darrell, T. 2015. "Fully convolutional networks for semantic segmentation." IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.
  • Luckin, B. 2006. “London’s Thames: The River That Shaped a City and Its History.” Technology and Culture, Vol. 47(No. 4): pp. 846–847. doi:10.1353/tech.2006.0239.
  • Lu, Y., Gao, X., Song, J., Chen, C.-T.A., and Chu, J. 2020. “Colloidal toxic trace metals in urban riverine and estuarine waters of Yantai City, southern coast of North Yellow Sea.” The Science of the Total Environment, Vol. 717: pp. 135265. doi:10.1016/j.scitotenv.2019.135265.
  • Lu, G.-H., Ma, Q., and Zhang, J.-H. 2011. “Analysis of black water aggregation in Taihu Lake.” Water Science and Engineering, Vol. 4(No. 4): pp. 374–385. doi:10.3882/j.issn.1674-2370.2011.04.002.
  • Meng, C., Song, X., Tian, K., Ye, B., and Si, T. 2020. “Spatiotemporal Variation Characteristics of Water Pollution and the Cause of Pollution Formation in a Heavily Polluted River in the Upper Hai River.” Journal of Chemistry, Vol. 2020: pp. 1–15. doi:10.1155/2020/6617227.
  • Moortgat, J., Li, Z., Durand, M., Howat, I., Yadav, B., and Dai, C. 2022. “Deep learning models for river classification at sub-meter resolutions from multispectral and panchromatic commercial satellite imagery.” Remote Sensing of Environment, Vol. 282: pp. 113279. doi:10.1016/j.rse.2022.113279.
  • Nambiar, K.G., Morgenshtern, V.I., Hochreuther, P., Seehaus, T., and Braun, M.H. 2022. “A Self-Trained Model for Cloud, Shadow and Snow Detection in Sentinel-2 Images of Snow- and Ice-Covered Regions.” Remote Sensing, Vol. 14(No. 8): pp. 1825. doi:10.3390/rs14081825.
  • Olmanson, L.G., Brezonik, P.L., and Bauer, M.E. 2011. “Evaluation of medium to low resolution satellite imagery for regional lake water quality assessments.” Water Resources Research, Vol. 47(No. 9): pp. 1–14. doi:10.1029/2011WR011005.
  • Pu, F., Ding, C., Chao, Z., Yu, Y., and Xu, X. 2019. “Water-quality classification of inland lakes using Landsat8 images by convolutional neural networks.” Remote Sensing, Vol. 11(No. 14): pp. 1674. doi:10.3390/rs11141674.
  • Ronneberger, O., Fischer, P., and Brox, T. 2015. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham.
  • Saha, N., Rahman, M.S., Ahmed, M.B., Zhou, J.L., Ngo, H.H., and Guo, W. 2017. “Industrial metal pollution in water and probabilistic assessment of human health risk.” Journal of Environmental Management, Vol. 185: pp. 70–78. doi:10.1016/j.jenvman.2016.10.023.
  • Shao, H., Ding, F., Yang, J., and Zheng, Z. 2022. “Model of Extracting Remotely-sensed Information of Black and Odorous Water Based on Deep Learning.” Journal of Yangtze River Scientific Research Institute, Vol. 39(No. 4): pp. 156–162. doi:10.11988/ckyyb.20210045.
  • Shen, Q., Yao, Y., Li, J.S., Zhang, F.F., Wang, S.L., Wu, Y.H., Ye, H.P., and Zhang, B. 2019. “A CIE color purity algorithm to detect black and odorous water in urban rivers using high-resolution multispectral remote sensing images.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 57(No. 9): pp. 6577–6590. doi:10.1109/TGRS.2019.2907283.
  • Shrestha, S., and Vanneschi, L. 2018. “Improved fully convolutional network with conditional random fields for building extraction.” Remote Sensing, Vol. 10(No. 7): pp. 1135. doi:10.3390/rs10071135.
  • Singh, A., Kalke, H., Loewen, M., and Ray, N. 2020. “River ice segmentation with deep learning.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 58(No. 11): pp. 7570–7579. doi:10.1109/TGRS.2020.2981082.
  • Sun, W., Yang, G., Chen, C., Chang, M., Huang, K., Meng, X., and Liu, L. 2020. “Development status and literature analysis of China’s earth observation remote sensing satellites.” National Remote Sensing Bulletin, Vol. 24(No. 5): pp. 479–510. doi:10.11834/jrs.20209464.
  • Tong, X., Zhao, W., Xing, J., and Fu, W. 2016. "Status and development of China high-resolution Earth observation system and application." IEEE International Geoscience and Remote Sensing Symposium, Beijing, China.
  • Torralba, A., Russell, B.C., and Yuen, J. 2010. “Labelme: Online image annotation and applications.” Proceedings of the IEEE, Vol. 98(No. 8): pp. 1467–1484. doi:10.1109/JPROC.2010.2050290.
  • Wambugu, N., Chen, Y., Xiao, Z., Wei, M., Bello, S.A., Junior, J.M., and Li, J. 2021. “A hybrid deep convolutional neural network for accurate land cover classification.” International Journal of Applied Earth Observation and Geoinformation, Vol. 103: pp. 102515. doi:10.1016/j.jag.2021.102515.
  • Wang, Y., Dong, Z., Liu, D., and Di, B. 2013. “Variation of spatial and temporal distributions of phytoplankton community in coastal waters of Yantai.” Marine Science Bulletin, Vol. 32(No. 4): pp. 408–420. doi:10.11840/j.issn.1001-6392.2013.04.008.
  • Wang, Y., Yao, J., Yang, P., Zhang, Y., Sun, Y., and Cui, N. 2022. “Dynamic remote sensing monitoring and its influence factors analysis for urban black and odorous water body management and treatment in Beijing, China.” Chinese Journal of Environmental Engineering, Vol. 16(No. 9): pp. 3092–3101. doi:10.12030/j.cjee.202206034.
  • Wei, C., Zheng, Q., Shang, Y., Zhang, X., Yin, J., and Shen, Z. 2021. "Black and Odorous Water Monitoring by Using GF Series Remote Sensing Data." International Conference on Agro-Geoinformatics, Shenzhen, China.
  • Woo, S., Park, J., Lee, J.-Y., and Kweon, I. S. 2018. "CBAM: Convolutional Block Attention Module." European Conference on Computer Vision, Munich, Germany.
  • Xia, G., Hu, J., Hu, F., Shi, B., Bai, X., Zhong, Y., Zhang, L., and Lu, X. 2017. “AID: A benchmark data set for performance evaluation of aerial scene classification.” IEEE Transactions on Geoscience and Remote Sensing, Vol. 55(No. 7): pp. 3965–3981. doi:10.1109/TGRS.2017.2685945.
  • Xu, X., Cao, Z., Zhang, Z., Li, R., and Hu, B. 2016. “Spatial distribution and pollution assessment of heavy metals in the surface sediments of the Bohai and Yellow Seas.” Marine Pollution Bulletin, Vol. 110(No. 1): pp. 596–602. doi:10.1016/j.marpolbul.2016.05.079.
  • Xue, W., Yang, H., Wu, Y., Kong, P., Xu, H., Wu, P., and Ma, X. 2021. “Water Body Automated Extraction in Polarization SAR Images With Dense-Coordinate-Feature-Concatenate Network.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 14: pp. 12073–12087. doi:10.1109/JSTARS.2021.3129182.
  • Yao, Y., Shen, Q., Zhu, L., Gao, H., Cao, H., Han, H., Sun, J., and Li, J. 2019. “Remote sensing identification of urban black-odor water bodies in Shenyang city based on GF-2 image.” National Remote Sensing Bulletin, Vol. 23(No. 2): pp. 230–242. doi:10.11834/jrs.20197482.
  • Yu, J., Zhao, Y., Yang, M., Lin, T.-F., Guo, Z., Gu, J., Li, S., and Han, W. 2009. “Occurrence of odour-causing compounds in different source waters of China.” Journal of Water Supply: Research and Technology-Aqua, Vol. 58(No. 8): pp. 587–594. doi:10.2166/aqua.2009.023.
  • Zhang, D., Yang, H., Lan, S., Wang, C., Li, X., Xing, Y., Yue, H., Li, Q., Wang, L., and Xie, Y. 2022. “Evolution of urban black and odorous water: The characteristics of microbial community and driving-factors.” Journal of Environmental Sciences, Vol. 112: pp. 94–105. doi:10.1016/j.jes.2021.05.012.
  • Zhao, J., Hu, C., Lapointe, B., Melo, N., Johns, E.M., and Smith, R.H. 2013. “Satellite-observed black water events off Southwest Florida: Implications for coral reef health in the Florida Keys National Marine Sanctuary.” Remote Sensing, Vol. 5(No. 1): pp. 415–431. doi:10.3390/rs5010415.
  • Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. 2017. "Pyramid scene parsing network." IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.