
Fuzzy neighbourhood neural network for high-resolution remote sensing image segmentation

Article: 2174706 | Received 30 Jun 2022, Accepted 26 Jan 2023, Published online: 16 Feb 2023

ABSTRACT

Remote sensing image segmentation plays an important role in many industrial-grade image processing applications. However, the uncertainty caused by intraclass heterogeneity and interclass blurring is prevalent in high-resolution remote sensing images. Moreover, the complexity of the information in such images leaves a large amount of background information around objects. To address these problems, a new fuzzy convolutional neural network is proposed in this paper. The network resolves the ambiguity and uncertainty of feature information by introducing a fuzzy neighbourhood module into the deep learning network structure. In addition, it adds a multi-attention gating module to highlight small object features and separate them from the complex background information, achieving fine segmentation of high-resolution remote sensing images. Experimental results on three different segmentation datasets suggest that the proposed method has higher segmentation accuracy and better performance than other deep learning networks, especially for complicated shadow information. Code will be provided at https://github.com/tingtingqu/code.

Introduction

Remote sensing image segmentation technology has been widely used in urban planning (Wurm et al., Citation2019), environmental protection (Song et al., Citation2018), climate change, and other fields (Salzano et al., Citation2019). However, it remains a challenging task due to the complexity of objects in remote sensing images, such as the wide observation range, high information complexity, and unstable imaging quality (X. Li et al., Citation2021).

A considerable amount of literature has been published on remote sensing image segmentation. Conventional methods rely on a priori knowledge for the manual selection of target features, and a gap remains between these hand-selected features and the actual semantic features when building the corresponding segmentation model (Guo et al., Citation2018; Han & Wu, Citation2017; Yuan et al., Citation2014; Jabri et al., Citation2014). In recent years, convolutional neural networks have been extensively used in high-resolution remote sensing image segmentation due to their superior nonlinear representation capabilities (Pan et al., Citation2021), which allow them to learn deeper and more essential features from massive sample data (Wang et al., Citation2022; Pan et al., Citation2021). Applications range from scene classification (Zhang et al., Citation2016) and semantic segmentation (Bao et al., Citation2021; R. Liu et al., Citation2020) to instance segmentation and panoptic segmentation (de Carvalho, de Carvalho Júnior, Silva, et al., Citation2022). Segmentation approaches vary from patch-based methods (Zhang et al., Citation2018; Sharma et al., Citation2017) to a large variety of convolutional neural networks (Caesar et al., Citation2018; Wang et al., Citation2020; M. Liu et al., Citation2020; Ni et al., Citation2019; X. Li et al., Citation2019; Yue et al., Citation2016; Z. Zhao et al., Citation2021). M. Chen et al. (Citation2021) added dense connection blocks and residual structures to the DeepLabv3+ encoder and decoder to overcome the incomplete fusion of shallow low-level features with deep abstract features. Hua et al. (Citation2021) proposed a cascaded panoptic segmentation network for high-resolution remote sensing images, which integrates the complementary features of different stages by sharing foreground instance segmentation with background semantic segmentation. Khoshboresh-Masouleh and Shah-Hosseini (Citation2021a) proposed a building panoptic change segmentation method based on a squeeze-and-attention convolutional neural network, which provided better panoptic segmentation performance for bitemporal images. de Carvalho, de Carvalho Júnior, de Albuquerque, et al. (Citation2022) introduced panoptic segmentation of multispectral remote sensing data and evaluated different configurations of band arrangements.

Remote sensing images exhibit similarity among non-homogeneous targets and diversity among homogeneous targets. The boundaries of ground objects are confused and may even be overlapped by shadows, which leads to uncertainty in remote sensing image segmentation (Zhao, Xu et al., Citation2021). Figure 1 shows a representative example (the building shadows obscure the boundaries of cars and roads, and buildings are intricately connected to low vegetation boundaries). Some methods utilize a multiscale feature fusion strategy (Abdollahi et al., Citation2020) and multiscale structures to extract feature maps with large receptive fields to recognize such challenging and complicated objects (H. Zhao et al., Citation2017; Yuan et al., Citation2020; Zheng et al., Citation2019). However, these methods ignore the problems that the semantic information of objects may be obscured and that the receptive field of the feature map may not match the object scale. Fortunately, fuzzy learning systems (Mylonas et al., Citation2013; Sugeno & Yasukawa, Citation1993) can capture the ambiguity of human thinking from a macroscopic view and have a unique strength in solving uncertainty problems (Huang et al., Citation2021; Lu et al., Citation2020; Nida et al., Citation2019; Pal & Sudeep, Citation2016). Many researchers have attempted to apply fuzzy systems to vision tasks. Deng et al. (Citation2016) proposed a fuzzy neural network for segmentation that reduces the data noise caused by image uncertainty by fusing fuzzy features and convolutional features. Zhang et al. (Citation2017) proposed a deep belief network that uses the fuzzy c-means clustering algorithm to partition the input space and implicitly creates a deep belief network by defining a fuzzy membership function. Hurtik et al. (Citation2019) proposed a new image structure that reduces misclassification during pre-processing by transforming crisp pixel values into fuzzy values. To this end, we propose a fuzzy neighbourhood module to learn the relationship between pixels and their neighbourhoods and to solve the uncertainty problem in remote sensing image semantic segmentation.

Figure 1. Boundary pixels and shadows in remote sensing image. (a) the building shadows obscure the boundaries of cars and roads. (b) the buildings are intricately connected to low vegetation boundaries.


In addition, remote sensing images are characterized by high complexity (Xu et al., Citation2021; Zhou et al., Citation2019) and widely varying object sizes (Xiao et al., Citation2018). The attention mechanism can selectively focus on a certain part of the image, combining information from different regions (Wei et al., Citation2021). Qi et al. (Citation2020) added an attention module to a convolutional neural network (CNN); their network refines the nonlinear boundaries of objects in remote sensing images by using multi-scale convolution and attention mechanisms. Li, Zheng, et al. (Citation2021) proposed a densely skip-connected network with a multi-scale feature fusion attention mechanism to address the loss of detail and edge blurring during down-sampling of high-resolution remote sensing images. Sun et al. (Citation2020) proposed a boundary-aware semi-supervised semantic segmentation network, which uses a channel-weighted multi-scale feature module to balance semantic and spatial information and a boundary attention module to weight the boundary information.

Inspired by the above analysis and discussion, this paper proposes a fuzzy neighbourhood convolutional neural network for high-resolution remote sensing image segmentation. Due to the low contrast at boundaries, such as shaded and occluded areas, the segmentation results for boundary pixels are likely to be uncertain. We use the fuzzy neighbourhood module to mitigate inter-class noise and intra-class complexity, and the multi-attention gating module to acquire object features. The main contributions of this paper can be summarized as follows.

  1. We propose a new architecture that combines fuzzy learning and convolutional neural networks for high-resolution remote sensing image segmentation.

  2. The fuzzy neighbourhood module processes each pixel and learns the relationship between a pixel and its neighbourhood pixels, so as to tackle the diversity of homogeneous objects and the similarity among non-homogeneous objects in remote sensing images.

  3. A multi-attention gating module is introduced to use low-level features to provide guidance information for high-level features and to highlight the detailed information of small objects in the feature map through weighting coefficients.

  4. The effectiveness of the proposed method has been demonstrated by visualizing the segmentation results and by objective indexes in a comparative experimental analysis using three benchmark remote sensing datasets.

Methodology

The network architecture is shown in Figure 2. It is designed with an encoder-decoder structure, one of the most common frameworks for deep neural networks. Because some significant spatial information may be lost in the encoder, the multi-attention gating module is introduced into the down-sampling stage to fuse the detail information of low-level features with the semantic information of high-level features and thereby prevent the loss of spatial information in the low-level features. The outputs of the fuzzy neighbourhood module and the results from the multi-attention gating module are input to the decoder, and the non-local features are used to guide the local features to complete the final feature map prediction.

Figure 2. Network architecture.


Feature extraction structure

The symmetric network architecture can efficiently extract global and local information from feature intervals. The encoder in this network is used for multi-scale feature extraction from high-resolution remote sensing images. Because the scale information obtained from the convolution modules of different layers is not exactly the same, feeding information at different scales to the multi-attention gating module lets the connection between low-level and high-level features achieve a refinement effect, suppressing irrelevant regions and highlighting useful features in salient regions. To solve the problem of foreground-background information imbalance, the features of the multi-attention gating module and the deconvolution module are stitched several times. The encoder and decoder blocks are denoted by $En_1, En_2, En_3, En_4, En_5$ and $De_1, De_2, De_3, De_4, De_5$, respectively. The input image is first passed through five convolutional activation blocks in the encoder to generate a fine feature map $F_d$ with $F_1, F_2, F_3, F_4, F_5$ in the horizontal and vertical directions, where $F_1$ to $F_5$ denote the image features at five different resolutions. The network integrates the feature maps at different scales into a unified feature map for prediction. Specifically, the output of the convolution blocks in the final layer is obtained by (1).

$$S_n = \begin{cases} g_n(F_d;\ \theta_n), & n = 1, 2, \\ g_n(S_{n-1} + F_d;\ \theta_n), & n = 3, 4, 5, \end{cases} \tag{1}$$

where $g_n$ denotes the extraction process of $En_n$, $\theta_n$ is the learnable parameter, $F_d$ is the encoder feature map, and $S_n$ is the output of the $n$-th encoder block.
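To make the recursion in (1) concrete, the following is a minimal PyTorch sketch. Only the piecewise rule of (1) is taken from the text; the conv-BN-ReLU composition of each block $g_n$, the constant channel width, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EncoderStack(nn.Module):
    """Minimal sketch of Eq. (1). The conv-BN-ReLU composition of each
    block g_n and the constant channel width are assumptions; only the
    piecewise recursion over n follows the text."""

    def __init__(self, channels=64):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(5)
        ])

    def forward(self, f_d):
        outputs, s = [], None
        for n, g_n in enumerate(self.blocks, start=1):
            x = f_d if n <= 2 else s + f_d   # Eq. (1): residual path for n = 3, 4, 5
            s = g_n(x)                        # S_n = g_n(x; theta_n)
            outputs.append(s)
        return outputs                        # S_1 ... S_5
```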

The fuzzy neighbourhood module

Compared with methods that pre-process fuzzy data and cascade it into a neural network, our fuzzy neighbourhood module extracts fuzzy features directly from a crisp set of limited information. Using pixel-to-pixel fuzzy relationships, it can address the intraclass heterogeneity caused by complex feature information in high-resolution remote sensing images. The module processes the feature map instead of the original image, which avoids the information confusion caused by high contrast and border shadows in the original image.

The fuzzy neighbourhood module constructs a fuzzy neighbourhood set $Z$ for the feature points $p_i \in X$ in each channel through the fuzzification functions $Z_U = \{z_i \mid z_i : P \to [0,1]\}$, where $P = \{p_i\}_{i=1}^{N}$, $N = H \times W$, $H$ denotes the height of the feature map, and $W$ denotes its width.

As shown in Figure 3, let $F$ be the input feature map of size $H \times W \times C$, where $C$ denotes the number of channels. For the input feature map $F$, we use $M$ fuzzy functions, implemented with convolution operations, to fuzzify each point in $F$. Note that $M$ is the same for all channels of a given feature map; it varies only between different input feature maps and is related to the feature map size, not to the number of channels.

Figure 3. Fuzzy neighbourhood module.


To obtain the fuzzy set $Z$, each fuzzy function is chosen as a Gaussian fuzzy function, as shown in (2).

$$Z_{x,y,k,c} = e^{-\left(\frac{p_{x,y,c} - \mu_{k,c}}{\sigma_{k,c}}\right)^{2}}, \tag{2}$$

where $(x, y)$ are the coordinates of the feature point $p_{x,y,c}$ in channel $c$, and $\mu_{k,c}$ and $\sigma_{k,c}$ denote the mean and standard deviation of the $k$-th fuzzy function, respectively.

The fuzzy degree $\gamma_{i,j} \in [0,1]$ denotes the fuzzy relationship between the feature points $p_i$ and $p_j$, and the fuzzy value of each pixel is calculated from the fuzzy degree. The $M$ fuzzy functions must be recombined through defined fuzzy logic decision rules; each Gaussian function can learn the original feature values under a different distribution. The fuzzy degree is defined as follows:

$$Z_{x,y,c} = \prod_{k=1}^{M} Z_{x,y,k,c}. \tag{3}$$

Given the feature points $P = \{p_i\}_{i=1}^{N}$ in a channel, a fuzzy neighbourhood set $Z$ is calculated for each channel according to the proposed fuzzification process. After the fuzzy neighbourhoods of the different classes are obtained, they are passed through a convolutional normalization module to ensure that they have the same size as the input feature map $F$. The result is then stitched with the original feature map to obtain the final output feature map, which enters the next branch.
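A minimal PyTorch sketch of the fuzzification in (2)-(3) may help clarify the module. The learnable per-channel $(\mu_{k,c}, \sigma_{k,c})$ pairs, the 1 × 1-convolution reading of the "convolutional normalization module", and all names are assumptions; only the Gaussian membership, the product rule, and the final concatenation follow the text.

```python
import torch
import torch.nn as nn

class FuzzyNeighbourhoodModule(nn.Module):
    """Sketch of Eqs. (2)-(3): M learnable Gaussian membership functions
    fuzzify every feature point; memberships are combined by a product
    (fuzzy AND) and concatenated onto the crisp input features."""

    def __init__(self, channels, m=4):
        super().__init__()
        self.m = m
        # One (mu, sigma) pair per fuzzy function k and per channel c.
        self.mu = nn.Parameter(torch.randn(m, channels))
        self.log_sigma = nn.Parameter(torch.zeros(m, channels))
        # Assumed form of the 'convolutional normalization module'.
        self.norm = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, f):
        b, c, h, w = f.shape
        mu = self.mu.view(1, self.m, c, 1, 1)
        sigma = self.log_sigma.exp().view(1, self.m, c, 1, 1)
        # Eq. (2): Gaussian membership of each point under each fuzzy function.
        z = torch.exp(-(((f.unsqueeze(1) - mu) / sigma) ** 2))  # B x M x C x H x W
        # Eq. (3): combine the M memberships by a product (fuzzy AND).
        z = z.prod(dim=1)                                       # B x C x H x W
        # Stitch fuzzy features with the original crisp feature map.
        return torch.cat([f, self.norm(z)], dim=1)              # B x 2C x H x W
```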

The multi-attention gating module

The input image is processed by the convolution module to obtain feature maps with different sizes. A shallow network acquires feature maps with higher resolution that contain more local detail, whereas a deep network contains more semantic information. Because remote sensing images suffer from an uneven distribution between foreground and background, low contrast, and large variability in size and shape, small objects in the foreground are not easily recognized. To achieve accurate semantic segmentation, the high-level semantic information should be preserved while the low-level detail information is also considered. To achieve this, we introduce a multi-attention gating module to refine different feature details. Finally, the splicing design is adopted to ensure the stability of propagation.

As shown in Figure 4, let $X_L$ be a vector from the low-level feature channel and $X_H$ a vector from the high-level feature channel; multi-attention gating is then defined by (4).

$$X_0 = \frac{1}{\mathcal{C}(X)}\, f(X_H, X_L)\, g(X_R), \tag{4}$$

where $X_0$ is the output of the module, $f$ is the pairwise function that computes the relationship between $X_H$ and $X_L$, $\frac{1}{\mathcal{C}(X)}$ is the normalization factor (referred to here as the attention factor), and $g$ is a unary function applied to the input vector. For each element of $X_L$ and $X_H$ there is a corresponding response $X_R$. Equation (4) is converted into a computable neural network module by (5).

Figure 4. Multi-attention gating module.

$$\alpha_i = \xi\big(\tau(E_e) + \tau(F_d)\big), \tag{5}$$

where $E_e$ and $F_d$ denote the deconvolution output and the original information, respectively, $\tau$ denotes the convolution block formed by convolution, normalization, and activation layers, and $\xi$ denotes the activation function. The coefficients $\alpha_i \in [0,1]$ are used to identify the foreground region containing small objects and to construct the response information of the feature map.

The output of the multi-attention gating module is the product of the input feature map and the attention coefficient, $\alpha_i \cdot E_e$. The module computes a single scalar attention value for each pixel vector; thus, each attention gate learns to focus on a subset of the target structure rather than on the entire background. In this manner, it is more convenient to focus on a specific area and thereby reduce redundant computation. The multi-attention gating module trims the low-level feature information through contextual information and connects low-level features with high-level features to achieve information refinement. The overall definition of the module is as follows:

$$P_E = \tau(E_e \cdot \alpha_i) \oplus F_d, \tag{6}$$

where $E_e$ denotes the deconvolution output, $F_d$ denotes the information fed to the multi-attention gating module, $\tau$ denotes the convolution block formed by the convolution, normalization, and activation layers, and $\oplus$ denotes the splicing operation.
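The following is a hedged PyTorch sketch of (5)-(6) in the spirit of additive attention gates. The ReLU inside the gate, the sigmoid chosen for $\xi$, the channel widths, and the 3 × 3 fusion convolution are assumptions, and the two inputs are assumed to be spatially aligned.

```python
import torch
import torch.nn as nn

class MultiAttentionGate(nn.Module):
    """Sketch of Eqs. (5)-(6): an additive attention gate. tau is read as
    a conv-normalization block and xi as a sigmoid; widths are assumed."""

    def __init__(self, channels):
        super().__init__()
        self.tau_e = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                   nn.BatchNorm2d(channels))
        self.tau_f = nn.Sequential(nn.Conv2d(channels, channels, 1),
                                   nn.BatchNorm2d(channels))
        self.alpha = nn.Sequential(            # Eq. (5): xi(tau(E_e) + tau(F_d))
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Sequential(             # tau applied to the gated features
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, e_e, f_d):
        a = self.alpha(self.tau_e(e_e) + self.tau_f(f_d))  # per-pixel alpha in [0,1]
        gated = self.fuse(e_e * a)                         # tau(E_e * alpha_i)
        return torch.cat([gated, f_d], dim=1)              # Eq. (6): splice with F_d
```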

Experiments and analysis

In this section, we evaluate the effectiveness, superiority, and generalization ability of the proposed network. First, we describe the three datasets used and present the details of the experimental setup. Second, we report quantitative and qualitative comparison results for three datasets. Finally, we demonstrate the effectiveness of our network through ablation experiments.

Datasets and experimental setup

To validate the effectiveness of the proposed model for segmentation, we selected the China Computer Federation (CCF) dataset (Alam et al., Citation2021), the Vaihingen dataset from the ISPRS 2D Semantic Benchmark (Sherrah, Citation2016), and the IND.v2 dataset (Khoshboresh-Masouleh & Shah-Hosseini, Citation2021b).

The CCF dataset

The CCF dataset was derived from the "CCF Satellite Image AI Classification and Recognition Competition", a high-resolution remote sensing image of a region in southern China captured in 2015. The features are divided into five categories: low vegetation, roads, buildings, water body, and others. Because the sizes of the original images ranged from 4000 × 2000 pixels to 8000 × 8000 pixels, we sliced the sample images into 320 × 320-pixel slices and augmented the images and their corresponding labels using OpenCV. Finally, we obtained 10,489 slices for training and 3011 slices for testing.
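For illustration, tiling an image and its label mask into aligned slices can be done with OpenCV as below; the file layout, the non-overlapping stride, and the function name are assumptions, and the augmentation step is omitted.

```python
import os
import cv2

def slice_image(image_path, label_path, out_dir, tile=320):
    """Illustrative tiling sketch: cut an image and its label mask into
    aligned tile x tile slices (non-overlapping stride assumed)."""
    image = cv2.imread(image_path, cv2.IMREAD_COLOR)
    label = cv2.imread(label_path, cv2.IMREAD_GRAYSCALE)
    os.makedirs(out_dir, exist_ok=True)
    h, w = image.shape[:2]
    idx = 0
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            cv2.imwrite(os.path.join(out_dir, f"img_{idx}.png"),
                        image[y:y + tile, x:x + tile])
            cv2.imwrite(os.path.join(out_dir, f"lbl_{idx}.png"),
                        label[y:y + tile, x:x + tile])
            idx += 1
```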

The Vaihingen dataset

The Vaihingen dataset was imaged in the Vaihingen area near Stuttgart, Germany, and shows many separate buildings. The objects are divided into impervious surfaces, buildings, low vegetation, trees, clutter, and cars. The dataset includes 33 images, each with three bands corresponding to near-infrared, red, and green wavelengths, with a ground sampling distance of 9 cm. In this study, we used 16 patches for training and 17 patches for testing, yielding 1324 and 395 slices for training and testing, respectively.

The IND.V2 dataset

The IND.v2 dataset also shows many separate buildings. The objects are divided into shadows, occluded areas, vegetation covers, complex roofs, and dense building areas. The dataset contains 294 images of 1024 × 1024 pixels, of which 256 patches were used for training and 38 for testing. We sliced the sample images into 520 × 520-pixel slices, finally obtaining 1536 slices for training and 228 slices for testing.

Implementation details

The PyTorch deep learning framework was used to conduct experiments on high-performance computing equipment with an Intel Core i7-7820X CPU and an NVIDIA GeForce GTX 1080 Ti GPU (with 23.2 GB RAM). The base frequency of this CPU is 3.6 GHz, each instruction can perform four single-precision floating-point operations, and the CPU has eight physical cores. The operating system was Linux, the IDE was PyCharm Community, and the program was written in Python 3.6. Image processing functionality was provided by open-source computer vision libraries. A commonly used stochastic gradient descent variant, Adam, guided the optimization; the learning rate was 0.0001 with a decay base of 0.95. Because of the limited physical memory on our GPU card, the batch size was set to 4 during training.
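A sketch of the training loop under the reported settings follows. We read the 0.95 "base" as an exponential learning-rate decay factor, which is an assumption; `model` and `train_loader` are hypothetical placeholders for the proposed network and a DataLoader built with batch size 4.

```python
import torch
from torch import nn, optim

def train(model, train_loader, epochs=500, device="cuda"):
    """Illustrative loop matching the reported settings: Adam, lr = 1e-4,
    exponential decay with base 0.95 (assumed), batch size 4."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-4)
    scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
    for epoch in range(epochs):
        for images, labels in train_loader:   # batches of 4 tiles
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
        scheduler.step()                      # multiply lr by 0.95 each epoch
```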

The evaluation metrics are derived from those used in Qi et al. (Citation2020): Overall Accuracy (OA), Recall, Precision, F1 Score, Mean Intersection over Union (MIoU), and Frequency Weighted Intersection over Union (FWIoU).
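These metrics follow standard definitions and can all be computed from a single confusion matrix, as in the sketch below (the function name is ours).

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute the reported metrics from a (K x K) confusion matrix,
    where conf[i, j] counts pixels of true class i predicted as class j."""
    conf = conf.astype(float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    iou = tp / (tp + fp + fn + 1e-12)
    freq = conf.sum(axis=1) / conf.sum()       # per-class pixel frequency
    return {
        "OA": tp.sum() / conf.sum(),
        "Precision": precision.mean(),
        "Recall": recall.mean(),
        "Mean F1": f1.mean(),
        "MIoU": iou.mean(),
        "FWIoU": (freq * iou).sum(),
    }
```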

Comparison with existing methods

Results on the CCF dataset

The performance of our network was compared with that of UNet (Ronneberger et al., Citation2015), SegNet (Badrinarayanan et al., Citation2017), FCN (Long et al., Citation2015), Deeplabv3+ (L. C. Chen et al., Citation2018), LinkNet (Chaurasia & Culurciello, Citation2017), EffuNet, Deeplabv3, MacuNet (Li, Duan, et al., Citation2021), and RSFCNN (Zhao et al., Citation2021). EffuNet is a combination of the UNet and EfficientNet-B7 baselines. Example results of the proposed network on the CCF dataset are shown in Figure 5. Our segmentation results are complete for low vegetation, roads, and buildings. Compared with other methods, the segmentation results of our network have clearer boundaries and are more accurate for some segmentation details within the images. Representative detailed results for the CCF dataset are presented in Figure 6. In the first row of Figure 6, our network segments buildings and roads more completely and simultaneously, and the segmentations of roads and buildings are not mixed with too much orange background information. As shown in the second row of Figure 6, SegNet and UNet incorrectly classify the buildings as low vegetation, whereas the segmentation performed by our model is somewhat more accurate. As shown in the third row of Figure 6, our network has the advantage of identifying disturbing factors, such as buildings affected by shadows and light. These results show that our network can perform robust feature extraction and fine detail recovery from high-resolution remote sensing images.

Figure 5. Segmentation results on the CCF dataset. (a) Original Image. (b) Ground truth. (c) UNet. (d) SegNet. (e) Deeplabv3. (f) FCN. (g) Deeplabv3+. (h) LinkNet. (i) MacuNet. (j) EffuNet. (k) RSFCNN. (l) Ours.


Figure 6. Visual examples of the CCF dataset. (a) Original Image. (b) Ground truth. (c) UNet. (d) SegNet. (e) FCN. (f) LinkNet. (g) MacuNet. (h) RSFCNN. (i) Ours.


Table 1 shows the OA, Precision, Recall, FWIoU, MIoU, and Mean F1 of the different methods on the CCF dataset. Our method is superior to the comparison networks in terms of Recall and MIoU; among the indices, our model scores worse than LinkNet only on Precision. Compared with RSFCNN, it achieves an improvement of 3.486% in Mean F1. Compared with LinkNet, our method improves the OA and Recall by 0.846% and 3.453%, respectively. Table 2 shows the detailed results for each class of the CCF dataset: our method achieves the highest F1 Score for buildings, roads, and low vegetation. These results further prove the effectiveness of our network.

Table 1. Performance comparison of different methods on the CCF dataset.

Table 2. F1 score of different classes on the CCF dataset.

Results on the Vaihingen dataset

To further evaluate the effectiveness of our method, we also conducted comparative experiments on the Vaihingen dataset. The experimental results are presented in Table 3, which shows that our network achieved the best performance in Mean F1, MIoU, and Precision. The visualization results on the test set are shown in Figure 7.

Figure 7. Segmentation results on the Vaihingen dataset. (a) Original Image. (b) Ground truth. (c) UNet. (d) SegNet. (e) Deeplabv3. (f) FCN. (g) Deeplabv3+. (h) LinkNet. (i) MacuNet. (j) EffuNet. (k) RSFCNN. (l) Ours.


Table 3. Performance comparison of different methods on the Vaihingen dataset.

The prediction results on this dataset show that our method achieved a good segmentation effect on the impervious surfaces in the cyan areas of Vaihingen, with more explicit building boundaries and attenuated blurring and mismatch caused by shadows. In addition, objects with almost the same colour were differentiated well into their categories. For example, as shown in the first row of Figure 7, SegNet and UNet incorrectly classify buildings as trees because of their shadows. RSFCNN, although relatively accurate in classification, introduces too much background information into the tree class. In contrast, our network performs the best segmentation. As shown in the third row of Figure 7, our results for buildings are significantly better than those of the other networks in large areas of low vegetation and trees.

The quantitative results for the Vaihingen dataset are shown in Tables 3 and 4. They show that our model outperformed the other methods with respect to Mean F1 and MIoU: the Mean F1 of our model is 1.382% greater than that of UNet, and its OA is 0.541% greater than that of Deeplabv3+. Our model also demonstrates significant advantages in handling small objects. For example, for the classification of buildings and cars, the F1 Score of our model is improved by 3.072% and 0.561%, respectively, compared with that of RSFCNN.

Table 4. F1 score of different classes on the Vaihingen dataset.

Results on the IND.V2 dataset

We also conducted experiments on the IND.v2 dataset and visualized the prediction results, as shown in Figure 8. Since the dataset contains a large number of single small objects, the local prediction results show that our method has a better segmentation effect on detailed information in buildings. UNet and LinkNet correctly identify the buildings but also incorrectly label the surrounding concrete as buildings, as shown in Figure 9. FCN introduces too much background information in classification, resulting in blurred building boundaries. In contrast, our network achieves satisfactory performance on shadow regions and tricky buildings, such as the building covered by the tree at the top right of the image. Finally, Table 5 lists the performance comparison of the different methods. Our method outperforms the other methods in OA, FWIoU, and MIoU: FWIoU reaches 94.478%, a rather advanced level, and MIoU reaches 85.119%, at least 1 percentage point better than the other methods.

Figure 8. Segmentation results on the IND.V2 dataset. (a) Original Image. (b) Ground truth. (c) UNet. (d) SegNet. (e) Deeplabv3. (f) FCN. (g) Deeplabv3+. (h) LinkNet. (i) MacuNet. (j) EffuNet. (k) RSFCNN. (l) Ours.


Figure 9. Visual examples of the IND.V2 dataset. (a) Original Image. (b) Ground truth. (c) UNet. (d) FCN. (e) LinkNet. (f) Ours.


Table 5. Performance comparison of different methods on the IND.V2 dataset.

Ablation study and optimizer

The optimization algorithm has a significant impact on the performance of the model. In this section, we report full-scale ablation experiments under different optimization algorithms to validate each module. Taking the Vaihingen dataset as an example, the baseline network was first trained, and the multi-attention gating module and the fuzzy module were then gradually added; we report their performance with respect to OA. M represents the multi-attention gating module, and F represents the fuzzy module.

Table 6 presents the results of the ablation study on the Vaihingen dataset. The OA of our model in the ablation experiment is 84.960%, which is better than that of the benchmark model. In addition, using the multi-attention gating blocks significantly improves the OA compared with the benchmark model: using the multi-attention gating module to enhance contextual information improves the OA by at least 0.934%. Moreover, adding the fuzzy module improves the OA by 1.097%. Clearly, the fuzzy neighbourhood module is more conducive to high-resolution remote sensing image segmentation. Furthermore, when the two are combined, the OA increases by 2.924%, which proves (to some extent) that both modules are required.

Table 6. Comparison between the baseline only and the baseline with the corresponding model added.

The results of the visual ablation study of the Adam optimizer on the Vaihingen dataset are presented in Figure 10. Compared with the ground truth, when only the baseline is used for segmentation, the boundaries between objects adhere to each other because of shadow overlap, the edge noise is evident, and the boundaries are mostly jagged. After the fuzzy neighbourhood module and the multi-attention gating module are added, the mutual independence between objects is improved, and the edge and region noise are reduced. The corresponding visualization results are shown in the third row of Figure 10.

Figure 10. Results of the same baseline fusion test on the Vaihingen dataset with six different modules. (a) Original Image. (b) Ground truth. (c) BS. (d) BS+M. (e) BS+F. (f) Ours.


The results of the visual ablation study of the SGD optimizer on the Vaihingen dataset are presented in Figure 11. The semantic information of cars is easily covered by other categories. As illustrated in Figure 11, when M is applied to the baseline, the performance on the car class improves; when F is applied to the baseline, the classification of heterogeneous objects improves. We set two different optimization algorithms and trained the models separately to investigate the segmentation effect, and found that the model performs best with the Adam optimizer.

Figure 11. Results of the same baseline fusion test on the Vaihingen dataset with six different modules. (a) Original Image. (b) Ground truth. (c) BS. (d) BS+M. (e) BS+F. (f) Ours.


We set up two different optimizers, Adam and SGD, and trained the models separately to examine the effect of the iterations; training is effective only once the loss function becomes small. Figure 12 illustrates the loss value per iteration on the IND.v2 dataset. It is easy to see that the loss varies greatly in the initial stage of training, but as the number of epochs increases, the loss value changes slowly. When the number of epochs reaches 500, the loss value gradually converges. Moreover, when the optimizer is SGD, the loss function decreases slowly. Therefore, we chose Adam as the optimizer when training the model on the different datasets.

Figure 12. Loss curves for the training set with different optimizers.


Discussion

In this study, we proposed a fuzzy neighbourhood convolutional neural network that extracts ground objects from high-resolution remote sensing images. The network combines a multi-level feature extractor with a fuzzy neighbourhood module, which processes the uncertain information that results from intra-class noise and the boundaries of different objects. However, one may question whether the fuzzy neighbourhood module is reliable for remote sensing images with high intra-class differences and complex object boundaries. Our fuzzy neighbourhood module proceeds under a basic assumption: a pixel $p_i$ in the feature map is related to its neighbourhood pixels $p_j$, which contain most of the corresponding feature representation. Under this assumption, the difference of uncertain information can be minimized by considering pixel neighbourhood information in the pixel classification process. Thereby, the only question is whether this assumption holds in the complex scenarios mentioned above. We think it does. Our method was tested on three remote sensing datasets and obtained remarkable segmentation results, which show the potential of the fuzzy neighbourhood module to solve the above problems.

Also, our method improves the deep learning structure by adopting fuzzy pattern recognition methods, which can construct fuzzy relations to express problems that are difficult to describe accurately with classical mathematical logic and have a unique advantage in solving uncertainty problems. Recent research combining convolutional neural networks and fuzzy learning (M. Liu et al., Citation2019) extends the image input by adding additional channels, which is equivalent to adding additional handcrafted features, represented by fuzzy sets, to the crisp input. However, these handcrafted fuzzy features, computed from the crisp input, may fundamentally limit performance, and such methods may lose the original deterministic information. Rather than cascading fuzzy data pre-processing with a neural network that extracts features from the crisp input, we deeply integrated a fuzzy neighbourhood module at the network architecture level to boost the performance of our model. Our approach incorporates the convolutional neural network model alongside the fuzzy learning methods in an effective manner: the original definite information is retained, the local features of each point are captured, and the data feeding order is kept unchanged.

A common design in convolutional neural networks is based on stacked convolutions and pooling operations, which constantly reduce the spatial size of features to enhance their semantic representations. As mentioned in the feature extraction structure section, we used a five-layer deep convolutional neural network structure. Although a deeper and wider network can capture richer and more complex features, this comes at the cost of losing detailed spatial information (He et al., Citation2016). After several rounds of down-sampling in deep convolutional neural network architectures, the later feature maps lose spatial information: a small object of size 32 × 32 pixels is clearly visible in shallower feature maps, but not in the deeper ones.

An early approach (Y. Liu et al., Citation2021) to small-object semantic segmentation improved performance through an image multi-scale strategy: the input image is enlarged to different sizes, and the predictions for all sizes are combined to obtain the final segmentation result. Other methods (G. Chen et al., Citation2019; Xiao et al., Citation2020) used pyramid pooling layers to segment objects at multiple scales. Recent studies on small object detection (Nguyen et al., Citation2020; Tong et al., Citation2020) have successfully improved detection accuracy mainly by focusing on top-down or bottom-up feature fusion. For example, some methods use skip connections to directly add lower-level feature maps to higher-level feature maps, or concatenate features by performing pooling and deconvolution simultaneously.

In our method, we adopted weight coefficients that adaptively fuse shallow and deep features. In this manner, it is more convenient to focus on a specific area rather than the entire background. The performance of the fuzzy neighbourhood neural network rests on the output of the fuzzy neighbourhood module and the complementary features of the multi-attention gating module, respectively; the multi-attention gating part helps to supplement object information clearly. Therefore, our method is a process of mutual promotion: it is not only effective in small-object segmentation but also advantageous in eliminating complex boundaries and intra-class noise. Based on the resulting segmentation and the corresponding accuracies, we conclude that the fuzzy neighbourhood neural network can play an important role in remote sensing image segmentation.

At the same time, other works on small object detection (Pang et al., Citation2019; Qian et al., Citation2019) propose losses that address the class imbalance of small objects. These methods give us much inspiration for improving the recognition of small objects and thus optimizing the overall segmentation effect. In addition, the proposed model relies heavily on large labelled datasets. However, it is time-consuming and laborious to label large-scale remote sensing images that vary widely in sensors, ground sampling distances, acquisition areas, and so on. Transfer learning can extract common knowledge from labelled datasets to help improve the performance of the model when the target domain lacks labels. In a follow-up study, we plan to delve further into fuzzy learning, modify the proposed architecture, and explore other promising ideas.

Conclusion

This paper proposes a network structure to solve the problems of shadow noise and classification in remote sensing images. The network uses the fuzzy neighbourhood module to overcome the inherent uncertainty of remote sensing images, achieve better boundary delineation, and improve segmentation performance. It filters and extracts shallow information in remote sensing images with the multi-attention gating module, which can effectively remove noise from the shallow feature maps and compensate more robustly for the detail in the deep feature maps. As verified on the CCF, Vaihingen, and IND.v2 datasets, our method can better identify shadow information and smaller targets and can recognize target edge boundaries while maintaining high accuracy. In the future, we will continue to study fuzzy learning in depth, adhere to the idea of combining fuzzy learning with deep learning for remote sensing image segmentation, and further optimize the proposed method.

Acknowledgments

All authors would sincerely thank the reviewers and editors for their suggestions and opinions for improving this article.

Disclosure statement

No potential conflict of interest was reported by the authors.

Data availability statement

The data used to support this work will be provided at https://github.com/tingtingqu/code.

Additional information

Funding

This research was funded by the National Natural Science Foundation of China under Grant 62072391 and Grant 62066013.

References

  • Abdollahi, A., Pradhan, B., & Alamri, A. (2020). Vnet: An end-to-end fully convolutional neural network for road extraction from high-resolution remote sensing data. IEEE Access, 8, 179424–16. https://doi.org/10.1109/ACCESS.2020.3026658
  • Alam, M., Wang, J. F., Guangpei, C., Yunrong, L., & Chen, Y. (2021). Convolutional neural network for the semantic segmentation of remote sensing images. Mobile Networks and Applications, 26(1), 200–215. https://doi.org/10.1007/s11036-020-01703-3
  • Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence, 39(12), 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615
  • Bao, Y., Liu, W., Gao, O., Lin, Z., & Hu, Q. (2021). E-unet++: A semantic segmentation method for remote sensing images. 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference, (pp. 1858–1862). Chongqing, China. https://doi.org/10.1109/IMCEC51613.2021.9482266
  • Caesar, H., Uijlings, J., & Ferrari, V. (2018). Coco-stuff: Thing and stuff classes in context. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 1209–1218). Salt Lake City, Utah, United States. https://doi.org/10.48550/arXiv.1612.03716
  • Chaurasia, A., & Culurciello, E. (2017). Linknet: Exploiting encoder representations for efficient semantic segmentation. Proceedings of the IEEE Visual Communications and Image Processing, (pp. 1–4). St. Petersburg, FL, USA. http://doi.org/10.1109/VCIP.2017.8305148
  • Chen, G., Li, C., Wei, W., Jing, W., Woźniak, M., Blažauskas, T., & Damaševičius, R. (2019). Fully convolutional neural network with augmented atrous spatial pyramid pool and fully connected fusion path for high resolution remote sensing image segmentation. Applied Sciences, 9(9), 1816. https://doi.org/10.3390/app9091816
  • Chen, M., Wu, J., Liu, L., Zhao, W., Tian, F., Shen, Q., Zhao, B., & Du, R. (2021). Dr-net: An improved network for building extraction from high resolution remote sensing image. Remote Sensing, 13(2), 294. https://doi.org/10.3390/rs13020294
  • Chen, L. C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision, (pp. 801–808). Munich, Germany. https://doi.org/10.48550/arXiv.1802.02611
  • de Carvalho, O. L. F., de Carvalho Júnior, O. A., de Albuquerque, A. O., Santana, N. C., Borges, D. L., Luiz, A. S., Gomes, R. A. T., & Guimarães, R. F. (2022). Multispectral panoptic segmentation: Exploring the beach setting with worldview-3 imagery. International Journal of Applied Earth Observation and Geoinformation, 112, 102910. https://doi.org/10.1016/j.jag.2022.102910
  • de Carvalho, O. L. F., de Carvalho Júnior, O. A., Silva, C. R. E., de Albuquerque, A. O., Santana, N. C., Borges, D. L., Gomes, R. A. T., & Guimarães, R. F. (2022). Panoptic segmentation meets remote sensing. Remote Sensing, 14(4), 965. https://doi.org/10.3390/rs14040965
  • Deng, Y., Ren, Z., Kong, Y., Bao, F., & Dai, Q. (2016). A hierarchical fused fuzzy deep neural network for data classification. IEEE Transactions on Fuzzy Systems, 25(4), 1006–1012. https://doi.org/10.1109/TFUZZ.2016.2574915
  • Guo, W., Yang, W., Zhang, H. J., & Hua, G. (2018). Geospatial object detection in high resolution satellite images based on multi-scale convolutional neural network. Remote Sensing, 10(1), 131. https://doi.org/10.3390/rs10010131
  • Han, B., & Wu, Y. (2017). A novel active contour model based on modified symmetric cross entropy for remote sensing river image segmentation. Pattern Recognition, 67(7), 396–409. https://doi.org/10.1016/j.patcog.2017.02.022
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 770–778). Las Vegas, USA. http://doi.org/10.1109/CVPR.2016.90
  • Huang, K., Zhang, Y., Cheng, H. D., Xing, P., & Zhang, B. (2021). Semantic segmentation of breast ultrasound image with pyramid fuzzy uncertainty reduction and direction connectedness feature. 25th International Conference on Pattern Recognition, (pp.3357–3364). Milan, Italy. https://doi.org/10.1109/ICPR48806.2021.9413082
  • Hua, X., Wang, X., Rui, T., Shao, F., & Wang, D. (2021). Cascaded panoptic segmentation method for high resolution remote sensing image. Applied Soft Computing, 109, 107515. https://doi.org/10.1016/j.asoc.2021.107515
  • Hurtik, P., Molek, V., & Hula, J. (2019). Data preprocessing technique for neural networks based on image represented by a fuzzy function. IEEE Transactions on Fuzzy Systems, 28(7), 1195–1204. https://doi.org/10.1109/TFUZZ.2019.2911494
  • Jabri, S., Zhang, Y., & Alaeldin, S. (2014). Stereo-based building detection in very high resolution satellite imagery using IHS color system. Proceedings of the IEEE Geoscience and Remote Sensing Symposium, (pp. 2301–2304). Quebec City, Canada. https://doi.org/10.1109/IGARSS.2014.6946930
  • Khoshboresh-Masouleh, M., & Shah-Hosseini, R. (2021a). Building panoptic change segmentation with the use of uncertainty estimation in squeeze-and-attention CNN and remote sensing observations. International Journal of Remote Sensing, 42(20), 7798–7820. https://doi.org/10.1080/01431161.2021.1966853
  • Khoshboresh-Masouleh, M., & Shah-Hosseini, R. (2021b). A deep multi-modal learning method and a new rgb-depth data set for building roof extraction. Photogrammetric Engineering & Remote Sensing, 87(10), 759–766. https://doi.org/10.14358/PERS.21-00007R2
  • Li, R., Duan, C., Zheng, S., & Atkinson, P. M. (2021). MACU-Net for semantic segmentation of fine-resolution remotely sensed images. IEEE Geoscience and Remote Sensing Letters, 99, 1–5. https://doi.org/10.1109/LGRS.2021.3052886
  • Li, X., He, H., Li, X., Li, D., Cheng, G., Shi, J., Weng, L., Tong, Y., & Lin, Z. (2021). Pointflow: Flowing semantics through points for aerial image segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 4217–4226). Kuala Lumpur, Malaysia. http://doi.org/10.48550/arXiv.2103.06564
  • Liu, M., Jiao, L., Liu, X., Li, L., Liu, F., & Yang, S. (2020). C-CNN: Contourlet convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(6), 2636–2649. https://doi.org/10.1109/TNNLS.2020.3007412
  • Liu, R., Mi, L., & Chen, Z. (2020). Afnet: Adaptive fusion network for remote sensing image semantic segmentation. IEEE Transactions on Geoscience and Remote Sensing, 59(9), 7871–7886. https://doi.org/10.1109/TGRS.2020.3034123
  • Liu, Y., Sun, P., Wergeles, N., & Shang, Y. (2021). A survey and performance evaluation of deep learning methods for small object detection. Expert Systems with Applications, 172(6), 114602. https://doi.org/10.1016/j.eswa.2021.114602
  • Liu, M., Zhou, Z., Shang, P., & Xu, D. (2019). Fuzzified image enhancement for deep learning in iris recognition. IEEE Transactions on Fuzzy Systems, PP(99), 1–1. https://doi.org/10.1109/TFUZZ.2019.2958558
  • Li, X., Zhang, L., You, A., Yang, M., Yang, K., & Tong, Y. (2019). Global aggregation then local distribution in fully convolutional networks. International Conference on Machine Learning. http://doi.org/10.48550/arXiv.1909.07229
  • Li, R., Zheng, S., Zhang, C., Duan, C., Su, J., Wang, L., & Atkinson, P. M. (2021). Multiattention network for semantic segmentation of fine-resolution remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–13. https://doi.org/10.1109/TGRS.2021.3093977
  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 3431–3440). Boston, United States. https://doi.org/10.1109/TPAMI.2016.2572683
  • Lu, H., Zhang, M., Xu, X., Li, Y. J., & Shen, H. T. (2020). Deep fuzzy hashing network for efficient image retrieval. IEEE Transactions on Fuzzy Systems, 29(1), 166–176. https://doi.org/10.1109/TFUZZ.2020.2984991
  • Mylonas, S. K., Stavrakoudis, D. G., & Theocharis, J. B. (2013). GeneSIS: A ga-based fuzzy segmentation algorithm for remote sensing images. Knowledge-Based Systems, 54, 86–102. https://doi.org/10.1016/j.knosys.2013.07.018
  • Nguyen, N. D., Do, T., Ngo, T. D., & Le, D. (2020). An evaluation of deep learning methods for small object detection. Journal of Electrical and Computer Engineering, 2020, 1–18. https://doi.org/10.1155/2020/3189691
  • Ni, Z. L., Bian, G. B., Xie, X. L., Hou, Z. G., Zhou, X. H., & Zhou, Y. J. (2019). Rasnet: Segmentation for tracking surgical instruments in surgical videos using refined attention segmentation network. 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, (pp. 5735–5738). Berlin, Germany. https://doi.org/10.1109/EMBC.2019.8856495
  • Nida, N., Irtaza, A., Javed, A., Yousaf, M. H., & Mahmood, M. T. (2019). Melanoma lesion detection and segmentation using deep region based convolutional neural network and fuzzy c-means clustering. International Journal of Medical Informatics, 124, 37–48. https://doi.org/10.1016/j.ijmedinf.2019.01.005
  • Pal, K. K., & Sudeep, K. (2016). Preprocessing for image classification by convolutional neural networks. In Proceedings of the IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology, (pp. 1778–1781). Bangalore, India. https://doi.org/10.1109/RTEICT.2016.7808140
  • Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., & Lin, D. (2019). Libra r-cnn: Towards balanced learning for object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 821–830). Long Beach, USA. http://doi.org/10.1109/CVPR.2019.00091
  • Pan, S. M., Tao, Y. L., Nie, C. C., & Chong, Y. W. (2021). Pegnet: Progressive edge guidance network for semantic segmentation of remote sensing images. IEEE Geoscience and Remote Sensing Letters, 18(4), 637–641. https://doi.org/10.1109/ACCESS.2020.3015587
  • Qian, Q., Chen, L., Li, H., & Jin, R. (2019). DR loss: Improving object detection by distributional ranking. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 12164–12172). Seattle, USA. http://doi.org/10.1109/CVPR42600.2020.01218
  • Qi, X., Li, K., Liu, P., Zhou, X., & Sun, M. (2020). Deep attention and multi-scale networks for accurate remote sensing image segmentation. IEEE Access, 8, 146627–146639. https://doi.org/10.1109/ACCESS.2020.3015587
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). Unet: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-assisted Intervention, (pp. 234–241). Munich, Germany. https://doi.org/10.1007/978-3-319-24574-4_28
  • Salzano, R., Salvatori, R., Valt, M., Giuliani, G., Chatenoux, B., & Ioppi, L. (2019). Automated classification of terrestrial images: The contribution to the remote sensing of snow cover. Geosciences, 9(2), 97. https://doi.org/10.3390/geosciences9020097
  • Sharma, A., Liu, X., Yang, X., & Shi, D. (2017). A patch-based convolutional neural network for remote sensing image classification. Neural Networks, 95(11), 19–28. https://doi.org/10.1016/j.neunet.2017.07.017
  • Sherrah, J. (2016). Fully convolutional networks for dense semantic labelling of high-resolution aerial imagery. Computer Vision and Pattern Recognition. http://doi.org/10.48550/arXiv.1606.02585
  • Song, N. Q., Wang, N., Lin, W. N., & Wu, N. (2018). Using satellite remote sensing and numerical modelling for the monitoring of suspended particulate matter concentration during reclamation construction at dalian offshore airport in china. European Journal of Remote Sensing, 51(1), 878–888. https://doi.org/10.1080/22797254.2018.1498301
  • Sugeno, M., & Yasukawa, T. (1993). A fuzzy-logic-based approach to qualitative modeling. IEEE Transactions on Fuzzy Systems, 1(1), 7–31. https://doi.org/10.1109/TFUZZ.1993.390281
  • Sun, X., Shi, A., Huang, H., & Mayer, H. (2020). Bas4net: Boundary-aware semi-supervised semantic segmentation network for very high resolution remote sensing images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 5398–5413. https://doi.org/10.1109/JSTARS.2020.3021098
  • Tong, K., Wu, Y., & Zhou, F. (2020). Recent advances in small object detection based on deep learning: A review. Image and Vision Computing, 97(6), 103910. https://doi.org/10.1016/j.imavis.2020.103910
  • Wang, H., Chen, X. Z., Zhang, T. X., Xu, Z. Y., & Li, J. Y. (2022). Cctnet: Coupled CNN and transformer network for crop segmentation of remote sensing images. Remote Sensing, 14(9), 1956. https://doi.org/10.3390/rs14091956
  • Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., & Xiao, B. (2020). Deep high-resolution representation learning for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10), 3349–3364. https://doi.org/10.1109/TPAMI.2020.2983686
  • Wei, H., Xu, X., Ou, N., Zhang, X., & Dai, Y. (2021). Deanet: Dual encoder with attention network for semantic segmentation of remote sensing imagery. Remote Sensing, 13(19), 3900. https://doi.org/10.3390/rs13193900
  • Wurm, M., Stark, T., Zhu, X. X., Weigand, M., & Taubenböck, H. (2019). Semantic segmentation of slums in satellite images using transfer learning on fully convolutional neural networks. Isprs Journal of Photogrammetry and Remote Sensing, 150, 59–69. https://doi.org/10.1016/j.isprsjprs.2019.02.006
  • Xiao, T., Liu, Y., Zhou, B., & Jiang, Y. (2018). Unified perceptual parsing for scene understanding. Proceedings of the European Conference on Computer Vision, (pp. 418–434). Munich, Germany. https://doi.org/10.1109/CVPRW53098.2021.00179
  • Xiao, Y., Tian, Z., Yu, J., Zhang, Y. S., Liu, S., Du, S., & Lan, X. (2020). A review of object detection based on deep learning. Multimedia Tools and Applications, 79(33), 23729–23791. https://doi.org/10.1007/s11042-020-08976-6
  • Xu, J., Zhao, T., Feng, G., Ni, M., & Ou, S. (2021). A fuzzy C-means clustering algorithm based on spatial context model for image segmentation. International Journal of Fuzzy Systems, 23(3), 816–832. https://doi.org/10.1007/s40815-020-01015-4
  • Yuan, Y., Chen, X., & Wang, J. (2020). Object-contextual representations for semantic segmentation. European Conference on Computer Vision, (pp. 173–190). Springer, Cham. https://doi.org/10.48550/arXiv.1909.11065
  • Yuan, J., Wang, D., & Li, R. (2014). Remote sensing image segmentation by combining spectral and texture features. IEEE Transactions on Geoscience and Remote Sensing, 52(1), 16–24. https://doi.org/10.1109/TGRS.2012.2234755
  • Yue, J., Mao, S., & Li, M. (2016). A deep learning framework for hyperspectral image classification using spatial pyramid pooling. Remote Sensing Letters, 7(9), 875–884. https://doi.org/10.1080/2150704X.2016.1193793
  • Zhang, F., Du, B., & Zhang, L. (2016). Scene classification via a gradient boosting random convolutional network framework. IEEE Transactions on Geoscience and Remote Sensing, 54(3), 1793–1802. https://doi.org/10.1109/TGRS.2015.2488681
  • Zhang, C., Pan, X., Li, H., Gardiner, A., Sargent, I., Hare, J., & Atkinson, P. M. (2018). A hybrid mlp-cnn classifier for very fine resolution remotely sensed image classification. Isprs Journal of Photogrammetry and Remote Sensing, 140, 133–144. https://doi.org/10.1016/J.ISPRSJPRS.2017.07.014
  • Zhang, X., Pan, X., & Wang, S. (2017). Fuzzy dbn with rule-based knowledge representation and high interpretability. In 12th International Conference on Intelligent Systems and Knowledge Engineering, (pp. 1–25). Nanjing, China. https://doi.org/10.1109/ISKE.2017.8258762
  • Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (pp. 2881–2890). Hawaii, USA. http://doi.org/10.1109/CVPR.2017.660
  • Zhao, Z., Vuran, M. C., Guo, F., & Scott, S. D. (2021). Deep-waveform: A learned ofdm receiver based on deep complex-valued convolutional networks. IEEE Journal on Selected Areas in Communications, 39(8), 2407–2420. https://doi.org/10.1109/JSAC.2021.3087241
  • Zhao, T., Xu, J., Chen, R., & Ma, X. (2021). Remote sensing image segmentation based on the fuzzy deep convolutional neural network. International Journal of Remote Sensing, 42(16), 6264–6283. https://doi.org/10.1080/01431161.2021.1938738
  • Zheng, X., Huan, L., Xiong, H., & Gong, J. (2019). Elkppnet: An edge-aware neural network with large kernel pyramid pooling for learning discriminative features in semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA. https://doi.org/10.48550/arXiv.1906.11428
  • Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A., & Torralba, A. (2019). Semantic understanding of scenes through the ade20k dataset. International Journal of Computer Vision, 127(3), 302–321. https://doi.org/10.1007/s11263-018-1140-0