Research Article

Water index and Swin Transformer Ensemble (WISTE) for water body extraction from multispectral remote sensing images

Article: 2251704 | Received 20 Apr 2023, Accepted 20 Aug 2023, Published online: 29 Aug 2023

Figures & data

Figure 1. Overall structure of the proposed WISTE method. The encoder part is shown in the blue box, and the decoder part is shown in the green box. The Swin Transformer block structure is shown in the lower-right part. The abbreviations MLP, LN, W-MSA, and SW-MSA denote a multilayer perceptron, layer normalization, window-based multihead self-attention, and shifted window-based multihead self-attention, respectively.

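To make the block structure in Figure 1 concrete, the following is a minimal Python/PyTorch sketch of one Swin-Transformer-style block (LN, multihead self-attention, MLP, and residual connections). It is an illustration only: plain multihead self-attention stands in for W-MSA/SW-MSA, window partitioning and relative position bias are omitted, and the dimensions are assumed values rather than the authors' settings.

# Minimal sketch of one Swin-Transformer-style block (assumption: plain
# multihead self-attention stands in for W-MSA/SW-MSA; window partitioning
# and relative position bias are omitted for brevity).
import torch
import torch.nn as nn

class SwinBlockSketch(nn.Module):
    def __init__(self, dim=96, num_heads=3, mlp_ratio=4.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)                       # LN before attention
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)                       # LN before MLP
        self.mlp = nn.Sequential(                            # two-layer MLP with GELU
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x):                                    # x: (batch, tokens, dim)
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)                            # (S)W-MSA stand-in
        x = x + h                                            # residual connection
        x = x + self.mlp(self.norm2(x))                      # residual connection
        return x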

Figure 2. Main computation process of the Swin Transformer. (a) Patch merging to reduce the dimensionality of the feature map. The H, W, and C values of a patch denote the height, width, and number of channels of the corresponding feature map, respectively. (b) Hierarchical layer with different window sizes, with sampling levels of 4×, 8×, and 16×. (c) Window partitioning strategy in the Swin Transformer, where the semantic information between neighbouring patches is considered. (d) Shifted-window self-attention computation. The patches in dashed red boxes are masked, and self-attention is computed within the solid red boxes.

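The patch-merging step in Figure 2(a) can be sketched as follows: each 2×2 neighbourhood of patches is concatenated along the channel axis and linearly projected, halving H and W while doubling C. This is a generic sketch of the operation in PyTorch, not the authors' code; the channel sizes are illustrative assumptions.

# Sketch of the patch-merging step from Figure 2(a): a 2x2 group of patches is
# concatenated along the channel axis and projected, halving H and W and
# doubling C.
import torch
import torch.nn as nn

class PatchMergingSketch(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):                        # x: (B, H, W, C), H and W even
        x0 = x[:, 0::2, 0::2, :]                 # top-left patch of each 2x2 group
        x1 = x[:, 1::2, 0::2, :]                 # bottom-left
        x2 = x[:, 0::2, 1::2, :]                 # top-right
        x3 = x[:, 1::2, 1::2, :]                 # bottom-right
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))      # (B, H/2, W/2, 2C)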

Figure 3. Illustration of the dual-encoder architecture of the WISTE. The auxiliary encoder branch is indicated by a dashed red box. ST denotes the Swin Transformer; Wi represents the matrices for each computational block; xi denotes the intermediate layer outputs; and Li signifies the loss function of the corresponding branch.

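The caption of Figure 3 indicates that each encoder branch has its own loss Li. A common way to train such a dual-branch network is a weighted sum of the branch losses; the sketch below assumes per-pixel cross-entropy for both branches and a hypothetical auxiliary weight, neither of which is taken from the paper.

# Illustrative combination of the two branch losses L1 and L2 from Figure 3
# (assumption: per-pixel cross-entropy on each branch and a weighted sum;
# the auxiliary weight is hypothetical).
import torch
import torch.nn.functional as F

def dual_branch_loss(main_logits, aux_logits, target, aux_weight=0.4):
    """main_logits/aux_logits: (B, 2, H, W); target: (B, H, W), 0 = background, 1 = water."""
    l1 = F.cross_entropy(main_logits, target)   # main encoder-decoder branch
    l2 = F.cross_entropy(aux_logits, target)    # auxiliary encoder branch
    return l1 + aux_weight * l2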

Figure 4. Images, land cover labels, and corresponding water body extraction labels of the two public datasets: (a) the GID and (b) the DeepGlobe dataset.


Figure 5. Images and labels of different water types contained in the GID dataset. From top to bottom, the rows present lakes, rivers, canals, and ponds.


Figure 6. One Sentinel-2 image used for ablation experiments. (a) Sentinel-2 image obtained after band fusion (B2, B3, B4 and B8 for the B, G, R, and NIR bands, respectively). (b) Ground truth of the water body, with water in white and the background in black.

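Figure 6 shows the Sentinel-2 bands (B2, B3, B4, B8) used in the ablation experiments. Assuming the water index referred to in the method name is the widely used NDWI, it can be computed from the green (B3) and near-infrared (B8) bands as (Green - NIR) / (Green + NIR); the function below is an illustrative sketch, not the authors' implementation.

# Sketch of a water index for Sentinel-2 imagery such as that in Figure 6
# (assumption: the index is the standard NDWI from the green (B3) and
# NIR (B8) bands; how the bands are loaded is left to the reader).
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Return NDWI in [-1, 1]; higher values indicate open water."""
    green = green.astype(np.float32)
    nir = nir.astype(np.float32)
    return (green - nir) / (green + nir + eps)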

Table 1. Ablation experiment results obtained by the proposed WISTE on the GID and Sentinel-2 images (the best values are in bold).

Figure 7. Comparison among the prediction results produced by different methods on the DeepGlobe dataset. The water areas are in white, while the background is in black. The red boxes highlight areas with weak spatial information.


Table 2. Comparison among the different methods on the DeepGlobe dataset (the best values are in bold).

Figure 8. Comparison among the prediction results produced by different methods using the GID. The water areas are drawn in white. The areas where the WISTE demonstrates greater advantages are highlighted in red boxes.


Table 3. Comparison among the different methods on the GID dataset (the best values are in bold).

Figure 9. Detailed comparison of the prediction results produced by different methods on the GID dataset.


Table 4. Different band arrangements of the training samples in the GID dataset (the best values are in bold).

Table 5. Relationships between different FCN positions and the accuracy of ST-Dual (the best values are in bold).

Supplemental material

Supplemental material for this article is available for download as an MS Word file (15.6 MB).

Data availability statement

The datasets used in this study are publicly available. The Gaofen Image Dataset (GID) is available at https://x-ytong.github.io/project/GID.html, and the DeepGlobe dataset is downloadable at https://competitions.codalab.org/competitions/18468