Research Article

When zero-padding position encoding encounters linear space reduction attention: an efficient semantic segmentation Transformer of remote sensing images

Pages 609-633 | Received 29 Aug 2023, Accepted 16 Dec 2023, Published online: 25 Jan 2024
 

ABSTRACT

Semantic segmentation of remote sensing images (RSIs) is of great significance for obtaining geospatial object information. Transformers achieve promising results, but multi-head self-attention (MSA) is computationally expensive. We propose an efficient semantic segmentation Transformer (ESST) for RSIs that combines zero-padding position encoding with linear space reduction attention (LSRA). First, to capture coarse-to-fine features of RSIs, we propose a zero-padding position encoding that adds overlapping patch embedding (OPE) layers and convolutional feed-forward networks (CFFN) to improve the local continuity of features. Then, we use LSRA in place of standard MSA in the attention operation to extract multi-level features while reducing the computational cost of the encoder. Finally, we design a lightweight all multi-layer perceptron (all-MLP) decoder head that simply aggregates the multi-level features to generate multi-scale features for semantic segmentation. Experimental results on the Potsdam and Vaihingen datasets demonstrate that our method strikes a balance between accuracy and speed for semantic segmentation of RSIs.
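The abstract names two mechanisms without spelling them out. As a rough illustration (not the authors' code), the pure-Python sketch below shows two ideas. (a) A 3×3 convolution with zero padding, such as the one inside a convolutional FFN, can act as an implicit position encoding: even for a constant input, border outputs differ from interior outputs. (b) Linear spatial-reduction attention, in the style popularized by PVTv2, average-pools keys and values to a fixed P×P grid, so the attention products scale linearly in the token count N rather than quadratically. Assumptions: a single head, identity Q/K/V projections (the learned projections and normalization are omitted), and an assumed pooled size of 7×7. The function names `zero_padded_conv3x3`, `adaptive_avg_pool`, `linear_sra_attention`, and `attention_macs` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # list of C-dim token vectors (row-major h*w grid)


def zero_padded_conv3x3(grid: Matrix, h: int, w: int, kernel: Matrix) -> Matrix:
    """Single-channel 3x3 'same' convolution; pixels outside the grid count as 0.
    Because of the zero padding, border outputs 'see' the image edge, which is
    the positional cue a zero-padding position encoding relies on."""
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            s = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:  # padded positions contribute 0
                        s += kernel[dr + 1][dc + 1] * grid[rr][cc]
            row.append(s)
        out.append(row)
    return out


def adaptive_avg_pool(tokens: Matrix, h: int, w: int, p: int) -> Matrix:
    """Average-pool an h*w grid of tokens to a p*p grid.
    Bin i spans [floor(i*h/p), ceil((i+1)*h/p)), as in common adaptive pooling."""
    c = len(tokens[0])
    pooled = []
    for i in range(p):
        r0, r1 = (i * h) // p, -(-((i + 1) * h) // p)
        for j in range(p):
            c0, c1 = (j * w) // p, -(-((j + 1) * w) // p)
            acc, count = [0.0] * c, 0
            for r in range(r0, r1):
                for col in range(c0, c1):
                    for k, v in enumerate(tokens[r * w + col]):
                        acc[k] += v
                    count += 1
            pooled.append([a / count for a in acc])
    return pooled


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear_sra_attention(tokens: Matrix, h: int, w: int, pool: int = 7) -> Matrix:
    """Every token is a query, but keys/values are only the pool*pool pooled tokens,
    so the cost is O(N * pool^2 * C) instead of O(N^2 * C). Identity projections."""
    c = len(tokens[0])
    kv = adaptive_avg_pool(tokens, h, w, pool)
    scale = 1.0 / math.sqrt(c)
    out = []
    for q in tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in kv])
        out.append([sum(wt * v[d] for wt, v in zip(weights, kv)) for d in range(c)])
    return out


def attention_macs(n_queries: int, n_keys: int, c: int) -> int:
    """Multiply-accumulates for Q @ K^T plus A @ V (projections excluded)."""
    return 2 * n_queries * n_keys * c


if __name__ == "__main__":
    h = w = 28  # illustrative grid size, not a configuration from the paper
    c = 8
    tokens = [[math.sin(0.1 * (i + 3 * d)) for d in range(c)] for i in range(h * w)]
    out = linear_sra_attention(tokens, h, w, pool=7)
    print(len(out), len(out[0]))
    print(attention_macs(h * w, h * w, 64) // attention_macs(h * w, 7 * 7, 64))
```

Running the file prints `784 8` and then `16`. For a 28×28 token grid, pooling keys and values to 7×7 cuts the Q·Kᵀ and A·V work by N/P² = 784/49 = 16. Because the key count stays fixed, the savings grow with resolution, which matters for large remote sensing tiles.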

Disclosure statement

No potential conflict of interest was reported by the authors.

Data Availability statement

The data that support the findings of this study are openly available in public repositories: Potsdam at https://www2.isprs.org/commissions/comm3/wg4/benchmark/2d-sem-label-potsdam/ and Vaihingen at https://www2.isprs.org/commissions/comm3/wg4/benchmark/2d-sem-label-vaihingen/.

Additional information

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62371015, the Beijing Natural Science Foundation under Grant L211017, and the General Program of the Beijing Municipal Education Commission under Grant KM202110005027.
