ABSTRACT
Semantic segmentation of remote sensing images (RSIs) is of great significance for obtaining geospatial object information. Transformers achieve promising results, but multi-head self-attention (MSA) is computationally expensive. We propose an efficient semantic segmentation Transformer (ESST) for RSIs that combines zero-padding position encoding with linear space reduction attention (LSRA). First, to capture coarse-to-fine features of RSIs, a zero-padding position encoding is introduced by adding overlapping patch embedding (OPE) layers and convolutional feed-forward networks (CFFN), which improves the local continuity of features. Then, we replace the attention operation with LSRA to extract multi-level features while reducing the computational cost of the encoder. Finally, we design a lightweight all multi-layer perceptron (all-MLP) head decoder that aggregates the multi-level features into multi-scale representations for semantic segmentation. Experimental results on the Potsdam and Vaihingen datasets demonstrate that our method achieves a favorable trade-off between accuracy and speed for semantic segmentation of RSIs.
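To illustrate the core efficiency idea, the sketch below shows one plausible form of linear space reduction attention: keys and values are average-pooled to a small fixed spatial grid before attention, so the attention matrix grows linearly rather than quadratically with the number of input tokens. This is a minimal PyTorch sketch assuming a pooling-based reduction similar to common linear-attention variants; the module name, pooling size, and layer layout are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class LinearSRAttention(nn.Module):
    """Hypothetical sketch of linear space reduction attention (LSRA).

    Keys/values are pooled to a fixed pool_size x pool_size grid before
    attention, so attention cost is linear in the number of input tokens.
    """

    def __init__(self, dim, num_heads=8, pool_size=7):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5

        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.pool = nn.AdaptiveAvgPool2d(pool_size)  # spatial reduction of K, V
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        # x: (B, N, C) token sequence from an H x W feature map, with N = H * W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Reduce the spatial resolution of keys/values by average pooling.
        x_ = x.transpose(1, 2).reshape(B, C, H, W)
        x_ = self.pool(x_).reshape(B, C, -1).transpose(1, 2)  # (B, P*P, C)
        x_ = self.norm(x_)
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)  # each: (B, heads, P*P, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, P*P)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


# Example usage on a 64 x 64 feature map with 128 channels.
tokens = torch.randn(2, 64 * 64, 128)
out = LinearSRAttention(dim=128)(tokens, H=64, W=64)
print(out.shape)  # torch.Size([2, 4096, 128])
```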
Disclosure statement
No potential conflict of interest was reported by the authors.
Data availability statement
The data that support the findings of this study are openly available in public repositories: the Potsdam dataset at https://www2.isprs.org/commissions/comm3/wg4/benchmark/2d-sem-label-potsdam/ and the Vaihingen dataset at https://www2.isprs.org/commissions/comm3/wg4/benchmark/2d-sem-label-vaihingen/.