
A shape-aware enhancement Vision Transformer for building extraction from remote sensing imagery

Pages 1250-1276 | Received 26 Sep 2023, Accepted 10 Jan 2024, Published online: 02 Feb 2024

ABSTRACT

Convolutional neural networks (CNNs) have been used for several years to extract buildings from remote sensing images. The Vision Transformer (ViT) has recently outperformed CNNs, owing to its ability to model long-range dependencies through self-attention. However, most existing ViT models do not explicitly enhance shape information for building objects, which results in insufficiently fine-grained segmentation. To address this limitation, we construct an efficient dual-path ViT framework for building segmentation, termed the shape-aware enhancement Vision Transformer (SAEViT). Our approach incorporates a shape-aware enhancement module (SAEM) that perceives and enhances the shape features of buildings using convolutional kernels of multiple shapes. We also introduce multi-pooling channel attention (MPCA), which exploits channel-wise information without squeezing the channel dimension. Furthermore, we propose a progressive aggregation upsampling model (PAUM) in the decoder that aggregates multilevel features through progressive upsampling, combined with the soft-pool algorithm operating on the channel axis. We evaluate our model on three public building datasets. The experimental results show that SAEViT achieves significant improvements on these datasets, confirming its efficacy. Compared with several state-of-the-art models, SAEViT surpasses them in overall performance.
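The abstract does not give the exact PAUM formulation. As a hedged illustration of what "soft-pool operating on the channel axis" can mean, the sketch below applies the SoftPool operation (a softmax-weighted pooling) across the channels of a C×H×W feature map, so that at each pixel the C activations collapse into one value. Pooling all channels into a single map (rather than in groups) and the helper name `soft_pool_channels` are assumptions for illustration only, not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]  # H x W map of activations


def soft_pool_channels(features: List[Matrix]) -> Matrix:
    """Hypothetical SoftPool across the channel axis of a C x H x W map.

    At each position (h, w), the channel activations a_c are weighted by
    softmax(a)_c and summed: out[h][w] = sum_c softmax(a)_c * a_c.
    Larger activations receive larger weights, so the result lies between
    the channel mean and the channel maximum.
    """
    if not features:
        raise ValueError("feature map must have at least one channel")
    height, width = len(features[0]), len(features[0][0])
    out = [[0.0] * width for _ in range(height)]
    for h in range(height):
        for w in range(width):
            acts = [channel[h][w] for channel in features]
            peak = max(acts)  # subtract the max before exp for numerical stability
            weights = [math.exp(a - peak) for a in acts]
            total = sum(weights)
            out[h][w] = sum(wt * a for wt, a in zip(weights, acts)) / total
    return out


# Usage: two channels over a 1 x 2 spatial grid.
pooled = soft_pool_channels([[[1.0, 0.0]], [[1.0, 2.0]]])
```

Unlike average pooling, this keeps strong channel responses dominant, and unlike max pooling, every channel still contributes to the output.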

Acknowledgements

The authors are grateful to the Natural Science Foundation of Xinjiang Uygur Autonomous Region and to all teachers and students of the research group, whose unwavering support and invaluable assistance were instrumental in shaping the outcome of this research.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The WHU building dataset is openly accessible at https://study.rsgis.whu.edu.cn/pages/download/building_dataset.html, the Massachusetts building dataset is openly accessible at https://www.cs.toronto.edu/~vmnih/data/, and the Inria building dataset is openly accessible at https://project.inria.fr/aerialimagelabeling/.

Additional information

Funding

This research was funded by the Natural Science Foundation of Xinjiang Uygur Autonomous Region [2023D01C31].
