A shape-aware enhancement Vision Transformer for building extraction from remote sensing imagery

Pages 1250-1276 | Received 26 Sep 2023, Accepted 10 Jan 2024, Published online: 02 Feb 2024
 

ABSTRACT

Convolutional neural networks (CNNs) have been applied for several years to building extraction from remote sensing images. The Vision Transformer (ViT) has recently demonstrated superior performance over CNNs, thanks to its ability to model long-range dependencies through self-attention mechanisms. However, most existing ViT models do not enhance shape information for building objects, resulting in insufficiently fine-grained segmentation. To address this limitation, we construct an efficient dual-path ViT framework for building segmentation, termed the shape-aware enhancement Vision Transformer (SAEViT). Our approach incorporates a shape-aware enhancement module (SAEM) that perceives and enhances the shape features of buildings using convolutional kernels of multiple shapes. We also introduce multi-pooling channel attention (MPCA) to exploit channel-wise information without squeezing the channel dimension. Furthermore, we propose a progressive aggregation upsampling model (PAUM) in the decoder that aggregates multilevel features through progressive upsampling, combined with soft pooling along the channel axis. We evaluate our model on three public building datasets. The experimental results show that SAEViT achieves significant improvements on all three datasets, confirming its efficacy, and it outperforms several state-of-the-art models in overall performance.
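To make the channel-attention idea concrete, below is a minimal PyTorch sketch of a multi-pooling channel attention block in the spirit of the MPCA described above. The specific design choices (average and max pooling branches, an ECA-style 1-D convolution across the channel axis, sigmoid gating) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions noted above) of multi-pooling channel attention:
# channel descriptors from two pooling branches are processed by a shared 1-D
# convolution over the channel axis, so the channel dimension is never squeezed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPoolChannelAttention(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        # 1-D convolution across channels: models cross-channel interaction
        # without reducing (squeezing) the channel dimension.
        self.conv = nn.Conv1d(1, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Two pooling branches summarize each channel's spatial response.
        avg = F.adaptive_avg_pool2d(x, 1).view(b, 1, c)
        mx = F.adaptive_max_pool2d(x, 1).view(b, 1, c)
        # Shared 1-D convolution on both descriptors, fused and gated.
        attn = torch.sigmoid(self.conv(avg) + self.conv(mx)).view(b, c, 1, 1)
        return x * attn  # re-weight the channels of the input feature map

if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)               # (batch, C, H, W)
    print(MultiPoolChannelAttention()(feats).shape)  # torch.Size([2, 64, 32, 32])
```

Avoiding a channel-reduction bottleneck keeps per-channel information intact, which matches the abstract's statement that MPCA exploits channel-wise information without squeezing the channel dimension.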

Acknowledgements

The authors are deeply grateful to the Natural Science Foundation of Xinjiang Uygur Autonomous Region and to all teachers and students of the research group, whose unwavering support and invaluable assistance were instrumental in shaping the outcome of this research.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The WHU building dataset is openly accessible at https://study.rsgis.whu.edu.cn/pages/download/building_dataset.html, the Massachusetts building dataset is openly accessible at https://www.cs.toronto.edu/vmnih/data/, and the Inria building dataset is openly accessible at https://project.inria.fr/aerialimagelabeling/.

Additional information

Funding

This research was funded by the Natural Science Foundation of Xinjiang Uygur Autonomous Region [2023D01C31].
