A shape-aware enhancement Vision Transformer for building extraction from remote sensing imagery

Tuerhong Yiming, College of Civil Engineering and Architecture, Xinjiang University, Urumqi, Xinjiang, China

https://orcid.org/0009-0005-1074-5238

Xiaoyan Tang (corresponding author), College of Civil Engineering and Architecture, Xinjiang University, Urumqi, Xinjiang, China

Haibin Shang, College of Civil Engineering and Architecture, Xinjiang University, Urumqi, Xinjiang, China

ABSTRACT

Convolutional neural networks (CNNs) have been used for several years to extract buildings from remote sensing images. Vision Transformers (ViTs) have recently demonstrated superior performance to CNNs, thanks to their ability to model long-range dependencies through self-attention. However, most existing ViT models do not explicitly enhance the shape information of building objects, which limits the quality of fine-grained segmentation. To address this limitation, we construct an efficient dual-path ViT framework for building segmentation, termed the shape-aware enhancement Vision Transformer (SAEViT). Our approach incorporates a shape-aware enhancement module (SAEM) that perceives and enhances the shape features of buildings using convolutional kernels of multiple shapes. We also introduce multi-pooling channel attention (MPCA), which exploits channel-wise information without squeezing the channel dimension. Furthermore, we propose a progressive aggregation upsampling model (PAUM) in the decoder, which aggregates multilevel features through progressive upsampling and applies the SoftPool algorithm along the channel axis. We evaluate our model on three public building datasets (WHU, Massachusetts, and Inria). The experimental results show that SAEViT achieves significant improvements on these datasets, confirming its efficacy, and that it outperforms several state-of-the-art models in overall performance.
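The abstract describes three components only at a high level. The NumPy sketch below illustrates the general ideas under our own assumptions; it is not the authors' SAEViT code. The function names, the kernel shapes (3x3, 1x7, 7x1), the averaging kernel weights, the average-plus-max pooling combination, and the 1D channel-mixing kernel are all hypothetical, and a real implementation would use learned weights inside a deep-learning framework. Only SoftPool itself (a softmax-weighted average) is a standard technique; here it is applied along the channel axis, as the abstract states for PAUM.

```python
import numpy as np


def conv2d_same(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """2D cross-correlation with zero 'same' padding (odd kernel sizes only)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    h, w = img.shape
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def multi_shape_enhance(x: np.ndarray, kernels: list) -> np.ndarray:
    """Hypothetical SAEM-style branch on a (C, H, W) feature map.

    Each kernel shape (square, horizontal strip, vertical strip) is applied to
    every channel and the responses are added to the input as a residual.
    Simplification (assumption): one shared kernel per shape, not learned
    per-channel kernels.
    """
    out = x.astype(float)  # astype returns a copy, so x is not modified
    for k in kernels:
        for c in range(x.shape[0]):
            out[c] += conv2d_same(x[c], k)
    return out


def multi_pool_channel_attention(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Hypothetical MPCA-style gate on a (C, H, W) feature map.

    Average- and max-pooled channel descriptors are summed, then mixed by a 1D
    convolution across neighbouring channels. The descriptor keeps all C
    entries (no bottleneck that squeezes the channel dimension).
    """
    descriptor = x.mean(axis=(1, 2)) + x.max(axis=(1, 2))  # shape (C,)
    mixed = np.convolve(descriptor, kernel, mode="same")    # shape (C,) for len(kernel) <= C
    gate = 1.0 / (1.0 + np.exp(-mixed))                     # sigmoid, values in (0, 1)
    return x * gate[:, None, None]


def soft_pool_channel(x: np.ndarray) -> np.ndarray:
    """SoftPool along the channel axis: (C, H, W) -> (1, H, W).

    Each output pixel is sum_c softmax(x)_c * x_c, so strong activations
    dominate without the hard selection of max pooling.
    """
    shifted = x - x.max(axis=0, keepdims=True)  # per-pixel shift for numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * x).sum(axis=0, keepdims=True)


# Usage on a random feature map with 8 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
kernels = [np.ones((3, 3)) / 9, np.ones((1, 7)) / 7, np.ones((7, 1)) / 7]
enhanced = multi_shape_enhance(feat, kernels)
attended = multi_pool_channel_attention(enhanced, np.array([0.25, 0.5, 0.25]))
pooled = soft_pool_channel(attended)
print(enhanced.shape, attended.shape, pooled.shape)  # (8, 16, 16) (8, 16, 16) (1, 16, 16)
```

The strip kernels are one plausible reading of "multiple shapes": elongated 1xk and kx1 kernels respond to long, straight building edges that a square kernel of the same area covers poorly. Because the sigmoid gate lies in (0, 1), the attention step only rescales channels and never changes the feature-map shape.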

Acknowledgements

We are grateful to the Natural Science Foundation of Xinjiang Uygur Autonomous Region and to all teachers and students of the research group for their support and assistance with this research.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The WHU building dataset is openly accessible at https://study.rsgis.whu.edu.cn/pages/download/building_dataset.html, the Massachusetts building dataset at https://www.cs.toronto.edu/~vmnih/data/, and the Inria building dataset at https://project.inria.fr/aerialimagelabeling/.

Additional information

Funding

This research was funded by the Natural Science Foundation of Xinjiang Uygur Autonomous Region [2023D01C31].