A shape-aware enhancement Vision Transformer for building extraction from remote sensing imagery

Pages 1250-1276 | Received 26 Sep 2023, Accepted 10 Jan 2024, Published online: 02 Feb 2024

References

  • Adriano, B., H. Miura, W. Liu, M. Matsuoka, and S. Koshimura. 2023. “Developing a Framework for Rapid Collapsed Building Mapping Using Satellite Imagery and Deep Learning Models.” In IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 1273–1276. IEEE.
  • Cai, J., and Y. Chen. 2021. “MHA-Net: Multipath Hybrid Attention Network for Building Footprint Extraction from High-Resolution Remote Sensing Imagery.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14:5807–5817. https://doi.org/10.1109/JSTARS.2021.3084805.
  • Cai, B., Z. Shao, X. Huang, X. Zhou, and S. Fang. 2023. “Deep Learning-Based Building Height Mapping Using Sentinel-1 and Sentinel-2 Data.” International Journal of Applied Earth Observation and Geoinformation 122:103399. https://doi.org/10.1016/j.jag.2023.103399.
  • Chen, J., Y. Jiang, L. Luo, and W. Gong. 2022. “ASF-Net: Adaptive Screening Feature Network for Building Footprint Extraction from Remote-Sensing Images.” IEEE Transactions on Geoscience & Remote Sensing 60:1–13. https://doi.org/10.1109/TGRS.2022.3165204.
  • Chen, J., Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, and Y. Zhou. 2021. “TransUnet: Transformers Make Strong Encoders for Medical Image Segmentation.” https://doi.org/10.48550/arXiv.2102.04306.
  • Chen, X., C. Qiu, W. Guo, Y. Anzhu, X. Tong, and M. Schmitt. 2022. “Multiscale Feature Learning by Transformer for Building Extraction from Satellite Images.” IEEE Geoscience & Remote Sensing Letters 19:1–5. https://doi.org/10.1109/LGRS.2022.3142279.
  • Chen, J., D. Zhang, Y. Wu, Y. Chen, and X. Yan. 2022. “A Context Feature Enhancement Network for Building Extraction from High-Resolution Remote Sensing Imagery.” Remote Sensing 14 (9): 2276. https://doi.org/10.3390/rs14092276.
  • Chen, L.-C., Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. 2018 Aug. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.”
  • Chen, K., Z. Zou, and Z. Shi. 2021. “Building Extraction from Remote Sensing Images with Sparse Token Transformers.” Remote Sensing 13 (21): 4441. https://doi.org/10.3390/rs13214441.
  • Dai, J., H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei. 2017 Jun. “Deformable Convolutional Networks.”
  • Demir, I., K. Koperski, D. Lindenbaum, G. Pang, J. Huang, S. Basu, F. Hughes, D. Tuia, and R. Raskar. 2018. “DeepGlobe 2018: A Challenge to Parse the Earth Through Satellite Images.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, Jun., 172–17209. IEEE.
  • Deng, W., Q. Shi, and L. Jun. 2021. “Attention-Gate-Based Encoder–Decoder Network for Automatical Building Extraction.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14:2611–2620. https://doi.org/10.1109/JSTARS.2021.3058097.
  • Ding, X., X. Zhang, Y. Zhou, J. Han, G. Ding, and J. Sun. 2022 Apr. “Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs.”
  • Dixit, M., K. Chaurasia, and V. K. Mishra. 2021. “Dilated-ResUnet: A Novel Deep Learning Architecture for Building Extraction from Medium Resolution Multi-Spectral Satellite Imagery.” Expert Systems with Applications 184:115530. https://doi.org/10.1016/j.eswa.2021.115530.
  • Dornaika, F., A. Moujahid, Y. El Merabet, and Y. Ruichek. 2016. “Building Detection from Orthophotos Using a Machine Learning Approach: An Empirical Study on Image Segmentation and Descriptors.” Expert Systems with Applications 58:130–142. https://doi.org/10.1016/j.eswa.2016.03.024.
  • Dosovitskiy, A., L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, et al. 2021 Jun. “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.”
  • Feng, D., H. Chen, Y. Xie, Z. Liu, Z. Liao, J. Zhu, and H. Zhang. 2022. “GCCINet: Global Feature Capture and Cross-Layer Information Interaction Network for Building Extraction from Remote Sensing Imagery.” International Journal of Applied Earth Observation and Geoinformation 114:103046. https://doi.org/10.1016/j.jag.2022.103046.
  • Feng, D., H. Chu, and L. Zheng. 2022. “Frequency Spectrum Intensity Attention Network for Building Detection from High-Resolution Imagery.” Remote Sensing 14 (21): 5457. https://doi.org/10.3390/rs14215457.
  • Gong, M., T. Liu, M. Zhang, Q. Zhang, D. Lu, H. Zheng, and F. Jiang. 2023. “Context–Content Collaborative Network for Building Extraction from High-Resolution Imagery.” Knowledge-Based Systems 263:110283. https://doi.org/10.1016/j.knosys.2023.110283.
  • Gu, J., H. Kwon, D. Wang, W. Ye, M. Li, Y.-H. Chen, L. Lai, V. Chandra, and D. Z. Pan. 2021 Nov. “Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation.”
  • Guo, H., B. Du, L. Zhang, and S. Xin. 2022. “A Coarse-To-Fine Boundary Refinement Network for Building Footprint Extraction from Remote Sensing Imagery.” ISPRS Journal of Photogrammetry & Remote Sensing 183:240–252. https://doi.org/10.1016/j.isprsjprs.2021.11.005.
  • Guo, M.-H., C.-Z. Lu, Q. Hou, Z. Liu, M.-M. Cheng, and S.-M. Hu. 2022 Sep. “SegNext: Rethinking Convolutional Attention Design for Semantic Segmentation.”
  • Guo, H., Q. Shi, B. Du, L. Zhang, D. Wang, and H. Ding. 2021. “Scene-Driven Multitask Parallel Attention Network for Building Extraction in High-Resolution Remote Sensing Images.” IEEE Transactions on Geoscience & Remote Sensing 59 (5): 4287–4306. https://doi.org/10.1109/TGRS.2020.3014312.
  • Guo, H., Q. Shi, A. Marinoni, B. Du, and L. Zhang. 2021. “Deep Building Footprint Update Network: A Semi-Supervised Method for Updating Existing Building Footprint from Bi-Temporal Remote Sensing Images.” Remote Sensing of Environment 264:112589. https://doi.org/10.1016/j.rse.2021.112589.
  • He, T., Z. Zhang, H. Zhang, Z. Zhang, J. Xie, and M. Li. 2018 Dec. “Bag of Tricks for Image Classification with Convolutional Neural Networks.” ArXiv:1812.01187 [cs] http://arxiv.org/abs/1812.01187.
  • He, X., Y. Zhou, J. Zhao, D. Zhang, R. Yao, and Y. Xue. 2022. “Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation.” IEEE Transactions on Geoscience and Remote Sensing 60:1–15. https://doi.org/10.1109/TGRS.2022.3230846.
  • Hu, J., L. Shen, S. Albanie, G. Sun, and W. Enhua. 2019 May. “Squeeze-and-Excitation Networks.”
  • Hu, Q., L. Zhen, Y. Mao, X. Zhou, and G. Zhou. 2021. “Automated Building Extraction Using Satellite Remote Sensing Imagery.” Automation in Construction 123:103509. https://doi.org/10.1016/j.autcon.2020.103509.
  • Ji, S., S. Wei, and L. Meng. 2019. “Fully Convolutional Networks for Multisource Building Extraction from an Open Aerial and Satellite Imagery Data Set.” IEEE Transactions on Geoscience and Remote Sensing 57 (1): 574–586. https://doi.org/10.1109/TGRS.2018.2858817.
  • Kang, W., Y. Xiang, F. Wang, and H. You. 2019. “EU-Net: An Efficient Fully Convolutional Network for Building Extraction from Optical Remote Sensing Images.” Remote Sensing 11 (23): 2813. https://doi.org/10.3390/rs11232813.
  • Li, R., T. Chen, Y. Liu, and H. Jiang. 2023. “CoupleUnet: Swin Transformer Coupling CNNs Makes Strong Contextual Encoders for VHR Image Road Extraction.” International Journal of Remote Sensing 44 (18): 5788–5813. https://doi.org/10.1080/01431161.2023.2255353.
  • Liu, H., J. Liu, B. Huang, X. Hu, Y. Sun, Yang Sun, and N. Zhou. 2019. “DE-Net: Deep Encoding Network for Building Extraction from High-Resolution Remote Sensing Imagery.” Remote Sensing 11 (20): 2380. https://doi.org/10.3390/rs11202380.
  • Liu, Z., Y. Lin, Y. Cao, H. Han, Y. Wei, Z. Zhang, S. Lin, and B. Guo. 2021 Aug. “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows.”
  • Liu, P., X. Liu, M. Liu, Q. Shi, J. Yang, X. Xiaocong, and Y. Zhang. 2019. “Building Footprint Extraction from High-Resolution Images via Spatial Residual Inception Convolutional Neural Network.” Remote Sensing 11 (7): 830. https://doi.org/10.3390/rs11070830.
  • Liu, Z., H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie. 2022 Mar. “A ConvNet for the 2020s.”
  • Liu, Y., Y. Zhang, Y. Wang, F. Hou, J. Yuan, J. Tian, Y. Zhang, Z. Shi, J. Fan, and H. Zhiqiang. 2023. “A Survey of Visual Transformers.” IEEE Transactions on Neural Networks and Learning Systems 1–21. https://doi.org/10.1109/TNNLS.2022.3227717.
  • Li, X., X. Yao, and Y. Fang. 2018. “Building-A-Nets: Robust Building Extraction from High-Resolution Remote Sensing Images with Adversarial Networks.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11 (10): 3680–3687. https://doi.org/10.1109/JSTARS.2018.2865187.
  • Loshchilov, I., and F. Hutter. 2017 May. “LM-CMA: An Alternative to L-BFGS for Large-Scale Black Box Optimization.” Evolutionary Computation 25 (1, May): 143–171. https://doi.org/10.1162/EVCO_a_00168.
  • Loshchilov, I., and F. Hutter. 2019 Jan. “Decoupled Weight Decay Regularization.”
  • Maggiori, E., Y. Tarabalka, G. Charpiat, and P. Alliez. 2017. “Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark.” In 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, Jul, 3226–3229. IEEE.
  • Minaee, S., Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos. 2020 Nov. “Image Segmentation Using Deep Learning: A Survey.” IEEE Transactions on Pattern Analysis & Machine Intelligence 1–1. https://doi.org/10.1109/TPAMI.2021.3059968.
  • Mnih, V. 2013. Machine Learning for Aerial Image Labeling. Canada: University of Toronto.
  • Mou, L., Y. Hua, and X. X. Zhu. 2020. “Relation Matters: Relational Context-Aware Fully Convolutional Network for Semantic Segmentation of High-Resolution Aerial Images.” IEEE Transactions on Geoscience and Remote Sensing 58 (11): 7557–7569. https://doi.org/10.1109/TGRS.2020.2979552.
  • Pan, X., F. Yang, L. Gao, Z. Chen, B. Zhang, H. Fan, and J. Ren. 2019. “Building Extraction from High-Resolution Aerial Imagery Using a Generative Adversarial Network with Spatial and Channel Attention Mechanisms.” Remote Sensing 11 (8): 917. https://doi.org/10.3390/rs11080917.
  • Peng, C., X. Zhang, G. Yu, G. Luo, and J. Sun. 2017. “Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Jul, 1743–1751. IEEE.
  • Qiu, Y., F. Wu, J. Yin, C. Liu, X. Gong, and A. Wang. 2022. “MSL-Net: An Efficient Network for Building Extraction from Aerial Imagery.” Remote Sensing 14 (16): 3914. https://doi.org/10.3390/rs14163914.
  • Ren, S., D. Zhou, S. He, J. Feng, and X. Wang. 2022. “Shunted Self-Attention via Multi-Scale Token Aggregation.” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, Jun., 10843–10852. IEEE.
  • Selvaraju, R. R., M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. 2020. “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization.” International Journal of Computer Vision 128 (2): 336–359. https://doi.org/10.1007/s11263-019-01228-7.
  • Shao, Z., P. Tang, Z. Wang, N. Saleem, S. Yam, and C. Sommai. 2020. “BRRNet: A Fully Convolutional Neural Network for Automatic Building Extraction from High-Resolution Remote Sensing Images.” Remote Sensing 12 (6): 1050. https://doi.org/10.3390/rs12061050.
  • Stergiou, A., R. Poppe, and G. Kalliatakis. 2021 Mar. “Refining Activation Downsampling with SoftPool.”
  • Strudel, R., R. Garcia, I. Laptev, and C. Schmid. 2021 Sep. “Segmenter: Transformer for Semantic Segmentation.”
  • Sun, K., Y. Zhao, B. Jiang, T. Cheng, B. Xiao, D. Liu, Y. Mu, X.-G. Wang, W. Liu, and J. Wang. 2019 Apr. “High-Resolution Representations for Labeling Pixels and Regions.”
  • Tuli, S., I. Dasgupta, E. Grant, and T. L. Griffiths. 2021 Jul. “Are Convolutional Neural Networks or Transformers More like Human Vision?”
  • Tu, Z., H. Talebi, H. Zhang, F. Yang, P. Milanfar, A. Bovik, and L. Yinxiao. 2022 Sep. “MaxViT: Multi-Axis Vision Transformer.”
  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. 2017 Dec. “Attention Is All You Need.”
  • Wang, L., S. Fang, X. Meng, and L. Rui. 2022. “Building Extraction with Vision Transformer.” IEEE Transactions on Geoscience and Remote Sensing 60:1–11. https://doi.org/10.1109/TGRS.2022.3186634.
  • Wang, Q., B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu. 2020 Apr. “ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks.”
  • Wang, W., E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao. 2021 Aug. “Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.”
  • Warth, G., A. Braun, O. Assmann, K. Fleckenstein, and V. Hochschild. 2020. “Prediction of Socio-Economic Indicators for Urban Planning Using VHR Satellite Imagery and Spatial Analysis.” Remote Sensing 12 (11): 1730. https://doi.org/10.3390/rs12111730.
  • Wei, S., S. Ji, and M. Lu. 2020. “Toward Automatic Building Footprint Delineation from Aerial Images Using CNN and Regularization.” IEEE Transactions on Geoscience and Remote Sensing 58 (3): 2178–2189. https://doi.org/10.1109/TGRS.2019.2954461.
  • Woo, S., J. Park, J.-Y. Lee, and I. So Kweon. 2018 Jul. “CBAM: Convolutional Block Attention Module.”
  • Xiao, T., Y. Liu, Y. Huang, M. Li, and G. Yang. 2023. “Enhancing Multiscale Representations with Transformer for Remote Sensing Image Semantic Segmentation.” IEEE Transactions on Geoscience and Remote Sensing 61:1–16. https://doi.org/10.1109/TGRS.2023.3256064.
  • Xie, E., W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo. 2021 Oct. “SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.”
  • Xing, Z., S. Yang, X. Zan, X. Dong, Y. Yao, Z. Liu, and X. Zhang. 2023. “Flood Vulnerability Assessment of Urban Buildings Based on Integrating High-Resolution Remote Sensing and Street View Images.” Sustainable Cities and Society 92:104467. https://doi.org/10.1016/j.scs.2023.104467.
  • Yuan, W., and W. Xu. 2021. “MSST-Net: A Multi-Scale Adaptive Network for Building Extraction from Remote Sensing Images Based on Swin Transformer.” Remote Sensing 13 (23): 4743. https://doi.org/10.3390/rs13234743.
  • Yu, T., P. Tang, B. Zhao, S. Bai, P. Gou, J. Liao, and C. Jin. 2023. “ConvBnet: A Convolutional Network for Building Footprint Extraction.” IEEE Geoscience and Remote Sensing Letters 20:1–5. https://doi.org/10.1109/LGRS.2023.3250091.
  • Zeng, C., Y. Liu, A. Stein, and L. Jiao. 2015. “Characterization and Spatial Modeling of Urban Sprawl in the Wuhan Metropolitan Area, China.” International Journal of Applied Earth Observation and Geoinformation 34:10–24. https://doi.org/10.1016/j.jag.2014.06.012.
  • Zhang, H., Y. Liao, H. Yang, G. Yang, and L. Zhang. 2022. “A Local–Global Dual-Stream Network for Building Extraction from Very-High-Resolution Remote Sensing Images.” IEEE Transactions on Neural Networks and Learning Systems 33 (3): 1269–1283. https://doi.org/10.1109/TNNLS.2020.3041646.
  • Zhang, T., H. Tang, Y. Ding, P. Li, C. Ji, and P. Xu. 2021. “FSRSS-Net: High-Resolution Mapping of Buildings from Middle-Resolution Satellite Images Using a Super-Resolution Semantic Segmentation Network.” Remote Sensing 13 (12): 2290. https://doi.org/10.3390/rs13122290.
  • Zhang, Z., and Y. Wang. 2019. “JointNet: A Common Neural Network for Road and Building Extraction.” Remote Sensing 11 (6): 696. https://doi.org/10.3390/rs11060696.
  • Zhang, L., J. Wu, Y. Fan, H. Gao, and Y. Shao. 2020. “An Efficient Building Extraction Method from High Spatial Resolution Remote Sensing Images Based on Improved Mask R-CNN.” Sensors 20 (5): 1465. https://doi.org/10.3390/s20051465.
  • Zhang, R., Q. Zhang, G. Zhang, Z. Zhang, H. Ji, H. Fan, Y. Zhang, and H. Wang. 2023. “SDSC-UNet: Dual Skip Connection ViT-Based U-Shaped Model for Building Extraction.” IEEE Geoscience and Remote Sensing Letters 20:1–5. https://doi.org/10.1109/LGRS.2023.3329687.
  • Zhang, H., X. Zheng, N. Zheng, and W. Shi. 2022. “A Multiscale and Multipath Network with Boundary Enhancement for Building Footprint Extraction from Remotely Sensed Imagery.” IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing 15:8856–8869. https://doi.org/10.1109/JSTARS.2022.3214485.
  • Zhao, H., J. Shi, X. Qi, X. Wang, and J. Jia. 2017 Apr. “Pyramid Scene Parsing Network.”
  • Zheng, S., J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, et al. 2021 Jul. “Rethinking Semantic Segmentation from a Sequence-To-Sequence Perspective with Transformers.”
  • Zhou, Y., Z. Chen, B. Wang, S. Li, H. Liu, D. Xu, and C. Ma. 2022. “BOMSC-Net: Boundary Optimization and Multi-Scale Context Awareness Based Building Extraction from High-Resolution Remote Sensing Imagery.” IEEE Transactions on Geoscience and Remote Sensing 60:1–17. https://doi.org/10.1109/TGRS.2022.3152575.
  • Zhou, D., B. Kang, X. Jin, L. Yang, X. Lian, Z. Jiang, Q. Hou, and J. Feng. 2021 Apr. “DeepViT: Towards Deeper Vision Transformer.”
  • Zhu, Q., C. Liao, H. Han, X. Mei, and H. Li. 2021. “MAP-Net: Multi Attending Path Neural Network for Building Footprint Extraction from Remote Sensed Imagery.” IEEE Transactions on Geoscience and Remote Sensing 59 (7): 6169–6181. https://doi.org/10.1109/TGRS.2020.3026051.