Research Article

Spatial-specific Transformer with involution for semantic segmentation of high-resolution remote sensing images
Pages 1280-1307 | Received 15 Sep 2022, Accepted 08 Feb 2023, Published online: 08 Mar 2023

References

  • Chen, L.C., Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. 2018. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” In 2018 European Conference on Computer Vision, 801–818. Munich, Germany. doi:10.1007/978-3-030-01234-2_49.
  • Chollet, F. 2017. “Xception: Deep Learning with Depthwise Separable Convolutions.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 1251–1258. Honolulu, USA. doi:10.1109/CVPR.2017.195.
  • Dosovitskiy, A., L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, and S. Gelly. 2020. “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.” In International Conference on Learning Representations, 1–22. Virtual.
  • Esser, P., R. Rombach, and B. Ommer. 2021. “Taming Transformers for High-Resolution Image Synthesis.” In 2021 IEEE Conference on Computer Vision and Pattern Recognition, 12873–12883. Nashville, USA. doi:10.1109/CVPR46437.2021.01268.
  • Fu, J., J. Liu, H. Tian, L. Yong, Y. Bao, Z. Fang, and L. Hanqing. 2019. “Dual Attention Network for Scene Segmentation.” In 2019 IEEE Conference on Computer Vision and Pattern Recognition, 3146–3154. Long Beach, USA. doi:10.1109/CVPR.2019.00326.
  • Gao, L., H. Liu, M. Yang, L. Chen, Y. Wan, Z. Xiao, and Y. Qian. 2021. “STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 10990–11003. doi:10.1109/JSTARS.2021.3119654.
  • He, J., Z. Deng, and Y. Qiao. 2019. “Dynamic Multi-Scale Filters for Semantic Segmentation.” In 2019 IEEE International Conference on Computer Vision, 3562–3572. Seoul, Korea (South). doi:10.1109/ICCV.2019.00366.
  • He, J., Z. Deng, L. Zhou, Y. Wang, and Y. Qiao. 2019. “Adaptive Pyramid Context Network for Semantic Segmentation.” In 2019 IEEE Conference on Computer Vision and Pattern Recognition, 7519–7528. Long Beach, USA. doi:10.1109/CVPR.2019.00770.
  • Hu, J., L. Shen, and G. Sun. 2018. “Squeeze-And-Excitation Networks.” In 2018 IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141. Salt Lake City, USA. doi:10.1109/TPAMI.2019.2913372.
  • Khan, N., U. Chaudhuri, B. Banerjee, and S. Chaudhuri. 2019. “Graph Convolutional Network for Multi-Label VHR Remote Sensing Scene Recognition.” Neurocomputing 357: 36–46. doi:10.1016/j.neucom.2019.05.024.
  • Khan, S., M. Naseer, M. Hayat, S. Waqas Zamir, F. Shahbaz Khan, and M. Shah. 2021. “Transformers in Vision: A Survey.” ACM Computing Surveys (CSUR) 54: 1–41.
  • Kolesnikov, A., L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, and N. Houlsby. 2020. “Big Transfer (BiT): General Visual Representation Learning.” In 2020 European Conference on Computer Vision, 491–507. Virtual.
  • LeCun, Y., Y. Bengio, and G. Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–444. doi:10.1038/nature14539.
  • LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324. doi:10.1109/5.726791.
  • Li, D., J. Hu, C. Wang, X. Li, Q. She, L. Zhu, T. Zhang, and Q. Chen. 2021. “Involution: Inverting the Inherence of Convolution for Visual Recognition.” In 2021 IEEE Conference on Computer Vision and Pattern Recognition, 12316–12325. Virtual. doi:10.1109/cvpr46437.2021.01214.
  • Lin, T.Y., P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. 2017. “Feature Pyramid Networks for Object Detection.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2117–2125. Honolulu, USA. doi:10.1109/CVPR.2017.106.
  • Liu, Z., H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, and L. Dong. 2022. “Swin Transformer V2: Scaling Up Capacity and Resolution.” In 2022 IEEE Conference on Computer Vision and Pattern Recognition, 12009–12019. New Orleans, USA.
  • Liu, Z., Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. 2021. “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows.” In 2021 IEEE International Conference on Computer Vision, 10012–10022. Montreal, Canada. doi:10.1109/ICCV48922.2021.00986.
  • Li, X., F. Xu, X. Lyu, H. Gao, Y. Tong, S. Cai, S. Li, and D. Liu. 2021. “Dual Attention Deep Fusion Semantic Segmentation Networks of Large-Scale Satellite Remote-Sensing Images.” International Journal of Remote Sensing 42 (9): 3583–3610. doi:10.1080/01431161.2021.1876272.
  • Li, Y., K. Zhang, J. Cao, R. Timofte, and L. Van Gool. 2021. “LocalViT: Bringing Locality to Vision Transformers.” doi:10.48550/arXiv.2104.05707.
  • Long, J., E. Shelhamer, and T. Darrell. 2015. “Fully Convolutional Networks for Semantic Segmentation.” In 2015 IEEE Conference on Computer Vision and Pattern Recognition, 3431–3440. Boston, USA. doi:10.1109/TPAMI.2016.2572683.
  • Loshchilov, I., and F. Hutter. 2017. “Decoupled Weight Decay Regularization.” arXiv preprint arXiv:.05101.
  • Ma, L., Y. Liu, X. Zhang, Y. Ye, G. Yin, and B. Alan Johnson. 2019. “Deep Learning in Remote Sensing Applications: A Meta-Analysis and Review.” ISPRS Journal of Photogrammetry and Remote Sensing 152: 166–177. doi:10.1016/j.isprsjprs.2019.04.015.
  • Pan, X., C. Ge, R. Lu, S. Song, G. Chen, Z. Huang, and G. Huang. 2022. “On the Integration of Self-Attention and Convolution.” In 2022 IEEE Conference on Computer Vision and Pattern Recognition, 815–825. New Orleans, USA.
  • Peng, Z., W. Huang, S. Gu, L. Xie, Y. Wang, J. Jiao, and Q. Ye. 2021. “Conformer: Local Features Coupling Global Representations for Visual Recognition.” In 2021 IEEE International Conference on Computer Vision, 367–376. Virtual. doi:10.1109/ICCV48922.2021.00042.
  • Ronneberger, O., P. Fischer, and T. Brox. 2015. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” In 2015 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241. Munich, Germany. doi:10.1007/978-3-319-24574-4_28.
  • Sun, K., B. Xiao, D. Liu, and J. Wang. 2019. “Deep High-Resolution Representation Learning for Human Pose Estimation.” In 2019 IEEE Conference on Computer Vision and Pattern Recognition, 5693–5703. Long Beach, USA. doi:10.1109/CVPR.2019.00584.
  • Tian, J., J. Zhang, W. Li, J. Li, and L. Zhuo. 2022. “Structurally Re-Parameterized Rotation Detector for Arbitrary-Oriented Objects in High-Resolution Remote Sensing Images.” International Journal of Remote Sensing 43 (1): 241–269. doi:10.1080/01431161.2021.2012294.
  • Tolstikhin, I. O., N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, et al. 2021. “MLP-Mixer: An All-MLP Architecture for Vision.” Advances in Neural Information Processing Systems 34: 24261–24272.
  • Touvron, H., M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou. 2021. “Training Data-Efficient Image Transformers & Distillation Through Attention.” In International Conference on Machine Learning, 10347–10357. Virtual.
  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. “Attention is All You Need.” In 2017 Neural Information Processing Systems, 5998–6008. Long Beach, USA.
  • Wang, L., S. Fang, C. Zhang, R. Li, and C. Duan. 2021. “Efficient Hybrid Transformer: Learning Global-Local Context for Urban Scene Segmentation.” arXiv preprint arXiv:.08937.
  • Woo, S., J. Park, J.Y. Lee, and I. So Kweon. 2018. “CBAM: Convolutional Block Attention Module.” In 2018 European Conference on Computer Vision, 3–19. Munich, Germany. doi:10.1007/978-3-030-01234-2_1.
  • Xiao, T., Y. Liu, B. Zhou, Y. Jiang, and J. Sun. 2018. “Unified Perceptual Parsing for Scene Understanding.” In 2018 European Conference on Computer Vision, 418–434. Munich, Germany. doi:10.1007/978-3-030-01228-1_26.
  • Xie, S., R. Girshick, P. Dollár, Z. Tu, and K. He. 2017. “Aggregated Residual Transformations for Deep Neural Networks.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 1492–1500. Honolulu, USA. doi:10.1109/CVPR.2017.634.
  • Xu, Z., C. Su, and X. Zhang. 2021. “A Semantic Segmentation Method with Category Boundary for Land Use and Land Cover Mapping of Very-High Resolution Remote Sensing Image.” International Journal of Remote Sensing 42 (8): 3146–3165. doi:10.1080/01431161.2020.1871100.
  • Xu, Z., W. Zhang, T. Zhang, and J. Li. 2020. “HRCNet: High-Resolution Context Extraction Network for Semantic Segmentation of Remote Sensing Images.” Remote Sensing 13 (1): 71. doi:10.3390/rs13010071.
  • Xu, Z., W. Zhang, T. Zhang, Z. Yang, and J. Li. 2021. “Efficient Transformer for Remote Sensing Image Segmentation.” Remote Sensing 13 (18): 3585. doi:10.3390/rs13183585.
  • Yan, H., C. Zhang, and W. Ming. 2022. “Lawin Transformer: Improving Semantic Segmentation Transformer with Multi-Scale Representations via Large Window Attention.” arXiv preprint arXiv:.01615.
  • Yuan, X., J. Shi, and L. Gu. 2021. “A Review of Deep Learning Methods for Semantic Segmentation of Remote Sensing Imagery.” Expert Systems with Applications 169: 114417. doi:10.1016/j.eswa.2020.114417.
  • Yu, Q., Y. Xia, Y. Bai, Y. Lu, A. L. Yuille, and W. Shen. 2021. “Glance-And-Gaze Vision Transformer.” Advances in Neural Information Processing Systems 34: 12992–13003.
  • Zang, N., Y. Cao, Y. Wang, B. Huang, L. Zhang, and P. Takis Mathiopoulos. 2021. “Land-Use Mapping for High-Spatial Resolution Remote Sensing Image via Deep Learning: A Review.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 5372–5391. doi:10.1109/JSTARS.2021.3078631.
  • Zhao, H., J. Shi, X. Qi, X. Wang, and J. Jia. 2017. “Pyramid Scene Parsing Network.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2881–2890. Honolulu, USA. doi:10.1109/CVPR.2017.660.
  • Zhao, X., J. Zhang, J. Tian, L. Zhuo, and J. Zhang. 2021. “Multiscale Object Detection in High-Resolution Remote Sensing Images via Rotation Invariant Deep Features Driven by Channel Attention.” International Journal of Remote Sensing 42 (15): 5764–5783. doi:10.1080/01431161.2021.1931537.
  • Zheng, S., J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, and P. H. Torr. 2021. “Rethinking Semantic Segmentation from a Sequence-To-Sequence Perspective with Transformers.” In 2021 IEEE Conference on Computer Vision and Pattern Recognition, 6881–6890. Virtual. doi:10.1109/CVPR46437.2021.00681.
