International Journal of Remote Sensing Volume 44, 2023 - Issue 4

Research Article

Spatial-specific Transformer with involution for semantic segmentation of high-resolution remote sensing images

Xinjia Wu (a)

Jing Zhang (a, b), corresponding author, https://orcid.org/0000-0003-1290-0738

Wensheng Li (a)

Jiafeng Li (a, b)

Li Zhuo (a, b), https://orcid.org/0000-0002-9937-2669

Jie Zhang (c)

a Faculty of Information Technology, Beijing University of Technology, Beijing, China
b Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing, China
c Institute of Mathematical Geology Remote Sensing, China University of Geosciences, Wuhan, China

Pages 1280-1307 | Received 15 Sep 2022, Accepted 08 Feb 2023, Published online: 08 Mar 2023

https://doi.org/10.1080/01431161.2023.2179897

References

Chen, L.C., Y. Zhu, G. Papandreou, F. Schroff, and H. Adam. 2018. “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.” In 2018 European Conference on Computer Vision, 801–818. Munich, Germany. doi:10.1007/978-3-030-01234-2_49.
Chollet, F. 2017. “Xception: Deep Learning with Depthwise Separable Convolutions.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 1251–1258. Honolulu, USA. doi:10.1109/CVPR.2017.195.
Dosovitskiy, A., L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, and S. Gelly. 2020. “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.” In International Conference on Learning Representations, Virtual, 1–22.
Esser, P., R. Rombach, and B. Ommer. 2021. “Taming Transformers for High-Resolution Image Synthesis.” In 2021 IEEE Conference on Computer Vision and Pattern Recognition, 12873–12883. Nashville, USA. doi:10.1109/CVPR46437.2021.01268.
Fu, J., J. Liu, H. Tian, L. Yong, Y. Bao, Z. Fang, and L. Hanqing. 2019. “Dual Attention Network for Scene Segmentation.” In 2019 IEEE Conference on Computer Vision and Pattern Recognition, 3146–3154. Long Beach, USA. doi:10.1109/CVPR.2019.00326.
Gao, L., H. Liu, M. Yang, L. Chen, Y. Wan, Z. Xiao, and Y. Qian. 2021. “STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 10990–11003. doi:10.1109/JSTARS.2021.3119654.
He, J., Z. Deng, and Y. Qiao. 2019. “Dynamic Multi-Scale Filters for Semantic Segmentation.” In 2019 IEEE International Conference on Computer Vision, 3562–3572. Seoul, Korea (South). doi:10.1109/ICCV.2019.00366.
He, J., Z. Deng, L. Zhou, Y. Wang, and Y. Qiao. 2019. “Adaptive Pyramid Context Network for Semantic Segmentation.” In 2019 IEEE Conference on Computer Vision and Pattern Recognition, 7519–7528. Long Beach, USA. doi:10.1109/CVPR.2019.00770.
Hu, J., L. Shen, and G. Sun. 2018. “Squeeze-And-Excitation Networks.” In 2018 IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141. Salt Lake City, USA. doi:10.1109/TPAMI.2019.2913372.
Khan, N., U. Chaudhuri, B. Banerjee, and S. Chaudhuri. 2019. “Graph Convolutional Network for Multi-Label VHR Remote Sensing Scene Recognition.” Neurocomputing 357: 36–46. doi:10.1016/j.neucom.2019.05.024.
Khan, S., M. Naseer, M. Hayat, S. Waqas Zamir, F. Shahbaz Khan, and M. Shah. 2021. “Transformers in Vision: A Survey.” ACM Computing Surveys (CSUR) 54: 1–41.
Kolesnikov, A., L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, and N. Houlsby. 2020. “Big Transfer (BiT): General Visual Representation Learning.” In 2020 European Conference on Computer Vision, Virtual, 491–507.
LeCun, Y., Y. Bengio, and G. Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–444. doi:10.1038/nature14539.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324. doi:10.1109/5.726791.
Li, D., J. Hu, C. Wang, X. Li, Q. She, L. Zhu, T. Zhang, and Q. Chen. 2021. “Involution: Inverting the Inherence of Convolution for Visual Recognition.” In 2021 IEEE Conference on Computer Vision and Pattern Recognition, 12316–12325. Virtual. doi:10.1109/cvpr46437.2021.01214.
Lin, T.Y., P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. 2017. “Feature Pyramid Networks for Object Detection.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2117–2125. Honolulu, USA. doi:10.1109/CVPR.2017.106.
Liu, Z., H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, and L. Dong. 2022. “Swin Transformer V2: Scaling Up Capacity and Resolution.” In 2022 IEEE Conference on Computer Vision and Pattern Recognition, 12009–12019. New Orleans, USA.
Liu, Z., Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. 2021. “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows.” In 2021 IEEE International Conference on Computer Vision, 10012–10022. Montreal, Canada. doi:10.1109/ICCV48922.2021.00986.
Li, X., F. Xu, X. Lyu, H. Gao, Y. Tong, S. Cai, S. Li, and D. Liu. 2021. “Dual Attention Deep Fusion Semantic Segmentation Networks of Large-Scale Satellite Remote-Sensing Images.” International Journal of Remote Sensing 42 (9): 3583–3610. doi:10.1080/01431161.2021.1876272.
Li, Y., K. Zhang, J. Cao, R. Timofte, and L. Van Gool. 2021. “LocalViT: Bringing Locality to Vision Transformers.” doi:10.48550/arXiv.2104.05707.
Long, J., E. Shelhamer, and T. Darrell. 2015. “Fully Convolutional Networks for Semantic Segmentation.” In 2015 IEEE Conference on Computer Vision and Pattern Recognition, 3431–3440. Boston, USA. doi:10.1109/TPAMI.2016.2572683.
Loshchilov, I., and F. Hutter. 2017. “Decoupled Weight Decay Regularization.” arXiv preprint arXiv:.05101.
Ma, L., Y. Liu, X. Zhang, Y. Ye, G. Yin, and B. Alan Johnson. 2019. “Deep Learning in Remote Sensing Applications: A Meta-Analysis and Review.” ISPRS Journal of Photogrammetry and Remote Sensing 152: 166–177. doi:10.1016/j.isprsjprs.2019.04.015.
Pan, X., C. Ge, R. Lu, S. Song, G. Chen, Z. Huang, and G. Huang. 2022. “On the Integration of Self-Attention and Convolution.” In 2022 IEEE Conference on Computer Vision and Pattern Recognition, 815–825. New Orleans, USA.
Peng, Z., W. Huang, S. Gu, L. Xie, Y. Wang, J. Jiao, and Q. Ye. 2021. “Conformer: Local Features Coupling Global Representations for Visual Recognition.” In 2021 IEEE International Conference on Computer Vision, 367–376. Virtual. doi:10.1109/ICCV48922.2021.00042.
Ronneberger, O., P. Fischer, and T. Brox. 2015. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” In 2015 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, 234–241. Munich, Germany. doi:10.1007/978-3-319-24574-4_28.
Sun, K., B. Xiao, D. Liu, and J. Wang. 2019. “Deep High-Resolution Representation Learning for Human Pose Estimation.” In 2019 IEEE Conference on Computer Vision and Pattern Recognition, 5693–5703. Long Beach, USA. doi:10.1109/CVPR.2019.00584.
Tian, J., J. Zhang, W. Li, J. Li, and L. Zhuo. 2022. “Structurally Re-Parameterized Rotation Detector for Arbitrary-Oriented Objects in High-Resolution Remote Sensing Images.” International Journal of Remote Sensing 43 (1): 241–269. doi:10.1080/01431161.2021.2012294.
Tolstikhin, I. O., N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, et al. 2021. “MLP-Mixer: An All-MLP Architecture for Vision.” Advances in Neural Information Processing Systems 34: 24261–24272.
Touvron, H., M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou. 2021. “Training Data-Efficient Image Transformers & Distillation Through Attention.” In International Conference on Machine Learning, Virtual, 10347–10357.
Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. “Attention is All You Need.” In 2017 Neural Information Processing Systems, 5998–6008. Long Beach, USA.
Wang, L., S. Fang, C. Zhang, R. Li, and C. Duan. 2021. “Efficient Hybrid Transformer: Learning Global-Local Context for Urban Scene Segmentation.” arXiv preprint arXiv:.08937.
Woo, S., J. Park, J.Y. Lee, and I. So Kweon. 2018. “CBAM: Convolutional Block Attention Module.” In 2018 European Conference on Computer Vision, 3–19. Munich, Germany. doi:10.1007/978-3-030-01234-2_1.
Xiao, T., Y. Liu, B. Zhou, Y. Jiang, and J. Sun. 2018. “Unified Perceptual Parsing for Scene Understanding.” In 2018 European Conference on Computer Vision, 418–434. Munich, Germany. doi:10.1007/978-3-030-01228-1_26.
Xie, S., R. Girshick, P. Dollár, Z. Tu, and K. He. 2017. “Aggregated Residual Transformations for Deep Neural Networks.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 1492–1500. Honolulu, USA. doi:10.1109/CVPR.2017.634.
Xu, Z., C. Su, and X. Zhang. 2021. “A Semantic Segmentation Method with Category Boundary for Land Use and Land Cover Mapping of Very-High Resolution Remote Sensing Image.” International Journal of Remote Sensing 42 (8): 3146–3165. doi:10.1080/01431161.2020.1871100.
Xu, Z., W. Zhang, T. Zhang, and J. Li. 2020. “HRCNet: High-Resolution Context Extraction Network for Semantic Segmentation of Remote Sensing Images.” Remote Sensing 13 (1): 71. doi:10.3390/rs13010071.
Xu, Z., W. Zhang, T. Zhang, Z. Yang, and J. Li. 2021. “Efficient Transformer for Remote Sensing Image Segmentation.” Remote Sensing 13 (18): 3585. doi:10.3390/rs13183585.
Yan, H., C. Zhang, and W. Ming. 2022. “Lawin Transformer: Improving Semantic Segmentation Transformer with Multi-Scale Representations via Large Window Attention.” arXiv preprint arXiv:.01615.
Yuan, X., J. Shi, and L. Gu. 2021. “A Review of Deep Learning Methods for Semantic Segmentation of Remote Sensing Imagery.” Expert Systems with Applications 169: 114417. doi:10.1016/j.eswa.2020.114417.
Yu, Q., Y. Xia, Y. Bai, Y. Lu, A. L. Yuille, and W. Shen. 2021. “Glance-And-Gaze Vision Transformer.” Advances in Neural Information Processing Systems 34: 12992–13003.
Zang, N., Y. Cao, Y. Wang, B. Huang, L. Zhang, and P. Takis Mathiopoulos. 2021. “Land-Use Mapping for High-Spatial Resolution Remote Sensing Image via Deep Learning: A Review.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14: 5372–5391. doi:10.1109/JSTARS.2021.3078631.
Zhao, H., J. Shi, X. Qi, X. Wang, and J. Jia. 2017. “Pyramid Scene Parsing Network.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2881–2890. Honolulu, USA. doi:10.1109/CVPR.2017.660.
Zhao, X., J. Zhang, J. Tian, L. Zhuo, and J. Zhang. 2021. “Multiscale Object Detection in High-Resolution Remote Sensing Images via Rotation Invariant Deep Features Driven by Channel Attention.” International Journal of Remote Sensing 42 (15): 5764–5783. doi:10.1080/01431161.2021.1931537.
Zheng, S., J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, and P. H. Torr. 2021. “Rethinking Semantic Segmentation from a Sequence-To-Sequence Perspective with Transformers.” In 2021 IEEE Conference on Computer Vision and Pattern Recognition, 6881–6890. Virtual. doi:10.1109/CVPR46437.2021.00681.
