Research Article

CoupleUNet: Swin Transformer coupling CNNs makes strong contextual encoders for VHR image road extraction

Pages 5788-5813 | Received 08 Jun 2023, Accepted 22 Aug 2023, Published online: 20 Sep 2023

