Research Article

CoupleUNet: Swin Transformer coupling CNNs makes strong contextual encoders for VHR image road extraction

Pages 5788-5813 | Received 08 Jun 2023, Accepted 22 Aug 2023, Published online: 20 Sep 2023

