Research Article

Incorporating convolutional and transformer architectures to enhance semantic segmentation of fine-resolution urban images

Article: 2361768 | Received 07 Feb 2024, Accepted 26 May 2024, Published online: 04 Jun 2024

References

  • Abdollahi, A., & Pradhan, B. (2021). Integrated technique of segmentation and classification methods with connected components analysis for road extraction from orthophoto images. Expert Systems with Applications, 176, 114908. https://doi.org/10.1016/j.eswa.2021.114908
  • Audebert, N., Le Saux, B., & Lefèvre, S. (2018). Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks. ISPRS Journal of Photogrammetry and Remote Sensing, 140, 20–32. https://doi.org/10.1016/j.isprsjprs.2017.11.011
  • Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., & Wang, M. (2023). Swin-Unet: Unet-like pure transformer for medical image segmentation. In L. Karlinsky, T. Michaeli, & K. Nishino (Eds.), Computer vision – ECCV 2022 workshops, lecture notes in computer science (pp. 205–218). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-25066-8_9
  • Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020). End-to-end object detection with transformers. https://doi.org/10.48550/arXiv.2005.12872
  • Caye Daudt, R., Le Saux, B., & Boulch, A. (2018). Fully convolutional siamese networks for change detection. 2018 25th IEEE International Conference on Image Processing (ICIP) (pp. 4063–4067). https://doi.org/10.1109/ICIP.2018.8451652
  • Chen, F., Jiang, H., Van de Voorde, T., Lu, S., Xu, W., & Zhou, Y. (2018). Land cover mapping in urban environments using hyperspectral APEX data: A study case in Baden, Switzerland. International Journal of Applied Earth Observation and Geoinformation, 71, 70–82. https://doi.org/10.1016/j.jag.2018.04.011
  • Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A., & Zhou, Y. (2021). TransUnet: Transformers make strong encoders for medical image segmentation. arXiv.org. Retrieved May 11, 2021, from https://arxiv.org/abs/2102.04306v1
  • Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv.org. Retrieved April 24, 2023, from https://arxiv.org/abs/1412.7062v4
  • Chen, F., Wang, K., Van de Voorde, T., & Tang, T. F. (2017). Mapping urban land cover from high spatial resolution hyperspectral data: An approach based on simultaneously unmixing similar pixels with jointly sparse spectral mixture analysis. Remote Sensing of Environment, 196, 324–342. https://doi.org/10.1016/j.rse.2017.05.014
  • Clark, M. L., & Kilham, N. E. (2016). Mapping of land cover in northern California with simulated hyperspectral satellite imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 119, 228–245. https://doi.org/10.1016/j.isprsjprs.2016.06.007
  • Coseo, P., & Larsen, L. (2019). Accurate characterization of land cover in urban environments: Determining the importance of including obscured impervious surfaces in urban heat island models. Atmosphere, 10(6), 347. https://doi.org/10.3390/atmos10060347
  • Ding, L., Lin, D., Lin, S., Zhang, J., Cui, X., Wang, Y., Tang, H., & Bruzzone, L. (2022). Looking outside the window: Wide-context transformer for the semantic segmentation of high-resolution remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 60, 1–13. https://doi.org/10.1109/TGRS.2022.3168697
  • Ding, L., Zhu, K., Peng, D., Tang, H., Yang, K., & Bruzzone, L. (2024). Adapting segment anything model for change detection in VHR remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 62, 1–11. https://doi.org/10.1109/TGRS.2024.3368168
  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. https://doi.org/10.48550/arXiv.2010.11929
  • Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., & Lu, H. (2019). Dual attention network for scene segmentation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3141–3149). https://doi.org/10.1109/CVPR.2019.00326
  • Gao, L., Liu, H., Yang, M., Chen, L., Wan, Y., Xiao, Z., & Qian, Y. (2021). STransFuse: Fusing swin transformer and convolutional neural network for remote sensing image semantic segmentation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14, 10990–11003. https://doi.org/10.1109/JSTARS.2021.3119654
  • Griffiths, D., & Boehm, J. (2019). Improving public data for building segmentation from convolutional neural networks (CNNs) for fused airborne lidar and image data using active contours. ISPRS Journal of Photogrammetry and Remote Sensing, 154, 70–83. https://doi.org/10.1016/j.isprsjprs.2019.05.013
  • He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824
  • Hou, Q., Cheng, M. M., Hu, X. W., Borji, A., Tu, Z., & Torr, P. (2019). Deeply supervised salient object detection with short connections. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4), 815–828. https://doi.org/10.1109/TPAMI.2018.2815688
  • Huang, X., Yuan, W., Li, J., & Zhang, L. (2017). A new building extraction postprocessing framework for high-spatial-resolution remote-sensing imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(2), 654–668. https://doi.org/10.1109/JSTARS.2016.2587324
  • Huang, X., & Zhang, L. (2012). Morphological building/shadow index for building extraction from high-resolution imagery over urban areas. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(1), 161–172. https://doi.org/10.1109/JSTARS.2011.2168195
  • Hu, P., Perazzi, F., Heilbron, F. C., Wang, O., Lin, Z., Saenko, K., & Sclaroff, S. (2021). Real-time semantic segmentation with fast attention. IEEE Robotics and Automation Letters, 6(1), 263–270. https://doi.org/10.1109/LRA.2020.3039744
  • Kampffmeyer, M., Salberg, A. B., & Jenssen, R. (2016). Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 680–688). https://doi.org/10.1109/CVPRW.2016.90
  • Kemker, R., Salvaggio, C., & Kanan, C. (2018). Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning. ISPRS Journal of Photogrammetry and Remote Sensing, 145, 60–77. https://doi.org/10.1016/j.isprsjprs.2018.04.014
  • Kotaridis, I., & Lazaridou, M. (2021). Remote sensing image segmentation advances: A meta-analysis. ISPRS Journal of Photogrammetry and Remote Sensing, 173, 309–322. https://doi.org/10.1016/j.isprsjprs.2021.01.020
  • Li, T., Jiang, C., Bian, Z., Wang, M., & Niu, X. (2020). Semantic segmentation of urban street scene based on convolutional neural network. Journal of Physics: Conference Series, 1682(1), 012077. https://doi.org/10.1088/1742-6596/1682/1/012077
  • Li, Y., Liu, Z., Yang, J., & Zhang, H. (2023). Wavelet transform feature enhancement for semantic segmentation of remote sensing images. Remote Sensing, 15(24), 5644. https://doi.org/10.3390/rs15245644
  • Lin, T., Wang, Y., Liu, X., & Qiu, X. (2021). A survey of transformers. arXiv.org. Retrieved June 16, 2023, from https://arxiv.org/abs/2106.04554v2
  • Liu, Q., Kampffmeyer, M., Jenssen, R., & Salberg, A. B. (2020). Dense dilated convolutions’ merging network for land cover classification. IEEE Transactions on Geoscience and Remote Sensing, 58(9), 6309–6320. https://doi.org/10.1109/TGRS.2020.2976658
  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (pp. 9992–10002). https://doi.org/10.1109/ICCV48922.2021.00986
  • Liu, Y., Minh Nguyen, D., Deligiannis, N., Ding, W., & Munteanu, A. (2017). Hourglass-ShapeNetwork based semantic segmentation for high resolution aerial imagery. Remote Sensing, 9(6), 522. https://doi.org/10.3390/rs9060522
  • Liu, R., Tao, F., Liu, X., Na, J., Leng, H., Wu, J., & Zhou, T. (2022). RAANet: A residual ASPP with attention framework for semantic segmentation of high-resolution remote sensing images. Remote Sensing, 14(13), 3109. https://doi.org/10.3390/rs14133109
  • Li, X., Xu, F., Xia, R., Lyu, X., Gao, H., & Tong, Y. (2021). Hybridizing cross-level contextual and attentive representations for remote sensing imagery semantic segmentation. Remote Sensing, 13(15), 2986. https://doi.org/10.3390/rs13152986
  • Li, R., Zheng, S., Duan, C., Su, J., & Zhang, C. (2022). Multistage attention ResU-net for semantic segmentation of fine-resolution remote sensing images. IEEE Geoscience and Remote Sensing Letters, 19, 1–5. https://doi.org/10.1109/LGRS.2021.3063381
  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3431–3440). https://doi.org/10.1109/CVPR.2015.7298965
  • Ma, L., Liu, Y., Zhang, X., Ye, Y., Yin, G., & Johnson, B. A. (2019). Deep learning in remote sensing applications: A meta-analysis and review. ISPRS Journal of Photogrammetry and Remote Sensing, 152, 166–177. https://doi.org/10.1016/j.isprsjprs.2019.04.015
  • Ma, X., Man, Q., Yang, X., Dong, P., Yang, Z., Wu, J., & Liu, C. (2023). Urban feature extraction within a complex urban area with an improved 3D-CNN using airborne hyperspectral data. Remote Sensing, 15(4), 992. https://doi.org/10.3390/rs15040992
  • Mezaal, M. R., Pradhan, B., Shafri, H. Z. M., & Yusoff, Z. M. (2017). Automatic landslide detection using Dempster–Shafer theory from LiDAR-derived data and orthophotos. Geomatics, Natural Hazards and Risk, 8(2), 1935–1954. https://doi.org/10.1080/19475705.2017.1401013
  • Mou, L., & Zhu, X. X. (2018). RiFCN: Recurrent network in fully convolutional network for semantic segmentation of high resolution remote sensing images. arXiv.org. Retrieved May 5, 2018, from https://doi.org/10.48550/arXiv.1805.0209
  • Nogueira, K., Dalla Mura, M., Chanussot, J., Schwartz, W. R., & dos Santos, J. A. (2019). Dynamic multicontext segmentation of remote sensing images based on convolutional networks. IEEE Transactions on Geoscience and Remote Sensing, 57(10), 7503–7520. https://doi.org/10.1109/TGRS.2019.2913861
  • Ok, A. O. (2013). Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts. ISPRS Journal of Photogrammetry and Remote Sensing, 86, 21–40. https://doi.org/10.1016/j.isprsjprs.2013.09.004
  • Oršić, M., & Šegvić, S. (2021). Efficient semantic segmentation with pyramidal fusion. Pattern Recognition, 110, 107611. https://doi.org/10.1016/j.patcog.2020.107611
  • Pan, X., Gao, L., Marinoni, A., Zhang, B., Yang, F., & Gamba, P. (2018). Semantic labeling of high resolution aerial imagery and LiDAR data with fine segmentation network. Remote Sensing, 10(5), 743. https://doi.org/10.3390/rs10050743
  • Romera, E., Álvarez, J. M., Bergasa, L. M., & Arroyo, R. (2018). ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 19(1), 263–272. https://doi.org/10.1109/TITS.2017.2750080
  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In N. Navab, J. Hornegger, W.M. Wells, & A.F. Frangi (Eds.), Medical image computing and computer-assisted intervention – MICCAI 2015, lecture notes in computer science (pp. 234–241). Springer International Publishing. https://doi.org/10.1007/978-3-319-24574-4_28
  • Samie, A., Abbas, A., Azeem, M. M., Hamid, S., Iqbal, M. A., Hasan, S. S., & Deng, X. (2020). Examining the impacts of future land use/land cover changes on climate in Punjab province, Pakistan: Implications for environmental sustainability and economic growth. Environmental Science and Pollution Research, 27(20), 25415–25433. https://doi.org/10.1007/s11356-020-08984-x
  • Shamsolmoali, P., Zareapoor, M., Zhou, H., Wang, R., & Yang, J. (2021). Road segmentation for remote sensing images using adversarial spatial pyramid networks. IEEE Transactions on Geoscience and Remote Sensing, 59(6), 4673–4688. https://doi.org/10.1109/TGRS.2020.3016086
  • Shen, Y., Chen, J., Xiao, L., & Pan, D. (2019). Optimizing multiscale segmentation with local spectral heterogeneity measure for high resolution remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing, 157, 13–25. https://doi.org/10.1016/j.isprsjprs.2019.08.014
  • Strudel, R., Garcia, R., Laptev, I., & Schmid, C. (2021). Segmenter: Transformer for semantic segmentation. arXiv.org. Retrieved May 11, 2021, from https://doi.org/10.48550/arXiv.2105.05633
  • Su, Y., Cheng, J., Bai, H., Liu, H., & He, C. (2022). Semantic segmentation of very-high-resolution remote sensing images via deep multi-feature learning. Remote Sensing, 14(3), 533. https://doi.org/10.3390/rs14030533
  • Sun, Y., Tian, Y., & Xu, Y. (2019). Problems of encoder-decoder frameworks for high-resolution remote sensing image segmentation: Structural stereotype and insufficient learning. Neurocomputing, 330, 297–304. https://doi.org/10.1016/j.neucom.2018.11.051
  • Tang, L., & Werner, T. T. (2023). Global mining footprint mapped from high-resolution satellite imagery. Communications Earth & Environment, 4(1), 1–12. https://doi.org/10.1038/s43247-023-00805-6
  • Tong, X. Y., Xia, G. S., Lu, Q., Shen, H., Li, S., You, S., & Zhang, L. (2020). Land-cover classification with high-resolution remote sensing images using transferable deep models. Remote Sensing of Environment, 237, 111322. https://doi.org/10.1016/j.rse.2019.111322
  • Ünsalan, C., & Boyer, K. L. (2005). A system to detect houses and residential street networks in multispectral satellite images. Computer Vision and Image Understanding, 98(3), 423–461. https://doi.org/10.1016/j.cviu.2004.10.006
  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. arXiv.org. Retrieved April 25, 2023, from https://arxiv.org/abs/1706.03762v5
  • Vobecky, A., Hurych, D., Siméoni, O., Gidaris, S., Bursuc, A., Pérez, P., & Sivic, J. (2022). Drive & segment: Unsupervised semantic segmentation of urban scenes via cross-modal distillation. Proceedings of the European Conference on Computer Vision (ECCV), 13679, 478–495. https://doi.org/10.48550/arXiv.2203.11160
  • Wang, Z., Chen, J., & Hoi, S. C. H. (2021). Deep learning for image super-resolution: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10), 3365–3387. https://doi.org/10.1109/TPAMI.2020.2982166
  • Wang, L., Li, R., Duan, C., Zhang, C., Meng, X., & Fang, S. (2022). A novel transformer based semantic segmentation scheme for fine-resolution remote sensing images. IEEE Geoscience and Remote Sensing Letters, 19, 1–5. https://doi.org/10.1109/LGRS.2022.3143368
  • Wang, L., Li, R., Zhang, C., Fang, S., Duan, C., Meng, X., & Atkinson, P. M. (2022). UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 190, 196–214. https://doi.org/10.1016/j.isprsjprs.2022.06.008
  • Wang, J., Song, J., Chen, M., & Yang, Z. (2015). Road network extraction: A neural-dynamic framework based on deep learning and a finite state machine. International Journal of Remote Sensing, 36(12), 3144–3169. https://doi.org/10.1080/01431161.2015.1054049
  • Wang, M., Zhang, X., Niu, X., Wang, F., & Zhang, X. (2019). Scene classification of high-resolution remotely sensed image based on ResNet. Journal of Geovisualization and Spatial Analysis, 3(2), 16. https://doi.org/10.1007/s41651-019-0039-9
  • Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J. M., & Luo, P. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems (NeurIPS), 34, 12077–12090. https://doi.org/10.48550/arXiv.2105.15203
  • Xing, J., Sieber, R., & Caelli, T. (2018). A scale-invariant change detection method for land use/cover change research. ISPRS Journal of Photogrammetry and Remote Sensing, 141, 252–264. https://doi.org/10.1016/j.isprsjprs.2018.04.013
  • Yang, L., Huang, C., Homer, C. G., Wylie, B. K., & Coan, M. J. (2003). An approach for mapping large-area impervious surfaces: Synergistic use of Landsat-7 ETM+ and high spatial resolution imagery. Canadian Journal of Remote Sensing, 29(2), 230–240. https://doi.org/10.5589/m02-098
  • Yang, X., Li, S., Chen, Z., Chanussot, J., Jia, X., Zhang, B., Li, B., & Chen, P. (2021). An attention-fused network for semantic segmentation of very-high-resolution remote sensing imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 177, 238–262. https://doi.org/10.1016/j.isprsjprs.2021.05.004
  • Yuan, Q., & Mohd Shafri, H. Z. (2022). Multi-modal feature fusion network with adaptive center point detector for building instance extraction. Remote Sensing, 14(19), 4920. https://doi.org/10.3390/rs14194920
  • Yuan, Q., Shafri, H. Z. M., Alias, A. H., & Hashim, S. J. B. (2021). Multiscale semantic feature optimization and fusion network for building extraction using high-resolution aerial images and LiDAR data. Remote Sensing, 13(13), 2473. https://doi.org/10.3390/rs13132473
  • Yu, B., Yang, L., & Chen, F. (2018). Semantic segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11(9), 3252–3261. https://doi.org/10.1109/JSTARS.2018.2860989
  • Zhang, B., Kong, Y., Leung, H., & Xing, S. (2019). Urban UAV images semantic segmentation based on fully convolutional networks with digital surface models. 2019 Tenth International Conference on Intelligent Control and Information Processing (ICICIP) (pp. 1–6). https://doi.org/10.1109/ICICIP47338.2019.9012207
  • Zhang, X., Li, L., Di, D., Wang, J., Chen, G., Jing, W., & Emam, M. (2022). SERNet: Squeeze and excitation residual network for semantic segmentation of high-resolution remote sensing images. Remote Sensing, 14(19), 4770. https://doi.org/10.3390/rs14194770
  • Zhang, Z., Liu, F., Liu, C., Tian, Q., & Qu, H. (2023). ACTNet: A dual-attention adapter with a CNN-transformer network for the semantic segmentation of remote sensing imagery. Remote Sensing, 15(9), 2363. https://doi.org/10.3390/rs15092363
  • Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid scene parsing network. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 6230–6239). https://doi.org/10.1109/CVPR.2017.660