Research Article

Cross-modal retrieval of remote sensing images and text based on self-attention unsupervised deep common feature space

Pages 3892-3909 | Received 27 Nov 2022, Accepted 08 Jun 2023, Published online: 12 Jul 2023

