Research Article

Cross-modal retrieval of remote sensing images and text based on self-attention unsupervised deep common feature space

Pages 3892-3909 | Received 27 Nov 2022, Accepted 08 Jun 2023, Published online: 12 Jul 2023

