Research Article

Cross-modal retrieval of remote sensing images and text based on self-attention unsupervised deep common feature space

Pages 3892-3909 | Received 27 Nov 2022, Accepted 08 Jun 2023, Published online: 12 Jul 2023
 

ABSTRACT

With the growth of multimodal data such as remote sensing images, text, and video, cross-modal retrieval is widely applied in many fields. Because data from different modalities are heterogeneous, the main aim of cross-modal retrieval is to bridge their ‘semantic gap’. However, few cross-modal retrieval studies focus on remote sensing images and text, and the fine-grained information in remote sensing images and text is not fully considered by existing methods. To improve the retrieval accuracy of remote sensing images and text, we propose a self-attention unsupervised deep common feature space cross-modal retrieval method. First, a self-attention mechanism is applied to capture the fine-grained relationships between word fragments of the text and the remote sensing images. Then, a deep learning method and a triplet loss over remote sensing image-text pairs are used to extract the features of remote sensing images and text and to generate a semantically consistent common feature space, in which the feature representations of remote sensing images and text follow a uniform distribution. Experimental results on three benchmark cross-modal datasets, RSICD, UCM_captions, and Sydney_captions, show that the proposed method outperforms state-of-the-art methods. For example, compared with the best baseline (VSE++) on RSICD, the proposed model achieved 2.1% and 5.6% higher precision in text-to-image and image-to-text retrieval, respectively, and 3.8% higher mean average precision. However, like the other models tested, the proposed model performed worse when retrieving text from an image query than when retrieving images from a text query.
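The abstract describes two core components: a self-attention pass over fine-grained fragments of each modality, and a triplet loss that pulls matched image-text pairs together in a common feature space. Since the paper's code is not reproduced here, the following is a minimal PyTorch sketch of those two ideas; the module names, feature dimensions, mean pooling, and in-batch negative sampling are all illustrative assumptions, not the authors' implementation.

```python
# Sketch: self-attention encoders for each modality plus a bidirectional
# triplet loss over a shared embedding space. All hyperparameters below
# (dimensions, heads, margin) are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionEncoder(nn.Module):
    """Maps per-fragment features (image regions or text words) to a single
    unit-norm vector in the common feature space via one self-attention pass."""
    def __init__(self, in_dim: int, embed_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(in_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_fragments, in_dim)
        attended, _ = self.attn(x, x, x)       # fragment-to-fragment attention
        pooled = attended.mean(dim=1)          # mean pooling (an assumption)
        return F.normalize(self.proj(pooled), dim=-1)

def triplet_loss(img: torch.Tensor, txt: torch.Tensor,
                 margin: float = 0.2) -> torch.Tensor:
    """Bidirectional triplet loss with in-batch negatives: matched pairs lie on
    the diagonal of the similarity matrix; off-diagonal entries are negatives."""
    sim = img @ txt.t()                        # cosine similarity (inputs unit-norm)
    pos = sim.diag().unsqueeze(1)              # similarity of matched pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_i2t = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    cost_t2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    return cost_i2t.mean() + cost_t2i.mean()

# Usage: encode a batch of image-region and word features, compute the loss.
img_enc = SelfAttentionEncoder(in_dim=512, embed_dim=256)
txt_enc = SelfAttentionEncoder(in_dim=300, embed_dim=256)
regions = torch.randn(8, 36, 512)  # 8 images x 36 region features (assumed shapes)
words = torch.randn(8, 20, 300)    # 8 captions x 20 word embeddings
loss = triplet_loss(img_enc(regions), txt_enc(words))
```

One design note: this sketch averages the hinge over all in-batch negatives, whereas VSE++, the baseline the abstract compares against, keeps only the hardest negative per pair; which variant the proposed method uses is not stated in the abstract.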

Disclosure statement

No potential conflict of interest was reported by the authors.

Data Availability Statement

Data are available at https://github.com/201528014227051/RSICD_optimal.
