Research Article

Exploring latent weight factors and global information for food-oriented cross-modal retrieval

Article: 2233714 | Received 04 Apr 2023, Accepted 03 Jul 2023, Published online: 28 Jul 2023
 

Abstract

Food-oriented cross-modal retrieval aims to retrieve relevant recipes given food images, or vice versa. The semantic gap between the two modalities, recipes (text) and food images, is the main challenge. Although several studies have sought to bridge this gap, they still suffer from two major limitations: 1) Simple embedding concatenation can capture only simple interactions, rather than complex ones, between different recipe components. 2) Image feature extraction based on convolutional neural networks considers only the local features of an image and ignores its global features, as well as the interactions between the extracted features. This paper proposes a novel method based on Latent Component Weight Factors and Global Information (LCWF-GI) to learn robust recipe and image representations for food-oriented cross-modal retrieval. The proposed method uses latent component-specific weight factors to integrate the textual embeddings of different recipe components into a compact recipe embedding. A transformer encoder is utilised to capture the intra-modality interactions among the extracted image features and their relative importance, yielding enhanced image representations. Finally, a bi-directional triplet loss is used for retrieval learning. Experimental results on the Recipe1M dataset show that our LCWF-GI method achieves competitive improvements.
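
To make the retrieval objective concrete, the following is a minimal sketch of a generic bi-directional triplet loss over paired image and recipe embeddings. It is not the authors' implementation: the cosine distance, the margin value of 0.3, the use of every other item in the batch as a negative, and the function names are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_term(anchor, positive, negative, margin):
    """Hinge term: the positive should be closer than the negative by `margin`."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


def bidirectional_triplet_loss(image_embs, recipe_embs, margin=0.3):
    """Average triplet loss in both retrieval directions.

    image_embs[k] and recipe_embs[k] are assumed to be a matching pair;
    every other item in the batch serves as a negative (an assumption).
    """
    n = len(image_embs)
    total, count = 0.0, 0
    for k in range(n):
        for j in range(n):
            if j == k:
                continue
            # Image-to-recipe direction: image anchor, recipe positive/negative.
            total += triplet_term(image_embs[k], recipe_embs[k], recipe_embs[j], margin)
            # Recipe-to-image direction: recipe anchor, image positive/negative.
            total += triplet_term(recipe_embs[k], image_embs[k], image_embs[j], margin)
            count += 2
    return total / count if count else 0.0
```

Under these assumptions, the loss is zero when every matching pair is closer than every mismatched pair by at least the margin, and it grows as mismatched pairs become closer than matched ones.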

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

This work was supported in part by the Hunan Provincial Natural Science Foundation of China [grant no 2022JJ30020], the Guangdong Basic and Applied Basic Research Foundation of China [grant no 2023A1515012718], the Philosophy and Social Sciences 14th Five-Year Plan Project of Guangdong Province [grant no GD23CTS03], the Scientific Research Fund of Hunan Provincial Education Department [grant no 21A0319], and the Hunan Provincial Innovation Foundation for Postgraduate [grant no CX20210986].