Research Article

Exploring latent weight factors and global information for food-oriented cross-modal retrieval

Article: 2233714 | Received 04 Apr 2023, Accepted 03 Jul 2023, Published online: 28 Jul 2023

Figures & data

Figure 1. The overall framework of the proposed method.

Table 1. Symbols used in this paper and their meanings.

Table 2. Comparison of our proposed method with popular baseline methods. Bold values indicate the best scores.

Table 3. Food-oriented cross-modal retrieval results of the proposed method when fusing different recipe components. Bold values indicate the best scores.

Figure 2. Retrieval results for the latent weight factor with different rank values in our proposed LCWF-GI method on the Image-to-Recipe and Recipe-to-Image tasks on the 1 K and 10 K test sets. (a) Recall@1 of our LCWF-GI method on the Image-to-Recipe and Recipe-to-Image retrieval tasks on the 1 K test set; (b) Recall@1 of our LCWF-GI method on the Image-to-Recipe and Recipe-to-Image retrieval tasks on the 10 K test set.

Figure 3. An example of Recipe-to-Image retrieval by our proposed method and the HTL baseline method on the 10 K test set. The first column shows the query, a piece of recipe text. The second column shows the ground truth for the recipe text. The final column shows the top 5 retrieval results.

Figure 4. Results achieved by our LCWF-GI method and the best baseline method (HTL) on the two retrieval tasks on the 1 K test set for different numbers of training epochs. Solid curves represent our LCWF-GI method and dashed curves represent the HTL method. Orange, green and blue curves represent Recall@1, Recall@5 and Recall@10, respectively.

Table 4. Parameter counts and time per epoch for our LCWF-GI method and the best baseline method (HTL). M and s denote millions and seconds, respectively.

Figure 5. Visualisation of the attention mechanism in the transformer for the cooking instruction component. (a) Self-attention visualisation of a sentence in the cooking instruction component; (b) self-attention visualisation of the sentence in (a) and the next sentence in the cooking instruction component.
