Research Article

Exploring latent weight factors and global information for food-oriented cross-modal retrieval

Article: 2233714 | Received 04 Apr 2023, Accepted 03 Jul 2023, Published online: 28 Jul 2023
