Research Article

Exploring latent weight factors and global information for food-oriented cross-modal retrieval

Article: 2233714 | Received 04 Apr 2023, Accepted 03 Jul 2023, Published online: 28 Jul 2023
