Abstract
Multimodal learning is a framework for building models that make predictions from several different types of modalities. Two important challenges in multimodal learning are inferring shared representations from arbitrary modalities and performing cross-modal generation via these representations; achieving both requires accounting for the heterogeneous nature of multimodal data. In recent years, deep generative models, i.e. generative models whose distributions are parameterized by deep neural networks, have attracted much attention, especially variational autoencoders, which are well suited to these challenges because they can account for heterogeneity and infer good representations of the data. Accordingly, various multimodal generative models based on variational autoencoders, called multimodal deep generative models, have been proposed in recent years. In this paper, we provide a categorized survey of studies on multimodal deep generative models.
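To make the abstract's two challenges concrete, the following is a minimal, hypothetical sketch (not any specific model from the survey) of the shared-latent-variable idea behind multimodal VAEs: each modality has an encoder that outputs Gaussian parameters over one shared latent space, a sample is drawn with the reparameterization trick, and a decoder for another modality can then generate from that sample (cross-modal generation). All dimensions and weights here are illustrative; the encoders and decoders are plain linear maps rather than deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (illustrative only).
DX, DY, DZ = 4, 3, 2  # modality x, modality y, shared latent z


def encode(x, w_mu, w_lv):
    # A linear "encoder" mapping an input to the parameters
    # (mean, log-variance) of a diagonal Gaussian over the latent space.
    return x @ w_mu, x @ w_lv


def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps: the reparameterization trick used by VAEs.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_standard_normal(mu, logvar):
    # Closed-form KL(q(z|x) || N(0, I)) for diagonal Gaussians;
    # this is the regularization term of the VAE objective.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)


# Random encoder weights for modality x; both modalities would share
# the same latent space in a full model.
wx_mu = rng.normal(size=(DX, DZ))
wx_lv = rng.normal(size=(DX, DZ)) * 0.1

x = rng.normal(size=(5, DX))            # a batch from modality x
mu, logvar = encode(x, wx_mu, wx_lv)    # infer shared-representation parameters
z = reparameterize(mu, logvar, rng)     # sample a shared representation

# Cross-modal generation: decode modality y from z inferred from x alone.
wz_y = rng.normal(size=(DZ, DY))
y_hat = z @ wz_y

kl = kl_standard_normal(mu, logvar)
print(z.shape, y_hat.shape, bool(np.all(kl >= 0)))
```

In a real multimodal VAE the linear maps would be deep neural networks and the per-modality inference distributions would be combined (e.g. into a joint model, as discussed in the survey); the sketch only shows why one shared latent space enables generation across modalities.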
GRAPHICAL ABSTRACT
![](/cms/asset/04e60726-d581-4f45-9584-0deb29e2ac62/tadr_a_2035253_uf0001_oc.jpg)
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 means a subset of multimodal data different from .
2 In the original text, a shared representation is called a joint representation; we have changed this terminology to match that used in this paper. Strictly speaking, there is a subtle difference between a shared representation and a joint representation: the former refers to a space shared by different modalities, while the latter refers to the shared representation of the joint model (see Section 5), i.e. a space in which different modalities are fused.
3 In this paper, we use the term ‘encoder’ to refer to any form of mapping from an input space to a latent space, whether deterministic or probabilistic, and ‘inference distribution’ to refer specifically to the conditional distribution .
4 Note that the inference distributions are the same when the outputs of the encoder networks (i.e. the parameters of the inference distributions) are the same; the trainable parameters of these networks need not all be equal.
5 The 2-Wasserstein distance is the p-Wasserstein distance of order p = 2.
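For reference, the standard definition the note appeals to is the p-Wasserstein distance between two probability measures $\mu$ and $\nu$ on a metric space with distance $d$, where $\Gamma(\mu,\nu)$ denotes the set of couplings of $\mu$ and $\nu$:

```latex
W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)}
  \int d(x, y)^p \, \mathrm{d}\gamma(x, y) \right)^{1/p}
```

Setting $p = 2$ yields the 2-Wasserstein distance mentioned in the note.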
6 In the original paper [Citation66], JVAE and JMVAE are referred to as JMVAE and JMVAE-kl, respectively. However, because some papers refer to JMVAE-kl as JMVAE [Citation65,Citation67], we follow them to avoid confusion in terminology.
7 Recall that deep generative models are generative models whose distributions are parameterized by deep neural networks; therefore, if these models are large, their implementation can be much more complex than the implementation of regular generative models.
Additional information
Funding
Notes on contributors
Masahiro Suzuki
Masahiro Suzuki is a project assistant professor in the Graduate School of Engineering at the University of Tokyo. He was formerly a project researcher at the University of Tokyo from 2018 to 2020. He received his PhD from the University of Tokyo in 2018 and his MS degree from Hokkaido University in 2015. His research interests are in deep generative models, multimodal learning, and transfer learning.
Yutaka Matsuo
Yutaka Matsuo is a professor at the Graduate School of Engineering, the University of Tokyo. He received his BS, MS, and PhD degrees from the University of Tokyo in 1997, 1999, and 2002. After working at the National Institute of Advanced Industrial Science and Technology (AIST) and Stanford University, he joined the faculty of the University of Tokyo in 2007. At the Japan Society for Artificial Intelligence (JSAI), he served as editor-in-chief from 2012 to 2014 and as chair of the ELSI committee from 2014 to 2018. He is the president of the Japanese Deep Learning Association (JDLA) and a member of the board of directors at SoftBank Group Corp. He works on artificial intelligence, especially deep learning and web mining.