Survey paper

A survey of multimodal deep generative models

Masahiro Suzuki & Yutaka Matsuo
Pages 261-278 | Received 17 May 2021, Accepted 21 Nov 2021, Published online: 21 Feb 2022

