International Journal of General Systems Volume 40, 2011 - Issue 4: Massive Data Processing by Using Machine Learning

Real-time speech-driven animation of expressive talking faces

Jia Liu College of Computer Science, Zhejiang University, Hangzhou, 310027, PR China

Mingyu You Department of Control Science and Engineering, Tongji University, Shanghai, 201804, PR China

Chun Chen College of Computer Science, Zhejiang University, Hangzhou, 310027, PR China

Mingli Song College of Computer Science, Zhejiang University, Hangzhou, 310027, PR China (corresponding author)

Pages 439-455 | Received 05 Feb 2009, Accepted 12 Apr 2009, Published online: 10 Mar 2011

https://doi.org/10.1080/03081079.2010.544896

References

Aleksic, P.S. and Katsaggelos, A.K. 2004. Speech-to-video synthesis using MPEG-4 compliant visual features. IEEE transactions on circuits and systems for video technology, 14(5): 682–692.
Amir, N. and Ron, S. 1998. “Towards an automatic classification of emotion in speech”. In Proceedings of the international conference on spoken language processing 225–228. Sydney, Australia.
Beier, T. and Neely, S. 1992. Feature-based image metamorphosis. Computer graphics, 26(2): 35–42.
Bellman, R. 1961. Adaptive control processes: a guided tour, Princeton, NJ: Princeton University Press.
Bezooijen, R.V. 1984. The characteristics and recognizability of vocal expression of emotions, Dordrecht: Foris.
Brand, M. 1999. “Voice puppetry”. In Proceedings of ACM SIGGRAPH 21–28. Los Angeles, CA.
Bregler, C., Covell, M. and Slaney, M. 1997. “Video rewrite: driving visual speech with audio”. In Proceedings of ACM SIGGRAPH 353–360. Los Angeles, CA.
Chang, Y.J., Heish, C.K., Hsu, P.W. and Chen, Y.C. 2003. “Speech-assisted facial expression analysis and synthesis for virtual conferencing systems”. In Proceedings of the international conference on multimedia and expo Vol. 3, 529–532. Baltimore, MD.
Chibelushi, C.C., Deravi, F. and Mason, J.S.D. 2002. A review of speech based bimodal recognition. IEEE transactions on multimedia, 4: 23–27.
Choi, K., Luo, Y. and Hwang, J. 2001. Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system. Journal of VLSI signal processing, 29(1): 51–61.
Cootes, T.F., Taylor, C.J., Cooper, D.H. and Graham, J. 1995. Active shape models: their training and application. Computer vision and image understanding, 61(1): 38–59.
Cosatto, E. and Graf, H.P. 2000. Photo-realistic talking heads from image samples. IEEE transactions on multimedia, 2(3): 152–163.
Cowie, R., Cowie, E.D., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W. and Taylor, J.G. 2001. Emotion recognition in human–computer interaction. IEEE signal processing magazine, 18(1): 32–80.
Cox, T. and Cox, M. 2001. Multidimensional scaling, 2nd ed., Boca Raton, FL: Chapman and Hall.
Davis, S.B. and Mermelstein, P. 1980. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE transactions on acoustics, speech and signal processing, 28(4): 357–366.
Dellaert, F., Polzin, T. and Waibel, A. 1996. Recognizing emotion in speech. Proceedings of the international conference on spoken language processing: 1970–1973. Philadelphia, PA.
Dupont, S. and Luettin, J. 2000. Audio-visual speech modeling for continuous speech recognition. IEEE transactions on multimedia, 2: 141–151.
Ezzat, T., Geiger, G. and Poggio, T. 2002. Trainable videorealistic speech animation. ACM transactions on graphics, 21(3): 388–397.
Freund, Y. and Schapire, R.E. 1996. “Experiments with a new boosting algorithm”. In Proceedings of the international conference on machine learning 148–156. Bari, Italy.
Gong, Y. 1995. Speech recognition in noisy environments: a survey. Speech communication, 16(3): 261–291.
Guenter, B., Grimm, C., Wood, D., Malvar, H. and Pighin, F. 1998. “Making faces”. In Proceedings of ACM SIGGRAPH 55–66. Chicago, IL.
Gutierrez-Osuna, R., Kakumanu, P.K., Esposito, A., Garcia, O.N., Bojorquez, A., Castillo, J.L. and Rudomin, I. 2005. Speech-driven facial animation with realistic dynamics. IEEE transactions on multimedia, 7(1): 33–42.
Hermansky, H. 1990. Perceptual linear predictive (PLP) analysis of speech. Journal of the acoustical society of America, 87(4): 1738–1752.
Hong, P., Wen, Z. and Huang, T.S. 2001. iFACE: a 3D synthetic talking face. International journal of image and graphics, 1(1): 19–26.
Hong, P., Wen, Z. and Huang, T.S. 2002. Real-time speech-driven face animation with expressions using neural networks. IEEE transactions on neural networks, 13(4): 916–927.
Itakura, F. 1975. Line spectrum representation of linear prediction coefficients of speech signal. Journal of the acoustical society of America, 57: 535a.
Kaehler, K., Haber, J., Yamauchi, H. and Seidel, H. 2002. “Head shop: generating animated head models with anatomical structure”. In Proceedings of ACM SIGGRAPH 55–63. San Antonio, TX.
Kakumanu, P., Esposito, A., Garcia, O.N. and Gutierrez-Osuna, R. 2006. A comparison of acoustic coding models for speech-driven facial animation. Speech communication, 48(6): 598–615.
Kwon, O.W., Chan, K.L., Hao, J. and Lee, T.W. 2003. “Emotion recognition by speech signals”. In Proceedings of the European conference on speech communication and technology Vol. 1, 125–128.
Lavagetto, F. 1995. Converting speech into lip movements: a multimedia telephone for hard of hearing people. IEEE transactions on rehabilitation engineering, 3(1): 90–102.
Lee, Y., Terzopoulos, D. and Waters, K. 1995. “Realistic modeling for facial animation”. In Proceedings of ACM SIGGRAPH 55–62. Anaheim, CA.
Li, Y., Yu, F., Xu, Y., Chang, E. and Shum, H. 2001. “Speech-driven cartoon animation with emotions”. In Proceedings of ACM multimedia 365–371. Ottawa, ON.
Lippmann, R. 1997. Speech recognition by machines and humans. Speech communication, 22(1): 1–15.
Massaro, D.W. 1998. Perceiving talking faces: from speech perception to a behavioral principle, Cambridge, MA: MIT Press.
Massaro, D.W., Beskow, J., Cohen, M.M., Fry, C.L. and Rodriguez, T. 1999. “Picture my voice: audio to visual speech synthesis using artificial neural networks”. In Proceedings of auditory-visual speech processing 133–138. Santa Cruz, CA.
Mehrabian, A. 1968. Communication without words. Psychology today, 2(4): 53–56.
Murray, I. and Arnott, J. 1993. Toward a simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. Journal of the acoustical society of America, 93(2): 1097–1108.
Nicholson, J., Takahashi, K. and Nakatsu, R. 2000. Emotion recognition in speech using neural network. Neural computing and applications, 9: 290–296.
Ostermann, J. and Weissenfeld, A. 2004. “Talking faces technologies and applications”. In Proceedings of the international conference on pattern recognition Vol. 3, 826–833. Cambridge, UK.
Parke, F.I. 1982. A parameterized model for facial animation. IEEE computer graphics and applications, 2(9): 61–68.
Picard, R.W., Vyzas, E. and Healey, J. 2001. Toward machine emotional intelligence: analysis of affective physiological state. IEEE transactions on pattern analysis and machine intelligence, 23(10): 1175–1191.
Pighin, F., Hecker, J., Lischinski, D., Szeliski, R. and Salesin, D.H. 1998. “Synthesizing realistic facial expressions from photographs”. In Proceedings of ACM SIGGRAPH 75–84. Chicago, IL.
Potamianos, G., Neti, C., Gravier, G., Garg, A. and Senior, A.W. 2003. Recent advances in the automatic recognition of audio-visual speech. Proceedings of the IEEE, 91(9): 1306–1326.
Quinlan, J.R. 1993. C4.5: programs for machine learning, Morgan Kaufmann.
Rabiner, L.R. and Schafer, R.W. 1978. Digital processing of speech signals, Englewood Cliffs, NJ: Prentice-Hall.
Schuller, B., Rigoll, G. and Lang, M. 2003. “Hidden Markov model-based speech emotion recognition”. In Proceedings of the international conference on acoustics, speech, and signal processing Vol. 2, 6–10. Hong Kong.
Scott, A.K. and Richard, E.P. 2005. Creating speech-synchronized animation. IEEE transactions on visualization and computer graphics, 11(3): 341–352.
Simons, A. and Cox, S. 1990. “Generation of mouth shapes for a synthetic talking head”. In Proceedings of the institute of acoustics Vol. 12, 475–482.
Tato, R.S., Kompe, R. and Pardo, J.M. 2002. “Emotional space improves emotion recognition”. In Proceedings of the international conference on spoken language processing 2029–2032. Denver, CO.
Tenenbaum, J.B. 1998. Mapping a manifold of perceptual observations. Advances in neural information processing systems, 10: 682–687.
Tenenbaum, J.B., de Silva, V. and Langford, J.C. 2000. A global geometric framework for nonlinear dimensionality reduction. Science, 290: 2319–2323.
Verma, A., Subramaniam, L.V., Rajput, N., Neti, C. and Faruquie, T.A. 2004. Animating expressive faces across languages. IEEE transactions on multimedia, 6(6): 791–800.
Ververidis, D., Kotropoulos, C. and Pitas, I. 2004. “Automatic emotional speech classification”. In Proceedings of the international conference on acoustics, speech, and signal processing Vol. 1, 593–596. Montreal, QC.
Waters, K. 1987. “A muscle model for animating three-dimensional facial expression”. In Proceedings of ACM SIGGRAPH 17–24. Salt Lake City, UT.
Xie, L. and Liu, Z. 2007. Realistic mouth-synching for speech-driven talking face using articulatory modelling. IEEE transactions on multimedia, 9(3): 500–510.
Yamamoto, E., Nakamura, S. and Shikano, K. 1998. Lip movement synthesis from speech based on hidden Markov models. Speech communication, 26(1): 105–115.
Young, S., Kershaw, D., Odell, J., Ollason, D., Valtchev, V. and Woodland, P. 2002. The HTK book, Cambridge: Entropic Ltd.
