Real-time speech-driven animation of expressive talking faces

Pages 439-455 | Received 05 Feb 2009, Accepted 12 Apr 2009, Published online: 10 Mar 2011

References

  • Aleksic , P.S. and Katsaggelos , A.K. 2004 . Speech-to-video synthesis using MPEG-4 compliant visual features . IEEE transactions on circuits and systems for video technology , 14 ( 5 ) : 682 – 692 .
  • Amir , N. and Ron , S. 1998 . “ Towards an automatic classification of emotion in speech ” . In Proceedings of the international conference on spoken language processing 225 – 228 . Sydney, Australia
  • Beier , T. and Neely , S. 1992 . Feature-based image metamorphosis . Computer graphics , 26 ( 2 ) : 35 – 42 .
  • Bellman , R. 1961 . Adaptive control processes: a guided tour , Princeton, NJ : Princeton University Press .
  • Bezooijen , R.V. 1984 . The characteristics and recognizability of vocal expression of emotions , Dordrecht : Foris .
  • Brand , M. 1999 . “ Voice puppetry ” . In Proceedings of ACM SIGGRAPH 21 – 28 . Los Angeles, CA
  • Bregler , C. , Covell , M. and Slaney , M. 1997 . “ Video rewrite: driving visual speech with audio ” . In Proceedings of ACM SIGGRAPH 353 – 360 . Los Angeles, CA
  • Chang , Y.J. , Heish , C.K. , Hsu , P.W. and Chen , Y.C. 2003 . “ Speech-assisted facial expression analysis and synthesis for virtual conferencing systems ” . In Proceedings of the International conference on multimedia and expo Vol. 3 , 529 – 532 . Baltimore, MD
  • Chibelushi , C.C. , Deravi , F. and Mason , J.S.D. 2002 . A review of speech based bimodal recognition . IEEE transactions on multimedia , 4 : 23 – 27 .
  • Choi , K. , Luo , Y. and Hwang , J. 2001 . Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system . Journal of VLSI signal processing , 29 ( 1 ) : 51 – 61 .
  • Cootes , T.F. , Taylor , C.J. , Cooper , D.H. and Graham , J. 1995 . Active shape models-their training and application . Computer vision and image understanding , 61 ( 1 ) : 38 – 59 .
  • Cosatto , E. and Graf , H.P. 2000 . Photo-realistic talking heads from image samples . IEEE transactions on multimedia , 2 ( 3 ) : 152 – 163 .
  • Cowie , R. , Cowie , E.D. , Tsapatsoulis , N. , Votsis , G. , Kollias , S. , Fellenz , W. and Taylor , J.G. 2001 . Emotion recognition in human–computer interaction . IEEE signal processing magazine , 18 ( 1 ) : 32 – 80 .
  • Cox , T. and Cox , M. 2001 . Multidimensional scaling , 2nd ed. , Boca Raton, FL : Chapman and Hall .
  • Davis , S.B. and Mermelstein , P. 1980 . Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences . IEEE transactions on acoustics, speech and signal processing , 28 ( 4 ) : 357 – 366 .
  • Dellaert , F. , Polzin , T. and Waibel , A. 1996 . “ Recognizing emotion in speech ” . In Proceedings of the international conference on spoken language processing 1970 – 1973 . Philadelphia, PA
  • Dupont , S. and Luettin , J. 2000 . Audio-visual speech modeling for continuous speech recognition . IEEE transactions on multimedia , 2 : 141 – 151 .
  • Ezzat , T. , Geiger , G. and Poggio , T. 2002 . Trainable videorealistic speech animation . ACM transactions on graphics , 21 ( 3 ) : 388 – 397 .
  • Freund , Y. and Schapire , R.E. 1996 . “ Experiments with a new boosting algorithm ” . In Proceedings of the international conference on machine learning 148 – 156 . Bari, Italy
  • Gong , Y. 1995 . Speech recognition in noisy environments: a survey . Speech communication , 16 ( 3 ) : 261 – 291 .
  • Guenter , B. , Grimm , C. , Wood , D. , Malvar , H. and Pighin , F. 1998 . “ Making faces ” . In Proceedings of ACM SIGGRAPH 55 – 66 . Chicago, IL
  • Gutierrez-Osuna , R. , Kakumanu , P.K. , Esposito , A. , Garcia , O.N. , Bojorquez , A. , Castillo , J.L. and Rudomin , I. 2005 . Speech-driven facial animation with realistic dynamics . IEEE transactions on multimedia , 7 ( 1 ) : 33 – 42 .
  • Hermansky , H. 1990 . Perceptual linear predictive (PLP) analysis of speech . Journal of the acoustical society of America , 87 ( 4 ) : 1738 – 1752 .
  • Hong , P. , Wen , Z. and Huang , T.S. 2001 . iFACE: a 3D synthetic talking face . International journal of image and graphics , 1 ( 1 ) : 19 – 26 .
  • Hong , P. , Wen , Z. and Huang , T.S. 2002 . Real-time speech-driven face animation with expressions using neural networks . IEEE transactions on neural networks , 13 ( 4 ) : 916 – 927 .
  • Itakura , F. 1975 . Line spectrum representation of linear prediction coefficients of speech signal . Journal of the acoustical society of America , 57 : 535a .
  • Kaehler , K. , Haber , J. , Yamauchi , H. and Seidel , H. 2002 . “ Head shop: generating animated head models with anatomical structure ” . In Proceedings of ACM SIGGRAPH 55 – 63 . San Antonio, TX
  • Kakumanu , P. , Esposito , A. , Garcia , O.N. and Gutierrez-Osuna , R. 2006 . A comparison of acoustic coding models for speech-driven facial animation . Speech communication , 48 ( 6 ) : 598 – 615 .
  • Kwon , O.W. , Chan , K.L. , Hao , J. and Lee , T.W. 2003 . “ Emotion recognition by speech signals ” . In Proc. European conf. speech communication and technology Vol. 1 , 125 – 128 .
  • Lavagetto , F. 1995 . Converting speech into lip movements: a multimedia telephone for hard of hearing people . IEEE transactions on rehabilitation engineering , 3 ( 1 ) : 90 – 102 .
  • Lee , Y. , Terzopoulos , D. and Waters , K. 1995 . “ Realistic modeling for facial animation ” . In Proceedings of ACM SIGGRAPH 55 – 62 . Anaheim, CA
  • Li , Y. , Yu , F. , Xu , Y. , Chang , E. and Shum , H. 2001 . “ Speech-driven cartoon animation with emotions ” . In Proceedings of ACM multimedia 365 – 371 . Ottawa, ON
  • Lippmann , R. 1997 . Speech recognition by machines and humans . Speech communication , 22 ( 1 ) : 1 – 15 .
  • Massaro , D.W. 1998 . Perceiving talking faces: from speech perception to a behavioral principle , Cambridge, MA : MIT Press .
  • Massaro , D.W. , Beskow , J. , Cohen , M.M. , Fry , C.L. and Rodriguez , T. 1999 . “ Picture my voice: audio to visual speech synthesis using artificial neural networks ” . In Proceedings of auditory-visual speech processing 133 – 138 . Santa Cruz, CA
  • Mehrabian , A. 1968 . Communication without words . Psychology today , 2 ( 4 ) : 53 – 56 .
  • Murray , I. and Arnott , J. 1993 . Toward a simulation of emotion in synthetic speech: a review of the literature on human vocal emotion . Journal of the acoustical society of America , 93 ( 2 ) : 1097 – 1108 .
  • Nicholson , J. , Takahashi , K. and Nakatsu , R. 2000 . Emotion recognition in speech using neural networks . Neural computing and applications , 9 : 290 – 296 .
  • Ostermann , J. and Weissenfeld , A. 2004 . “ Talking faces technologies and applications ” . In Proceedings of the International conference on pattern recognition Vol. 3 , 826 – 833 . Cambridge, UK
  • Parke , F.I. 1982 . A parameterized model for facial animation . IEEE computer graphics and applications , 2 ( 9 ) : 61 – 68 .
  • Picard , R.W. , Vyzas , E. and Healey , J. 2001 . Toward machine emotional intelligence: analysis of affective physiological state . IEEE transactions on pattern analysis and machine intelligence , 23 ( 10 ) : 1175 – 1191 .
  • Pighin , F. , Hecker , J. , Lischinski , D. , Szeliski , R. and Salesin , D.H. 1998 . “ Synthesizing realistic facial expressions from photographs ” . In Proceedings of ACM SIGGRAPH 75 – 84 . Chicago, IL
  • Potamianos , G. , Neti , C. , Gravier , G. , Garg , A. and Senior , A.W. 2003 . Recent advances in the automatic recognition of audio-visual speech . Proceedings of IEEE , 91 ( 9 ) : 1306 – 1326 .
  • Quinlan , J.R. 1993 . C4.5: programs for machine learning , Morgan Kaufmann .
  • Rabiner , L.R. and Schafer , R.W. 1978 . Digital processing of speech signals , Englewood Cliffs, NJ : Prentice-Hall .
  • Schuller , B. , Rigoll , G. and Lang , M. 2003 . “ Hidden Markov model-based speech emotion recognition ” . In Proceedings of the international conference on acoustics, speech, and signal processing Vol. 2 , 6 – 10 . Hong Kong
  • King , S.A. and Parent , R.E. 2005 . Creating speech-synchronized animation . IEEE transactions on visualization and computer graphics , 11 ( 3 ) : 341 – 352 .
  • Simons , A. and Cox , S. 1990 . “ Generation of mouth shapes for a synthetic talking head ” . In Proceedings of the institute of acoustics Vol. 12 , 475 – 482 .
  • Tato , R.S. , Kompe , R. and Pardo , J.M. 2002 . “ Emotional space improves emotion recognition ” . In Proceedings of the international conference on spoken language processing 2029 – 2032 . Denver, CO
  • Tenenbaum , J.B. 1998 . Mapping a manifold of perceptual observations . Advances in neural information processing systems , 10 : 682 – 687 .
  • Tenenbaum , J.B. , de Silva , V. and Langford , J.C. 2000 . A global geometric framework for nonlinear dimensionality reduction . Science , 290 : 2319 – 2323 .
  • Verma , A. , Subramaniam , L.V. , Rajput , N. , Neti , C. and Faruquie , T.A. 2004 . Animating expressive faces across languages . IEEE transactions on multimedia , 6 ( 6 ) : 791 – 800 .
  • Ververidis , D. , Kotropoulos , C. and Pitas , I. 2004 . “ Automatic emotional speech classification ” . In Proceedings of the international conference on acoustics, speech, and signal processing Vol. 1 , 593 – 596 . Montreal QC
  • Waters , K. 1987 . “ A muscle model for animating three-dimensional facial expression ” . In Proceedings of ACM SIGGRAPH 17 – 24 . Salt Lake City, UT
  • Xie , L. and Liu , Z. 2007 . Realistic mouth-synching for speech-driven talking face using articulatory modelling . IEEE transactions on multimedia , 9 ( 3 ) : 500 – 510 .
  • Yamamoto , E. , Nakamura , S. and Shikano , K. 1998 . Lip movement synthesis from speech based on hidden Markov models . Speech communication , 26 ( 1 ) : 105 – 115 .
  • Young , S. , Kershaw , D. , Odell , J. , Ollason , D. , Valtchev , V. and Woodland , P. 2002 . The HTK book , Cambridge : Entropic Ltd .
