Original Article

Speech intelligibility of virtual humans

Pages 914-922 | Received 13 Nov 2017, Accepted 08 Aug 2018, Published online: 27 Sep 2018

References

  • Akeroyd, M. A., S. Arlinger, R. A. Bentler, A. Boothroyd, N. Dillier, W. A. Dreschler, J. P. Gagné, et al. 2015. “International Collegium of Rehabilitative Audiology (ICRA) Recommendations for the Construction of Multilingual Speech Tests: ICRA Working Group on Multilingual Speech Tests.” International Journal of Audiology 54 (sup2): 17–22. doi:10.3109/14992027.2015.1030513.
  • Annosoft Lipsync Tool. 2015. Retrieved from http://www.annosoft.com.
  • Arnal, L. H., V. Wyart, and A. L. Giraud. 2011. “Transitions in Neural Oscillations Reflect Prediction Errors Generated in Audiovisual Speech.” Nature Neuroscience 14 (6): 797–801. doi:10.1038/nn.2810.
  • Benoît, C., and B. Le Goff. 1998. “Audio-Visual Speech Synthesis from French Text: Eight Years of Models, Designs and Evaluation at the ICP.” Speech Communication 26 (1–2): 117–129. doi:10.1016/S0167-6393(98)00045-4.
  • Bernstein, L. E., E. T. Auer, and S. Takayanagi. 2004. “Auditory Speech Detection in Noise Enhanced by Lipreading.” Speech Communication 44 (1–4): 5–18. doi:10.1016/j.specom.2004.10.011.
  • Bernstein, J. G., and K. W. Grant. 2009. “Auditory and Auditory-Visual Intelligibility of Speech in Fluctuating Maskers for Normal-Hearing and Hearing-Impaired Listeners.” The Journal of the Acoustical Society of America 125 (5): 3358–3372. doi:10.1121/1.3110132.
  • Beskow, J. 2004. “Trainable Articulatory Control Models for Visual Speech Synthesis.” International Journal of Speech Technology 7 (4): 335–349. doi:10.1023/B:IJST.0000037076.86366.8d.
  • Blender. 2015. Retrieved from https://www.blender.org/, September 14, 2015.
  • Boothroyd, A., and S. Nittrouer. 1988. “Mathematical Treatment of Context Effects in Phoneme and Word Recognition.” The Journal of the Acoustical Society of America 84 (1): 101–114. doi:10.1121/1.396976.
  • Brungart, D. S., B. M. Sheffield, and L. R. Kubli. 2014. “Development of a Test Battery for Evaluating Speech Perception in Complex Listening Environments.” The Journal of the Acoustical Society of America 136 (2): 777–790. doi:10.1121/1.4887440.
  • Calvert, G. A., R. Campbell, and M. J. Brammer. 2000. “Evidence from Functional Magnetic Resonance Imaging of Crossmodal Binding in the Human Heteromodal Cortex.” Current Biology 10 (11): 649–657. doi:10.1016/S0960-9822(00)00513-3.
  • Chandrasekaran, C., L. Lemus, and A. A. Ghazanfar. 2013. “Dynamic Faces Speed up the Onset of Auditory Cortical Spiking Responses during Vocal Detection.” Proceedings of the National Academy of Sciences of the United States of America 110 (48): E4668–E4677. doi:10.1073/pnas.1312518110.
  • Dey, P., S. C. Maddock, and R. Nicolson. 2010. “Evaluation of a Viseme-Driven Talking Head.” In EG UK Theory and Practice of Computer Graphics 10, edited by John Collomosse and Ian Grimstead, 139–142. The Eurographics Association.
  • Fagel, S., G. Bailly, and F. Elisei. 2007. “Intelligibility of Natural and 3D-Cloned German Speech.” In International Conference on Auditory-Visual Speech Processing (AVSP), Hilvarenbeek, The Netherlands, 56–61.
  • Fagel, S., and C. Clemens. 2004. “An Articulation Model for Audiovisual Speech Synthesis — Determination, Adjustment, Evaluation.” Speech Communication 44 (1–4): 141–154. doi:10.1016/j.specom.2004.10.006.
  • Fisher, C. G. 1968. “Confusions among Visually Perceived Consonants.” Journal of Speech and Hearing Research 11 (4): 796–804. doi:10.1044/jshr.1104.796.
  • Francart, T., A. van Wieringen, and J. Wouters. 2011. “Comparison of Fluctuating Maskers for Speech Recognition Tests.” International Journal of Audiology 50 (1): 2–13. doi:10.3109/14992027.2010.505582.
  • Geiger, G., T. Ezzat, and T. Poggio. 2003. Perceptual Evaluation of Video-Realistic Speech. CBCL Paper 224/AI Memo 2003-003. Cambridge, MA: MIT Artificial Intelligence Laboratory.
  • Ghazanfar, A. A. 2009. “The Multisensory Roles for Auditory Cortex in Primate Vocal Communication.” Hearing Research 258 (1–2): 113–120. doi:10.1016/j.heares.2009.04.003.
  • Gosselin, P. A., and J. P. Gagné. 2011. “Older Adults Expend More Listening Effort than Young Adults Recognizing Audiovisual Speech in Noise.” International Journal of Audiology 50 (11): 786–792. doi:10.3109/14992027.2011.599870.
  • Grange, J. A., and J. F. Culling. 2016. “Head Orientation Benefit to Speech Intelligibility in Noise for Cochlear Implant Users and in Realistic Listening Conditions.” The Journal of the Acoustical Society of America 140 (6): 4061–4072. doi:10.1121/1.4968515.
  • Grant, K. W., and P. Seitz. 2000. “The Use of Visible Speech Cues for Improving Auditory Detection of Spoken Sentences.” The Journal of the Acoustical Society of America 108 (3 Pt 1): 1197–1208. doi:10.1121/1.1288668.
  • Grant, K. W., and P. F. Seitz. 2000. “The Recognition of Isolated Words and Words in Sentences: Individual Variability in the Use of Sentence Context.” The Journal of the Acoustical Society of America 107 (2): 1000–1011. doi:10.1121/1.428280.
  • Grant, K. W., B. E. Walden, and P. F. Seitz. 1998. “Auditory-Visual Speech Recognition by Hearing-Impaired Subjects: Consonant Recognition, Sentence Recognition, and Auditory-Visual Integration.” The Journal of the Acoustical Society of America 103 (5): 2677–2690. doi:10.1121/1.422788.
  • Jansen, S., R. Koning, J. Wouters, and A. van Wieringen. 2014. “Development and Validation of the Leuven Intelligibility Sentence Test with Male Speaker (LIST-m).” International Journal of Audiology 53 (1): 55–59. doi:10.3109/14992027.2013.839886.
  • Lawrence, M. A. 2016. ez: Easy Analysis and Visualization of Factorial Experiments. R package version 4.4-0. Retrieved from https://cran.r-project.org/package=ez, June 1, 2017.
  • MacLeod, A., and Q. Summerfield. 1987. “Quantifying the Contribution of Vision to Speech Perception in Noise.” British Journal of Audiology 21 (2): 131–141. doi:10.3109/03005368709077786.
  • MacLeod, A., and Q. Summerfield. 1990. “A Procedure for Measuring Auditory and Audiovisual Speech-Reception Thresholds for Sentences in Noise: Rationale, Evaluation, and Recommendations for Use.” British Journal of Audiology 24 (1): 29–43. doi:10.3109/03005369009077840.
  • Ma, J., R. Cole, B. Pellom, W. Ward, and B. Wise. 2006. “Accurate Visible Speech Synthesis Based on Concatenating Variable Length Motion Capture Data.” IEEE Transactions on Visualization and Computer Graphics 12 (2): 266–276. doi:10.1109/TVCG.2006.18.
  • Mair, P., F. Schoenbrodt, and R. Wilcox. 2017. WRS2: Wilcox Robust Estimation and Testing. R package.
  • Make Human. 2015. Retrieved from https://www.makehuman.org, October 13, 2015.
  • Martin, L. F. A., G. M. Clark, P. M. Seligman, and Y. C. Tong. 1983. “A Lip-Reading Assessment for Profoundly Deaf Patients.” The Journal of Laryngology and Otology 97 (4): 343–350. doi:10.1017/S0022215100094214.
  • Massaro, D. W., and M. M. Cohen. 1990. “Perception of Synthesized Audible and Visible Speech.” Psychological Science 1 (1): 55–63. doi:10.1111/j.1467-9280.1990.tb00068.x.
  • Massaro, D. W., and J. Light. 2004. “Using Visible Speech to Train Perception and Production of Speech for Individuals with Hearing Loss.” Journal of Speech, Language, and Hearing Research: Jslhr 47 (2): 304–320. doi:10.1044/1092-4388(2004/025).
  • Mattheyses, W., and W. Verhelst. 2015. “Audiovisual Speech Synthesis: An Overview of the State-of-the-Art.” Speech Communication 66:182–217. doi:10.1016/j.specom.2014.11.001.
  • Ma, W. J., X. Zhou, L. A. Ross, J. J. Foxe, and L. C. Parra. 2009. “Lip-Reading Aids Word Recognition Most in Moderate Noise: A Bayesian Explanation Using High-Dimensional Feature Space.” PLoS One 4 (3): e4638. doi:10.1371/journal.pone.0004638.
  • McGurk, H., and J. MacDonald. 1976. “Hearing Lips and Seeing Voices.” Nature 264 (5588): 746–748. doi:10.1038/264746a0.
  • Nilsson, M., S. D. Soli, and J. A. Sullivan. 1994. “Development of the Hearing in Noise Test for the Measurement of Speech Reception Thresholds in Quiet and in Noise.” The Journal of the Acoustical Society of America 95 (2): 1085–1099. doi:10.1121/1.408469.
  • Okada, K., J. H. Venezia, W. Matchin, K. Saberi, and G. Hickok. 2013. “An fMRI Study of Audiovisual Speech Perception Reveals Multisensory Interactions in Auditory Cortex.” PLoS One 8 (6): e68959. doi:10.1371/journal.pone.0068959.
  • Peng, X., H. Chen, L. Wang, and H. Wang. 2018. “Evaluating a 3-D Virtual Talking Head on Pronunciation Learning.” International Journal of Human-Computer Studies 109: 26–40. doi:10.1016/j.ijhcs.2017.08.001.
  • Pichora-Fuller, M. K., S. E. Kramer, M. A. Eckert, B. Edwards, B. W. Y. Hornsby, L. E. Humes, U. Lemke, et al. 2016. “Hearing Impairment and Cognitive Energy: The Framework for Understanding Effortful Listening (FUEL).” Ear and Hearing 37: 5S–27S. doi:10.1097/AUD.0000000000000312.
  • Pinheiro, J., D. Bates, S. DebRoy, D. Sarkar, and R Core Team. 2017. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-131.
  • Processing. 2016. Retrieved from https://processing.org, May 18, 2016.
  • Pure data. 2015. Retrieved from https://puredata.info, November 19, 2015.
  • R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.r-project.org, May 10, 2017.
  • Rosenblum, L. D. 2008. “Speech Perception as a Multimodal Phenomenon.” Current Directions in Psychological Science 17 (6): 405–409. doi:10.1111/j.1467-8721.2008.00615.x.
  • Salvi, G., J. Beskow, S. Al Moubayed, and B. Granström. 2009. “SynFace — Speech-Driven Facial Animation for Virtual Speech-Reading Support.” EURASIP Journal on Audio, Speech, and Music Processing 2009 (1): 1. doi:10.1155/2009/191940.
  • Schreitmüller, S., M. Frenken, L. Bentz, M. Ortmann, M. Walger, and H. Meister. 2018. “Validating a Method to Assess Lipreading, Audiovisual Gain, and Integration during Speech Reception with Cochlear-Implanted and Normal-Hearing Subjects Using a Talking Head.” Ear and Hearing 39 (3): 503–516. doi:10.1097/AUD.0000000000000502.
  • Schwartz, J. L., F. Berthommier, and C. Savariaux. 2004. “Seeing to Hear Better: Evidence for Early Audio-Visual Interactions in Speech Identification.” Cognition 93 (2): B69–B78. doi:10.1016/j.cognition.2004.01.006.
  • Siciliano, C., G. Williams, J. Beskow, and A. Faulkner. 2003. “Evaluation of a Multilingual Synthetic Talking Face as a Communication Aid for the Hearing Impaired.” In International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 1–4. Dordrecht: Foris Publications.
  • Sifakis, E., I. Neverov, and R. Fedkiw. 2005. “Automatic Determination of Facial Muscle Activations from Sparse Motion Capture Marker Data.” ACM Transactions on Graphics 24 (3): 417–425. doi:10.1145/1073204.1073208.
  • Stevens, C. J., G. Gibert, Y. Leung, and Z. Zhang. 2013. “Evaluating a Synthetic Talking Head Using a Dual Task: Modality Effects on Speech Understanding and Cognitive Load.” International Journal of Human-Computer Studies 71 (4): 440–454. doi:10.1016/j.ijhcs.2012.12.003.
  • Sumby, W. H., and I. Pollack. 1954. “Visual Contribution to Speech Intelligibility in Noise.” The Journal of the Acoustical Society of America 26 (2): 212–215. doi:10.1121/1.1907309.
  • Tye-Murray, N., M. Sommers, B. Spehar, J. Myerson, and S. Hale. 2010. “Aging, Audiovisual Integration, and the Principle of Inverse Effectiveness.” Ear and Hearing 31 (5): 636–644. doi:10.1097/AUD.0b013e3181ddf7ff.
  • Unity. 2016. Retrieved from https://unity3d.com, April 18, 2016.
  • van Son, N., T. M. I. Huiskamp, A. J. Bosman, and G. F. Smoorenburg. 1994. “Viseme Classifications of Dutch Consonants and Vowels.” The Journal of the Acoustical Society of America 96 (3): 1341–1355. doi:10.1121/1.411324.
  • van Wieringen, A., and J. Wouters. 2008. “LIST and LINT: Sentences and Numbers for Quantifying Speech Understanding in Severely Impaired Listeners for Flanders and The Netherlands.” International Journal of Audiology 47 (6): 348–355. doi:10.1080/14992020801895144.
  • van Wieringen, A., and J. Wouters. 2015. “What Can We Expect of Normally-Developing Children Implanted at a Young Age with respect to Their Auditory, Linguistic and Cognitive Skills?” Hearing Research 322: 171–179. doi:10.1016/j.heares.2014.09.002.
  • Ward, W., R. Cole, D. Bolaños, C. Buchenroth-Martin, E. Svirsky, and T. Weston. 2013. “My Science Tutor: A Conversational Multimedia Virtual Tutor.” Journal of Educational Psychology 105 (4): 1115–1125. doi:10.1037/a0031589.
  • Williams, L. 1990. “Performance-Driven Facial Animation.” Computer Graphics 24 (4): 235–242. doi:10.1145/97880.97906.
  • Woodhouse, L., L. Hickson, and B. Dodd. 2009. “Review of Visual Speech Perception by Hearing and Hearing-Impaired People: Clinical Implications.” International Journal of Language and Communication Disorders 44 (3): 253–270. doi:10.1080/13682820802090281.
  • Wouters, J., W. Damman, and A. J. Bosman. 1994. “Vlaamse Opname van Woordenlijsten voor Spraakaudiometrie” [Flemish Recordings of Word Lists for Speech Audiometry]. Logopedie: Informatiemedium van de Vlaamse Vereniging voor Logopedisten 7 (6): 28–34.
  • Wright, T. M., K. A. Pelphrey, T. Allison, M. J. McKeown, and G. McCarthy. 2003. “Polysensory Interactions along Lateral Temporal Regions Evoked by Audiovisual Speech.” Cerebral Cortex 13 (10): 1034–1043. doi:10.1093/cercor/13.10.1034.
