Original Article

Speech intelligibility of virtual humans

Pages 914-922 | Received 13 Nov 2017, Accepted 08 Aug 2018, Published online: 27 Sep 2018

References

  • Akeroyd, M. A., S. Arlinger, R. A. Bentler, A. Boothroyd, N. Dillier, W. A. Dreschler, J. P. Gagné, et al. 2015. “International Collegium of Rehabilitative Audiology (ICRA) Recommendations for the Construction of Multilingual Speech Tests: ICRA Working Group on Multilingual Speech Tests.” International Journal of Audiology 54 (sup2): 17–22. doi:10.3109/14992027.2015.1030513.
  • Annosoft Lipsync Tool. 2015. Retrieved from http://www.annosoft.com.
  • Arnal, L. H., V. Wyart, and A. L. Giraud. 2011. “Transitions in Neural Oscillations Reflect Prediction Errors Generated in Audiovisual Speech.” Nature Neuroscience 14 (6): 797–801. doi:10.1038/nn.2810.
  • Benoît, C., and B. Le Goff. 1998. “Audio-Visual Speech Synthesis from French Text: Eight Years of Models, Designs and Evaluation at the ICP.” Speech Communication 26 (1–2): 117–129. doi:10.1016/S0167-6393(98)00045-4.
  • Bernstein, L. E., E. T. Auer, and S. Takayanagi. 2004. “Auditory Speech Detection in Noise Enhanced by Lipreading.” Speech Communication 44 (1–4): 5–18. doi:10.1016/j.specom.2004.10.011.
  • Bernstein, J. G., and K. W. Grant. 2009. “Auditory and Auditory-Visual Intelligibility of Speech in Fluctuating Maskers for Normal-Hearing and Hearing-Impaired Listeners.” The Journal of the Acoustical Society of America 125 (5): 3358–3372. doi:10.1121/1.3110132.
  • Beskow, J. 2004. “Trainable Articulatory Control Models for Visual Speech Synthesis.” International Journal of Speech Technology 7 (4): 335–349. doi:10.1023/B:IJST.0000037076.86366.8d.
  • Blender. 2015. Retrieved from https://www.blender.org/, September 14, 2015.
  • Boothroyd, A., and S. Nittrouer. 1988. “Mathematical Treatment of Context Effects in Phoneme and Word Recognition.” The Journal of the Acoustical Society of America 84 (1): 101–114. doi:10.1121/1.396976.
  • Brungart, D. S., B. M. Sheffield, and L. R. Kubli. 2014. “Development of a Test Battery for Evaluating Speech Perception in Complex Listening Environments.” The Journal of the Acoustical Society of America 136 (2): 777–790. doi:10.1121/1.4887440.
  • Calvert, G. A., R. Campbell, and M. J. Brammer. 2000. “Evidence from Functional Magnetic Resonance Imaging of Crossmodal Binding in the Human Heteromodal Cortex.” Current Biology 10 (11): 649–657. doi:10.1016/S0960-9822(00)00513-3.
  • Chandrasekaran, C., L. Lemus, and A. A. Ghazanfar. 2013. “Dynamic Faces Speed up the Onset of Auditory Cortical Spiking Responses during Vocal Detection.” Proceedings of the National Academy of Sciences of the United States of America 110 (48): E4668–E4677. doi:10.1073/pnas.1312518110.
  • Dey, P., S. C. Maddock, and R. Nicolson. 2010. “Evaluation of a Viseme-Driven Talking Head.” In EG UK Theory and Practice of Computer Graphics 10, edited by John Collomosse and Ian Grimstead, 139–142. The Eurographics Association.
  • Fagel, S., G. Bailly, and F. Elisei. 2007. “Intelligibility of Natural and 3D-Cloned German Speech.” In International Conference on Auditory-Visual Speech Processing (AVSP), Hilvarenbeek, The Netherlands, 56–61.
  • Fagel, S., and C. Clemens. 2004. “An Articulation Model for Audiovisual Speech Synthesis — Determination, Adjustment, Evaluation.” Speech Communication 44 (1–4): 141–154. doi:10.1016/j.specom.2004.10.006.
  • Fisher, C. G. 1968. “Confusions among Visually Perceived Consonants.” Journal of Speech and Hearing Research 11 (4): 796–804. doi:10.1044/jshr.1104.796.
  • Francart, T., A. van Wieringen, and J. Wouters. 2011. “Comparison of Fluctuating Maskers for Speech Recognition Tests.” International Journal of Audiology 50 (1): 2–13. doi:10.3109/14992027.2010.505582.
  • Geiger, G., T. Ezzat, and T. Poggio. 2003. Perceptual Evaluation of Video-Realistic Speech. CBCL Paper 224/AI Memo 2003-003. Cambridge, MA: MIT Artificial Intelligence Laboratory.
  • Ghazanfar, A. A. 2009. “The Multisensory Roles for Auditory Cortex in Primate Vocal Communication.” Hearing Research 258 (1–2): 113–120. doi:10.1016/j.heares.2009.04.003.
  • Gosselin, P. A., and J. P. Gagné. 2011. “Older Adults Expend More Listening Effort than Young Adults Recognizing Audiovisual Speech in Noise.” International Journal of Audiology 50 (11): 786–792. doi:10.3109/14992027.2011.599870.
  • Grange, J. A., and J. F. Culling. 2016. “Head Orientation Benefit to Speech Intelligibility in Noise for Cochlear Implant Users and in Realistic Listening Conditions.” The Journal of the Acoustical Society of America 140 (6): 4061–4072. doi:10.1121/1.4968515.
  • Grant, K. W., and P. Seitz. 2000. “The Use of Visible Speech Cues for Improving Auditory Detection of Spoken Sentences.” The Journal of the Acoustical Society of America 108 (3 Pt 1): 1197–1208. doi:10.1121/1.1288668.
  • Grant, K. W., and P. F. Seitz. 2000. “The Recognition of Isolated Words and Words in Sentences: Individual Variability in the Use of Sentence Context.” The Journal of the Acoustical Society of America 107 (2): 1000–1011. doi:10.1121/1.428280.
  • Grant, K. W., B. E. Walden, and P. F. Seitz. 1998. “Auditory-Visual Speech Recognition by Hearing-Impaired Subjects: Consonant Recognition, Sentence Recognition, and Auditory-Visual Integration.” The Journal of the Acoustical Society of America 103 (5): 2677–2690. doi:10.1121/1.422788.
  • Jansen, S., R. Koning, J. Wouters, and A. van Wieringen. 2014. “Development and Validation of the Leuven Intelligibility Sentence Test with Male Speaker (LIST-m).” International Journal of Audiology 53 (1): 55–59. doi:10.3109/14992027.2013.839886.
  • Lawrence, M. A. 2016. ez: Easy Analysis and Visualization of Factorial Experiments. R package version 4.4-0. Retrieved from https://cran.r-project.org/package=ez, June 1, 2017.
  • MacLeod, A., and Q. Summerfield. 1987. “Quantifying the Contribution of Vision to Speech Perception in Noise.” British Journal of Audiology 21 (2): 131–141. doi:10.3109/03005368709077786.
  • MacLeod, A., and Q. Summerfield. 1990. “A Procedure for Measuring Auditory and Audiovisual Speech-Reception Thresholds for Sentences in Noise: Rationale, Evaluation, and Recommendations for Use.” British Journal of Audiology 24 (1): 29–43. doi:10.3109/03005369009077840.
  • Ma, J., R. Cole, B. Pellom, W. Ward, and B. Wise. 2006. “Accurate Visible Speech Synthesis Based on Concatenating Variable Length Motion Capture Data.” IEEE Transactions on Visualization and Computer Graphics 12 (2): 266–276. doi:10.1109/TVCG.2006.18.
  • Mair, P., F. Schoenbrodt, and R. Wilcox. 2017. WRS2: Wilcox Robust Estimation and Testing. R package.
  • Make Human. 2015. Retrieved from https://www.makehuman.org, October 13, 2015.
  • Martin, L. F. A., G. M. Clark, P. M. Seligman, and Y. C. Tong. 1983. “A Lip-Reading Assessment for Profoundly Deaf Patients.” The Journal of Laryngology and Otology 97 (4): 343–350. doi:10.1017/S0022215100094214.
  • Massaro, D. W., and M. M. Cohen. 1990. “Perception of Synthesized Audible and Visible Speech.” Psychological Science 1 (1): 55–63. doi:10.1111/j.1467-9280.1990.tb00068.x.
  • Massaro, D. W., and J. Light. 2004. “Using Visible Speech to Train Perception and Production of Speech for Individuals with Hearing Loss.” Journal of Speech, Language, and Hearing Research: Jslhr 47 (2): 304–320. doi:10.1044/1092-4388(2004/025).
  • Mattheyses, W., and W. Verhelst. 2015. “Audiovisual Speech Synthesis: An Overview of the State-of-the-Art.” Speech Communication 66:182–217. doi:10.1016/j.specom.2014.11.001.
  • Ma, W. J., X. Zhou, L. A. Ross, J. J. Foxe, and L. C. Parra. 2009. “Lip-Reading Aids Word Recognition Most in Moderate Noise: A Bayesian Explanation Using High-Dimensional Feature Space.” PLoS One 4 (3): e4638. doi:10.1371/journal.pone.0004638.
  • McGurk, H., and J. MacDonald. 1976. “Hearing Lips and Seeing Voices.” Nature 264 (5588): 746–748. doi:10.1038/264746a0.
  • Nilsson, M., S. D. Soli, and J. A. Sullivan. 1994. “Development of the Hearing in Noise Test for the Measurement of Speech Reception Thresholds in Quiet and in Noise.” The Journal of the Acoustical Society of America 95 (2): 1085–1099. doi:10.1121/1.408469.
  • Okada, K., J. H. Venezia, W. Matchin, K. Saberi, and G. Hickok. 2013. “An fMRI Study of Audiovisual Speech Perception Reveals Multisensory Interactions in Auditory Cortex.” PLoS One 8 (6): e68959. doi:10.1371/journal.pone.0068959.
  • Peng, X., H. Chen, L. Wang, and H. Wang. 2018. “Evaluating a 3-D Virtual Talking Head on Pronunciation Learning.” International Journal of Human-Computer Studies 109: 26–40. doi:10.1016/j.ijhcs.2017.08.001.
  • Pichora-Fuller, M. K., S. E. Kramer, M. A. Eckert, B. Edwards, B. W. Y. Hornsby, L. E. Humes, U. Lemke, et al. 2016. “Hearing Impairment and Cognitive Energy: The Framework for Understanding Effortful Listening (FUEL).” Ear and Hearing 37: 5S–27S. doi:10.1097/AUD.0000000000000312.
  • Pinheiro, J., D. Bates, S. DebRoy, D. Sarkar, and R Core Team. 2017. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-131.
  • Processing. 2016. Retrieved from https://processing.org, May 18, 2016.
  • Pure data. 2015. Retrieved from https://puredata.info, November 19, 2015.
  • R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.r-project.org, May 10, 2017.
  • Rosenblum, L. D. 2008. “Speech Perception as a Multimodal Phenomenon.” Current Directions in Psychological Science 17 (6): 405–409. doi:10.1111/j.1467-8721.2008.00615.x.
  • Salvi, G., J. Beskow, S. Al Moubayed, and B. Granström. 2009. “SynFace — Speech-Driven Facial Animation for Virtual Speech-Reading Support.” EURASIP Journal on Audio, Speech, and Music Processing 2009 (1): 1. doi:10.1155/2009/191940.
  • Schreitmüller, S., M. Frenken, L. Bentz, M. Ortmann, M. Walger, and H. Meister. 2018. “Validating a Method to Assess Lipreading, Audiovisual Gain, and Integration during Speech Reception with Cochlear-Implanted and Normal-Hearing Subjects Using a Talking Head.” Ear and Hearing 39 (3): 503–516. doi:10.1097/AUD.0000000000000502.
  • Schwartz, J. L., F. Berthommier, and C. Savariaux. 2004. “Seeing to Hear Better: Evidence for Early Audio-Visual Interactions in Speech Identification.” Cognition 93 (2): B69–B78. doi:10.1016/j.cognition.2004.01.006.
  • Siciliano, C., G. Williams, J. Beskow, and A. Faulkner. 2003. “Evaluation of a Multilingual Synthetic Talking Face as a Communication Aid for the Hearing Impaired.” In International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, 1–4. Dordrecht: Foris Publications.
  • Sifakis, E., I. Neverov, and R. Fedkiw. 2005. “Automatic Determination of Facial Muscle Activations from Sparse Motion Capture Marker Data.” ACM Transactions on Graphics 24 (3): 417–425. doi:10.1145/1073204.1073208.
  • Stevens, C. J., G. Gibert, Y. Leung, and Z. Zhang. 2013. “Evaluating a Synthetic Talking Head Using a Dual Task: Modality Effects on Speech Understanding and Cognitive Load.” International Journal of Human-Computer Studies 71 (4): 440–454. doi:10.1016/j.ijhcs.2012.12.003.
  • Sumby, W. H., and I. Pollack. 1954. “Visual Contribution to Speech Intelligibility in Noise.” The Journal of the Acoustical Society of America 26 (2): 212–215. doi:10.1121/1.1907309.
  • Tye-Murray, N., M. Sommers, B. Spehar, J. Myerson, and S. Hale. 2010. “Aging, Audiovisual Integration, and the Principle of Inverse Effectiveness.” Ear and Hearing 31 (5): 636–644. doi:10.1097/AUD.0b013e3181ddf7ff.
  • Unity. 2016. Retrieved from https://unity3d.com, April 18, 2016.
  • van Son, N., T. M. I. Huiskamp, A. J. Bosman, and G. F. Smoorenburg. 1994. “Viseme Classifications of Dutch Consonants and Vowels.” The Journal of the Acoustical Society of America 96 (3): 1341–1355. doi:10.1121/1.411324.
  • van Wieringen, A., and J. Wouters. 2008. “LIST and LINT: Sentences and Numbers for Quantifying Speech Understanding in Severely Impaired Listeners for Flanders and The Netherlands.” International Journal of Audiology 47 (6): 348–355. doi:10.1080/14992020801895144.
  • van Wieringen, A., and J. Wouters. 2015. “What Can We Expect of Normally-Developing Children Implanted at a Young Age with respect to Their Auditory, Linguistic and Cognitive Skills?” Hearing Research 322: 171–179. doi:10.1016/j.heares.2014.09.002.
  • Ward, W., R. Cole, D. Bolaños, C. Buchenroth-Martin, E. Svirsky, and T. Weston. 2013. “My Science Tutor: A Conversational Multimedia Virtual Tutor.” Journal of Educational Psychology 105 (4): 1115–1125. doi:10.1037/a0031589.
  • Williams, L. 1990. “Performance-Driven Facial Animation.” Computer Graphics 24 (4): 235–242. doi:10.1145/97880.97906.
  • Woodhouse, L., L. Hickson, and B. Dodd. 2009. “Review of Visual Speech Perception by Hearing and Hearing-Impaired People: Clinical Implications.” International Journal of Language and Communication Disorders 44 (3): 253–270. doi:10.1080/13682820802090281.
  • Wouters, J., W. Damman, and A. J. Bosman. 1994. “Vlaamse Opname van Woordenlijsten voor Spraakaudiometrie” [Flemish Recordings of Word Lists for Speech Audiometry]. Logopedie: Informatiemedium van de Vlaamse Vereniging voor Logopedisten 7 (6): 28–34.
  • Wright, T. M., K. A. Pelphrey, T. Allison, M. J. McKeown, and G. McCarthy. 2003. “Polysensory Interactions along Lateral Temporal Regions Evoked by Audiovisual Speech.” Cerebral Cortex 13 (10): 1034–1043. doi:10.1093/cercor/13.10.1034.
