Research Articles

A large-scale comparison of two voice synthesis techniques on intelligibility, naturalness, preferences, and attitudes toward voices banked by individuals with amyotrophic lateral sclerosis

Pages 31-45 | Received 05 Sep 2022, Accepted 12 Sep 2023, Published online: 04 Oct 2023

References

  • Acapela Group. (2022). My-own-voice. https://mov.acapela-group.com/
  • ALS Association. (2020). FYI: A guide to voice banking services. https://www.als.org/navigating-als/resources/fyi-guide-voice-banking-services
  • ALS Association. (2021). A look back at over $16 million in research grants awarded during 2018. http://web.alsa.org/site/PageNavigator/blog_050319.html
  • ALS Association. (2022). Understanding ALS. https://www.als.org/understanding-als
  • Aylett, M., Vinciarelli, A., & Wester, M. (2020). Speech synthesis for the generation of artificial personality. IEEE Transactions on Affective Computing, 11(2), 361–372. doi:10.1109/TAFFC.2017.2763134
  • Baird, A., Parada-Cabaleiro, E., Hantke, S., Burkhardt, F., Cummins, N., & Schuller, B. (2018). The perception and analysis of the likeability and human likeness of synthesized speech [Paper presentation]. Proceedings of INTERSPEECH, 2863–2867. doi:10.21437/Interspeech.2018-1093
  • Ball, L., Beukelman, D., & Pattee, G. (2004). Acceptance of augmentative and alternative communication technology by persons with amyotrophic lateral sclerosis. Augmentative and Alternative Communication, 20(2), 113–122. doi:10.1080/0743461042000216596
  • Benoit, C., Grice, M., & Hazan, V. (1996). The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences. Speech Communication, 18(4), 381–392. doi:10.1016/0167-6393(96)00026-X
  • Beukelman, D., Ball, L., & Pattee, G. (2004). Intervention decision-making for persons with amyotrophic lateral sclerosis. The ASHA Leader, 9(22), 4–5. doi:10.1044/leader.FTR2.09222004.4
  • Black, A., Zen, H., & Tokuda, K. (2007). Statistical parametric speech synthesis. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.154.9874&rep=rep1&type=pdf
  • Borrie, S. A., McAuliffe, M. J., Liss, J. M., Kirk, C., O'Beirne, G. A., & Anderson, T. (2012). Familiarisation conditions and the mechanisms that underlie improved recognition of dysarthric speech. Language and Cognitive Processes, 27(7–8), 1039–1055. doi:10.1080/01690965.2011.610596
  • Bunnell, H. T. (2010). Crafting small databases for unit selection TTS: Effects on intelligibility. Proceedings of the 7th ISCA Speech Synthesis Workshop, 40–44. https://www.isca-speech.org/archive_v0/ssw7/papers/ssw7_040.pdf
  • Bunnell, H. T., & Lilley, J. (2007). Analysis methods for assessing TTS intelligibility [Paper presentation]. 6th ISCA Workshop on Speech Synthesis. https://www.isca-speech.org/archive_open/archive_papers/ssw6/ssw6_374.pdf
  • Bunnell, H. T., Lilley, J., & McGrath, K. (2017). The ModelTalker project: A web-based voice banking pipeline for ALS/MND patients. Interspeech.
  • Bunnell, H. T., & Pennington, C. (2010). Advances in computer speech synthesis and implications for assistive technology. In J. Mullennix & S. Stern (Eds.), Computer synthesized speech technologies: Tools for aiding impairment (pp. 71–91). IGI Global. doi:10.4018/978-1-61520-725-1
  • Capes, T., Coles, P., Conkie, A., Golipour, L., Hadjitarkhani, A., Hu, Q., Huddleston, N., Hunt, M., Li, J., Neeracher, M., Prahallad, K., Raitio, T., Rasipuram, R., Townsend, G., Williamson, B., Winarsky, D., Wu, Z., & Zhang, H. (2017). Siri on-device deep learning-guided unit selection text-to-speech system. Proceedings of INTERSPEECH, 4011–4015. https://www.isca-speech.org/archive/interspeech_2017/capes17_interspeech.html, doi:10.21437/INTERSPEECH.2017-1798
  • CereProc Ltd. (2022). CereProc Text-to-Speech. https://www.cereproc.com/en/cerevoice-me
  • Chen, M., Hyppa-Martin, J., Bunnell, H. T., Lilley, J., Foo, C., Tan, H. W., & Lim, W. S. (2023). Voice banking to support individuals who use speech-generating devices: Development and evaluation of Singaporean-accented English synthetic voices and a Singapore Colloquial English recording inventory. Augmentative and Alternative Communication, 1–11. doi:10.1080/07434618.2023.2181213
  • Costello, J. (2009). Last words, last connections: How AAC can support children facing end of life. The ASHA Leader, 14(16), 8–11. doi:10.1044/leader.FTR2.14162009.8
  • Creer, S., Cunningham, S., Green, P., & Yamagishi, J. (2013). Building personalised synthetic voices for individuals with severe speech impairment. Computer Speech & Language, 27(6), 1178–1193. doi:10.1016/j.csl.2012.10.001
  • Doyle, M., & Phillips, B. (2001). Trends in augmentative and alternative communication use by individuals with amyotrophic lateral sclerosis. Augmentative and Alternative Communication, 17(3), 167–178. doi:10.1080/aac.17.3.167.178
  • Duffy, J. (2013). Motor speech disorders: Substrates, differential diagnosis, and management (3rd ed.). Elsevier Mosby.
  • Eagly, A., & Chaiken, S. (2007). The advantages of an inclusive definition of attitude. Social Cognition, 25(5), 582–602. doi:10.1521/soco.2007.25.5.582
  • Fairbanks, G. (1941). Voice and articulation drillbook. The Laryngoscope, 51(12), 1141. doi:10.1288/00005537-194112000-00007
  • Gibson, E., Pearlmutter, N., Canseco-Gonzalez, E., & Hickok, G. (1996). Recency preference in the human sentence processing mechanism. Cognition, 59(1), 23–59. doi:10.1016/0010-0277(95)00687-7
  • Gorenflo, C., & Gorenflo, D. (1991). The effects of information and augmentative communication technique on attitudes toward nonspeaking individuals. Journal of Speech and Hearing Research, 34(1), 19–26. doi:10.1044/jshr.3401.19
  • Hanson, E., Yorkston, K., & Britton, D. (2011). Dysarthria in amyotrophic lateral sclerosis: A systematic review of characteristics, speech treatment, and augmentative and alternative communication options. Journal of Medical Speech-Language Pathology, 19(3), 12–31.
  • Hecht, M., Hillemacher, T., Gräsel, E., Tigges, S., Winterholler, M., Heuss, D., Hilz, M.-J., & Neundörfer, B. (2002). Subjective experience and coping in ALS. Amyotrophic Lateral Sclerosis and Other Motor Neuron Disorders, 3(4), 225–231. doi:10.1080/146608202760839009
  • Hinterleitner, F. (2017). Quality of synthetic speech: Perceptual dimensions, influencing factors, and instrumental assessment. Springer.
  • Holmes, E., To, G., & Johnsrude, I. (2021). How long does it take for a voice to become familiar? Psychological Science, 32(6), 903–915. doi:10.1177/0956797621991137
  • Honorof, D., McCullough, J., & Somerville, B. (2000). Comma gets a cure. Newcastle University. https://research.ncl.ac.uk/necte2/documents/comma.pdf
  • Hunt, A., & Black, A. (1996). Unit selection in a concatenative speech synthesis system using a large speech database [Paper presentation]. IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings. doi:10.1109/ICASSP.1996.541110
  • Hustad, K., & Cahill, M. (2003). Effects of presentation mode and repeated familiarization on intelligibility of dysarthric speech. American Journal of Speech-Language Pathology, 12(2), 198–208. doi:10.1044/1058-0360(2003/066)
  • Hyppa-Martin, J., Chen, M., Janka, E., & Halverson, N. (2021). Effect of partner reauditorization on young adults’ attitudes toward a child who communicated using nonelectronic augmentative and alternative communication. Augmentative and Alternative Communication, 37(2), 141–153. doi:10.1080/07434618.2021.1916075
  • IEEE Subcommittee. (1969). Harvard sentences. Columbia University. https://www.cs.columbia.edu/~hgs/audio/harvard.html
  • Kraker, D. (2018). ALS robbing them of speech, but they won’t be silenced. MPR News. https://www.mprnews.org/story/2018/08/06/voice-banking-preserves-voices-of-people-who-might-lose-ability-to-speak-als
  • Kraus, S. (1995). Attitudes and the prediction of behavior: A metaanalysis of the empirical literature. Personality and Social Psychology Bulletin, 21(1), 58–75. doi:10.1177/0146167295211007
  • Kühnlein, P., Gdynia, H., Sperfeld, A., Lindner-Pfleghar, B., Ludolph, A., Prosiegel, M., & Riecker, A. (2008). Diagnosis and treatment of bulbar symptoms in amyotrophic lateral sclerosis. Nature Clinical Practice Neurology, 4(7), 366–374. doi:10.1038/ncpneuro0853
  • Kuligowska, K., Kisielewicz, P., & Włodarz, A. (2018). Speech synthesis systems: Disadvantages and limitations. International Journal of Engineering & Technology, 7(2.28), 234. doi:10.14419/ijet.v7i2.28.12933
  • Lansford, L. (2014).
  • Lilley, J., Hyppa-Martin, J., & Bunnell, H. T. (2020). A large-scale comparison of the intelligibility of unit-selection and deep-neural-network parametric synthetic voices generated from dysarthric speech. The Journal of the Acoustical Society of America, 148, 2582–2582. doi:10.1121/1.5147169
  • Ling, Z., Wu, Y., Wang, Y., Qin, L., & Wang, R. (2006). USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.130.7143&rep=rep1&type=pdf
  • Mayo, C., Clark, R., & King, S. (2011). Listeners’ weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis. Speech Communication, 53(3), 311–326. doi:10.1016/j.specom.2010.10.003
  • McCarthy, J., & Light, J. (2005). Attitudes toward individuals who use augmentative and alternative communication: Research review. Augmentative and Alternative Communication, 21(1), 41–55. doi:10.1080/07434610410001699753
  • Mills, T., Bunnell, H. T., & Patel, R. (2014). Towards personalized speech synthesis for augmentative and alternative communication. Augmentative and Alternative Communication, 30(3), 226–236. doi:10.3109/07434618.2014.924026
  • Mitchell, J. D., & Borasio, G. D. (2007). Amyotrophic lateral sclerosis. The Lancet, 369(9578), 2031–2041. doi:10.1016/S0140-6736(07)60944-1
  • Morise, M., Yokomori, F., & Ozawa, K. (2016). WORLD: A vocoder-based high-quality speech synthesis system for real-time applications. IEICE Transactions on Information and Systems, E99.D(7), 1877–1884. doi:10.1587/transinf.2015EDP7457
  • Mulligan, M., Carpenter, J., Riddel, J., Delaney, M. K., Badger, G., Krusinski, P., & Tandan, R. (1994). Intelligibility and the acoustic characteristics of speech in amyotrophic lateral sclerosis (ALS). Journal of Speech and Hearing Research, 37(3), 496–503. doi:10.1044/jshr.3703.496
  • Nemours Children’s Health. (2022). ModelTalker: Creating personal voices for all. https://www.modeltalker.org/
  • Nusbaum, H., Francis, A., & Henly, A. (1995). Measuring the naturalness of synthetic speech. International Journal of Speech Technology, 1(1), 7–19. doi:10.1007/BF02277176
  • Shaver, J., Curtis, C., & Strong, C. (1989). The modification of attitudes toward persons with disabilities: Is there a best way? International Journal of Special Education, 4, 33–57.
  • Shen, J., Pang, R., Weiss, R., Schuster, M., Jaitly, N., Yang, Z., Chen, Z., Zhang, Y., Wang, Y., Skerry-Ryan, R. J., Saurous, R. A., Agiomyrgiannakis, Y., & Wu, Y. (2018). Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions [Paper presentation]. IEEE International Conference on Acoustics, Speech and Signal Processing. doi:10.48550/arXiv.1712.05884
  • Tomik, B., Krupinski, J., Glodzik-Sobanska, L., Bala-Slodowska, M., Wszolek, W., Kusiak, M., & Lechwacka, A. (1999). Acoustic analysis of dysarthria profile in ALS patients. Journal of the Neurological Sciences, 169(1–2), 35–42. doi:10.1016/S0022-510X(99)00213-0
  • Turner, G., & Weismer, G. (1993). Characteristics of speaking rate in the dysarthria associated with amyotrophic lateral sclerosis. Journal of Speech and Hearing Research, 36(6), 1134–1144. doi:10.1044/jshr.3606.1134
  • van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., & Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. http://arxiv.org/abs/1609.03499
  • Veaux, C., Yamagishi, J., & King, S. (2011). Voice banking and voice reconstruction for MND patients [Paper presentation]. The Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility. doi:10.1145/2049536.2049619
  • Weismer, G., Jeng, J.-Y., Laures, J. S., Kent, R. D., & Kent, J. F. (2001). Acoustic and intelligibility characteristics of sentence production in neurogenic speech disorders. Folia Phoniatrica et Logopaedica, 53(1), 1–18. doi:10.1159/000052649
  • Westley, M., Sutherland, D., & Bunnell, H. T. (2019). Voice banking to support people who use speech-generating devices: New Zealand voice donors’ perspectives. Perspectives of the ASHA Special Interest Groups, 4(4), 593–600. doi:10.1044/2019_PERS-SIG2-2018-0011
  • Wu, Z., Watts, O., & King, S. (2009). Merlin: An open source neural network speech synthesis system. Proceedings of the 9th ISCA Speech Synthesis Workshop, 218–233. http://ssw9.talp.cat/papers/ssw9_PS2-13_Wu.pdf
  • Yamagishi, J., Veaux, C., King, S., & Renals, S. (2012). Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction. Acoustical Science and Technology, 33(1), 1–5. doi:10.1250/ast.33.1
  • Zen, H., Tokuda, K., & Black, A. (2009). Statistical parametric speech synthesis. Speech Communication, 51(11), 1039–1064. doi:10.1016/j.specom.2009.04.004
