Assistive Technology
The Official Journal of RESNA
Volume 35, 2023 - Issue 4

Review

Interaction between people with dysarthria and speech recognition systems: A review

Aisha Jaddoh, MSc, Fernando Loizides, PhD & Omer Rana, PhD
Pages 330–338 | Accepted 28 Mar 2022, Published online: 18 Apr 2022

References

  • Abowd, G. D., & Beale, R. (1991). Users, systems and interfaces: A unifying framework for interaction. In D. Diaper, & N. Hammond (Eds.), Proceedings of the Sixth Conference of the British Computer Society Human Computer Interaction Specialist Group: People and computers IV (pp. 73–87). Cambridge University Press.
  • Allison, K. M., Yunusova, Y., & Green, J. R. (2019). Shorter sentence length maximizes intelligibility and speech motor performance in persons with dysarthria due to amyotrophic lateral sclerosis. American Journal of Speech-Language Pathology, 28(1), 96–107. https://doi.org/10.1044/2018_ajslp-18-0049
  • Balaji, V., & Sadashivappa, G. (2015). Speech disabilities in adults and the suitable speech recognition software tools—a review. In J. Wu, & C. B. Westphall (Eds.), 2015 International Conference on Computing and Network Communications (CoCoNet) (pp. 559–564). Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/coconet.2015.7411243
  • Ballati, F., Corno, F., & De Russis, L. (2018a). Assessing virtual assistant capabilities with Italian dysarthric speech. In F. Hwang (Ed.), Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility (pp. 93–101). Association for Computing Machinery. https://doi.org/10.1145/3234695.3236354
  • Ballati, F., Corno, F., & De Russis, L. (2018b). “Hey Siri, do you understand me?”: Virtual assistants and dysarthria. In I. Chatzigiannakis, Y. Tobe, P. Novais, & O. Amft (Eds.), Intelligent Environments 2018: Workshop Proceedings of the 14th International Conference on Intelligent Environments (Vol. 23, pp. 557–566). IOS Press.
  • Cai, S., Lillianfeld, L., Seaver, K., Green, J., Brenner, M., Nelson, P., & Sculley, D. (n.d.). A voice-activated switch for persons with motor and speech impairments: Isolated-vowel spotting using neural networks. Retrieved March 8, 2022, from https://storage.googleapis.com/pub-tools-public-publication-data/pdf/70c549e87cbf7e8a3d9c7a3c8e6f862362a9dbee.pdf
  • Darley, F. L., Aronson, A. E., & Brown, J. R. (1969). Differential diagnostic patterns of dysarthria. Journal of Speech and Hearing Research, 12(2), 246–269. https://doi.org/10.1044/jshr.1202.246
  • De Russis, L., & Corno, F. (2019). On the impact of dysarthric speech on contemporary ASR cloud platforms. Journal of Reliable Intelligent Environments, 5(3), 163–172. https://doi.org/10.1007/s40860-019-00085-y
  • Derboven, J., Huyghe, J., & De Grooff, D. (2014). Designing voice interaction for people with physical and speech impairments. In V. Roto (Ed.), Proceedings of the 8th Nordic Conference on Human–Computer Interaction: Fun, fast, foundational (pp. 217–226). Association for Computing Machinery. https://doi.org/10.1145/2639189.2639252
  • Dhanalakshmi, M., Mariya Celin, T. A., Nagarajan, T., & Vijayalakshmi, P. (2018). Speech-input speech-output communication for dysarthric speakers using HMM-based speech recognition and adaptive synthesis system. Circuits, Systems, and Signal Processing, 37(2), 674–703. https://doi.org/10.1007/s00034-017-0567-9
  • España-Bonet, C., & Fonollosa, J. A. R. (2016). Automatic speech recognition with deep neural networks for impaired speech. In A. Abad (Ed.), Advances in speech and language technologies for Iberian languages. IberSPEECH 2016. Lecture notes in computer science (Vol. 10077, pp. 97–107). Springer. https://doi.org/10.1007/978-3-319-49169-1_10
  • Fager, S. K., Beukelman, D. R., Jakobs, T., & Hosom, J.-P. (2010). Evaluation of a speech recognition prototype for speakers with moderate and severe dysarthria: A preliminary report. Augmentative and Alternative Communication, 26(4), 267–277. https://doi.org/10.3109/07434618.2010.532508
  • Ferrier, L., Shane, H., Ballard, H., Carpenter, T., & Benoit, A. (1995). Dysarthric speakers’ intelligibility and speech characteristics in relation to computer speech recognition. Augmentative and Alternative Communication, 11(3), 165–175. https://doi.org/10.1080/07434619512331277289
  • Fried-Oken, M. (1985). Voice recognition device as a computer interface for motor and speech impaired people. Archives of Physical Medicine and Rehabilitation, 66(10), 678–681.
  • Gemmeke, J. F., Ons, B., Tessema, N. M., Van Hamme, H., Van de Loo, J., De Pauw, G., Daelemans, W., Huyghe, J., Derboven, J., Vuegen, L., Van Den Broeck, B., Karsmakers, P., & Vanrumste, B. (2013). Self-taught assistive vocal interfaces: An overview of the ALADIN project. In 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013): Speech in life sciences and human societies. International Speech Communication Association.
  • Gemmeke, J. F., Sehgal, S., Cunningham, S., & Van Hamme, H. (2014). Dysarthric vocal interfaces with minimal training data. In Institute of Electrical and Electronics Engineers (Ed.), 2014 IEEE Spoken Language Technology Workshop (SLT 2014) (pp. 248–253). IEEE. https://doi.org/10.1109/SLT.2014.7078582
  • Geng, M., Liu, S., Yu, J., Xie, X., Hu, S., Ye, Z., Jin, Z., Liu, X., & Meng, H. (2021). Spectro-temporal deep features for disordered speech assessment and recognition. Interspeech 2021. https://doi.org/10.21437/interspeech.2021-60
  • Green, J. R., MacDonald, R. L., Jiang, P.-P., Cattiau, J., Heywood, R., Cave, R., Seaver, K., Ladewig, M. A., Tobin, J., Brenner, M. P., Nelson, P. C., & Tomanek, K. (2021). Automatic speech recognition of disordered speech: Personalized models outperforming human listeners on short phrases. Interspeech 2021. https://doi.org/10.21437/interspeech.2021-1384
  • Hamidi, F., Baljko, M., Economopoulos, C., Livingston, N. J., & Spalteholz, L. G. (2015). Co-designing a speech interface for people with dysarthria. Journal of Assistive Technologies, 9(3), 159–173. https://doi.org/10.1108/jat-10-2014-0026
  • Hamidi, F., Baljko, M., Livingston, N., & Spalteholz, L. (2010). CanSpeak: A customizable speech interface for people with dysarthric speech. In K. Miesenberger, J. Klaus, W. Zagler, & A. Karshmer (Eds.), Computers helping people with special needs: 12th International Conference, ICCHP 2010: Proceedings (pp. 605–612). Springer. https://doi.org/10.1007/978-3-642-14097-6_97
  • Hawley, M. S., Enderby, P., Green, P., Cunningham, S., & Palmer, R. (2006). Development of a voice-input voice-output communication aid (VIVOCA) for people with severe dysarthria. In K. Miesenberger, J. Klaus, W. L. Zagler, & A. I. Karshmer (Eds.), Computers helping people with special needs: 10th International Conference, ICCHP 2006: Proceedings (pp. 882–885). Springer. https://doi.org/10.1007/11788713_128
  • Hermann, E., & Magimai Doss, M. (2020). Dysarthric speech recognition with lattice-free MMI. In Institute of Electrical and Electronics Engineers (Ed.), ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6109–6113). IEEE. https://doi.org/10.1109/ICASSP40776.2020.9053549
  • Holyfield, C., & Drager, K. (2021). Integrating familiar listeners and speech recognition technologies into augmentative and alternative communication intervention for adults with Down syndrome: Descriptive exploration. Assistive Technology. https://doi.org/10.1080/10400435.2021.1934610
  • Jaddoh, A., Loizides, F., & Rana, O. (2021). Non-verbal interaction with virtual home assistants for people with dysarthria. Journal on Technology and Persons with Disabilities. https://scholarworks.csun.edu/bitstream/handle/10211.3/219936/JTPD-2021-p071.pdf?sequence=1
  • Jin, Z., Geng, M., Xie, X., Yu, J., Liu, S., Liu, X., & Meng, H. (2021). Adversarial data augmentation for disordered speech recognition. ArXiv:2108.00899 [Cs, Eess]. https://arxiv.org/abs/2108.00899
  • Joy, N. M., & Umesh, S. (2018). Improving acoustic models in TORGO dysarthric speech database. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(3), 637–645. https://doi.org/10.1109/tnsre.2018.2802914
  • Këpuska, V. Z., & Bohouta, G. (2017). Comparing speech recognition systems (Microsoft API, Google API and CMU Sphinx). International Journal of Engineering Research and Application, 7(3), 20–24. https://doi.org/10.9790/9622-0703022024
  • Kim, H., Hasegawa-Johnson, M., Perlman, A., Gunderson, J., Huang, T. S., Watkin, K., & Frame, S. (2008). Dysarthric speech database for universal access research. In International Speech Communication Association (Ed.), 9th Annual Conference of the International Speech Communication Association (INTERSPEECH 2008) (pp. 1741–1744). Curran Associates.
  • Kim, S., Hwang, Y., Shin, D., Yang, C.-Y., Lee, S.-Y., Kim, J., Kong, B., Chung, J., Cho, N., Kim, J.-H., & Chung, M. (2013). VUI development for Korean people with dysarthria. Journal of Assistive Technologies, 7(3). https://doi.org/10.1108/jat-10-2012-0031
  • Kim, Y., Kent, R. D., & Weismer, G. (2011). An acoustic study of the relationships among neurologic disease, dysarthria type, and severity of dysarthria. Journal of Speech, Language, and Hearing Research, 54(2), 417–429. https://doi.org/10.1044/1092-4388(2010/10-0020)
  • Kim, M., Kim, Y., Yoo, J., Wang, J., & Kim, H. (2017). Regularized speaker adaptation of KL-HMM for dysarthric speech recognition. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(9), 1581–1591. https://doi.org/10.1109/tnsre.2017.2681691
  • Kim, M., Wang, J., & Kim, H. (2016). Dysarthric speech recognition using Kullback–Leibler divergence-based hidden Markov model. In International Speech Communication Association (Ed.), 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016): Understanding speech processing in humans and machines (pp. 2671–2675). Curran Associates. https://doi.org/10.21437/Interspeech.2016-776
  • Kim, M., Yoo, J., & Kim, H. (2013). Dysarthric speech recognition using dysarthria-severity-dependent and speaker-adaptive models. In F. Bimbot (Ed.), 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013): Speech in life sciences and human societies (pp. 3622–3626). International Speech Communication Association.
  • Ko, T., Peddinti, V., Povey, D., Seltzer, M. L., & Khudanpur, S. (2017). A study on data augmentation of reverberant speech for robust speech recognition. In Institute of Electrical and Electronics Engineers (Ed.), ICASSP 2017—2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5220–5224). IEEE. https://doi.org/10.1109/icassp.2017.7953152
  • Lea, C., Huang, Z., Jain, D., Tooley, L., Liaghat, Z., Thelapurath, S., Findlater, L., & Bigham, J. P. (2022). Nonverbal sound detection for disordered speech. ArXiv:2202.07750 [Cs, Eess]. https://arxiv.org/abs/2202.07750
  • Liu, S., Geng, M., Hu, S., Xie, X., Cui, M., Yu, J., Liu, X., & Meng, H. (2021). Recent progress in the CUHK dysarthric speech recognition system. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 2267–2281. https://doi.org/10.1109/TASLP.2021.3091805
  • Liu, S., Hu, S., Liu, X., & Meng, H. (2019). On the use of pitch features for disordered speech recognition. In International Speech Communication Association (Ed.), 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019): Crossroads of speech and language (pp. 4130–4134). Curran Associates.
  • Malavasi, M., Turri, E., Motolese, M. R., Marxer, R., Farwer, J., Christensen, H., Desideri, L., Tamburini, F., & Green, P. (2016). An innovative speech-based interface to control AAL and IoT solutions to help people with speech and motor disability. In N. Casiddu, C. Porfirione, A. Monteriù, & F. Cavallo (Eds.), Ambient assisted living: Italian forum 2017 (pp. 269–278). Springer.
  • Marini, M., Vigano, M., Corbo, M., Zettin, M., Simoncini, G., Fattori, B., D’Anna, C., Donati, M., & Fanucci, L. (2021). IDEA: An Italian dysarthric speech database. In 2021 IEEE Spoken Language Technology Workshop (SLT). https://doi.org/10.1109/slt48900.2021.9383467
  • Mariya Celin, T. A., Nagarajan, T., & Vijayalakshmi, P. (2020). Data augmentation using virtual microphone array synthesis and multi-resolution feature extraction for isolated word dysarthric speech recognition. IEEE Journal of Selected Topics in Signal Processing, 14(2), 346–354. https://doi.org/10.1109/jstsp.2020.2972161
  • McCowan, I. A., Moore, D., Dines, J., Gatica-Perez, D., Flynn, M., Wellner, P., & Bourlard, H. (2005). On the use of information retrieval measures for speech recognition evaluation (Research Report 04-73). Idiap Research Institute.
  • Menendez-Pidal, X., Polikoff, J. B., Peters, S. M., Leonzio, J. E., & Bunnell, H. T. (1996). The Nemours database of dysarthric speech. In Proceedings of the Fourth International Conference on Spoken Language Processing: ICSLP ’96 (Vol. 3, pp. 1962–1965). Institute of Electrical and Electronics Engineers.
  • Mengistu, K. T., & Rudzicz, F. (2011). Adapting acoustic and lexical models to dysarthric speech. In Institute of Electrical and Electronics Engineers (Ed.), 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4924–4927). IEEE. https://doi.org/10.1109/ICASSP.2011.5947460
  • Moher, D., Shamseer, L., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P., & Stewart, L. A. (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Systematic Reviews, 4(1), Article 1. https://doi.org/10.1186/2046-4053-4-1
  • Moore, M., Saxon, M., Venkateswara, H., Berisha, V., & Panchanathan, S. (2019). Say what? A dataset for exploring the error patterns that two ASR engines make. In International Speech Communication Association (Ed.), 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019): Crossroads of speech and language (pp. 2528–2532). Curran Associates.
  • Moore, M., Venkateswara, H., & Panchanathan, S. (2018). Whistle-blowing ASRs: Evaluating the need for more inclusive automatic speech recognition systems. In International Speech Communication Association (Ed.), 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018): Speech research for emerging markets in multilingual societies. Curran Associates.
  • Mustafa, M. B., Rosdi, F., Salim, S. S., & Mughal, M. U. (2015). Exploring the influence of general and specific factors on the recognition accuracy of an ASR system for dysarthric speaker. Expert Systems with Applications, 42(8), 3924–3932. https://doi.org/10.1016/j.eswa.2015.01.033
  • Na, M., & Chung, M. (2016). Optimizing vocabulary modeling for dysarthric speech recognition. In K. Miesenberger, C. Bühler, & P. Penaz (Eds.), Computers helping people with special needs. ICCHP 2016. Lecture notes in computer science (Vol. 9759, pp. 507–510). Springer. https://doi.org/10.1007/978-3-319-41267-2_71
  • Nicolao, M., Christensen, H., Cunningham, S., Green, P., & Hain, T. (2016). A framework for collecting realistic recordings of dysarthric speech: The homeService corpus. In Proceedings of LREC 2016. https://eprints.whiterose.ac.uk/109262/
  • Nordberg, A., Miniscalco, C., & Lohmander, A. (2014). Consonant production and overall speech characteristics in school-aged children with cerebral palsy and speech impairment. International Journal of Speech-Language Pathology, 16(4), 386–395. https://doi.org/10.3109/17549507.2014.917440
  • Park, J. H., Seong, W. K., & Kim, H. K. (2011). Preprocessing of dysarthric speech in noise based on CV-dependent Wiener filtering. In Proceedings of the Paralinguistic Information and Its Integration in Spoken Dialogue Systems Workshop (pp. 41–47). https://doi.org/10.1007/978-1-4614-1335-6_6
  • Parker, M., Cunningham, S., Enderby, P., Hawley, M., & Green, P. (2006). Automatic speech recognition and training for severely dysarthric users of assistive technology: The STARDUST project. Clinical Linguistics & Phonetics, 20(2–3), 149–156. https://doi.org/10.1080/02699200400026884
  • Rosen, K., & Yampolsky, S. (2000). Automatic speech recognition and a review of its functioning with dysarthric speech. Augmentative and Alternative Communication, 16(1), 48–60. https://doi.org/10.1080/07434610012331278904
  • Rudzicz, F. (2012). Using articulatory likelihoods in the recognition of dysarthric speech. Speech Communication, 54(3), 430–444. https://doi.org/10.1016/j.specom.2011.10.006
  • Rudzicz, F. (2013). Adjusting dysarthric speech signals to be more intelligible. Computer Speech & Language, 27(6), 1163–1177. https://doi.org/10.1016/j.csl.2012.11.001
  • Rudzicz, F., Namasivayam, A. K., & Wolff, T. (2012). The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Language Resources and Evaluation, 46(4), 523–541. https://doi.org/10.1007/s10579-011-9145-0
  • Seong, W. K., Kim, N., Ha, H. K., & Kim, H. K. (2016). A discriminative training method incorporating pronunciation variations for dysarthric automatic speech recognition. In Institute of Electrical and Electronics Engineers (Ed.), 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) (pp. 96–100). IEEE. https://doi.org/10.1109/APSIPA.2016.7820840
  • Sriranjani, R., Ramasubba Reddy, M., & Umesh, S. (2015). Improved acoustic modeling for automatic dysarthric speech recognition. In Indian Institute of Technology (Ed.), 2015 Twenty First National Conference on Communications (NCC 2015) (pp. 242–247). Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/NCC.2015.7084856
  • Takashima, R., Takiguchi, T., & Ariki, Y. (2020). Two-step acoustic model adaptation for dysarthric speech recognition. In Institute of Electrical and Electronics Engineers (Ed.), ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6104–6108). IEEE.
  • Thomas-Stonell, N., Kotler, A.-L., Leeper, H., & Doyle, P. (1998). Computerized speech recognition: Influence of intelligibility and perceptual consistency on recognition accuracy. Augmentative and Alternative Communication, 14(1), 51–56. https://doi.org/10.1080/07434619812331278196
  • Turrisi, R., Braccia, A., Emanuele, M., Giulietti, S., Pugliatti, M., Sensi, M., Fadiga, L., & Badino, L. (2021). EasyCall corpus: A dysarthric speech dataset. ArXiv:2104.02542 [Cs]. https://arxiv.org/abs/2104.02542
  • Vachhani, B., Bhat, C., & Kopparapu, S. K. (2018). Data augmentation using healthy speech for dysarthric speech recognition. In International Speech Communication Association (Ed.), 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018): Speech research for emerging markets in multilingual societies (pp. 471–475). Curran Associates.
  • Wang, D., Wang, X., & Lv, S. (2019). An overview of end-to-end automatic speech recognition. Symmetry, 11(8), 1018. https://doi.org/10.3390/sym11081018
  • Xiong, F., Barker, J., & Christensen, H. (2018). Deep learning of articulatory-based representations and applications for improving dysarthric speech recognition. In S. Doclo & P. Jax (Eds.), Proceedings of the 13th ITG Conference on Speech Communication (pp. 1–5). VDE Verlag.
  • Xiong, F., Barker, J., & Christensen, H. (2019). Phonetic analysis of dysarthric speech tempo and applications to robust personalised dysarthric speech recognition. In Institute of Electrical and Electronics Engineers (Ed.), ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5836–5840). IEEE. https://doi.org/10.1109/icassp.2019.8683091
  • Yorkston, K., Strand, E., & Kennedy, M. (1996). Comprehensibility of dysarthric speech. American Journal of Speech-Language Pathology, 5(1), 55–66. https://doi.org/10.1044/1058-0360.0501.55
  • Young, V., & Mihailidis, A. (2010). Difficulties in automatic speech recognition of dysarthric speakers and implications for speech-based applications used by the elderly: A literature review. Assistive Technology, 22(2), 99–112. https://doi.org/10.1080/10400435.2010.483646
  • Yue, Z., Xiong, F., Christensen, H., & Barker, J. (2020). Exploring appropriate acoustic and language modelling choices for continuous dysarthric speech recognition. In Institute of Electrical and Electronics Engineers (Ed.), ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE.
