Assistive Technology
The Official Journal of RESNA
Volume 36, 2024 - Issue 4
Research Article

A proof-of-concept study for automatic speech recognition to transcribe AAC speakers’ speech from high-technology AAC systems

Pages 319-326 | Accepted 13 Sep 2023, Published online: 05 Oct 2023

References

  • Alharbi, S., Hasan, M., Simons, A. J. H., Brumfitt, S., & Green, P. (2017). Detecting stuttering events in transcripts of children’s speech. In N. Camelin, Y. Estève, & C. Martín-Vide (Eds.), Statistical language and speech processing (Vol. 10583, pp. 217–228). Springer International Publishing. https://doi.org/10.1007/978-3-319-68456-7_18
  • AssistiveWare. (2019). Proloquo2Go [Mobile app]. http://www.assistiveware.com/product/proloquo2go
  • Audacity Team. (2021). Audacity® [Computer application] (Version 3.0.0). Retrieved March 17, 2021, from https://audacityteam.org/
  • Beukelman, D. R., & Light, J. C. (2020). Augmentative & alternative communication: Supporting children and adults with complex communication needs (5th ed.). Paul H. Brookes Publishing Co., Inc.
  • Binger, C., Kent-Walsh, J., Harrington, N., & Hollerbach, Q. C. (2020). Tracking early sentence-building progress in graphic symbol communication. Language, Speech, and Hearing Services in Schools, 51(2), 317–328. https://doi.org/10.1044/2019_LSHSS-19-00065
  • Bohouta, G., & Këpuska, V. (2017). Comparing speech recognition systems (Microsoft API, Google API and CMU Sphinx). International Journal of Engineering Research and Applications, 7(3), 20–24. https://doi.org/10.9790/9622-0703022024
  • Chen, S.-H. K., Wadhwa, S., & Nyberg, E. (2019). Design and analysis of interoperable data logs for augmentative communication practice. The 21st International ACM SIGACCESS Conference on Computers and Accessibility, 533–535. https://doi.org/10.1145/3308561.3354614
  • The CMU Pronouncing Dictionary. (n.d.). Carnegie Mellon Speech Group. https://github.com/cmusphinx/cmudict
  • CMUSphinx [Computer software]. (n.d.). https://cmusphinx.github.io/
  • CoughDrop, Inc. (2019). CoughDrop [Mobile app]. https://www.mycoughdrop.com
  • Cross, R. T., & Segalman, B. (2016). The realize language system: An online SGD data log analysis tool. Assistive Technology Outcomes & Benefits (ATOB), 10(1), 75–93.
  • Dada, S., & Alant, E. (2009). The effect of aided language stimulation on vocabulary acquisition in children with little or no functional speech. American Journal of Speech-Language Pathology, 18(1), 50–64. https://doi.org/10.1044/1058-0360(2008/07-0018)
  • Del Rio, M., Delworth, N., Westerman, R., Huang, M., Bhandari, N., Palakapilly, J., McNamara, Q., Dong, J., Zelasko, P., & Jette, M. (2021). Earnings-21: A practical benchmark for ASR in the wild. Interspeech, 2021, 3465–3469. https://doi.org/10.21437/Interspeech.2021-1915
  • De Russis, L., & Corno, F. (2019). On the impact of dysarthric speech on contemporary ASR cloud platforms. Journal of Reliable Intelligent Environments, 5(3), 163–172. https://doi.org/10.1007/s40860-019-00085-y
  • Dhankar, A. (2017). Study of deep learning and CMU Sphinx in automatic speech recognition. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2296–2301. https://doi.org/10.1109/ICACCI.2017.8126189
  • Dharmale, G., Thakare, V. M., & Patil, D. (2019). Implementation of efficient speech recognition system on mobile device for Hindi and English language. International Journal of Advanced Computer Science & Applications, 10(2), 83–87. https://doi.org/10.14569/IJACSA.2019.0100212
  • Errattahi, R., El Hannani, A., & Ouahmane, H. (2018). Automatic speech recognition errors detection and correction: A review. Procedia Computer Science, 128, 32–37. https://doi.org/10.1016/j.procs.2018.03.005
  • Fox, C. B., Israelsen-Augenstein, M., Jones, S., & Gillam, S. L. (2021). An evaluation of expedited transcription methods for school-age children’s narrative language: Automatic speech recognition and real-time transcription. Journal of Speech, Language, and Hearing Research, 64(9), 3533–3548. https://doi.org/10.1044/2021_JSLHR-21-00096
  • Gales, M., & Young, S. (2007). The application of hidden Markov models in speech recognition. Foundations and Trends® in Signal Processing, 1(3), 195–304. https://doi.org/10.1561/2000000004
  • Google Speech-to-Text. (n.d.). Google. https://cloud.google.com/speech-to-text/
  • Green, J. R., MacDonald, R. L., Jiang, P., Cattiau, J., Heywood, R., Cave, R., Seaver, K., Ladewig, M. A., Tobin, J., Brenner, M. P., Nelson, P. C., & Tomanek, K. (2021). Automatic speech recognition of disordered speech: Personalized models outperforming human listeners on short phrases. Interspeech, 2021, 4778–4782. https://doi.org/10.21437/Interspeech.2021-1384
  • Hill, K. (2004). Augmentative and alternative communication and language: Evidence‐based practice and language activity monitoring. Topics in Language Disorders, 24(1), 18–30. https://doi.org/10.1097/00011363-200401000-00004
  • Hill, K. J., & Romich, B. A. (2001). A language activity monitor for supporting AAC evidence-based clinical practice. Assistive Technology: The Official Journal of RESNA, 13(1), 12–22. https://doi.org/10.1080/10400435.2001.10132030
  • Huh, J., Park, S., Lee, J. E., & Ye, J. C. (2023). Improving medical speech-to-text accuracy with vision-language pre-training model (arXiv:2303.00091). arXiv. http://arxiv.org/abs/2303.00091
  • Jongman, S. R., Khoe, Y. H., & Hintz, F. (2021). Vocabulary size influences spontaneous speech in native language users: Validating the use of automatic speech recognition in individual differences research. Language and Speech, 64(1), 35–51. https://doi.org/10.1177/0023830920911079
  • Joshy, A. A., & Rajan, R. (2022). Automated dysarthria severity classification: A study on acoustic features and deep learning techniques. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 30, 1147–1157. https://doi.org/10.1109/TNSRE.2022.3169814
  • Kafle, S., & Huenerfauth, M. (2019). Predicting the understandability of imperfect English captions for people who are deaf or hard of hearing. ACM Transactions on Accessible Computing, 12(2), 1–32. https://doi.org/10.1145/3325862
  • Kemp, K., & Klee, T. (1997). Clinical language sampling practices: Results of a survey of speech-language pathologists in the United States. Child Language Teaching and Therapy, 13(2), 161–176. https://doi.org/10.1177/026565909701300204
  • Kim, M., Cao, B., An, K., & Wang, J. (2018). Dysarthric speech recognition using convolutional LSTM neural network. Interspeech, 2018, 2948–2952. https://doi.org/10.21437/Interspeech.2018-2250
  • Klatte, I. S., Van Heugten, V., Zwitserlood, R., & Gerrits, E. (2022). Language sample analysis in clinical practice: Speech-language pathologists’ barriers, facilitators, and needs. Language, Speech, and Hearing Services in Schools, 53(1), 1–16. https://doi.org/10.1044/2021_LSHSS-21-00026
  • Kovacs, T., & Hill, K. (2015). A tutorial on reliability testing in AAC language sample transcription and analysis. Augmentative and Alternative Communication, 31(2), 159–169. https://doi.org/10.3109/07434618.2015.1036118
  • Kovacs, T., & Hill, K. (2017). Language samples from children who use speech-generating devices: Making sense of small samples and utterance length. American Journal of Speech-Language Pathology, 26(3), 939–950. https://doi.org/10.1044/2017_AJSLP-16-0114
  • Le, D., & Provost, E. M. (2016). Improving automatic recognition of aphasic speech with AphasiaBank. Interspeech, 2016, 2681–2685. https://doi.org/10.21437/Interspeech.2016-213
  • Lesher, G. W., Rinkus, G. J., Moulton, B. J., & Higginbotham, D. J. (2000). Logging and analysis of augmentative communication. Proceedings of the RESNA 2000 Annual Conference, 82–85. Orlando, FL.
  • MacDonald, R. L., Jiang, P.-P., Cattiau, J., Heywood, R., Cave, R., Seaver, K., Ladewig, M. A., Tobin, J., Brenner, M. P., Nelson, P. C., Green, J. R., & Tomanek, K. (2021). Disordered speech data collection: Lessons learned at 1 million utterances from project euphonia. Interspeech, 2021, 4833–4837. https://doi.org/10.21437/Interspeech.2021-697
  • Manfredi, C. (Ed.). (2021). Models and analysis of vocal emissions for biomedical applications. Firenze University Press. https://books.fupress.com/catalogue/models-and-analysis-of-vocal-emissions-for-biomedical-applications/7364
  • Mendelev, V., Raissi, T., Camporese, G., & Giollo, M. (2021). Improved robustness to disfluencies in RNN-transducer based speech recognition. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6878–6882. https://doi.org/10.1109/ICASSP39728.2021.9413618
  • Mishra, T., Ljolje, A., & Gilbert, M. (2011). Predicting human perceived accuracy of ASR systems. Proceedings of Interspeech 2011, 1945–1948. https://doi.org/10.21437/Interspeech.2011-364
  • Mustafa, M. B., Rosdi, F., Salim, S. S., & Mughal, M. U. (2015). Exploring the influence of general and specific factors on the recognition accuracy of an ASR system for dysarthric speaker. Expert Systems with Applications, 42(8), 3924–3932. https://doi.org/10.1016/j.eswa.2015.01.033
  • Pavelko, S. L., Owens, R. E., Ireland, M., & Hahs-Vaughn, D. L. (2016). Use of language sample analysis by school-based SLPs: Results of a nationwide survey. Language, Speech, and Hearing Services in Schools, 47(3), 246–258. https://doi.org/10.1044/2016_LSHSS-15-0044
  • Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlíček, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., & Vesely, K. (2011). The Kaldi speech recognition toolkit. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding.
  • PRC Co. (2020). LAMP Words for Life [Mobile app]. https://www.prentrom.com/prc_advantage/lamp-words-for-life-language-system
  • Refaeilzadeh, P., Tang, L., & Liu, H. (2016). Cross-Validation. In L. Liu & M. T. Özsu (Eds.), Encyclopedia of database systems (pp. 1–7). Springer New York. https://doi.org/10.1007/978-1-4899-7993-3_565-2
  • Rohlfing, M. L., Buckley, D. P., Piraquive, J., Stepp, C. E., & Tracy, L. F. (2021). Hey Siri: How effective are common voice recognition systems at recognizing dysphonic voices? The Laryngoscope, 131(7), 1599–1607. https://doi.org/10.1002/lary.29082
  • Savaldi-Harussi, G., & Soto, G. (2016). Using SALT: Considerations for the analysis of language transcripts of children who use SGDs. Perspectives of the ASHA Special Interest Groups, 1(12), 110–124. https://doi.org/10.1044/persp1.sig12.110
  • Shonibare, O., Tong, X., & Ravichandran, V. (2022). Enhancing ASR for stuttered speech with limited data using detect and pass (arXiv:2202.05396). arXiv. https://doi.org/10.48550/arXiv.2202.05396
  • Soto, G. (2022). The protocol for the analysis of aided language samples in Spanish: A tutorial. Perspectives of the ASHA Special Interest Groups, 7(2), 523–532. https://doi.org/10.1044/2021_PERSP-21-00236
  • Soto, G., & Hartmann, E. (2006). Analysis of narratives produced by four children who use augmentative and alternative communication. Journal of Communication Disorders, 39(6), 456–480. https://doi.org/10.1016/j.jcomdis.2006.04.005
  • Speak for Yourself LLC. (2019). Speak for Yourself [Mobile app]. https://www.speakforyourself.org/
  • Van Tatenhove, G. (2014). Issues in language sample collection and analysis with children using AAC. Perspectives on Augmentative and Alternative Communication, 23(2), 65–74. https://doi.org/10.1044/aac23.2.65
  • Vyas, G., Dutta, M. K., & Prinosil, J. (2017). Improving the computational complexity and word recognition rate for dysarthria speech using robust frame selection algorithm. International Journal of Signal & Imaging Systems Engineering, 10(3), 136. https://doi.org/10.1504/IJSISE.2017.086037
  • Wang, D., Wang, X., & Lv, S. (2019). An overview of end-to-end automatic speech recognition. Symmetry, 11(8), 1018. https://doi.org/10.3390/sym11081018
  • Xiong, F., Barker, J., & Christensen, H. (2019). Phonetic analysis of dysarthric speech tempo and applications to robust personalised dysarthric speech recognition. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5836–5840. https://doi.org/10.1109/ICASSP.2019.8683091