Commentaries

Harnessing automatic speech recognition to realise Sustainable Development Goals 3, 9, and 17 through interdisciplinary partnerships for children with communication disability


Abstract

Purpose

To showcase how applications of automatic speech recognition (ASR) technology could help solve challenges in speech-language pathology practice with children with communication disability, and contribute to the realisation of the Sustainable Development Goals (SDGs).

Result

ASR technologies have been developed to address the need for equitable, efficient, and accurate assessment and diagnosis of communication disability in children by automating the transcription and analysis of speech and language samples and supporting dual-language assessment of bilingual children. ASR tools can automate the measurement of intervention fidelity and help optimise it. ASR tools can also be used by children to engage in independent speech production practice without relying on feedback from speech-language pathologists (SLPs), helping to bridge the long-standing gap between recommended and received intervention intensity. These innovative technologies and tools have been generated from interdisciplinary partnerships between SLPs, engineers, data scientists, and linguists.

Conclusion

To advance equitable, efficient, and effective speech-language pathology services for children with communication disability, SLPs would benefit from integrating ASR solutions into their clinical practice. Ongoing interdisciplinary research is needed to further advance ASR technologies to optimise children’s outcomes. This commentary paper focusses on industry, innovation and infrastructure (SDG 9) and partnerships for the goals (SDG 17). It also addresses SDG 1, SDG 3, SDG 4, SDG 8, SDG 10, SDG 11, and SDG 16.

Introduction

Communication disability in childhood is common, affecting approximately 1 in 10 children (McGregor, 2020). Communication disability is a collective term referring to children with impairments in speech and language, such as speech sound disorder (SSD) and developmental language disorder (DLD) (cf. Hussain et al., 2018). Interdisciplinary partnerships are necessary to drive innovative solutions if we are to tackle the longstanding global challenges of providing these children with equitable and sufficient access to high-quality speech-language pathology services to accelerate learning. Without timely and adequate help, young children with communication disability face increased risks of poor outcomes relevant to multiple Sustainable Development Goals (SDGs) (United Nations, 2015), including no poverty (SDG 1) (Olusanya et al., 2006), good health and well-being (SDG 3) (St Clair et al., 2019), quality education (SDG 4) (Lewis et al., 2019), and decent work and economic growth (SDG 8) (Conti-Ramsden et al., 2018). They also face an increased risk of social, economic, and political inequalities (Murphy et al., 2018) relevant to reduced inequalities (SDG 10), inequalities in access to adequate, safe, and affordable housing (Tually et al., 2011) relevant to sustainable cities and communities (SDG 11), and inequalities in access to and experience of the justice system (Snow, 2019) relevant to peace, justice and strong institutions (SDG 16). In this commentary we highlight how two SDGs, namely industry, innovation and infrastructure (SDG 9) and partnerships for the goals (SDG 17), could advance speech-language pathologists' (SLPs') practice and, in doing so, mitigate the risk of poor outcomes across multiple SDGs for children with communication disability. Our focus is on clinical applications of automatic speech recognition (ASR), technologies that have emerged through innovation and interdisciplinary partnerships.

What are automatic speech recognition (ASR) technologies?

Automatic speech recognition (ASR) technologies convert human speech signals to text. ASR is an interdisciplinary field within computer science encompassing signal processing, natural language processing (NLP), machine learning (ML), and artificial intelligence (AI). Many providers of commercial ASR systems (e.g. Google, with Cloud Speech-to-Text) also offer conversational agents (e.g. Google Assistant) that combine ASR, NLP, and text-to-speech (TTS) synthesis to realise human-machine conversation (Allouch et al., 2021). Current state-of-the-art ASR systems are designed primarily for clear adult speech and perform best in low-noise conditions.

Applications of ASR in the delivery of speech-language pathology services for children

Time and clinical resources are finite in practice, and clinical decisions on the best use of time and resources can be fraught with compromise. In this section, we explore how innovations in ASR could minimise such compromises for three challenges in practice.

Innovations in ASR to help ensure equitable, efficient, accurate assessments

SLPs have looked to ASR to reduce the prohibitive time demands of transcribing language samples (Miller et al., 2016), and promising ASR solutions are emerging, at least for some languages. For example, Fox et al. (2021) found that ASR (using Google Cloud Speech-to-Text) was more accurate than real-time transcription by SLPs for orthographic transcription of narrative samples from English-monolingual school-age children with DLD. Interdisciplinary teams comprising SLPs and engineers have also developed ASR tools that can identify and classify errors in English-speaking children's speech (e.g. Tabby Talks; Shahin et al., 2015).

Worldwide, multilingualism is the norm, and for assessments to be valid, SLPs must consider children's culture and the language(s) they speak and hear (Mdlalo et al., 2019). When access to professional interpreters is limited or absent, SLP services for multilingual children become inequitable (Newbury et al., 2020). Albudoor and Peña (2022) investigated whether assessment of bilingual children's responses on an expressive language assessment task could be automated with ASR. They found moderate agreement between human and ASR (Google Cloud Speech-to-Text) transcription for English-Spanish bilingual children with typical language acquisition (n = 59) and children with DLD (n = 25). Given the rapid pace of ASR development, Albudoor and Peña (2022) suggest that accurate ASR-supported language assessment of children may become possible irrespective of the language(s) spoken, which would contribute to good health and well-being (SDG 3) for all children with communication disability.
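Agreement between human and ASR transcripts, as in the studies above, is often summarised in ASR research as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the ASR transcript into the human reference transcript, divided by the number of words in the reference. The Python sketch below computes WER with a standard word-level edit-distance alignment. It is illustrative only and is not the agreement measure used by Fox et al. (2021) or Albudoor and Peña (2022); the function names `tokenise` and `word_error_rate` and the simple lowercase, punctuation-stripping tokenisation are assumptions.

```python
import re


def tokenise(text: str) -> list[str]:
    # Lowercase and keep runs of word characters/apostrophes, dropping punctuation.
    # In Python 3, \w is Unicode-aware, so accented words such as "niño" stay whole.
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = tokenise(reference), tokenise(hypothesis)
    if not ref:
        raise ValueError("the reference transcript must contain at least one word")
    # dist[i][j]: fewest edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            mismatch = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + mismatch,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical transcripts of one utterance (not data from the cited studies).
human = "The dog ran to the park."
asr = "the dog ran to a park"
print(round(word_error_rate(human, asr), 3))  # 0.167 (1 substitution / 6 words)
```

A WER of 0 means the two transcripts match word for word after tokenisation, and lower values mean closer agreement. Because WER weights every word equally, an error on a clinically important grammatical form counts the same as any other error, which may be one reason studies use task-specific agreement measures instead.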

Innovations in ASR to measure intervention fidelity

Measurement of fidelity (i.e. determining whether an intervention was implemented as intended) is challenging and often unreported (Baker et al., 2022). It typically involves SLPs retrospectively completing intervention fidelity checklists, reviewing recordings of intervention sessions, or relying on extra personnel to count intervention elements in real time (e.g. Alt et al., 2021; Munro et al., 2021). Such approaches are inefficient and do not support optimal fidelity in real time. One intervention approach in which real-time measurement of fidelity is critical is Vocabulary Acquisition and Usage for Late Talkers (VAULT) treatment (Alt et al., 2021). VAULT is a play-based input intervention in which SLPs need to say predetermined treatment words at a prescribed dose (e.g. 90 productions each of put, baby, and chair, totalling 270 inputs) during a 30-minute session (Alt et al., 2021). It is not feasible for SLPs to count such a dose while engaging toddlers in varied play activities. Although ASR tools that convert speech to text can successfully support physicians' clinical documentation (Latif et al., 2021), such tools do not count words. One ASR tool that does count adult words spoken during adult–child interactions is the Language ENvironment Analysis (LENA™) system (Wang et al., 2020). However, LENA™ counts all adult words spoken rather than specific predetermined words. Six SLPs who are authors of the current paper (EB, RH, SM, MA, KT, NM) have been conducting research into VAULT. They contacted engineers, data scientists, and linguists with expertise in ASR to design a solution. Through this interdisciplinary partnership, the Speech Therapy Word Tracker (STeWoT) prototype tool was developed.
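The example dose above shows why manual counting is impractical. Assuming, for simplicity, that the 270 inputs are spread evenly across the 30-minute session (in practice the rate will vary with the play activity), the SLP would need to produce and keep track of a treatment word roughly every 7 seconds:

```latex
\begin{aligned}
\text{inputs per session} &= 3\ \text{words} \times 90\ \text{productions per word} = 270\\[4pt]
\text{input rate} &= \frac{270\ \text{inputs}}{30\ \text{min}} = 9\ \text{inputs per minute}\\[4pt]
\text{mean interval between inputs} &= \frac{60\ \text{s}}{9} \approx 6.7\ \text{s}
\end{aligned}
```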

The STeWoT prototype was developed in Python and interacts with existing commercial ASR technology, either Microsoft Azure (Microsoft, 2022) or Google Cloud (Google, 2022). As depicted in Figure 1, the SLP's talk during a VAULT session is converted to text. This text is then analysed in real time to display on a PC screen a count of each occurrence of the predetermined treatment words, giving the SLP real-time feedback on input dose. Early trials of role-play VAULT session activities have shown promising alignment with a human count (see Table I). Pending further trials of the prototype during VAULT sessions, the tool has the potential not only to automate efficient and precise measurement of fidelity, but also to automate the collection of assessment, baseline, treatment, and generalisation data for children and adults with communication disability.
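The counting step described above can be illustrated with a short Python sketch. It is not the STeWoT source code: the class and method names (`TreatmentWordCounter`, `add_segment`, `remaining`, `total`) are hypothetical, and the sketch assumes the ASR service (Microsoft Azure or Google Cloud in STeWoT's case) has already converted the SLP's talk into finalised transcript segments delivered as plain strings. Each segment is tokenised and matched, case-insensitively, against the predetermined treatment words, and the running count is compared with the prescribed dose.

```python
import re
from collections import Counter


class TreatmentWordCounter:
    """Hypothetical sketch of real-time treatment-word counting on ASR output.

    Not the STeWoT implementation. Assumes the ASR service delivers finalised
    transcript segments as plain strings and counts only exact,
    case-insensitive matches of each predetermined treatment word.
    """

    def __init__(self, prescribed_dose: dict[str, int]) -> None:
        # Prescribed number of inputs for each treatment word, keyed in lowercase.
        self.prescribed_dose = {w.lower(): n for w, n in prescribed_dose.items()}
        # Running count per treatment word, starting at zero.
        self.counts = Counter({w: 0 for w in self.prescribed_dose})

    def add_segment(self, segment: str) -> None:
        """Tokenise one finalised segment and increment counts for target words."""
        for token in re.findall(r"[\w']+", segment.lower()):
            if token in self.prescribed_dose:
                self.counts[token] += 1

    def remaining(self) -> dict[str, int]:
        """Inputs still needed per word to reach the prescribed dose (never below 0)."""
        return {w: max(n - self.counts[w], 0) for w, n in self.prescribed_dose.items()}

    def total(self) -> int:
        """Total treatment-word inputs counted so far."""
        return sum(self.counts.values())


# Usage with the VAULT example dose; the transcript segments are invented for illustration.
counter = TreatmentWordCounter({"put": 90, "baby": 90, "chair": 90})
for segment in ["Put the baby on the chair.", "Baby! Where is baby?", "Let's put it here."]:
    counter.add_segment(segment)
print(dict(counter.counts))  # {'put': 2, 'baby': 3, 'chair': 1}
print(counter.remaining())   # {'put': 88, 'baby': 87, 'chair': 89}
print(counter.total())       # 6
```

Counting only finalised segments matters because streaming ASR services typically emit interim hypotheses that are later revised, so counting every interim result could count the same production more than once. The exact-match rule is a simplifying assumption: inflected forms such as babies or chairs are not counted, and any word the ASR misrecognises is missed or miscounted, which is one reason comparison with a human count is needed.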

Figure 1. Speech Therapy Word Tracker (STeWoT) Tool workflow.


Table I. Comparison of machine versus human counts of the speech-language pathologist's productions of treatment words during two VAULT treatment role-play session activities.

Innovations in ASR to increase intervention intensity

Intensity is a multidimensional construct encompassing the parameters of dose within a session plus the frequency, duration, and total number of sessions (Warren et al., 2007). Intervention outcomes can be compromised when the intensity received is inadequate, such as when children are seen for fewer sessions than empirically recommended (e.g. Glogowska et al., 2000), receive too few trials within a session (under-dosing), or attend sessions less frequently than required (Hegarty et al., 2021). This gap between recommended and received intensity is long-standing and recalcitrant (Law & Conti-Ramsden, 2000), with SLPs calling for time-saving evidence-based tools (Hegarty et al., 2021).
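Warren et al. (2007) combine these parameters into a single index, cumulative intervention intensity (CII), defined as the product of dose (the number of properly administered teaching episodes in a single session), dose frequency, and total intervention duration. The symbols d, f, and T, and the choice of sessions per week and weeks as units, are our own notation for illustration. Because the factors multiply, a shortfall in any one reduces cumulative intensity proportionally; for example, if dose and duration are delivered as recommended but sessions occur at half the recommended frequency, the child receives half the recommended CII:

```latex
\begin{aligned}
\mathrm{CII} &= d \times f \times T\\[4pt]
\frac{\mathrm{CII}_{\text{received}}}{\mathrm{CII}_{\text{recommended}}}
  &= \frac{d \times (f/2) \times T}{d \times f \times T} = \frac{1}{2}
\end{aligned}
```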

Interdisciplinary teams have been using ASR to bridge the gap between recommended and received intensity. For instance, a team of SLPs, engineers, and computer scientists worked with children and their carers to develop a therapeutic video game, Apraxia World (more recently renamed Say Bananas!), that uses ASR technology to provide real-time feedback on children's speech accuracy (Hair et al., 2021). The team designed the game to motivate and engage children in long-term independent speech practice. The emerging evidence suggests it can help increase session dose, frequency, and total number of sessions. In a pilot study of 10 children, improvements in the children's speech were considered comparable to those achieved with face-to-face SLP services (Hair et al., 2021). The inherent automation of the tool also has the potential to optimise and measure fidelity of implementation. Indeed, automated feedback can be more accurate and consistent than adults' feedback, given adults' tendency to be lenient (Hair et al., 2021).

Across the examples highlighted, some ASR applications such as Say Bananas! are available for clinical use, while others such as the STeWoT tool are prototypes at the frontier of scientific investigation, paving the way for future clinical application. A systematic review would be needed to provide an in-depth understanding of the breadth of tools available and in development. Additionally, the example ASR solutions in this commentary emerged from Minority world countries such as Australia and the United States, where speech-language pathology services exist. Although it may seem fanciful for children living in Majority world countries with limited or non-existent speech-language pathology services to benefit from ASR technologies, equity of access and opportunity for their use could become a reality as children living in these countries gain greater access to computers, the Internet, and telepractice (Zahir et al., 2021).

Summary and conclusion

Innovative solutions are possible when interdisciplinary partners collaborate to solve vexing challenges in clinical practice. To advance equitable, efficient, and effective speech-language pathology services for children with communication disability and realise children’s good health and well-being (SDG 3), SLPs are encouraged to integrate evidence-based ASR solutions into practice. Going forward, further interdisciplinary research is needed to advance ASR solutions. The groundswell of interest in ASR also needs to be matched by practice-based research considering clinical reach, utility, and effectiveness.

Declaration of interest

The authors declare that they developed the STeWoT prototype tool described in this manuscript. The authors alone are responsible for the content and writing of this article.

References

  • Albudoor, N., & Peña, E.D. (2022). Identifying language disorder in bilingual children using automatic speech recognition. Journal of Speech, Language, and Hearing Research, 65, 2648–2661. doi:10.1044/2022_JSLHR-21-00667
  • Allouch, M., Azaria, A., & Azoulay, R. (2021). Conversational agents: Goals, technologies, vision and challenges. Sensors, 21, 8448. doi:10.3390/s21248448
  • Baker, E., Masso, S., Huynh, K., & Sugden, E. (2022). Optimizing outcomes for children with phonological impairment: A systematic search and review of outcome and experience measures reported in intervention research. Language, Speech, and Hearing Services in Schools, 53, 732–748. doi:10.1044/2022_LSHSS-21-00132
  • Conti-Ramsden, G., Durkin, K., Toseeb, U., Botting, N., & Pickles, A. (2018). Education and employment outcomes of young adults with a history of developmental language disorder. International Journal of Language and Communication Disorders, 53, 237–255. doi:10.1111/1460-6984.12338
  • Fox, C.B., Israelsen-Augenstein, M., Jones, S., & Gillam, S.L. (2021). An evaluation of expedited transcription methods for school-age children’s narrative language: Automatic speech recognition and real-time transcription. Journal of Speech, Language, and Hearing Research, 64, 3533–3548. doi:10.1044/2021_JSLHR-21-00096
  • Glogowska, M., Roulstone, S., Enderby, P., & Peters, T.J. (2000). Randomised controlled trial of community based speech and language therapy in preschool children. BMJ, 321, 923–926. doi:10.1136/bmj.321.7266.923
  • Google. (2022). Cloud speech-to-text basics. Google Cloud. https://cloud.google.com/speech-to-text/docs/basics
  • Hair, A., Ballard, K.J., Markoulli, C., Monroe, P., Mckechnie, J., Ahmed, B., & Gutierrez-Osuna, R. (2021). A longitudinal evaluation of tablet-based child speech therapy with Apraxia World. ACM Transactions on Accessible Computing, 14, 3. doi:10.1145/3433607
  • Hegarty, N., Titterington, J., & Taggart, L. (2021). A qualitative exploration of speech-language pathologists’ intervention and intensity provision for children with phonological impairment. International Journal of Speech-Language Pathology, 23, 213–224. doi:10.1080/17549507.2020.1769728
  • Hussain, N., Jagoe, C., Mullen, R., O’Shea, A., Sutherland, D., Williams, C., & Wright, M. (2018). The importance of speech, language and communication to the United Nations Sustainable Development Goals: A summary of evidence. International Communication Project. https://internationalcommunicationproject.com/wp-content/uploads/2018/12/ICP-Sustainable-Development-Goals.pdf
  • Latif, S., Qadir, J., Qayyum, A., Usama, M., & Younis, S. (2021). Speech technology for healthcare: Opportunities, challenges, and state of the art. IEEE Reviews in Biomedical Engineering, 14, 342–356. doi:10.1109/RBME.2020.3006860
  • Law, J., & Conti-Ramsden, G. (2000). Treating children with speech and language impairments. BMJ, 321, 908–909. doi:10.1136/bmj.321.7266.908
  • Lewis, B.A., Freebairn, L., Tag, J., Igo, R.P. Jr., Ciesla, A., Iyengar, S.K., … Taylor, H.G. (2019). Differential long-term outcomes for individuals with histories of preschool speech sound disorders. American Journal of Speech-Language Pathology, 28, 1582–1596. doi:10.1044/2019_ajslp-18-0247
  • McGregor, K.K. (2020). How we fail children with developmental language disorder. Language, Speech, and Hearing Services in Schools, 51, 981–992. doi:10.1044/2020_LSHSS-20-00003
  • Mdlalo, T., Joubert, R.W., & Flack, P.S. (2019). The cat on a hot tin roof? Critical considerations in multilingual language assessments. South African Journal of Communication Disorders, 66, 1–7. doi:10.4102/sajcd.v66i1.610
  • Microsoft. (2022). Speech Services. Microsoft Azure. https://azure.microsoft.com/en-us/products/cognitive-services/speechservices/
  • Miller, J.F., Andriacchi, K., & Nockerts, A. (2016). Using language sample analysis to assess spoken language production in adolescents. Language, Speech, and Hearing Services in Schools, 47, 99–112. doi:10.1044/2015_LSHSS-15-0051
  • Munro, N., Baker, E., Masso, S., Carson, L., Lee, T., Wong, A.M.-Y., & Stokes, S.F. (2021). Vocabulary acquisition and usage for late talkers treatment: Effect on expressive vocabulary and phonology. Journal of Speech, Language, and Hearing Research, 64, 2682–2697. doi:10.1044/2021_JSLHR-20-00680
  • Murphy, D., Lyons, R., Carroll, C., Caulfield, M., & de Paor, G. (2018). Communication as a human right: Citizenship, politics and the role of the speech-language pathologist. International Journal of Speech-Language Pathology, 20, 16–20. doi:10.1080/17549507.2018.1404129
  • Newbury, J., Bartoszewicz Poole, A., & Theys, C. (2020). Current practices of New Zealand speech-language pathologists working with multilingual children. International Journal of Speech-Language Pathology, 22, 571–582. doi:10.1080/17549507.2020.1712476
  • Olusanya, B.O., Ruben, R.J., & Parving, A. (2006). Reducing the burden of communication disorders in the developing world: An opportunity for the millennium development project. JAMA, 296, 441–444. doi:10.1001/jama.296.4.441
  • Shahin, M., Ahmed, B., Parnandi, A., Karappa, V., McKechnie, J., Ballard, K.J., & Gutierrez-Osuna, R. (2015). Tabby Talks: An automated tool for the assessment of childhood apraxia of speech. Speech Communication, 70, 49–64. doi:10.1016/j.specom.2015.04.002
  • Snow, P.C. (2019). Speech-language pathology and the youth offender: Epidemiological overview and roadmap for future speech-language pathology research and scope of practice. Language, Speech, and Hearing Services in Schools, 50, 324–339. doi:10.1044/2018_LSHSS-CCJS-18-0027
  • St Clair, M.C., Forrest, C.L., Yew, S.G.K., & Gibson, J.L. (2019). Early risk factors and emotional difficulties in children at risk of developmental language disorder: A population cohort study. Journal of Speech, Language, and Hearing Research, 62, 2750–2771. doi:10.1044/2018_JSLHR-L-18-0061
  • Tually, S., Beer, A., & McLoughlin, P. (2011). Housing assistance, social inclusion and people living with a disability. AHURI Final Report No.178. Melbourne: Australian Housing and Urban Research Institute. https://www.ahuri.edu.au/research/final-reports/178
  • United Nations. (2015). Sustainable Development Goals: 17 goals to transform our world. https://www.un.org/sustainabledevelopment/sustainable-development-goals/
  • Wang, Y., Williams, R., Dilley, L., & Houston, D.M. (2020). A meta-analysis of the predictability of LENA™ automated measures for child language development. Developmental Review, 57, 100921. doi:10.1016/j.dr.2020.100921
  • Warren, S.F., Fey, M.E., & Yoder, P.J. (2007). Differential treatment intensity research: A missing link to creating optimally effective communication interventions. Mental Retardation and Developmental Disabilities Research Reviews, 13, 70–77. doi:10.1002/mrdd.20139
  • Zahir, M.Z., Miles, A., Hand, L., & Ward, E.C. (2021). Information and communication technology in schools: Its contribution to equitable speech-language therapy services in an underserved small island developing state. Language, Speech, and Hearing Services in Schools, 52, 644–660. doi:10.1044/2020_LSHSS-20-00100