Original Articles

Automatic prediction of intelligible speaking rate for individuals with ALS from speech acoustic and articulatory samples

Pages 669-679 | Received 30 Apr 2017, Accepted 28 Jul 2018, Published online: 08 Nov 2018
 

Abstract

Purpose: This research aimed to automatically predict intelligible speaking rate for individuals with Amyotrophic Lateral Sclerosis (ALS) based on speech acoustic and articulatory samples.

Method: Twelve participants with ALS and two healthy control subjects produced a total of 1831 phrases. The NDI Wave system was used to record tongue movement, lip movement, and acoustic data synchronously. A machine learning algorithm (i.e., a support vector machine) was used to predict intelligible speaking rate (speech intelligibility × speaking rate) from acoustic and articulatory features of the recorded samples.
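The abstract does not include implementation details. As a rough illustration only, the following is a minimal sketch of support vector regression in Python with scikit-learn, assuming features have already been extracted into a matrix with one row per recorded phrase; the feature dimensions, hyperparameters, and placeholder data below are hypothetical and are not the authors' code.

```python
# Hypothetical sketch of the prediction setup described above (not the
# authors' implementation). Assumes acoustic/articulatory features are
# already extracted: X has one row per recorded phrase, and y holds the
# target intelligible speaking rate (intelligibility x speaking rate, WPM).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(1831, 40))        # placeholder feature matrix
y = rng.uniform(40.0, 200.0, size=1831)  # placeholder targets in WPM

# Support vector regression with feature standardization; the RBF kernel
# and hyperparameter values here are illustrative choices only.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))

# Cross-validated predictions, so each phrase is predicted by a model
# that never saw that phrase during training.
y_pred = cross_val_predict(model, X, y, cv=5)
```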

Result: Acoustic, lip movement, and tongue movement information, used separately, yielded R² values of 0.652, 0.660, and 0.678 and Root Mean Squared Errors (RMSE) of 41.096, 41.166, and 39.855 words per minute (WPM) between the predicted and actual values, respectively. Combining acoustic, lip, and tongue information yielded the highest R² (0.712) and the lowest RMSE (37.562 WPM).
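For reference, the two reported metrics can be computed from predicted and actual values as shown below; this is a generic sketch with placeholder numbers, not the authors' evaluation code.

```python
# Generic computation of the two reported metrics from predicted vs.
# actual intelligible speaking rates (both in WPM).
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

# Placeholder values for illustration only.
y_true = np.array([120.0, 95.0, 150.0, 80.0])
y_pred = np.array([115.0, 100.0, 140.0, 90.0])

r2 = r2_score(y_true, y_pred)                       # coefficient of determination
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # root mean squared error, WPM
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f} WPM")
```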

Conclusion: The results revealed that our proposed analyses predicted the intelligible speaking rate of each participant with reasonably high accuracy from the acoustic and/or articulatory features of a single short speech sample. With further development, these analyses may be well suited for clinical applications that require automatic prediction of speech severity.

Acknowledgement

We would like to thank Dr. Anusha Thomas, Jennifer McGlothlin, Brian Richburg, Kristin Teplansky, Jana Mueller, Saara Raja, Heather Xiao, and the participants who volunteered.

Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.

Additional information

Funding

This work was supported by the National Institutes of Health [R01DC013547, R03DC013990, and K24DC016312] and by the American Speech-Language-Hearing Foundation through a New Century Scholar Research Grant.
