ABSTRACT
Background
Large shared databases and automated language analyses make it possible to apply new data analysis techniques that can shed light on the connected speech of people with aphasia (PWA).
Aims
To identify coherent clusters of PWA based on language output using unsupervised statistical algorithms and to identify features that are most strongly associated with those clusters.
Methods & Procedures
Clustering and classification methods were applied to language production data from 168 PWA. Language samples came from a standard discourse protocol covering four genres: free speech personal narratives, picture descriptions, Cinderella storytelling, and procedural discourse.
Outcomes & Results
Seven distinct clusters of PWA were identified by the K-means algorithm. Using the random forest algorithm, a classification tree was proposed and validated, showing 91% agreement with the cluster assignments. This representative tree used only two variables to divide the data into distinct groups: total words from free speech tasks and total closed-class words from the Cinderella storytelling task.
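The two-stage idea (unsupervised clustering, then a small tree that tries to reproduce the cluster labels) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption for illustration: the data are synthetic stand-ins for two word-count features, the same two features are used for both clustering and classification, a single greedy Gini tree is swapped in for the random forest procedure, and the helper names (`standardize`, `kmeans`, `build_tree`, `predict`) are hypothetical. It does not reproduce the study's pipeline or its results.

```python
import random
from collections import Counter

def standardize(rows):
    """Z-score each feature so both contribute equally to distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth):
    """Greedy Gini-based tree; leaves store the majority cluster label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows))[1:]:
            left = [l for r, l in zip(rows, labels) if r[f] < t]
            right = [l for r, l in zip(rows, labels) if r[f] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    lo = [(r, l) for r, l in zip(rows, labels) if r[f] < t]
    hi = [(r, l) for r, l in zip(rows, labels) if r[f] >= t]
    return (f, t,
            build_tree([r for r, _ in lo], [l for _, l in lo], depth - 1),
            build_tree([r for r, _ in hi], [l for _, l in hi], depth - 1))

def predict(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

# Synthetic stand-in data (NOT study data): 168 speakers x 2 word-count features.
rng = random.Random(42)
rows = []
for _ in range(168):
    level = rng.choice([100, 300, 600, 900])  # arbitrary output levels
    rows.append((rng.gauss(level, 40), rng.gauss(0.4 * level, 15)))

K = 7  # number of clusters, matching the count reported in the abstract
labels = kmeans(standardize(rows), K)

# Hold out part of the sample to check how well the tree recovers the clusters.
idx = list(range(len(rows)))
rng.shuffle(idx)
train, test = idx[:118], idx[118:]
tree = build_tree([rows[i] for i in train], [labels[i] for i in train], depth=3)
agreement = sum(predict(tree, rows[i]) == labels[i] for i in test) / len(test)
print(f"held-out agreement with cluster labels: {agreement:.2f}")
```

The tree is fit on raw (unstandardized) values so that its thresholds stay interpretable as word counts; standardization is applied only for the distance-based clustering step.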
Conclusion
Connected speech data can be used to divide PWA into coherent groups, providing insight into traditional aphasia classifications and into factors that may guide discourse research and clinical work.
Acknowledgments
Open Access funding provided by the Qatar National Library.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Supplementary material
Supplemental data for this article can be accessed here.
Notes
1. Participant-related data are password protected and restricted to members of the AphasiaBank consortium group. Licensed SLPs, educators, and researchers who would like access can send an email request to Brian MacWhinney ([email protected]) with contact information, affiliation, and a brief general statement about how they envision using the resources.