Original Articles

Voice Pathology Assessment Systems for Dysphonic Patients: Detection, Classification, and Speech Recognition

 

ABSTRACT

In the past decade, much research has been done on automatic detection and classification of vocal fold disorders, and these tasks continue to require further investigation. The aim of this study is to develop systems that may help in diagnosing patients from their speech. The systems perform voice disorder detection, classification of voice disorders, and digit recognition. To find the best system, we compare system performance across different voice features. We are the first to explore the use of the relative spectral transform perceptual linear predictive (RASTA-PLP) feature for speech pathology. The speech samples used in most of the literature are sustained vowels, whereas the speech samples we worked on are words, which are more natural. To evaluate the performance of the developed systems, we used a database containing five types of vocal fold disorders. The database includes a total of 142 speakers, half of whom were normal speakers. The best accuracy achieved for the voice disorder detection system was 92.40%. In the voice disorder classification system, the maximum recognition rate obtained using words was 73%. For the digit recognition system, a recognition rate of 98.57% was obtained. PLP and RASTA-PLP performed better than the other features compared in the developed pathology assessment systems.

ACKNOWLEDGEMENTS

The database was provided by the chair of Communication and Swallowing Disorders Unit (CSDU), ENT Department, King Abdulaziz University Hospital, Riyadh, Saudi Arabia. The author is thankful for this cooperation. The author is also grateful to Dr Tamer Mesallam, from the same chair, for valuable suggestions that improved the paper.

Additional information

Funding

This research was supported by the NSTIP strategic technologies program [grant number 12-MED2474-02] in the Kingdom of Saudi Arabia. The author is thankful for this support.

Notes on contributors

Mansour Alsulaiman

Mansour Alsulaiman, PhD, is an associate professor in the Department of Computer Engineering at King Saud University, Riyadh, Saudi Arabia. He obtained his PhD degree from Iowa State University, USA, in 1987. Since 1988, he has been with the computer engineering department at King Saud University. He is the editor-in-chief of King Saud University Journal – Computer and Information Systems. His research areas include Automatic Speech/Speaker Recognition, Automatic Voice Pathology Assessment Systems, Computer-aided Pronunciation Training System, and Robotics.

