Original Articles

Voice Pathology Assessment Systems for Dysphonic Patients: Detection, Classification, and Speech Recognition

Pages 156-167 | Published online: 19 Jun 2014
 

ABSTRACT

In the past decade, much research has been done on the automatic detection and classification of vocal fold disorders, and these tasks still require further investigation. The aim of this study is to develop systems that may help in diagnosing patients from their speech. The systems perform voice disorder detection, classification of voice disorders, and digit recognition. To find the best system, we compare system performance when using different voice features. We are the first to explore the use of the relative spectral transform perceptual linear predictive (RASTA-PLP) feature for speech pathology. The speech samples used in most of the literature are sustained vowels, whereas the speech samples we worked on are words, which are more natural. To evaluate the performance of the developed systems, we used a database containing five types of vocal fold disorders. The database includes a total of 142 speakers, half of whom were normal speakers. The best accuracy achieved for the voice disorder detection system was 92.40%. In the voice disorder classification system, the maximum recognition rate obtained by using words was 73%. For the digit recognition system, a recognition rate of 98.57% was obtained. PLP and RASTA-PLP showed better performance than the other features compared in the developed pathology assessment systems.

ACKNOWLEDGEMENTS

The database was provided by the chair of Communication and Swallowing Disorders Unit (CSDU), ENT Department, King Abdulaziz University Hospital, Riyadh, Saudi Arabia. The author is thankful for this cooperation. The author is also grateful to Dr Tamer Mesallam, from the chair of Communication and Swallowing Disorders Unit, for valuable suggestions that improved the paper.

Additional information

Funding

This research was supported by the NSTIP strategic technologies program [grant number 12-MED2474-02] in the Kingdom of Saudi Arabia. The author is thankful for this support.

Notes on contributors

Mansour Alsulaiman

Mansour Alsulaiman, PhD, is an associate professor in the Department of Computer Engineering at King Saud University, Riyadh, Saudi Arabia. He obtained his PhD degree from Iowa State University, USA, in 1987. Since 1988, he has been with the Department of Computer Engineering, King Saud University. He is editor-in-chief of King Saud University Journal – Computer and Information Systems. His research areas include automatic speech and speaker recognition, automatic voice pathology assessment systems, computer-aided pronunciation training systems, and robotics.

