
Dynamic Pronunciation Modelling for Unsupervised Learning of ASR Systems

Akella Amarendra Babu, Ramadevi Yellasiri & Ananda Rao Akepogu

ABSTRACT

There is a large gap between the ability of humans and that of automatic speech recognition (ASR) systems to recognize pronunciation variations. ASR systems learn from labelled speech corpora, whereas humans use “Everyday Speech” to adapt to pronunciation variability. Labelling huge speech corpora in real time is impracticable, expensive, and time-consuming. In this paper, we present an algorithm that uses unsupervised learning techniques to adapt to the easily available “Everyday Speech”. The algorithm is implemented in Java. The data sets are extracted from the CMUDICT pronunciation dictionary, the TIMIT database, and “The Hindu” daily newspaper. The results show a significant improvement in word error rate (WER) over the existing ASR system. The addition of a dynamic pronunciation model enables the ASR system to learn from unlabelled “Everyday Speech”, which makes learning inexpensive and fast.
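
For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The Java sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the class name `WerSketch`, its method names, and the sample sentences are hypothetical.

```java
import java.util.Locale;

// Illustrative sketch of the standard WER definition.
// All names here are hypothetical and do not come from the paper.
class WerSketch {

    // Lowercase and split on whitespace; an empty string yields no words.
    static String[] tokenize(String s) {
        String t = s.trim().toLowerCase(Locale.ROOT);
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Word-level Levenshtein distance: the minimum number of
    // substitutions, deletions, and insertions between the two sequences.
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // delete all ref words
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // insert all hyp words
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int del = d[i - 1][j] + 1;
                int ins = d[i][j - 1] + 1;
                d[i][j] = Math.min(sub, Math.min(del, ins));
            }
        }
        return d[ref.length][hyp.length];
    }

    // WER = (S + D + I) / N, where N is the number of reference words.
    static double wordErrorRate(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        return (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        // Made-up example: one substitution (sat -> sit) and one deletion (the).
        double wer = wordErrorRate("the cat sat on the mat", "the cat sit on mat");
        System.out.println(String.format(Locale.ROOT, "WER = %.3f", wer));
    }
}
```

Because the count is normalized by the reference length, WER can exceed 1.0 when the recognizer inserts many extra words.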

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

Akella Amarendra Babu

Akella Amarendra Babu received a BTech (ECE) degree from JNU and an MTech (CSE) degree from IIT Madras, Chennai. He served in the Indian Army for 23 years as a Lt Colonel in the Corps of Signals and has 12 years of senior project management experience in the corporate IT industry. He has two and a half years of research experience on mega defence projects at DLRL, DRDO, and has worked as Professor and HOD of the CSE department in engineering colleges in Hyderabad. He has published a few research papers in various national and international conferences and journals. His research interests include speech processing, information security, and telecommunications. He is a Fellow of IETE and a member of CSI and IAENG.

Ramadevi Yellasiri

R. Yellasiri received a BE degree from Osmania University in 1991 and an MTech (CSE) degree from JNT University in 1997. She received her PhD degree from Central University, Hyderabad in 2009. She is a Professor at the Chaitanya Bharathi Institute of Technology, Hyderabad. Her research interests include speech and image processing, soft computing, data mining, and bio-informatics. She is a member of IEEE, ISTE, IETE, IAENG, and IE. She has published more than 50 research papers in various national and international conferences, proceedings, and journals.

Ananda Rao Akepogu

Ananda Rao Akepogu received a BSc (MPC) degree from Silver Jubilee Govt. College, SV University, Andhra Pradesh, India. He received a BTech degree in Computer Science & Engineering and an MTech degree in AI & Robotics from the University of Hyderabad, India. He received his PhD from the Indian Institute of Technology, Madras, India. He is a Professor of Computer Science & Engineering and Director of IR & P at JNTUA, Anantapur, India. He has published more than a hundred research papers in international journals and conferences and has authored three books. His main research interests include speech processing, software engineering, and data mining. He received a best teacher award from the Government of Andhra Pradesh in September 2014.
