Original Articles

Investigation of Automatic Speech Recognition Performance and Mean Opinion Scores for Different Standard Speech and Audio Codecs

A. V. Ramana, Laxminarayana Parayitam & Mythili Sharan Pala
Pages 121-129 | Published online: 01 Sep 2014
 

Abstract

Automatic Speech Recognition (ASR) systems are increasingly used for voice-centric applications in mobile handheld and Voice over Internet Protocol (VoIP) devices, and with them grows the need to characterize ASR performance under different network impediments. One such impediment is the speech or audio coding standard itself, which strongly affects ASR performance when used at different sampling and bit rates in practical systems. Another common impediment is bit errors in wireless networks and packet drops in VoIP networks. ASR performance with some speech coding standards under noise conditions in wireless networks has been reported in the literature. However, each study reports ASR performance for only a few narrowband codecs, with different speech databases and different ASR toolkits such as RAPHEL, HTK, and SPHINX. This paper analyzes ASR performance for both narrowband and wideband speech and audio coding standards currently accepted for GSM mobile and VoIP networks, using a common speech database (TIMIT) and a single ASR toolkit (SPHINX). The Mean Opinion Score (MOS), the generally accepted measure of speech quality, is also analyzed for all the speech and audio coding standards using the same speech database. Results are presented for ASR word accuracy and MOS for the different narrowband and wideband codecs under no-loss conditions, and for different packet drop rates, the common impairment in wired networks such as VoIP (which is also merging with wireless networks). The main observation is that although some codecs show poor MOS at lower bit rates, their ASR performance is comparable to that of other codecs at higher bit rates.
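
The abstract reports ASR word accuracy but does not say how it is scored. As a rough illustration only, the sketch below assumes the conventional definition, word accuracy = (N − S − D − I) / N × 100, where N is the number of reference words and S, D, and I are the substitutions, deletions, and insertions in a minimum edit-distance alignment. The function names are hypothetical and this is not the scoring tool used in the paper.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, deletions and insertions
    needed to turn ref_words into hyp_words (Levenshtein distance)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy in percent, assuming the (N - S - D - I) / N definition.

    The value can be negative when the hypothesis contains many insertions.
    """
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    errors = word_edit_distance(ref_words, hyp_words)
    return 100.0 * (len(ref_words) - errors) / len(ref_words)
```

Under this assumed definition, a recognizer output that drops one word from a five-word reference sentence would score 80% word accuracy, whichever codec the audio passed through.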

Additional information

Notes on contributors

A. V. Ramana

A. V. Ramana received B.E. and M.E. degrees in Electronics and Communications Engineering from Osmania University in 1993 and 1999, respectively. He is currently pursuing his Ph.D. at Osmania University. He started his career as a Design Engineer at Advanced Radio Masts Pvt. Ltd in 1994 and subsequently worked for ECIL and Analog Devices India Pvt. Ltd. He is currently a Senior Manager at IKANOS Communications India Pvt. Ltd, Bangalore, leading the VoIP software development team. He has wide exposure to various signal processing algorithms, including speech, audio and video codecs, and to programming different high-end DSPs and RISC processors. His areas of interest include Automatic Speech Recognition and speech, audio and video coding algorithms.

Laxminarayana Parayitam

Laxminarayana Parayitam received M.E. and Ph.D. degrees in Electronics and Communication Engineering from Osmania University in 1994 and 2000, respectively. In 1995, he joined the Research and Training Unit for Navigational Electronics (NERTU), Osmania University, as Senior Research Assistant and subsequently became Senior Scientist/Associate Professor. He established and led a 15-member DSP and Audio Processing Technology Group at the Hyderabad Design Center of Analog Devices Inc. during 2003–2005. His interests include Automatic Speech Recognition, Face Recognition, Global Navigational Satellite Systems and Bioinformatics. He is a life member of ISTE, the Indian Science Congress and the Acoustic Society of India, a senior member of IEEE, and a fellow of IETE and the Institute of Engineers. He has contributed 35 technical papers to various journals, conferences and technical reports, and has completed nine sponsored/consultancy projects.

Mythili Sharan Pala

Mythili Sharan Pala received an M.E. in Electronics and Communication Engineering from Osmania University in 2002. In 2002, he joined Bharath Sanchar Nigam Limited (BSNL) as a Junior Telecom Officer. Later, in 2008, he joined the Multimedia and ADSPs laboratory as a senior research fellow. His research interests include Automatic Speech Recognition, Adaptive Signal Processing and multirate signal processing.
