Refinement of HMM Model Parameters for Punjabi Automatic Speech Recognition (PASR) System

Virender Kadyan, Department of Computer Science & Engineering, Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura, Punjab, India

Archana Mantri, Department of Electronics & Communication Engineering, Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura, Punjab, India

R. K. Aggarwal, Department of Computer Engineering, NIT, Kurukshetra, India

ABSTRACT

An automatic speech recognition system follows a pattern-matching approach that consists of a training phase and a testing phase. Despite advances in the training phase, the performance of the acoustic model degrades when a statistical technique such as the hidden Markov model (HMM) is adopted. An HMM-based speech system faces high computational complexity and struggles to deliver accurate recognition of isolated Punjabi lexicons. As the corpus of the system grows, the complexity of the training phase also increases drastically, and redundancy and confusion arise between feature distributions during training. This paper proposes an approach for generating HMM parameters using two hybrid classifiers, GA+HMM and DE+HMM. The proposed technique refines the processed feature vectors after calculating their mean and variance. The refined parameters are then used to generate the HMM parameters, which helps reduce the training complexity of the system. The proposed techniques are compared with the existing HMM technique on a benchmark database and a self-developed corpus in clean, noisy, and real-time environments. The results show improved pattern matching of spoken utterances when demonstrated on large-vocabulary isolated Punjabi lexicons.
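
The abstract does not give the fitness function or the exact refinement procedure, so the following is only a minimal sketch of the general idea, assuming DE stands for differential evolution. It refines the mean and variance of a single diagonal Gaussian over toy feature vectors using a DE/rand/1/bin loop. The log-likelihood fitness, the function names, the toy data, and all hyperparameters are illustrative assumptions, not the authors' method.

```python
import math
import random


def gaussian_log_likelihood(params, frames):
    """Log-likelihood of frames under a diagonal Gaussian.

    params holds the per-dimension means followed by the log-variances.
    This fitness is a stand-in; the paper's actual objective is not stated.
    """
    dim = len(frames[0])
    means, log_vars = params[:dim], params[dim:]
    total = 0.0
    for frame in frames:
        for d in range(dim):
            # Clamp the log-variance so exp() stays in a safe numeric range.
            log_var = max(-20.0, min(20.0, log_vars[d]))
            diff = frame[d] - means[d]
            total -= 0.5 * (math.log(2 * math.pi) + log_var
                            + diff * diff / math.exp(log_var))
    return total


def refine_with_de(fitness, start, pop_size=20, generations=60,
                   weight=0.6, crossover=0.9, seed=0):
    """Maximise fitness with DE/rand/1/bin, seeded around a starting estimate."""
    rng = random.Random(seed)
    n = len(start)
    # The starting estimate is kept in the population, so the result
    # can never score worse than it (selection below is greedy).
    population = [list(start)] + [
        [p + rng.gauss(0.0, 0.5) for p in start] for _ in range(pop_size - 1)
    ]
    scores = [fitness(ind) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(n)  # at least one gene comes from the mutant
            trial = []
            for k in range(n):
                if k == forced or rng.random() < crossover:
                    trial.append(population[a][k]
                                 + weight * (population[b][k] - population[c][k]))
                else:
                    trial.append(population[i][k])
            trial_score = fitness(trial)
            if trial_score >= scores[i]:
                population[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best]


# Toy 2-D "feature vectors" (hypothetical data, not a speech corpus).
data_rng = random.Random(1)
frames = [[data_rng.gauss(1.0, 0.5), data_rng.gauss(-2.0, 1.5)] for _ in range(200)]

# Rough starting estimate: zero means and unit variances (log-variance 0).
start = [0.0, 0.0, 0.0, 0.0]


def fitness(params):
    return gaussian_log_likelihood(params, frames)


refined, refined_score = refine_with_de(fitness, start)
print("start score:  ", fitness(start))
print("refined score:", refined_score)
```

For a single Gaussian under a likelihood fitness, the sample mean and variance are already optimal, so a search like this only becomes useful when the objective differs, for example one that penalises overlap between the feature distributions of confusable words. That choice is left open here because the source does not specify it.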

ACKNOWLEDGEMENTS

This work is part of a partially funded project on the Punjabi language supported by IEEE Sight. The Punjabi speech corpus and its transcription were verified by Assistant Professor Dr Amitoj Singh of MRSPTU, Bathinda, Punjab. Special thanks to TDIL for providing a sample speech corpus as a benchmark for collecting the Punjabi speech corpus in a different dialect, and to my Speech and Multimodal Laboratory members Ms Nancy and Ms Ashima.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the authors.

Additional information

Funding

IEEE Sight.

Notes on contributors

Virender Kadyan

Virender Kadyan is a PhD scholar in the Department of Computer Science & Engineering at Chitkara University Institute of Engineering & Technology, Chitkara University, Punjab, India. He is a project lead in the Speech and Multimodal Laboratory, working on sign language, speech signal processing, and natural language processing. He has taught courses on data structures, theory of computation, introduction to Linux, and natural language processing to graduate and postgraduate students. He has been a member of a national-level research project and a technical program committee member at an international conference. He is a co-principal investigator on an IEEE project for the development of a Punjabi ASR system. His research interests include speech analysis, recognition, synthesis, and pattern matching. He has filed two patents in the technical field.

Archana Mantri

Archana Mantri holds a PhD in electronics and communication engineering and has 28 years of experience in research, development, training, academics, and the administration of institutes of higher technical education. Currently, she is a professor and pro-vice chancellor at Chitkara University, Punjab, India. Her areas of expertise are project management, problem- and project-based learning, curriculum design and development, and pedagogical innovation and management. She is on the board of international experts of the Indo-Universal Collaboration of Engineering Education (IUCEE) and advises in the area of pedagogical innovation. She currently supervises several PhD scholars in the areas of virtual and augmented reality and speech recognition. She is a senior member of IEEE.

R. K. Aggarwal

R. K. Aggarwal received his PhD degree in 2014 and his MTech degree in 2006 from the National Institute of Technology, Kurukshetra, India. Currently, he is an associate professor and head of the Department of Computer Engineering at the same institute. He has published more than 24 research papers in various international and national journals and conferences, and has served as an active reviewer for many of them. He has delivered several invited talks and keynote addresses, and has chaired sessions at reputed conferences. His research interests include speech processing, soft computing, statistical modeling, and science and spirituality. He is a life member of the Computer Society of India (CSI) and the Indian Society for Technical Education (ISTE). With more than 26 years of experience in this field, he has been involved in the academic, administrative, and social affairs of many organizations.
