VEP Detection for Read, Extempore and Conversation Speech

ABSTRACT

In this paper, we propose a novel approach for accurate detection of vowel end points (VEPs) in any mode of speech. A VEP is the instant at which a vowel ends in the speech signal. We consider three broad modes of speech: conversation, extempore, and read. Existing methods have explored VEP detection only for the read mode of speech, and they may not be appropriate for VEP detection in the extempore and conversation modes, because the acoustic characteristics of read speech are very different from those of the other two modes. To handle this problem, we propose a two-stage method for accurately detecting VEPs irrespective of mode. In the first stage, vowel onset points (VOPs) are detected in the speech signal using our recent method based on the continuous wavelet transform and phone boundaries. A VOP represents the start of a vowel in the speech signal. In the second stage, phone boundaries are detected using the spectral transition measure approach, and the closest succeeding phone boundary for each detected VOP is taken as the detected VEP. Experiments are carried out on the TIMIT and Bengali speech corpora. The performance of the proposed VEP detection method is compared with that of two state-of-the-art signal processing methods. The significance of the proposed method is shown by automatically detecting vowel regions in the TIMIT and Bengali speech corpora. The evaluation results show that the proposed method performs significantly better than the existing methods.
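The pairing step of the second stage can be illustrated with a minimal sketch. It assumes the VOPs and phone boundaries have already been detected and are available as lists of times in seconds. The function name `pair_vops_with_veps`, the sample values, and the choice to treat "succeeding" as strictly later than the VOP are illustrative assumptions, not details taken from the paper.

```python
from bisect import bisect_right


def pair_vops_with_veps(vops, boundaries):
    """Pair each VOP with its closest succeeding phone boundary.

    Assumption: "succeeding" means strictly later than the VOP.
    A VOP with no later boundary is paired with None.
    """
    sorted_bounds = sorted(boundaries)
    pairs = []
    for vop in vops:
        # Index of the first boundary strictly greater than the VOP.
        idx = bisect_right(sorted_bounds, vop)
        vep = sorted_bounds[idx] if idx < len(sorted_bounds) else None
        pairs.append((vop, vep))
    return pairs


# Hypothetical detections, in seconds (not data from the paper).
vops = [0.12, 0.48, 0.91]
boundaries = [0.05, 0.20, 0.41, 0.60, 0.85]
vowel_regions = pair_vops_with_veps(vops, boundaries)
```

Each returned pair delimits one candidate vowel region. How a VOP without a later boundary should be handled is left open here, so the sketch simply returns `None` for it.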

Additional information

Notes on contributors

Kumud Tripathi

Kumud Tripathi received the BTech degree from the Department of Computer Science and Engineering, SR Group of Institution, Jhansi, India, in 2011 and the MTech degree from the Department of Information Technology, Indian Institute of Information Technology, Allahabad, India, in 2015. She is currently pursuing the PhD degree in the Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur. She has published papers in six international conferences and two journals. Her research interests include speech and signal processing.

K. Sreenivasa Rao

K. Sreenivasa Rao received the PhD degree from the Department of Computer Science and Engineering, Indian Institute of Technology (IIT), Chennai, in 2005. He is currently working as a professor in the Department of Computer Science and Engineering, IIT Kharagpur, Kharagpur, India. His research interests are speech, audio and music signal processing, machine learning and big-data analytics. He has published more than 200 articles in reputed international journals and conference proceedings.
