Original Articles

Low Complexity Scalable Perceptual Audio Coder using an Optimum Wavelet Packet basis Representation and Vector Quantization

P S Sathidevi & Y Venkataramani
Pages 399-407 | Published online: 04 Jan 2016
 

Abstract

In this paper, we describe a high-quality, low-complexity, scalable audio coding scheme that uses an optimum wavelet packet (WP) basis representation chosen according to the time-varying characteristics of the audio signal. In the ISO/MPEG audio coding standards [1–3], the resolution of the (uniform) decomposition filterbank does not match the resolution required by the psychoacoustic model, which needs finer resolution matched to the non-uniform critical bands of the cochlea. The MPEG coder therefore uses a separate high-resolution filterbank to implement the psychoacoustic model, which increases the computational load of the coder. Here, we use a wavelet packet decomposition structure that closely matches the critical bands [4,5] of the human auditory system to transform the data into the wavelet domain, and the resulting wavelet packet coefficients drive the psychoacoustic model directly. The design of the psychoacoustic model is thus integrated with the design of the decomposition filterbank. Other features of the proposed coder are scalability (it supports three standard industrial sampling frequencies: 11.025 kHz, 22.050 kHz and 44.1 kHz) and selection of the optimum wavelet basis from a predefined library of wavelet bases, based on seven statistical features extracted from the audio signal to be encoded. A new Vector Quantization (VQ) scheme is also proposed, in which the length of the codebook can be varied according to the requirements of the psychoacoustic model. Experimental results show that the proposed coder yields almost transparent quality with compression ratios in the range of e to 10.
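The idea of a non-uniform wavelet packet tree, with finer splitting in some bands than in others, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the Haar filter pair, the 16-sample frame, the shape of `TREE`, and the function names `haar_split` and `wavelet_packet`. The abstract does not state the filters, frame length, or tree used by the proposed coder.

```python
import math

def haar_split(x):
    """One orthonormal Haar analysis step: (low-pass, high-pass) halves."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return low, high

def wavelet_packet(x, tree):
    """Decompose x along a binary tree.

    tree is None for a leaf band, or a (low_subtree, high_subtree) pair.
    Bands are returned in tree order; the children of a high-pass branch
    are not in increasing frequency order.
    """
    if tree is None:
        return [x]
    low, high = haar_split(x)
    return wavelet_packet(low, tree[0]) + wavelet_packet(high, tree[1])

# Hypothetical tree: the low half is split more deeply than the high half,
# giving narrower bands at low frequencies (as critical bands require).
TREE = (((None, None), None), (None, None))

frame = [float(i % 5) for i in range(16)]   # toy 16-sample frame
bands = wavelet_packet(frame, TREE)
band_sizes = [len(b) for b in bands]

# Orthonormal filters preserve energy across the decomposition.
energy_in = sum(v * v for v in frame)
energy_out = sum(v * v for b in bands for v in b)
```

Because each band holds coefficients for one region of the spectrum, per-band energies of this kind are the sort of quantity a psychoacoustic model could read directly, without a second analysis filterbank.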

Additional information

Notes on contributors

P S Sathidevi

P S Sathidevi received the BTech degree in Electronics Engineering from Regional Engineering College, Calicut (Calicut University) in 1985, the MTech degree in Electronics from Cochin University of Science and Technology, Cochin in 1987, and the PhD in the field of Speech and Audio Processing from Regional Engineering College, Calicut (Calicut University) in 2003.

She worked as a lecturer at the National Institute of Technology Calicut (formerly REC Calicut) from 1990 to 2003 and has been an Asst Professor there since 2003. Her current interests include Perceptual Audio Coding, Speech Coding, Image Coding, Cryptography, wavelet-based speech enhancement techniques for the hearing impaired and Computational Auditory Scene Analysis (CASA).

Y Venkataramani

Y Venkataramani received the BTech from IIT Madras in 1967, the MTech from IIT Madras in 1972 and the PhD from IIT Kanpur in 1983. He worked as a Lecturer at REC, Calicut from 1967 to 1983 and as a Professor at REC, Calicut from 1983 to 2001. He is presently Principal of Saranathan College of Engineering, Panjappur, Tiruchirapalli.

He is the author of the book Linear Integrated Circuits and Applications, published by ISTE, New Delhi. His areas of interest are Speech Processing, Image Processing, Wavelets and Data Communication.
