Original Article

Effect of linear and warped spectral transposition on consonant identification by normal-hearing listeners with a simulated dead region

Pages 420-433 | Received 19 Jun 2009, Accepted 17 Nov 2009, Published online: 24 Feb 2010
 

Abstract


We investigated the potential benefits for consonant identification of a form of frequency transposition intended for people with severe or profound hearing loss at high frequencies, but near-normal hearing at low frequencies. Frequency components from a ‘source band’ in a high-frequency region were transposed downwards to a ‘destination band’. All experiments used normal-hearing listeners. Experiment 1 showed that (untransposed) source bands centred near 4 kHz yielded highest identification scores. Also, performance was better when the source band was wider. Experiment 2 used transposition with the ‘best’ source bands from experiment 1, and showed that superimposing the transposed components on the components in the destination band gave better results than replacing the latter by the former. Experiment 3 assessed the effects of focused training, using conditions without and with transposition. Significant improvements occurred with training, but overall performance following training was similar for all conditions. However, transposition reduced some frequent errors.

Summary

We investigated the potential benefits for consonant identification of a form of frequency transposition for people with severe or profound hearing loss at high frequencies but near-normal hearing at low frequencies. Frequency components from a ‘source band’ in the high-frequency region were transposed downwards to a ‘destination band’. All experiments used listeners with normal hearing. Experiment 1 showed that (untransposed) source bands centred near 4 kHz gave the highest identification scores. Performance was also better when the source band was wider. Experiment 2 used transposition with the ‘best’ source bands from experiment 1 and showed that superimposing the transposed components on those in the destination band gave better results than replacing the latter with the former. Experiment 3 evaluated the effects of focused training, using conditions with and without transposition. A significant improvement was found with training, but overall performance after training was similar in all conditions. Nevertheless, transposition reduced some frequent errors.

Acknowledgements

Some of these results were presented at the British Society of Audiology Short Papers Meetings on Experimental Studies of Hearing and Deafness, York, UK (18th–19th September 2008), and the International Hearing Aid Research Conference, Lake Tahoe, USA (13th–17th August 2008). This work was supported by the MRC (UK). CF was also supported by a Marie-Curie Intra-European Fellowship, and Wolfson College Junior Research Fellowship (Cambridge, UK).

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

Appendix

The purpose of frequency warping was to obtain a spectral representation in which the number of components in the FEW band was equal to that in the reference band, so that transposition could be performed simply by moving the components from the former to the latter. The warping was performed on a frame-by-frame basis during the overlap-add procedure, by transforming the windowed analysis frame before performing the FFT. Unwarping was then performed after the inverse FFT and before windowing.

Smith and Abel (1999) have shown that the Hertz-to-ERBN-number relationship can be roughly approximated by a bilinear transformation of the digital-signal-processing variable, z, given by

$$ z = e^{j 2\pi f / f_s} \tag{1} $$

where f is the frequency variable, fs is the sampling frequency (16 kHz in this study), and j is the square root of −1. A bilinear transform defines a new variable, z′, given by

$$ z' = \frac{z - \alpha}{1 - \alpha z} \tag{2} $$

where the parameter α controls the shape of the transformation. Equation (1) maps the frequency axis (0 to fs) onto a unit circle on the complex plane, with 0 (and fs) at angle 0, the Nyquist frequency (fs/2) at angle π radians, and intermediate frequencies uniformly distributed around the unit circle. The complex-conjugate symmetry of the spectrum around the Nyquist frequency maps to a complex-conjugate symmetry across the real axis. Equation (2) then maps the unit circle onto itself, with the same frequencies at angles 0 and π, but with frequencies between 0 and the Nyquist frequency non-uniformly spaced around the top half of the unit circle, and with the complex-conjugate symmetry across the real axis maintained. With appropriate choice of the value of α, the distribution of frequencies along the semicircle can be made roughly constant in ERBN-number increments. It is easy to show, using algebra, that z can be obtained from z′ by reversing their positions in Eq. (2) and reversing the sign of α; thus, the transformation is reversible.
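The mapping and its inverse can be checked numerically. The Python sketch below is an illustration only, not the authors' code. It assumes that Eq. (1) has the form z = exp(j2πf/fs) and that Eq. (2) has the form z′ = (z − α)/(1 − αz), which is consistent with the description above; the function names are hypothetical.

```python
import cmath
import math

FS_HZ = 16000.0  # sampling frequency used in this study


def z_from_frequency(f_hz, fs_hz=FS_HZ):
    """Point on the unit circle for frequency f (assumed form of Eq. 1)."""
    return cmath.exp(2j * math.pi * f_hz / fs_hz)


def bilinear(z, alpha):
    """Assumed form of Eq. 2: z' = (z - alpha) / (1 - alpha * z)."""
    return (z - alpha) / (1 - alpha * z)


def inverse_bilinear(z_prime, alpha):
    """Swap z and z' and negate alpha: z = (z' + alpha) / (1 + alpha * z')."""
    return bilinear(z_prime, -alpha)


if __name__ == "__main__":
    alpha = 0.46  # value used in this study
    for f_hz in (0.0, 750.0, 4000.0, 8000.0):
        z = z_from_frequency(f_hz)
        z_prime = bilinear(z, alpha)
        round_trip_error = abs(inverse_bilinear(z_prime, alpha) - z)
        # |z'| stays at 1 (unit circle maps onto itself) and the
        # round-trip error is at rounding level (the map is reversible).
        print(f_hz, abs(z_prime), round_trip_error)
```

The points z = 1 (0 Hz) and z = −1 (the Nyquist frequency) are left unchanged by the map for any α between −1 and 1, which is why the endpoints of the frequency axis stay in place while intermediate frequencies move.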

Kates and Arehart (2005) showed that this transformation, performed by a modified delay line, can be used to provide an efficient ERBN-based analysis at the input stage of a multi-channel digital hearing aid in which the separation into channels is perceptually relevant. This provides a dramatic decrease in the time delay produced by the hearing aid, which is important for avoiding deleterious effects on speech production (Stone & Moore, 2002; 2005) and speech perception (Stone & Moore, 2003; Stone et al, 2008). For the present study, this type of modified delay line was used to perform an approximation to the Hertz-to-ERBN-number transformation, and another delay line was used to perform the inverse of this operation. In the following, we review the delay-line modification used in this study and provide a heuristic explanation of why it works.

For any signal-processing procedure that involves frame-by-frame processing of input data, the procedure for initial processing of the input signal can be conceptualized as involving frame-length delay lines. A delay line is a one-dimensional array of data registers, which for convenience we will describe as being horizontally aligned. The main function of each register is to pass its contents to its neighbour on the left and, at the same time, to accept new contents being fed to it from the right, thus propagating waveform samples. The connections between successive registers are delay elements. For each sample interval, a new digitised sample arrives on the right, all samples shift one place to the left, and the oldest sample ‘falls off’ the left end. If the length of the delay line corresponds to the number of samples in a frame, the delay line contains one frame of data, represented as a time function with time increasing from left to right. The delay line thus resembles an ideal transmission line, through which a signal propagates with all components travelling at the same velocity.

In digital-signal-processing terms, the delay operation is represented by the frequency-domain transfer function

$$ \frac{y(f)}{x(f)} = z^{-1} \tag{3} $$

where x(f) is the spectrum of a signal, y(f) is the spectrum of the delayed signal, and z is defined in Eq. (1). This transfer function has an allpass frequency response with a linear phase characteristic. A pure delay is represented by the box in the centre of Figure A1, and its phase characteristic is shown as the diagonal long-dashed line in Figure A2. The phase lag increases linearly with frequency, and reaches π at the Nyquist frequency (fs/2), which is 8 kHz in this study.

Figure A1. Schematic diagram of an element of the warping delay line


Figure A2. Phase characteristics in radians (referred to the left ordinate) for an element of the normal-delay line (long-dashed line), warping delay line (solid curve), and unwarping delay line (dotted curve). The short-dashed line (referred to the right ordinate) shows frequency in ERBN-number, expressed as a function of frequency in kHz. Light grey bars on the right hand side of the figure indicate the frequency ranges of the reference and FEW3 bands expressed in ERBN-number. Light grey bars on the left indicate the phase lags associated with the warping delay line for the frequency ranges of these bands. See text for description of arrows.


For frequency warping, the delay element is replaced by the first-order allpass filter shown schematically in Figure A1. The filter is characterized by the coefficient, α, and its transfer function is given by

$$ \frac{y(f)}{x(f)} = \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}} \tag{4} $$

For the special case where α = 0, the filter becomes a simple delay element. Comparison with Eq. (2) reveals that y(f)/x(f) = z′⁻¹, where z′ is the bilinear transform of z.

The filter’s phase characteristic represents the phase delay per section of the delay line for a sinusoidal signal at any frequency as it propagates through the delay line. The phase characteristic for α = 0.46, the value used in this study, is indicated by the solid curve above the diagonal in Figure A2. Compared to the normal delay line, the warping delay line has a greater phase lag at all frequencies except 0 and 8 kHz. Thus, a continuous tone at some intermediate frequency would have a greater phase lag and would exhibit more cycles on the warping delay line than it would on a normal delay line. In comparison to the normal delay line, the warping delay line makes the tone appear to have a higher frequency; in particular, the phase ϕ(f) at a given f corresponds to an apparent frequency (ϕ(f)/2π)×fs. This is shown graphically in Figure A2, where a tone at 0.75 kHz (indicated by the upward-pointing grey arrow) has the same phase lag per section as a tone at 1.9425 kHz in a normal delay line (indicated by the right-pointing arrow). Thus, in an FFT analysis of the contents of the modified delay line when the input is a continuous 0.75-kHz tone, there would appear to be a single component at 1.9425 kHz.
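The apparent-frequency shift can be reproduced with a small time-domain simulation. The Python sketch below is an illustration under stated assumptions, not the processing used in the study. It assumes each section has the transfer function (z⁻¹ − α)/(1 − αz⁻¹), which corresponds to the difference equation y[n] = −αx[n] + x[n−1] + αy[n−1]. It uses a complex exponential in place of a real tone so that the phase step between neighbouring sections can be read off directly. The function names, the 16-tap line length, and the 400-sample run-in are hypothetical choices, and index 0 of the list is the input end of the line.

```python
import cmath
import math

FS_KHZ = 16.0  # sampling frequency in this study
ALPHA = 0.46   # warping coefficient used in this study


def advance(taps, x_new, alpha):
    """Shift one new sample into the line and return the new tap values.

    taps[0] is the input end. Each later tap is the output of a first-order
    allpass section fed by the tap before it:
        y[n] = -alpha * x[n] + x[n-1] + alpha * y[n-1]
    With alpha = 0 this reduces to a plain one-sample shift.
    """
    new = [x_new]
    for k in range(1, len(taps)):
        new.append(-alpha * new[k - 1] + taps[k - 1] + alpha * taps[k])
    return new


def apparent_frequency_khz(f_khz, alpha=ALPHA, n_taps=16, n_samples=400):
    """Feed a steady complex tone, then read the phase step between two taps."""
    taps = [0j] * n_taps
    for n in range(n_samples):
        tone = cmath.exp(2j * math.pi * f_khz * n / FS_KHZ)
        taps = advance(taps, tone, alpha)
    # In the steady state, neighbouring taps differ by the section's phase lag.
    lag_per_section = -cmath.phase(taps[-1] / taps[-2])
    return lag_per_section / (2 * math.pi) * FS_KHZ


if __name__ == "__main__":
    print(apparent_frequency_khz(0.75))             # warping line
    print(apparent_frequency_khz(0.75, alpha=0.0))  # normal delay line
```

With α = 0.46 the first value should come out close to the 1.9425 kHz quoted above, and with α = 0 the tone keeps its original 0.75 kHz.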

The modified delay line’s property of changing the apparent frequency of sinusoidal components accounts for its frequency-warping characteristic. As shown by the above explanation, the ordinate could be relabelled as ‘apparent frequency’ instead of phase lag, with values going linearly from 0 to 8 kHz. For the modified delay line, equal steps along this vertical axis would correspond to small steps on the horizontal axis for small frequency values, and much larger steps along the horizontal axis for large frequency values. This property qualitatively resembles the curve relating ERBN-number to Hertz; a unit step in ERBN-number at low values represents a much smaller step in Hertz than a unit step in ERBN-number at high values.

The above shows how a sinusoid at one frequency in the input appears as a sinusoid at a different frequency when represented on the modified delay line. Because the operation performed by each section is linear, the apparent frequency transformation also holds for each component of a complex signal. If the input signal is limited in both time and frequency, as it must be to satisfy the conditions for using the FFT, then the modified delay line contains a full transformed representation of the signal, so long as the delay line is long enough that no significant values ‘fall off the end’. However, this requires that the delay line be longer than the unmodified frame-length delay line. This is because some components propagate through the delay line at rates faster than one place per sample interval.

For any filter, the slope of the phase characteristic at any frequency provides a measure of the time delay of the signal component at that frequency. For the simple delay element, the phase characteristic is linear, so all signal components have the same delay. However, for the warping filter shown in Figure A2 (with α = 0.46), the delay is longer for frequencies below about 2.813 kHz and shorter for frequencies above this. This means that low frequencies propagate more slowly and high frequencies propagate more quickly through the warping delay line than through the normal one, even though all are ‘warped’ to higher frequencies. To minimize the loss of high-frequency information, the warping delay line should be longer than one frame length. At the highest frequencies, the slope ratio (the slope for the normal delay line divided by that for the warping delay line) is about 2.7. The slope ratio is about 2 at 5 kHz. Thus, the length of the delay line for warping should be at least twice the frame length and preferably greater, and the FFT size should match the delay-line length. In this study, however, the delay line was only one frame long, so some processing artefacts were generated.
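The slope ratios quoted above can be estimated numerically from the phase characteristic. The Python sketch below is illustrative only. It assumes the section transfer function (z⁻¹ − α)/(1 − αz⁻¹) and estimates the slope with a central difference; the function names and the step size are hypothetical.

```python
import cmath
import math

FS_KHZ = 16.0  # sampling frequency in this study
ALPHA = 0.46   # warping coefficient used in this study


def phase_lag(f_khz, alpha=ALPHA):
    """Phase lag (radians) of one section, valid for 0 < f < fs/2."""
    z_inv = cmath.exp(-2j * math.pi * f_khz / FS_KHZ)
    return -cmath.phase((z_inv - alpha) / (1 - alpha * z_inv))


def delay_in_samples(f_khz, alpha=ALPHA, df_khz=1e-4):
    """Slope of the phase characteristic, estimated by a central difference."""
    d_phase = phase_lag(f_khz + df_khz, alpha) - phase_lag(f_khz - df_khz, alpha)
    d_omega = 2 * math.pi * (2 * df_khz) / FS_KHZ  # radians per sample
    return d_phase / d_omega


def places_per_sample(f_khz, alpha=ALPHA):
    """Propagation speed along the line; 1.0 for a normal delay line."""
    return 1.0 / delay_in_samples(f_khz, alpha)


if __name__ == "__main__":
    for f_khz in (0.75, 5.0, 7.9):
        print(f_khz, delay_in_samples(f_khz), places_per_sample(f_khz))
```

A speed above 1 place per sample interval means that the component reaches the far end of a frame-length line before the frame is complete, which is the reason given above for making the warping delay line longer than one frame.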

The unwarping transformation involves the same methods as warping, but the coefficient, α, is replaced by its negative. The phase characteristic for the unwarping filter with α = −0.46 is given by the lower (dotted) curve in Figure A2. The phase lag is smaller than for the simple delay element at all frequencies between 0 and the Nyquist frequency. The left-pointing arrow shows that the phase response to a tone at 1.9425 kHz (indicated by the downward-pointing grey arrow) is the same as that for a 0.75-kHz tone in a normal delay line. This exemplifies the fact that the transformation reverses the warping obtained with α = 0.46.

The slope of the unwarping phase characteristic is smaller than for the unit delay at low frequencies and higher at high frequencies. Thus, components that propagate rapidly during warping propagate slowly during unwarping, and vice versa. The propagation velocities during warping and unwarping are matched, so the synchrony between (untransposed) components that is lost during warping can be regained during unwarping. As long as no information falls off the end of the delay line, the reconstruction can be virtually perfect. An extreme example is provided by a unit pulse. If this were presented at the input to a long warping delay line, it would be transformed into a low-to-high-frequency chirp on the delay line. However, if the contents of the warping delay line were then used as input to the unwarping delay line, the pulse could be reconstructed. The unwarping delay line need not be longer than the original frame length. Transposition would, of course, interfere with the reconstruction process. Components transposed to lower frequencies propagate faster than they would if they were not transposed, so these components could fall off the end of the frame-length delay line. Although this would cause some loss of information, it should not produce large artefacts, because components on the delay line are windowed before the overlap-add operation.

Finally, the short-dashed curve in Figure A2 shows the actual Hertz-to-ERBN-number transformation, with ERBN-number indicated on the right ordinate. It is evident that the phase-warping filter with α = 0.46 does not provide a very accurate approximation to this curve. A better fit can be achieved with a different choice of warping coefficient. However, the pattern obtained with α = 0.46 has the desirable property that the distance along the ordinate between the intercepts for 3.318 and 5.219 kHz (the boundaries for the FEW3 band) is about the same as that between the intercepts for 0.75 and 1.275 kHz (the boundaries for the reference band). The bands are indicated by thick vertical bars in ERBN-number units on the right ordinate and in phase units on the left ordinate. The bars for the FEW3 band and the reference band have the same length on either scale. Thus, the FFT of the transformed signal for an analysis frame contains the same number of components in these two bands. As a result, transposition can be performed using the same method as for an FHW band in an unwarped analysis frame.
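The match between the two bands on the warped axis can be checked with a short calculation. The Python sketch below is illustrative only and assumes the section transfer function (z⁻¹ − α)/(1 − αz⁻¹). Because the number of FFT components in a band depends on the FFT size, which is not stated in this appendix, the sketch compares band widths in apparent frequency rather than counting components. The function and constant names are hypothetical.

```python
import cmath
import math

FS_KHZ = 16.0  # sampling frequency in this study
ALPHA = 0.46   # warping coefficient used in this study

REFERENCE_BAND_KHZ = (0.75, 1.275)  # band edges given in the text
FEW3_BAND_KHZ = (3.318, 5.219)      # band edges given in the text


def apparent_frequency_khz(f_khz, alpha=ALPHA):
    """Apparent frequency on the warping line, valid for 0 < f < fs/2."""
    z_inv = cmath.exp(-2j * math.pi * f_khz / FS_KHZ)
    lag = -cmath.phase((z_inv - alpha) / (1 - alpha * z_inv))
    return lag / (2 * math.pi) * FS_KHZ


def warped_width_khz(band_khz, alpha=ALPHA):
    """Width of a band after warping, in apparent-frequency units."""
    low_khz, high_khz = band_khz
    return apparent_frequency_khz(high_khz, alpha) - apparent_frequency_khz(low_khz, alpha)


if __name__ == "__main__":
    for name, band in (("reference", REFERENCE_BAND_KHZ), ("FEW3", FEW3_BAND_KHZ)):
        unwarped_width = band[1] - band[0]
        print(name, unwarped_width, warped_width_khz(band))
```

Before warping, the FEW3 band is several times wider in Hertz than the reference band. After warping, the reference band is stretched and the FEW3 band is compressed, so the two widths become similar.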
