Speech Recognition in Adverse Conditions

Speech-in-speech recognition: A training study

Pages 1089-1107 | Received 22 Nov 2010, Accepted 21 Dec 2011, Published online: 30 Apr 2012
 

Abstract

This study aims to identify aspects of speech-in-noise recognition that are susceptible to training, focusing on whether listeners can learn to adapt to target talkers (“tune in”) and learn to better cope with various maskers (“tune out”) after short-term training. Listeners received training on English sentence recognition in speech-shaped noise (SSN), Mandarin babble, or English babble. Results from a speech-in-babble posttest showed evidence of both tuning in and tuning out: (1) listeners were able to take advantage of target talker familiarity; (2) training with babble was more effective than SSN training; and (3) after babble training, listeners improved most in coping with the babble in which they were trained. In general, the results show that processes related both to tuning in to speech targets and tuning out speech maskers can be improved with auditory training.

Acknowledgements

I am deeply grateful to Ann Bradlow for helpful discussions throughout this project, to Chun Chan and Lauren Calandruccio for technical assistance, to Matt Goldrick for advice on statistical analysis, and to Kelsey Mok for help with data collection. This research was supported by NIH-NIDCD Award No. F31DC009516. The content is solely the responsibility of the author and does not necessarily represent the official views of the NIDCD or the NIH.

Notes

1. Although the precise definition of informational masking is still under discussion (Kidd, Mason, Richards, Gallun, & Durlach, 2007), the term is used here in this broad sense (i.e., nonenergetic masking) to draw the important distinction between interference that occurs in the auditory periphery and interference that occurs at higher levels of auditory and cognitive processing during speech-in-speech listening.

2. While energetic masking may also differ across these maskers, Van Engen (2010a) provided evidence for differences in informational masking in cross-language maskers by showing that the relative effects of English and Mandarin maskers differed across listener populations with different experiences with the two languages.

3. Since the HINT test was developed from the original set of BKB sentences, some sentences appear in both the HINT and the BKB lists. To eliminate any overlap, 10 sentences from BKB lists 2 and 20 were used to replace items in the training/test lists that also appeared in HINT lists 1 and 2. Replacement sentences were selected to match the number of keywords and the basic sentence structure of the items they replaced.

4. Female speakers were used for all targets and babble to eliminate the variable of gender differences in speech-in-speech intelligibility (Brungart, Simpson, Ericson, & Scott, 2001).

5. In addition to providing evidence for listener adaptation to a wide range of speech signal types, perceptual learning studies have shown that the use of multiple talkers in training can be beneficial to learning. For example, Bradlow and Bent's (2008) comparison of various training conditions for promoting adaptation to foreign-accented English showed that training with multiple talkers of a given accent facilitated the intelligibility of a test talker just as much as training on only that talker. Multiple-talker training has also been shown to be particularly effective for generalized learning in studies of non-native phoneme contrast perception (e.g., Lively, Logan, & Pisoni, 1993; Logan, Lively, & Pisoni, 1991), lexical tone (Wang, Jongman, & Sereno, 2003; Wang, Spence, Jongman, & Sereno, 1999), and dialect classification (Clopper & Pisoni, 2007).

6. This is not to say that experience cannot affect listeners' ability to cope with energetic masking during speech recognition. Indeed, many studies have shown that even highly proficient non-native speakers of a target language perform worse than monolingual native speakers of the language on speech recognition tasks in speech-shaped noise (e.g., Cooke, Garcia Lecumberri, & Barker, 2008; Hazan & Simpson, 2000; Van Engen, 2010; Van Wijngaarden et al., 2002).
