Review Article

Review of Time–Frequency Masking Approach for Improving Speech Intelligibility in Noise

References

  • R. Lippmann, “Speech recognition by machines and humans,” Speech Commun., Vol. 22, pp. 1–15, 1997.
  • J. Sroka, and L. Braida, “Human and machine consonant recognition,” Speech Commun., Vol. 45, pp. 401–423, 2005.
  • O. Scharenborg, “Reaching over the gap: A review of efforts to link human and automatic speech recognition research,” Speech Commun., Vol. 49, pp. 336–347, 2007.
  • K. Wagener, and T. Brand, “Sentence intelligibility in noise for listeners with normal hearing and hearing impairment: Influence of measurement procedure and masking parameters,” Int. J. Audiol., Vol. 44, no. 3, pp. 144–156, 2005.
  • A. Bronkhorst, and R. Plomp, “Binaural speech intelligibility in noise for hearing-impaired listeners,” J. Acoust. Soc. Am., Vol. 86, no. 4, pp. 1374–1383, 1989.
  • B. Wilson, and M. Dorman, “Cochlear implants: A remarkable past and a brilliant future,” Hear. Res., Vol. 242, pp. 3–21, 2008.
  • Y. Hu, and P. Loizou, “A new sound coding strategy for suppressing noise in cochlear implants,” J. Acoust. Soc. Am., Vol. 124, no. 1, pp. 498–509, 2008.
  • P. Loizou. Speech enhancement: Theory and practice. 2nd ed. Boca Raton, FL: CRC Press, 2013.
  • M. Parchami, W. Zhu, B. Champagne, et al., “Recent developments in speech enhancement in the short-time Fourier transform domain,” IEEE Circuits Syst. Mag., Vol. 16, no. 3, pp. 45–77, 2016.
  • J. Benesty, S. Makino, and J. Chen. Speech enhancement. Berlin: Springer, 2005.
  • Y. Hu, and P. Loizou, “A comparative intelligibility study of single-microphone noise reduction algorithms,” J. Acoust. Soc. Am., Vol. 122, no. 3, pp. 1777–1786, 2007.
  • S. Cao, L. Li, and X. Wu, “Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise,” J. Acoust. Soc. Am., Vol. 129, no. 4, pp. 2227–2236, 2011.
  • M. Ahmadi, V. Gross, and D. Sinex, “Perceptual learning for speech in noise after application of binary time-frequency masks,” J. Acoust. Soc. Am., Vol. 133, no. 3, pp. 1687–1692, 2013.
  • D. Brungart, P. Chang, B. Simpson, et al., “Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation,” J. Acoust. Soc. Am., Vol. 120, pp. 4007–4018, 2006.
  • N. Li, and P. Loizou, “Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction,” J. Acoust. Soc. Am., Vol. 123, pp. 1673–1682, 2008.
  • D. Wang, U. Kjems, M. Pedersen, et al., “Speech intelligibility in background noise with ideal binary time-frequency masking,” J. Acoust. Soc. Am., Vol. 125, no. 4, pp. 2336–2347, 2009.
  • A. Kressner, A. Westermann, and J. Buchholz, “Cochlear implant speech intelligibility outcomes with structured and unstructured binary mask errors,” J. Acoust. Soc. Am., Vol. 139, no. 2, pp. 800–810, 2016.
  • R. Koning, N. Madhu, and J. Wouters, “Ideal time-frequency masking algorithms lead to different speech intelligibility and quality in normal-hearing and cochlear implant listeners,” IEEE Trans. Biomed. Eng., Vol. 62, no. 1, pp. 331–341, 2015.
  • F. Chen, “Representing the intelligibility advantage of ideal binary masking with the most energetic channels,” J. Acoust. Soc. Am., Vol. 140, no. 6, pp. 4161–4169, 2016.
  • D. Wang, and G. Brown. Computational auditory scene analysis: Principles, algorithms, and applications. Piscataway, NJ: Wiley-IEEE Press, 2006.
  • S. Srinivasan, N. Roman, and D. Wang, “Binary and ratio time-frequency masks for robust speech recognition,” Speech Commun., Vol. 48, pp. 1486–1501, 2006.
  • J. van Hout, and A. Alwan, “A novel approach to soft-mask estimation and log-spectral enhancement for robust speech recognition,” in Proceedings of ICASSP, 2012, pp. 4105–4108.
  • Y. Wang, A. Narayanan, and D. Wang, “On training targets for supervised speech separation,” IEEE/ACM Trans. Audio, Speech, and Language Processing, Vol. 22, no. 12, pp. 1849–1858, 2014.
  • G. Kim, Y. Lu, Y. Hu, et al., “An algorithm that improves speech intelligibility in noise for normal-hearing listeners,” J. Acoust. Soc. Am., Vol. 126, no. 3, pp. 1486–1494, 2009.
  • G. Kim, and P. Loizou, “Improving speech intelligibility in noise using environment-optimized algorithms,” IEEE Trans. Audio, Speech, and Language Processing, Vol. 18, no. 8, pp. 2080–2090, 2010.
  • Y. Hu, and P. Loizou, “Environment-specific noise suppression for improved speech intelligibility by cochlear implant users,” J. Acoust. Soc. Am., Vol. 127, no. 6, pp. 3689–3695, 2010.
  • K. Han, and D. Wang, “A classification based approach to speech segregation,” J. Acoust. Soc. Am., Vol. 132, no. 5, pp. 3475–3483, 2012.
  • Y. Wang, K. Han, and D. Wang, “Exploring monaural features for classification-based speech segregation,” IEEE Trans. Audio, Speech, and Language Processing, Vol. 21, no. 2, pp. 270–279, 2013.
  • Y. Wang, and D. Wang, “Towards scaling up classification-based speech separation,” IEEE Trans. Audio, Speech, and Language Processing, Vol. 21, no. 7, pp. 1381–1390, 2013.
  • E. Healy, S. Yoho, Y. Wang, et al., “An algorithm to improve speech recognition in noise for hearing-impaired listeners,” J. Acoust. Soc. Am., Vol. 134, no. 4, pp. 3029–3038, 2013.
  • G. Kim, “Binary mask estimation for noise reduction based on instantaneous SNR estimation using Bayes risk minimisation,” Electron. Lett., Vol. 51, no. 6, pp. 526–528, 2015.
  • E. Healy, S. Yoho, J. Chen, et al., “An algorithm to increase speech intelligibility for hearing-impaired listeners in novel segments of the same noise type,” J. Acoust. Soc. Am., Vol. 138, no. 3, pp. 1660–1669, 2015.
  • Y. Wang, and D. Wang, “A deep neural network for time-domain speech reconstruction,” in Proceedings of ICASSP, 2015, pp. 4390–4394.
  • J. Chen, Y. Wang, S. Yoho, et al., “Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises,” J. Acoust. Soc. Am., Vol. 139, no. 5, pp. 2604–2612, 2016.
  • M. Kolbaek, Z.-H. Tan, and J. Jensen, “Speech intelligibility potential of general and specialized deep neural network based speech enhancement systems,” IEEE/ACM Trans. Audio, Speech, and Language Processing, Vol. 25, no. 1, pp. 153–167, 2017.
  • M. Soni, N. Shah, and H. Patil, “Time-frequency masking-based speech enhancement using generative adversarial network,” in Proceedings of ICASSP, 2018, pp. 5039–5043.
  • J. Yao, and A. Al-Dahle, “Coarse-to-fine optimization for speech enhancement,” in Proceedings of Interspeech, 2019, pp. 2743–2747.
  • H. Choi, J. Kim, J. Huh, et al., “Phase-aware speech enhancement with deep complex U-Net,” in Proceedings of ICLR, 2019.
  • G. Lee, and H. Kim, “Multi-task learning U-Net for single-channel speech enhancement and mask-based voice activity detection,” Applied Sciences, Vol. 10, no. 9, pp. 1–15, 2020.
  • R. Drullman, “Speech intelligibility in noise: relative contribution of speech elements above and below the noise level,” J. Acoust. Soc. Am., Vol. 98, pp. 1796–1798, 1995.
  • A. Zolnay, R. Schluter, and H. Ney, “Acoustic feature combination for robust speech recognition,” in Proceedings of ICASSP, 2005, pp. 457–460.
  • A. Lawson, P. Vabishchevich, M. Huggins, et al., “Survey and evaluation of acoustic features for speaker recognition,” in Proceedings of ICASSP, 2011, pp. 5444–5447.
  • R. Das, and S. Prasanna, “Speaker verification from short utterance perspective: a review,” IETE Tech. Rev., Vol. 35, no. 6, pp. 599–617, 2018.
  • N. Adiga, and S. Prasanna, “Acoustic features modelling for statistical parametric speech synthesis: a review,” IETE Tech. Rev., Vol. 36, no. 2, pp. 130–149, 2019.
  • B. Kollmeier, and R. Koch, “Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction,” J. Acoust. Soc. Am., Vol. 95, pp. 1593–1602, 1994.
  • J. Tchorz, and B. Kollmeier, “SNR estimation based on amplitude modulation analysis with applications to noise suppression,” IEEE Trans. Speech Audio Process., Vol. 11, pp. 184–192, 2003.
  • H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech,” J. Acoust. Soc. Am., Vol. 87, no. 4, pp. 1738–1752, 1990.
  • H. Hermansky, and N. Morgan, “RASTA processing of speech,” IEEE Trans. Speech Audio Process., Vol. 2, no. 4, pp. 578–589, 1994.
  • R. Patterson, I. Nimmo-Smith, J. Holdsworth, et al., “An efficient auditory filterbank based on the gammatone function,” in Meeting of the IOC Speech Group on Auditory Modeling at RSRE, Vol. 2, no. 7, 1987.
  • R. Patterson, K. Robinson, J. Holdsworth, et al., “Complex sounds and auditory images,” Auditory Physiol. Percept., Vol. 83, pp. 429–446, 1992.
  • Y. Shao, and D. Wang, “Robust speaker identification using auditory features and computational auditory scene analysis,” in Proceedings of ICASSP, 2008, pp. 1589–1592.
  • S. Davis, and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Trans. Acoust., Speech, Signal Process., Vol. ASSP-28, no. 4, pp. 357–366, 1980.
  • L. Meier, S. Van De Geer, and P. Bühlmann, “The group Lasso for logistic regression,” J. R. Statist. Soc.: Ser. B, Vol. 70, no. 1, pp. 53–71, 2008.
  • E. Rothauser, W. Chapman, N. Guttman, et al., “IEEE recommended practice for speech quality measurements,” IEEE Trans. Audio Electroacoust., Vol. 17, pp. 225–246, 1969.
  • M. Nilsson, S. Soli, and J. Sullivan, “Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise,” J. Acoust. Soc. Am., Vol. 95, no. 2, pp. 1085–1099, 1994.
  • J. Festen, and R. Plomp, “Speech-reception threshold in noise with one and two hearing aids,” J. Acoust. Soc. Am., Vol. 79, no. 2, pp. 465–471, 1986.
  • C. Smits, and J. Festen, “The interpretation of speech reception threshold data in normal-hearing and hearing-impaired listeners: Steady-state noise,” J. Acoust. Soc. Am., Vol. 130, no. 5, pp. 2987–2998, 2011.
  • C. Smits, and J. Festen, “The interpretation of speech reception threshold data in normal-hearing and hearing-impaired listeners: II. Fluctuating noise,” J. Acoust. Soc. Am., Vol. 133, no. 5, pp. 3004–3015, 2013.
  • N. Yousefian, and P. Loizou, “Predicting the speech reception threshold of cochlear implant listeners using an envelope-correlation based measure,” J. Acoust. Soc. Am., Vol. 132, no. 5, pp. 3399–3405, 2012.
  • C. Taal, R. Hendriks, R. Heusdens, et al., “An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech,” J. Acoust. Soc. Am., Vol. 130, no. 5, pp. 3013–3027, 2011.
  • T. May, and T. Dau, “Requirements for the evaluation of computational speech segregation systems,” J. Acoust. Soc. Am., Vol. 136, no. 6, pp. EL398–EL404, 2014.
  • J. Chen, Y. Wang, and D. Wang, “Noise perturbation improves supervised speech separation,” in Proceedings of LVA/ICA, 2015, pp. 83–90.
