Original Article

A new perceptually weighted cost function in deep neural network based speech enhancement systems

