Dynamic Pronunciation Modelling for Unsupervised Learning of ASR Systems

REFERENCES

  • J. M. Baker, L. Deng, J. Glass, S. Khudanpur, C.-H. Lee, N. Morgan, and D. O'Shaughnessy, “Research developments and directions in speech recognition and understanding, part 1,” IEEE Signal Process. Mag., Vol. 26, no. 3, pp. 75–80, May 2009.
  • J. M. Baker, L. Deng, S. Khudanpur, C.-H. Lee, J. R. Glass, N. Morgan, and D. O'Shaughnessy, “Updated minds report on speech recognition and understanding, part 2,” IEEE Signal Process. Mag., Vol. 26, no. 4, pp. 78–85, Jul. 2009.
  • H. Tolba, and D. O'Shaughnessy, “Speech recognition by intelligent machines,” IEEE Can. Rev., Vol. 38, pp. 20–3, Summer 2001.
  • Y. Tsao, X. Lu, P. Dixon, T.-Y. Hu, S. Matsuda, and C. Hori, “Incorporating local information of the acoustic environments to MAP-based feature compensation and acoustic model adaptation,” Comput. Speech Lang., Vol. 28, pp. 709–726, May 2014.
  • C. Chesta, O. Siohan, and C.-H. Lee, “Maximum a posteriori linear regression for hidden Markov model adaptation,” in Proceedings of Eurospeech'99, Budapest, 1999, pp. 211–4.
  • J.-L. Gauvain, and C.-H. Lee, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Trans. Speech Audio Process., Vol. 2, no. 2, pp. 291–9, Apr. 1994.
  • C. J. Leggetter, and P. C. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models,” Comput. Speech Lang., Vol. 9, pp. 171–85, 1995.
  • Z. Wang, T. Schultz, and A. Waibel, “Comparison of acoustic model adaptation techniques on non-native speech,” in Proceedings of ICASSP, Hong Kong, 2003.
  • J. Humphries, and P. C. Woodland, “Using accent-specific pronunciation modeling for improved large vocabulary continuous speech recognition,” in Proceedings of Eurospeech, Rhodes, 1997.
  • M. Magimai-Doss, and H. Bourlard, “Pronunciation models and their evaluation using confidence measures,” Idiap Research Report Idiap-RR-29-2001, Oct. 2001.
  • O. Pietquin, and T. Dutoit, “A probabilistic framework for dialog simulation and optimal strategy learning,” IEEE Trans. Audio Speech Lang. Process., Vol. 14, no. 2, pp. 589–99, Mar. 2006.
  • M. Wieling, E. Margaretha, and J. Nerbonne, “Inducing phonetic distances from dialect variation,” Comput. Linguist. Netherlands J., Vol. 1, pp. 109–18, Dec. 2011.
  • B. Hixon, E. Schneider, and S. L. Epstein, “Phonemic similarity metrics to compare pronunciation methods,” in INTERSPEECH, Florence, 2011.
  • M. Pucher, A. Türk, J. Ajmera, and N. Fecher, “Phonetic distance measures for speech recognition vocabulary,” presented at the 3rd Congress of the Alps Adria Acoustics Association, Graz, Austria, Sep. 27–28, 2007.
  • M. Lehr, K. Gorman, and I. Shafran, “Discriminative pronunciation modeling for dialectal speech recognition,” in INTERSPEECH, Singapore, 2014.
  • C. Zhang, Y. Liu, Y. Xia, T. F. Zheng, J. Olsen, and J. Tian, “Reliable accent-specific unit generation with discriminative dynamic Gaussian mixture selection for multi-accent Chinese speech recognition,” IEEE Trans. Audio Speech Lang. Process., Vol. 21, no. 10, pp. 2073–84, Oct. 2013.
  • A. S. Park, and J. R. Glass, “Unsupervised pattern discovery in speech,” IEEE Trans. Audio Speech Lang. Process., Vol. 16, no. 1, pp. 186–97, Jan. 2008.
  • L. Deng, and X. Li, “Machine learning paradigms for speech recognition: An overview,” IEEE Trans. Audio Speech Lang. Process., Vol. 21, no. 5, pp. 1–30, May 2013.
  • A. K. Jain, A. Topchy, M. H. C. Law, and J. M. Buhmann, “Landscape of clustering algorithms,” in Proceedings of IAPR International Conference on Pattern Recognition, Cambridge, UK, 2004.
  • L. Rabiner, B. Juang, and B. Yegnanarayana, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice Hall, 2010.
  • X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Upper Saddle River, NJ: Prentice Hall, 2001.
  • A. A. Babu, R. Yellasiri, and A. A. Rao, “Unsupervised adaptation of ASR systems using hybrid HMM and VQ model,” in Lecture Notes in Engineering and Computer Science: Proceedings of the International Multiconference of Engineers and Computer Scientists 2014, IMECS, Hong Kong, Mar. 12–14, 2014, pp. 169–74.
  • A. A. Babu, Y. Ramadevi, and A. A. Rao, “Data driven methods for adaptation of ASR systems,” in IAENG Transactions on Engineering Sciences, International Multiconference of Engineers and Computer Scientists (IMECS2014) & World Congress on Engineering, Hong Kong, 2014, pp. 327–41.
  • J. S. Garofolo et al., “TIMIT acoustic–phonetic continuous speech corpus,” Linguistic Data Consortium, Philadelphia, Feb. 1993.
  • W. Walker, P. Lamere, P. Kwok, B. Raj, R. Singh, E. Gouvea, P. Wolf, and J. Woelfel, Sphinx-4: A Flexible Open Source Framework for Speech Recognition, Pittsburgh, PA: Sun Microsystems Inc., SMLI TR2004-0811, 2004.