References
- M. H. Cohen, Voice user interface design. Addison-Wesley Professional, 2004.
- L. Besacier, E. Barnard, A. Karpov, and T. Schultz, “Automatic speech recognition for under-resourced languages: A survey,” Speech Communication, vol. 56, pp. 85–100, 2014.
- H. Lin, J.-T. Huang, F. Beaufays, B. Strope, and Y.-H. Sung, “Recognition of multilingual speech in mobile applications,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4881–4884, 2012.
- V.-B. Le and L. Besacier, “Automatic Speech Recognition for Under-Resourced Languages: Application to Vietnamese Language,” IEEE Trans. Audio, Speech, and Language Processing, vol. 17, no. 8, pp. 1471–1482, 2009.
- J. Kominek and A. W. Black, “The CMU Arctic speech databases,” in Fifth ISCA Workshop on Speech Synthesis, 2004.
- J. Žibert and F. Mihelič, “Slovenian weather forecast speech database,” in Proc. SoftCOM, vol. 1, pp. 199–206, Oct 2000.
- A. Hunt and A. Black, “Unit selection in a concatenative speech synthesis system using a large speech database,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-96), vol. 1, pp. 373–376, May 1996.
- H. Zen, K. Tokuda, and A. W. Black, “Statistical parametric speech synthesis,” Speech Communication, vol. 51, no. 11, pp. 1039–1064, 2009.
- J. Kominek, T. Schultz, and A. W. Black, “Synthesizer voice quality on new languages calibrated with mel-cepstral distortion,” in Proc. SLTU 2008, Hanoi, Viet Nam, 2008.
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black, and K. Tokuda, “The HMM-based speech synthesis system (HTS) version 2.0,” in Proc. of Sixth ISCA Workshop on Speech Synthesis, pp. 294–299, 2007.
- T. Justin, M. Pobar, I. Ipšić, F. Mihelič, and J. Žibert, “A bilingual HMM-based speech synthesis system for closely related languages,” in Text, Speech and Dialogue, pp. 543–550, Springer Berlin Heidelberg, 2012.
- J. Dijkstra, L. C. Pols, and R. J. van Son, “Frisian TTS, an example of bootstrapping TTS for minority languages,” in Fifth ISCA Workshop on Speech Synthesis, 2004.
- N. T. Vu, F. Kraus, and T. Schultz, “Cross-language bootstrapping based on completely unsupervised training using multilingual A-stabil,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5000–5003, May 2011.
- T. Schultz and A. Waibel, “Multilingual and Crosslingual Speech Recognition,” in Proc. DARPA Workshop on Broadcast News Transcription and Understanding, pp. 259–262, 1998.
- K. C. Sim and H. Li, “Robust phone set mapping using decision tree clustering for cross-lingual phone recognition,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4309–4312, Mar 2008.
- C. Traber, K. Huber, K. Nedir, B. Pfister, E. Keller, and B. Zellner, “From multilingual to polyglot speech synthesis,” in Proc. Eurospeech, pp. 835–838, 1999.
- J. Latorre, K. Iwano, and S. Furui, “New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer,” Speech Communication, vol. 48, no. 10, pp. 1227–1242, 2006.
- M. Pobar, T. Justin, J. Žibert, F. Mihelič, and I. Ipšić, “A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis,” in Text, Speech, and Dialogue, pp. 44–51, Springer Berlin Heidelberg, 2013.
- T. Schultz, N. T. Vu, and T. Schlippe, “GlobalPhone: A multilingual text & speech database in 20 languages,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8126–8130, May 2013.
- Y. Qian, H. Liang, and F. Soong, “A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin–English) TTS,” IEEE Trans. Audio, Speech, and Language Processing, vol. 17, pp. 1231–1239, Aug 2009.
- X. Cui, J. Xue, X. Chen, P. Olsen, P. Dognin, U. V. Chaudhari, J. Hershey, and B. Zhou, “Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages,” IEEE Trans. Audio, Speech, and Language Processing, vol. 20, pp. 2252–2264, Oct 2012.
- Y. Qian, J. Xu, and F. Soong, “A frame mapping based HMM approach to cross-lingual voice transformation,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5120–5123, May 2011.
- H. Cao, T. Lee, and P. Ching, “Cross-lingual speaker adaptation via Gaussian component mapping,” in Proc. INTERSPEECH, pp. 869–872, 2010.
- S.-J. Kim, J.-J. Kim, and M. Hahn, “HMM-based Korean speech synthesis system for hand-held devices,” IEEE Trans. Consumer Electronics, vol. 52, pp. 1384–1390, Nov 2006.
- J. Žganec Gros and M. Žganec, “An efficient unit-selection method for embedded concatenative speech synthesis,” Informacije MIDEM—Journal of Microelectronics, Electronic Components and Materials, vol. 37, no. 3, pp. 158–164, 2007.
- F. Mihelič, J. Gros, S. Dobrišek, J. Žibert, and N. Pavešič, “Spoken Language Resources at LUKS of the University of Ljubljana,” International Journal of Speech Technology, vol. 6, no. 3, pp. 221–232, 2003.
- D. H. Klatt, “Review of the ARPA speech understanding project,” The Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1345–1366, 1977.
- International Phonetic Association, Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge University Press, 1999.
- D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing, vol. 10, no. 1–3, pp. 19–41, 2000.
- J.-L. Gauvain and C.-H. Lee, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Trans. Speech and Audio Processing, vol. 2, pp. 291–298, Apr 1994.
- A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
- Y. Linde, A. Buzo, and R. Gray, “An Algorithm for Vector Quantizer Design,” IEEE Trans. Communications, vol. 28, pp. 84–95, Jan 1980.
- ETSI, “Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; Compression algorithms,” ETSI standard, 2003.
- S. Young, “The HTK Hidden Markov Model Toolkit: Design and Philosophy,” Entropic Cambridge Research Laboratory, Ltd, vol. 2, pp. 2–44, 1994.
- J.-L. Gauvain, L. Lamel, and G. Adda, “The LIMSI Broadcast News Transcription System,” Speech Communication, vol. 37, pp. 89–108, 2002.
- M.-Y. Hwang and X. Huang, “Subphonetic modeling with Markov states-Senone,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-92), vol. 1, pp. 33–36, Mar 1992.
- K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, “Mel-Generalized Cepstral Analysis,” in Proc. ICSLP-94, pp. 1043–1046, 1994.
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, “Hidden Markov models based on multi-space probability distribution for pitch pattern modeling,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 229–232, Mar 1999.
- S. Imai, K. Sumita, and C. Furuichi, “Mel log spectrum approximation (MLSA) filter for speech synthesis,” Electronics and Communications in Japan (Part I: Communications), vol. 66, no. 2, pp. 10–18, 1983.
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, “Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis,” IEEE Trans. Audio, Speech, and Language Processing, vol. 17, pp. 1208–1230, Aug 2009.
- M. J. Gales, The generation and use of regression class trees for MLLR adaptation. University of Cambridge, Department of Engineering, 1996.
- A. Vasilijević and D. Petrinović, “Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing,” AUTOMATIKA: časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije, vol. 52, no. 2, pp. 132–146, 2011.
- R. B. D'Agostino, W. Chase, and A. Belanger, “The Appropriateness of Some Common Procedures for Testing the Equality of Two Independent Binomial Populations,” The American Statistician, vol. 42, no. 3, pp. 198–202, 1988.
- S. Martinčić-Ipšić, M. Pobar, and I. Ipšić, “Croatian large vocabulary automatic speech recognition,” AUTOMATIKA: časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije, vol. 52, no. 2, pp. 147–157, 2011.