Automatika

Journal for Control, Measurement, Electronics, Computing and Communications

Volume 57, 2016 - Issue 1

Original scientific paper

Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages

Primjena automatskog međujezičnog akustičnog modeliranja na HMM sintezu govora za oskudne jezične baze

Tadej Justin, B.Sc., Laboratory of Artificial Perception, Systems and Cybernetics (LUKS), Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia

Prof. France Mihelič, Ph.D., Laboratory of Artificial Perception, Systems and Cybernetics (LUKS), Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia

Assoc. Prof. Janez Žibert, Ph.D., Faculty of Health Sciences, University of Ljubljana, Zdravstvena pot 5, SI-1000 Ljubljana, Slovenia

Pages 268-281 | Received 28 Oct 2014, Accepted 04 May 2015, Published online: 20 Jan 2017

https://doi.org/10.7305/automatika.2016.07.1084

References

M. H. Cohen, Voice user interface design. Addison-Wesley Professional, 2004.
L. Besacier, E. Barnard, A. Karpov, and T. Schultz, “Automatic speech recognition for under-resourced languages: A survey,” Speech Communication, vol. 56, no. 0, pp. 85–100, 2014.
H. Lin, J.-t. Huang, F. Beaufays, B. Strope, and Y.-h. Sung, “Recognition of multilingual speech in mobile applications,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, pp. 4881–4884, IEEE, 2012.
V.-B. Le and L. Besacier, “Automatic Speech Recognition for Under-Resourced Languages: Application to Vietnamese Language,” IEEE Trans. Audio, Speech, and Language Processing, vol. 17, no. 8, pp. 1471–1482, 2009.
J. Kominek and A. W. Black, “The CMU Arctic speech databases,” in Fifth ISCA Workshop on Speech Synthesis, 2004.
J. Žibert and F. Mihelič, “Slovenian weather forecast speech database,” in Proc. SoftCOM, vol. 1, pp. 199–206, SoftCOM, Oct. 2000.
A. Hunt and A. Black, “Unit selection in a concatenative speech synthesis system using a large speech database,” in Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on, vol. 1, pp. 373–376, May 1996.
H. Zen, K. Tokuda, and A. W. Black, “Statistical parametric speech synthesis,” Speech Communication, vol. 51, no. 11, pp. 1039–1064, 2009.
J. Kominek, T. Schultz, and A. W. Black, “Synthesizer voice quality on new languages calibrated with mel-cepstral distortion,” in SLTU 2008, Hanoi, Viet Nam, 2008.
H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black, and K. Tokuda, “The HMM-based speech synthesis system (HTS) version 2.0,” in Proc. of Sixth ISCA Workshop on Speech Synthesis, pp. 294–299, 2007.
T. Justin, M. Pobar, I. Ipšić, F. Mihelič, and J. Žibert, “A bilingual HMM-based speech synthesis system for closely related languages,” in Text, Speech and Dialogue, pp. 543–550, Springer Berlin Heidelberg, 2012.
J. Dijkstra, L. C. Pols, and R. J. V. Son, “Frisian TTS, an example of bootstrapping TTS for minority languages,” in Fifth ISCA Workshop on Speech Synthesis, 2004.
N. T. Vu, F. Kraus, and T. Schultz, “Cross-language bootstrapping based on completely unsupervised training using multilingual A-stabil,” in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 5000–5003, May 2011.
T. Schultz and A. Waibel, “Multilingual and Crosslingual Speech Recognition,” in Proc. DARPA Workshop on Broadcast News Transcription and Understanding, pp. 259–262, 1998.
K. C. Sim and H. Li, “Robust phone set mapping using decision tree clustering for cross-lingual phone recognition,” in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pp. 4309–4312, March 2008.
C. Traber, K. Huber, K. Nedir, B. Pfister, E. Keller, and B. Zellner, “From multilingual to polyglot speech synthesis,” in Proc. of the Eurospeech, vol. 99, pp. 835–838, 1999.
J. Latorre, K. Iwano, and S. Furui, “New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer,” Speech Commun., vol. 48, no. 10, pp. 1227–1242, 2006.
M. Pobar, T. Justin, J. Žibert, F. Mihelič, and I. Ipšić, “A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis,” in Text, Speech, and Dialogue, pp. 44–51, Springer Berlin Heidelberg, 2013.
T. Schultz, N. Vu, and T. Schlippe, “GlobalPhone: A multilingual text & speech database in 20 languages,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 8126–8130, May 2013.
Y. Qian, H. Liang, and F. Soong, “A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin-English) TTS,” IEEE Trans. Audio, Speech, and Language Processing, vol. 17, pp. 1231–1239, Aug 2009.
X. Cui, J. Xue, X. Chen, P. Olsen, P. Dognin, U. V. Chaudhari, J. Hershey, and B. Zhou, “Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages,” IEEE Trans. Audio, Speech, and Language Processing, vol. 20, pp. 2252–2264, Oct 2012.
Y. Qian, J. Xu, and F. Soong, “A frame mapping based HMM approach to cross-lingual voice transformation,” in Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 5120–5123, May 2011.
H. Cao, T. Lee, and P. Ching, “Cross-lingual speaker adaptation via Gaussian component mapping,” in INTERSPEECH, pp. 869–872, 2010.
S.-J. Kim, J.-J. Kim, and M. Hahn, “HMM-based Korean speech synthesis system for hand-held devices,” IEEE Trans. Consumer Electronics, vol. 52, pp. 1384–1390, Nov 2006.
J. Žganec Gros and M. Žganec, “An efficient unit-selection method for embedded concatenative speech synthesis,” Informacije MIDEM—Journal of Microelectronics, Electronic Components and Materials, vol. 37, no. 3, pp. 158–164, 2007.
F. Mihelič, J. Gros, S. Dobrišek, J. Žibert, and N. Pavešić, “Spoken Language Resources at LUKS of the University of Ljubljana,” International Journal of Speech Technology, vol. 6, no. 3, pp. 221–232, 2003.
D. H. Klatt, “Review of the ARPA speech understanding project,” The Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1345–1366, 1977.
International Phonetic Association, Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge University Press, June 1999.
D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker verification using Adapted Gaussian mixture models,” Digital Signal Processing, vol. 10, pp. 19–41, 2000.
J. Gauvain and C.-H. Lee, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Trans. Speech and Audio Processing, vol. 2, pp. 291–298, Apr 1994.
A. P. Dempster, N. M. Laird, D. B. Rubin, et al., “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, vol. 39, no. 1, pp. 1–38, 1977.
Y. Linde, A. Buzo, and R. Gray, “An Algorithm for Vector Quantizer Design,” Communications, IEEE Transactions on, vol. 28, pp. 84–95, Jan 1980.
E. Standard, “Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Frontend feature extraction algorithm; Compression algorithms,” tech. rep., ETSI, 2003.
S. Young and S. Young, “The HTK Hidden Markov Model Toolkit: Design and Philosophy,” Entropic Cambridge Research Laboratory, Ltd, vol. 2, pp. 2–44, 1994.
J.-L. Gauvain, L. Lamel, and G. Adda, “The LIMSI Broadcast News Transcription System,” Speech Communication, vol. 37, pp. 89–108, 2002.
M.-Y. Hwang and X. Huang, “Subphonetic modeling with Markov states-Senone,” in Acoustics, Speech, and Signal Processing, 1992. ICASSP-92., 1992 IEEE International Conference on, vol. 1, pp. 33–36, Mar 1992.
K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, “Mel-Generalized Cepstral Analysis,” in Proc. ICSLP-94, pp. 1043–1046, 1994.
K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, “Hidden Markov models based on multi-space probability distribution for pitch pattern modeling,” in Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on, vol. 1, pp. 229–232, Mar 1999.
S. Imai, K. Sumita, and C. Furuichi, “Mel log spectrum approximation (MLSA) filter for speech synthesis,” Electronics and Communications in Japan (Part I: Communications), vol. 66, no. 2, pp. 10–18, 1983.
J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, “Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis,” IEEE Trans. Audio, Speech, and Language Processing, vol. 17, pp. 1208–1230, Aug 2009.
M. J. Gales, The generation and use of regression class trees for MLLR adaptation. University of Cambridge, Department of Engineering, 1996.
A. Vasilijević and D. Petrinović, “Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing,” AUTOMATIKA: časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije, vol. 52, no. 2, pp. 132–146, 2011.
R. B. D'Agostino, W. Chase, and A. Belanger, “The Appropriateness of Some Common Procedures for Testing the Equality of Two Independent Binomial Populations,” The American Statistician, vol. 42, no. 3, pp. 198–202, 1988.
S. Martinčić-Ipšić, M. Pobar, and I. Ipšić, “Croatian large vocabulary automatic speech recognition,” AUTOMATIKA: časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije, vol. 52, no. 2, pp. 147–157, 2011.
