Automatika
Journal for Control, Measurement, Electronics, Computing and Communications
Volume 57, 2016 - Issue 1
Original scientific paper

Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages

Application of automatic cross-lingual acoustic modelling to HMM-based speech synthesis for under-resourced language databases

Tadej Justin, B.Sc., France Mihelič, Ph.D. & Janez Žibert, Ph.D.
Pages 268-281 | Received 28 Oct 2014, Accepted 04 May 2015, Published online: 20 Jan 2017

Abstract

Nowadays, human-computer interaction (HCI) can also be achieved through voice user interfaces (VUIs). To enable devices to communicate with humans by speech in the user's own language, low-cost language portability is often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is the acquisition of target-language speech data. Such data is then used to develop VUI subsystems, especially speech-recognition and speech-production systems. A tempting way to bypass this long data-acquisition process is to design and develop automatic algorithms that can extract similar target-language acoustic properties from speech databases of other languages. This paper focuses on cross-lingual phoneme mapping between an under-resourced and a well-resourced language. It proposes a novel automatic phoneme-mapping technique adopted from the speaker-verification field. This phoneme mapping is then used to develop an HMM-based speech-synthesis system for the under-resourced language. The synthesised utterances are evaluated subjectively and compared with an expert-knowledge-based cross-lingual method and with a baseline speech synthesis built only from the under-resourced data. The results reveal that combining data from the well-resourced and under-resourced languages, using the proposed phoneme-mapping technique, can improve the quality of under-resourced-language speech synthesis.

Nowadays, human-computer interaction (HCI) can also be achieved through voice user interfaces (VUIs). To enable devices and users to communicate by speech in the user's own language, low-cost solutions for porting speech applications to different languages are often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is collecting speech data for the target language. Such data is then used to develop VUI subsystems, especially for speech recognition and speech production. A tempting idea for avoiding the lengthy data-collection process is to design and develop automatic algorithms capable of deriving similar acoustic properties for the target language from existing databases of other languages. This paper focuses on cross-lingual phoneme mapping between under-resourced and well-resourced language databases. A new automatic phoneme-mapping technique is proposed, adopted and adapted from the field of speaker authentication. This phoneme mapping is then used to develop an HMM-based speech-synthesis system for less widely known languages. The synthesised utterances were evaluated subjectively, comparing cross-lingual methods based on expert language knowledge against speech synthesis built from the under-resourced database alone. The results show that combining under-resourced and well-resourced language databases with the proposed phoneme-mapping technique can improve the quality of speech synthesis from the under-resourced database.
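The abstract does not spell out how the speaker-verification technique is turned into a phoneme mapping. Purely as an illustration, the sketch below shows one way speaker-verification-style scoring could drive such a mapping: each well-resourced phoneme is modelled by a single diagonal-covariance Gaussian (a simplified stand-in for the GMMs common in speaker verification), and each under-resourced phoneme is mapped to the well-resourced phoneme whose model gives its frames the highest average log-likelihood. The function names, the single-Gaussian models, the variance floor and the toy two-dimensional features are all assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A feature frame, e.g. one MFCC vector (assumed input representation).
Frame = Sequence[float]
# Diagonal-covariance Gaussian: (per-dimension means, per-dimension variances).
Gaussian = Tuple[List[float], List[float]]


def fit_diag_gaussian(frames: List[Frame], var_floor: float = 1e-3) -> Gaussian:
    """Maximum-likelihood diagonal Gaussian for one phoneme's frames.

    The variance floor (an assumed value) keeps near-constant dimensions
    from producing zero variance.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def avg_log_likelihood(frames: List[Frame], model: Gaussian) -> float:
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, means, variances):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def map_phonemes(
    source: Dict[str, List[Frame]], target: Dict[str, List[Frame]]
) -> Dict[str, str]:
    """Map each under-resourced (source) phoneme to the well-resourced (target)
    phoneme whose model scores the source phoneme's frames highest."""
    models = {p: fit_diag_gaussian(frames) for p, frames in target.items()}
    mapping: Dict[str, str] = {}
    for phoneme, frames in source.items():
        # max() evaluates the key immediately, so binding `frames` here is safe.
        mapping[phoneme] = max(
            models, key=lambda q: avg_log_likelihood(frames, models[q])
        )
    return mapping


if __name__ == "__main__":
    # Hypothetical toy 2-D "features"; a real system would use e.g. MFCC frames.
    target = {
        "a": [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]],
        "i": [[5.0, 5.0], [5.2, 4.8], [4.8, 5.2]],
    }
    source = {
        "A": [[1.1, 0.9], [0.9, 1.1]],
        "I": [[4.9, 5.1], [5.1, 4.9]],
    }
    print(map_phonemes(source, target))  # {'A': 'a', 'I': 'i'}
```

In GMM-UBM speaker verification the score is usually a log-likelihood ratio against a universal background model. For a fixed set of source frames that background term is identical for every candidate target phoneme, so it would not change which phoneme wins, and the sketch omits it. Multi-component GMMs in place of single Gaussians would be the more usual modelling choice.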

Additional information

Notes on contributors

Tadej Justin

Tadej Justin was born in 1983 in Ljubljana, Slovenia. In 2009 he obtained his B.Sc. degree in electrical engineering from the University of Ljubljana, Faculty of Electrical Engineering. He currently works as a researcher at the University of Ljubljana, Faculty of Electrical Engineering. His research interests include statistical modelling, emotional speech synthesis, emotional speech recognition and cross-language speech technologies, with a special focus on the Slovenian language.

France Mihelič

France Mihelič studied at the Faculty of Natural Sciences, the Faculty of Economics and the Faculty of Electrical Engineering, all at the University of Ljubljana. There he received the B.Sc. degree in Technical Mathematics, the M.Sc. degree in Operational Research and the Ph.D. degree in Electrotechnical Sciences in 1976, 1979 and 1991, respectively. Since 1978 he has been a staff member at the Faculty of Electrical and Computer Engineering in Ljubljana, where he is a Full Professor and the Head of the Laboratory for Artificial Perception, Systems and Cybernetics. His research interests include pattern recognition, speech recognition and understanding, speech synthesis and signal processing.

Janez Žibert

Janez Žibert received his B.Sc. degree in mathematics in 1998 and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Ljubljana in 2001 and 2006, respectively. He is currently an Associate Professor at the Faculty of Health Sciences at the University of Ljubljana and a Research Fellow at the ‘Andrej Marušič’ Institute at the University of Primorska. His research interests include statistical modelling, pattern recognition and machine learning in general, with a focus on audio-signal and image-data processing.
