Automatika

Journal for Control, Measurement, Electronics, Computing and Communications

Volume 57, 2016 - Issue 1

Original scientific paper

Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages

Application of automatic cross-lingual acoustic modelling to HMM speech synthesis for under-resourced language databases

Tadej Justin, B.Sc., Laboratory of Artificial Perception, Systems and Cybernetics (LUKS), Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia

Prof. France Mihelič, Ph.D., Laboratory of Artificial Perception, Systems and Cybernetics (LUKS), Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia

Assoc. Prof. Janez Žibert, Ph.D., Faculty of Health Sciences, University of Ljubljana, Zdravstvena pot 5, SI-1000 Ljubljana, Slovenia

Abstract

Human-computer interaction (HCI) can now also be achieved through voice user interfaces (VUIs). To enable devices to communicate with humans by speech in the user's own language, low-cost language portability is often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is the acquisition of target-language speech data. Such data is then used in the development of VUI subsystems, especially speech-recognition and speech-production systems. A tempting way to bypass the lengthy data-acquisition process is to design and develop automatic algorithms that can extract acoustics similar to those of the target language from speech databases of other languages. This paper focuses on cross-lingual phoneme mapping between an under-resourced and a well-resourced language. It proposes a novel automatic phoneme-mapping technique adopted from the speaker-verification field. This phoneme mapping is then used in the development of an HMM-based speech-synthesis system for the under-resourced language. The synthesised utterances are evaluated subjectively and compared with an expert-knowledge cross-language method and with baseline speech synthesis built only from the under-resourced data. The results reveal that combining data from a well-resourced and an under-resourced language using the proposed phoneme-mapping technique can improve the quality of speech synthesis for the under-resourced language.

Nowadays, human-computer interaction (HCI) can also be realised through voice interfaces (VUIs). To enable devices and users to communicate by speech in the user's own language, a low-cost solution for translating speech into different languages is often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for applications that support VUIs is the collection of speech data for the target language. Such data is then used to develop VUI subsystems, particularly for speech recognition and speech production. A tempting idea for avoiding the lengthy data-collection process is to consider the design and development of automatic algorithms capable of deriving similar acoustic properties for the target language from existing databases of different languages. This paper focuses on mapping cross-lingual phonemes between under-resourced and well-resourced language databases. A new automatic phoneme-mapping technique is proposed, adopted and adapted from the field of speaker authentication. This phoneme mapping is then used to develop an HMM-based speech-synthesis system for lesser-known languages. The synthesised utterances are assessed subjectively by comparing expert-knowledge cross-lingual methods with speech synthesis built from the under-resourced language database. The results show that combining under-resourced and well-resourced language databases with the proposed phoneme-mapping technique can improve the quality of speech synthesis from the under-resourced language database.

Additional information

Notes on contributors

Tadej Justin

Tadej Justin was born in 1983 in Ljubljana, Slovenia. In 2009 he obtained his B.Sc. degree in electrical engineering from the University of Ljubljana, Faculty of Electrical Engineering. He currently works as a researcher at the University of Ljubljana, Faculty of Electrical Engineering. His research interests include statistical modeling, emotional speech synthesis, emotional speech recognition and cross-language speech technologies, with a special focus on the Slovenian language.

France Mihelič

France Mihelič studied at the Faculty of Natural Sciences, the Faculty of Economics and the Faculty of Electrical Engineering, all at the University of Ljubljana. There he received the B.Sc. degree in Technical Mathematics, the M.Sc. degree in Operational Research and the Ph.D. degree in Electrotechnical Sciences in 1976, 1979 and 1991, respectively. Since 1978 he has been a staff member at the Faculty of Electrical and Computer Engineering in Ljubljana, where he is a Full Professor and the Head of the Laboratory for Artificial Perception, Systems and Cybernetics. His research interests include pattern recognition, speech recognition and understanding, speech synthesis and signal processing.

Janez Žibert

Janez Žibert received his B.Sc. degree in mathematics in 1998 and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Ljubljana in 2001 and 2006, respectively. He is currently working as an Associate Professor at the Faculty of Health Sciences at the University of Ljubljana and as a Research Fellow at the ‘Andrej Marušič’ Institute at the University of Primorska. His research interests include statistical modeling, pattern recognition and machine learning in general, with a focus on audio-signal and image data processing.
