Abstract
Nowadays, human-computer interaction (HCI) can also be achieved with voice user interfaces (VUIs). To enable devices to communicate with humans by speech in the user's own language, low-cost language portability is often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is the acquisition of target-language speech data. Such data is then used in the development of VUI subsystems, especially speech-recognition and speech-production systems. A tempting way to bypass the lengthy process of data acquisition is to design and develop automatic algorithms that can extract acoustically similar target-language data from speech databases of other languages. This paper focuses on cross-lingual phoneme mapping between an under-resourced and a well-resourced language. It proposes a novel automatic phoneme-mapping technique that is adopted from the speaker-verification field. The phoneme mapping is then used in the development of an HMM-based speech-synthesis system for the under-resourced language. The synthesised utterances are assessed in a subjective evaluation and compared with those of an expert-knowledge cross-language method and of a baseline speech-synthesis system built only from the under-resourced data. The results reveal that combining data from the well-resourced and under-resourced languages by means of the proposed phoneme-mapping technique can improve the quality of speech synthesis for the under-resourced language.
Nowadays, human-computer interaction (HCI) can also be realised through voice user interfaces (VUIs). To enable devices and users to communicate by speech in the user's own language, a low-cost solution for carrying speech over to different languages is often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is the collection of speech data for the target language. Such data is then used to develop VUI subsystems, particularly for speech recognition and speech production. A tempting idea for avoiding the lengthy data-collection process is to consider the design and development of automatic algorithms capable of deriving similar acoustic properties for the target language from existing databases of different languages. This paper focuses on mapping cross-lingual phonemes between scarce and rich language databases. A new automatic phoneme-mapping technique is proposed, adopted and adapted from the field of speaker authentication. This phoneme mapping is subsequently used to develop an HMM-based speech-synthesis system for lesser-known languages. The synthesised utterances were evaluated subjectively by comparing cross-language methods based on expert language knowledge against speech synthesis built from the scarce language database. The results show that combining the scarce and rich language databases using the proposed phoneme-mapping technique can improve the quality of speech synthesis from the scarce language database.
Additional information
Notes on contributors
Tadej Justin
Tadej Justin was born in 1983 in Ljubljana, Slovenia. In 2009 he obtained his B.Sc. degree in electrical engineering from the Faculty of Electrical Engineering, University of Ljubljana. He currently works as a researcher at the same faculty. His research interests include statistical modeling, emotional speech synthesis, emotional speech recognition and cross-language speech technologies, with a special focus on the Slovenian language.
France Mihelič
France Mihelič studied at the Faculty of Natural Sciences, the Faculty of Economics and the Faculty of Electrical Engineering, all at the University of Ljubljana. There he received the B.Sc. degree in Technical Mathematics, the M.Sc. degree in Operational Research and the Ph.D. degree in Electrotechnical Sciences in 1976, 1979 and 1991, respectively. Since 1978 he has been a staff member at the Faculty of Electrical and Computer Engineering in Ljubljana, where he is a Full Professor and the Head of the Laboratory for Artificial Perception, Systems and Cybernetics. His research interests include pattern recognition, speech recognition and understanding, speech synthesis and signal processing.
Janez Žibert
Janez Žibert received his B.Sc. degree in mathematics in 1998 and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Ljubljana in 2001 and 2006, respectively. He is currently working as an Associate Professor at the Faculty of Health Sciences at the University of Ljubljana and as a Research Fellow at the ‘Andrej Marušič’ Institute at the University of Primorska. His research interests include statistical modeling, pattern recognition and machine learning in general, with a focus on audio-signal and image-data processing.