Research Article

Learning Bilingual Word Embedding Mappings with Similar Words in Related Languages Using GAN

Article: 2019885 | Received 14 Jan 2021, Accepted 09 Dec 2021, Published online: 08 Feb 2022
 

ABSTRACT

Cross-lingual word embeddings represent words from different languages in a shared vector space. They enable reasoning about semantics and comparing word meanings across languages and in multilingual contexts, which is necessary for bilingual lexicon induction, machine translation, and cross-lingual information retrieval. This paper proposes an efficient approach to learning a bilingual mapping between the monolingual word embeddings of a language pair. We chose ten languages from three different language families and downloaded their latest Wikipedia dumps.1

1. https://dumps.wikimedia.org.
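The abstract does not specify the pre-processing steps; a minimal sketch of the kind of cleanup typically applied to raw Wikipedia text before word2vec training (lowercasing, punctuation stripping, whitespace tokenization) might look like the following. The function name and rules are illustrative assumptions, not the authors' exact pipeline.

```python
import re

def preprocess(text):
    """Toy cleanup of raw text before word2vec training.

    Illustrative assumptions: lowercase everything, replace characters
    that are neither word characters nor whitespace, then split on
    whitespace. A real pipeline would also strip wiki markup and handle
    language-specific punctuation.
    """
    text = text.lower()
    # keep word characters, incl. non-ASCII letters (e.g. Turkish ş, ı)
    text = re.sub(r"[^\w\s]", " ", text)
    return [tok for tok in text.split() if tok]

# one tokenized sentence per line, as word2vec-style trainers expect
sentences = [preprocess(line) for line in [
    "Dil, iletişim aracıdır!",              # sample Turkish line
    "Language is a means of communication.",
]]
```

Each language's corpus, tokenized this way, can then be fed to a word2vec implementation to produce the monolingual embeddings.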

Then, after some pre-processing steps and using word2vec, we produce word embeddings for each language. We select seven language pairs from the chosen languages. Since the selected languages are related, they share thousands of identically spelled words with similar meanings. Using these identically spelled words and the word embedding model of each language, we create training, validation, and test sets for the language pairs. We then use a generative adversarial network (GAN) to learn the mapping between the word embeddings of the source and target languages. The average accuracy of our proposed method across all language pairs is 71.34%. The highest accuracy, 78.32%, is achieved for the Turkish-Azerbaijani language pair, which is noticeably higher than prior methods.
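The abstract gives no training details, but the general shape of a GAN-based embedding mapping can be sketched under simplifying assumptions: a linear generator W maps source embeddings toward the target space, while a logistic-regression discriminator tries to tell mapped source vectors from real target vectors. The dimensions, learning rate, step count, and hand-derived gradients below are illustrative choices, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    # clip the logit to avoid overflow in exp for this toy example
    return 1.0 / (1.0 + np.exp(-np.clip(a, -30.0, 30.0)))

def train_gan_mapping(X, Z, dim, steps=200, lr=0.05):
    """Adversarially learn a linear map W so that X @ W resembles Z.

    Generator: the matrix W. Discriminator: logistic regression (w, b)
    scoring "real target" (label 1) vs "mapped source" (label 0).
    Gradients are written out by hand for this illustrative sketch.
    """
    W = np.eye(dim)                # generator: start from the identity map
    w = rng.normal(0.0, 0.1, dim)  # discriminator weights
    b = 0.0
    n = len(X)
    for _ in range(steps):
        fake = X @ W
        # discriminator ascent on log D(Z) + log(1 - D(fake))
        p_real = sigmoid(Z @ w + b)
        p_fake = sigmoid(fake @ w + b)
        w += lr * (Z.T @ (1.0 - p_real) - fake.T @ p_fake) / n
        b += lr * (np.sum(1.0 - p_real) - np.sum(p_fake)) / n
        # generator ascent on log D(X @ W): update W to fool the discriminator
        p_fake = sigmoid((X @ W) @ w + b)
        W += lr * np.outer(X.T @ (1.0 - p_fake), w) / n
    return W

# toy "embeddings": source and target distributions differ in scale
X = rng.normal(size=(100, 4))
Z = 2.0 * rng.normal(size=(100, 4))
W = train_gan_mapping(X, Z, dim=4)
```

In practice, the training pairs come from the identically spelled words described above: their source-language vectors play the role of X and their target-language vectors the role of Z, and the learned W is evaluated on the held-out test set.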

Disclosure Statement

No potential conflict of interest was reported by the author(s).
