Research Article

Learning Bilingual Word Embedding Mappings with Similar Words in Related Languages Using GAN

Article: 2019885 | Received 14 Jan 2021, Accepted 09 Dec 2021, Published online: 08 Feb 2022
