COMPUTER SCIENCE

Development of the information system for the Kazakh language preprocessing

Article: 1896418 | Received 25 Oct 2019, Accepted 31 Jan 2021, Published online: 10 Mar 2021

References

  • Assylbekov, Z., Washington, J., & Tyers, F. (2016). A free/open-source hybrid morphological disambiguation tool for Kazakh. The First International Conference on Turkic Computational Linguistics, 18–15.
  • Assylbekov, Z., Washington, J. N., Tyers, F., Nurkas, A., Sundetova, A., Karibayeva, A., Abduali, B., & Amirova, D. (2016). A free/open-source hybrid morphological disambiguation tool for Kazakh. Proceedings of TurCLing.
  • Bashkir poetry corpus. (2019). Retrieved July 7, 2019, from http://web-corpora.net/bashcorpus/search/?interface_language=ru
  • Bekmanova, G., & Sharipbay, A. A. (2017). Uniform Morphological Analyzer for the Kazakh and Turkish Languages. Proceedings of the Sixth International Conference on Analysis of Images, Social Networks and Texts, Moscow, Russia, 20–30.
  • Bhardwaj, N. D. (2016). Comparative Study of CouchDB and MongoDB – NoSQL Document Oriented Databases. International Journal of Computer Applications, 136(3), 975–8887.
  • Caplan, A. (1955). An experimental study of ambiguity and context. Mechanical Translation, 2(2), 39–46.
  • Constant, M., Eryiğit, G., Monti, J., Van Der Plas, L., Ramisch, C., & Rosner, M. (2017). Multiword Expression Processing: A Survey. Computational Linguistics, 43(4), 837–892.
  • Corpus of Written Tatar language. (2019). Retrieved July 7, 2019, from http://corpus.tatar/
  • Eryiğit, G., Eryiğit, C., Karabüklü, S., Kelepir, M., Özkul, A., Pamay, T., Torunoğlu-Selamet, D., & Köse, H. (2019). Building the first comprehensive machine-readable Turkish sign language resource: Methods, challenges and solutions. Language Resources and Evaluation, 54, 97–121.
  • Eryiğit, G., & Torunoğlu-Selamet, D. (2017). Social media text normalization for Turkish. Natural Language Engineering, 1–41.
  • Gataullin, R. R., & Gil’mullin, R. A. (2016). Contextual rules for resolving morphological polysemy in the Tatar corpus. Open Semantic Technologies for Intelligent Systems OSTIS-2016, Minsk, 389–392.
  • Gataullin, R. R. (2016). Analytical review of methods for resolving morphological ambiguity. Russian Digital Libraries Journal, 19(2), 98–114.
  • Hakimov, B. J., Gil’mullin, R. A., & Gataullin, R. R. (2014). Resolution of grammatical polysemy in the Tatar corpus. Uchenye zapiski Kazanskogo universiteta [Scientific notes of Kazan University]. Humanities Series, 156(5), 236–244.
  • Han, J., Haihong, E., Le, G., & Du, J. (2011). Survey on NoSQL database. 6th International Conference on Pervasive Computing and Applications (ICPCA), IEEE, 363–366.
  • Koibagarov, K., Amirgaliyev, Y., & Musabayev, T. (2013). Software implementation of recognition of Kazakh speech commands based on the Markov model. Proceedings of the 9th International Asian School-Seminar “Problems of the optimization of complex systems”, Almaty, Kazakhstan, 12–17.
  • Koibagarov, K., Musabayev, R., & Kalimoldayev, M. (2014). Development of a linguistic processor of texts in the Kazakh language. Journal of Problems of the Informatics, 24(3), 64–72.
  • Kudo, T., & Richardson, J. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium, 66–71.
  • Kuriyozov, E., Doval, Y., & Gómez-Rodríguez, C. (2020). Cross-Lingual Word Embeddings for Turkic Languages. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 4047–4055.
  • Makhambetov, O., Makazhanov, A., Yessenbayev, Z., Matkarimov, B., Sabyrgaliyev, I., & Sharafudinov, A. (2013). Assembling the Kazakh Language Corpus. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 1022–1031.
  • Mansurova, M., Koybagarov, K., Barakhnin, V., Soltangeldinova, M., & Berdibekov, S. (2016). Application of the morphological analyzer of the Kazakh language for the automated filling of the ontology of the factographic search system. Bulletin of the Kyrgyz State Technical University, 38(2), 61–66.
  • Mansurova, M., Madiyeva, G., Aubakirov, S., Yermekov, Z., & Alimzhanov, Y. (2017). Design and Development of Media-Corpus of the Kazakh Language. Computational Collective Intelligence Technologies and Applications: ICCCI 2017, Nicosia, Cyprus, 509–518.
  • Mansurova, M., Madiyeva, G., Kadyrbek, N., & Yermekov, Z. (2019). Design and development of preprocessing tools for media-corpus of the Kazakh language. Proceedings of the 9th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, May 17–19, Poznań, Poland, 25–31.
  • Myrzakhmetov, B., & Kozhirbayev, Zh. (2018). Extended Language Modeling Experiments for Kazakh. International Workshop on Computational Models in Language and Speech, 2303, 35–43.
  • National corpus of the Russian language. (2019). Retrieved July 7, 2019, from http://ruscorpora.ru/corpora-intro.html
  • Nevzorova, O., Mukhamedshin, D., & Gataullin, R. (2017). Developing Corpus Management System: Architecture of System and Database. Int’l Conf. Information and Knowledge Engineering, Las Vegas, Nevada, United States, 108–112.
  • Petrovic, D., & Stankovic, M. (2019). The influence of text preprocessing methods and tools on calculating text similarity. Facta Universitatis, 34(5), 973–994. https://doi.org/10.22190/FUMI1905973D
  • Pokorný, J. (2016). How to Store and Process Big Data: Are Today’s Databases Sufficient? 13th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Ho Chi Minh City, Vietnam, 5–10.
  • Said, D. A., Wanas, N. M., Darwish, N. M., & Hegazy, N. H. A. (2009). Study of Text Preprocessing Tools for Arabic Text Categorization. The Second International Conference on Arabic Language, Cairo, Egypt, 230–236.
  • Sak, H., Gungor, T., & Saraclar, M. (2008). Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus. GoTAL: International Conference on Natural Language Processing, Gothenburg, Sweden, Springer-Verlag Berlin Heidelberg, 417–427.
  • Sulubacak, U., & Eryiğit, G. (2018). Implementing universal dependency, morphology and multiword expression annotation standards for Turkish language processing. Turkish Journal of Electrical Engineering & Computer Sciences, 1–23. https://doi.org/10.3906/elk-1706-81
  • Tunali, V., & Bilgin, T. T. (2012). PRETO: A high-performance text mining tool for preprocessing Turkish texts. CompSysTech ’12: Proceedings of the 13th International Conference on Computer Systems and Technologies, Bulgaria, 134–140.
  • Turganbayeva, A., & Tukeyev, U. (2020). The Solution of the Problem of Unknown Words Under Neural Machine Translation of the Kazakh Language. Communications in Computer and Information Science, 1178, 319–328.
  • Turkish National Corpus. (2019). Retrieved July 7, 2019, from http://www.tnc.org.tr/index.php/en/