Automatika
Journal for Control, Measurement, Electronics, Computing and Communications
Volume 62, 2021 - Issue 2
Regular Paper

Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): an empirical case study for Turkish

Pages 226-238 | Received 04 May 2020, Accepted 21 Apr 2021, Published online: 05 May 2021
