Research Article

Semi-Supervised Self-Training of Hate and Offensive Speech from Social Media

Pages 1621-1645 | Received 06 Mar 2021, Accepted 27 Sep 2021, Published online: 25 Oct 2021

References

  • Albadi, N., M. Kurdi, and S. Mishra. 2018. Are they our brothers? Analysis and detection of religious hate speech in the Arabic twittersphere. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 69–76, Barcelona, Spain.
  • Alsafari, S., S. Sadaoui, and M. Mouhoub. 2020a. Deep learning ensembles for hate speech detection. In 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Virtual Conference.
  • Alsafari, S., S. Sadaoui, and M. Mouhoub. 2020b. Effect of word embedding models on hate and offensive speech detection. arXiv preprint.
  • Alsafari, S., S. Sadaoui, and M. Mouhoub. 2020c. Hate and offensive speech detection on Arabic social media. Online Social Networks and Media 19:100096. doi:10.1016/j.osnem.2020.100096.
  • Antoun, W., F. Baly, and H. Hajj. 2020. AraBERT: Transformer-based model for Arabic language understanding.
  • Bhojanapalli, S., K. Wilber, A. Veit, A. S. Rawat, S. Kim, A. Menon, and S. Kumar. 2021. On the reproducibility of neural network predictions.
  • Borrajo, M., R. Romero, and E. Iglesias. 2015. A linear-RBF multikernel SVM to classify big text corpora. BioMed Research International 2015:1–14.
  • Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Elshaar, S., and S. Sadaoui. 2020. Detecting bidding fraud using a few labeled data. In 12th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, 17–25, Valletta, Malta.
  • Georgakopoulos, S. V., S. K. Tasoulis, A. G. Vrahatis, and V. P. Plagianakos. 2018. Convolutional neural networks for toxic comment classification. In Proceedings of the 10th Hellenic Conference on Artificial Intelligence, Patras, Greece.
  • Graves, A., A. Mohamed, and G. Hinton. 2013. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May, 6645–49.
  • He, J., J. Gu, J. Shen, and M. Ranzato. 2020. Revisiting self-training for neural sequence generation.
  • Hughes, M., I. Li, S. Kotoulas, and T. Suzumura. 2017. Medical text classification using convolutional neural networks. In Studies in Health Technology and Informatics 235:246–50, IOS Press, Amsterdam, Netherlands.
  • Kahn, J., A. Lee, and A. Hannun. 2020. Self-training for end-to-end speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7084–88, Barcelona, Spain.
  • Lee, J. Y., and F. Dernoncourt. 2016. Sequential short-text classification with recurrent and convolutional neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 515–20, San Diego, CA, USA.
  • Li, J., Q. Zhu, Q. Wu, and D. Cheng. 2020. An effective framework based on local cores for self-labeled semi-supervised classification. Knowledge-Based Systems 197:105804. doi:10.1016/j.knosys.2020.105804.
  • Li, Y.-F., and D.-M. Liang. 2019. Safe semi-supervised learning: A brief introduction. Frontiers of Computer Science, Springer 13 (4):669–76. doi:10.1007/s11704-019-8452-2.
  • Liu, Y., J.-W. Bi, and Z.-P. Fan. 2017. A method for multi-class sentiment classification based on an improved one-vs-one (OVO) strategy and the support vector machine (SVM) algorithm. Information Sciences, Elsevier 394:38–52. doi:10.1016/j.ins.2017.02.016.
  • Madhyastha, P., and R. Jain. 2019. On model stability as a function of random seed. 929–39.
  • Mikolov, T., I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems 26, 3111–3119, Red Hook, NY, USA.
  • Mubarak, H., A. Rashed, K. Darwish, Y. Samih, and A. Abdelali. 2020. Arabic offensive language on Twitter: Analysis and experiments.
  • Mubarak, H., and K. Darwish. 2017. Abusive language detection on Arabic social media. In Proceedings of the First Workshop on Abusive Language Online, 52–56. Association for Computational Linguistics, Vancouver, BC, Canada.
  • Mulki, H., H. Haddad, C. B. Ali, and H. Alshabani. 2019. L-HSAB: A Levantine Twitter dataset for hate speech and abusive language. In Proceedings of the Third Workshop on Abusive Language Online, 111–18. Association for Computational Linguistics, Florence, Italy.
  • Nartey, O. T., G. Yang, J. Wu, and S. K. Asare. 2020. Semi-supervised learning for fine-grained classification with self-training. IEEE Access 8:2109–21. doi:10.1109/ACCESS.2019.2962258.
  • Ousidhoum, N., Z. Lin, H. Zhang, Y. Song, and D.-Y. Yeung. 2019. Multilingual and multi-aspect hate speech analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 4675–84, Hong Kong, China.
  • Ousidhoum, N., Y. Song, and D.-Y. Yeung. 2020. Comparative evaluation of label-agnostic selection bias in multilingual hate speech datasets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 2532–42. Association for Computational Linguistics.
  • Rosenthal, S., P. Atanasova, G. Karadzhov, M. Zampieri, and P. Nakov. 2020. A large-scale semi-supervised dataset for offensive language identification.
  • Sanh, V., L. Debut, J. Chaumond, and T. Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.
  • van Engelen, J. E., and H. Hoos. 2019. A survey on semi-supervised learning. Machine Learning, Springer 109 (2):373–440. doi:10.1007/s10994-019-05855-6.
  • Viegas, J. L., N. M. Cepeda, and S. M. Vieira. 2018. Electricity fraud detection using committee semi-supervised learning. In 2018 International Joint Conference on Neural Networks (IJCNN), 1–6, Rio de Janeiro, Brazil.
  • Wang, J., Y. Yang, J. Mao, Z. Huang, C. Huang, and W. Xu. 2016. CNN-RNN: A unified framework for multi-label image classification. 2285–94.
  • Wu, D., M. Shang, X. Luo, J. Xu, H. Yan, W. Deng, and G. Wang. 2018. Self-training semi-supervised classification based on density peaks of data. Neurocomputing 275 (C):180–91. doi:10.1016/j.neucom.2017.05.072.
  • Xia, M., A. Anastasopoulos, R. Xu, Y. Yang, and G. Neubig. 2020. Predicting performance for natural language processing tasks.
  • Xie, Q., M.-T. Luong, E. Hovy, and Q. V. Le. 2020. Self-training with noisy student improves imagenet classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June, Seattle, WA, USA.
  • Xu, Q., A. Baevski, T. Likhomanenko, P. Tomasello, A. Conneau, R. Collobert, G. Synnaeve, and M. Auli. 2020. Self-training and pre-training are complementary for speech recognition.
  • Zampieri, M., P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, and Ç. Çöltekin. 2020. SemEval-2020 Task 12: Multilingual offensive language identification in social media (OffensEval 2020), 1425–47, Barcelona, Spain.
  • Zampieri, M., S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, and R. Kumar. 2019. Predicting the type and target of offensive posts in social media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 1415–20, Minneapolis, Minnesota.
  • Zhu, X., and A. Goldberg. 2009. Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 3:1–130. Morgan and Claypool, San Rafael, California, USA.
  • Zoph, B., G. Ghiasi, T.-Y. Lin, Y. Cui, H. Liu, E. D. Cubuk, and Q. V. Le. 2020. Rethinking pre-training and self-training.
