Abstract
The widespread adoption of online platforms in day-to-day life is increasingly contributing to the rise and escalation of online aggression. Consequently, there is a need for a robust mechanism that can automatically recognize aggressive online content to foster a safer and more respectful online environment. In this work, we explore four well-known Transformer-based language models (BERT, RoBERTa, ALBERT, and XLNet) and, on top of them, employ three ensemble techniques (Logistic Regression, Random Forest, and XGBoost) applied to the predictions obtained from the standalone Transformer models, with the aim of advancing automated cyber aggression identification systems. We identified noise and a few outright erroneous labels in TRAC-1, the popular shared-task aggression dataset, and developed an improved dataset, ReLbTRAC-1, by resolving the identified issues. The proposed models are evaluated on the improved TRAC-1 dataset and on the TRAC-2 dataset. They outperform the state-of-the-art methods on TRAC-1 and most existing approaches on TRAC-2, as well as all individual Transformer-based classifiers used in the proposed ensembles, on all metrics considered, achieving accuracies of 0.6305 and 0.7550 and weighted F1-scores of 0.6518 and 0.7512 on TRAC-1 and TRAC-2, respectively.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
The data that support the findings of this study are available from the corresponding author, [SC], upon reasonable request.