Research Article

Optimal feature selection and invasive weed tunicate swarm algorithm-based hierarchical attention network for text classification

Article: 2231171 | Received 24 Jan 2023, Accepted 24 Jun 2023, Published online: 10 Jul 2023