Research Article

Intelligent Grouping Method of Science and Technology Projects Based on Data Augmentation and SMOTE

Article: 2145637 | Received 03 Jul 2022, Accepted 04 Nov 2022, Published online: 15 Nov 2022
