Research Article

Intelligent Grouping Method of Science and Technology Projects Based on Data Augmentation and SMOTE

Article: 2145637 | Received 03 Jul 2022, Accepted 04 Nov 2022, Published online: 15 Nov 2022
