Original Articles

Improving Performance Prediction on Education Data with Noise and Class Imbalance

References

  • Anyfantis, D., Karagiannopoulos, M., Kotsiantis, S., & Pintelas, P. (2007). Robustness of learning techniques in handling class noise in imbalanced datasets. In Artificial intelligence and innovations 2007: From theory to applications (IFIP International Federation for Information Processing, Vol. 247, pp. 21–28). Springer. doi:10.1007/978-0-387-74161-1
  • Batista, G.E.A.P.A., Prati, R.C., & Monard, M.C. (2004). A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explorations Newsletter, 6, 20–29. doi:10.1145/1007730
  • Berrar, D., & Flach, P. (2011). Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them). Briefings in Bioinformatics, 13, 83–97.
  • Blagus, R., & Lusa, L. (2013). SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics, 14(1), 1–16.
  • Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. doi:10.1023/A:1010933404324
  • Brodley, C.E., & Friedl, M.A. (1999). Identifying mislabelled training data. Journal of Artificial Intelligence Research, 11, 131–167.
  • Chawla, N.V., Bowyer, K.W., Hall, L.O., & Kegelmeyer, W.P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 341–378.
  • Cortez, P., & Silva, A. (2008). Using data mining to predict secondary school student performance. In Proceedings of 5th Annual Future Business Technology Conference (pp. 5–12). Porto, Portugal.
  • Dietterich, T.G. (2000). Ensemble Methods in Machine Learning. In Proceedings of the First International Workshop on Multiple Classifier Systems (pp. 1–15). London, UK.
  • Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874. doi:10.1016/j.patrec.2005.10.010
  • Gamberger, D., Boskovic, R., Lavrač, N., & Groselj, C. (1999). Experiments with noise filtering in a medical domain. In Proceedings of the 16th International Conference on Machine Learning (ICML) (pp. 143–151). San Francisco, CA.
  • García, V., Sánchez, J. S., Martín-Félez, R., & Mollineda, R. A. (2012). Surrounding neighborhood-based SMOTE for learning from imbalanced data sets. Progress in Artificial Intelligence, 1(4), 347–362.
  • Guo, X., Yin, Y., Dong, C., Yang, G., & Zhou, G. (2008). On the class imbalance problem. In Fourth International Conference on Natural Computation (Vol. 4, pp. 192–201). IEEE.
  • Han, H., Wang, W.Y., & Mao, B.H. (2005). Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In D.-S. Huang, X.P. Zhang, & G.-B. Huang (Eds.), Advances in intelligent computing (pp. 878–887). Springer Berlin Heidelberg. doi:10.1007/11538059
  • He, H., & Garcia, E. A. (2009). Learning from Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284.
  • Khoshgoftaar, T.M., & Rebours, P. (2007). Improving software quality prediction by noise filtering techniques. Journal of Computer Science and Technology, 22(3), 387–396. doi:10.1007/s11390-007-9054-2
  • Khoshgoftaar, T.M., Joshi, V., & Seliya, N. (2006). Detecting noisy instances with the ensemble filter: A study in software quality estimation. International Journal of Software Engineering and Knowledge Engineering, 16, 53–76. doi:10.1142/S0218194006002677
  • López, V., Fernández, A., García, S., Palade, V., & Herrera, F. (2013). An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Information Sciences, 250, 113–141. doi:10.1016/j.ins.2013.07.007
  • Márquez-Vera, C., Cano, A., Romero, C., & Ventura, S. (2013). Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Applied Intelligence, 38, 315–330. doi:10.1007/s10489-012-0374-8
  • Napierała, K., Stefanowski, J., & Wilk, S. (2010). Learning from imbalanced data in presence of noisy and borderline examples. In M. Szczuka, M. Kryszkiewicz, S. Ramanna, R. Jensen, & Q. Hu (Eds.), Rough sets and current trends in computing (pp. 158–167). Springer Berlin Heidelberg. doi:10.1007/978-3-642-13529-3
  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
  • Radwan, A.M., Birkan, B., Hania, F., & Cataltepe, Z. (2017). Active machine learning framework for teaching object recognition skills to children with autism. International Journal of Developmental Disabilities, 63, 158–169. doi:10.1080/20473869.2016.1190543
  • Sáez, J. A., Luengo, J., Stefanowski, J., & Herrera, F. (2014). Managing borderline and noisy examples in imbalanced classification by combining SMOTE with ensemble filtering. In E. Corchado, J. A. Lozano, H. Quintián, & H. Yin (Eds.), Intelligent Data Engineering and Automated Learning – IDEAL 2014 (pp. 61–68). Berlin, Heidelberg: Springer.
  • Sáez, J.A., Luengo, J., Stefanowski, J., & Herrera, F. (2015). SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Information Sciences, 291, 184–203. doi:10.1016/j.ins.2014.08.051
  • Satyanarayana, A., & Nuckowski, M. (2016). Data mining using ensemble classifiers for improved prediction of student academic performance. ASEE Mid-Atlantic Section Spring 2016 Conference, Washington, DC.
  • Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., & Napolitano, A. (2010). RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 40, 185–197. doi:10.1109/TSMCA.2009.2029559
  • Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., & Folleco, A. (2014). An empirical study of the classification performance of learners on imbalanced and noisy software quality data. Information Sciences, 259, 571–595. doi:10.1016/j.ins.2010.12.016
  • Sheng, V.S., & Ling, C.X. (2006). Thresholding for making classifiers cost-sensitive. In Proceedings of the 21st National Conference on Artificial Intelligence (pp. 476–481). Boston, MA.
  • Sluban, B., Gamberger, D., & Lavrač, N. (2014). Ensemble-based noise detection: Noise ranking and visual performance evaluation. Data Mining and Knowledge Discovery, 28, 265–303. doi:10.1007/s10618-012-0299-1
  • Thai-Nghe, N., Busche, A., & Schmidt-Thieme, L. (2009). Improving academic performance prediction by dealing with class imbalance. In Ninth International Conference on Intelligent Systems Design and Applications (ISDA 2009) (pp. 878–883). Pisa, Italy: IEEE Computer Society. doi:10.1109/ISDA.2009.15
  • Thai-Nghe, N., Gantner, Z., & Schmidt-Thieme, L. (2010). Cost-sensitive learning methods for imbalanced data. In The 2010 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). Barcelona, Spain. doi:10.1109/IJCNN.2010.5596486
  • Ting, K.M. (1998). Inducing cost-sensitive trees via instance weighting. In J.M. Żytkow & M. Quafafou (Eds.), Principles of data mining and knowledge discovery (pp. 139–147). Berlin, Heidelberg: Springer. doi:10.1007/BFb0094798
  • Tomek, I. (1976). Two modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6(11), 769–772.
  • Truong, Y., Lin, X., & Beecher, C. (2004). Learning a complex metabolomic data set using random forests and support vector machines. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 835–840). New York, NY: ACM.
  • Wu, X., & Zhu, X. (2008). Mining with noise knowledge: Error-aware data mining. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 38, 917–932. doi:10.1109/TSMCA.2008.923034
  • Yang, Q., & Wu, X. (2006). 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making, 5, 597–604. doi:10.1142/S0219622006002258
  • Zhu, X., & Wu, X. (2004). Class noise vs. attribute noise: A quantitative study. Artificial Intelligence Review, 22, 177–210. doi:10.1007/s10462-004-0751-8
  • Zhu, X., Wu, X., & Chen, Q. (2003). Eliminating class noise in large datasets. In Proceedings of the 20th ICML (pp. 920–927). Washington, DC.