Research Article

Enhanced Model for Predicting Student Dropouts in Developing Countries Using Automated Machine Learning Approach: A Case of Tanzanian’s Secondary Schools

Article: 2071406 | Received 15 Feb 2022, Accepted 25 Apr 2022, Published online: 07 May 2022

References

  • Aggarwal, C. C., A. Hinneburg, and D. A. Keim (2001). On the surprising behavior of distance metrics in high dimensional space. Database Theory ICDT 2001 8th International Conference, January 4-6, London, UK, 433–451. 10.1007/3-540-44503-x_27
  • Agrapetidou, A., P. Charonyktakis, P. Gogas, T. Papadimitriou, and I. Tsamardinos. 2021. An AutoML application to forecasting bank failures. Applied Economics Letters 28 (1):5–9. doi:10.1080/13504851.2020.1725230.
  • Aguiar, E. 2015. Identifying Students at Risk and Beyond: A Machine Learning Approach. ProQuest LLC, Ann Arbor, Michigan, U.S: University of Notre Dame. Issue July.
  • Aissaoui, O. El, Madani, Y. E. A. El, Oughdir, L., Dakkak, A., & Allioui, Y. El. (2020). A Multiple Linear Regression-Based Approach to Predict Student Performance. International Conference on Advanced Intelligent Systems for Sustainable Development, 9–23. https://doi.org/10.1007/978-3-030-36653-7_2
  • Azad, M., I. Chikalov, S. Hussain, and M. Moshkov. 2021. Entropy-Based Greedy Algorithm for Decision Trees Using Hypotheses. Journal of Entropy 23 (808):1–8. doi:10.3390/e23070808.
  • Berens, J., K. Schneider, S. Gortz, S. Oster, and J. Burghoff. 2018. Early Detection of Students at Risk: Predicting Student Dropouts Using Administrative Student Data and Machine Learning Methods. Schumpeter School of Business and Economics 11 (3): 1–41. https://doi.org/10.5281/zenodo.3594771
  • Bergstra, J., and Y. Bengio. 2012. Random Search for Hyper-parameter Optimization. Journal of Machine Learning Research 13:281–305.
  • Bibi, T. 2018. Factors Affecting Dropout Rate at Secondary School Level in Private Schools of Punjab, Pakistan. International Journal of Management Sciences and Business Research 7 (4):1–7.
  • Breiman, L. 2001. Random Forests. Machine Learning 45 (1): 5–32.
  • Bridgeland, J. M., J. J. Dilulio, and K. B. Morison (2006). The Silent Epidemic: Perspectives of High School Dropouts. https://eric.ed.gov/?id=ED513444
  • Chareonrat, J. 2016. Student Dropout Factor Analysis and Trend Prediction using Decision Tree. Journal of Science and Technology 23 (2):187–93.
  • Emmanuel, T., T. Maupong, D. Mpoeleng, T. Semong, B. Mphago, and O. Tabona. 2021. A survey on Missing Data in Machine Learning. Journal of Big Data 8 (1). Springer International Publishing. doi:10.1186/s40537-021-00516-9.
  • Faruk, B. U. 2015. Assessment of Primary and Secondary Schools Education in Katsina State. International Journal of Strategic Research in Education, Technology and Humanities 2 (2):13–27.
  • Feurer, M., A. Klein, K. Eggensperger, J. T. Springenberg, M. Blum, and F. Hutter. 2015. Efficient and robust automated machine learning. NIPS'15: Proceedings of the 28th International Conference on Neural Information Processing Systems, December 7-12, vol. 2, 2755–2763. Cambridge, MA, United States: MIT Press. doi:10.5555/2969442.2969547.
  • Gada, M., Z. Haria, A. Mankad, K. Damania, and S. Sankhe (2021). Automated Feature Engineering and Hyperparameter optimization for Machine Learning. 2021 7th International Conference on Advanced Computing and Communication Systems, ICACCS 2021, March 19-20, Coimbatore, India, vol. 1, 981–86. Piscataway, New Jersey: IEEE. 10.1109/ICACCS51430.2021.9441668
  • Gil, J. S., A. J. P. Delima, and R. N. Vilchez. 2020. Predicting Students’ Dropout Indicators in Public School using Data Mining Approaches. International Journal of Advanced Trends in Computer Science and Engineering 9 (1):774–78. doi:10.30534/ijatcse/2020/110912020.
  • Guyon, I., and A. Elisseeff. 2003. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3:1157–82.
  • He, X., K. Zhao, and X. Chu. 2021. AutoML: A survey of the state-of-the-art. Knowledge-Based Systems 212:106622. doi:10.1016/j.knosys.2020.106622.
  • HRW. (2017). I Had a Dream to Finish School. Barriers to Secondary Education in Tanzania: Human Rights Watch (HRW). https://www.hrw.org/report/2017/02/14-had-dream-finish-school/barriers-secndary-education-tanzania
  • Hutagaol, N. S. 2019. Predictive modelling of student dropout using ensemble classifier method in higher education. Advances in Science, Technology and Engineering Systems 4 (4):206–11. doi:10.25046/aj040425.
  • Iam-On, N., and T. Boongoen. 2017. Generating Descriptive Model for Student Dropout: A Review of Clustering Approach. Human-Centric Computing and Information Sciences 7 (1):1–24. doi:10.1186/s13673-016-0083-0.
  • Jawthari, M., and V. Stoffová. 2021. Predicting Students’ Academic Performance using a Modified kNN algorithm. Pollack Periodica 16 (3):20–26. doi:10.1556/606.2021.00374.
  • Kemper, L., G. Vorhoff, and B. U. Wigger. 2020. Predicting Student Dropout: A Machine Learning Approach. European Journal of Higher Education 10 (1):28–47. doi:10.1080/21568235.2020.1718520.
  • Kumar, M., A. J. Singh, and D. Handa. 2017. Literature Survey on Educational Dropout Prediction. International Journal of Education and Management Engineering 7 (2):8–19. doi:10.5815/ijeme.2017.02.02.
  • Lee, S., and J. Y. Chung. 2019. The Machine Learning-Based Dropout Early Warning System for Improving the Performance of Dropout Prediction. Applied Sciences 9 (15):3093. doi:10.3390/app9153093.
  • Liashchynskyi, P., and P. Liashchynskyi. 2019. Grid Search, Random Search, Genetic Algorithm: A Big Comparison for NAS. vol. 2017. 1–11. http://arxiv.org/abs/1912.06059
  • Liu, H., and R. Setiono (1995). Chi2: Feature Selection and Discretization of Numeric Attributes. In Proceedings of the IEEE 7th International Conference on Tools with Artificial Intelligence, November 5-8, Herndon, Virginia, USA (pp. 388–391). 10.1109/tai.1995.479783
  • Márquez-Vera, C., A. Cano, C. Romero, A. Y. M. Noaman, H. Mousa Fardoun, and S. Ventura. 2016. Early Dropout Prediction using Data Mining: A case Study with High School Students. Expert Systems 33 (1):107–24. doi:10.1111/exsy.12135.
  • Mduma, N., K. Kalegele, and D. Machuve. 2019. Machine Learning Approach for Reducing Students Dropout Rates. International Journal of Advanced Computer Research 9 (42):156–69. doi:10.19101/ijacr.2018.839045.
  • Mirza, T., and M. M. Hassan. 2020. Prediction of School Drop outs with the help of Machine Learning Algorithms. GIS Science Journal 7 (7):253–63.
  • Muhajir, D., M. Akbar, A. Bagaskara, and R. Vinarti. 2022. Improving classification algorithm on education dataset using hyperparameter tuning. Procedia Computer Science 197:538–44. doi:10.1016/j.procs.2021.12.171.
  • Nagarajah, T., and G. Poravi (2019). A Review on Automated Machine Learning (AutoML) Systems. 2019 IEEE 5th International Conference for Convergence in Technology, I2CT 2019, Hamburg, Germany, 1–6. 10.1109/I2CT45611.2019.9033810
  • Nnamoko, N. A., F. N. Arshad, D. England, J. Vora, and J. Norman (2014). Evaluation of Filter and Wrapper Methods for Feature Selection in Supervised Machine Learning. PGNET Proceedings of the 15th Annual Postgraduate Symposium on the Convergence of Telecommunications, Networking and Broadcasting, Liverpool, United Kingdom.
  • Nurhayati, Putra, A. E., Wardhani, L. K., & Busman. (2019). Chi-Square Feature Selection Effect on Naive Bayes Classifier Algorithm Performance for Sentiment Analysis Document. 2019 7th International Conference on Cyber and IT Service Management, CITSM 2019, November. https://doi.org/10.1109/CITSM47753.2019.8965332
  • Page, M. J., J. E. McKenzie, P. M. Bossuyt, I. Boutron, T. C. Hoffmann, C. D. Mulrow, L. Shamseer, and J. M. Tetzlaff. 2021. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. Journal of Clinical Epidemiology 74 (9):790–99.
  • PO-RALG. 2019. Pre-Primary, Primary, Adult and Non Formal Education Statistics. Dodoma, Tanzania: President's Office Regional Administration and Local Government.
  • PO-RALG. 2020. Pre-Primary, Primary, Adult and Non Formal Education Statistics. Dodoma, Tanzania: President's Office Regional Administration and Local Government.
  • Probst, P., A. L. Boulesteix, and B. Bischl. 2019. Tunability: Importance of hyperparameters of machine learning algorithms. Journal of Machine Learning Research 20:1–32.
  • Rezaie, M. G., H. S. Zadeh, H. Ying, and M. Dong. 2010. Selection-Fusion Approach for Classification of Datasets with Missing Values. Pattern Recognition 43 (6):1–27. doi:10.1016/j.patcog.2009.12.003.
  • Rovira, S., E. Puertas, and L. Igual. 2017. Data-Driven System to Predict Academic Grades and Dropout. PLoS ONE 12 (2):1–21. doi:10.1371/journal.pone.0171207.
  • Said, H. 2020. Developing Dropout Predictive System for Secondary Schools, By Using Classification Algorithm: A Case Study of Tabora Region. Dodoma City: University of Dodoma.
  • Sansone, D. 2019. Beyond Early Warning Indicators: High School Dropout and Machine Learning. Oxford Bulletin of Economics and Statistics 81 (2):456–85. doi:10.1111/obes.12277.
  • Sara, N. B., R. Halland, C. Igel, and S. Alstrup (2015). High-School Dropout Prediction using Machine Learning: A Danish Large-scale Study. ESANN 2015 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, April 22-24, Bruges, Belgium.
  • Schaer, R., H. Müller, and A. Depeursinge. 2016. Optimized distributed hyperparameter search and simulation for lung texture classification in CT using Hadoop. Journal of Imaging 2 (2):19. doi:10.3390/jimaging2020019.
  • Statista. 2022. Pupils out of lower secondary school by gender and region. Brahms Kontor, Hamburg. https://www.statista.com.
  • Tangirala, S. 2020. Evaluating the impact of GINI index and information gain on classification using decision tree classifier algorithm. International Journal of Advanced Computer Science and Applications 11 (2):612–19. doi:10.14569/ijacsa.2020.0110277.
  • Tuggener, L., M. Amirian, K. Rombach, S. Lorwald, A. Varlet, C. Westermann, and T. Stadelmann (2019). Automated Machine Learning in Practice: State of the Art and Recent Results. Proceedings - 6th Swiss Conference on Data Science, SDS 2019, Los Alamitos, California, 31–36. 10.1109/SDS.2019.00-11
  • URT. 2008. The United Republic of Tanzania: Education Sector Development Programme 2008-2017. Dar es Salaam, Tanzania: Ministry of Education, Science and Technology.
  • Vaccaro, L., G. Sansonetti, and A. Micarelli. 2021. An empirical review of automated machine learning. Journal of Computers 10 (1):1–27. doi:10.3390/computers10010011.
  • Venkatesh, B., and J. Anuradha. 2019. A Review of Feature Selection and its Methods. Cybernetics and Information Technologies 19 (1):3–26. doi:10.2478/CAIT-2019-0001.
  • Verleysen, M., and D. François. 2005. Computational Intelligence and Bioinspired Systems, 8th International Work-Conference on Artificial Neural Networks, IWANN 2005, June 8-10, Vilanova i la Geltrú, Barcelona, Spain, 758–770.
  • Vujović, Ž. 2021. Classification Model Evaluation Metrics. International Journal of Advanced Computer Science and Applications 12 (6):599–606. doi:10.14569/IJACSA.2021.0120670.
  • Wen, L., X. Ye, and L. Gao. 2020. A new Automatic Machine Learning based Hyperparameter Optimization for Workpiece Quality Prediction. Measurement and Control (United Kingdom) 53 (7–8):1088–98. doi:10.1177/0020294020932347.
  • Whaley, D. L. (2005). The Interquartile Range: Theory and Estimation. In Electronic Theses and Dissertations. http://dc.etsu.edu/etd
  • Witte, K., S. Cabus, G. Thyssen, W. Groot, and H. M. Van Den Brink. 2013. A critical review of the literature on school dropout. Educational Research Review 10:13–28. doi:10.1016/j.edurev.2013.05.002.
  • World Bank. (2014). How Tanzania Plans to Achieve “Big Results Now” in Education. http://www.worldbank.org/en/news/feature/2014/07/10/how-tanzania-plans-to-achieve-big-reforms-now-in-education
  • Wu, J., X. Y. Chen, H. Zhang, L. D. Xiong, H. Lei, and S. H. Deng. 2019. Hyperparameter optimization for machine learning models based on Bayesian optimization. Journal of Electronic Science and Technology 17 (1):26–40. doi:10.11989/JEST.1674-862X.80904120.
  • Yang, L., and A. Shami. 2020. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 415:295–316. doi:10.1016/j.neucom.2020.07.061.
  • Zahedi, L., F. G. Mohammadi, S. Rezapour, M. W. Ohland, and M. H. Amini (2021). Search Algorithms for Automated Hyper-Parameter Tuning. 1–10. http://arxiv.org/abs/2104.14677
  • Zaman, M., S. Kaul, and M. Ahmed. 2020. Analytical Comparison between the Information Gain and Gini index using Historical Geographical Data. International Journal of Advanced Computer Science and Applications 11 (5):429–40. doi:10.14569/IJACSA.2020.0110557.
  • Zeineddine, H., U. Braendle, and A. Farah. 2021. Enhancing prediction of student success: Automated machine learning approach. Computers and Electrical Engineering 89 (November):106903. doi:10.1016/j.compeleceng.2020.106903.
  • Zhao, X., K. Liu, W. Fan, L. Jiang, X. Zhao, M. Yin, and Y. Fu (2020). Simplifying Reinforced Feature Selection via Restructured Choice Strategy of Single Agent. Proceedings - IEEE International Conference on Data Mining, ICDM, 2020-Novem Sorrento, Italy, 871–80. 10.1109/ICDM50108.2020.00096
  • Zou, K. H., A. J. O’Malley, and L. Mauri. 2007. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation 115 (5):654–57. doi:10.1161/CIRCULATIONAHA.105.594929.