ABSTRACT
Student dropout is a major concern in studies of retention strategies in higher education. This study identifies which variables are important for predicting student dropout, using academic data from 3,583 first-year students in the Business Administration (BA) degree at the University of Barcelona (Spain). The results indicate that two variables have significant predictive power: the percentage of subjects failed and the percentage of subjects not attended in the first semester. To assess the robustness of these results, they were corroborated with an additional sample of 10,784 students from three degree programs (Law, BA, and Economics) at the Complutense University of Madrid (Spain). Three different algorithms were used: neural networks, random forest, and logit. For the neural networks, the NeuralSens methodology was employed; it is based on sensitivities (partial derivatives of the network output with respect to each input), which makes the network interpretable. The outcomes are highly consistent across all cases: both a simple model (logit) and more sophisticated ones (neural networks and random forest) achieve high accuracy (the share of all outcomes predicted correctly) and sensitivity (the share of actual dropouts predicted correctly). On the test set, average accuracy and sensitivity of 77% and 69%, respectively, were achieved. Notably, the models were developed using only academic data from the university itself. They therefore do not depend on other personal or organizational variables, which are often difficult to access.
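The two evaluation metrics reported above can be made concrete with a short calculation. The sketch below is illustrative only and is not the authors' code: it assumes binary labels coded 1 = dropout and 0 = continuing, and the function name `accuracy_and_sensitivity` is hypothetical.

```python
from typing import Sequence


def accuracy_and_sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (accuracy, sensitivity) for binary labels.

    Assumption (hypothetical coding): 1 = dropout, 0 = continuing student.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    # Accuracy: share of all students whose outcome is predicted correctly.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    # Sensitivity: share of actual dropouts that the model flags as dropouts.
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    if actual_pos == 0:
        raise ValueError("sensitivity is undefined when there are no actual dropouts")

    return correct / len(y_true), true_pos / actual_pos


if __name__ == "__main__":
    # Toy data, not from the study: 8 students, 4 of whom actually dropped out.
    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 0, 0, 0, 1, 0, 1]
    print(accuracy_and_sensitivity(actual, predicted))  # (0.625, 0.5)
```

In this toy example, 5 of 8 outcomes are predicted correctly (accuracy 0.625), but only 2 of the 4 actual dropouts are caught (sensitivity 0.5). This illustrates why the two metrics are reported separately: a model can look accurate overall while still missing many of the students it is meant to flag.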
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1 Other studies achieve higher accuracy but rely on harder-to-obtain data. For example, Lizarte Simón and Gijón Puerta (2022), using a sample of students from the Early Childhood, Primary, and Social Education and Pedagogy degree programs, report an accuracy of 91% with predictors derived from a survey that evaluates various academic dimensions. Once again, however, such a model requires access to variables that are difficult to obtain.