Early-predicting dropout of university students: an application of innovative multilevel machine learning and statistical techniques

Pages 1935-1956 | Published online: 22 Dec 2021
 

ABSTRACT

This paper combines a theory-based model with a data-driven approach to develop an Early Warning System that detects students who are more likely to drop out. The model uses innovative multilevel statistical and machine learning methods. The paper demonstrates the validity of the approach by applying it to administrative data from a leading Italian university.

Acknowledgments

This research stems from an institutional initiative launched by Politecnico di Milano under the label ‘Data Analytics for Institutional Support’, whose broad aim is to leverage the university’s available (administrative) datasets to analyze many aspects of academic life and to support better decision-making. We are grateful to the University’s management for their support and encouragement, and to the IT Office of Politecnico di Milano for their support in extracting and pre-processing the data. Any remaining errors are solely our responsibility.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 An important caveat applies here. Dropout is a net waste of resources when students leave university altogether, but sometimes they leave to switch major or university. In the latter case, the waste is borne only by the university they leave, not by society as a whole. The argument therefore remains valid, although its application depends on the specific definition of dropout. In this paper, we take the viewpoint of the single university involved (see the section about Methodology and data).

2 It is worth recalling that the relevance of the covariates and the threshold values of the splits are identified automatically by the tree, given certain input parameters.
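To illustrate how a tree can identify a split threshold automatically, the following is a minimal sketch of a single-split search that minimises weighted Gini impurity. It is not the authors' implementation: the function names `gini` and `best_split`, the input parameter `min_samples_leaf`, and the toy data (credits earned against a 0/1 dropout label) are assumptions introduced only for illustration.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(x, y, min_samples_leaf=1):
    """Return (weighted impurity, threshold) of the best split on x.

    min_samples_leaf (hypothetical input parameter, must be >= 1) is the
    smallest number of observations allowed on each side of the split.
    Returns None when no admissible split exists.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = None
    # i is the number of observations sent to the left branch.
    for i in range(min_samples_leaf, n - min_samples_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between two identical covariate values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        if best is None or score < best[0]:
            best = (score, threshold)
    return best


# Toy data (illustrative only): credits earned and a 0/1 dropout label.
credits = [0, 5, 10, 20, 25, 30]
dropout = [1, 1, 1, 0, 0, 0]
print(best_split(credits, dropout))
```

On this toy sample the search places the threshold midway between 10 and 20 credits, where both branches are pure. Raising `min_samples_leaf` restricts the admissible thresholds, which is one way in which input parameters shape the splits a tree selects.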

3 We chose this threshold because the third semester after the enrolment represents the deadline for students to enrol in the second academic year.

4 In the early dropout analyses, late dropout students are excluded from the sample, and vice versa.

5 Tables in Annexes A1 and A2 report detailed results of Models 1a, 1b and 1c, for early and late dropout, respectively. The association between student-level covariates and the response remains coherent across the models.

6 We are aware that some students may not take any attempts because they have already decided to drop out, which creates a potential endogeneity issue in studying the phenomenon. To check the robustness of our results and to rule out this potential confounding factor, we re-ran our linear models for predicting early dropout, excluding from the sample those students who did not take any attempts in the first semester. Results, reported in , confirm that the student characteristics associated with the dropout probability, together with the models’ predictive performance, remain largely unchanged (AUC indexes are slightly lower when zero-attempt students are excluded).

7 The technical and mathematical details about the computation of degree courses’ effects are reported in Pinheiro and Bates (2006) and Pellagatti et al. (2021).

8 We provide mean and interquartile range for numerical variables and percentage for categorical variables.
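As a minimal sketch of the summaries described in this note, the snippet below computes the mean and interquartile range of a numerical variable and the percentage of each level of a categorical variable. The function names and the sample values are hypothetical, and the choice of the "inclusive" quantile method is an assumption, since the note does not state how quartiles were computed.

```python
from collections import Counter
from statistics import mean, quantiles


def summarize_numeric(values):
    """Mean and interquartile range (first and third quartile)."""
    # Assumption: quartiles computed with the "inclusive" method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"mean": mean(values), "iqr": (q1, q3)}


def summarize_categorical(values):
    """Percentage of observations at each level."""
    total = len(values)
    return {level: 100.0 * n / total for level, n in Counter(values).items()}


# Hypothetical values, for illustration only.
print(summarize_numeric([1, 2, 3, 4, 5]))
print(summarize_categorical(["M", "F", "M", "M"]))
```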
