Abstract
Machine learning (ML) methods tend to outperform traditional statistical models at prediction. In the prediction of academic achievement, however, ML models have not shown substantial improvement over logistic regression. So far, these results have focused almost entirely on college achievement, owing to the availability of administrative datasets, and have relied on relatively small samples by ML standards. In this article, we apply popular ML models to a large dataset (n = 1.2 million) containing primary and middle school performance on a standardized test given annually to Australian students. We show that, even in this large-n setting, ML models do not outperform logistic regression at detecting students who will perform in the “below standard” band of achievement upon sitting their next test.
Acknowledgments
Thanks must be given to the Australian Curriculum, Assessment and Reporting Authority for the provision of the data utilized by this study. The authors would like to thank Foivos Diakogiannis and Airong Zhang for their helpful comments and suggestions.
Notes
1 See Shingari, Kumar, and Khetan (2017) for a recent review.
2 See nap.edu.au
3 Samples of these tests may be found at https://www.nap.edu.au/naplan/the-tests.
4 nap.edu.au/results-and-reports/how-to-interpret/score-equivalence-tables
5 In addition to these classifiers, other common choices include k-nearest neighbors and support vector machines. However, the computation of these classifiers does not scale well with the number of predictors and the sample size, respectively, and so they were omitted from this study.
6 The mixing parameter, α, may also be tuned. However, this option is not included in the glmnet package. We re-estimated the classifiers with alternative values of α, with no change to the results.
7 We performed the analysis with the two previous NAPLAN achievement variables as numerical scores rather than dummies for “at standard” or “below standard” achievement with no change to the results.
8 Employment status of the parents is time varying, but is only collected at the time of enrolment.