Original Articles

Machine learning classifiers do not improve the prediction of academic risk: Evidence from Australia


Abstract

Machine learning (ML) methods tend to outperform traditional statistical models at prediction. In the prediction of academic achievement, however, ML models have not shown substantial improvement over logistic regression. So far, these studies have focused almost entirely on college achievement, owing to the availability of administrative datasets, and have used samples that are relatively small by ML standards. In this article, we apply popular ML models to a large dataset (n = 1.2 million) of primary and middle school results on a standardized test given annually to Australian students. We show that ML models do not outperform logistic regression at detecting students who will perform in the “below standard” band of achievement when they sit their next test, even in a large-n setting.
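The comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, uses simulated data rather than NAPLAN data, and picks a random forest as one example of a flexible ML classifier; the predictor names, coefficients, and the use of test-set AUC as the comparison metric are all hypothetical choices made for illustration, not the study's actual specification.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def compare_classifiers(n=5000, seed=0):
    """Fit logistic regression and a random forest on the same simulated data
    and return their held-out AUCs. The data are synthetic, not NAPLAN data."""
    rng = np.random.default_rng(seed)
    # Hypothetical predictors: a standardized prior test score and two binary covariates.
    prior_score = rng.normal(size=n)
    x2 = rng.integers(0, 2, size=n)
    x3 = rng.integers(0, 2, size=n)
    X = np.column_stack([prior_score, x2, x3])
    # Simulated outcome: the chance of scoring "below standard" falls as the prior score rises.
    logit = -2.0 - 1.5 * prior_score + 0.5 * x2 - 0.3 * x3
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(
            n_estimators=200, min_samples_leaf=20, random_state=seed
        ),
    }
    aucs = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # Rank held-out students by the predicted probability of the positive class.
        aucs[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return aucs


if __name__ == "__main__":
    for name, auc in compare_classifiers().items():
        print(f"{name}: AUC = {auc:.3f}")
```

Because the simulated outcome is generated by a logistic model, this toy setup favors logistic regression by construction; it shows the mechanics of a like-for-like comparison on a common held-out split, not evidence for the paper's finding.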

Acknowledgments

We thank the Australian Curriculum, Assessment and Reporting Authority for providing the data used in this study. The authors would also like to thank Foivos Diakogiannis and Airong Zhang for their helpful comments and suggestions.

Notes

1 See Shingari, Kumar, and Khetan (2017) for a recent review.

2 See nap.edu.au

3 Samples of these tests may be found at https://www.nap.edu.au/naplan/the-tests.

4 nap.edu.au/results-and-reports/how-to-interpret/score-equivalence-tables

5 In addition to these classifiers, other common choices include k-nearest neighbors and support vector machines. However, their computation scales poorly with the number of predictors and the sample size, respectively, so they were omitted from this study.

6 The mixing parameter, α, may also be tuned. However, the cross-validation routine in the glmnet package tunes only the penalty strength for a fixed α and does not search over α. We re-estimated the classifiers with α = 0.1 and α = 0.9 with no change to the results.
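For reference, the role of α can be seen in the penalized objective that the glmnet documentation gives for a binary (binomial-family) outcome; it is shown here on the assumption that the study fit its elastic net classifier with that family. Here y_i ∈ {0, 1} is the outcome for student i, x_i the predictor vector, and λ the penalty strength chosen by cross-validation:

```latex
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\,(\beta_0 + x_i^{\top}\beta) - \log\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
  + \lambda\Big[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
```

Setting α = 1 gives the lasso and α = 0 gives ridge regression, so the re-estimates at α = 0.1 and α = 0.9 probe a mostly ridge-like and a mostly lasso-like penalty, respectively.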

7 We also performed the analysis with the two previous NAPLAN achievement variables entered as numerical scores rather than as dummies for “at standard” or “below standard” achievement; the results did not change.
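The two specifications in this note can be sketched as follows. The function names and cut scores are hypothetical placeholders, not official NAPLAN band boundaries (which vary by domain and year level), and treating each boundary as inclusive is an assumption.

```python
from typing import Dict


def dummy_encoding(score: float, below_max: float, at_max: float) -> Dict[str, int]:
    """Encode a prior score as two indicators, with "above standard" as the
    omitted reference category. below_max and at_max are hypothetical cut scores."""
    if below_max >= at_max:
        raise ValueError("below_max must be less than at_max")
    return {
        "below_standard": int(score <= below_max),
        "at_standard": int(below_max < score <= at_max),
    }


def numeric_encoding(score: float) -> Dict[str, float]:
    """Alternative specification: keep the scale score as one numeric predictor."""
    return {"score": score}


# Usage with made-up cut scores of 380 and 430:
print(dummy_encoding(350, 380, 430))  # {'below_standard': 1, 'at_standard': 0}
print(numeric_encoding(350))          # {'score': 350}
```

The dummy version discards variation within each band, while the numeric version keeps it; the note reports that the choice between them did not affect the results.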

8 Parental employment status is time-varying, but it is collected only at the time of enrolment.

Additional information

Funding

This research is supported by an Australian Government Research Training Program (RTP) Scholarship.
