Monte Carlo and Optimization Methods

Multiple Imputation Through XGBoost

Pages 352-363 | Received 01 Jun 2021, Accepted 30 Jul 2023, Published online: 19 Oct 2023

References

  • Awada, Z., Bouaoun, L., Nasr, R., Tfayli, A., Cuenin, C., Akika, R., Boustany, R.-M., Makoukji, J., Tamim, H., Zgheib, N. K., and Ghantous, A. (2021), “LINE-1 Methylation Mediates the Inverse Association Between Body Mass Index and Breast Cancer Risk: A Pilot Study in the Lebanese Population,” Environmental Research, 197, 111094. DOI: 10.1016/j.envres.2021.111094.
  • Baldi, P., Sadowski, P., and Whiteson, D. (2014), “Searching for Exotic Particles in High-Energy Physics with Deep Learning,” Nature Communications, 5, 4308. DOI: 10.1038/ncomms5308.
  • Brand, J. P., van Buuren, S., Groothuis-Oudshoorn, K., and Gelsema, E. S. (2003), “A Toolkit in SAS for the Evaluation of Multiple Imputation Methods,” Statistica Neerlandica, 57, 36–45. DOI: 10.1111/1467-9574.00219.
  • Breiman, L. (2001), “Random Forests,” Machine Learning, 45, 5–32. DOI: 10.1023/A:1010933404324.
  • Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984), Classification and Regression Trees, Belmont, CA: Wadsworth.
  • Breslow, N. E., and Chatterjee, N. (1999), “Design and Analysis of Two-Phase Studies with Binary Outcome Applied to Wilms Tumour Prognosis,” Journal of the Royal Statistical Society, Series C, 48, 457–468. DOI: 10.1111/1467-9876.00165.
  • Breslow, N. E., Lumley, T., Ballantyne, C. M., Chambless, L. E., and Kulich, M. (2009), “Using the Whole Cohort in the Analysis of Case-Cohort Data,” American Journal of Epidemiology, 169, 1398–1405. DOI: 10.1093/aje/kwp055.
  • Chen, T., and Guestrin, C. (2016), “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York: Association for Computing Machinery, pp. 785–794.
  • Chen, T., and Lumley, T. (2020), “Optimal Multiwave Sampling for Regression Modeling in Two-Phase Designs,” Statistics in Medicine, 39, 4912–4921. DOI: 10.1002/sim.8760.
  • D’Angio, G. J., Breslow, N., Beckwith, J. B., Evans, A., Baum, E., Delorimier, A., Fernbach, D., Hrabovsky, E., Jones, B., Kelalis, P. et al. (1989), “Treatment of Wilms’ Tumor. Results of the Third National Wilms’ Tumor Study,” Cancer, 64, 349–360. DOI: 10.1002/1097-0142(19890715)64:2<349::AID-CNCR2820640202>3.0.CO;2-Q.
  • Deng, Y. (2023), mixgb: Multiple Imputation Through XGBoost, R package version 1.0.1. DOI: 10.1080/10618600.2023.2252501.
  • Doove, L. L., van Buuren, S., and Dusseldorp, E. (2014), “Recursive Partitioning for Missing Data Imputation in the Presence of Interaction Effects,” Computational Statistics & Data Analysis, 72, 92–104. DOI: 10.1016/j.csda.2013.10.025.
  • Kaggle (2016), Allstate Claims Severity Dataset, available at https://www.kaggle.com/competitions/allstate-claims-severity/data.
  • Kelly, M., Longjohn, R., and Nottingham, K. (2023), The UCI Machine Learning Repository, available at https://archive.ics.uci.edu.
  • Kulich, M., and Lin, D. Y. (2004), “Improving the Efficiency of Relative-Risk Estimation in Case-Cohort Studies,” Journal of the American Statistical Association, 99, 832–844. DOI: 10.1198/016214504000000584.
  • Little, R. J. (1988), “Missing-Data Adjustments in Large Surveys,” Journal of Business & Economic Statistics, 6, 287–296. DOI: 10.2307/1391878.
  • Mersmann, O. (2021), microbenchmark: Accurate Timing Functions, R package version 1.4.9.
  • R Core Team (2022), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
  • Rubin, D. B. (1978), “Multiple Imputations in Sample Surveys–A Phenomenological Bayesian Approach to Nonresponse,” in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 20–34.
  • Rubin, D. B. (1986), “Statistical Matching Using File Concatenation with Adjusted Weights and Multiple Imputations,” Journal of Business & Economic Statistics, 4, 87–94.
  • Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, New York: Wiley.
  • Stekhoven, D. J., and Bühlmann, P. (2012), “MissForest–Non-Parametric Missing Value Imputation for Mixed-Type Data,” Bioinformatics, 28, 112–118. DOI: 10.1093/bioinformatics/btr597.
  • Su, Y. S., Gelman, A., Hill, J., and Yajima, M. (2011), “Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box,” Journal of Statistical Software, 45, 1–31. DOI: 10.18637/jss.v045.i02.
  • van Buuren, S. (2018), Flexible Imputation of Missing Data (2nd ed.), Boca Raton, FL: Chapman & Hall/CRC Press.
  • van Buuren, S., and Groothuis-Oudshoorn, K. (2011), “Mice: Multivariate Imputation by Chained Equations in R,” Journal of Statistical Software, 45, 1–67. DOI: 10.18637/jss.v045.i03.
  • Wendt, F. R., Pathak, G. A., Levey, D. F., Nuñez, Y. Z., Overstreet, C., Tyrrell, C., Adhikari, K., De Angelis, F., Tylee, D. S., Goswami, A., Krystal, J. H., Abdallah, C. G., Stein, M. B., Kranzler, H. R., Gelernter, J., and Polimanti, R. (2021), “Sex-Stratified Gene-by-Environment Genome-Wide Interaction Study of Trauma, Posttraumatic-Stress, and Suicidality,” Neurobiology of Stress, 14, 100309. DOI: 10.1016/j.ynstr.2021.100309.
  • Wickham, H. (2016), ggplot2: Elegant Graphics for Data Analysis, New York: Springer-Verlag.
  • Wright, M. N., and Ziegler, A. (2017), “Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R,” Journal of Statistical Software, 77, 1–17. DOI: 10.18637/jss.v077.i01.
  • Yeh, I.-C., and Lien, C.-H. (2009), “The Comparisons of Data Mining Techniques for the Predictive Accuracy of Probability of Default of Credit Card Clients,” Expert Systems with Applications, 36, 2473–2480. DOI: 10.1016/j.eswa.2007.12.020.
  • Zhang, X., Yan, C., Gao, C., Malin, B., and Chen, Y. (2019), “XGBoost Imputation for Time Series Data,” in 2019 IEEE International Conference on Healthcare Informatics (ICHI), pp. 1–3. DOI: 10.1109/ICHI.2019.8904666.