Statistical Practice

Missing Data Imputation with High-Dimensional Data

Pages 240-252 | Received 03 Feb 2022, Accepted 19 Aug 2023, Published online: 17 Nov 2023
