ORIGINAL RESEARCH

A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records

Pages 329-343 | Received 19 Aug 2023, Accepted 09 Apr 2024, Published online: 23 May 2024

References

  • Desai RJ, Matheny ME, Johnson K, et al. Broadening the reach of the FDA Sentinel system: a roadmap for integrating electronic health record data in a causal analysis framework. NPJ Digit Med. 2021;4(1):170. doi:10.1038/s41746-021-00542-0
  • Little RJ, Rubin DB. Statistical Analysis with Missing Data. Vol. 793. John Wiley & Sons; 2019.
  • Heymans MW, Twisk JWR. Handling missing data in clinical research. J Clin Epidemiol. 2022;151:185–188. doi:10.1016/j.jclinepi.2022.08.016
  • Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–592. doi:10.1093/biomet/63.3.581
  • Lee KJ, Tilling KM, Cornish RP, et al. Framework for the treatment and reporting of missing data in observational studies: the treatment and reporting of missing data in observational studies framework. J Clin Epidemiol. 2021;134:79–88. doi:10.1016/j.jclinepi.2021.01.008
  • Carpenter JR, Smuk M. Missing data: a statistical framework for practice. Biom J. 2021;63(5):915–947. doi:10.1002/bimj.202000196
  • Sondhi A, Weberpals J, Yerram P, et al. A systematic approach towards missing lab data in electronic health records: a case study in non-small cell lung cancer and multiple myeloma. CPT Pharmacomet Syst Pharmacol. 2023;12:1201–1212. doi:10.1002/psp4.12998
  • Lee KJ, Carlin JB, Simpson JA, Moreno-Betancur M. Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification. Int J Epidemiol. 2023;dyad008. doi:10.1093/ije/dyad008
  • Choi J, Dekkers OM, le Cessie S. A comparison of different methods to handle missing data in the context of propensity score analysis. Eur J Epidemiol. 2019;34(1):23–36. doi:10.1007/s10654-018-0447-z
  • Madley-Dowd P, Hughes R, Tilling K, Heron J. The proportion of missing data should not be used to guide decisions on multiple imputation. J Clin Epidemiol. 2019;110:63–73. doi:10.1016/j.jclinepi.2019.02.016
  • Franklin JM, Schneeweiss S, Polinski JM, Rassen JA. Plasmode simulation for the evaluation of pharmacoepidemiologic methods in complex healthcare databases. Comput Stat Data Anal. 2014;72:219–226. doi:10.1016/j.csda.2013.10.018
  • Nalichowski R, Keogh D, Chueh HC, Murphy SN. Calculating the benefits of a research patient data repository. AMIA Annu Symp Proc. 2006;2006:1044.
  • Huitfeldt A, Stensrud MJ, Suzuki E. On the collapsibility of measures of effect in the counterfactual causal framework. Emerg Themes Epidemiol. 2019;16(1):1. doi:10.1186/s12982-018-0083-9
  • Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med. 2005;24(11):1713–1723. doi:10.1002/sim.2059
  • Gagne JJ, Glynn RJ, Avorn J, Levin R, Schneeweiss S. A combined comorbidity score predicted mortality in elderly patients better than existing scores. J Clin Epidemiol. 2011;64(7):749–759. doi:10.1016/j.jclinepi.2010.10.004
  • Stekhoven DJ, Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112–118. doi:10.1093/bioinformatics/btr597
  • van Buuren S, Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in R. J Stat Softw. 2011;45:1–67. doi:10.18637/jss.v045.i03
  • Schouten RM, Lugtig P, Vink G. Generating missing values for simulation purposes: a multivariate amputation procedure. J Stat Comput Simul. 2018;88(15):2909–2930. doi:10.1080/00949655.2018.1491577
  • Mohan K, Pearl J. Graphical models for processing missing data. J Am Stat Assoc. 2021;116(534):1023–1037. doi:10.1080/01621459.2021.1874961
  • Moreno-Betancur M, Lee KJ, Leacy FP, White IR, Simpson JA, Carlin JB. Canonical causal diagrams to guide the treatment of missing data in epidemiologic studies. Am J Epidemiol. 2018;187(12):2705–2715. doi:10.1093/aje/kwy173
  • Hotelling H. The generalization of Student’s ratio. Ann Math Stat. 1931;2(3):360–378. doi:10.1214/aoms/1177732979
  • Little RJA. A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc. 1988;83(404):1198–1202. doi:10.1080/01621459.1988.10478722
  • Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res. 2011;46(3):399–424. doi:10.1080/00273171.2011.568786
  • Schober P, Vetter TR. Correct baseline comparisons in a randomized trial. Anesth Analg. 2019;129(3):639. doi:10.1213/ANE.0000000000004211
  • Ruddle RA, Adnan M, Hall M. Using set visualisation to find and explain patterns of missing values: a case study with NHS hospital episode statistics data. BMJ Open. 2022;12(11):e064887. doi:10.1136/bmjopen-2022-064887
  • Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–2102. doi:10.1002/sim.8086
  • Gasparini A. rsimsum: summarise results from Monte Carlo simulation studies. J Open Source Softw. 2018;3(26):739. doi:10.21105/joss.00739
  • Patel UD, Hardy NC, Smith DH, et al. Validation of acute kidney injury cases in the Mini-Sentinel distributed database. 2013.
  • Weinstein RB, Ryan PB, Berlin JA, Schuemie MJ, Swerdel J, Fife D. Channeling bias in the analysis of risk of myocardial infarction, stroke, gastrointestinal bleeding, and acute renal failure with the use of paracetamol compared with ibuprofen. Drug Saf. 2020;43(9):927–942. doi:10.1007/s40264-020-00950-3
  • Gounden V, Bhatt H, Jialal I. Renal function tests. In: StatPearls. StatPearls Publishing; 2023. Available from: http://www.ncbi.nlm.nih.gov/books/NBK507821/. Accessed June 1, 2023.
  • Van Buuren S. Flexible Imputation of Missing Data. CRC Press; 2018.
  • Carroll OU, Morris TP, Keogh RH. How are missing data in covariates handled in observational time-to-event studies in oncology? A systematic review. BMC Med Res Methodol. 2020;20(1):134. doi:10.1186/s12874-020-01018-7
  • Getz K, Hubbard RA, Linn KA. Performance of multiple imputation using modern machine learning methods in electronic health records data. Epidemiology. 2023;34(2):206–215. doi:10.1097/EDE.0000000000001578
  • Vader DT, Mamtani R, Li Y, Griffith SD, Calip GS, Hubbard RA. Inverse probability of treatment weighting and confounder missingness in electronic health record-based analyses: a comparison of approaches using plasmode simulation. Epidemiology. 2023;34:520–530. doi:10.1097/EDE.0000000000001618
  • Toh S, Rodríguez LAG, Hernán MA. Analyzing partially missing confounder information in comparative effectiveness and safety research of therapeutics. Pharmacoepidemiol Drug Saf. 2012;21(Suppl 2):13–20. doi:10.1002/pds.3248
  • Seaman SR, White IR. Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res. 2013;22(3):278–295. doi:10.1177/0962280210395740
  • Sun B, Tchetgen Tchetgen EJ. On inverse probability weighting for nonmonotone missing at random data. J Am Stat Assoc. 2018;113(521):369–379. doi:10.1080/01621459.2016.1256814
  • Mustillo S, Kwon S. Auxiliary variables in multiple imputation when data are missing not at random. J Math Sociol. 2015;39(2):73–91. doi:10.1080/0022250X.2013.877898
  • Leacy FP. Multiple imputation under missing not at random assumptions via fully conditional specification [PhD thesis]. 2018.
  • Tompsett DM, Leacy F, Moreno-Betancur M, Heron J, White IR. On the use of the not-at-random fully conditional specification (NARFCS) procedure in practice. Stat Med. 2018;37(15):2338–2353. doi:10.1002/sim.7643
  • Weberpals J, Raman SR, Shaw PA, et al. smdi: an R package to perform structural missing data investigations on partially observed confounders in real-world evidence studies. JAMIA Open. 2024;7(1):ooae008. doi:10.1093/jamiaopen/ooae008