Research Article

Unsupervised Imputation of Non-Ignorably Missing Data Using Importance-Weighted Autoencoders

Received 19 Oct 2023, Accepted 11 Jun 2024, Published online: 15 Jul 2024
 

Abstract

Deep Learning (DL) methods have dramatically increased in popularity in recent years. While their initial success was demonstrated in the classification and manipulation of image data, there has been significant growth in the application of DL methods to problems in the biomedical sciences. However, the greater prevalence and complexity of missing data in biomedical datasets present significant challenges for DL methods. Here, we provide a formal treatment of missing data in the context of Variational Autoencoders (VAEs), a popular unsupervised DL architecture commonly used for dimension reduction, imputation, and learning latent representations of complex data. We propose a new VAE architecture, NIMIWAE, that is one of the first to flexibly account for both ignorable and non-ignorable patterns of missingness in input features at training time. Following training, samples drawn from the approximate posterior distribution of the missing data can be used for multiple imputation, facilitating downstream analyses on high-dimensional incomplete datasets. We demonstrate through statistical simulation that our method outperforms existing approaches for unsupervised learning tasks and imputation accuracy. We conclude with a case study of an electronic health record (EHR) dataset pertaining to 12,000 ICU patients containing a large number of diagnostic measurements and clinical outcomes, where many features are only partially observed.
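
To make the idea of non-ignorable missingness and importance-weighted imputation concrete, the following is a minimal toy sketch and not the NIMIWAE algorithm itself. It assumes a single standard-normal feature whose probability of being missing increases with its own value through a logistic function; the function name `impute_mnar`, the `slope` parameter, and the logistic mechanism are all hypothetical choices made for illustration. Proposals are drawn from the model that ignores missingness, reweighted by the probability of being missing, and resampled to give several imputations.

```python
import numpy as np


def sigmoid(t):
    """Logistic function, used here as a toy missingness mechanism."""
    return 1.0 / (1.0 + np.exp(-t))


def impute_mnar(n_proposals=5000, n_imputations=5, slope=2.0, seed=0):
    """Toy sampling-importance-resampling imputation for one missing value.

    Assumptions (illustrative only, not taken from the article):
      - the feature follows a standard normal distribution;
      - P(missing | x) = sigmoid(slope * x), so larger values are more
        likely to be missing (a non-ignorable mechanism).
    """
    rng = np.random.default_rng(seed)

    # Proposal: the data model that ignores the missingness mechanism.
    proposals = rng.standard_normal(n_proposals)

    # Importance weights: p(x | missing) is proportional to
    # p(x) * P(missing | x), and the proposal is p(x), so the
    # unnormalized weight is P(missing | x).
    weights = sigmoid(slope * proposals)
    weights = weights / weights.sum()  # self-normalize

    # Resample proposals according to their weights to obtain
    # several imputations of the same missing entry.
    draws = rng.choice(proposals, size=n_imputations, replace=True, p=weights)

    # Self-normalized estimate of E[x | missing].
    weighted_mean = float(np.sum(weights * proposals))
    return draws, weighted_mean


draws, weighted_mean = impute_mnar()
print("imputations:", np.round(draws, 3))
print("estimated mean of the missing value:", round(weighted_mean, 3))
```

Under an ignorable mechanism the imputed mean in this toy setting would be close to zero; the weighting shifts it upward because large values are assumed more likely to be missing.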

Supplementary Materials

Supplementary Materials: Contains details of the NIMIWAE algorithm in Section A, additional simulation results and computational details in Section B, and links to and details of the analyzed datasets in Section C. (pdf)

R-package for NIMIWAE: R-package NIMIWAE containing code to perform the diagnostic methods described in the article. The package can also be found at https://www.github.com/DavidKLim/NIMIWAE. (GNU zipped tar file)

Code for Reproducibility: Repository of code to reproduce all results, tables, and figures in the article. This repository can also be found at https://www.github.com/DavidKLim/NIMIWAE_Paper. (GNU zipped tar file)

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

The authors gratefully acknowledge NIH grants U01-CA274298, P50-CA257911, P50-CA058223, T32-CA106209, 1R01AA02687901A1, and 1OT2OD032581-02-321, and NSF grants IIS2133595 and DMS2324394 for funding this research.
