Abstract
This article reviews recent advances in missing data research using graphical models to represent multivariate dependencies. We first examine the limitations of traditional frameworks from three different perspectives: transparency, estimability, and testability. We then show how procedures based on graphical models can overcome these limitations and provide meaningful performance guarantees even when data are missing not at random (MNAR). In particular, we identify conditions that guarantee consistent estimation in broad categories of missing data problems, and derive procedures for implementing this estimation. Finally, we derive testable implications for missing data models in both missing at random and MNAR categories.
Notes
1 These results apply to modified versions of MAR and MNAR as defined in Section 2.2.
2 For a gentle introduction to causal graphical models, see Elwert (2013), Lauritzen (2001), and Pearl (2009b, secs. 1.2 and 11.1.2).
3 For an introduction to d-separation, see http://bayes.cs.ucla.edu/BOOK-2K/d-sep.html and http://www.dagitty.net/learn/dsep/index.html.
4 The term identifiability is sometimes used in lieu of recoverability. We prefer using recoverability over identifiability since the latter is strongly associated with causal effects, while the former is a broader concept, applicable to statistical relationships as well. See Section 3.5.
5 This definition is more operational than the standard definition of identifiability, for it states explicitly what is achievable under recoverability and, more importantly, what problems may occur under nonrecoverability.
6 A variable is a collider on the path if the path enters and leaves the variable via arrowheads (a term suggested by the collision of causal forces at the variable) (Greenland and Pearl 2011).
7 A Markov blanket MbX of a variable X is any set of variables such that X is conditionally independent of all the other variables in the graph given MbX (Pearl 1988).
8 For an introduction to do-calculus, see Pearl and Bareinboim (2014, sec. 2.5) and Koller and Friedman (2009).
9 Unless otherwise specified, nonrecoverability refers to the joint distribution as the target and does not exclude recoverability of other targets, such as the odds ratio (discussed in Bartlett, Harel, and Carpenter 2015).