Search in:

Statistical Theory and Related Fields Volume 3, 2019 - Issue 2

Submit an article Journal homepage

Free access

228

Views

CrossRef citations to date

Altmetric

Listen

Articles

An equivalence result for moment equations when data are missing at random

Marian HristacheUniv Rennes, Ensai, CNRS, CREST-UMR 9194, Rennes, FranceView further author information

Valentin PatileaUniv Rennes, Ensai, CNRS, CREST-UMR 9194, Rennes, FranceCorrespondence[email protected]
View further author information

Pages 199-207 | Received 19 Dec 2018, Accepted 21 Sep 2019, Published online: 09 Oct 2019

Cite this article
https://doi.org/10.1080/24754269.2019.1672021
CrossMark

In this article

ABSTRACT
1. Introduction
2. Equivalent moment model
3. Some examples revisited
4. Restricted estimators for quantile regressions and general conditional moment models with data missing at random
5. Is imputation really informative?
6. Conclusions
Disclosure statement
Additional information
References

Full Article
Figures & data
References
Citations
Metrics
Reprints & Permissions
View PDF PDF View EPUB EPUB

Formulae display: $MathJax Logo$ ?Mathematical formulae have been encoded as MathML and are displayed in this HTML version using MathJax in order to improve their display. Uncheck the box to turn MathJax off. This feature requires Javascript. Click on a formula to zoom.

ABSTRACT

We consider general statistical models defined by moment equations when data are missing at random. Using the inverse probability weighting, such a model is shown to be equivalent with a model for the observed variables only, augmented by a moment condition defined by the missing mechanism. Our framework covers a large class of parametric and semiparametric models where we allow for missing responses, missing covariates and any combination of them. The equivalence result is stated under minimal technical conditions and sheds new light on various aspects of interest in the missing data literature, as for instance the efficiency bounds and the construction of the efficient estimators, the restricted estimators and the imputation.

KEYWORDS:

Efficiency bounds
imputation
inverse probability weighting
semiparametric regression
restricted estimators

1. Introduction

Models defined by moment and conditional moment equations are widely used in statistics, biostatistics and econometrics; see, for instance, Ai and Chen(Citation2003, Citation2012), Domínguez and Lobato (Citation2004), and the references therein. Here, we investigate general moment or conditional moment equation models with missing data. The main idea we propose is that under a missing at random assumption, the initial model with missing data is equivalent with a inverse probability weighting moment equations model for the complete observations, augmented by a moment condition defined by the missing mechanism. The equivalence, a generalisation of the GMM equivalence result of Graham (Citation2011), is stated in terms of sets of probability measures. It has numerous implications and provides valuable insight, for instance on the efficiency bound calculations and the construction of efficient estimators.

In the framework of missing data, the assumption of missing at random (MAR) is presumably the most used when trying to describe an ignorable mechanism on the missingness. However, this concept, first introduced by Rubin (Citation1976), does not have the same meaning for everyone. For simplicity, let the full observations be i.i.d. replications of a vector $L = (X, Y, Z)$ and let $R = (R_{X}, R_{Y}, R_{Z}) \in {0, 1}^{3}$ be a random vector such that its component takes the value 1 if we observe the corresponding component of L and 0 otherwise. For Rubin (Citation1976) (see also, for example, Little & Rubin, Citation2002; Robins & Gill, Citation1997), MAR means that missingness depends only on the observed components, denoted by $L_{(R)},$ of L: (1) $\begin{aligned} the conditional law L (R | L) of R given L \\ is the same as the conditional law L (R | L_{(R)}) of \\ R given L_{(R)} . \end{aligned}$ (1) This concept was generalised to CAR, coarsening at random, by Heitjan and Rubin (Citation1991) (see also, for example, van der Laan and Robins (Citation2003)): $L (C | L)$ is the same as $L (C | ϕ (C, L))$ for an always observable transformation $ϕ (C, L)$ of the full data L and the censoring variable C. In the context of regression-like models, the MAR assumption is usually stated in a different and more restrictive way. A strongly ignorable selection mechanism (also called conditional independence, or selection on observables, etc.) means that, assuming some components of L are always observed, (2) $\begin{aligned} the conditional law L (R | L) of R given L is the same \\ as the conditional law of R given the always observed \\ components of L . \end{aligned}$ (2) This assumption was originally introduced by Rosenbaum and Rubin (Citation1983) in the framework of randomised clinical trials, which corresponds in our simple example, with $L = (X, Y, Z)$ , to the case where, for example, X is always observed, and one and only one of Y and Z is observed. This means that the selection vector R takes the form $R = (1, D, 1 - D)$ , where Y is observed iff D = 1 and Z is observed iff D = 0. In this situation, MAR means $\begin{aligned} P (D = 1 ∣ X, Y, Z) & = P (D = 1 ∣ X, Y) \\ = 1 - P (D = 0 ∣ X, Y, Z) \\ = 1 - P (D = 0 ∣ X, Z) \\ = P (D = 1 ∣ X, Z), \end{aligned}$ or, equivalently, (3) $D ⊥⊥ Z ∣ X, Y a n d D ⊥⊥ Y ∣ X, Z .$ (3) Meanwhile a strongly ignorable missingness mechanism writes $P (D = 1 ∣ X, Y, Z) = P (D = 1 ∣ X),$ or, equivalently, (4) $D ⊥⊥ (Y, Z) ∣ X .$ (4) Clearly, condition (Equation4(4) $D ⊥⊥ (Y, Z) ∣ X .$ (4) ) implies condition (Equation3(3) $D ⊥⊥ Z ∣ X, Y a n d D ⊥⊥ Y ∣ X, Z .$ (3) ), but the reverse is not true in general. In the present work we consider the case of i.i.d. replications of a vector containing missing components for which the same subvector is missing for the incomplete replicates. In this case the MAR assumption (Equation1(1) $\begin{aligned} the conditional law L (R | L) of R given L \\ is the same as the conditional law L (R | L_{(R)}) of \\ R given L_{(R)} . \end{aligned}$ (1) ) and the the strongly ignorable MAR assumption (Equation2(2) $\begin{aligned} the conditional law L (R | L) of R given L is the same \\ as the conditional law of R given the always observed \\ components of L . \end{aligned}$ (2) ) coincide (and are equivalent to CAR), as is it is also the case, for example, in Cheng (Citation1994), Tsiatis (Citation2007), Graham (Citation2011), among others.

Other MAR-related assumptions appear in the literature. For instance, when the response Y is missing, while X and Z are observed, Wei, Ma, and Carroll (Citation2012) consider the assumption $R_{Y} ⊥⊥ (X, Y) | Z$ that is stronger than the MAR assumption (Equation2(2) $\begin{aligned} the conditional law L (R | L) of R given L is the same \\ as the conditional law of R given the always observed \\ components of L . \end{aligned}$ (2) ), commonly used for regression models. Another assumption for the missingness mechanism is introduced in Wooldridge (Citation2007) : $W = (X, Y)$ and $S \in {0, 1}$ is a random variable such that W and Z are observed whenever S = 1, and $S ⊥⊥ W | Z .$ Wooldridge's assumption is more general than the MAR condition (Equation2(2) $\begin{aligned} the conditional law L (R | L) of R given L is the same \\ as the conditional law of R given the always observed \\ components of L . \end{aligned}$ (2) ) where Z is supposed to be always observed. Indeed, Wooldridge (Citation2007) does not suppose that W and/or Z are missing if $S = 0.$

The paper is organised as follows. The main equivalence result is stated in Section 2. In Section 3, we revisit some examples considered in the literature in the MAR setup: estimating mean functionals in parametric and nonparametric regressions; and quantile regression with missing responses and/or covariates. For these examples, our equivalence result suggests new ways for calculating efficiency bounds and constructing efficient estimators, using for instance the GMM, empirical likelihood approaches, the SMD approach of Ai and Chen (Citation2007), or the kernel-based method of Lavergne and Patilea (Citation2013). In Section 4 we reinterpret some classes of so-called restricted estimators; see, for instance, Tsiatis (Citation2007) and Tan (Citation2011). Finally, in Section 5 we use our general result to discuss on a common belief that the (multiple) imputation is necessary in order to capture all the information from the partially observed data.

2. Equivalent moment model

The following statement is a version of Theorems 1 and 2 in Hristache and Patilea (Citation2017). The proof is very similar and hence will be omitted. In the following, vectors a columns matrices and for any matrix A, $A^{'}$ denotes its transpose.

Theorem 2.1

Let $M_{1}$ and $M_{2}$ be two models defined for random vectors $(D, W^{'}, V^{'}, U^{'})^{'} \in {0, 1} \times R^{d_{W}} \times R^{d_{V}} \times R^{d_{U}}$ as follows: (5) $M_{1} : \{\begin{cases} E [ρ_{j} (γ, W, V, U)] = 0, \forall j \in J, \\ D ⊥⊥ {U, V} ∣ W, \end{cases}$ (5) and (6) $M_{2} : \{\begin{cases} E [\frac{D}{π (W)} ρ_{j} (γ, W, V, U)] = 0, \forall j \in J, \\ E [\frac{D}{π (W)} - 1 | V, W] = 0, \end{cases}$ (6) where $γ \in Γ$ is an unknown (possibly infinite dimensional) parameter, $ρ_{j} : Γ \times R^{d_{W}} \times R^{d_{V}} \times R^{d_{U}} \to R$ , for $j \in J$ , is a collection of known measurable functions, and π is a unknown measurable function such that $π (W) > 0$ almost surely.

The models $M_{1}$ and $M_{2}$ are equivalent if restricted to the laws of $(D, W^{'}, V^{'}, D U^{'})^{'}$ ; more precisely,

$(D, W^{'}, V^{'}, U^{'})^{'} \in M_{1} \Rightarrow (D, W^{'}, V^{'}, U^{'})^{'} \in M_{2}$ ,
$(D, W^{'}, V^{'}, U^{'})^{'} \in M_{2} \Rightarrow \exists (\tilde{D}, {\tilde{W}}^{'}, {\tilde{V}}^{'}, {\tilde{U}}^{'})^{'} \in M_{1}$ such that $(\tilde{D}, {\tilde{W}}^{'}, {\tilde{V}}^{'}, \tilde{D} {\tilde{U}}^{'})^{'}$ and $(D, W^{'}, V^{'}, D U^{'})^{'}$ have the same distribution.

Remarks

The parameter γ in model $M_{1}$ could include parameters of interest and parameters of nuisance.
The function $π (\cdot)$ usually called the propensity score, could be considered completely unknown and modelled nonparametrically, or modelled using a parametric model. With at hand an estimate of $π (\cdot)$ obtained from the second equation in the model $M_{2}$ , one could use existing moment equation approaches for the estimation of the parameters in the first equation of $M_{2}$ . See our Example 3.1.
The link of this theorem with models where data are missing at random is made if we consider that the vector U is observed if and only if D = 1. The theorem then basically says that at the observational level, which means for the laws of the observed vector $(D, W^{'}, V^{'}, D U^{'})^{'}$ , the two models $M_{1}$ and $M_{2}$ are equivalent. As a consequence, inference for the law of $(D, W^{'}, V^{'}, U^{'})^{'}$ in the model $M_{1}$ , a moment conditions model under an assumption of data missing at random, could be done based on the model $M_{2}$ , which is defined using only the observed part $(D, W^{'}, V^{'}, D U^{'})^{'}$ of the vector vector $(D, W^{'}, V^{'}, U^{'})^{'}$ . In particular, efficiency bound calculations and efficient estimator constructions could be done in the model $M_{2}$ , which in many cases could be much easier.
The underlying condition ‘DU is always observed’ includes the usual case $\begin{aligned} D = 0 i f U i s n o t o b s e r v e d, \\ D = 1 i f U i s o b s e r v e d, \end{aligned}$ but it is more general. When D = 1 one observes the value of U. Meanwhile, one should read that when $D = 0,$ U could be observed or not since whatever the value of U is, DU = 0.

3. Some examples revisited

In this section we present two examples of models already studied in the literature for which our approach gives new insights and sometimes allows for simpler methods for obtaining efficiency bounds and asymptotically efficient estimators. The guiding principle is to use Theorem 2.1 and put the model of interest, in the presence of a MAR mechanism, under an equivalent form (7) $\begin{aligned} E [g_{1} (θ, α, X, Y, Z) | X] = 0 \\ E [g_{2} (α, X, Y, Z) | X, Z] = 0, \end{aligned}$ (7) where the two sets of equations are orthogonal, meaning that $E [g_{1} (θ, α, X, Y, Z) g_{2}^{'} (α, X, Y, Z) | X, Z] = 0.$ The equivalent model (Equation7(7) $\begin{aligned} E [g_{1} (θ, α, X, Y, Z) | X] = 0 \\ E [g_{2} (α, X, Y, Z) | X, Z] = 0, \end{aligned}$ (7) ) has a sequential moment structure that allows to compute the efficiency bound; see Ai and Chen (Citation2012). Moreover, the finite dimensional interest parameter θ can be efficiently estimated from the first equations, with the (possibly infinite dimensional) nuisance parameter α known or suitably estimated from the last equations. A similar statement on the efficient estimation of θ, in the particular case of a finite dimensional α and without conditioning on X and X, Z, can be found in Theorem 2.2, point 8, of Prokhorov and Schmidt (Citation2009).

3.1. Mean functionals with data missing at random

Consider the problem of estimating the mean of functionals of the variables in a parametric regression model with missing responses: (8) $\begin{aligned} E [h (X, Y) - θ] = 0 \\ E [Y - r (X, α) | X] = 0. \end{aligned}$ (8) The parameter of interest here is $θ = E [h (X, Y)]$ , where $h (\cdot, \cdot)$ is some given squared-integrable function; see Müller (Citation2009). Hristache and Patilea (Citation2017) considered the same framework and focused on the case where $h (X, Y)$ does not depend on X. Here we investigate the general case where $h (X, Y)$ that could also depend on X. Some usual examples are the mean of the response variable ( $h (x, y) = y$ ), the second-order moment of the response ( $h (x, y) = v e c (y y^{'})$ ), the cross-product of the response and the covariate vector ( $h (x, y) = v e c (y x^{'})$ ). (Here, $v e c (\cdot)$ is the vectorisation operator that transforms a matrix in a column vector by stacking the columns of the matrix.) For simplicity, we take Y with real values in the following of this section.

The regression function $r (x, α)$ has a known (parametric) form, X is always observed, Y is only observed when D = 1 and a MAR assumption holds : $D ⊥⊥ Y | X$ . With $π (x) = P (D = 1 | X = x)$ , the model can be written, at the observational level, under the following equivalent form: (9) $\begin{aligned} E \{\frac{D}{π (X)} [h (X, Y) - θ]\} = 0 \\ E {D [Y - r (X, α)] | X} = 0 \\ E [\frac{D}{π (X)} - 1 | X] = 0. \end{aligned}$ (9) The last two equations being orthogonal, since $\begin{aligned} E \{[\frac{D}{π (X)} - 1] D [Y - r (X, α)] | X\} \\ = [\frac{1}{π (X)} - 1] E {D [Y - r (X, α)] | X} = 0, \end{aligned}$ it is also equivalent to the model defined by the following system of orthogonal equations, where $σ^{2} (X)$ stands for the conditional variance $V (Y | X)$ : (10) $\begin{aligned} E \{\frac{D}{π (X)} [h (X, Y) - θ] \\ - \frac{1}{σ^{2} (X) π (X)} E [\frac{D}{π (X)} h (X, Y) (Y - r (X, α)) | X] \\ D [Y - r (X, α)] \\ - E [\frac{D}{π (X)} (h (X, Y) - θ) | X] c [\frac{D}{π (X)} - 1]\} = 0 \\ E {D [Y - r (X, α)] | X} = 0 \\ E [\frac{D}{π (X)} - 1 | X] = 0. \end{aligned}$ (10) Solving for θ, we get $θ = E [Φ (Y, X, D; α, σ^{2}, π, η_{1}, η_{2})]$ where $\begin{aligned} Φ (Y, X, D; α, σ^{2}, π, η_{1}, η_{2}) \\ = \frac{D}{π (X)} h (X, Y) - E [\frac{D}{π (X)} h (X, Y) | X] \\ \times [\frac{D}{π (X)} - 1] \\ - \frac{1}{σ^{2} (X) π (X)} E [\frac{D}{π (X)} h (X, Y) (Y - r (X, α)) | X] \\ \times D [Y - r (X, α)], \\ η_{1} (X) = E [D h (X, Y) | X] \end{aligned}$ and $η_{2} (X) = η_{2} (X; α) = E [D h (X, Y) (Y - r (X, α)) | X] .$ Let $\hat{α}$ be an estimator of α obtained in the model. With the variance $σ^{2} (\cdot)$ and the functions $η_{1} (\cdot)$ and $η_{2} (\cdot; \cdot)$ estimated nonparametrically, the plug-in estimator $\hat{θ} = \frac{1}{n} \sum_{i = 1}^{n} Φ (Y_{i}, X_{i}, D_{i}; \hat{α}, \hat{σ^{2}}, \hat{π}, \hat{η_{1}}, \hat{η_{2}})$ would be efficient. Since the first equation in system (Equation10(10) $\begin{aligned} E \{\frac{D}{π (X)} [h (X, Y) - θ] \\ - \frac{1}{σ^{2} (X) π (X)} E [\frac{D}{π (X)} h (X, Y) (Y - r (X, α)) | X] \\ D [Y - r (X, α)] \\ - E [\frac{D}{π (X)} (h (X, Y) - θ) | X] c [\frac{D}{π (X)} - 1]\} = 0 \\ E {D [Y - r (X, α)] | X} = 0 \\ E [\frac{D}{π (X)} - 1 | X] = 0. \end{aligned}$ (10) ) is orthogonalised with respect to the last one, for the propensity score $π (\cdot)$ , one could use a parametric model without affecting the efficiency bound.

3.2. Quantile regression with data missing at random

A particular setting of quantile regression with missing data at random is considered in Wei et al. (Citation2012). For $0 < τ < 1$ , the conditional quantile $Q_{τ} (Y | X, Z)$ of the always observed response Y given the regressor vectors Z (always observed) and X (observed iff D = 1) is assumed to be linear, (11) $Q_{τ} (Y | X, Z) = X^{'} β_{1, τ} + Z^{'} β_{2, τ},$ (11) and the missingness mechanism is defined by the strong missing at random condition (12) $D ⊥⊥ (X, Y) | Z .$ (12) Taking in (Equation6(6) $M_{2} : \{\begin{cases} E [\frac{D}{π (W)} ρ_{j} (γ, W, V, U)] = 0, \forall j \in J, \\ E [\frac{D}{π (W)} - 1 | V, W] = 0, \end{cases}$ (6) ) U = X, V = Y, W = Z, $ρ_{j} (β_{τ}, W, V, U) = (X^{'}, Z^{'})^{'} [𝟙_{{Y - X^{'} β_{1, τ} - Z^{'} β_{2, τ} \leq 0}} - τ] \times a_{j} (U, W) ≜ ρ (X, Y, Z, β_{τ}) \times a_{j} (X, Z)$ , $j \in N$ , where the family of functions ${a_{j}}_{j \in N}$ spans $L^{2} (X, Z)$ , the model defined by (Equation11(11) $Q_{τ} (Y | X, Z) = X^{'} β_{1, τ} + Z^{'} β_{2, τ},$ (11) ) and (Equation12(12) $D ⊥⊥ (X, Y) | Z .$ (12) ) can be written under the following equivalent form: (13) $\begin{aligned} E [D ρ (Y, X, Z, β_{τ}) | X, Z] = 0 \\ E [\frac{D}{π (Z)} - 1 | Z] = 0. \end{aligned}$ (13) The two sets of equations being already orthogonal (with respect to the σ-field $σ (X, Z)$ ), in this situation we can efficiently estimate the parameter $β_{τ} = (β_{1, τ}^{'}, β_{2, τ}^{'})^{'}$ from the complete data only, that is from the model defined by (Equation11(11) $Q_{τ} (Y | X, Z) = X^{'} β_{1, τ} + Z^{'} β_{2, τ},$ (11) ) keeping for the statistical analysis only the observations for which all the components of the vector $(Y, X^{'}, Z^{'})^{'}$ are observed. The gain in efficiency observed in the simulation experiment of Wei et al. (Citation2012) for their multiple imputation improved estimator comes, in our opinion, from the supplementary parametric assumption on the form of the conditional density of X given Z (see their Assumption 4).

A more general linear quantile regression model defined by (Equation11(11) $Q_{τ} (Y | X, Z) = X^{'} β_{1, τ} + Z^{'} β_{2, τ},$ (11) ) with missing data at random is considered in Chen, Wan, and Zhou (Citation2014). With their notations, we have (14) $Y = Z^{'} θ (τ) + ε, P (ε \leq 0 | Z) = τ, 0 < τ < 1,$ (14) for the full data model. They also denote by X the always observed components of the vector $(Y, Z^{'})^{'}$ and with $X^{c}$ the components of the same vector that are observed iff the binary variable D takes the value 1 and use the ‘standard’ missing at random assumption $P (D = 1 | Y, Z) = P (D = 1 | X, X^{c}) = P (D = 1 | X) = π (X)$ . This fits our framework by taking U = X, V = 1, $W = X^{c}$ and $\begin{aligned} ρ_{j} (θ (τ), W, V, U) & = Z [𝟙_{{Y - Z^{'} θ (τ) \leq 0}} - τ] \times a_{j} (U, W) \\ ≜ ρ (Y, Z, θ (τ)) \times a_{j} (Z), j \in N, \end{aligned}$ where the family of functions ${a_{j}}_{j \in N}$ spans $L^{2} (Z)$ . The equivalent moment equations model, at the observational level, can be written as (15) $\begin{aligned} E \{\frac{D}{π (X)} Z [𝟙_{{Y - Z^{'} θ (τ) \leq 0}} - τ] | Z\} = 0 \\ E [\frac{D}{π (X)} - 1 | X] = 0. \end{aligned}$ (15) The information bound for this model is given in Hristache and Patilea (Citation2016). It can not be calculated explicitly, except some special cases, which includes the missing responses as before or the case where X or/and Z are discrete. It is different from the information bound given in Chen, Hong, and Tarozzi (Citation2008) which corresponds to a model defined by an unconditional quantile moment and a MAR assumption and could be represented equivalently under the form (16) $\begin{aligned} E \{\frac{D}{π (X)} Z [𝟙_{{Y - Z^{'} θ (τ) \leq 0}} - τ]\} = 0 \\ E [\frac{D}{π (X)} - 1 | X] = 0. \end{aligned}$ (16) Models (Equation15(15) $\begin{aligned} E \{\frac{D}{π (X)} Z [𝟙_{{Y - Z^{'} θ (τ) \leq 0}} - τ] | Z\} = 0 \\ E [\frac{D}{π (X)} - 1 | X] = 0. \end{aligned}$ (15) ) and (Equation16(16) $\begin{aligned} E \{\frac{D}{π (X)} Z [𝟙_{{Y - Z^{'} θ (τ) \leq 0}} - τ]\} = 0 \\ E [\frac{D}{π (X)} - 1 | X] = 0. \end{aligned}$ (16) ) are quite different and so are the corresponding efficiency bounds, so that no estimation procedure given in Chen et al. (Citation2014) could be efficient in their linear quantile regression model (Equation14(14) $Y = Z^{'} θ (τ) + ε, P (ε \leq 0 | Z) = τ, 0 < τ < 1,$ (14) ) with missing data at random.

4. Restricted estimators for quantile regressions and general conditional moment models with data missing at random

The model defined by the regression-like equation $E [ρ (θ, Y, X, V) | X, V] = 0,$ and the MAR selection mechanism $P (D = 1 | Y, X, V, W) = P (D = 1 | W) = π (W)$ is equivalent, at the observational level, to the following model defined by conditional moment equations : $P : \{\begin{cases} E [\frac{D}{π (W)} ρ (θ, Y, X, V) | X, V] = 0, \\ E [\frac{D}{π (W)} - 1 | W] = 0. \end{cases}$ This framework includes many situations. For instance, taking $W^{'} = (Y^{'}, V^{'}, Z^{'})$ we obtain the case in which some regressors (conditioning variables) X are missing, while with $W^{'} = (X^{'}, V^{'}, Z^{'})$ we cover the case of missing responses. Splitting Y in an observed subvector $Y_{o}$ and a not always observed subvector $Y_{u}$ , with $W^{'} = (Y_{o}^{'}, V^{'}, Z^{'})$ this corresponds to the case where both some responses and some covariates are missing. In all these examples, U is the vector of not always observed components of the data vector.

For the model $P_{(1)} : E [\frac{D}{π (W)} ρ (θ, Y, X, V) | X, V] = 0,$ denoting by $P_{0}$ the true law of $(Y^{'}, X^{'}, V^{'}, Z^{'})^{'}$ , the tangent space is $\begin{aligned} T_{(1)} & = \{s \in {L^{2} (P_{0})}^{\oplus d} : E (s) = 0, \\ E [\frac{D}{π (W)} ρ (θ, Y, X, V) s^{'} (Y, X, V, Z) | X, V] \\ = 0\} . \end{aligned}$ For the model $P_{(2)} : E [\frac{D}{π (W)} - 1 | W] = 0,$ the tangent space is $\begin{aligned} T_{(2)} & = \{s \in {L^{2} (P_{0})}^{\oplus d} : E (s) = 0, \\ E [(\frac{D}{π (W)} - 1) s^{'} (Y, X, V, Z) | W] = 0\} . \end{aligned}$ The tangent space $T$ of ${P = P}_{(1)} \cap P_{(2)}$ is (see Hristache & Patilea, Citation2016) ${T = T}_{(1)} \cap T_{(2)} .$ We obtain the efficient score ${\bar{S}}_{θ}$ by projecting the score $S_{θ}$ on $T^{⊥}$ , ${\bar{S}}_{θ} = Π (S_{θ} | T^{⊥}) = Π (S_{θ} | \bar{T_{(1)}^{⊥} + T_{(2)}^{⊥}}),$ which gives the following solution : $\begin{aligned} {\bar{S}}_{θ} & = a_{1}^{*} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V) + a_{2}^{*} (W) \\ \times (\frac{D}{π (W)} - 1) \in T_{(1)}^{⊥} + T_{(2)}^{⊥}, \end{aligned}$ where $\begin{aligned} a_{1}^{*} (X, V) & = \{- E (\partial_{θ} ρ^{'} | X, V) \\ + E [E (a_{1}^{*} ρ | W) \frac{1 - π}{π} ρ^{'} | X, V]\} \\ \times E^{- 1} (\frac{1}{π (W)} ρ ρ^{'} | X, V), \\ a_{2}^{*} (W) & = - E [a_{1}^{*} (X, V) ρ | W] . \end{aligned}$

Remark

${\bar{S}}_{θ}$ is also the efficient score in the model $P : \{\begin{cases} E [a_{1}^{*} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V)] = 0 \\ E [a_{2}^{*} (W) (\frac{D}{π (W)} - 1)] = 0, \end{cases},$ or in the model defined by the moment condition $\begin{aligned} E [a_{1}^{*} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V) + a_{2}^{*} (W) \\ \times (\frac{D}{π (W)} - 1)] = 0. \end{aligned}$ As shown in Hristache and Patilea (Citation2016), $a_{1}^{*}$ satisfies an equation of the form $a_{1}^{*} (X, V) = γ (X, V) + T (a_{1}^{*} (X, V)),$ with T a contraction operator. The solution of this equation is unique, but in order to obtain it one needs to use nonparametric estimators at each step of the iterative procedure. An alternative approach would be to consider finite dimensional subspaces $S_{1} \subset T_{(1)}^{⊥}$ and $S_{2} \subset T_{(2)}^{⊥}$ when calculating the ‘efficient score’, leading to an approximately efficient score. We obtain in this way what is known in the literature as restricted estimators. We can write: $T_{(1)}^{⊥} = \{s = a_{1} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V) : a_{1} \in L^{2} (P_{0})\}$ $S_{1} \subset T_{(1)}^{⊥}$ finite dimensional $\Rightarrow \exists a_{1}^{(1)}, \dots, a_{1}^{(k)} \in L^{2} (P_{0})$ s.t. $\begin{aligned} S_{1} & = l i n \{a_{1}^{(i)} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V) : \\ 1 \leq i \leq k\} \\ \Leftrightarrow S_{1}^{⊥} = \{s \in {L^{2} (P_{0})}^{\oplus d} : E (a_{1}^{(i)} \frac{D}{π} ρ s^{'}) = 0, \\ 1 \leq i \leq k\} . \end{aligned}$ Compare to $T_{(1)} = \{s \in {L^{2} (P_{0})}^{\oplus d} : E (\frac{D}{π} ρ s^{'} | X, V) = 0\} .$ Similarly for $S_{2} \subset T_{(2)}^{⊥}$ : $T_{(2)}^{⊥} = \{s = a_{2} (W) (\frac{D}{π (W)} - 1) : a_{2} \in L^{2} (P_{0})\}$ $S_{2} \subset T_{(2)}^{⊥}$ finite dimensional $\Rightarrow \exists a_{2}^{(1)}, \dots, a_{2}^{(l)} \in L^{2} (P_{0})$ s.t. $\begin{aligned} S_{2} & = l i n \{a_{2}^{(j)} (W) (\frac{D}{π (W)} - 1) : 1 \leq j \leq l\} \\ \Leftrightarrow S_{2}^{⊥} & = \{s \in {L_{0}^{2} (P_{0})}^{\oplus d} : E [a_{2}^{(j)} (\frac{D}{π (W)} - 1) s^{'}] \\ = 0, 1 \leq j \leq k\} . \end{aligned}$ An optimal class 1 restricted estimator (see Tan, Citation2011; Tsiatis, Citation2007) is solution of the approximated efficient score equation $\begin{aligned} E \{{\bar{a}}_{1}^{(1)} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V) + {\bar{a}}_{2}^{(1)} (W) \\ \times (\frac{D}{π (W)} - 1)\} = 0, \end{aligned}$ where ${\bar{a}}_{1}^{(1)}$ and ${\bar{a}}_{2}^{(2)}$ are given by $\begin{aligned} {\bar{S}}_{θ} & = Π (S_{θ} | S_{1} + S_{2}) \\ = {\bar{a}}_{1}^{(1)} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V) + {\bar{a}}_{2}^{(1)} (W) \\ \times (\frac{D}{π (W)} - 1) . \end{aligned}$ In fact, ${\bar{S}}_{θ}$ is the efficient score in the following moment equations model: $P^{'} : \{\begin{cases} E [a_{1}^{(1)} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V)] = 0 \\ ⋮ \\ E [a_{1}^{(k)} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V)] = 0 \\ E [a_{2}^{(1)} (W) (\frac{D}{π (W)} - 1)] = 0 \\ ⋮ \\ E [a_{2}^{(l)} (W) (\frac{D}{π (W)} - 1)] = 0 \end{cases}$ This allows for a new, simple and intuitive interpretation of the optimal class 1 restricted estimators as efficient estimators in a larger model, obtained from the initial one by using appropriate ‘instruments’ to transform the conditional moment equations in a (growing) number of unconditional moment conditions. Another advantage of this new perspective is the access to the most commonly used methods of obtaining efficient estimators in moment equations models such as GMM, SMD (see Lavergne & Patilea, Citation2013) or empirical likelihood estimators.

Similar procedures can be used for class 2 restricted estimators, based on $\begin{aligned} Π (S_{θ} | S_{1} + T_{(2)}^{⊥}) & = {\bar{a}}_{1}^{(2)} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V) \\ + {\bar{a}}_{2}^{(2)} (W) (\frac{D}{π (W)} - 1) \end{aligned}$ and class 3 restricted estimators (Tan, Citation2011), based on $\begin{aligned} Π (S_{θ} | T_{(1)}^{⊥} + S_{2}) & = {\bar{a}}_{1}^{(3)} (X, V) \frac{D}{π (W)} ρ (θ, Y, X, V) \\ + {\bar{a}}_{2}^{(3)} (W) (\frac{D}{π (W)} - 1) . \end{aligned}$

4.1. Simulation study

The approach on restricted estimators is illustrated in a setting already considered by Chen, Wan, and Zhou(Citation2015); see their Example 1, scenario $S_{2}$ . With the notations of the previous section, the data are generated from the following model: (17) $\begin{aligned} Y & = θ_{0} + θ_{1} X + θ_{2} V + 0.5 [1 + (X + V)] ε, \\ ε \sim N (0, 1), \end{aligned}$ (17) where $(θ_{0}, θ_{1}, θ_{2}) = (1, - 1, 1)$ and $(X, V)$ follows a centred bivariate normal distribution with unit variances and correlation equal to 0.5. The parameter of interest here is the vector coefficient $(θ_{0} (τ), θ_{1} (τ), θ_{2} (τ))$ of the conditional quantile of Y given X and V: (18) $Q_{τ} (Y ∣ X, V) = θ_{0} (τ) + θ_{1} (τ) X + θ_{2} (τ) V,$ (18) with $θ_{0} (τ) = θ_{0} + Q_{τ} (ε)$ , $θ_{1} (τ) = θ_{1} + 0.5 Q_{τ} (ε)$ , $θ_{2} (τ) = θ_{2} + 0.5 Q_{τ} (ε)$ , where $Q_{τ} (ε)$ is the τth quantile of ϵ, $τ \in (0, 1)$ . Herein, we only report the case $τ = 0.75$ . The variables Y and V are always observed, while X is observed if and only if D = 1, where D is a Bernoulli random variable such that $P (D = 1 ∣ Y, V) = 0.4 (1 + \sin^{2} (Y - V)) 𝟙_{{| Y - V | \leq 1}} + 1 - 𝟙_{{| Y - V | \leq 1}} = π (W)$ , with $W = (Y, V)$ . The model for the fully observed data is defined by the regression-like equation $E [ρ (θ, Y, X, V) | X, V] = 0,$ where $ρ (θ, Y, X, V) = 𝟙_{{Y - θ_{0} - θ_{1} X - θ_{2} V \leq 0}} - τ$ . Under the MAR selection mechanism $P (D = 1 | Y, X, V) = P (D = 1 | Y, V) = π (W) .$ it is equivalent, at the observational level, to the following model defined by conditional moment equations: $P : \{\begin{cases} E [\frac{D}{π (W)} (𝟙_{{Y - θ_{0} - θ_{1} X - θ_{2} V \leq 0}} - τ) ∣ X, V] = 0, \\ E [\frac{D}{π (W)} - 1 ∣ W] = 0. \end{cases}$ The restricted estimators considered are obtained by the generalised method of moments in the following models $P_{s}$ , $s \in {a, b, c, d, e, f}$ , which contain the model $P$ : $P_{s} : \{\begin{cases} E [\frac{D}{π_{s} (φ, Y, V)} (𝟙_{{Y - θ_{0} - θ_{1} X - θ_{2} V \leq 0}} - τ) \\ a_{1 s}^{(k)} (X, V)] = 0, k \in {1, \dots, k_{s}} \\ E \{[\frac{D}{π_{s} (φ, Y, V)} - 1] a_{2 s}^{(l)} (Y, V)\} \\ = 0, l \in {1, \dots, l_{s}}, \end{cases}$ where:

$π_{a} \equiv 1$ , $D \equiv 1$ , $k_{a} = 3$ , $a_{1 a}^{(1)} (X, V) = 1$ , $a_{1 a}^{(2)} (X, V) = X$ , $a_{1 a}^{(3)} (X, V) = V$ (no missing data, 1, X and V as instruments);
$π_{b} \equiv 1$ , $D \equiv 1$ , $k_{b} = 3$ , $a_{1 b}^{(1)} (X, V) = (1 + X^{2})^{- 1}$ , $a_{1 b}^{(2)} (X, V) = (1 + V^{2})^{- 1}$ , $a_{1 b}^{(3)} (X, V) = (1 + | X | + | V |)^{- 2}$ (no missing data, $(1 + X^{2})^{- 1}$ , $(1 + V^{2})^{- 1}$ and $(1 + | X | + | V |)^{- 2}$ as instruments);
$π_{c} (Y, V) = 0.4 (1 + \sin^{2} (Y - V)) 𝟙_{{| Y - V | \leq 1}} + 1 - 𝟙_{{| Y - V | \leq 1}}$ , $k_{c} = 3$ , $a_{1 c}^{(1)} (X, V) = (1 + X^{2})^{- 1}$ , $a_{1 c}^{(2)} (X, V) = (1 + V^{2})^{- 1}$ , $a_{1 c}^{(3)} (X, V) = (1 + | X | + | V |)^{- 2}$ (true propensity score, $(1 + X^{2})^{- 1}$ , $(1 + V^{2})^{- 1}$ and $(1 + | X | + | V |)^{- 2}$ as instruments);
$π_{d} (Y, V) = {1 + \exp [- (φ_{0} + φ_{1} Y + φ_{2} V)]}^{- 1}$ , with $φ_{0}$ , $φ_{1}$ and $φ_{2}$ estimated from a logistic regression, $k_{d} = 3$ , $a_{1 d}^{(1)} (X, V) = (1 + X^{2})^{- 1}$ , $a_{1 d}^{(2)} (X, V) = (1 + V^{2})^{- 1}$ , $a_{1 d}^{(3)} (X, V) = (1 + | X | + | V |)^{- 2}$ (propensity score estimated by a logistic regression on Y and V, $(1 + X^{2})^{- 1}$ , $(1 + V^{2})^{- 1}$ and $(1 + | X | + | V |)^{- 2}$ as instruments for the IPW quantile equation);
$π_{e} (Y, V) = {1 + \exp [- (φ_{0} + φ_{1} Y + φ_{2} V)]}^{- 1}$ , $k_{e} = 3$ , $a_{1 e}^{(1)} (X, V) = (1 + X^{2})^{- 1}$ , $a_{1 e}^{(2)} (X, V) = (1 + V^{2})^{- 1}$ , $a_{1 e}^{(3)} (X, V) = (1 + | X | + | V |)^{- 2}$ , $l_{e} = 3$ , $a_{2 e}^{(1)} (Y, V) = 1$ , $a_{2 e}^{(2)} (Y, V) = Y$ , $a_{2 e}^{(3)} (Y, V) = V$ , ( $(1 + X^{2})^{- 1}$ , $(1 + V^{2})^{- 1}$ and $(1 + | X | + | V |)^{- 2}$ as instruments for the IPW quantile equation, 1, Y and V as instruments for the propensity score equation);
$π_{f} (Y, V) = {1 + \exp {- [φ_{0} + φ_{1} (Y - V) + φ_{2} (Y - V)^{2}]}}^{- 1}$ , $k_{f} = 3$ , $a_{1 f}^{(1)} (X, V) = (1 + X^{2})^{- 1}$ , $a_{1 f}^{(2)} (X, V) = (1 + V^{2})^{- 1}$ , $a_{1 f}^{(3)} (X, V) = (1 + | X | + | V |)^{- 2}$ , $l_{f} = 3$ , $a_{2 f}^{(1)} (Y, V) = 1$ , $a_{2 f}^{(2)} (Y, V) = Y - V$ , $a_{2 f}^{(3)} (Y, V) = (Y - V)^{2}$ , ( $(1 + X^{2})^{- 1}$ , $(1 + V^{2})^{- 1}$ and $(1 + | X | + | V |)^{- 2}$ as instruments for the IPW quantile equation, 1, Y −V and $(Y - V)^{2}$ as instruments for the propensity score equation).

The estimates of the MSE obtained in the case $τ = 0.75$ from 1000 replications, with sample size $n \in {200, 400, \dots, 1400, 1600}$ , are given in Table .

Table 1. Estimates of $E (∥ \hat{θ} - θ ∥^{2})$ over 1000 replicates when $τ = 0.75$ .

Display Table

Note that none of the GMM estimators in models $P_{s}$ could be efficient in the initial model $P$ , but only approximately efficient, if the instruments are suitably chosen, which could be a delicate point in practice. Here we observe that the instruments $a_{1 b}^{(k)}$ , involved in the first equations in the first equations of the model $P_{b}$ performs better that the instruments $a_{1 a}^{(k)}$ used in $P_{a}$ . We observe a similar phenomenon for the propensity score equations when looking at the columns $P_{d}$ , $P_{e}$ and $P_{f}$ . The case in $P_{d}$ corresponds to common practice when one trusts the logistic regression for the propensity score. The cases in $P_{e}$ and $P_{f}$ correspond to our approach based on instruments with more effective instruments in the later case. The non-orthogonality of the quantile model equations and the propensity score equations could explain the better results in $P_{f}$ . A joint estimation of the two set of equations with effective instruments could improve over the common practice. Next, let us notice that the models $P_{c}$ and $P_{f}$ are similar: we use the same instruments for the first equation in $P_{s}$ . Moreover, in $P_{c}$ we use the true propensity score, while in $P_{f}$ we use an estimated propensity score obtained from a model that is somehow close to the true propensity score. As the two equations in the model $P_{s}$ are not orthogonal, estimating the propensity score could improve the asymptotic variance of the estimators of $\hat{θ}$ . This is related to the so-called puzzling phenomenon noticed by Prokhorov and Schmidt (Citation2009). Here, even if propensity score the model is slightly wrong, there is still a gain of MSE. Let us also note the surprisingly good results for the model $P_{f}$ in which $\log [π / (1 - π)]$ is approximated by a quadratic function of Y −V. Using the same instruments for the conditional quantile equations, the estimation with missing data is even better than in the case with full data (compare results for model $P_{f}$ to those for model $P_{b}$ ). This could be explained by the fact that in model $P_{b}$ we do not use the optimal instruments that should be proportional to the conditional density of the error term at the origin. The weighting introduced by the propensity score seems, in some sense, to compensate the non-optimal instruments. This suggests further possible improvements based on other choices of instrumental variables in order to approach efficiency.

5. Is imputation really informative?

Multiple imputation is a widely used method to generate substitute values when data are missing. However, under the MAR assumption, the interest of multiple imputation in the context of conditional moment restriction models is at least questionable, as discussed in the following.

Consider that $(D, W^{'}, V^{'}, D U^{'})^{'}$ is always observed and consider the MAR assumption (19) $(U, V) ⊥⊥ D ∣ W .$ (19) Then, any substitute observation generated from the law of $\tilde{U}$ is adequate to replace a missing U, where the law of $\tilde{U}$ should be such that $\begin{aligned} L (\tilde{U} | \tilde{W}, \tilde{V}, \tilde{D} = 0) = L (U | W, V, D = 1) \\ = L (\tilde{U} | \tilde{W}, \tilde{V}, \tilde{D} = 1) . \end{aligned}$ (Here, $L (V_{1} | V_{2})$ denotes the conditional law of $V_{1}$ given $V_{2}$ .) Since, in general, the law $L (U ∣ W, V, D = 1)$ is unknown, one can estimate it, parametrically or nonparametrically, and generate substitute observations from this estimate. This is the so-called parametric or nonparametric imputation. See, for instance, Wang and Chen (Citation2009), Wei et al. (Citation2012), Chen and Van Keilegom (Citation2013) for some nonparametric imputation applications.

The equivalence established by Theorem 2.1 for models defined by moment restrictions, implies that all the information on the parameter θ in the initial model under the MAR assumption (Equation19(19) $(U, V) ⊥⊥ D ∣ W .$ (19) ) is contained in the model defined by the equations (Equation6(6) $M_{2} : \{\begin{cases} E [\frac{D}{π (W)} ρ_{j} (γ, W, V, U)] = 0, \forall j \in J, \\ E [\frac{D}{π (W)} - 1 | V, W] = 0, \end{cases}$ (6) ). Let us point out that the last equation of the model (Equation6(6) $M_{2} : \{\begin{cases} E [\frac{D}{π (W)} ρ_{j} (γ, W, V, U)] = 0, \forall j \in J, \\ E [\frac{D}{π (W)} - 1 | V, W] = 0, \end{cases}$ (6) ) includes the information contained in the incomplete observations. Indeed, to estimate $π (\cdot),$ parametrically or nonparametrically, one uses all the observations of W. This remark opens new perspectives for defining estimators of θ without using substitute observations. Moreover, this remark sheds some new light on a common justification used in the literature, namely that imputation is necessary in order to capture the information contained in the partially observed data.

6. Conclusions

We consider a statistical model defined by an arbitrary number of moment equations. Our framework includes a large panel of models defined through conditional and/or unconditional moments. Next, we assume that some variables are missing at random. In this setup of modelling with missing data, we present a model equivalence result. It states that the initial statistical model together with the MAR mechanism is equivalent to a moment equations model. Using the equivalent model could greatly simplify the estimation and the inference with missing data problems. We discuss several consequences for widely used models, including the quantile regressions.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

Marian Hristache

Marian Hristache is Associated Professor, Ecole Nationale de la Statistique et de l'Analyse de l'Information (Ensai), Rennes, France (E-mail: [email protected]).

Valentin Patilea

Valentin Patilea is Professor, Ecole Nationale de la Statistique et de l'Analyse de l'Information (Ensai), and Center for Research in Economics and Statistics (CREST), Rennes, France (E-mail: [email protected]).

References

Ai, C., & Chen, X. (2003). Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica, 71, 1795–1843. doi: 10.1111/1468-0262.00470
Web of Science ®Google Scholar
Ai, C., & Chen, X. (2007). Estimation of possibly misspecified semiparametric conditional moment restriction models with different conditioning variables. Journal of Econometrics, 141, 5–43. doi: 10.1016/j.jeconom.2007.01.013
Web of Science ®Google Scholar
Ai, C., & Chen, X. (2012). The semiparametric efficiency bound for models of sequential moment restrictions containing unknown functions. Journal of Econometrics, 170, 442–457. Thirtieth Anniversary of Generalized Method of Moments. doi: 10.1016/j.jeconom.2012.05.015
Web of Science ®Google Scholar
Chen, X., Hong, H., & Tarozzi, A. (2008). Semiparametric efficiency in GMM models with auxiliary data. The Annals of Statistics, 36, 808–843. doi: 10.1214/009053607000000947
Web of Science ®Google Scholar
Chen, S. X., & Van Keilegom, I. (2013). Estimation in semiparametric models with missing data. Annals of the Institute of Statistical Mathematics, 65, 785–805. doi: 10.1007/s10463-012-0393-6
Web of Science ®Google Scholar
Chen, X., Wan, A. T. K., & Zhou, Y. (2014). Efficient quantile regression analysis with missing observations. Journal of the American Statistical Association, 110(510), 723–741. doi: 10.1080/01621459.2014.928219
Web of Science ®Google Scholar
Chen, X., Wan, A. T. K., & Zhou, Y. (2015). Efficient quantile regression analysis with missing observations. Journal of the American Statistical Association, 110, 723–741. doi: 10.1080/01621459.2014.928219
Web of Science ®Google Scholar
Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89, 81–87. doi: 10.1080/01621459.1994.10476448
Web of Science ®Google Scholar
Domínguez, M. A., & Lobato, I. N. (2004). Consistent estimation of models defined by conditional moment restrictions. Econometrica, 72, 1601–1615. doi: 10.1111/j.1468-0262.2004.00545.x
Web of Science ®Google Scholar
Graham, B. S. (2011). Efficiency bounds for missing data models with semiparametric restrictions. Econometrica, 79, 437–452. doi: 10.3982/ECTA7379
Web of Science ®Google Scholar
Heitjan, D. F., & Rubin, D. B. (1991). Ignorability and coarse data. The Annals of Statistics, 19, 2244–2253. doi: 10.1214/aos/1176348396
Web of Science ®Google Scholar
Hristache, M., & Patilea, V. (2016). Semiparametric efficiency bounds for conditional moment restriction models with different conditioning variables. Econometric Theory, 32, 917–946. doi: 10.1017/S0266466615000080
Web of Science ®Google Scholar
Hristache, M., & Patilea, V. (2017). Conditional moment models with data missing at random. Biometrika, 104, 735–742. doi: 10.1093/biomet/asx025
Web of Science ®Google Scholar
Lavergne, P., & Patilea, V. (2013). Smooth minimum distance estimation and testing with conditional estimating equations: uniform in bandwidth theory. Journal of Econometrics, 177, 47–59. doi: 10.1016/j.jeconom.2013.05.006
Web of Science ®Google Scholar
Little, R., & Rubin, D. (2002). Statistical analysis with missing data. Wiley series in probability and mathematical statistics. Probability and mathematical statistics. John Wiley & Sons, Inc., Hoboken, New Jersey.
Google Scholar
Müller, U. U. (2009). Estimating linear functionals in nonlinear regression with responses missing at random. The Annals of Statistics, 37, 2245–2277. doi: 10.1214/08-AOS642
Web of Science ®Google Scholar
Prokhorov, A., & Schmidt, P. (2009). GMM redundancy results for general missing data problems. Journal of Econometrics, 151, 47–55. doi: 10.1016/j.jeconom.2009.03.010
Web of Science ®Google Scholar
Robins, J. M., & Gill, R. D. (1997). Non-response models for the analysis of non-monotone ignorable missing data. Statistics in Medicine, 16, 39–56. doi: 10.1002/(SICI)1097-0258(19970115)16:1<39::AID-SIM535>3.0.CO;2-D
PubMed Web of Science ®Google Scholar
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55. doi: 10.1093/biomet/70.1.41
Web of Science ®Google Scholar
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592. doi: 10.1093/biomet/63.3.581
Web of Science ®Google Scholar
Tan, Z. (2011). Efficient restricted estimators for conditional mean models with missing data. Biometrika, 98, 663–684. doi: 10.1093/biomet/asr007
Web of Science ®Google Scholar
Tsiatis, A. (2007). Semiparametric theory and missing data. New York: Springer-Verlag.
Google Scholar
van der Laan, M. J., & Robins, J. M. (2003). Unified methods for censored longitudinal data and causality. New York: Springer-Verlag.
Google Scholar
Wang, D., & Chen, S. X. (2009). Empirical likelihood for estimating equations with missing values. The Annals of Statistics, 37, 490–517. doi: 10.1214/07-AOS585
Web of Science ®Google Scholar
Wei, Y., Ma, Y., & Carroll, R. J. (2012). Multiple imputation in quantile regression. Biometrika, 99, 423–438. doi: 10.1093/biomet/ass007
PubMed Web of Science ®Google Scholar
Wooldridge, J. M. (2007). Inverse probability weighted estimation for general missing data problems. Journal of Econometrics, 141, 1281–1301. doi: 10.1016/j.jeconom.2007.02.002
Web of Science ®Google Scholar

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Order Reprints Request Corporate Permissions

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

Request Academic Permissions

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.

Download PDF

Share icon
Back to Top

Related research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.

People also read
Recommended articles
Cited by

To cite this article:

Reference style: APA Chicago Harvard

Citation copied to clipboard

Reference styles above use APA (6th edition), Chicago (16th edition) & Harvard (10th edition)

Download citation

Download a citation file in RIS format that can be imported by citation management software including EndNote, ProCite, RefWorks and Reference Manager.

Choose format: RIS BibTex RefWorks Direct Export

Choose options: Citation Citation & abstract Citation & references

Your download is now in progress and you may close this window

Did you know that with a free Taylor & Francis Online account you can gain access to the following benefits?

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research

Have an account?
Login now Don't have an account?
Register for free

Login or register to access this feature

Have an account?
Login now Don't have an account?
Register for free

Choose new content alerts to be informed about new research of interest to you
Easy remote access to your institution's subscriptions on any device, from any location
Save your searches and schedule alerts to send you new results
Export your search results into a .csv file to support your research

An equivalence result for moment equations when data are missing at random

ABSTRACT

1. Introduction

2. Equivalent moment model

3. Some examples revisited

3.1. Mean functionals with data missing at random

3.2. Quantile regression with data missing at random

4. Restricted estimators for quantile regressions and general conditional moment models with data missing at random

4.1. Simulation study

Table 1. Estimates of $E (∥ \hat{θ} - θ ∥^{2})$ over 1000 replicates when $τ = 0.75$ .

5. Is imputation really informative?

6. Conclusions

Disclosure statement

Notes on contributors

Marian Hristache

Valentin Patilea

References

Information for

Open access

Opportunities

Help and information

An equivalence result for moment equations when data are missing at random

ABSTRACT

1. Introduction

2. Equivalent moment model

3. Some examples revisited

3.1. Mean functionals with data missing at random

3.2. Quantile regression with data missing at random

4. Restricted estimators for quantile regressions and general conditional moment models with data missing at random

4.1. Simulation study

Table 1. Estimates of E(∥θˆ−θ∥2) over 1000 replicates when τ=0.75.

5. Is imputation really informative?

6. Conclusions

Disclosure statement

Additional information

Notes on contributors

Marian Hristache

Valentin Patilea

References

Reprints and Corporate Permissions

Academic Permissions

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date

Table 1. Estimates of $E (∥ \hat{θ} - θ ∥^{2})$ over 1000 replicates when $τ = 0.75$ .