
An equivalence result for moment equations when data are missing at random

Pages 199-207 | Received 19 Dec 2018, Accepted 21 Sep 2019, Published online: 09 Oct 2019

ABSTRACT

We consider general statistical models defined by moment equations when data are missing at random. Using inverse probability weighting, such a model is shown to be equivalent to a model for the observed variables only, augmented by a moment condition defined by the missing mechanism. Our framework covers a large class of parametric and semiparametric models and allows for missing responses, missing covariates and any combination of the two. The equivalence result is stated under minimal technical conditions and sheds new light on various aspects of interest in the missing data literature, such as efficiency bounds and the construction of efficient estimators, restricted estimators and imputation.

1. Introduction

Models defined by moment and conditional moment equations are widely used in statistics, biostatistics and econometrics; see, for instance, Ai and Chen (2003, 2012), Domínguez and Lobato (2004), and the references therein. Here, we investigate general moment or conditional moment equation models with missing data. The main idea we propose is that, under a missing at random assumption, the initial model with missing data is equivalent to an inverse probability weighting moment equations model for the complete observations, augmented by a moment condition defined by the missing mechanism. The equivalence, a generalisation of the GMM equivalence result of Graham (2011), is stated in terms of sets of probability measures. It has numerous implications and provides valuable insight, for instance on efficiency bound calculations and the construction of efficient estimators.

In the framework of missing data, the missing at random (MAR) assumption is presumably the most used when describing an ignorable missingness mechanism. However, this concept, first introduced by Rubin (1976), does not have the same meaning for everyone. For simplicity, let the full observations be i.i.d. replications of a vector L = (X,Y,Z) and let R = (RX,RY,RZ) ∈ {0,1}³ be a random vector such that each component takes the value 1 if we observe the corresponding component of L and 0 otherwise. For Rubin (1976) (see also, for example, Little & Rubin, 2002; Robins & Gill, 1997), MAR means that missingness depends only on the observed components, denoted by L(R), of L: (1) the conditional law L(R|L) of R given L is the same as the conditional law L(R|L(R)) of R given L(R). This concept was generalised to CAR, coarsening at random, by Heitjan and Rubin (1991) (see also, for example, van der Laan and Robins (2003)): L(C|L) is the same as L(C|ϕ(C,L)) for an always observable transformation ϕ(C,L) of the full data L and the censoring variable C. In the context of regression-like models, the MAR assumption is usually stated in a different and more restrictive way. A strongly ignorable selection mechanism (also called conditional independence, selection on observables, etc.) means that, assuming some components of L are always observed, (2) the conditional law L(R|L) of R given L is the same as the conditional law of R given the always observed components of L. This assumption was originally introduced by Rosenbaum and Rubin (1983) in the framework of randomised clinical trials, which corresponds in our simple example, with L = (X,Y,Z), to the case where, for example, X is always observed and one and only one of Y and Z is observed. This means that the selection vector R takes the form R = (1,D,1−D), where Y is observed iff D = 1 and Z is observed iff D = 0.
In this situation, MAR means P(D=1|X,Y,Z) = P(D=1|X,Y) and P(D=0|X,Y,Z) = P(D=0|X,Z), that is, 1 − P(D=1|X,Y,Z) = 1 − P(D=1|X,Z), or, equivalently, (3) D⊥⊥Z|X,Y and D⊥⊥Y|X,Z. Meanwhile, a strongly ignorable missingness mechanism writes P(D=1|X,Y,Z) = P(D=1|X), or, equivalently, (4) D⊥⊥(Y,Z)|X. Clearly, condition (4) implies condition (3), but the reverse is not true in general. In the present work we consider the case of i.i.d. replications of a vector containing missing components for which the same subvector is missing for the incomplete replicates. In this case the MAR assumption (1) and the strongly ignorable MAR assumption (2) coincide (and are equivalent to CAR), as is also the case, for example, in Cheng (1994), Tsiatis (2007), and Graham (2011), among others.

Other MAR-related assumptions appear in the literature. For instance, when the response Y is missing while X and Z are observed, Wei, Ma, and Carroll (2012) consider the assumption RY⊥⊥(X,Y)|Z, which is stronger than the MAR assumption (2) commonly used for regression models. Another assumption for the missingness mechanism is introduced in Wooldridge (2007): W = (X,Y) and S ∈ {0,1} is a random variable such that W and Z are observed whenever S = 1, and S⊥⊥W|Z. Wooldridge's assumption is more general than the MAR condition (2), where Z is supposed to be always observed. Indeed, Wooldridge (2007) does not suppose that W and/or Z are missing when S = 0.

The paper is organised as follows. The main equivalence result is stated in Section 2. In Section 3, we revisit some examples considered in the literature in the MAR setup: estimating mean functionals in parametric and nonparametric regressions, and quantile regression with missing responses and/or covariates. For these examples, our equivalence result suggests new ways of calculating efficiency bounds and constructing efficient estimators, using for instance the GMM, empirical likelihood approaches, the SMD approach of Ai and Chen (2007), or the kernel-based method of Lavergne and Patilea (2013). In Section 4 we reinterpret some classes of so-called restricted estimators; see, for instance, Tsiatis (2007) and Tan (2011). Finally, in Section 5 we use our general result to discuss a common belief that (multiple) imputation is necessary in order to capture all the information from the partially observed data.

2. Equivalent moment model

The following statement is a version of Theorems 1 and 2 in Hristache and Patilea (2017). The proof is very similar and hence is omitted. In the following, vectors are column matrices and, for any matrix A, Aᵀ denotes its transpose.

Theorem 2.1

Let M1 and M2 be two models defined for random vectors (D,W,V,U) ∈ {0,1} × R^dW × R^dV × R^dU as follows:
(5) M1: E[ρj(γ,W,V,U)] = 0, j ∈ J, D⊥⊥{U,V}|W,
and
(6) M2: E[(D/π(W)) ρj(γ,W,V,U)] = 0, j ∈ J, E[D/π(W) − 1 | V,W] = 0,
where γ ∈ Γ is an unknown (possibly infinite dimensional) parameter, ρj : Γ × R^dW × R^dV × R^dU → R, for j ∈ J, is a collection of known measurable functions, and π is an unknown measurable function such that π(W) > 0 almost surely.

The models M1 and M2 are equivalent if restricted to the laws of (D,W,V,DU); more precisely,

  1. (D,W,V,U) ∈ M1 ⟹ (D,W,V,U) ∈ M2;

  2. (D,W,V,U) ∈ M2 ⟹ there exists (D̃,W̃,Ṽ,Ũ) ∈ M1 such that (D̃,W̃,Ṽ,D̃Ũ) and (D,W,V,DU) have the same distribution.

Remarks

  1. The parameter γ in model M1 could include both parameters of interest and nuisance parameters.

  2. The function π(·), usually called the propensity score, could be considered completely unknown and modelled nonparametrically, or modelled using a parametric model. With an estimate of π(·) obtained from the second equation of the model M2 at hand, one could use existing moment equation approaches for the estimation of the parameters in the first equation of M2. See our Example 3.1.

  3. The link of this theorem with models where data are missing at random is made by considering that the vector U is observed if and only if D = 1. The theorem then basically says that at the observational level, that is, for the laws of the observed vector (D,W,V,DU), the two models M1 and M2 are equivalent. As a consequence, inference for the law of (D,W,V,U) in the model M1, a moment conditions model under an assumption of data missing at random, could be done based on the model M2, which is defined using only the observed part (D,W,V,DU) of the vector (D,W,V,U). In particular, efficiency bound calculations and efficient estimator constructions could be done in the model M2, which in many cases could be much easier.

  4. The underlying condition 'DU is always observed' includes the usual case (D = 0 if U is not observed, D = 1 if U is observed), but it is more general. When D = 1, one observes the value of U; when D = 0, U could be observed or not, since whatever the value of U, DU = 0.
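As a quick sanity check of the equivalence, the sketch below simulates a MAR mechanism with a known propensity π(W) and compares the full-data moment E[ρ] with its inverse probability weighted counterpart E[(D/π(W))ρ], which uses the complete cases only; the moment function, propensity and distributions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
W = rng.normal(size=n)
U = W + rng.normal(size=n)         # U depends on W plus independent noise
pi = 1.0 / (1.0 + np.exp(-W))      # propensity P(D = 1 | W): MAR, depends on W only
D = rng.binomial(1, pi)            # so D is independent of U given W

rho = U - W                        # illustrative moment function with E[rho] = 0
full_moment = rho.mean()                 # infeasible: uses every U
ipw_moment = (D / pi * rho).mean()       # feasible: uses U only where D = 1
print(full_moment, ipw_moment)           # both close to 0
```

Both sample moments concentrate around the same value, which is the empirical content of points 1 and 2 of the theorem.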

3. Some examples revisited

In this section we present two examples of models already studied in the literature for which our approach gives new insights and sometimes allows for simpler methods of obtaining efficiency bounds and asymptotically efficient estimators. The guiding principle is to use Theorem 2.1 to put the model of interest, in the presence of a MAR mechanism, in the equivalent form
(7) E[g1(θ,α,X,Y,Z)|X] = 0, E[g2(α,X,Y,Z)|X,Z] = 0,
where the two sets of equations are orthogonal, meaning that E[g1(θ,α,X,Y,Z) g2(α,X,Y,Z)ᵀ | X,Z] = 0. The equivalent model (7) has a sequential moment structure that allows one to compute the efficiency bound; see Ai and Chen (2012). Moreover, the finite dimensional interest parameter θ can be efficiently estimated from the first set of equations, with the (possibly infinite dimensional) nuisance parameter α known or suitably estimated from the second set. A similar statement on the efficient estimation of θ, in the particular case of a finite dimensional α and without conditioning on X and on (X,Z), can be found in Theorem 2.2, point 8, of Prokhorov and Schmidt (2009).

3.1. Mean functionals with data missing at random

Consider the problem of estimating the mean of functionals of the variables in a parametric regression model with missing responses:
(8) E[h(X,Y) − θ] = 0, E[Y − r(X,α)|X] = 0.
The parameter of interest here is θ = E[h(X,Y)], where h(·,·) is some given square-integrable function; see Müller (2009). Hristache and Patilea (2017) considered the same framework and focused on the case where h(X,Y) does not depend on X. Here we investigate the general case, where h(X,Y) may also depend on X. Some usual examples are the mean of the response variable (h(x,y) = y), the second-order moment of the response (h(x,y) = vec(yyᵀ)), and the cross-product of the response and the covariate vector (h(x,y) = vec(yxᵀ)). (Here, vec(·) is the vectorisation operator that transforms a matrix into a column vector by stacking the columns of the matrix.) For simplicity, we take Y to be real-valued in the remainder of this section.

The regression function r(x,α) has a known (parametric) form, X is always observed, Y is observed only when D = 1, and a MAR assumption holds: D⊥⊥Y|X. With π(x) = P(D=1|X=x), the model can be written, at the observational level, in the following equivalent form:
(9) E[(D/π(X))(h(X,Y) − θ)] = 0, E{D[Y − r(X,α)] | X} = 0, E[D/π(X) − 1 | X] = 0.
The last two equations are orthogonal, since E{[D/π(X) − 1] D[Y − r(X,α)] | X} = [1/π(X) − 1] E{D[Y − r(X,α)] | X} = 0, so the model is also equivalent to the one defined by the following system of orthogonal equations, where σ²(X) stands for the conditional variance V(Y|X):
(10) E{(D/π(X))[h(X,Y) − θ] − [σ²(X)π(X)]⁻¹ E[(D/π(X)) h(X,Y)(Y − r(X,α)) | X] D[Y − r(X,α)] − E[(D/π(X))(h(X,Y) − θ) | X] [D/π(X) − 1]} = 0, E{D[Y − r(X,α)] | X} = 0, E[D/π(X) − 1 | X] = 0.
Solving for θ, we get θ = E[Φ(Y,X,D;α,σ²,π,η1,η2)], where
Φ(Y,X,D;α,σ²,π,η1,η2) = (D/π(X)) h(X,Y) − E[(D/π(X)) h(X,Y) | X] [D/π(X) − 1] − [σ²(X)π(X)]⁻¹ E[(D/π(X)) h(X,Y)(Y − r(X,α)) | X] D[Y − r(X,α)],
η1(X) = E[D h(X,Y) | X] and η2(X) = η2(X;α) = E[D h(X,Y)(Y − r(X,α)) | X]. Let α̂ be an estimator of α obtained in the model. With the variance σ²(·) and the functions η1(·) and η2(·;·) estimated nonparametrically, the plug-in estimator θ̂ = n⁻¹ Σᵢ₌₁ⁿ Φ(Yᵢ,Xᵢ,Dᵢ;α̂,σ̂²,π̂,η̂1,η̂2) would be efficient. Since the first equation in system (10) is orthogonalised with respect to the last one, one could use a parametric model for the propensity score π(·) without affecting the efficiency bound.
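To make the weighting concrete, here is a minimal sketch of the basic IPW estimate of θ = E[h(X,Y)] with h(x,y) = y and responses missing at random; it uses the true propensity and none of the efficiency-improving correction terms of Φ, and the data-generating design is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)
eps = rng.normal(size=n)
Y = 1.0 + 2.0 * X + eps                   # r(x, alpha) = alpha0 + alpha1 * x, so E[Y] = 1
pi = 1.0 / (1.0 + np.exp(-(0.5 + X)))     # known propensity P(D = 1 | X): MAR, D independent of Y given X
D = rng.binomial(1, pi)                   # Y is treated as observed only when D = 1

theta_full = Y.mean()                     # infeasible full-data estimate of theta = E[Y]
theta_ipw = (D / pi * Y).mean()           # basic IPW estimate using the observed Y only
print(theta_full, theta_ipw)              # both close to 1
```

The basic IPW estimate is consistent but not efficient; the correction terms in Φ above are what remove the efficiency loss.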

3.2. Quantile regression with data missing at random

A particular setting of quantile regression with data missing at random is considered in Wei et al. (2012). For 0 < τ < 1, the conditional quantile Qτ(Y|X,Z) of the always observed response Y given the regressor vectors Z (always observed) and X (observed iff D = 1) is assumed to be linear,
(11) Qτ(Y|X,Z) = Xᵀβ1,τ + Zᵀβ2,τ,
and the missingness mechanism is defined by the strong missing at random condition
(12) D⊥⊥(X,Y)|Z.
Taking in (6) U = X, V = Y, W = Z and ρj(βτ,W,V,U) = (Xᵀ,Zᵀ)ᵀ[𝟙{Y − Xᵀβ1,τ − Zᵀβ2,τ ≤ 0} − τ] × aj(U,W) ≕ ρ(X,Y,Z,βτ) × aj(X,Z), j ∈ N, where the family of functions {aj}, j ∈ N, spans L²(X,Z), the model defined by (11) and (12) can be written in the following equivalent form:
(13) E[D ρ(Y,X,Z,βτ) | X,Z] = 0, E[D/π(Z) − 1 | Z] = 0.
The two sets of equations being already orthogonal (with respect to the σ-field σ(X,Z)), in this situation we can efficiently estimate the parameter βτ = (β1,τᵀ,β2,τᵀ)ᵀ from the complete data only, that is, from the model defined by (11), keeping for the statistical analysis only the observations for which all the components of the vector (Y,X,Z) are observed. The gain in efficiency observed in the simulation experiment of Wei et al. (2012) for their multiple imputation improved estimator comes, in our opinion, from the supplementary parametric assumption on the form of the conditional density of X given Z (see their Assumption 4).
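The first equation of (13) can be checked by simulation: under D⊥⊥(X,Y)|Z, the complete-case quantile moment vanishes at the true coefficients. The sketch below uses an illustrative linear design with standard normal errors (the coefficients and distributions are assumptions, not taken from Wei et al.).

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
n = 200_000
tau = 0.75
Z = rng.normal(size=n)
X = 0.5 * Z + rng.normal(size=n)
eps = rng.normal(size=n)
q_tau = NormalDist().inv_cdf(tau)          # tau-quantile of the N(0,1) error
Y = X + Z + eps                            # so Q_tau(Y | X, Z) = X + Z + q_tau
pi = 1.0 / (1.0 + np.exp(-Z))              # missingness of X driven by Z only: D indep. of (X, Y) given Z
D = rng.binomial(1, pi)

resid = Y - X - Z - q_tau                  # quantile residual at the true coefficients
psi = D * ((resid <= 0).astype(float) - tau)
# unconditional versions of E[D rho | X, Z] = 0, with instruments 1, X and Z
print(psi.mean(), (psi * X).mean(), (psi * Z).mean())   # all close to 0
```

All three sample moments are close to zero, illustrating why the complete cases alone identify βτ in this setting.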

A more general linear quantile regression model defined by (11) with data missing at random is considered in Chen, Wan, and Zhou (2014). With their notations, we have
(14) Y = Zᵀθ(τ) + ε, P(ε ≤ 0 | Z) = τ, 0 < τ < 1,
for the full data model. They also denote by X the always observed components of the vector (Y,Zᵀ)ᵀ and by X^c the components of the same vector that are observed iff the binary variable D takes the value 1, and use the 'standard' missing at random assumption P(D=1|Y,Z) = P(D=1|X,X^c) = P(D=1|X) = π(X). This fits our framework by taking U = X^c, V = 1, W = X and ρj(θ(τ),W,V,U) = Z[𝟙{Y − Zᵀθ(τ) ≤ 0} − τ] × aj(U,W) ≕ ρ(Y,Z,θ(τ)) × aj(Z), j ∈ N, where the family of functions {aj}, j ∈ N, spans L²(Z). The equivalent moment equations model, at the observational level, can be written as
(15) E[(D/π(X)) Z(𝟙{Y − Zᵀθ(τ) ≤ 0} − τ) | Z] = 0, E[D/π(X) − 1 | X] = 0.
The information bound for this model is given in Hristache and Patilea (2016). It cannot be calculated explicitly, except in some special cases, which include the missing responses as before or the case where X and/or Z are discrete. It is different from the information bound given in Chen, Hong, and Tarozzi (2008), which corresponds to a model defined by an unconditional quantile moment and a MAR assumption, and could be represented equivalently in the form
(16) E[(D/π(X)) Z(𝟙{Y − Zᵀθ(τ) ≤ 0} − τ)] = 0, E[D/π(X) − 1 | X] = 0.
Models (15) and (16) are quite different and so are the corresponding efficiency bounds, so that no estimation procedure given in Chen et al. (2014) could be efficient in their linear quantile regression model (14) with data missing at random.

4. Restricted estimators for quantile regressions and general conditional moment models with data missing at random

The model defined by the regression-like equation E[ρ(θ,Y,X,V)|X,V] = 0 and the MAR selection mechanism P(D=1|Y,X,V,W) = P(D=1|W) = π(W) is equivalent, at the observational level, to the following model defined by conditional moment equations:
P: E[(D/π(W)) ρ(θ,Y,X,V) | X,V] = 0, E[D/π(W) − 1 | W] = 0.
This framework includes many situations. For instance, taking W = (Y,V,Z) we obtain the case in which some regressors (conditioning variables) X are missing, while with W = (X,V,Z) we cover the case of missing responses. Splitting Y into an observed subvector Yo and a not always observed subvector Yu, with W = (Yo,V,Z) this corresponds to the case where both some responses and some covariates are missing. In all these examples, U is the vector of not always observed components of the data vector.

For the model P(1): E[(D/π(W)) ρ(θ,Y,X,V) | X,V] = 0, denoting by P0 the true law of (Y,X,V,Z), the tangent space is
T(1) = {s ∈ {L²(P0)}^d : E(s) = 0, E[(D/π(W)) ρ(θ,Y,X,V) s(Y,X,V,Z)ᵀ | X,V] = 0}.
For the model P(2): E[D/π(W) − 1 | W] = 0, the tangent space is
T(2) = {s ∈ {L²(P0)}^d : E(s) = 0, E[(D/π(W) − 1) s(Y,X,V,Z)ᵀ | W] = 0}.
The tangent space T of P = P(1) ∩ P(2) is (see Hristache & Patilea, 2016) T = T(1) ∩ T(2). We obtain the efficient score S̄θ by projecting the score Sθ on T⊥, S̄θ = Π(Sθ | T⊥) = Π(Sθ | cl(T(1)⊥ + T(2)⊥)), which gives the following solution:
S̄θ = a1(X,V)(D/π(W)) ρ(θ,Y,X,V) + a2(W)[D/π(W) − 1] ∈ T(1)⊥ + T(2)⊥,
where
a1(X,V) = {E(∂θρᵀ | X,V) + E[E(a1ρ | W)((1−π)/π) ρᵀ | X,V]} E[(1/π(W)) ρρᵀ | X,V]⁻¹, a2(W) = E[a1(X,V) ρ | W].

Remark

S̄θ is also the efficient score in the model
P′: E[a1(X,V)(D/π(W)) ρ(θ,Y,X,V)] = 0, E[a2(W)(D/π(W) − 1)] = 0,
or in the model defined by the single moment condition E[a1(X,V)(D/π(W)) ρ(θ,Y,X,V) + a2(W)(D/π(W) − 1)] = 0. As shown in Hristache and Patilea (2016), a1 satisfies an equation of the form a1(X,V) = γ(X,V) + T(a1(X,V)), with T a contraction operator. The solution of this equation is unique, but in order to obtain it one needs to use nonparametric estimators at each step of the iterative procedure. An alternative approach is to consider finite dimensional subspaces S1 ⊂ T(1)⊥ and S2 ⊂ T(2)⊥ when calculating the 'efficient score', leading to an approximately efficient score. We obtain in this way what are known in the literature as restricted estimators. We can write
T(1)⊥ = {s = a1(X,V)(D/π(W)) ρ(θ,Y,X,V) : a1 ∈ L²(P0)},
and a finite dimensional S1 ⊂ T(1)⊥ corresponds to functions a1(1), …, a1(k) ∈ L²(P0) such that S1 = lin{a1(i)(X,V)(D/π(W)) ρ(θ,Y,X,V) : 1 ≤ i ≤ k}, so that
S1⊥ = {s ∈ {L²(P0)}^d : E[a1(i)(D/π) ρ sᵀ] = 0, 1 ≤ i ≤ k};
compare with T(1) = {s ∈ {L²(P0)}^d : E[(D/π) ρ sᵀ | X,V] = 0}. Similarly for S2 ⊂ T(2)⊥:
T(2)⊥ = {s = a2(W)(D/π(W) − 1) : a2 ∈ L²(P0)},
and a finite dimensional S2 corresponds to a2(1), …, a2(l) ∈ L²(P0) with S2 = lin{a2(j)(W)(D/π(W) − 1) : 1 ≤ j ≤ l} and
S2⊥ = {s ∈ {L²₀(P0)}^d : E[a2(j)(D/π(W) − 1) sᵀ] = 0, 1 ≤ j ≤ l}.
An optimal class 1 restricted estimator (see Tan, 2011; Tsiatis, 2007) is a solution of the approximated efficient score equation
E[ā1(1)(X,V)(D/π(W)) ρ(θ,Y,X,V) + ā2(1)(W)(D/π(W) − 1)] = 0,
where ā1(1) and ā2(1) are given by
Π(Sθ | S1 + S2) = ā1(1)(X,V)(D/π(W)) ρ(θ,Y,X,V) + ā2(1)(W)(D/π(W) − 1).
In fact, this projection is the efficient score in the following moment equations model:
P″: E[a1(1)(X,V)(D/π(W)) ρ(θ,Y,X,V)] = 0, …, E[a1(k)(X,V)(D/π(W)) ρ(θ,Y,X,V)] = 0, E[a2(1)(W)(D/π(W) − 1)] = 0, …, E[a2(l)(W)(D/π(W) − 1)] = 0.
This allows a new, simple and intuitive interpretation of the optimal class 1 restricted estimators as efficient estimators in a larger model, obtained from the initial one by using appropriate 'instruments' to transform the conditional moment equations into a (growing) number of unconditional moment conditions.
Another advantage of this new perspective is the access to the most commonly used methods of obtaining efficient estimators in moment equations models such as GMM, SMD (see Lavergne & Patilea, Citation2013) or empirical likelihood estimators.

Similar procedures can be used for class 2 restricted estimators, based on Π(Sθ | S1 + T(2)⊥) = ā1(2)(X,V)(D/π(W)) ρ(θ,Y,X,V) + ā2(2)(W)(D/π(W) − 1), and class 3 restricted estimators (Tan, 2011), based on Π(Sθ | T(1)⊥ + S2) = ā1(3)(X,V)(D/π(W)) ρ(θ,Y,X,V) + ā2(3)(W)(D/π(W) − 1).

4.1. Simulation study

The approach on restricted estimators is illustrated in a setting already considered by Chen, Wan, and Zhou (2015); see their Example 1, scenario S2. With the notations of the previous section, the data are generated from the following model:
(17) Y = θ0 + θ1X + θ2V + [1 + 0.5(X+V)]ε, ε ∼ N(0,1),
where (θ0,θ1,θ2) = (1,1,1) and (X,V) follows a centred bivariate normal distribution with unit variances and correlation equal to 0.5. The parameter of interest here is the coefficient vector (θ0(τ),θ1(τ),θ2(τ)) of the conditional quantile of Y given X and V:
(18) Qτ(Y|X,V) = θ0(τ) + θ1(τ)X + θ2(τ)V,
with θ0(τ) = θ0 + Qτ(ε), θ1(τ) = θ1 + 0.5Qτ(ε), θ2(τ) = θ2 + 0.5Qτ(ε), where Qτ(ε) is the τth quantile of ε, τ ∈ (0,1). Herein, we only report the case τ = 0.75. The variables Y and V are always observed, while X is observed if and only if D = 1, where D is a Bernoulli random variable such that P(D=1|Y,V) = 0.4[1 + sin²(Y−V)]𝟙{|Y−V| ≤ 1} + 𝟙{|Y−V| > 1} = π(W), with W = (Y,V). The model for the fully observed data is defined by the regression-like equation E[ρ(θ,Y,X,V)|X,V] = 0, where ρ(θ,Y,X,V) = 𝟙{Y − θ0 − θ1X − θ2V ≤ 0} − τ. Under the MAR selection mechanism P(D=1|Y,X,V) = P(D=1|Y,V) = π(W), it is equivalent, at the observational level, to the following model defined by conditional moment equations:
P: E[(D/π(W))(𝟙{Y − θ0 − θ1X − θ2V ≤ 0} − τ) | X,V] = 0, E[D/π(W) − 1 | W] = 0.
The restricted estimators considered are obtained by the generalised method of moments in the following models Ps, s ∈ {a,b,c,d,e,f}, which contain the model P:
Ps: E[(D/πs(φ,Y,V))(𝟙{Y − θ0 − θ1X − θ2V ≤ 0} − τ) a1s(k)(X,V)] = 0, k ∈ {1,…,ks}, E[(D/πs(φ,Y,V) − 1) a2s(l)(Y,V)] = 0, l ∈ {1,…,ls}, where:

  1. πa ≡ 1, D ≡ 1, ka = 3, a1a(1)(X,V) = 1, a1a(2)(X,V) = X, a1a(3)(X,V) = V (no missing data; 1, X and V as instruments);

  2. πb ≡ 1, D ≡ 1, kb = 3, a1b(1)(X,V) = (1+X²)⁻¹, a1b(2)(X,V) = (1+V²)⁻¹, a1b(3)(X,V) = (1+|X|+|V|)⁻² (no missing data; (1+X²)⁻¹, (1+V²)⁻¹ and (1+|X|+|V|)⁻² as instruments);

  3. πc(Y,V) = 0.4[1 + sin²(Y−V)]𝟙{|Y−V| ≤ 1} + 𝟙{|Y−V| > 1}, kc = 3, a1c(1)(X,V) = (1+X²)⁻¹, a1c(2)(X,V) = (1+V²)⁻¹, a1c(3)(X,V) = (1+|X|+|V|)⁻² (true propensity score; (1+X²)⁻¹, (1+V²)⁻¹ and (1+|X|+|V|)⁻² as instruments);

  4. πd(Y,V) = {1 + exp[−(φ0+φ1Y+φ2V)]}⁻¹, with φ0, φ1 and φ2 estimated from a logistic regression, kd = 3, a1d(1)(X,V) = (1+X²)⁻¹, a1d(2)(X,V) = (1+V²)⁻¹, a1d(3)(X,V) = (1+|X|+|V|)⁻² (propensity score estimated by a logistic regression on Y and V; (1+X²)⁻¹, (1+V²)⁻¹ and (1+|X|+|V|)⁻² as instruments for the IPW quantile equation);

  5. πe(Y,V) = {1 + exp[−(φ0+φ1Y+φ2V)]}⁻¹, ke = 3, a1e(1)(X,V) = (1+X²)⁻¹, a1e(2)(X,V) = (1+V²)⁻¹, a1e(3)(X,V) = (1+|X|+|V|)⁻², le = 3, a2e(1)(Y,V) = 1, a2e(2)(Y,V) = Y, a2e(3)(Y,V) = V ((1+X²)⁻¹, (1+V²)⁻¹ and (1+|X|+|V|)⁻² as instruments for the IPW quantile equation; 1, Y and V as instruments for the propensity score equation);

  6. πf(Y,V) = {1 + exp{−[φ0 + φ1(Y−V) + φ2(Y−V)²]}}⁻¹, kf = 3, a1f(1)(X,V) = (1+X²)⁻¹, a1f(2)(X,V) = (1+V²)⁻¹, a1f(3)(X,V) = (1+|X|+|V|)⁻², lf = 3, a2f(1)(Y,V) = 1, a2f(2)(Y,V) = Y−V, a2f(3)(Y,V) = (Y−V)² ((1+X²)⁻¹, (1+V²)⁻¹ and (1+|X|+|V|)⁻² as instruments for the IPW quantile equation; 1, Y−V and (Y−V)² as instruments for the propensity score equation).
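The propensity-score moment equations of the models Ps can be checked directly by simulation. The sketch below draws data from a heteroskedastic quantile design in the spirit of (17), generates D from a logistic propensity in (Y,V) of the form used in Pd and Pe (the coefficients φ here are illustrative assumptions), and verifies that E[(D/π(W) − 1) a2(Y,V)] ≈ 0 for the Pe instruments 1, Y and V.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
cov = [[1.0, 0.5], [0.5, 1.0]]            # unit variances, correlation 0.5
X, V = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
eps = rng.normal(size=n)
Y = 1.0 + X + V + (1.0 + 0.5 * (X + V)) * eps   # heteroskedastic design in the spirit of (17)

# logistic propensity in (Y, V), as in models Pd and Pe; coefficients are illustrative
pi = 1.0 / (1.0 + np.exp(-(0.4 + 0.2 * Y + 0.2 * V)))
D = rng.binomial(1, pi)

w = D / pi - 1.0
for a in (np.ones(n), Y, V):              # instruments 1, Y and V of model Pe
    print((w * a).mean())                 # each close to 0
```

In the GMM implementation these sample moments, together with the IPW quantile moments, form the estimating equations that are minimised jointly.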

The estimates of the MSE obtained in the case τ = 0.75 from 1000 replications, with sample size n ∈ {200, 400, …, 1400, 1600}, are given in Table 1.

Table 1. Estimates of E(‖θ̂ − θ‖²) over 1000 replicates when τ = 0.75.

Note that none of the GMM estimators in the models Ps could be efficient in the initial model P, but only approximately efficient if the instruments are suitably chosen, which could be a delicate point in practice. Here we observe that the instruments a1b(k), involved in the first equations of the model Pb, perform better than the instruments a1a(k) used in Pa. We observe a similar phenomenon for the propensity score equations when looking at the columns Pd, Pe and Pf. The case Pd corresponds to common practice, where one trusts the logistic regression for the propensity score. The cases Pe and Pf correspond to our approach based on instruments, with more effective instruments in the latter case. The non-orthogonality of the quantile model equations and the propensity score equations could explain the better results in Pf. A joint estimation of the two sets of equations with effective instruments could improve over the common practice. Next, let us notice that the models Pc and Pf are similar: we use the same instruments for the first equations in Ps. Moreover, in Pc we use the true propensity score, while in Pf we use an estimated propensity score obtained from a model that is somehow close to the true propensity score. As the two equations in the model Ps are not orthogonal, estimating the propensity score could improve the asymptotic variance of the estimator θ̂. This is related to the so-called puzzling phenomenon noticed by Prokhorov and Schmidt (2009). Here, even if the propensity score model is slightly wrong, there is still a gain in MSE. Let us also note the surprisingly good results for the model Pf, in which log[π/(1−π)] is approximated by a quadratic function of Y−V. Using the same instruments for the conditional quantile equations, the estimation with missing data is even better than in the case with full data (compare the results for model Pf to those for model Pb). This could be explained by the fact that in model Pb we do not use the optimal instruments, which should be proportional to the conditional density of the error term at the origin. The weighting introduced by the propensity score seems, in some sense, to compensate for the non-optimal instruments. This suggests further possible improvements based on other choices of instrumental variables in order to approach efficiency.

5. Is imputation really informative?

Multiple imputation is a widely used method to generate substitute values when data are missing. However, under the MAR assumption, the interest of multiple imputation in the context of conditional moment restriction models is at least questionable, as discussed in the following.

Consider that (D,W,V,DU) is always observed and consider the MAR assumption
(19) (U,V)⊥⊥D|W.
Then, any substitute observation generated from the law of Ũ is adequate to replace a missing U, where the law of Ũ should be such that L(Ũ|W̃,Ṽ,D̃=0) = L(U|W,V,D=1) = L(Ũ|W̃,Ṽ,D̃=1). (Here, L(V1 | V2) denotes the conditional law of V1 given V2.) Since, in general, the law L(U|W,V,D=1) is unknown, one can estimate it, parametrically or nonparametrically, and generate substitute observations from this estimate. This is the so-called parametric or nonparametric imputation. See, for instance, Wang and Chen (2009), Wei et al. (2012), or Chen and Van Keilegom (2013) for some nonparametric imputation applications.

The equivalence established by Theorem 2.1 for models defined by moment restrictions implies that all the information on the parameter θ in the initial model under the MAR assumption (19) is contained in the model defined by the equations (6). Let us point out that the last equation of the model (6) includes the information contained in the incomplete observations: indeed, to estimate π(·), parametrically or nonparametrically, one uses all the observations of W. This remark opens new perspectives for defining estimators of θ without using substitute observations. Moreover, it sheds new light on a common justification used in the literature, namely that imputation is necessary in order to capture the information contained in the partially observed data.
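A small numerical sketch of this remark, with an assumed logistic propensity (all coefficients illustrative): the propensity-score fit below uses every row of W, including the incomplete observations with D = 0, so the second equation of model M2 already taps the partially observed data.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
W = rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-(0.3 + 0.8 * W)))
D = rng.binomial(1, pi_true)       # U itself (not simulated here) would be observed iff D = 1

# logistic regression of D on (1, W) by Newton-Raphson, using ALL n rows of W,
# including the incomplete observations with D = 0
Z = np.column_stack([np.ones(n), W])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-Z @ beta))
    grad = Z.T @ (D - p)                          # score of the logistic likelihood
    hess = Z.T @ (Z * (p * (1.0 - p))[:, None])   # Fisher information
    beta = beta + np.linalg.solve(hess, grad)
print(beta)                        # close to the true coefficients (0.3, 0.8)
```

No substitute values for the missing U are generated at any point; the incomplete rows contribute through the propensity-score equation alone.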

6. Conclusions

We consider a statistical model defined by an arbitrary number of moment equations. Our framework includes a large class of models defined through conditional and/or unconditional moments. Next, we assume that some variables are missing at random. In this setup, we present a model equivalence result: the initial statistical model together with the MAR mechanism is equivalent to a moment equations model for the observed variables. Using the equivalent model could greatly simplify estimation and inference in missing data problems. We discuss several consequences for widely used models, including quantile regressions.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Notes on contributors

Marian Hristache

Marian Hristache is Associate Professor, Ecole Nationale de la Statistique et de l'Analyse de l'Information (Ensai), Rennes, France (E-mail: [email protected]).

Valentin Patilea

Valentin Patilea is Professor, Ecole Nationale de la Statistique et de l'Analyse de l'Information (Ensai), and Center for Research in Economics and Statistics (CREST), Rennes, France (E-mail: [email protected]).

References

  • Ai, C., & Chen, X. (2003). Efficient estimation of models with conditional moment restrictions containing unknown functions. Econometrica, 71, 1795–1843. doi: 10.1111/1468-0262.00470
  • Ai, C., & Chen, X. (2007). Estimation of possibly misspecified semiparametric conditional moment restriction models with different conditioning variables. Journal of Econometrics, 141, 5–43. doi: 10.1016/j.jeconom.2007.01.013
  • Ai, C., & Chen, X. (2012). The semiparametric efficiency bound for models of sequential moment restrictions containing unknown functions. Journal of Econometrics, 170, 442–457. Thirtieth Anniversary of Generalized Method of Moments. doi: 10.1016/j.jeconom.2012.05.015
  • Chen, X., Hong, H., & Tarozzi, A. (2008). Semiparametric efficiency in GMM models with auxiliary data. The Annals of Statistics, 36, 808–843. doi: 10.1214/009053607000000947
  • Chen, S. X., & Van Keilegom, I. (2013). Estimation in semiparametric models with missing data. Annals of the Institute of Statistical Mathematics, 65, 785–805. doi: 10.1007/s10463-012-0393-6
  • Chen, X., Wan, A. T. K., & Zhou, Y. (2014). Efficient quantile regression analysis with missing observations. Journal of the American Statistical Association, 110(510), 723–741. doi: 10.1080/01621459.2014.928219
  • Chen, X., Wan, A. T. K., & Zhou, Y. (2015). Efficient quantile regression analysis with missing observations. Journal of the American Statistical Association, 110, 723–741. doi: 10.1080/01621459.2014.928219
  • Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89, 81–87. doi: 10.1080/01621459.1994.10476448
  • Domínguez, M. A., & Lobato, I. N. (2004). Consistent estimation of models defined by conditional moment restrictions. Econometrica, 72, 1601–1615. doi: 10.1111/j.1468-0262.2004.00545.x
  • Graham, B. S. (2011). Efficiency bounds for missing data models with semiparametric restrictions. Econometrica, 79, 437–452. doi: 10.3982/ECTA7379
  • Heitjan, D. F., & Rubin, D. B. (1991). Ignorability and coarse data. The Annals of Statistics, 19, 2244–2253. doi: 10.1214/aos/1176348396
  • Hristache, M., & Patilea, V. (2016). Semiparametric efficiency bounds for conditional moment restriction models with different conditioning variables. Econometric Theory, 32, 917–946. doi: 10.1017/S0266466615000080
  • Hristache, M., & Patilea, V. (2017). Conditional moment models with data missing at random. Biometrika, 104, 735–742. doi: 10.1093/biomet/asx025
  • Lavergne, P., & Patilea, V. (2013). Smooth minimum distance estimation and testing with conditional estimating equations: uniform in bandwidth theory. Journal of Econometrics, 177, 47–59. doi: 10.1016/j.jeconom.2013.05.006
  • Little, R., & Rubin, D. (2002). Statistical analysis with missing data. Wiley series in probability and mathematical statistics. Probability and mathematical statistics. John Wiley & Sons, Inc., Hoboken, New Jersey.
  • Müller, U. U. (2009). Estimating linear functionals in nonlinear regression with responses missing at random. The Annals of Statistics, 37, 2245–2277. doi: 10.1214/08-AOS642
  • Prokhorov, A., & Schmidt, P. (2009). GMM redundancy results for general missing data problems. Journal of Econometrics, 151, 47–55. doi: 10.1016/j.jeconom.2009.03.010
  • Robins, J. M., & Gill, R. D. (1997). Non-response models for the analysis of non-monotone ignorable missing data. Statistics in Medicine, 16, 39–56. doi: 10.1002/(SICI)1097-0258(19970115)16:1<39::AID-SIM535>3.0.CO;2-D
  • Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55. doi: 10.1093/biomet/70.1.41
  • Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592. doi: 10.1093/biomet/63.3.581
  • Tan, Z. (2011). Efficient restricted estimators for conditional mean models with missing data. Biometrika, 98, 663–684. doi: 10.1093/biomet/asr007
  • Tsiatis, A. (2007). Semiparametric theory and missing data. New York: Springer-Verlag.
  • van der Laan, M. J., & Robins, J. M. (2003). Unified methods for censored longitudinal data and causality. New York: Springer-Verlag.
  • Wang, D., & Chen, S. X. (2009). Empirical likelihood for estimating equations with missing values. The Annals of Statistics, 37, 490–517. doi: 10.1214/07-AOS585
  • Wei, Y., Ma, Y., & Carroll, R. J. (2012). Multiple imputation in quantile regression. Biometrika, 99, 423–438. doi: 10.1093/biomet/ass007
  • Wooldridge, J. M. (2007). Inverse probability weighted estimation for general missing data problems. Journal of Econometrics, 141, 1281–1301. doi: 10.1016/j.jeconom.2007.02.002
