SHORT COMMUNICATIONS

Semiparametric propensity weighting for nonignorable nonresponse: a discussion of ‘Statistical inference for nonignorable missing data problems: a selective review’ by Niansheng Tang and Yuanyuan Ju

Jun Shao
School of Statistics, East China Normal University, Shanghai, People's Republic of China; Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA

Professors Tang and Ju deserve warm congratulations for their excellent review of statistical inference for nonignorable missing data problems. Although they call it ‘a selective review’, it actually covers most contemporary advances in the difficult problem of handling nonignorable missing data.

In Section 3.1 of their review, Tang and Ju discuss the weighting approach to estimation with nonignorable nonresponse, one of the most popular and effective methods of handling nonresponse. The key to the weighting approach is the estimation of the unknown propensity, that is, the probability of observing the response variable conditional on its value and on the associated covariate values, whether or not the response is actually observed. Tang and Ju review the early developments based on parametric propensity models (e.g., Lee & Tang, 2006; Qin, Leung, & Shao, 2002; Wang, Shao, & Kim, 2014) as well as the more recent advances in semiparametric propensity modelling (Kim & Yu, 2011; Shao & Wang, 2016). The purpose of this note is to add some results and discussion on semiparametric propensity estimation.

We start with some notation. Let $y_i$ be a univariate response variable of interest and $x_i$ be the associated multivariate covariate for the $i$th sampled unit, $i = 1, \ldots, n$, where $y_i$ is observed if $\delta_i = 1$ and is missing if $\delta_i = 0$, and $x_i$ is always observed. We assume that $(y_i, x_i, \delta_i)$, $i = 1, \ldots, n$, are independent and identically distributed. The propensity is defined to be the conditional probability $P(\delta_i = 1 \mid y_i, x_i)$. Since $y_i$ may be missing, this propensity or nonresponse mechanism is nonignorable, and it is ignorable if and only if $P(\delta_i = 1 \mid y_i, x_i) = P(\delta_i = 1 \mid x_i)$.

A parametric model may be imposed on the propensity, but results derived under parametric models may be sensitive to violations of those models, so it is desirable to make weaker assumptions. The following semiparametric model is assumed in Kim and Yu (2011), (1) where γ is an unknown parameter and g is an unspecified (nonparametric) function. Note that, under assumption (1), nonresponse is ignorable if and only if γ = 0, in which case the ignorable propensity is nonparametric. Thus, assumption (1) is preferable to any parametric assumption on the propensity because, if γ = 0, no parametric assumption on the propensity is needed to handle ignorable missing data.
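For orientation, a semiparametric logistic propensity of the kind used by Kim and Yu (2011) can be written as below. This is a sketch under assumed notation: $\delta$ for the response indicator, $x$ for the covariate, and the sign convention inside the exponential are choices made here and may differ from display (1).

```latex
\[
  P(\delta = 1 \mid y, x) \;=\; \frac{1}{1 + \exp\{ g(x) + \gamma y \}} .
\]
% g is left unspecified; setting gamma = 0 removes y from the right-hand side,
% so the propensity depends on x only and nonresponse is ignorable.
```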

An extension of (1) is (2) where is a parametric function and γ is a possibly multi-dimensional unknown parameter.

As shown in Shao and Wang (2016), under either (1) or (2) the unknown g and γ are not identifiable. Some additional condition is needed to identify them so that valid estimation and inference are possible. For example, Kim and Yu (2011) assumed that γ is known or can be estimated externally. A more reasonable approach is to assume that some components of the covariate vector can be excluded from the right-hand side of (2). That is, the covariate vector can be decomposed into two parts such that (3) while the part of the covariate vector not in the right-hand side of (3) is still a useful covariate, in the sense that the conditional distribution of the response given the remaining covariates depends on it. This idea was developed in Wang et al. (2014), Zhao and Shao (2015), and Shao and Wang (2016), where the excluded part of the covariate vector is called a nonresponse instrument.
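The instrument condition can be sketched in symbols as follows. The notation is assumed for illustration only: the covariate is split as $x = (u, z)$ with $z$ playing the role of the nonresponse instrument, $\delta$ is the response indicator, $h_\gamma$ is a hypothetical name for the parametric function, and the logistic form is one possible choice rather than a reproduction of display (3).

```latex
\begin{align*}
  P(\delta = 1 \mid y, u, z) &= P(\delta = 1 \mid y, u)
     = \frac{1}{1 + \exp\{ g(u) + h_\gamma(y) \}}, \\
  f(y \mid u, z) &\neq f(y \mid u)
     \quad \text{(the distribution of $y$ given the covariates depends on $z$).}
\end{align*}
```

The first line says that $z$ carries no information about response once $y$ and $u$ are given; the second says that $z$ is nevertheless informative about $y$, which is what makes identification possible.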

Following Tang and Ju's review and Shao and Wang (2016), we can show that assumption (3) implies that (4) and (5) where is a vector function of . Under suitable conditions on , asymptotically valid estimators of g and γ can be obtained based on (4) and (5), using either the generalised method of moments (Shao & Wang, 2016) or the empirical likelihood method in Tang and Ju's review (Section 3.2).
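The kind of identity that underlies such estimating equations can be sketched as follows. This is a generic derivation under assumed notation, not a reproduction of displays (4) and (5): the covariate is written as $x = (u, z)$, $\delta$ is the response indicator, $h_\gamma$ and $\xi$ are hypothetical names, and the propensity is taken to be $\pi(y, u) = 1/[1 + \exp\{g(u) + h_\gamma(y)\}]$.

```latex
\begin{align*}
  % Conditioning on (y, x) and using E(\delta \mid y, x) = \pi(y, u):
  E\!\left\{ \frac{\delta}{\pi(y, u)} \,\Big|\, y, x \right\} = 1
  \;&\Longrightarrow\;
  E\!\left\{ \frac{\delta}{\pi(y, u)} - 1 \,\Big|\, x \right\} = 0 \\
  \;&\Longrightarrow\;
  E\!\left[ \xi(x) \left\{ \frac{\delta}{\pi(y, u)} - 1 \right\} \right] = 0
  \quad \text{for any vector function } \xi . \\[4pt]
  % Conditioning on u only and using 1/\pi = 1 + e^{g(u)} e^{h_\gamma(y)}:
  E\!\left[ \delta \{ 1 + e^{g(u)} e^{h_\gamma(y)} \} \mid u \right] = 1
  \;&\Longrightarrow\;
  e^{g(u)} = \frac{ E(1 - \delta \mid u) }{ E\{ \delta\, e^{h_\gamma(y)} \mid u \} } .
\end{align*}
```

Both right-hand sides involve $y$ only through terms multiplied by $\delta$, so they can be evaluated from respondents alone. The last line expresses $g$ in terms of γ, and the unconditional moment equation can then be used to estimate γ.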

Once g and γ are estimated, the weighting approach, using the inverse of the propensity in (3) with g and γ replaced by their estimators, can be applied to estimate parameters of interest in the distribution of the response or in the conditional distribution of the response given the covariates.
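The weighting step itself can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration: the function name `ipw_mean`, the data-generating model, the constants, and the logistic form of the propensity. For simplicity the true propensity is plugged in; in practice it would be replaced by the propensity evaluated at estimates of g and γ.

```python
import numpy as np


def ipw_mean(y, delta, propensity):
    """Hajek-type inverse-propensity-weighted mean of y.

    y          : responses; entries with delta == 0 are never used
    delta      : 1 if the response is observed, 0 if it is missing
    propensity : P(delta = 1 | y, u) evaluated at each unit
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(delta) == 1
    weights = 1.0 / np.asarray(propensity, dtype=float)[observed]
    return np.sum(weights * y[observed]) / np.sum(weights)


# Hypothetical simulation: all distributions and constants are assumptions.
rng = np.random.default_rng(0)
n = 200_000
u = rng.normal(size=n)                 # covariate entering the propensity
z = rng.normal(size=n)                 # instrument: affects y, not the propensity
y = 1.0 + u + z + rng.normal(size=n)   # true mean of y is 1

gamma = 0.5                            # nonzero gamma: nonignorable nonresponse
g_u = -1.0 + 0.3 * u                   # stand-in for the nonparametric g(u)
propensity = 1.0 / (1.0 + np.exp(g_u + gamma * y))
delta = rng.binomial(1, propensity)    # response indicators

naive_estimate = y[delta == 1].mean()  # complete-case mean, ignores nonresponse
ipw_estimate = ipw_mean(y, delta, propensity)
print(f"complete-case mean: {naive_estimate:.3f}")
print(f"IPW mean:           {ipw_estimate:.3f}")
```

Because larger responses are less likely to be observed in this setup, the complete-case mean is pulled below the true mean, while the weighted mean corrects for it. The normalised (Hajek) form is used here rather than dividing by n; either is a reasonable choice for a sketch.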

Now, consider changing assumption (3) to (6) where g is nonparametric, is parametric, and both are between 0 and 1. Note that (6) is a multiplicative model considered by Zhao and Shao (2017). Under (6), the counterparts of (4) and (5) are, respectively, (7) and (8) Asymptotically valid estimators of g and γ can be obtained using (7) and (8) and techniques similar to those in Shao and Wang (2016).

Alternatively, we may change (3) to (9) where g is nonparametric, is parametric, and is between 0 and 1. Note that the difference between model (3) and model (9) is that the former has a multiplicative effect of and on propensity, whereas the latter has an additive effect of and . Under (9), the counterparts of (4) and (5) are, respectively, (10) and (11) Again, asymptotically valid estimators of g and γ can be obtained using (10) and (11) and techniques similar to those in Shao and Wang (2016).

We conclude with the following question. What is a general assumption for which (3), (6) and (9) are all special cases and under which results similar to (4) and (5) can be derived?

Consider (12) where g is nonparametric, is parametric, and ψ is a known function of two arguments. Note that (3), (6) and (9) are all special cases of (12). The previous results (4), (7) and (10) are all derived based on Define (13) a function of and . If, for each fixed , ψ is a strictly monotone function of , then (13) defines a possibly implicit function of and similar results may be derived.
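The role of the monotonicity condition can be sketched as follows. The notation is assumed for illustration and is not a reproduction of displays (12) and (13): the covariate is split as $x = (u, z)$ with $z$ excluded from the propensity, $\delta$ is the response indicator, $h_\gamma$ is a hypothetical name for the parametric function, and the logistic case is written in one possible parametrisation.

```latex
\begin{align*}
  &\text{General form:} &
  P(\delta = 1 \mid y, u, z) &= \psi\{ g(u), h_\gamma(y) \}, \\
  &\text{logistic:} & \psi(a, b) &= \{ 1 + \exp(a + b) \}^{-1}, \\
  &\text{multiplicative:} & \psi(a, b) &= a\, b, \\
  &\text{additive:} & \psi(a, b) &= a + b . \\[4pt]
  &\text{Common identity:} &
  E\!\left[ \frac{\delta}{\psi\{ g(u), h_\gamma(y) \}} \,\Big|\, u \right] &= 1 . \\[4pt]
  &\text{Multiplicative case:} &
  g(u) &= E\!\left\{ \frac{\delta}{h_\gamma(y)} \,\Big|\, u \right\}
  \quad \text{(closed form)}, \\
  &\text{Additive case:} &
  E\!\left\{ \frac{\delta}{g(u) + h_\gamma(y)} \,\Big|\, u \right\} &= 1
  \quad \text{(defines } g(u) \text{ implicitly)} .
\end{align*}
```

In the additive case the left-hand side is strictly decreasing in $g(u)$ for fixed γ, so the equation has at most one root. This is the sense in which strict monotonicity of ψ in its first argument lets g be treated as a function of γ, possibly an implicit one.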

Disclosure statement

No potential conflict of interest was reported by the author.

Additional information

Funding

This work was supported by the Chinese 111 Project, the Fundamental Research Funds for the Central Universities and the U.S. National Science Foundation (Directorate for Mathematical and Physical Sciences) grant DMS-1612873.

References

Kim, J. K., & Yu, C. L. (2011). A semiparametric estimation of mean functionals with nonignorable missing data. Journal of the American Statistical Association, 106, 157–165. doi: 10.1198/jasa.2011.tm10104
Lee, S. Y., & Tang, N. S. (2006). Bayesian analysis of nonlinear structural equation models with nonignorable missing data. Psychometrika, 71, 541–564. doi: 10.1007/s11336-006-1177-1
Qin, J., Leung, D., & Shao, J. (2002). Estimation with survey data under non-ignorable nonresponse or informative sampling. Journal of the American Statistical Association, 97, 193–200. doi: 10.1198/016214502753479338
Shao, J., & Wang, L. (2016). Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika, 103, 175–187. doi: 10.1093/biomet/asv071
Wang, S., Shao, J., & Kim, J. (2014). An instrumental variable approach for identification and estimation with nonignorable nonresponse. Statistica Sinica, 24, 1097–1116.
Zhao, J., & Shao, J. (2015). Semiparametric pseudo likelihoods in generalized linear models with nonignorable missing data. Journal of the American Statistical Association, 110, 1577–1590. doi: 10.1080/01621459.2014.983234
Zhao, J., & Shao, J. (2017). Approximate conditional likelihood for generalized linear models with general missing data mechanism. Journal of Systems Science and Complexity, 30, 139–153. doi: 10.1007/s11424-017-6188-3
