SHORT COMMUNICATIONS

Rejoinder: statistical inference for non-ignorable missing-data problems: a selective review

Pages 146-149 | Received 02 Oct 2018, Accepted 03 Oct 2018, Published online: 24 Oct 2018

We thank the five groups of discussants for their thoughtful comments. All have made significant contributions to the general theme raised in our paper. We will try our best to answer each of the points that the discussants have made.

1. Response to Dr. Fang and Ni

We agree with the assessment of Fang and Ni that our work mainly focuses on low-dimensional data analysis and that feature screening with missing data has not been well addressed. Fang and Ni reviewed several papers on feature screening with responses missing at random, and noted two possible research topics for feature screening with missing data: screening with non-ignorable missing responses, which is rather challenging, and screening with missing covariates.

For feature screening with a categorical response Y that is missing at random and has R (R>2) classes, if we define if Y is observed and if Y is missing, and that is a dependence measure between for given and continuous covariate X, where , then one can pursue feature screening by investigating the relationship between and . Also, similar to Cui, Li, and Zhong (2015), when there are missing data, one can consider the following index . When a categorical response Y is missing at random, one can consider the relationship between and . When a continuous response Y is missing at random, a quantile association-based index for measuring the dependence between Y and X can be developed to identify the important covariates. If a continuous response Y is missing at random and X is continuous, one can use the following index

to measure the dependence between X and Y, where and are the distribution functions of X and Y, respectively, and δ is the missing data indicator for Y, i.e. δ = 1 if Y is observed and δ = 0 if Y is missing.
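As a rough illustration of screening with a response missing at random, the sketch below computes an available-case version of the MV index of Cui, Li, and Zhong (2015) for a categorical response. It is not the index proposed above. The function name `mv_index`, the simulated data and the missingness model are assumptions made only for this example, and dropping cases with a missing response is harmless here only because missingness is simulated to depend on a covariate that is unrelated to Y.

```python
import math
import random


def mv_index(x, y):
    """Available-case sample MV index between covariate x and class label y.

    y[i] is None when the response is missing; those cases are dropped.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    xs = [xi for xi, _ in pairs]
    # Marginal empirical CDF of x evaluated at every retained x_j.
    F = [sum(v <= xj for v in xs) / n for xj in xs]
    total = 0.0
    for r in set(yi for _, yi in pairs):
        xr = [xi for xi, yi in pairs if yi == r]
        p_r = len(xr) / n
        for xj, Fj in zip(xs, F):
            F_r = sum(v <= xj for v in xr) / len(xr)  # CDF of x within class r
            total += p_r * (F_r - Fj) ** 2
    return total / n


random.seed(0)
n = 300
y_full = [random.randrange(3) for _ in range(n)]       # three classes
x1 = [yi + random.gauss(0.0, 1.0) for yi in y_full]    # depends on Y
x2 = [random.gauss(0.0, 1.0) for _ in range(n)]        # unrelated to Y

# Hypothetical MAR mechanism: missingness depends only on the observed x2.
y_obs = [
    yi if random.random() < 1.0 / (1.0 + math.exp(-(1.0 + x2i))) else None
    for yi, x2i in zip(y_full, x2)
]

mv_relevant = mv_index(x1, y_obs)
mv_noise = mv_index(x2, y_obs)
print(mv_relevant, mv_noise)
```

In this simulation the index for the covariate that drives Y should be clearly larger than the index for the unrelated covariate, which is the ordering a screening rule would rely on.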

For feature screening with missing covariates, Fang and Ni only considered a special case: response Y is binary and covariates are categorical, and proposed considering the relationship between and , where the screening index ‘information value’ is defined as in which X is a categorical covariate with values , and if the missing data indicator (i.e. X is observed) and if (i.e. X is missing). In addition, they also presented an available case (AC) method and a two-step screening procedure. These methods are useful for screening important features under the considered case. However, when response Y is not binary but categorical or continuous, or covariate X is continuous, it is rather challenging to develop a new feature screening procedure with missing data.

On the other hand, their method may cause several problems in practical applications. First, the first step of their two-step screening procedure may lead to a biased estimator. Second, under some strong conditions, with a relatively high proportion of missing data and unbalanced categorical data, it is rather difficult to guarantee the accuracy of estimation in the second step. To wit, it follows from that the conditional probability is a strong condition, where and correspond to covariates with and without missing values, respectively, is the missing data indicator for , and is a small subset of and .

2. Response to Dr. Wang

Longitudinal data are commonly encountered in clinical trials, medicine and the social sciences. In longitudinal studies, dropout invariably occurs: some participants drop out of the study or are lost to follow-up, which leads to the loss of their outcome measurements. In these cases, there are two types of dropout mechanisms: ignorable dropout (i.e. the probability that a participant drops out of the study depends only on the observed data) and non-ignorable dropout (i.e. the probability that a participant drops out of the study depends on the missing values, possibly in addition to the observed data). Generally, there are two types of dropout patterns: monotone and non-monotone. Non-monotone dropout occurs when study participants intermittently miss scheduled visits, while monotone dropout can arise from discontinued participation, loss to follow-up, and mortality.
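The distinction between the two dropout patterns can be made concrete with a small sketch. The function name `dropout_pattern` and the 0/1 coding of visits are illustrative assumptions, not notation from our paper.

```python
from typing import Sequence


def dropout_pattern(observed: Sequence[int]) -> str:
    """Classify one participant's visit record.

    observed[t] is 1 if the outcome at visit t was recorded and 0 otherwise.
    """
    flags = [bool(v) for v in observed]
    if all(flags):
        return "complete"
    first_missing = flags.index(False)
    # Monotone: once a visit is missed, no later visit is observed.
    if not any(flags[first_missing:]):
        return "monotone"
    return "non-monotone"


records = {
    "A": [1, 1, 1, 1],  # attended every visit
    "B": [1, 1, 0, 0],  # left the study after the second visit
    "C": [1, 0, 1, 1],  # skipped one visit and came back
}
patterns = {pid: dropout_pattern(r) for pid, r in records.items()}
print(patterns)
```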

Longitudinal data analysis with dropout data is not a new topic, and has been studied by many authors. For example, see Hedeker and Gibbons (2006) and Tseng, Elashoff, Li, and Li (2016). Recently, Wang, Qi, and Shao (2018) discussed longitudinal data analysis with non-ignorable dropout by combining the ideas of instrumental variables and estimating equations with a parametric dropout propensity score, presented a two-step generalised method of moments (GMM) to estimate the unknown parameters in the considered parametric propensity score, and investigated the asymptotic properties of the proposed GMM estimator. However, they did not consider dropout patterns, which may lead to new approaches for handling dropout data.

On the other hand, considering dropout patterns may improve the efficiency of the GMM estimators given in Wang et al. (2018) and solve the optimal estimation issue that you raised. Also, to address the optimal estimation issue, one can make use of all available information in the responses, covariates and dropout indicators when constructing estimating equations. Wang et al. (2018) considered a parametric propensity score model for non-ignorable dropout data by incorporating past response data and time-independent covariates U. However, the considered parametric propensity score model has several problems. First, there are too many parameters, which may lead to an identifiability issue, even though Wang et al. (2018) considered instrumental variables, when the sample size is small and T is relatively large. Second, it may lead to the well-known ill-posed issue when T grows with the sample size n, i.e. a slow convergence rate in evaluating GMM estimates of parameters. Third, some interactions of covariates may have a large effect on a participant's dropout. Fourth, it is impossible to test the plausibility of the posited parametric propensity score model. To address the aforementioned issues, one may consider a sequence of one-dimensional parametric propensity score models as done in Equation (19) of our paper, which can be used to construct estimating equations and, via a penalty method, to select the important variables that have a large effect on a participant's dropout.

Although a semiparametric model can be adopted for dropout data analysis, there is the well-known ‘curse of dimensionality’ when T is large. In this case, one may consider an additive model for .

3. Response to Dr. Zhao

We agree with your valuable assessment that one intrinsic complication of missing data analysis is that it is quite difficult to verify the underlying truth in practical applications because of the missing values involved, and that it is interesting to develop versatile statistical procedures that are robust to misspecification of the dropout mechanism model. In this discussion, you considered two types of missing data mechanism models, the conditional independence mechanism and the statistical chromatography mechanism, and then discussed model identification, point estimation, hypothesis testing and high-dimensional variable selection. The conditional independence assumption for dropout data utilises the concept of a non-response instrument; the resultant statistical inference on the model parameter of interest can be carried out without estimating the missing data mechanism, and hence it has received a lot of attention in recent years. For example, see Zhao and Shao (2015) and Zhao and Shao (2017). But this method only utilises the observed data information when covariates serve as non-response instrument variables. Moreover, when the considered model is rather complicated, for example, our considered exponential family nonlinear structural equation models, the method of Zhao and Shao (2015) involves intractable high-dimensional integrals. Under the non-ignorable dropout assumption, a possible improvement for Zhao and Shao's (2015) method is to estimate unknown parameters in via the conditional likelihood of , where δ is the missing data indicator for response variable Y, i.e. δ = 1 if Y is observed and 0 otherwise.

Statistical chromatography model for dropout data: is an unspecified missing data mechanism, and corresponds to various missing data mechanisms by taking different forms of and . Based on the statistical chromatography model, using the idea of the conditional likelihood, decomposing the observed 's into their rank statistic and order statistic, and considering the likelihood conditional on the order statistic, Liang and Qin (2000) developed an approach to estimate model parameters in . But the method of Liang and Qin (2000) only utilises the fully observed data, which indicates that it may lead to a biased estimator when the missing proportion is high and may not be robust to misspecification of the considered model. More importantly, their method requires specifying the distribution of the response variable, which may limit its applications. In particular, when the considered model is quite complicated, for example, a generalised linear model with a non-canonical link function, or involves latent variables, such as our considered exponential family nonlinear structural equation models, this method requires handling high-dimensional integrals.

For a generalised linear model with canonical link function, i.e. with , one can obtain the estimate of by minimising the objective function: , where the first m subjects are fully observed without loss of generality, but it is impossible to obtain estimates of the dispersion parameter φ and the intercept parameter α via the method of Liang and Qin (2000). It is interesting to develop an efficient approach to estimate all the parameters in , φ and α based on the statistical chromatography mechanism.
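To illustrate the kind of objective function involved, the sketch below fits a scalar slope by minimising a pairwise conditional pseudo-likelihood over the fully observed subjects. The specific form used, the sum over pairs i<j of log(1 + exp(-β d_ij)) with d_ij = (y_i - y_j)(x_i - x_j), is our reading of the pairwise approach for a canonical link with dispersion fixed at 1 and should be treated as an assumption rather than a transcription of the formula above. The function name `pairwise_beta`, the Gaussian working model and the missingness mechanism that depends on y alone are likewise hypothetical.

```python
import math
import random


def pairwise_beta(x, y, max_iter=25, tol=1e-8):
    """Minimise sum_{i<j} log(1 + exp(-beta * d_ij)) by Newton's method,

    where d_ij = (y_i - y_j) * (x_i - x_j) and beta is a scalar slope.
    """
    m = len(x)
    d = [(y[i] - y[j]) * (x[i] - x[j]) for i in range(m) for j in range(i + 1, m)]
    beta = 0.0
    for _ in range(max_iter):
        grad = hess = 0.0
        for dij in d:
            # s = 1 / (1 + exp(beta * d_ij)), written with tanh to avoid overflow
            s = 0.5 * (1.0 - math.tanh(0.5 * beta * dij))
            grad -= dij * s
            hess += dij * dij * s * (1.0 - s)
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta


random.seed(1)
true_beta = 0.5
xs, ys = [], []
for _ in range(800):
    xi = random.gauss(0.0, 1.0)
    yi = 1.0 + true_beta * xi + random.gauss(0.0, 1.0)  # intercept 1, dispersion 1
    # Non-ignorable mechanism: the chance of observing y depends on y itself.
    if random.random() < 1.0 / (1.0 + math.exp(-yi)):
        xs.append(xi)
        ys.append(yi)

beta_hat = pairwise_beta(xs, ys)
print(len(xs), beta_hat)
```

Neither the intercept nor the dispersion appears in d_ij, which is consistent with the remark that they cannot be estimated this way; with an unknown dispersion the quantity recovered would be the slope divided by the dispersion.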

4. Response to Dr. Morikawa and Professor Kim

We agree with Dr. Morikawa and Professor Kim's comment on the semiparametric estimation of mean functionals, which is a good supplement to our presented estimation procedure for mean functionals in the presence of non-ignorable missing data. Dr. Morikawa and Professor Kim proposed a new approach to restore Wilks' phenomenon in empirical likelihood inference with non-ignorable missing values. A remarkable feature of Morikawa and Kim's method is that the resulting profile empirical log-likelihood ratio statistic can be directly used to construct a confidence interval for the parameter of interest. However, Morikawa and Kim's empirical likelihood approach has a very limited scope of application because it only works for the response mean or a single parameter. Let be a vector of parameters of interest, and be the true value of . Here is uniquely defined via generalised estimating equations of the form , where is a vector of r () functions, variables and follow some unknown joint distribution , and represents the expectation taken with respect to . Generalised estimating equations encompass a large class of statistical models. For example, for mean functionals of responses, which are studied by Kim and Yu (2011). Let δ be the missing data indicator, taking 1 if Y is observed and 0 if Y is missing. It is assumed that are fully observed, is an instrumental variable, and the missing data mechanism is specified by a semiparametric propensity score function . Define and , where is some consistent estimator of .
Following the idea of Morikawa and Kim and using the idea of Shao and Wang (2016) introduced at the beginning of Section 3 as the calibration conditions, we define the following profile empirical log-likelihood ratio function for : An empirical log-likelihood ratio statistic for testing the hypothesis is defined as Under some regularity conditions, it can be shown that asymptotically follows the chi-squared distribution with p degrees of freedom, which is a natural extension of Wilks' theorem to the general parameter case.
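A much simplified version of such a statistic can be sketched for a scalar mean. The example below applies the standard empirical likelihood ratio to an inverse-propensity-weighted estimating function with the propensity treated as known, so it omits the calibration conditions and the estimated propensity used above. The names `el_ratio_stat` and `chi2_1_pvalue`, the logistic propensity and the simulated data are all assumptions for illustration.

```python
import math
import random


def el_ratio_stat(g, iters=40):
    """-2 log empirical likelihood ratio for the constraint sum_i w_i g_i = 0.

    g holds scalar estimating-function values; the Lagrange multiplier is
    found by bisection on the interval where every 1 + lam * g_i is positive.
    """
    g_min, g_max = min(g), max(g)
    if g_min >= 0.0 or g_max <= 0.0:
        return math.inf  # zero lies outside the convex hull of the g_i
    lo, hi = -1.0 / g_max, -1.0 / g_min
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The score in lam is strictly decreasing, so its sign locates the root.
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * gi) for gi in g)


def chi2_1_pvalue(stat):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


random.seed(2)
true_mean = 2.0
y = [random.gauss(true_mean, 1.0) for _ in range(400)]
# Hypothetical known non-ignorable propensity: P(delta = 1 | y).
pi = [1.0 / (1.0 + math.exp(-(0.5 + 0.5 * yi))) for yi in y]
delta = [1 if random.random() < p else 0 for p in pi]
w = [d / p for d, p in zip(delta, pi)]  # inverse-propensity weights


def g_values(theta):
    # Missing y_i never contributes because its weight is zero.
    return [wi * (yi - theta) for wi, yi in zip(w, y)]


theta_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
stat_hat = el_ratio_stat(g_values(theta_hat))
stat_true = el_ratio_stat(g_values(true_mean))
stat_far = el_ratio_stat(g_values(true_mean + 1.0))
print(theta_hat, stat_true, chi2_1_pvalue(stat_true), stat_far)
```

The statistic vanishes at the weighted mean and grows as the hypothesised value moves away from it, so a confidence interval can be read off as the set of values whose statistic stays below a chi-squared quantile.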

Lazar (2003) pointed out that empirical likelihood can also be used for posterior inference in place of the parametric likelihood function in Bayes' theorem. Zhang and Tang (2017) extended Bayesian empirical likelihood to quantile structural equation models. Given the empirical likelihood function and a specified prior for , we obtain the quasi-posterior density where and is a normalising constant such that . Since it is quite easy to calculate the value of the empirical log-likelihood function given and , implementing the Metropolis–Hastings algorithm is feasible for sampling the observations required to make Bayesian inference on from the posterior . The Bayesian empirical likelihood approach is a more flexible and effective tool in that it not only can calculate point estimates and confidence intervals but also allows incorporation of prior information, and can circumvent the inherent ‘curse of dimensionality’ in evaluating empirical likelihood estimators.
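The sampling step can be sketched for the simplest case of a scalar mean with complete data. The random-walk proposal, the normal prior and the names `log_el` and `mh_sample` are assumptions for this example; with missing data the estimating function y_i - θ would be replaced by a suitably weighted version.

```python
import math
import random


def log_el(theta, y, iters=40):
    """Log empirical likelihood ratio for the mean, with g_i = y_i - theta."""
    g = [yi - theta for yi in y]
    g_min, g_max = min(g), max(g)
    if g_min >= 0.0 or g_max <= 0.0:
        return -math.inf  # theta lies outside the range of the data
    lo, hi = -1.0 / g_max, -1.0 / g_min
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -sum(math.log1p(lam * gi) for gi in g)


def mh_sample(y, n_draws=1200, step=0.3, prior_sd=10.0, seed=0):
    """Random-walk Metropolis-Hastings draws from the quasi-posterior."""
    rng = random.Random(seed)

    def log_post(theta):
        # Empirical log-likelihood plus a N(0, prior_sd^2) log-prior.
        return log_el(theta, y) - 0.5 * (theta / prior_sd) ** 2

    theta = sum(y) / len(y)  # start at the sample mean
    current = log_post(theta)
    draws, accepted = [], 0
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_post(proposal)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
        draws.append(theta)
    return draws, accepted / n_draws


random.seed(3)
y = [random.gauss(1.0, 1.0) for _ in range(50)]
draws, acc_rate = mh_sample(y)
kept = draws[200:]  # discard an initial burn-in segment
post_mean = sum(kept) / len(kept)
print(post_mean, acc_rate)
```

Each iteration needs only one evaluation of the empirical log-likelihood, which is what makes the sampler cheap relative to maximising the empirical likelihood directly.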

5. Response to Professor Shao

Professor Shao explores general assumptions for the semiparametric non-ignorable propensity model and further leaves an open question on how to perform nonparametric regression analysis in non-ignorable missing data problems when the semiparametric non-ignorable propensity is of a general form. The key to performing nonparametric regression estimation for non-ignorable missing data is to find a kernel-type estimator of the nonparametric part in a semiparametric non-ignorable propensity. Motivated by the comments of Shao, we consider the following more general definition of semiparametric non-ignorable propensity.

Let . It can be shown that (1). Assume that , where is an arbitrary user-specified function. Such a semiparametric conditional odds model also defines a semiparametric non-ignorable propensity . It follows from (1) that the nonparametric function can be profiled using the kernel regression method when the function is decomposable and has one of the forms: and , among others. Here is an arbitrary known function. When has a general (implicit) form, developing a kernel regression estimator of the semiparametric non-ignorable propensity is challenging, and thus remains an open problem as pointed out by Shao. Therefore, it deserves further research.

Acknowledgments

The authors are grateful to the Editor for organising the discussion of our paper and the discussants Dr. Fang Fang, Dr. Lyu Ni, Dr. Lei Wang, Dr. Jiwei Zhao, Dr. Kosuke Morikawa, Professor Jae Kwang Kim and Professor Jun Shao, seven leading figures on statistical inference for missing data, for their stimulating comments and insightful contributions to this work.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the grants from the Key Projects of the National Natural Science Foundation of China [grant number 11731101] and the National Natural Science Foundation of China [grant number 11671349].

References

  • Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110, 630–641. doi: 10.1080/01621459.2014.920256
  • Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. Hoboken, NJ: John Wiley & Sons.
  • Kim, J. K., & Yu, C. L. (2011). A semiparametric estimation of mean functionals with nonignorable missing data. Journal of the American Statistical Association, 106, 157–165. doi: 10.1198/jasa.2011.tm10104
  • Lazar, N. A. (2003). Bayesian empirical likelihood. Biometrika, 90, 319–326. doi: 10.1093/biomet/90.2.319
  • Liang, K. Y., & Qin, J. (2000). Regression analysis under non-standard situations: A pairwise pseudolikelihood approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62, 773–786. doi: 10.1111/1467-9868.00263
  • Shao, J., & Wang, L. (2016). Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika, 103, 175–187. doi: 10.1093/biomet/asv071
  • Tseng, C. H., Elashoff, R., Li, N., & Li, G. (2016). Longitudinal data analysis with non-ignorable missing data. Statistical Methods in Medical Research, 25, 205–220. doi: 10.1177/0962280212448721
  • Wang, L., Qi, C., & Shao, J. (2018). Model-assisted regression estimators for longitudinal data with nonignorable dropout. International Statistical Review, to appear.
  • Zhang, Y. Q., & Tang, N. S. (2017). Bayesian empirical likelihood estimation of quantile structural equation models. Journal of Systems Science and Complexity, 30, 122–138. doi: 10.1007/s11424-017-6254-x
  • Zhao, J., & Shao, J. (2015). Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data. Journal of the American Statistical Association, 110, 1577–1590. doi: 10.1080/01621459.2014.983234
  • Zhao, J. W., & Shao, J. (2017). Approximate conditional likelihood for generalized linear models with general missing data mechanism. Journal of Systems Science and Complexity, 30, 139–153. doi: 10.1007/s11424-017-6188-3
