We thank the discussants for their thoughtful comments. All have made significant contributions to the general theme raised in our paper. We will try our best to answer each of the points they have made.
1. Response to Dr. Fang and Dr. Ni
We agree with the assessment of Fang and Ni that our work mainly focuses on low-dimensional data analysis and that feature screening with missing data has not been well addressed. Fang and Ni cited several papers on feature screening with the response missing at random, and noted two possible research topics for feature screening with missing data: screening with a non-ignorable missing response, which is rather challenging, and screening with missing covariates.
For feature screening with a categorical response $Y$ that is missing at random and has $R$ ($R > 2$) classes, define the missing data indicator $\delta$ with $\delta = 1$ if $Y$ is observed and $\delta = 0$ if $Y$ is missing, and consider a dependence measure between a continuous covariate $X$ and the response for a given class; one can then pursue feature screening by investigating the relationship between $X$ and the pair $(Y, \delta)$. Also, similar to Cui, Li, and Zhong (2015), when there are missing data one can consider a mean–variance-type index that incorporates $\delta$. When a continuous response $Y$ is missing at random, a quantile-association-based index measuring the dependence between $Y$ and $X$ can be developed to identify the important covariates. In particular, if the continuous response $Y$ is missing at random and $X$ is continuous, one can measure the dependence between $X$ and $Y$ with an index built from the distribution functions of $X$ and $Y$, where $\delta$ is again the missing data indicator for $Y$, i.e. $\delta = 1$ if $Y$ is observed and $\delta = 0$ if $Y$ is missing.
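To make the idea concrete, here is a minimal sketch in the spirit of Cui, Li, and Zhong's (2015) mean–variance index, where the pair $(Y, \delta)$ is treated as an augmented categorical variable with the missing responses forming an extra class. The function names and this particular augmentation are our own illustrative choices, not the discussants' exact proposal.

```python
import numpy as np

def mv_index(x, y_levels):
    """Sample mean-variance (MV) dependence index between a continuous
    covariate x and a categorical label, in the spirit of Cui, Li and
    Zhong (2015): sum_r p_r * mean_j {F_r(X_j) - F(X_j)}^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y_levels)
    F_all = np.mean(x[None, :] <= x[:, None], axis=1)         # pooled ecdf at each X_j
    mv = 0.0
    for r in np.unique(y):
        idx = y == r
        F_r = np.mean(x[idx][None, :] <= x[:, None], axis=1)  # class-r ecdf at each X_j
        mv += idx.mean() * np.mean((F_r - F_all) ** 2)
    return mv

def screen_with_missing(x, y, delta):
    """Treat (Y, delta) as an augmented categorical variable: observed
    classes keep their labels and missing responses form an extra level."""
    y_aug = np.where(np.asarray(delta) == 1, y, -1)  # -1 marks the 'missing' level
    return mv_index(x, y_aug)
```

In practice one would compute this index for every candidate covariate and retain those with the largest values, exactly as in marginal screening without missing data.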
For feature screening with missing covariates, Fang and Ni only considered a special case in which the response $Y$ is binary and the covariates are categorical, and proposed investigating the relationship between $Y$ and $(X, \delta_x)$ through the screening index 'information value', where $X$ is a categorical covariate taking finitely many values and $\delta_x$ is the missing data indicator for $X$, i.e. $\delta_x = 1$ if $X$ is observed and $\delta_x = 0$ if $X$ is missing. In addition, they also presented an available-case (AC) method and a two-step screening procedure. These methods are useful for screening important features in the considered case. However, when the response $Y$ is categorical with more than two classes or continuous, or when a covariate $X$ is continuous, it is rather challenging to develop a new feature screening procedure with missing data.
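Since Fang and Ni's exact definition is not reproduced here, the following sketch uses the classical information-value formula from credit scoring and treats 'missing' as an extra category of $X$, which is one way to read the proposal; all names are illustrative.

```python
import numpy as np

def information_value(x_cat, y_binary):
    """Classical information value between a categorical covariate and a
    binary response: IV = sum_j (p1_j - p0_j) * log(p1_j / p0_j), where
    p1_j and p0_j are the frequencies of category j within Y=1 and Y=0."""
    x = np.asarray(x_cat)
    y = np.asarray(y_binary)
    iv = 0.0
    for j in np.unique(x):
        p1 = np.mean(x[y == 1] == j)
        p0 = np.mean(x[y == 0] == j)
        if p1 > 0 and p0 > 0:  # skip empty cells to avoid log(0)
            iv += (p1 - p0) * np.log(p1 / p0)
    return iv

def iv_with_missing(x_cat, y_binary, delta_x):
    """Treat a missing X as its own category so that the index measures
    the dependence between Y and the pair (X, delta_X)."""
    x_aug = np.where(np.asarray(delta_x) == 1, x_cat, -1)  # -1 = 'missing'
    return information_value(x_aug, y_binary)
```

Each summand of the information value is non-negative, so the index is zero only when the category frequencies agree across the two response groups.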
On the other hand, their method may cause several problems in practical applications. First, the first step of their two-step screening procedure may lead to a biased estimator. Second, under some strong conditions, a relatively high proportion of missing data, and unbalanced categorical data, it is rather difficult to guarantee the accuracy of estimation in the second step. To wit, requiring the conditional probability $P(\delta_x = 1 \mid x_{\mathrm{mis}}, x_{\mathrm{obs}})$ to depend only on a small subset of $x_{\mathrm{obs}}$ is a strong condition, where $x_{\mathrm{mis}}$ and $x_{\mathrm{obs}}$ correspond to the covariates with and without missing values, respectively, and $\delta_x$ is the missing data indicator for $x_{\mathrm{mis}}$.
2. Response to Dr. Wang
Longitudinal data are commonly encountered in clinical trials, medicine and the social sciences. In longitudinal studies, dropout invariably occurs; that is, some participants may drop out of the study or be lost to follow-up for various reasons, which leads to the loss of their outcome measurements. In these cases, there are two types of dropout mechanisms: ignorable dropout (i.e. the probability that a participant drops out of the study depends only on the observed data) and non-ignorable dropout (i.e. the probability that a participant drops out of the study depends on the missing values in addition to, possibly, the observed data). Generally, there are two types of dropout patterns: monotone and non-monotone. Non-monotone dropout occurs when study participants intermittently miss scheduled visits, while monotone dropout can arise from discontinued participation, loss to follow-up, and mortality.
Longitudinal data analysis with dropout is not a new topic and has been studied by many authors; see, for example, Hedeker and Gibbons (2006) and Tseng, Elashoff, Li, and Li (2016). Recently, Wang, Qi, and Shao (2018) discussed longitudinal data analysis with non-ignorable dropout by incorporating the ideas of instrumental variables and estimating equations with a parametric dropout propensity score, presented a two-step generalised method of moments (GMM) approach to estimate the unknown parameters in the considered parametric propensity score, and investigated the asymptotic properties of the proposed GMM estimator. However, they did not consider dropout patterns, which may lead to new approaches for handling dropout data.
On the other hand, considering dropout patterns may improve the efficiency of the GMM estimators given in Wang et al. (2018) and address the optimal estimation issue you raised. Also, to address the optimal estimation issue, one can make use of all available information in the responses, the covariates and the dropout process when constructing estimating equations. Wang et al. (2018) considered a parametric propensity score model for non-ignorable dropout by incorporating the historical response data and time-independent covariates $U$. However, the considered parametric propensity score model has several problems. First, there are too many parameters, which may lead to an identifiability issue, even though Wang et al. (2018) considered instrumental variables, when the sample size is small and $T$ is relatively large. Second, it may lead to the well-known ill-posed issue when $T$ grows with the sample size $n$, i.e. a slow convergence rate in evaluating GMM estimates of the parameters. Third, some interactions of covariates may have a large effect on a participant's dropout. Fourth, it is impossible to test the plausibility of the posited parametric propensity score model. To address the aforementioned issues, one may consider a sequence of one-dimensional parametric propensity score models as done in Equation (19) of our paper, which can be used to construct estimating equations and to select, via a penalty method, the important variables that have a large effect on a participant's dropout.
Although a semiparametric model can be adopted for dropout data analysis, the well-known 'curse of dimensionality' arises when $T$ is large. In this case, one may consider an additive model for the propensity score.
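The two-step GMM machinery itself can be sketched on a generic over-identified linear moment condition. This is not Wang et al.'s dropout model; the moment condition and all names here are illustrative, showing only the efficiency idea of re-weighting by the estimated moment covariance.

```python
import numpy as np

def two_step_gmm(y, X, Z):
    """Two-step GMM for the moment conditions E[z_i (y_i - x_i' beta)] = 0.
    Step 1 uses the identity weight matrix; step 2 re-weights by the inverse
    of the estimated moment covariance, which is the efficiency idea behind
    two-step GMM estimators such as that of Wang, Qi and Shao (2018)."""
    n = len(y)
    def gmm(W):
        # Closed form: beta = (X'Z W Z'X)^{-1} X'Z W Z'y
        A = X.T @ Z @ W @ Z.T @ X
        b = X.T @ Z @ W @ Z.T @ y
        return np.linalg.solve(A, b)
    beta1 = gmm(np.eye(Z.shape[1]))   # step 1: identity weight
    u = y - X @ beta1                 # step-1 residuals
    g = Z * u[:, None]                # estimated moment functions
    S = g.T @ g / n                   # moment covariance estimate
    return gmm(np.linalg.inv(S))      # step 2: optimal weight
```

With valid instruments, the second step attains the efficiency bound within the class of GMM estimators based on these moments.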
3. Response to Dr. Zhao
We agree with your valuable assessment that one intrinsic complication of missing data analysis is that it is quite difficult to verify the underlying truth in practical applications because of the missing values involved, and that it is interesting to develop versatile statistical procedures that are robust to misspecification of the dropout mechanism model. In this discussion, you considered two types of missing data mechanism models, the conditional independence mechanism and the statistical chromatography mechanism, and then discussed model identification, point estimation, hypothesis testing and high-dimensional variable selection. The conditional independence assumption for dropout data utilises the concept of a non-response instrument, so that statistical inference on the model parameters of interest can be carried out without estimating the missing data mechanism; hence it has received a lot of attention in recent years. For example, see Zhao and Shao (2015) and Zhao and Shao (2017). But this method only utilises the observed data information when covariates serve as non-response instrumental variables. Moreover, when the considered model is rather complicated, for example, our considered exponential family nonlinear structural equation models, Zhao and Shao (2015) involves intractable high-dimensional integrals. Under the non-ignorable dropout assumption, a possible improvement of Zhao and Shao's (2015) method is to estimate the unknown parameters via the conditional likelihood of $Y$ given $\delta = 1$, where $\delta$ is the missing data indicator for the response variable $Y$, i.e. $\delta = 1$ if $Y$ is observed and $\delta = 0$ otherwise.
In the statistical chromatography model for dropout data, the missing data mechanism is left unspecified, and the model encompasses various missing data mechanisms through the different forms the mechanism may take. Based on the statistical chromatography model, using the idea of the conditional likelihood, decomposing the observed responses into their rank statistic and order statistic, and considering the likelihood conditional on the order statistic, Liang and Qin (2000) developed an approach to estimate the model parameters. But Liang and Qin (2000) only utilise the fully observed data, which indicates that their method may lead to a biased estimator when the missing proportion is high and may not be robust to misspecification of the considered model. More importantly, their method requires specifying the distribution of the response variable, which may limit its applications. In particular, when the considered model is quite complicated, for example, a generalised linear model with a non-canonical link function, or involves latent variables, such as our considered exponential family nonlinear structural equation models, this method requires handling high-dimensional integrals.
For a generalised linear model with canonical link function, one can obtain an estimator of the regression parameter $\beta$ by minimising the pairwise objective function $\sum_{1 \le i < j \le m} \log[1 + \exp\{-(y_i - y_j)(x_i - x_j)^{\top}\beta\}]$, where the first $m$ subjects are assumed to be fully observed without loss of generality; but it is impossible to obtain estimators of the dispersion parameter $\varphi$ and the intercept parameter $\alpha$ via the method of Liang and Qin (2000), since they cancel between pairs. It is interesting to develop an efficient approach to estimate all the parameters $\beta$, $\varphi$ and $\alpha$ based on the statistical chromatography mechanism.
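As an illustration of why only the slope is estimable, here is a minimal sketch of a pairwise-pseudolikelihood slope estimator for logistic regression, assuming the pairwise objective $\sum_{i<j}\log[1+\exp\{-(y_i-y_j)(x_i-x_j)\beta\}]$ (our reading of Liang and Qin's construction; the intercept and dispersion cancel between pairs and never enter the objective).

```python
import numpy as np

def pairwise_logit_slope(x, y, n_iter=50):
    """Estimate the slope of a canonical-link GLM (here logistic) by
    minimising sum_{i<j} log[1 + exp{-(y_i - y_j)(x_i - x_j) * beta}].
    The intercept and dispersion cancel within each pair, which is why
    they cannot be recovered by this method."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = (y[:, None] - y[None, :]) * (x[:, None] - x[None, :])
    d = d[np.triu_indices(len(x), k=1)]
    d = d[d != 0]                           # pairs with equal y contribute a constant
    beta = 0.0
    for _ in range(n_iter):                 # damped Newton on a convex 1-d loss
        s = 1.0 / (1.0 + np.exp(d * beta))  # sigmoid(-d * beta)
        grad = -np.sum(d * s)
        hess = np.sum(d**2 * s * (1.0 - s))
        beta -= float(np.clip(grad / hess, -1.0, 1.0))
    return beta
```

Note that shifting every $y_i$ by a constant, or rescaling the natural parameter by a dispersion factor common to all subjects, leaves the differenced terms unchanged, mirroring the identifiability limitation discussed above.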
4. Response to Dr. Morikawa and Professor Kim
We agree with Dr. Morikawa and Professor Kim's comment on the semiparametric estimation of mean functionals, which is a good supplement to our presented estimation procedure for mean functionals in the presence of non-ignorable missing data. Dr. Morikawa and Professor Kim proposed a new approach to restore the Wilks phenomenon in empirical likelihood inference with non-ignorable missing values. A remarkable feature of Morikawa and Kim's method is that the resulting profile empirical log-likelihood ratio statistic can be directly used to construct a confidence interval for the parameter of interest. However, Morikawa and Kim's empirical likelihood approach has a rather limited application scope because it only works for a response mean or a single parameter. Let $\theta$ be a $p$-dimensional vector of parameters of interest, and let $\theta_0$ be the true value of $\theta$. Here $\theta_0$ is uniquely defined via generalised estimating equations of the form $E\{g(Y, X; \theta_0)\} = 0$, where $g(Y, X; \theta)$ is a vector of $r$ ($r \ge p$) functions, the variables $Y$ and $X$ follow some unknown joint distribution $F$, and $E$ represents the expectation taken with respect to $F$. Generalised estimating equations encompass a large class of statistical models; for example, $g(Y; \theta) = Y - \theta$ yields the mean functional of the response, which is studied by Kim and Yu (2011). Let $\delta$ be the missing data indicator, taking 1 if $Y$ is observed and 0 if $Y$ is missing. It is assumed that the covariates are fully observed, that a component of the covariates serves as an instrumental variable, and that the missing data mechanism is specified by a semiparametric propensity score function $\pi(X, Y)$. Define the adjusted estimating functions $\hat g_i(\theta)$, for example by inverse propensity weighting, $\hat g_i(\theta) = \delta_i\, g(Y_i, X_i; \theta)/\hat\pi(X_i, Y_i)$, where $\hat\pi$ is some consistent estimator of $\pi$. Following the idea of Morikawa and Kim and using the idea of Shao and Wang (2016) introduced at the beginning of Section 3 as the calibration conditions, we define the profile empirical log-likelihood ratio function for $\theta$ as $\ell_E(\theta) = 2\sum_{i=1}^{n}\log\{1 + \lambda^{\top}(\theta)\hat g_i(\theta)\}$, where $\lambda(\theta)$ solves $\sum_{i=1}^{n} \hat g_i(\theta)/\{1 + \lambda^{\top}(\theta)\hat g_i(\theta)\} = 0$. An empirical log-likelihood ratio statistic for testing the hypothesis $H_0: \theta = \theta_0$ is then $\ell_E(\theta_0)$. Under some regularity conditions, it can be shown that $\ell_E(\theta_0)$ asymptotically follows the chi-squared distribution with $p$ degrees of freedom, which is a natural extension of the Wilks theorem to a general parameter case.
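For completeness, the profile empirical log-likelihood ratio can be computed by Owen's Lagrange-multiplier argument; a minimal sketch for generic estimating functions follows (the damped Newton solver and the mean-functional example are illustrative, not Morikawa and Kim's implementation).

```python
import numpy as np

def el_log_ratio(g):
    """Profile empirical log-likelihood ratio 2 * sum_i log(1 + lam' g_i)
    for estimating functions g (n x r), with lam solving Owen's equation
    sum_i g_i / (1 + lam' g_i) = 0, found by damped Newton ascent."""
    n, r = g.shape
    lam = np.zeros(r)
    for _ in range(50):
        w = 1.0 + g @ lam                    # EL weight denominators
        grad = (g / w[:, None]).sum(axis=0)  # score in lam
        hess = -(g.T * (1.0 / w**2)) @ g     # negative definite Hessian
        step = np.linalg.solve(hess, -grad)
        while np.any(1.0 + g @ (lam + step) <= 1e-8):
            step *= 0.5                      # keep all implied weights positive
        lam = lam + step
    return 2.0 * np.sum(np.log(1.0 + g @ lam))

def el_ratio_for_mean(y, theta0):
    """Test H0: E[Y] = theta0 via g_i = y_i - theta0; under H0 the statistic
    is asymptotically chi-squared with 1 degree of freedom."""
    return el_log_ratio((np.asarray(y, dtype=float) - theta0)[:, None])
```

The statistic vanishes exactly at the sample solution of the estimating equations and grows as the hypothesised value moves away, so a confidence region is obtained by thresholding at the appropriate chi-squared quantile.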
Lazar (2003) pointed out that the empirical likelihood can also be used for posterior inference in place of the parametric likelihood function in Bayes' theorem. Zhang and Tang (2017) extended Bayesian empirical likelihood to quantile structural equation models. Given the empirical likelihood function $L_E(\theta)$ and a prior $p(\theta)$ for $\theta$, we obtain the quasi-posterior density $p(\theta \mid \mathrm{data}) = c^{-1} L_E(\theta)\, p(\theta)$, where $c = \int L_E(\theta)\, p(\theta)\, d\theta$ is a normalising constant such that the quasi-posterior density integrates to one. Since it is quite easy to calculate the value of the empirical log-likelihood function $\log L_E(\theta)$ given $\theta$ and the data, the implementation of the Metropolis–Hastings algorithm is feasible for sampling the observations required for Bayesian inference on $\theta$ from the quasi-posterior $p(\theta \mid \mathrm{data})$. The Bayesian empirical likelihood approach is a flexible and effective tool in that it not only yields point estimates and credible intervals but also allows the incorporation of prior information, and it can circumvent the inherent 'curse of dimensionality' in evaluating empirical likelihood estimators.
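A minimal sketch of this Bayesian empirical likelihood scheme for a response mean, with an assumed normal prior and random-walk Metropolis–Hastings, is given below; all tuning choices (prior scale, proposal scale, chain length) are illustrative.

```python
import numpy as np

def log_el(y, theta):
    """Empirical log-likelihood (up to a constant) for the mean theta:
    -sum_i log(1 + lam*(y_i - theta)), with lam solving Owen's equation
    by damped scalar Newton iterations."""
    g = np.asarray(y, dtype=float) - theta
    lam = 0.0
    for _ in range(50):
        w = 1.0 + lam * g
        grad = np.sum(g / w)
        hess = -np.sum(g**2 / w**2)
        step = -grad / hess
        while np.any(1.0 + (lam + step) * g <= 1e-8):
            step *= 0.5                       # keep implied weights positive
        lam += step
    return -np.sum(np.log(1.0 + lam * g))

def bel_posterior_mean(y, n_draws=2000, prior_sd=10.0, seed=1):
    """Metropolis-Hastings targeting the quasi-posterior
    p(theta | data) proportional to exp{log_el(theta)} * N(theta; 0, prior_sd^2)."""
    rng = np.random.default_rng(seed)
    theta = float(np.mean(y))                 # start at the sample mean
    lp = log_el(y, theta) - 0.5 * theta**2 / prior_sd**2
    draws = []
    for _ in range(n_draws):
        prop = theta + 0.2 * rng.standard_normal()
        if min(y) < prop < max(y):            # EL is defined inside the convex hull
            lp_prop = log_el(y, prop) - 0.5 * prop**2 / prior_sd**2
            if np.log(rng.uniform()) < lp_prop - lp:
                theta, lp = prop, lp_prop
        draws.append(theta)
    return np.array(draws)
```

With a diffuse prior, the quasi-posterior concentrates near the sample solution of the estimating equation, so posterior means and credible intervals behave like their empirical likelihood counterparts.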
5. Response to Professor Shao
Professor Shao explores general assumptions for the semiparametric non-ignorable propensity model and further leaves an open question on how to perform nonparametric regression analysis in non-ignorable missing data problems when the semiparametric non-ignorable propensity is of a general form. The key to performing nonparametric regression estimation with non-ignorable missing data is to find a kernel-type estimator of the nonparametric part of the semiparametric non-ignorable propensity. Motivated by Shao's comments, we consider the following more general definition of the semiparametric non-ignorable propensity.
Let $\pi(x, y) = P(\delta = 1 \mid x, y)$ denote the propensity. It can be shown that the conditional odds satisfy
(1) $\{1 - \pi(x, y)\}/\pi(x, y) = w(x)\, q(x, y; \gamma)$,
where $w(\cdot)$ is an unspecified nonparametric function and $q(x, y; \gamma)$ is an arbitrary user-specified function. Such a semiparametric conditional odds model also defines a semiparametric non-ignorable propensity. It follows from Equation (1) that the nonparametric function $w(\cdot)$ can be profiled using the kernel regression method when the function $q$ is decomposable, for example of the forms $q(x, y; \gamma) = q_1(x) + q_2(y; \gamma)$ and $q(x, y; \gamma) = q_1(x)\, q_2(y; \gamma)$, among others, where $q_1(\cdot)$ is an arbitrary known function. When $q$ is of a general (implicit) form, developing a kernel regression estimator of the semiparametric non-ignorable propensity is challenging, and thus remains an open problem, as pointed out by Shao. Therefore, it deserves further research.
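Whatever final form the profiling takes, its computational core is a kernel smoother; a minimal Nadaraya–Watson sketch follows (illustrative only, not the estimator for the semiparametric propensity itself).

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, h):
    """Nadaraya-Watson kernel regression estimate of E[Y | X = x] with a
    Gaussian kernel and bandwidth h:
    m_hat(x) = sum_i K_h(x - x_i) y_i / sum_i K_h(x - x_i)."""
    d = (np.asarray(x_eval)[:, None] - np.asarray(x_train)[None, :]) / h
    K = np.exp(-0.5 * d**2)            # Gaussian kernel weights
    return (K @ np.asarray(y_train)) / K.sum(axis=1)
```

The bandwidth $h$ governs the usual bias-variance trade-off; in a profiling step it would typically be chosen by cross-validation on the fully observed subjects.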
Acknowledgments
The authors are grateful to the Editor for organising the discussion of our paper and the discussants Dr. Fang Fang, Dr. Lyu Ni, Dr. Lei Wang, Dr. Jiwei Zhao, Dr. Kosuke Morikawa, Professor Jae Kwang Kim and Professor Jun Shao, seven leading figures on statistical inference for missing data, for their stimulating comments and insightful contributions to this work.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
- Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110, 630–641. doi: 10.1080/01621459.2014.920256
- Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. Hoboken, NJ: John Wiley & Sons.
- Kim, J. K., & Yu, C. L. (2011). A semiparametric estimation of mean functionals with nonignorable missing data. Journal of the American Statistical Association, 106, 157–165. doi: 10.1198/jasa.2011.tm10104
- Lazar, N. A. (2003). Bayesian empirical likelihood. Biometrika, 90, 319–326. doi: 10.1093/biomet/90.2.319
- Liang, K. Y., & Qin, J. (2000). Regression analysis under non-standard situations: A pairwise pseudolikelihood approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62, 773–786. doi: 10.1111/1467-9868.00263
- Shao, J., & Wang, L. (2016). Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika, 103, 175–187. doi: 10.1093/biomet/asv071
- Tseng, C. H., Elashoff, R., Li, N., & Li, G. (2016). Longitudinal data analysis with non-ignorable missing data. Statistical Methods in Medical Research, 25, 205–220. doi: 10.1177/0962280212448721
- Wang, L., Qi, C., & Shao, J. (2018). Model-assisted regression estimators for longitudinal data with nonignorable dropout. International Statistical Review, to appear.
- Zhang, Y. Q., & Tang, N. S. (2017). Bayesian empirical likelihood estimation of quantile structural equation models. Journal of Systems Science and Complexity, 30, 122–138. doi: 10.1007/s11424-017-6254-x
- Zhao, J., & Shao, J. (2015). Semiparametric pseudo-likelihoods in generalized linear models with nonignorable missing data. Journal of the American Statistical Association, 110, 1577–1590. doi: 10.1080/01621459.2014.983234
- Zhao, J. W., & Shao, J. (2017). Approximate conditional likelihood for generalized linear models with general missing data mechanism. Journal of Systems Science and Complexity, 30, 139–153. doi: 10.1007/s11424-017-6188-3