SHORT COMMUNICATIONS

Some issues on longitudinal data with nonignorable dropout, a discussion of “Statistical Inference for Nonignorable Missing-Data Problems: A Selective Review” by Niansheng Tang and Yuanyuan Ju

Pages 137-139 | Received 07 Sep 2018, Accepted 10 Sep 2018, Published online: 18 Sep 2018

We thank Tang and Ju for their review of statistical inference for univariate response data with nonignorable missingness. In this discussion, we focus on some issues concerning longitudinal data with nonignorable dropout. In research areas such as medicine, population health, economics, the social sciences and sample surveys, data are often collected from every sampled subject at $T$ time points; such data are referred to as longitudinal data. Let $Y = (Y_1, \ldots, Y_T)^\top$ be a $T$-dimensional vector of the study variable with distribution denoted by $F$, and let $X$ be a $q$-dimensional time-independent continuous covariate associated with $Y$. Our interest is to estimate parameters in $F$, such as the mean vector $\mu = E(Y)$. We consider the situation where $X$ is always observed, but subjects may drop out prior to the end of the study, which results in incomplete $Y$ data. Let $R = (R_1, \ldots, R_T)^\top$ be the vector of response indicators, where $R_t = 1$ if $Y_t$ is observed and $R_t = 0$ if $Y_t, \ldots, Y_T$ are not observed. Dropout is ignorable if the propensity $p(R \mid X, Y)$ is a function of the observed values only (Little & Rubin, 2002), where $p(\cdot \mid \cdot)$ is a generic notation for a conditional distribution or density. Otherwise, dropout is nonignorable.
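To fix ideas, the monotone-dropout structure above can be simulated in a few lines. The data-generating model and all numerical values below are illustrative assumptions, not taken from the paper: the chance of remaining in the study at time $t$ depends on the current, possibly unobserved, $Y_t$, which makes the dropout nonignorable.

```python
# Simulated longitudinal data with monotone nonignorable dropout.
# All numerical choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, T = 500, 3

U = rng.normal(size=n)                       # covariate allowed in the propensity
Z = rng.normal(size=n)                       # instrument: related to Y only
Y = 0.5 * U[:, None] + 0.5 * Z[:, None] + rng.normal(size=(n, T))

R = np.ones((n, T), dtype=int)
for t in range(T):
    # staying in the study at time t depends on U and the current Y_t
    # (nonignorable), but not on the instrument Z
    p_stay = 1 / (1 + np.exp(-(1.0 + 0.5 * U - 0.5 * Y[:, t])))
    stay = rng.uniform(size=n) < p_stay
    if t > 0:
        stay &= R[:, t - 1] == 1             # monotone: once out, always out
    R[:, t] = stay.astype(int)
```

Each row of `R` is nonincreasing, which is exactly the monotone dropout pattern described above: $R_t = 0$ implies $R_s = 0$ for all $s > t$.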

When dropout is ignorable, Little (1995) presented some well-established methods. However, in practice dropout is often nonignorable (Troxel, Harrington, & Lipsitz, 1998). In this case, for identifiability Wang, Qi, and Shao (2018) assumed that $X = (U^\top, Z^\top)^\top$ with an instrument $Z$ satisfying $p(R \mid X, Y) = p(R \mid U, Y)$ and that the conditional distribution of $Y$ given $X$ depends on $Z$. Furthermore, a parametric dropout propensity model is imposed as follows:
(1) $P(R_t = 1 \mid R_{t-1} = 1, X, Y) = \Psi\{\theta_t (1, U^\top, Y_t)^\top\}, \quad t = 1, \ldots, T,$
where $\theta = (\theta_1, \ldots, \theta_T)$, $\theta_t$ is a row vector of unknown parameters, $\Psi$ is a known monotone function defined on $\mathbb{R}$, $R_0$ is defined to be 1 and $Y_0$ is defined to be $U$. For $t = 1, \ldots, T$, define the following $L = t + q$ estimating equations
$g_t(\theta_t) = E\bigl[R_{t-1}\{R_t/\pi_t(\theta_t) - 1\}\, h_t\bigr],$
where $\pi_t(\theta_t) = \Psi\{\theta_t (1, U^\top, Y_t)^\top\}$ and $h_t = (1, Z^\top, Y_0^\top, Y_1, \ldots, Y_{t-1})^\top$. If $\theta^0$ is the true value of $\theta$, it can be verified that $g_t(\theta_t^0) = 0$ for $t = 1, \ldots, T$. Let $(X_i, Y_i, R_i)$, $i = 1, \ldots, n$, be an independent and identically distributed sample from $(X, Y, R)$, and let $Y_{it}$ and $R_{it}$ be the $t$th components of $Y_i$ and $R_i$ for $t = 1, \ldots, T$, respectively, where the covariate vector $X_i$ is fully observed and the response $Y_{it}$ is observed if and only if $R_{it} = 1$. A sample version of the estimating equation $g_t$ is
$\bar g_t(\theta_t) = n^{-1}\sum_{i=1}^n R_{i,t-1}\{R_{it}/\pi_{it}(\theta_t) - 1\}\, h_{it}.$
Wang et al. (2018) applied the two-step generalised method of moments (GMM) to estimate the unknown parameter vector $\theta$ as follows:
(2) $\hat\theta = \arg\min_{\theta \in \Theta} \bar g(\theta)^\top \hat W \bar g(\theta),$
where $\Theta$ is the parameter space for $\theta$, $\bar g = (\bar g_1^\top, \ldots, \bar g_T^\top)^\top$, $\hat W$ is the inverse of the matrix $n^{-1}\sum_{i=1}^n g_i(\hat\theta^{(1)})\, g_i(\hat\theta^{(1)})^\top$ and $\hat\theta^{(1)}$ is a first-step GMM estimator obtained with the identity weighting matrix. Then, the marginal mean of $Y_t$, $\mu_t = E(Y_t)$, can be estimated by
(3) $\hat\mu_t = n^{-1}\sum_{i=1}^n R_{it} Y_{it}/\hat\pi_{it},$
where $\hat\pi_{it} = \prod_{s=1}^{t}\Psi\{\hat\theta_s (1, U_i^\top, Y_{is})^\top\}$ and $\hat\theta$ is a consistent estimator of $\theta$. In theory, the method proposed by Wang et al. (2018) can handle longitudinal data with nonignorable dropout. However, its performance may be hindered by the following three problems.
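A minimal sketch of the two-step GMM in (2) and the inverse propensity weighted mean in (3), simplified to a single time point so the dropout indicator is a scalar $R$. The logistic propensity, the instrument vector $(1, U, Z)$ and the data-generating design below are my illustrative assumptions, not the exact specification of Wang et al. (2018).

```python
# Two-step GMM for a nonignorable propensity, single-time-point sketch.
# The moment conditions are E[{R / pi(theta) - 1}(1, U, Z)^T] = 0.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 2000
U = rng.normal(size=n)                       # covariate in the propensity
Z = rng.normal(size=n)                       # instrument: affects Y, not dropout
Y = 1.0 + 0.5 * U + 0.5 * Z + rng.normal(size=n)

# nonignorable dropout: the propensity depends on the (possibly missing) Y
pi_true = 1 / (1 + np.exp(-(1.0 + 0.5 * U - 0.5 * Y)))
R = (rng.uniform(size=n) < pi_true).astype(int)

H = np.column_stack([np.ones(n), U, Z])      # always-observed instruments

def moment_matrix(theta):
    # theta = (intercept, coef of U, coef of Y); when R = 0 the term
    # R / pi - 1 equals -1, so the missing Y values are never needed
    pi = 1 / (1 + np.exp(-(theta[0] + theta[1] * U + theta[2] * Y)))
    return (R / pi - 1)[:, None] * H         # n x 3 moment contributions

def gmm_obj(theta, W):
    gbar = moment_matrix(theta).mean(axis=0)
    return gbar @ W @ gbar

opts = {"maxiter": 5000, "xatol": 1e-10, "fatol": 1e-14}
# step 1: identity weighting matrix
th1 = minimize(gmm_obj, np.zeros(3), args=(np.eye(3),),
               method="Nelder-Mead", options=opts).x
# step 2: weight by the inverse of the estimated moment covariance
G1 = moment_matrix(th1)
th2 = minimize(gmm_obj, th1, args=(np.linalg.inv(G1.T @ G1 / n),),
               method="Nelder-Mead", options=opts).x

pi_hat = 1 / (1 + np.exp(-(th2[0] + th2[1] * U + th2[2] * Y)))
mu_hat = np.mean(R * Y / pi_hat)             # IPW estimate of E(Y) = 1
```

With the instruments exactly identifying the three propensity parameters, the second GMM step mainly matters in over-identified settings; it is kept here to mirror the two-step structure of (2).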

Optimal estimation

The efficiency of the proposed GMM estimator in (2) depends on the choice of the moment functions, which may not be optimal if we only use the first-order moments of the data. Other moments or characteristics of $Z$ and $Y$ may provide more information and, hence, result in more efficient GMM estimators. When $Y$ is univariate, Morikawa and Kim (2016) investigated the efficient estimation of the parameters in the propensity model and derived the corresponding semiparametric efficiency bound. In addition, Morikawa, Kim, and Kano (2017) proposed to improve the efficiency based on a semiparametric maximum likelihood approach. Recently, Ai, Linton, and Zhang (2018) proposed a simple and efficient estimation method based on the GMM. They showed that, if the number of moments increases appropriately with the sample size, the GMM estimator can achieve the semiparametric efficiency bound derived in Morikawa and Kim (2016), but under weaker regularity conditions. Motivated by Morikawa and Kim (2016) and Morikawa et al. (2017), we may propose two adaptive estimators for $\theta$, with a parametric working model or a nonparametric estimator for the conditional distribution of the response given the observed data obtained by estimating the efficient score functions, or extend the semiparametric maximum likelihood method by assuming a parametric form for the propensity only. On the other hand, as in Ai et al. (2018), we can consider a set of known basis functions of the observed variables and then obtain the efficient GMM estimator.
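The moment-expansion idea of Ai et al. (2018) amounts to enlarging the vector of GMM moment functions with known basis functions of the always-observed variables. A polynomial basis in $(U, Z)$, as sketched below, is my illustrative choice, not one prescribed by the paper.

```python
# Expanding the GMM instruments with a polynomial basis in (U, Z).
import numpy as np

def polynomial_basis(U, Z, degree):
    """All monomials U^a * Z^b with a + b <= degree, starting with 1."""
    cols = []
    for d in range(degree + 1):
        for i in range(d + 1):
            cols.append(U ** (d - i) * Z ** i)
    return np.column_stack(cols)

rng = np.random.default_rng(2)
U = rng.normal(size=100)
Z = rng.normal(size=100)

# degree 2 gives 6 moment functions (1, U, Z, U^2, UZ, Z^2) instead of 3,
# over-identifying the propensity parameters and potentially gaining efficiency
H = polynomial_basis(U, Z, degree=2)
```

Replacing the instrument matrix in a GMM criterion with such an `H`, and letting the degree grow slowly with the sample size, is the mechanism by which the over-identified GMM estimator can approach the efficiency bound.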

Variable selection

For longitudinal data, the dimension of $\theta = (\theta_1, \ldots, \theta_T)$ is not low when $T$ increases. If the unknown parameter vector $\theta$ is sparse, the unregularised two-step GMM estimator of $\theta$ may lose some efficiency. To obtain a more efficient estimator, we can apply the penalised GMM as follows:
(4) $\hat\theta_{\mathrm{pen}} = \arg\min_{\theta \in \Theta}\Bigl\{\bar g(\theta)^\top \hat W \bar g(\theta) + \sum_{j} p_{\lambda}(|\theta_j|)\Bigr\},$
where $p_\lambda(\cdot)$ is a penalty function, $\lambda$ is a tuning parameter and $\theta_j$ is the $j$th element of $\theta$. The penalty function $p_\lambda$ is a nonnegative, nondecreasing and differentiable function on $(0, \infty)$ (Fan & Li, 2001; Zou, 2006), and the tuning parameter $\lambda$ determines the amount of shrinkage. These properties ensure that the estimates of small components of $\theta$ in (4) can be shrunk to zero. The covariates corresponding to the zero estimates are insignificant predictors, whereas the nonzero estimates correspond to statistically significant predictors.
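The shrinkage effect of the penalised criterion (4) can be illustrated on a toy separable problem using the SCAD penalty of Fan and Li (2001). The "moments" below (identity design, known sparse target) are assumed purely so the effect is easy to see; they are not the estimating equations of the paper.

```python
# Penalised least-squares toy version of (4) with the SCAD penalty.
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty p_lambda(|t|): linear near 0, flat beyond a * lam."""
    t = np.abs(t)
    return np.where(
        t <= lam,
        lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
                 lam ** 2 * (a + 1) / 2),
    )

b = np.array([2.0, 0.0, 0.0, 3.0])   # sparse "true" parameter vector
lam = 0.5

# penalised objective per coordinate: (theta_j - b_j)^2 + p_lam(|theta_j|);
# a crude grid search suffices for this separable toy problem
grid = np.linspace(-4.0, 4.0, 8001)
theta_hat = np.array(
    [grid[np.argmin((grid - bj) ** 2 + scad(grid, lam))] for bj in b]
)
# the small components are shrunk exactly to zero, while the large
# components are left essentially unpenalised (the SCAD penalty is flat there)
```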

Semiparametric model

Note that the estimators in (3) are not consistent unless the parametric model (1) is correct. Since the parametric approach is sensitive to failure of the assumed model, we may consider a semiparametric propensity model (Kim & Yu, 2011) as follows:
(5) $P(R_t = 1 \mid R_{t-1} = 1, X, Y) = \bigl[1 + \exp\{g_t(U) + \phi(\gamma_t, Y_t)\}\bigr]^{-1}, \quad t = 1, \ldots, T,$
where $\phi(\gamma_t, Y_t)$ is a known function of $Y_t$ with an unknown parameter vector $\gamma_t$ and $g_t(\cdot)$ is a completely unspecified function of $U$, $t = 1, \ldots, T$. Semiparametric model (5) encompasses a large class of dropout propensity models. As in Shao and Wang (2016), we have
(6) $E\{R_{t-1}(1 - R_t) \mid U\} = E\bigl[R_t \exp\{g_t(U) + \phi(\gamma_t, Y_t)\} \mid U\bigr],$
which is equivalent to $\exp\{g_t(u)\} = E\{R_{t-1}(1 - R_t) \mid U = u\}/E\bigl[R_t \exp\{\phi(\gamma_t, Y_t)\} \mid U = u\bigr]$. Then, we can obtain the following kernel regression estimator for $\exp\{g_t(u)\}$:
(7) $\exp\{\hat g_t(u)\} = \dfrac{\sum_{i=1}^n R_{i,t-1}(1 - R_{it})\, K_h(U_i - u)}{\sum_{i=1}^n R_{it}\exp\{\phi(\gamma_t, Y_{it})\}\, K_h(U_i - u)},$
where $K_h(s) = K(s/h)/h$, $K$ is a symmetric kernel function and $h$ is a bandwidth. After $g_t$ is profiled by (7), $\gamma_t$ in (5) can be estimated by the two-step GMM. Therefore, we have a consistent estimator of the dropout propensity, and the estimators of unknown quantities in $F$ or the marginal distribution of $Y$ can be obtained using the inverse propensity weighting with the estimated propensity as the weight function.
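A sketch of the kernel estimator (7) in one common special case of (5): the exponential-tilting model of Kim and Yu (2011) with $\phi(\gamma, y) = \gamma y$ and a single dropout indicator $R$. All data-generating choices (the function $g$, the value of $\gamma$, the distributions) are illustrative assumptions.

```python
# Kernel-regression estimator of exp{g(u)} under an exponential-tilting
# propensity P(R = 1 | U, Y) = 1 / (1 + exp{g(U) + gamma * Y}).
import numpy as np

rng = np.random.default_rng(4)
n = 5000
U = rng.uniform(-1.0, 1.0, size=n)
Y = U + rng.normal(scale=0.5, size=n)

gamma = 0.5
g_true = -1.0 + 0.5 * U                      # unknown in practice, known here
pi = 1 / (1 + np.exp(g_true + gamma * Y))    # P(R = 1 | U, Y)
R = (rng.uniform(size=n) < pi).astype(int)

def exp_g_hat(u, gamma, h=0.2):
    """Kernel estimator of exp{g(u)}:
    sum_i (1 - R_i) K_h(U_i - u) / sum_i R_i exp(gamma * Y_i) K_h(U_i - u)."""
    K = np.exp(-0.5 * ((U - u) / h) ** 2)    # Gaussian kernel; constants cancel
    return np.sum((1 - R) * K) / np.sum(R * np.exp(gamma * Y) * K)

# at u = 0 the target is exp{g(0)} = exp(-1), so the estimate should
# be close to 0.37 for large n
```

In practice $\gamma$ is unknown: one profiles $g$ out through this estimator for each candidate $\gamma$ and then estimates $\gamma$ from additional moment conditions, as described above.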

Disclosure statement

No potential conflict of interest was reported by the author.

Additional information

Funding

This research was supported by the National Natural Science Foundation of China (11501208, 11871287) and the Fundamental Research Funds for the Central Universities.

Notes on contributors

Lei Wang

Dr Lei Wang holds a PhD in statistics from East China Normal University. He is an assistant professor of statistics at Nankai University. His research interests include empirical likelihood and missing data problems.

References

  • Ai, C., Linton, O., & Zhang, Z. (2018). A simple and efficient estimation method for models with nonignorable missing data. ArXiv preprint arXiv:1801.04202.
  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360. doi: 10.1198/016214501753382273
  • Kim, J. K., & Yu, C. L. (2011). A semiparametric estimation of mean functionals with nonignorable missing data. Journal of the American Statistical Association, 106, 157–165. doi: 10.1198/jasa.2011.tm10104
  • Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112–1121. doi: 10.1080/01621459.1995.10476615
  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
  • Morikawa, K., & Kim, J. K. (2016). Semiparametric adaptive estimation with nonignorable nonresponse data. ArXiv preprint arXiv:1612.09207.
  • Morikawa, K., Kim, J. K., & Kano, Y. (2017). Semiparametric maximum likelihood estimation with data missing not at random. Canadian Journal of Statistics, 45, 393–409. doi: 10.1002/cjs.11340
  • Shao, J., & Wang, L. (2016). Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika, 103, 175–187. doi: 10.1093/biomet/asv071
  • Troxel, A. B., Harrington, D. P., & Lipsitz, S. R. (1998). Analysis of longitudinal data with non-ignorable non-monotone missing values. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47, 425–438. doi: 10.1111/1467-9876.00119
  • Wang, L., Qi, C., & Shao, J. (2018). Model-assisted regression estimators for longitudinal data with nonignorable dropout. International Statistical Review, to appear.
  • Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429. doi: 10.1198/016214506000000735
