Discussion of the paper by R. L. Prentice and Y. Huang: Optimal designs and efficient inference for biomarker studies

Pages 21-22 | Received 28 May 2018, Accepted 23 Jun 2018, Published online: 16 Jul 2018

It gives me great pleasure to congratulate Drs. Prentice and Huang on an excellent, thought-provoking article on statistical issues and opportunities in nutritional epidemiology research. The authors addressed several important challenges in obtaining reliable information on dietary intake, such as random and systematic biases in self-reported dietary data and the high cost of biomarker measurements. They also suggested ways to develop additional biomarkers and to improve statistical strategies. In this brief commentary, I will focus on two questions posed in Section 3 of the article, namely, how to make efficient statistical inference when expensive biomarkers are measured only on a subset of cohort members and how to optimally select cohort members for biomarker measurements.

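Let $T$ denote the failure time, $X$ denote the set of expensive biomarkers, $Z$ denote the set of inexpensive covariates that is potentially correlated with $X$, and $W$ denote the set of inexpensive covariates that is known to be independent of $X$. We specify that the hazard function of $T$ conditional on $X$, $Z$ and $W$ satisfies the proportional hazards model (Cox, 1972)

$$\lambda(t \mid X, Z, W) = \lambda_0(t) \exp\left(\beta^{\mathrm{T}} X + \gamma^{\mathrm{T}} Z + \eta^{\mathrm{T}} W\right), \tag{1}$$

where $\lambda_0(\cdot)$ is an unspecified baseline hazard function, and $\beta$, $\gamma$ and $\eta$ are unknown regression parameters.

The failure time $T$ is subject to right censoring by $C$, such that we observe $Y$ and $\Delta$ instead of $T$, where $Y = \min(T, C)$, $\Delta = I(T \le C)$, and $I(\cdot)$ is the indicator function. Let $\mathcal{R}$ denote the set of cohort members who are selected for measurements of $X$, and $\bar{\mathcal{R}}$ denote the complement of $\mathcal{R}$. The selection can depend on $(Y, \Delta, Z, W)$ in any manner.

Write $\theta = (\beta^{\mathrm{T}}, \gamma^{\mathrm{T}}, \eta^{\mathrm{T}})^{\mathrm{T}}$ and $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$, and let $f(\cdot \mid \cdot)$ denote a conditional density function. For a subject in $\mathcal{R}$, the likelihood contribution is the density of $(Y, \Delta, X)$ given $(Z, W)$; for a subject in $\bar{\mathcal{R}}$, the likelihood contribution is the density of $(Y, \Delta)$ given $(Z, W)$. Thus, the log-likelihood function concerning $\theta$, $\Lambda_0$ and the conditional density of $X$ takes the form

$$\sum_{i \in \mathcal{R}} \log\left\{ f(Y_i, \Delta_i \mid X_i, Z_i, W_i)\, f(X_i \mid Z_i, W_i) \right\} + \sum_{i \in \bar{\mathcal{R}}} \log \int f(Y_i, \Delta_i \mid x, Z_i, W_i)\, f(x \mid Z_i, W_i)\, dx.$$

Under the assumption that $C$ is independent of $T$ conditional on $(X, Z, W)$,

$$f(Y, \Delta \mid X, Z, W) \propto \left\{ \lambda_0(Y)\, e^{\beta^{\mathrm{T}} X + \gamma^{\mathrm{T}} Z + \eta^{\mathrm{T}} W} \right\}^{\Delta} \exp\left\{ -\Lambda_0(Y)\, e^{\beta^{\mathrm{T}} X + \gamma^{\mathrm{T}} Z + \eta^{\mathrm{T}} W} \right\}$$

(Kalbfleisch & Prentice, 2002, p. 54). If $C$ is independent of $T$ and $X$ conditional on $(Z, W)$, then $f(Y, \Delta \mid X, Z, W)$ is the product of the right-hand side of the above expression and a function that does not involve $X$ or $(\theta, \Lambda_0)$ and thus can be factored out of the integral in the second term of the log-likelihood function.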
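To make the two kinds of likelihood contribution concrete, the following Python sketch evaluates the log-likelihood for a toy cohort. It is only an illustration under simplifying assumptions that are not part of the commentary: a single binary expensive biomarker, a single inexpensive covariate, a constant baseline hazard `lam0` in place of the unspecified one, and a biomarker prevalence `p_x1` that does not depend on the inexpensive covariate. All function and variable names are hypothetical.

```python
import math

def cox_density(y, delta, lin_pred, lam0):
    """Density of (Y, Delta) given covariates, up to censoring terms.

    Assumes a constant baseline hazard lam0, so Lambda_0(y) = lam0 * y.
    """
    hazard = lam0 * math.exp(lin_pred)
    return (hazard ** delta) * math.exp(-hazard * y)

def log_likelihood(data, beta, gamma, lam0, p_x1):
    """Log-likelihood over selected and unselected subjects.

    data: list of dicts with keys y, delta, z, x, where x is None when the
    expensive biomarker was not measured. Assumes binary x and scalar z.
    """
    total = 0.0
    for row in data:
        y, delta, z, x = row["y"], row["delta"], row["z"], row["x"]
        if x is not None:
            # Selected subject: joint density of (Y, Delta, X).
            px = p_x1 if x == 1 else 1.0 - p_x1
            total += math.log(cox_density(y, delta, beta * x + gamma * z, lam0) * px)
        else:
            # Unselected subject: sum the joint density over the unobserved x.
            mix = sum(
                cox_density(y, delta, beta * xv + gamma * z, lam0) * pv
                for xv, pv in ((0, 1.0 - p_x1), (1, p_x1))
            )
            total += math.log(mix)
    return total

cohort = [
    {"y": 0.5, "delta": 1, "z": 0.0, "x": 1},
    {"y": 2.0, "delta": 0, "z": 1.0, "x": 0},
    {"y": 1.2, "delta": 1, "z": 1.0, "x": None},
    {"y": 3.0, "delta": 0, "z": 0.0, "x": None},
]
print(log_likelihood(cohort, beta=0.7, gamma=-0.2, lam0=0.4, p_x1=0.3))
```

With a discrete biomarker the integral over the unobserved value becomes a finite sum, which is why the unselected branch is a two-term mixture.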

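We adopt nonparametric maximum likelihood estimation, under which both the cumulative baseline hazard function $\Lambda_0(\cdot)$ and the conditional density $f(\cdot \mid \cdot)$ of the expensive biomarkers given the inexpensive covariates are nonparametric. The estimation can be carried out through EM algorithms. The resulting estimators of the regression parameters $\theta$ and of $\Lambda_0$ are consistent and asymptotically normal. In addition, the estimator of $\theta$ achieves the semiparametric efficiency bound. Interested readers are referred to Zeng and Lin (2014, 2018) for details.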
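The EM idea can be sketched on a deliberately simplified version of the problem. The Python code below is not the nonparametric procedure of Zeng and Lin; it assumes a binary biomarker, no inexpensive covariates, and constant hazards `rate0` and `rate1` in the two biomarker groups, so that both the E-step and the M-step have closed forms. The names `em_fit`, `obs_loglik` and `cohort` are hypothetical.

```python
import math

def density(y, delta, rate):
    """Density of (Y, Delta) under a constant hazard, up to censoring terms."""
    return (rate ** delta) * math.exp(-rate * y)

def obs_loglik(data, p, rate0, rate1):
    """Observed-data log-likelihood; x is None when the biomarker is unmeasured."""
    total = 0.0
    for y, delta, x in data:
        f0 = (1.0 - p) * density(y, delta, rate0)
        f1 = p * density(y, delta, rate1)
        total += math.log(f0 + f1) if x is None else math.log(f1 if x == 1 else f0)
    return total

def em_fit(data, p=0.5, rate0=1.0, rate1=1.0, n_iter=50):
    """EM for a two-group exponential model with partially observed group labels."""
    ys = [row[0] for row in data]
    deltas = [row[1] for row in data]
    history = []
    for _ in range(n_iter):
        # E-step: posterior probability that x = 1 for each subject.
        weights = []
        for y, delta, x in data:
            if x is None:
                f0 = (1.0 - p) * density(y, delta, rate0)
                f1 = p * density(y, delta, rate1)
                weights.append(f1 / (f0 + f1))
            else:
                weights.append(float(x))
        # M-step: weighted prevalence and weighted events / person-time.
        p = sum(weights) / len(weights)
        rate1 = (sum(w * d for w, d in zip(weights, deltas))
                 / sum(w * y for w, y in zip(weights, ys)))
        rate0 = (sum((1.0 - w) * d for w, d in zip(weights, deltas))
                 / sum((1.0 - w) * y for w, y in zip(weights, ys)))
        history.append(obs_loglik(data, p, rate0, rate1))
    # beta is the log hazard ratio between the two biomarker groups.
    return {"p": p, "lambda0": rate0, "beta": math.log(rate1 / rate0), "loglik": history}

# (y, delta, x) triples; the last six subjects have no biomarker measurement.
cohort = [
    (0.3, 1, 1), (0.6, 1, 1), (2.5, 0, 0), (3.0, 0, 0), (1.9, 1, 0), (0.9, 0, 1),
    (1.0, 1, None), (1.4, 0, None), (0.8, 1, None), (2.2, 0, None),
    (1.7, 1, None), (1.1, 0, None),
]
fit = em_fit(cohort)
print(fit["beta"], fit["loglik"][-1])
```

Because the M-step maximizes the expected complete-data log-likelihood exactly, the values stored in `loglik` should not decrease from one iteration to the next, which is a convenient check on an implementation.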

Drs. Prentice and Huang described case–cohort and nested case–control designs, which assume that all cases are selected. In large cohorts with relatively common diseases, it may not be economically feasible to measure biomarkers on all cases. Indeed, it is unclear whether cases should take precedence over controls or which cases and controls are the most informative. Lawless (2018) suggested selecting the cases with the smallest failure times and the controls with the largest censoring times. In addition, Borgan, Langholz, Samuelsen, Goldstein and Pogoda (2000) stratified the selection of the subcohort in the case–cohort design on inexpensive covariates, and Langholz and Borgan (1995) used inexpensive covariates to select ‘counter-matched’ controls at the failure time of each case.
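A selection rule of the kind attributed to Lawless can be written in a few lines. The Python sketch below is only an illustration; the function name and the way the numbers of cases and controls are specified are assumptions made here.

```python
def select_extreme_times(times, events, n_cases, n_controls):
    """Indices of the n_cases earliest failures and n_controls latest censored subjects."""
    cases = sorted(
        (i for i, d in enumerate(events) if d == 1), key=lambda i: times[i]
    )
    controls = sorted(
        (i for i, d in enumerate(events) if d == 0), key=lambda i: times[i], reverse=True
    )
    return sorted(cases[:n_cases] + controls[:n_controls])

times = [0.4, 2.1, 1.3, 3.5, 0.9, 2.8]
events = [1, 0, 1, 0, 1, 0]
print(select_extreme_times(times, events, n_cases=2, n_controls=2))  # prints [0, 3, 4, 5]
```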

In recent unpublished work, my colleagues Drs. Ran Tao and Donglin Zeng and I investigated the efficiency of such sampling designs. The design efficiency pertains to the semiparametric efficiency bound for estimating the regression coefficients of expensive covariates. We developed optimal designs that are the most efficient among all possible sampling designs. We found that the design suggested by Lawless (2018) is optimal if there are no inexpensive covariates. In the presence of inexpensive covariates, a design that selects an equal number of subjects at the two extreme tails of martingale residuals in each stratum of inexpensive covariates is optimal and can be substantially more efficient than the existing sampling designs.
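The tail-selection rule can be illustrated with a short Python sketch. The commentary does not say how the martingale residuals are computed; purely for illustration, the code below takes them from a covariate-free Nelson–Aalen fit within each stratum, i.e. the residual is the event indicator minus the estimated cumulative hazard at the observed time. The function names and the toy data are hypothetical.

```python
from collections import defaultdict

def nelson_aalen_residuals(times, events):
    """Residuals Delta_i - Lambda_hat(Y_i), with Lambda_hat the Nelson-Aalen estimate."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    residuals = [0.0] * n
    cum_hazard = 0.0
    pos = 0
    while pos < n:
        # Group subjects tied at the same observed time.
        end = pos
        while end < n and times[order[end]] == times[order[pos]]:
            end += 1
        at_risk = n - pos
        n_events = sum(events[order[j]] for j in range(pos, end))
        cum_hazard += n_events / at_risk
        for j in range(pos, end):
            i = order[j]
            residuals[i] = events[i] - cum_hazard
        pos = end
    return residuals

def select_design(times, events, strata, k_per_tail):
    """Pick k_per_tail subjects from each residual tail within every stratum."""
    by_stratum = defaultdict(list)
    for i, s in enumerate(strata):
        by_stratum[s].append(i)
    selected = []
    for members in by_stratum.values():
        res = nelson_aalen_residuals(
            [times[i] for i in members], [events[i] for i in members]
        )
        ranked = [members[j] for j in sorted(range(len(members)), key=lambda j: res[j])]
        k = min(k_per_tail, len(ranked) // 2)  # never pick a subject twice
        selected.extend(ranked[:k] + ranked[len(ranked) - k:])
    return sorted(selected)

times = [0.4, 1.3, 2.1, 3.5, 0.9, 0.7, 2.8, 1.8]
events = [1, 0, 1, 0, 1, 0, 0, 1]
strata = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(select_design(times, events, strata, k_per_tail=1))  # prints [0, 3, 4, 6]
```

In this toy cohort, each stratum contributes its earliest case (largest residual) and its latest censored subject (smallest residual).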

Disclosure statement

No potential conflict of interest was reported by the author.

Additional information

Funding

This work was supported by the National Institutes of Health [R01GM047845, R01HG009974, and P01CA142538].

Notes on contributors

D. Y. Lin

D. Y. Lin is the Dennis Gillings Distinguished Professor of Biostatistics at the University of North Carolina at Chapel Hill. He is an internationally recognized expert on survival analysis, with many influential publications. One of his current research interests is the efficient design and analysis of two-phase studies. He is a fellow of IMS and ASA and an Associate Editor for Biometrika and JASA.

References

  • Borgan, Ø., Langholz, B., Samuelsen, S. O., Goldstein, L., & Pogoda, J. (2000). Exposure stratified case–cohort designs. Lifetime Data Analysis, 6, 39–58. doi: 10.1023/A:1009661900674
  • Cox, D. R. (1972). Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–220.
  • Kalbfleisch, J. D., & Prentice, R. L. (2002). The statistical analysis of failure time data (2nd ed.). Hoboken: Wiley.
  • Langholz, B., & Borgan, Ø. (1995). Counter-matching: A stratified nested case–control sampling method. Biometrika, 82, 69–79. doi: 10.1093/biomet/82.1.69
  • Lawless, J. F. (2018). Two-phase outcome-dependent studies for failure times and testing for effects of expensive covariates. Lifetime Data Analysis, 24, 28–44. doi: 10.1007/s10985-016-9386-8
  • Zeng, D., & Lin, D. Y. (2014). Efficient estimation of semiparametric transformation models for two-phase cohort studies. Journal of the American Statistical Association, 109, 371–383. doi: 10.1080/01621459.2013.842172
  • Zeng, D., & Lin, D. Y. (2018). Maximum likelihood estimation for case–cohort and nested case–control studies. In Ø. Borgan, N. Breslow, N. Chatterjee, M. Gail, & A. Scott (Eds.), Handbook of statistical methods for case–control studies (Chapter 24). New York: Chapman and Hall Press.