![MathJax Logo](/templates/jsp/_style2/_tandf/pb2/images/math-jax.gif)
Abstract
In randomized clinical trials with right-censored time-to-event outcomes, the popular log-rank test without adjusting for baseline covariates is asymptotically valid for treatment effect under simple randomization of treatments but is too conservative under covariate-adaptive randomization. The stratified log-rank test, which adjusts baseline covariates in the test procedure by stratification, is asymptotically valid regardless of what treatment randomization is applied. In the literature, however, under simple randomization there is no affirmative conclusion about whether the stratified log-rank test is asymptotically more powerful than the unstratified log-rank test. In this article we show when the stratified and unstratified log-rank tests aim for the same null hypothesis and that, under simple randomization, the stratified log-rank test is asymptotically more powerful than the unstratified log-rank test in the region of alternative hypothesis that is specified by a Cox proportional hazards model. We also provide some discussion about why we do not have an affirmative conclusion in general.
1. Introduction
The log-rank test (Mantel, Citation1966) and stratified log-rank test (Peto et al., Citation1976) are the two longstanding and most popular nonparametric tests for treatment effect in randomized clinical trials with two treatment arms and right-censored time-to-event outcomes. What motivates the stratified version of log-rank test is that baseline prognostic factors (covariates), measured prior to treatment assignments and thus not affected by treatments, are adjusted through stratification for efficiency gain.
Adjusting baseline covariates has been widely advocated to improve efficiency for tests and other analyzes, in the following two aspects. (i) In the design stage, covariate-adaptive randomization can be used to enforce the balance of treatment assignments across baseline prognostic factors, which results in more efficient tests (EMA, Citation2015). More details about covariate-adaptive randomization are given in Section 2. (ii) In the analysis stage, ‘incorporating prognostic baseline factors in the primary statistical analysis of clinical trial data can result in a more efficient use of data to demonstrate and quantify the effects of treatment’ (FDA, Citation2021), ‘under approximately the same minimal statistical assumptions that would be needed for unadjusted’ (EMA, Citation2015; FDA, Citation2021; ICH E9, Citation1998).
If the log-rank test is considered as ‘unadjusted test’, then the stratified log-rank test qualifies as an adjusted test under the same minimal assumption because it is still a nonparametric test without using any model. Tests using the Cox proportional hazards model as a working model are also qualified (DiRienzo & Lagakos, Citation2002; Kong & Slud, Citation1997; Lin & Wei, Citation1989), but the resulting tests can be less efficient than the unadjusted log-rank test when the working model is wrong (Kong & Slud, Citation1997). In this paper we focus on the stratified and unstratified log-rank tests.
Although stratified log-rank test uses information from baseline prognostic factors and thus is expected to be more efficient, an affirmative conclusion about whether it is asymptotically more efficient than the unstratified log-rank test is not available, under simple randomization in which patients are assigned to treatments completely at random. Another issue is that the stratified log-rank actually tests a null hypothesis stronger than that of the log-rank test and, hence, a prerequisite in their comparison is to investigate when the two null hypotheses are the same.
The purpose of this paper is to establish some affirmative conclusions about the stratified and unstratified log-rank tests, in terms of null hypothesis, asymptotic validity of tests and Pitman's asymptotic relative efficiency. The research is important as these two longstanding tests are used a lot in applications without a guidance on which one should be used.
Section 2 describes data, design, and log-rank test statistics. Section 3 introduces hypotheses, assumptions and the concept of validity for log-rank tests. Some theoretical results for stratified and unstratified log-rank tests are given in Section 4, where we show that, under simple randomization, the stratified log-rank test is asymptotically more powerful in the region of alternative hypothesis that is specified by a Cox proportional hazards model. Section 5 contains conclusions and Appendix provides technical proofs.
2. Data, design and test statistics
For a patient from the population under investigation, let and
be the potential life time and right-censoring time, respectively, under treatment
or 1, and W be the vector of all baseline covariates and other time-varying covariates, observed or unobserved. Suppose that a random sample of n patients is obtained from the population with independent
,
, identically distributed as
. For each patient, only one of the two treatments is assigned and received.
Let be a binary treatment indicator for patient i and
be the pre-specified treatment assignment proportion for treatment 1. Consider the design, i.e. the generation of
's for n sequentially arrived patients. Simple randomization assigns patients to treatments completely at random with
for all i, which may yield treatment proportions that substantially deviate from the target π across levels of some baseline prognostic factors. Because of this, covariate-adaptive randomization using Z, a sub-vector of W containing observed baseline prognostic factors with finitely many joint levels, is widely applied. When patient i with baseline
is arrived, a treatment is assigned using a mechanism dependent on all previously assigned treatments for patients with
. For example, the most popular covariate-adaptive randomization scheme, the stratified permuted block design (Zelen, Citation1974), randomly assigns sequentially arrived patients with
in blocks of size B, each having
patients in treatment 1, where B is appropriately chosen so that
is an integer and the last block is allowed to be incomplete. Another popular covariate-adaptive randomization is Pocock-Simon's minimization (Pocock & Simon, Citation1975; Taves, Citation1974). Other schemes can be found in two reviews, Schulz and Grimes (Citation2002) and Shao (Citation2021). To see how popular covariate-adaptive randomization is, it was used in more than 500 clinical trials between 1989 and 2008 (Taves, Citation2010) and 237 trials among nearly 300 trials published in two years, 2009 and 2014 (Ciolino et al., Citation2019). All commonly used covariate-adaptive randomization schemes satisfy the following mild condition (Antognini & Zagoraiou, Citation2015).
Given
,
and
are conditionally independent;
for all i; and for every level z of Z,
in probability as
, where
is the number of patients with
and
is the number of patients with
and
.
Most commonly used covariate-adaptive randomization schemes except Pocock-Simon's minimization also satisfy the next condition.
Conditional on
, the vector whose zth component is
with z ranging over all levels of Z converges in distribution to
, where Ω is the diagonal matrix whose zth diagonal entry is
and
is a known constant depending on the randomization scheme.
Although simple randomization is not counted as covariate-adaptive randomization, it satisfies (D1) and (D2) with .
After is assigned, the observed outcome from patient i is
with
and
, together with an indicator of
.
The log-rank test statistic is (1)
(1) where
,
,
the indicator of the event
,
,
,
,
is the indicator of the event
, and the upper limit τ in the integral is a point satisfying
for j = 0, 1.
The stratified log-rank test statistic is a weighted average of the stratum-specific log-rank test statistics with strata constructed using Z, (2)
(2) where
,
, and
.
It is clear that in terms of test statistics, the stratified in (Equation2
(2)
(2) ) utilizes Z values whereas the unstratified
in (Equation1
(1)
(1) ) is unadjusted. Under covariate-adaptive randomization,
is not completely unadjusted since it uses Z-information through assignments
's, although it does not adjust for covariate-adaptive randomization in a correct way. On the other hand, the stratified
uses Z-information in both design and analysis stages.
We consider stratification with all levels of Z. In applications, it is allowed to use more covariates to form strata. The conclusions in what follows remain the same. However, it is not a good idea to use fewer levels of Z for stratification, because it may result in a test that is not asymptotically valid.
3. Null hypothesis, assumption and validity
Throughout, denotes a given significance level and
is the
th quantile of the standard normal distribution. When
, the log-rank test rejects the following null hypothesis
of no treatment effect,
(3)
(3) where
is the unconditional hazard function of
, j = 0, 1.
in (Equation3
(3)
(3) ) is a commonly adopted null hypothesis of no treatment effect unconditional on covariates.
The log-rank test is nonparametric. Its validity requires non-informative censoring (DiRienzo & Lagakos, Citation2002; Kong & Slud, Citation1997), i.e.,
(C) is independent of
given j.
Under simple randomization, it is well-known (Kalbfleisch & Prentice, Citation2011) that the log-rank test is asymptotically valid in the sense that
(4)
(4) with equality holding for at least one population P under
.
Unlike simple randomization, covariate-adaptive randomization generates a dependent sequence of treatment assignments, which may render conventional methods developed under simple randomization, such as the log-rank test, not valid under covariate-adaptive randomization (EMA, Citation2015; FDA, Citation2021). It is shown in Ye and Shao (Citation2020) that, under covariate-adaptive randomization with ν in (D2) strictly smaller than , the log-rank test is asymptotically conservative in the sense that,
(5)
(5) for all P under
.
The stratified log-rank in (Equation2
(2)
(2) ) actually tests the null hypothesis
(6)
(6) where
is the hazard function of
conditional on Z = z, j = 0, 1. Note that
in (Equation6
(6)
(6) ) holds if and only if the hazard functions are the same in every stratum z and, thus, is stronger than
in (Equation3
(3)
(3) ).
The validity of stratified log-rank test requires the following assumption on censoring:
(CZ) is independent of
given j and Z.
Conditions (C) and (CZ) are not comparable, although both are implied by that is independent of
given j, a reasonable condition for non-informative censoring.
Under simple randomization and covariate-adaptive randomization satisfying (D1) in Section 2, (Equation4(4)
(4) ) holds with
replaced by
and
replaced by
(Ye & Shao, Citation2020), provided that all levels of Z are used in stratification.
Since is stronger than
, the stratified and unstratified log-rank tests are not comparable. Thus, a prerequisite for the comparison of efficiency of two log-rank tests is
. Is there a scenario under which
? Consider the following transformation model assumption.
There is an increasing function h such that
for all
and a constant θ, where V is a vector of covariates,
, and both h and θ can be unknown.
Assumption (TR) is discussed in Cheng et al. (Citation1995), which includes many commonly used semiparametric models as special cases, for example, the Cox proportional hazards model (see formula (Equation7(7)
(7) ) in Section 4). It is a mild assumption since h is unknown and we only need to know it exists.
The proof of following result is in the Appendix.
Theorem 3.1
Under (TR), in (Equation6
(6)
(6) ) is the same as
in (Equation3
(3)
(3) ).
4. Comparison of two log-rank tests
When , is the stratified log-rank test
more efficient than the unstratified log-rank test
under simple randomization when both tests are asymptotic valid? Intuitively this sounds correct since
does not adjust for covariates.
Unfortunately, there is no result on this in the literature. In this section we try to fill this gap to some extent and explain why the two log-rank tests are not comparable in terms of efficiency. This is important because both stratified and unstratified log-rank tests are used a lot in applications.
To this goal, we first state the following asymptotic result (whose proof is given in Appendix) for the asymptotic distributions of stratified and unstratified log-rank tests under local alternatives. Define
where
,
,
, and
. Also, we use
to denote
for any i and
to denote
for any i and z. Note that, under the null hypothesis
,
for j = 0, 1, and under the null hypothesis
,
for all z and j = 0, 1.
Theorem 4.1
Assume (CZ) and (D1). Under the local alternative hypothesis that
with
's not depending on n and that
is bounded and
for every t and z,
, where
denotes convergence in distribution as
,
, and
denotes variance under
.
Assume (C), (D1), and (D2). Under the local alternative hypothesis that
with
's not depending on n and that
is bounded and
for every t,
, where
,
,
for ν given in (D2), and
and
denote expectation and variance under
, respectively.
Because the local alternative hypotheses specified in (a) and (b) of Theorem 4.1 do not follow any model, and
can be arbitrarily very different and, thus,
and
may be not comparable in terms of asymptotic efficiency. In other words, the space of alternative hypothesis is too large to compare efficiency of
and
, as there is no model at all. A semiparametric model on alternative hypothesis narrowing down the space of alternative hypothesis may result in affirmative results of comparing efficiency. We derive a result under the Cox proportional hazards model to highlight this.
Suppose that the true hazard follows a Cox proportional hazards model,
(7)
(7) where
,
is the hazard conditional on covariate V, θ is an unknown parameter, η is an unknown parameter vector, and
is an unspecified function. Under model (Equation7
(7)
(7) ), (TR) holds with
and
.
Corollary 4.1
Assume that model (Equation7(7)
(7) ) holds,
is independent of
given j, and
for all t. Then, under simple randomization, the stratified log-rank test
is always more efficient than the unstratified log-rank test
in terms of Pitman's asymptotic relative efficiency.
The proof is given in the Appendix. A key to the proof is that the local alternative hypotheses in (a) and (b) of Theorem 4.1 can be unified into with the help of model (Equation7
(7)
(7) ).
As both log-rank tests are nonparametric and do not need model (Equation7(7)
(7) ), what does Corollary 4.1 tell us? It says that, under simple randomization, the stratified log-rank test
is more efficient in the region of alternative hypothesis specified by model (Equation7
(7)
(7) ), although we cannot claim that
is more efficient in the entire alternative hypothesis space.
We now turn to covariate-adaptive randomization, under which the unstratified log-rank test is not valid but conservative, as we discussed in Section 3. On the other hand, by Theorem 4.1(a), the stratified log-rank test
is valid for testing
regardless of which covariate-adaptive randomization is applied. Therefore, stratified log-rank test is a clear winner when covariate-adaptive randomization is applied.
Another way to adjust for covariates used in randomization is the modified (unstratified) log-rank test proposed by Ye and Shao (Citation2020), where
is a consistent estimator of
(see §3.2 of Ye and Shao Citation2020).
removes the conservativeness of
and is valid for testing
in (Equation3
(3)
(3) ) under covariate-adaptive randomization.
Even if model (Equation7(7)
(7) ) holds,
and
are not comparable in terms of asymptotic efficiency. We provide two simulation examples here to demonstrate that
is more efficient in one scenario but less efficient in another scenario, compared with
. The simulation setting is model (Equation7
(7)
(7) ) with
for all t and
, where
is binary with
,
, and
and
are independent.
and discretized
with 4 equal probability categories are used for stratified permuted block randomization with block size 4. In scenario 1, censoring is independent of treatment and
and distributed as uniform on (10,40). In scenario 2, censoring is independent of treatment and
, but conditioned on
, and censoring is distributed as 10 + the exponential distribution with mean
. The power curves over θ with
and n = 500 based on 2000 simulations are given in Figure . Note that
is more powerful than
under scenario 1 but less powerful under scenario 2. Both
and
are more powerful than the conservative
in any case.
The reason why the stratified and the modified unstratified
are not comparable in asymptotic efficiency is that the two tests adopt different approaches in utilizing baseline covariates: the former adjusts baseline covariates by stratification, whereas the latter utilizes baseline covariates by modifying the unstratified
whose performance is affected by covariate-adaptive randomization.
5. Conclusion and discussion
Under some semiparametric models for survival time such as the transformation model (TR) described in Section 3, the null hypotheses of stratified and unstratified log-rank tests are the same.
Under simple randomization of treatment assignments, the stratified log-rank test is asymptotically more efficient than the unstratified log-rank test in terms of Pitman's relative efficiency in the region of alternative hypothesis specified by the Cox proportional hazards model given by (Equation7
(7)
(7) ). It is of interest to derive more affirmative results using assumptions/models other than the Cox model to narrow down the space of alternative hypothesis.
Under covariate-adaptive randomization of treatment assignments, the unstratified log-rank test is not asymptotically valid but conservative, whereas the stratified log-rank test is asymptotically valid as long as the covariates used in randomization are all included in stratification. Thus, the stratified log-rank test is a clear winner. A modified unstratified log-rank test removes conservativeness and is valid, but its relative efficiency compared with the stratified log-rank test has no definite conclusion, because the two tests apply different approaches in utilizing covariates.
Because the region specified by the Cox model is quite large and the stratified log-rank test is a clear winner under covariate-adaptive randomization, we recommend the stratified log-rank test over the unstratified log-rank test.
Acknowledgements
We would like to thank two anonymous referees and an associate editor for helpful comments and suggestions.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- Antognini, A. B., & Zagoraiou, M. (2015). On the almost sure convergence of adaptive allocation procedures. Bernoulli Journal, 21(2), 881–908.
- Cheng, S., Wei, L., & Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika, 82(4), 835–845. https://doi.org/10.1093/biomet/82.4.835
- Ciolino, J. D., Palac, H. L., Yang, A., Vaca, M., & Belli, H. M. (2019). Ideal vs. real: A systematic review on handling covariates in randomized controlled trials. BMC Medical Research Methodology, 19(1), 136. https://doi.org/10.1186/s12874-019-0787-8
- DiRienzo, A. G., & Lagakos, S. W. (2002). Effects of model misspecification on tests of no randomized treatment effect arising from cox's proportional hazards model. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(4), 745–757. https://doi.org/10.1111/1467-9868.00310
- EMA (2015). Guideline on adjustment for baseline covariates in clinical trials. Committee for Medicinal Products for Human Use, European Medicines Agency (EMA).
- FDA (2021). Adjusting for covariates in randomized clinical trials for drugs and biological products. Draft Guidance for Industry. Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, Food and Drug Administration (FDA), U.S. Department of Health and Human Services. May 2021.
- ICH E9 (1998). Statistical principles for clinical trials E9. International Council for Harmonisation (ICH).
- Kalbfleisch, J. D., & Prentice, R. L. (2011). The statistical analysis of failure time data. Wiley.
- Kong, F. H., & Slud, E. (1997). Robust covariate-adjusted logrank tests. Biometrika, 84(4), 847–862. https://doi.org/10.1093/biomet/84.4.847
- Lin, D. Y., & Wei, L. J. (1989). The robust inference for the cox proportional hazards model. Journal of the American Statistical Association, 84(408), 1074–1078. https://doi.org/10.1080/01621459.1989.10478874
- Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50(3), 163–170.
- Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., Mantel, N., McPherson, K., Peto, J., & Smith, P. G. (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient. i. introduction and design. British Journal of Cancer, 34(6), 585–612. https://doi.org/10.1038/bjc.1976.220
- Pocock, S. J., & Simon, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics, 31(1), 103–115. https://doi.org/10.2307/2529712
- Schulz, K. F., & Grimes, D. A. (2002). Generation of allocation sequences in randomised trials: Chance, not choice. The Lancet, 359(9305), 515–519. https://doi.org/10.1016/S0140-6736(02)07683-3
- Shao, J. (2021). Inference for covariate-adaptive randomization: Aspects of methodology and theory (with discussions). Statistical Theory and Related Fields, 5(3), 172–186. https://doi.org/10.1080/24754269.2021.1871873
- Taves, D. R. (1974). Minimization: A new method of assigning patients to treatment and control groups. Clinical Pharmacology and Therapeutics, 15(5), 443–453. https://doi.org/10.1002/cpt.1974.15.issue-5
- Taves, D. R. (2010). The use of minimization in clinical trials. Contemporary Clinical Trials, 31(2), 180–184. https://doi.org/10.1016/j.cct.2009.12.005
- Ye, T., & Shao, J. (2020). Robust tests for treatment effect in survival analysis under covariate-adaptive randomization. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(5), 1301–1323. https://doi.org/10.1111/rssb.12392
- Ye, T., Shao, J., Yi, Y., & Zhao, Q. (2022). Toward better practice of covariate adjustment in analyzing randomized clinical trials. Journal of the American Statistical Association.
- Zelen, M. (1974). The randomization and stratification of patients to clinical trials. Journal of Chronic Diseases, 27(7-8), 365–375. https://doi.org/10.1016/0021-9681(74)90015-0
Appendix
A.1. Proof of Theorem 3.1
It is clear that in (Equation6
(6)
(6) ) implies
in (Equation3
(3)
(3) ). Thus, it suffices to show that, under (TR),
for all t implies
for all
. Define
and
. If
for all t, then
for all t. Condition (TR) implies that
Then
i.e.,
Since
or
depending on whether
or
,
This implies that
and, thus,
for all
, which together with
imply that
and hence
for all
.
A.2. Proof of Theorem 4.1
We prove (a) only, since the proof of (b) is similar. We first show that, under the null hypothesis or alternative hypothesis,
(A1)
(A1) where
the number of patients with treatment j in stratum z,
, j = 0, 1, and
Following the argument in the Appendix of Lin and Wei (Citation1989), we obtain that, under either the null or alternative hypothesis, the left hand side of (EquationA1
(A1)
(A1) ) is equal to
(A2)
(A2) where
denotes a quantity converging to 0 in probability as
. Define
and
. Similar to the proof of Theorem 2 in Ye et al. (Citation2022), the Lindeberg's Central Limit Theorem justifies that, conditioned on
and
, the random vector
converges in distribution to a 2-dimensional normal distribution with mean 0, conditional on
and
. Let M be the quantity in (EquationA2
(A2)
(A2) ) excluding
, which is the sum of two components of the previous random vector. Consequently,
Under (D1),
Then
unconditionally. Thus, by Slutsky's theorem, (EquationA1
(A1)
(A1) ) holds.
Next, under the local alternative specified in part (a), and
Hence, by (EquationA1
(A1)
(A1) ) and Slutsky's theorem,
It remains to show that
, under the specified local alternative. By Lemma 3 of Ye and Shao (Citation2020), within any stratum z,
. By the identity
from Kalbfleisch and Prentice (Citation2011) and the form of
, we obtain that, under the specified local alternative,
A.3. Proof of Corollary 4.1
A direct calculation shows that
where
and
are given in Theorem 4.1,
,
, and
denotes expectation under
.
Under the local alternative hypothesis with a fixed constant
, by Theorem 4.1,
, where
and
, where
Pitman's asymptotic relative efficiency of
with respect to
is
.
Applying Jensen's inequality with convex function
and
,
, we obtain that
. To reach the conclusion
, it remains to show that
.
The condition for all t implies that
and, hence,
Thus, it suffices to show
(A3)
(A3) Note that
where
is the expectation with respect to covariate
and is not depending on θ. Taking the derivative with respect to θ, we obtain that
Then,
which is the same as the left-hand side of (EquationA3
(A3)
(A3) ). As
is the probability of having an observed failure before time τ, it is a non-decreasing function of θ. This implies that (EquationA3
(A3)
(A3) ) holds.