ABSTRACT
Covariate-adaptive randomisation has a long history of applications in clinical trials. Shao, Yu, and Zhong [(2010). A theory for testing hypotheses under covariate-adaptive randomization. Biometrika, 97, 347–360] and Shao and Yu [(2013). Validity of tests under covariate-adaptive biased coin randomization and generalized linear models. Biometrics, 69, 960–969] showed that the simple t-test is conservative under covariate-adaptive biased coin (CABC) randomisation in terms of type I error, and proposed a valid test using the bootstrap. Under a general additive model with CABC randomisation, we construct a calibrated t-test that shares the same property as the bootstrap method in Shao et al. (2010), but does not need the heavy computation required by the bootstrap method. Some simulation results are presented to show the finite sample performance of the calibrated t-test.
1. Introduction
In clinical trials and medical studies, patients arrive sequentially and must be treated immediately. When two treatments are compared under simple randomisation (SR), patients are allocated randomly into two treatment groups. The statistical inference may suffer from the disadvantage of not balancing patients' prognostic factors such as the age category, gender, disease stage, prior chemotherapy and geographical region that may influence the outcomes, although simple randomisation still produces valid statistical tests. Various randomisation methods have been proposed in the literature and they have advantages such as minimising imbalance between treatment groups, reducing selection bias, minimising accidental bias and improving efficiency in inference; see, for example, Efron (1971), Taves (1974), Pocock and Simon (1975), Kalish and Begg (1985), Aickin (2001), Weir and Lees (2003), Shao, Yu, and Zhong (2010), Shao and Yu (2013) and Ma, Hu, and Zhang (2015). A common characteristic of these methods is the use of a randomised treatment allocation that depends on covariates or prognostic factors but is conditionally independent of the outcomes given the covariates used in randomisation. Thus, they are called covariate-adaptive randomisation methods. The current paper focuses on one such method that applies the biased coin method (Efron, 1971) to patients grouped by prognostic factors, which is referred to as the covariate-adaptive biased coin (CABC) method by Shao et al. (2010). Similar results can be obtained for the minimisation procedure (Pocock & Simon, 1975; Taves, 1974) and the stratified block randomisation (Kalish & Begg, 1985), which together with the CABC are the most popular covariate-adaptive randomisation methods in clinical trials.
For any given randomisation method, statistical tests valid under the particular randomisation scheme should be used for testing the possible treatment effect. A statistical test is said to be valid if the type I error rate of the test is at most α, a given significance level, at least in the limiting case when the total sample size increases to infinity. The validity of various statistical tests under SR has been extensively studied in the statistical literature. For covariate-adaptive randomisation, however, there exist only a few theoretical results about the validity of statistical tests (e.g. Ma et al., 2015; Shao et al., 2010; Shao & Yu, 2013), although covariate-adaptive randomisation has been used in clinical trials for a long time and there are many empirical results regarding properties of tests under covariate-adaptive randomisation (e.g. Aickin, 2002; Birkett, 1985; Forsythe, 1987; Hagino et al., 2004; Weir & Lees, 2003). As Rosenberger and Sverdlov (2008, Section 4) pointed out in their review, ‘Very little theoretical work has been done in this area, despite the proliferation of papers. The original source papers are fairly uninformative about theoretical properties of the procedures’.
Under linear and generalised linear models, Shao et al. (2010) and Shao and Yu (2013), respectively, derived valid tests for comparing two treatments under CABC. Their tests are based on a modification of the tests developed under SR, where the modification is to apply a bootstrap variance estimation method that has a CABC component to address the variation in CABC randomisation. This bootstrap test was shown to be valid asymptotically and robust against misspecification of model and link function.
The purpose of this paper is to show that we can construct an asymptotically valid test under CABC without using the bootstrap by directly providing a consistent variance estimator in a general additive model that includes both linear and generalised linear models as special cases. The new test shares the robustness property with the bootstrap, but does not need the large computation required by the bootstrap. The same idea can be applied to the other two popular covariate-adaptive randomisation methods in clinical trials, the minimisation and stratified block randomisation.
2. Notation and preliminaries
Let N be the number of patients under two treatments, $\delta_i$ be the treatment indicator that equals j if patient i is assigned to treatment j, j=0,1, and $Y_{ij}$ be the outcome of patient i under treatment j. For patient i, $Y_i = \delta_i Y_{i1} + (1-\delta_i) Y_{i0}$ is observed. Associated with patient i, let $X_i$ be a vector of covariates and prognostic factors and $Z_i$ be a function of $X_i$ used in CABC, where $Z_i$ is discrete with values $z_k$, $k=1,\ldots,K$, and K is a fixed positive integer. We assume that $(Y_{i1}, Y_{i0}, X_i)$, $i=1,\ldots,N$, are independent and identically distributed random vectors from some distribution.
Under SR, $\delta_i$'s are independent with $P(\delta_i = 1) = 1/2$ for all i and are independent of $(Y_{i1}, Y_{i0}, X_i)$'s. With a fixed constant $p \in (1/2, 1)$, the biased coin method in Efron (1971) assigns the ith patient according to
$$P(\delta_i = 1 \mid \delta_1, \ldots, \delta_{i-1}) = \begin{cases} 1/2, & D_{i-1} = 0, \\ 1-p, & D_{i-1} > 0, \\ p, & D_{i-1} < 0, \end{cases}$$
where $D_0 = 0$ and $D_{i-1}$ is the difference between the number of patients in treatment 1 and the number of patients in treatment 0 after i−1 assignments have been made. This assignment rule tends to achieve balance between the numbers of patients in two treatment groups, since $p > 1/2$ and $|D_{i-1}|$ is an imbalance metric. The CABC method applies the biased coin within each category of patients with $Z_i = z_k$, $k=1,\ldots,K$. The motivation is to achieve balance between treatment groups for each prognostic factor. A characteristic of CABC, which is common for all covariate-adaptive randomisation methods, is that $(Y_{i1}, Y_{i0}, X_i)$'s and $\delta_i$'s are conditionally independent given $Z_i$'s, although unconditionally $(Y_{i1}, Y_{i0}, X_i)$'s and $\delta_i$'s are dependent.
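As an illustration only, the allocation rule described above can be sketched in Python. The function name `cabc_assign`, its arguments and the dictionary of per-category imbalances are assumptions made for this sketch and are not taken from the original papers; the rule itself is Efron's biased coin with probability p, applied separately within each covariate category.

```python
import random
from collections import defaultdict


def cabc_assign(strata, p=2 / 3, rng=None):
    """Assign treatments (0 or 1) with a biased coin applied within each stratum.

    strata: sequence of hashable category labels, in order of patient arrival.
    p:      biased coin probability, assumed to lie in (1/2, 1].
    rng:    optional random.Random instance (hypothetical convenience argument).
    """
    if rng is None:
        rng = random.Random()
    # Per stratum: (number in treatment 1) - (number in treatment 0) so far.
    imbalance = defaultdict(int)
    assignments = []
    for z in strata:
        d = imbalance[z]
        if d == 0:
            prob_one = 0.5      # balanced stratum: fair coin
        elif d > 0:
            prob_one = 1 - p    # treatment 1 is ahead: favour treatment 0
        else:
            prob_one = p        # treatment 0 is ahead: favour treatment 1
        delta = 1 if rng.random() < prob_one else 0
        imbalance[z] += 1 if delta == 1 else -1
        assignments.append(delta)
    return assignments


if __name__ == "__main__":
    gen = random.Random(2018)
    categories = [gen.randrange(4) for _ in range(200)]
    delta = cabc_assign(categories, p=2 / 3, rng=gen)
    print("patients in treatment 1:", sum(delta), "of", len(delta))
```

Because the coin is biased towards the under-represented arm of the patient's own category, the imbalance is controlled category by category rather than only overall.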
A statistical test T is a function of the observed outcomes, covariates and treatment indicators of the N patients, constructed such that we reject a given null hypothesis $H_0$ if and only if $T > z_\alpha$, where α is a given significance level and $z_\alpha$ is a quantile of the standard normal distribution or a t-distribution. T is said to be (asymptotically) valid if, when $H_0$ holds,
$$\lim_{N \to \infty} P(T > z_\alpha) \le \alpha, \qquad (1)$$
with equality holding for at least some cases.
One of the main results in Shao et al. (2010), followed by Shao and Yu (2013), is that if a test T is constructed using the covariates employed in CABC under a correctly specified model between the outcomes and those covariates, and T is valid according to Equation (1) under SR, then T is still valid under CABC.
However, there are practical considerations under which some covariates are not included in the construction of the test T. For example, including all covariates may turn a simple test procedure into a complicated one, such as from one-way analysis of variance to two-way analysis of variance; data in some discrete covariate categories may also be sparse, so that including these covariates may result in poor behaviour of the test. When the covariate used in CABC is not included in the construction of T and CABC is used, the result in Shao et al. (2010) indicates that the test is conservative in the sense that its limiting type I error rate is at most $\alpha_0$ with a fixed $\alpha_0 < \alpha$. The reason is that T is typically the ratio of an effect estimator derived under SR to its standard error; although the estimator is still asymptotically valid under CABC, its standard error valid under SR overestimates that under CABC.
To obtain a valid test under CABC, it suffices to derive a standard error of the effect estimator that is asymptotically consistent, or equivalently a consistent estimator of its variance. Shao et al. (2010) proposed a bootstrap variance estimator that re-assigns treatment indicators in bootstrapping. This bootstrap method, however, requires a large amount of computation.
3. The main result
We consider the following general additive model:
(2) where
is an unknown function satisfying
and
, and
is the response mean under treatment j=0,1. We consider either the two-sided hypotheses
versus
, or the one-sided hypotheses
versus
.
The two sample t-test is
(3) or the absolute value of
, where
and
are, respectively, the numbers of patients in treatment groups 1 and 0, and
and
are, respectively, the sample mean and sample variance within treatment j.
Suppose that CABC is applied within each group formed by , which is a discrete function of
taking values
with a fixed
. As proved in Shao et al. (2010),
is conservative under CABC because CABC does not introduce any bias and the variance estimator
in
Equation (3) does not account for the correlation between
and
. They then suggested applying a particular bootstrap method to construct a consistent variance estimator of
under CABC, which leads to a valid bootstrap t-test, denoted as
.
Explicitly, as shown in the appendix, under model (2) and CABC,
(4) where
is convergence in distribution,
(5) and
. An interesting observation is that, under model (2) and the null hypothesis,
(6) which can be consistently estimated by
(7) where
is the sample variance of
within
and
is the number of subjects in the data set with
,
. The proof is given in the appendix. This alternative way of obtaining a consistent variance estimator is not only computationally easy but also robust against any model misspecification. The two-sample t-test with variance estimated by (7) is
(8) which we call the calibrated t-test.
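The calibration idea can be illustrated with a short Python sketch. It assumes, as a simplification, that the variance of the difference in sample means is estimated by (1/n_1 + 1/n_0) times the category-size-weighted average of the sample variances of the outcomes within each covariate category, pooled over both treatments. This is one natural reading of the description above, and the constants or weights in (7) and (8) may differ. The function name `calibrated_t` and its arguments are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def calibrated_t(y, delta, z):
    """Difference in treatment means divided by a stratum-pooled standard error.

    y:     observed outcomes
    delta: treatment indicators (0 or 1)
    z:     category labels of the covariate used in the randomisation
    """
    n = len(y)
    n1 = sum(delta)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both treatment groups must be non-empty")
    mean1 = sum(yi for yi, d in zip(y, delta) if d == 1) / n1
    mean0 = sum(yi for yi, d in zip(y, delta) if d == 0) / n0

    # Group outcomes by category, ignoring the treatment label.
    by_category = defaultdict(list)
    for yi, zi in zip(y, z):
        by_category[zi].append(yi)

    # Weighted average of within-category sample variances; categories with
    # a single observation contribute nothing (an assumption of this sketch).
    pooled = sum(
        len(values) / n * statistics.variance(values)
        for values in by_category.values()
        if len(values) >= 2
    )
    standard_error = math.sqrt((1 / n1 + 1 / n0) * pooled)
    return (mean1 - mean0) / standard_error


if __name__ == "__main__":
    outcomes = [1.0, 3.0, 2.0, 4.0]
    treatment = [1, 0, 1, 0]
    category = ["a", "a", "b", "b"]
    print(calibrated_t(outcomes, treatment, category))
```

Pooling over treatments within each category is only reasonable under the null hypothesis, where the treatment label does not change the outcome distribution within a category.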
Consider the following working model,
(9) This model is a special case of model (2), but it is not necessarily correct. Wald's test statistic under SR is
(10) where
and
are, respectively, the sample mean and sample variance based on
's under treatment j, and
is the least squares estimator of β assuming model (9). As shown in Shao and Yu (2013), under CABC and model (2),
(11) and
(12) Under model (2) and CABC,
(13) where
(14) Since
in Equation (14) and
in Equation (5) are related by
(15) results (12)–(15) show that Wald's test
is conservative under CABC unless
is a constant, i.e.
is independent of
. Thus, Wald's test
is not valid in the sense of Equation (1), unless the working model (9) is a correct model.
If we borrow the idea of consistently estimating the variance of under
, a calibrated Wald's test can also be constructed as
(16) which is valid and asymptotically equivalent to its counterpart
in Equation (8).
This calibrated variance idea can also be extended to the case where the working model (9) is replaced by a more complicated one.
4. Simulation results
4.1. Linear model
A simulation study was carried out to examine the type I error of the calibrated t-test and the calibrated Wald's test under CABC along with five other tests: the two-sample t-test under SR, Wald's test under SR, the two-sample t-test under CABC, Wald's test under CABC and the bootstrap t-test under CABC.
In the simulation study, the significance level is α = 5%;
is
; the probability p in CABC is 2/3; the sample size N is 200; the bootstrap variance estimator
is approximated by Monte Carlo with B=200; and the simulated type I error and power are based on 10,000 runs and 2000 runs, respectively. The simulation setting is
, where
and
are both binary with
. Both
and
are used in the CABC and in the construction of Wald's test, but the interaction term is ignored in the construction of Wald's test.
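A small Monte Carlo experiment in the same spirit can be sketched as follows. The data-generating model (two binary covariates with additive unit effects, standard normal errors and no treatment effect), the unpooled form of the two-sample t-statistic, the category-weighted calibrated variance, the normal critical value 1.96 and all function names are assumptions of this sketch, not the settings used for Table 1.

```python
import math
import random
import statistics
from collections import defaultdict


def simulate_trial(n, p, rng):
    """Generate one trial under a hypothetical null model with CABC allocation."""
    imbalance = defaultdict(int)
    y, delta, z = [], [], []
    for _ in range(n):
        zi = (rng.randint(0, 1), rng.randint(0, 1))   # two binary covariates
        d = imbalance[zi]
        prob_one = 0.5 if d == 0 else (1 - p if d > 0 else p)
        di = 1 if rng.random() < prob_one else 0
        imbalance[zi] += 1 if di == 1 else -1
        # Outcome depends on the covariates only: no treatment effect.
        yi = zi[0] + zi[1] + rng.gauss(0.0, 1.0)
        y.append(yi)
        delta.append(di)
        z.append(zi)
    return y, delta, z


def t_and_calibrated(y, delta, z):
    """Return the usual two-sample statistic and a stratum-calibrated one."""
    g1 = [yi for yi, d in zip(y, delta) if d == 1]
    g0 = [yi for yi, d in zip(y, delta) if d == 0]
    diff = statistics.mean(g1) - statistics.mean(g0)
    se_t = math.sqrt(statistics.variance(g1) / len(g1)
                     + statistics.variance(g0) / len(g0))
    by_category = defaultdict(list)
    for yi, zi in zip(y, z):
        by_category[zi].append(yi)
    pooled = sum(len(v) / len(y) * statistics.variance(v)
                 for v in by_category.values() if len(v) >= 2)
    se_c = math.sqrt((1 / len(g1) + 1 / len(g0)) * pooled)
    return diff / se_t, diff / se_c


def rejection_rates(runs=400, n=200, p=2 / 3, crit=1.96, seed=0):
    """Empirical two-sided rejection rates of both statistics under the null."""
    rng = random.Random(seed)
    reject_t = reject_c = 0
    for _ in range(runs):
        t_stat, c_stat = t_and_calibrated(*simulate_trial(n, p, rng))
        reject_t += abs(t_stat) > crit
        reject_c += abs(c_stat) > crit
    return reject_t / runs, reject_c / runs


if __name__ == "__main__":
    rate_t, rate_c = rejection_rates()
    print("two-sample t-test:", rate_t, "calibrated:", rate_c)
```

Under this hypothetical model the covariates explain part of the outcome variance, so the ordinary standard error is larger than the calibrated one and the ordinary test would be expected to reject less often.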
The simulation results and values of are shown in Table 1. A few conclusions from Table 1 are:
- The two-sample t-test and Wald's test derived under the simplified working model are conservative under CABC.
- The type I errors of the bootstrap t-test, the calibrated t-test and the calibrated Wald's test under CABC are reasonably close to the nominal level of 5%, which supports the validity of all three tests and the consistency of the calibrated variance estimator.
- These three tests have almost the same empirical power, which agrees with their asymptotic equivalence under CABC.
Table 1. Simulation power in % under linear model (N=200; 10,000 simulation runs under the null hypothesis, 2000 simulation runs under the alternative).
The advantage of the bootstrap t-test is that it directly estimates the variance of the difference in sample means by Monte Carlo sampling, which performs well under small sample sizes and is robust against any model misspecification. The one-way analysis of covariance test is invalid under CABC if the model is misspecified. The calibrated one-way analysis of covariance test, however, is robust against model misspecification, computationally easy and performs well with regard to both type I error and power. The calibrated t-test is computationally easy, but requires a sufficiently large sample for the gap between the variance estimator and its limiting value to be negligible.
4.2. Logistic model
The second simulation setting is , where
and
are both binary with
. Both
and
are used in the CABC and in the construction of Wald's test, but the interaction term is ignored in the analysis. The rest of the parameters are the same as in Table 1.
The simulation results and values of are shown in Table 2. A few conclusions from Table 2 are:
The two sample t-test is conservative under CABC, while Wald's test is valid though derived under the simplified working model.
is valid under CABC, indicating that under the generalised linear model, the new variance estimator
is still valid.
and
have almost the same power as Wald's test
.
Table 2. Simulation power in % under logistic model (N=200; 10,000 simulation runs under the null hypothesis, 2000 simulation runs under the alternative).
Acknowledgements
The author would like to thank two referees for their helpful comments and suggestions.
Disclosure statement
No potential conflict of interest was reported by the author.
Notes on contributors
Ting Ye
Ting Ye is a Ph.D. student in the Department of Statistics at the University of Wisconsin-Madison. Her research interests focus on clinical trial design, survival analysis and missing data.
References
- Aickin M. (2001). Randomization, balance, and the validity and efficiency of design-adaptive allocation methods. Journal of Statistical Planning and Inference, 94, 97–119. doi: 10.1016/S0378-3758(00)00228-7
- Aickin M. (2002). Beyond randomization. The Journal of Alternative and Complementary Medicine, 8, 765–772.
- Birkett N. J. (1985). Adaptive allocation in randomized controlled trials. Controlled Clinical Trials, 6, 146–155. doi: 10.1016/0197-2456(85)90120-5
- Efron B. (1971). Forcing a sequential experiment to be balanced. Biometrika, 58, 403–417. doi: 10.1093/biomet/58.3.403
- Forsythe A. B. (1987). Validity and power of tests when groups have been balanced for prognostic factors. Computational Statistics and Data Analysis, 5, 193–200. doi: 10.1016/0167-9473(87)90015-6
- Hagino A., Hamada C., Yoshimura I., Ohashi Y., Sakamoto J., & Nakazato H. (2004). Statistical comparison of random allocation methods in cancer clinical trials. Controlled Clinical Trials, 25, 572–584. doi: 10.1016/j.cct.2004.08.004
- Kalish L. A., & Begg C. B. (1985). Treatment allocation methods in clinical trials: A review. Statistics in Medicine, 4, 129–144. doi: 10.1002/sim.4780040204
- Ma W., Hu F., & Zhang L. (2015). Testing hypotheses of covariate-adaptive randomized clinical trials. Journal of the American Statistical Association, 110(510), 669–680. doi: 10.1080/01621459.2014.922469
- Pocock S. J., & Simon R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics, 31, 103–115. doi: 10.2307/2529712
- Shao J., & Yu X. (2013). Validity of tests under covariate-adaptive biased coin randomization and generalized linear models. Biometrics, 69, 960–969. doi: 10.1111/biom.12062
- Shao J., Yu X., & Zhong B. (2010). A theory for testing hypotheses under covariate-adaptive randomization. Biometrika, 97, 347–360. doi: 10.1093/biomet/asq014
- Taves D. R. (1974). Minimization: A new method of assigning patients to treatment and control groups. Clinical Pharmacology and Therapeutics, 15, 443–453. doi: 10.1002/cpt1974155443
- Weir C. J., & Lees K. R. (2003). Comparison of stratification and adaptive methods for treatment allocation in an acute stroke clinical trial. Statistics in Medicine, 22, 705–726. doi: 10.1002/sim.1366
Appendix. Proofs of (4)–(7)
Proof of (4).
Applying result (7.9) in Efron (1971) to each category defined by the covariate used in CABC and using the fact that
and
, where
, we obtain that
(A1) Applying (A1), we obtain
Letting
, we obtain that
Applying result (7.9) in Efron (1971) to each category defined by
and using the fact that
is discrete, we conclude that the last term in the previous expression is
conditionally on
. Thus,
The asymptotic mean of
is
, which follows from the fact that
's are conditionally independent of
given
,
, and
by the definition of
.
Since 's are of mean 0 and independent of
,
and
Since
's and
are conditionally independent given
and
, we obtain that
where
. Therefore,
and
Given
,
's and
's are conditionally independent. Hence, by the central limit theorem and the above results, the conditional distribution of
given
, is asymptotically normal with mean 0 and variance
, which converges to
by the law of large numbers. Thus, conditionally on
or unconditionally, the quantity in (4) is asymptotically normal with mean 0 and variance
.
Proof of (7).
Without loss of generality, we assume that under ,
in the proof. From the fact that
,
where is the number of subjects satisfying
. Recall that
's and
's are independent and identically distributed. By the law of large numbers,
Now that
can be expressed as
, which together with the dominated convergence theorem and the fact that
imply that
.