Review Article

Inference after covariate-adaptive randomisation: aspects of methodology and theory

Pages 172-186 | Received 10 Aug 2020, Accepted 01 Jan 2021, Published online: 18 Jan 2021

Abstract

Covariate-adaptive randomisation has more than 45 years of history of application in clinical trials, where it is used to balance treatment assignments across prognostic factors that may influence the outcomes of interest. However, almost no theory had been developed for covariate-adaptive randomisation until a paper on the theory of testing hypotheses was published in 2010. In this article, we review aspects of methodology and theory developed in the last decade for statistical inference under covariate-adaptive randomisation. We focus on issues such as whether a conventional procedure valid under the assumption that treatments are assigned completely at random is still valid or conservative when the actual randomisation is covariate-adaptive, how a valid inference procedure can be obtained by modifying a conventional method or directly constructed by stratifying the covariates used in randomisation, whether inference procedures have different properties when covariate-adaptive randomisation schemes have different degrees of balancing assignments, and how to further adjust for covariates in the inference procedures to gain more efficiency. Recommendations are made during the review and further research problems are discussed.

1. Introduction

In a clinical trial to compare $k \ge 2$ treatments, patients are typically randomised into treatment arms according to fixed treatment assignment proportions $\pi_1, \ldots, \pi_k$, where each $\pi_t$ is a known number strictly between 0 and 1 and $\sum_{t=1}^{k} \pi_t = 1$. The simplest randomisation scheme assigns patients to treatments completely at random and, thus, is called complete randomisation or simple randomisation. However, simple randomisation may yield imbalanced assignments, i.e. sample sizes not following the assignment proportions across some prognostic factors or covariates, e.g., institution, disease stage, prior treatment, gender and age, which are thought to have significant influence on the outcomes or responses of interest. For instance, a trial exhibiting a substantial imbalance in patient age or disease stage between two treatment arms may not pass a regulatory review even though a statistically significant treatment effect has been shown. The issue is more serious when patients are not all available for simultaneous assignment of treatments but rather arrive sequentially and must be treated immediately.

This leads to the development of covariate-adaptive randomisation (which is also referred to as dynamic allocation), i.e., treatment assignment of the $i$th patient is made dependent on the observed covariate value of this patient and the assignments and covariate values of all $i-1$ previously assigned patients. It should be emphasised that covariate-adaptive randomisation does not use any outcomes or responses from the $i-1$ previous patients when the $i$th patient is randomised to a treatment arm. Adaptive randomisation methods using outcomes or responses are not our focus and can be found, for example, in Hu and Rosenberger (2006), Zhang et al. (2007), Hu et al. (2009), Rosenberger and Lachin (2015), and the references therein. The oldest covariate-adaptive randomisation scheme is the minimisation proposed by Taves (1974) and its extensions in Pocock and Simon (1975). Other popular covariate-adaptive randomisation methods include the stratified permuted block randomisation method (Zelen, 1974), the stratified urn design (Wei, 1977; Zhao & Ramakrishnan, 2016) and the stratified biased coin method (Kuznetsova & Johnson, 2017; Shao et al., 2010). Summaries of different allocation schemes are given by Kalish and Begg (1985), Schulz and Grimes (2002), and Rosenberger and Sverdlov (2008).

How often are covariate-adaptive schemes applied in clinical trials? According to Taves (2010), from 1989 to 2008, over 500 clinical trials implemented the minimisation method to balance important covariates. In a recent review of nearly 300 clinical trials published in 2009 and 2014 (Ciolino et al., 2019), 237 of them used covariate-adaptive randomisation. Other examples can be found in van der Ploeg et al. (2010), Fakhry et al. (2015), Breugom et al. (2015), Stott et al. (2017), and Sun et al. (2018). In the 2018 New England Journal of Medicine, there are seven articles about covariate-adaptive randomisation (Horn et al., 2018; Jourdain et al., 2018; McKeever et al., 2018; Mehra et al., 2018; Myles et al., 2018; Ramirez et al., 2018; Zannad et al., 2018). Applications of covariate-adaptive randomisation are not limited to clinical trials, as they are relevant for randomised experiments with many interventions.

How should inference be carried out with data collected under covariate-adaptive randomisation? Unfortunately, tests and other inference procedures constructed based on simple randomisation, which will be called conventional tests and inference procedures, are often applied in practice after data are collected under covariate-adaptive randomisation. For example, the seven articles in the 2018 New England Journal of Medicine cited previously all used conventional tests for treatment effect. On one hand, over the 35 years between 1974 and 2009, there were many empirical results showing that some conventional tests could still control Type I errors in spite of using covariate-adaptive randomisation; see, for example, Birkett (1985), Forsythe (1987), Aickin (2002), Weir and Lees (2003), Hagino et al. (2004), and Zhong and Kim (2008). On the other hand, the Committee for Proprietary Medicinal Products commented that ‘it remains controversial whether the analysis adequately reflects the randomisation scheme’ (Committee for Proprietary Medicinal Products, 2004) and the European Medicines Agency 2015 guidelines stated that ‘possible implications of dynamic allocation methods [minimisation] on the analysis, e.g., with regard to bias and Type I error control should be carefully considered,… conventional statistical methods do not always control the Type I error’ (EMA, 2015). Because a statistical inference procedure on treatment effects should be valid under the particular randomisation scheme used in data collection, the application of conventional inference procedures after covariate-adaptive randomisation has raised concerns and controversies.

Why don't we always apply an inference procedure valid under a given covariate-adaptive randomisation scheme? In their review of covariate-adaptive randomisation, Rosenberger and Sverdlov (2008) stated:

Very little theoretical work has been done in this area, despite the proliferation of papers. The original source papers are fairly uninformative about theoretical properties of the procedures.

That is, the lack of theoretical work in developing valid inference procedures associated with covariate-adaptive randomisation schemes is probably the main reason why conventional procedures are used in applications. Why was there so little theoretical work on this problem prior to 2008? Unlike simple randomisation, covariate-adaptive randomisation generates some dependence among treatment assignments, covariates and outcomes, under which asymptotic distributions of treatment effect estimators (such as the difference of sample averages) could not be easily derived. Shao et al. (2010) initiated theoretical studies on the validity of statistical tests under covariate-adaptive randomisation. The following three issues are addressed in their paper:

  (A) Can we develop a test procedure valid under covariate-adaptive randomisation?

  (B) If we use covariate-adaptive randomisation and a conventional test procedure valid under simple randomisation, will the Type I error of the test be inflated?

  (C) Is a test under covariate-adaptive randomisation more powerful than it is under simple randomisation?

If we have affirmative answers to Questions (A)–(C), or at least Questions (A)–(B), then the concerns and controversies about using covariate-adaptive randomisation will be largely eliminated. As the first piece of theoretical work, the results in Shao et al. (2010) are limited to certain types of tests, randomisation schemes, and models between covariates and responses. Fortunately, significant progress in the theory of this area has been made in the last decade, e.g., Hu and Hu (2012), Shao and Yu (2013), Ma et al. (2015), Bugni et al. (2018, 2019), Ye (2018), Ma et al. (2020), Ye and Shao (2020), and Ye et al. (2020). Another stream of results is based on permutation or re-randomisation methods, e.g., Simon and Simon (2011), Kaiser (2012), and Bugni et al. (2018).

The purpose of this article is to review aspects of methodology and theory for statistical inference after covariate-adaptive randomisation. We concentrate on Questions (A)–(C) previously stated and the main results in the last decade, some of which are very recent. It is our hope that this review will provide some guidance for clinical trialists about which valid inference procedures to use for various situations, and will shed light on further research and development in this important area.

2. Covariates, outcomes and treatment effects

First, let's describe covariates and outcomes or responses under a clinical trial. Consider a clinical trial with a total of $n$ patients that are assigned to $k \ge 2$ treatment arms denoted by $a = 1, \ldots, k$. For patient $i \in \{1, \ldots, n\}$, let $X_i$ be the vector of all observed covariates and let $Y_i^{(a)}$ be the potential outcome or response of interest under treatment assignment $a = 1, \ldots, k$. $Y_i^{(a)}$ is called a potential outcome because only one of $Y_i^{(1)}, \ldots, Y_i^{(k)}$ will be observed from patient $i$, as each patient receives only one treatment. Thus what we observe from patient $i$ is $Y_i = Y_i^{(a)}$ if treatment $a$ is assigned to patient $i$.

The outcome or response $Y_i^{(a)}$ could be continuous or discrete, or a survival time. In a survival trial, censoring is typically involved so that, for patient $i$, $Y_i^{(a)} = \min(T_i^{(a)}, C_i^{(a)})$ together with an indicator of the event $T_i^{(a)} \le C_i^{(a)}$ are observed, where $T_i^{(a)}$ is the potential survival or failure time and $C_i^{(a)}$ is the potential censoring time, under treatment $a = 1, \ldots, k$.

Throughout, we assume the following minimal condition on covariates and outcomes.

(C1)

$(Y_i^{(1)}, \ldots, Y_i^{(k)}, X_i)$, $i = 1, \ldots, n$, are independent and identically distributed.

Note that there is no assumption on the relationship between the covariates and potential outcomes. We allow arbitrary treatment effect heterogeneity, i.e., the effect of treatment and covariate interaction on potential outcomes.

For convenience, we use $Y^{(1)}, \ldots, Y^{(k)}, X$ to denote the variables from a generic patient. Under (C1), $(Y^{(1)}, \ldots, Y^{(k)}, X) \sim (Y_i^{(1)}, \ldots, Y_i^{(k)}, X_i)$ for every $i$, where $X \sim Y$ means that $X$ has the same distribution as $Y$.

To assess treatment effect, we may be interested in the average treatment effect between any fixed pair of treatment arms, $a$ and $b$, defined as $E(Y^{(a)} - Y^{(b)})$, where $E$ is the population expectation and $E(Y^{(a)})$ is assumed to be well defined. Another important measure in comparing treatments $a$ and $b$ is the quantile treatment effect defined as $q_\tau^{(a)} - q_\tau^{(b)}$ (Firpo, 2007; Zhang et al., 2020), where $q_\tau^{(a)}$ is the $\tau$th quantile of the distribution of $Y^{(a)}$ under treatment $a$ and $\tau$ is a fixed fraction. The quantile treatment effect is more appropriate when potential outcomes are highly skewed and is more relevant and informative than the average treatment effect when some distributional impacts have to be assessed.

Both the average treatment effect and the quantile treatment effect are characteristics of the distributions of potential outcomes. In some applications, we would like to assess the treatment effect on the entire distribution of a potential outcome or the entire conditional distribution of a potential outcome given covariates. For example, in a survival analysis we may be interested in testing whether the conditional distributions of $T^{(a)}$ given $X$ are the same for different values of $a$.

Here we would like to make it clear that treatments may have an effect not only on the marginal distributions of potential outcomes, but also on the conditional distributions of $Y^{(a)}$ given $X$, although typically treatments do not have any effect on the marginal distribution of the covariate $X$.

So far we have not yet discussed the treatment assignment of patients. Suppose that treatment assignments are made according to some probabilistic mechanism. For patient $i$, let $A_i$ be the treatment assignment indicator vector, i.e., $A_i = e_a$ if patient $i$ is assigned to treatment $a$, where $e_a$ is a vector whose $a$th component is 1 and whose remaining components are 0, $a = 1, \ldots, k$. The observed outcome from patient $i$ is $Y_i = Y_i^{(a)}$ if and only if $A_i = e_a$, $a = 1, \ldots, k$, $i = 1, \ldots, n$.

3. Covariate-adaptive randomisation schemes

We now describe how the $A_i$'s are generated according to a randomisation scheme, with or without using the covariates $X_i$.

Under simple randomisation, the $A_i$'s are independent of the $(Y_i^{(1)}, \ldots, Y_i^{(k)}, X_i)$'s and, further, the $A_i$'s are independent and identically distributed with $P(A_i = e_a) = \pi_a$, where $P$ denotes the probability under a given randomness mechanism. It should be emphasised that the independence between the $A_i$'s and the $(Y_i^{(1)}, \ldots, Y_i^{(k)}, X_i)$'s means that the treatment assignments are independent of potential outcomes and covariates, not that treatments have no effect on potential outcomes or on the conditional distribution of $Y^{(a)}$ given $X$ as discussed in Section 2.

Let $Z$ be a vector of discrete covariates with finitely many levels to be utilised in covariate-adaptive randomisation. Typically, components of $Z$ are some discrete components of $X$ and/or some discretised continuous components of $X$ that are thought to have significant influence on the potential outcomes. In the following, we describe some popular covariate-adaptive randomisation schemes for enforcing assignment allocation across levels of $Z$. In a typical covariate-adaptive randomisation scheme, for the $i$th patient arriving with observed $Z_i$, the treatment assignment indicator $A_i$ is generated depending on not only the value of $Z_i$ but also the $Z$-values and assignments of the previous $i-1$ patients, $i = 1, \ldots, n$.

The stratified permuted block randomisation method (Zelen, 1974) randomly assigns a block of $B$ patients into $k$ arms, each having $B\pi_a$ patients, for every $B$ sequentially arrived patients with $Z = z$, a particular level of $Z$, where $B$ is appropriately chosen so that the $B\pi_a$'s are integers and the last block is allowed to be incomplete. This method is called stratified permuted block randomisation since randomisation is carried out within each stratum (joint level of $Z$) to achieve balancedness of assignments across strata.
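The block mechanism just described can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not in the text: the function name `stratified_permuted_block`, the coding of arms as 1, ..., k, and the representation of each stratum as a hashable label are all hypothetical choices.

```python
import random
from collections import defaultdict


def stratified_permuted_block(strata, proportions, block_size, seed=0):
    """Assign arms 1..k to sequentially arriving patients, block by block
    within each stratum (hypothetical helper, not from the article)."""
    rng = random.Random(seed)
    # Number of slots per arm in one block; B * pi_a must be an integer.
    counts = [round(block_size * p) for p in proportions]
    if sum(counts) != block_size or any(
        abs(c - block_size * p) > 1e-9 for c, p in zip(counts, proportions)
    ):
        raise ValueError("block_size * proportions must be integers")
    pending = defaultdict(list)  # unused slots of the current block, per stratum
    assignments = []
    for z in strata:
        if not pending[z]:
            # Open a new block for this stratum and permute it at random.
            block = [a for a, c in enumerate(counts, start=1) for _ in range(c)]
            rng.shuffle(block)
            pending[z] = block
        assignments.append(pending[z].pop())
    return assignments
```

For example, `stratified_permuted_block(["young", "old"] * 20, [0.5, 0.5], 4)` returns one arm label per patient; a stratum whose last block is incomplete simply leaves some slots unused.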

When $k = 2$ and $\pi_1 = \pi_2 = 1/2$, the stratified biased coin method (Shao et al., 2010) assigns patient $i$ with $Z_i = z$ according to the biased coin randomisation in Efron (1971),

$$P(A_i = e_1) = \begin{cases} p, & D_{i-1}(z) < 0, \\ 1/2, & D_{i-1}(z) = 0, \\ 1 - p, & D_{i-1}(z) > 0, \end{cases}$$

where $p$ is a fixed constant satisfying $1/2 < p < 1$ and $D_{i-1}(z)$ is one half of the within-stratum-$z$ difference between the numbers of patients in treatment 1 and treatment 2 after $i-1$ assignments have been made. An extension of the stratified biased coin to the general case of $k \ge 3$ can be found in Kuznetsova and Johnson (2017).
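The biased coin rule above translates almost directly into code. The following Python sketch assumes two arms coded 1 and 2 and equal allocation; the function name `stratified_biased_coin` and the default value of `p` are illustrative assumptions, not part of the cited method.

```python
import random
from collections import defaultdict


def stratified_biased_coin(strata, p=0.75, seed=0):
    """Efron-type biased coin applied separately within each stratum.

    The text requires 1/2 < p < 1; the default p = 0.75 is an assumption.
    """
    rng = random.Random(seed)
    # diff[z] = n_1(z) - n_2(z) so far, i.e. twice the imbalance D_{i-1}(z);
    # only its sign matters for the rule.
    diff = defaultdict(int)
    assignments = []
    for z in strata:
        if diff[z] < 0:
            prob_arm1 = p        # arm 1 is behind in this stratum: favour it
        elif diff[z] == 0:
            prob_arm1 = 0.5      # balanced: fair coin
        else:
            prob_arm1 = 1 - p    # arm 1 is ahead: favour arm 2
        arm = 1 if rng.random() < prob_arm1 else 2
        diff[z] += 1 if arm == 1 else -1
        assignments.append(arm)
    return assignments
```

Larger values of `p` force balance more strongly at the price of more predictable assignments, which is the usual trade-off in choosing the coin bias.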

The stratified urn design (Wei, 1977, 1978a, 1978b) is the stratified biased coin randomisation with $p$ depending on $i$. When $k = 2$ and $\pi_1 = \pi_2 = 1/2$, the fixed $p$ in the biased coin is replaced by a $p_i$ depending on $D_{i-1}(z)$. According to Wei (1977), the urn design would force balance at the beginning of treatment allocation, and approach simple randomisation as the size of the trial increases. A stratified urn design for the general situation of $k \ge 3$ can be constructed using the method described in Zhao and Ramakrishnan (2016).

The previous three stratified covariate-adaptive randomisation schemes enforce balancedness of treatment assignment allocation across all strata, i.e., joint levels of Z. However, the oldest covariate-adaptive randomisation scheme, the minimisation, is very different from these three methods.

First, consider $k = 2$ and $\pi_1 = \pi_2 = 1/2$. For each $i \in \{1, \ldots, n\}$, let $G_i^{(1)}$ be a weighted sum of squared or absolute differences between the numbers of patients in the two treatment arms over marginal levels of $Z$, where the calculation is based on the $i-1$ previously assigned patients and the assumption that the $i$th patient is assigned to treatment 1, and let $G_i^{(2)}$ be the same sum except that the $i$th patient is assumed to be in treatment 2. For $a = 1$ or 2, $G_i^{(a)}$ represents the ‘total amount of imbalance’ in treatment numbers across the marginal levels of $Z$ which exists if treatment $a$ is assigned to the $i$th patient. Therefore, we would like to assign the $i$th patient by minimising $G_i^{(a)}$ over $a = 1, 2$, i.e., we assign the $i$th patient to treatment 1 if $G_i^{(1)} < G_i^{(2)}$, to 2 if $G_i^{(1)} > G_i^{(2)}$, and to 1 or 2 randomly if $G_i^{(1)} = G_i^{(2)}$. This is why the method is called the minimisation by Taves (1974). Pocock and Simon (1975) extended the minimisation by allowing minimisation with a given probability, i.e.,

$$P(A_i = e_1) = \begin{cases} p, & G_i^{(1)} < G_i^{(2)}, \\ 1/2, & G_i^{(1)} = G_i^{(2)}, \\ 1 - p, & G_i^{(1)} > G_i^{(2)}, \end{cases}$$

where $p > 1/2$ is a fixed constant. Pocock and Simon's method is still referred to as the minimisation and Taves' minimisation is the special case with $p = 1$. For a general $k$ and/or allocation, the minimisation can be similarly constructed (Han et al., 2009; Pocock & Simon, 1975).
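A minimal Python sketch of the two-arm minimisation rule follows. It makes several assumptions that the text leaves open: imbalance is measured by absolute (not squared) differences, the weights default to 1 for every covariate, and the function name `minimisation` and the default `p = 0.8` are hypothetical. Only the marginal levels of the arriving patient are summed, since the other levels contribute equally to both hypothetical totals and do not affect the comparison.

```python
import random


def minimisation(covariates, p=0.8, weights=None, seed=0):
    """Pocock-Simon type minimisation for two arms with equal allocation.

    covariates: list of tuples, one tuple of discrete covariate levels per patient.
    Returns a list of arm labels (1 or 2).
    """
    rng = random.Random(seed)
    n_factors = len(covariates[0])
    weights = weights or [1.0] * n_factors
    # counts[j][level] = [patients in arm 1, patients in arm 2] at that marginal level
    counts = [dict() for _ in range(n_factors)]
    assignments = []
    for z in covariates:
        g = []
        for arm in (0, 1):  # internal index 0 -> arm 1, 1 -> arm 2
            total = 0.0
            for j, level in enumerate(z):
                c = list(counts[j].get(level, [0, 0]))
                c[arm] += 1  # hypothetically place the new patient in this arm
                total += weights[j] * abs(c[0] - c[1])
            g.append(total)
        if g[0] < g[1]:
            prob_arm1 = p
        elif g[0] > g[1]:
            prob_arm1 = 1 - p
        else:
            prob_arm1 = 0.5
        arm = 0 if rng.random() < prob_arm1 else 1
        for j, level in enumerate(z):
            counts[j].setdefault(level, [0, 0])[arm] += 1
        assignments.append(arm + 1)
    return assignments
```

Setting `p=1.0` gives a deterministic rule of the Taves type, apart from ties, which are still broken by a fair coin.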

If $Z$ is one dimensional, then the minimisation is the same as the stratified biased coin method. For a multivariate $Z$, the key distinction between the minimisation and the three previously described stratified randomisation methods is that treatment balancedness is enforced at all joint levels of $Z$ for the latter but only at marginal levels of $Z$ for the former. For this reason, the minimisation is also called the marginal method in Ma et al. (2015) and Ye and Shao (2020). Enforcing treatment balance in marginal levels of $Z$ is sufficient in most applications.

All of the previously introduced covariate-adaptive randomisation schemes satisfy

(D1)

$\{A_i, i = 1, \ldots, n\}$ and $\{Y_i^{(1)}, \ldots, Y_i^{(k)}, X_i, i = 1, \ldots, n\}$ are conditionally independent given $\{Z_i, i = 1, \ldots, n\}$.

Actually, (D1) almost always holds for covariate-adaptive randomisation, because treatments, not their assignments, may affect the potential responses as we discussed earlier for simple randomisation, and, given the $Z_i$'s, the rest of the $X_i$'s contain covariate information not used in randomisation.

Furthermore, all covariate-adaptive randomisation schemes considered so far satisfy the following condition (D2) (Baldi Antognini & Zagoraiou, 2015). In what follows, $\Rightarrow$ denotes convergence in distribution as the sample size $n \to \infty$, and $\Rightarrow 0$ is in fact convergence to 0 in probability.

(D2)

For every $i = 1, \ldots, n$, $P(A_i = e_a \mid Z_1, \ldots, Z_n) = \pi_a$ and, for every $a$ and every level $z$ of $Z$, $\{n(z)\}^{-1} D^{(a)}(z) \Rightarrow 0$, where $D^{(a)}(z) = n_a(z) - \pi_a n(z)$, $n(z)$ is the number of patients with $Z_i = z$, and $n_a(z)$ is the number of patients with $Z_i = z$ under treatment $a$.

Note that $D^{(a)}(z)$ is a measure of the assignment imbalance in stratum $z$. According to the asymptotic property of $D^{(a)}(z)$ in (D2), covariate-adaptive randomisation schemes can be classified into one of the following three types.
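As a small illustration, the imbalance measure can be computed from a list of assignments and strata, reading it as the number of patients in arm a within stratum z minus the target share of that stratum. The helper name `imbalance` and the coding of arms as 1, ..., k are assumptions made for this sketch only.

```python
from collections import Counter


def imbalance(assignments, strata, proportions):
    """Return {(a, z): n_a(z) - pi_a * n(z)} for every arm a and stratum z."""
    n_z = Counter(strata)                     # n(z): patients in stratum z
    n_az = Counter(zip(assignments, strata))  # n_a(z): patients in stratum z on arm a
    return {
        (a, z): n_az[(a, z)] - pi_a * n_z[z]
        for z in n_z
        for a, pi_a in enumerate(proportions, start=1)
    }
```

Within each stratum the imbalances add up to zero over the arms, so for two arms it suffices to track one of them.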

Type 1.

For every $a$ and every level $z$ of $Z$, $\{n(z)\}^{-1/2} D^{(a)}(z) \Rightarrow 0$.

Type 2.

For every $a$, the $D^{(a)}(z)$'s for different strata $z$ are mutually independent and, for every $z$, $\{n(z)\}^{-1/2} D^{(a)}(z) \Rightarrow N(0, v_a)$, the normal distribution with mean 0 and a known variance $v_a > 0$.

Type 3.

Methods not in Type 1 or 2.

The three types are ordered by how strongly they enforce balancedness within every $z$, as measured by the assignment imbalance $D^{(a)}(z)$. Type 1 is the strongest, requiring $D^{(a)}(z)$ to grow more slowly than the square root of the sample size within stratum $z$. Representatives of Type 1 covariate-adaptive randomisation methods are the stratified permuted block and biased coin schemes. In fact, under stratified permuted block randomisation, $D^{(a)}(z)$ is bounded; for the stratified biased coin method, it follows from a result in Efron (1971) that $D^{(a)}(z)$ is bounded in probability for every $z$.

Type 2 is weaker than Type 1, as $\{n(z)\}^{-1/2} D^{(a)}(z)$ converges in distribution to $N(0, v_a)$, not 0. The stratified urn design is Type 2 with $v_a = 1/12$ when $k = 2$ and $\pi_1 = \pi_2 = 1/2$ (Wei, 1978a, 1978b). Simple randomisation treated as a special case of covariate-adaptive randomisation is also Type 2. Finally, the minimisation is Type 3, since it is neither Type 1 nor Type 2 (Ye & Shao, 2020). Specifically, under minimisation, $D^{(a)}(z)$ and $D^{(a)}(z')$ with $z \neq z'$ are not independent, and their relationship is complicated, because assignments are made according to marginal levels of $Z$.

4. Validity and conservativeness of tests

Testing a null hypothesis of no treatment effect on potential outcomes is the most utilised statistical inference procedure in clinical trials. For a given null hypothesis $H_0$ and a significance level $\alpha > 0$, a test statistic $T$ is a function of the observed $\{Y_i, X_i, i = 1, \ldots, n\}$, which is constructed such that $H_0$ is rejected if and only if $T$ is outside of the interval $[z_{\alpha/2}, z_{1-\alpha/2}]$, where $z_r$ is the $r$th quantile of a known distribution, usually the standard normal distribution, in which case $H_0$ is rejected if and only if $|T| > z_{1-\alpha/2}$, as $z_{1-\alpha/2} = -z_{\alpha/2}$. Here, we consider two sided tests; the discussion for a one sided test is similar and omitted. $T$ is said to be asymptotically valid (or valid for simplicity) if

$$\sup_{P \text{ under } H_0} \lim_{n \to \infty} P\big(T \notin [z_{\alpha/2}, z_{1-\alpha/2}]\big) = \alpha. \tag{1}$$

$T$ is said to be asymptotically conservative (or conservative for simplicity) if

$$\sup_{P \text{ under } H_0} \lim_{n \to \infty} P\big(T \notin [z_{\alpha/2}, z_{1-\alpha/2}]\big) < \alpha. \tag{2}$$

4.1. Validity of conventional tests

As we discussed in Section 1, prior to 2010, there was almost no theoretical work and practitioners applied conventional tests developed under simple randomisation, which caused concerns about whether Type I error could be inflated. That is, if a conventional test $T$ is applied after covariate-adaptive randomisation, does (1) still hold?

Forsythe (1987) concluded that a conventional test $T$ still controls Type I error when the $Z$ used in minimisation is also included in the construction of $T$. However, this conclusion was based on simulation results under certain models.

The first piece of theoretical work in this area, obtained by Shao et al. (2010), is that, under covariate-adaptive randomisation, a conventional $T$ is valid in the sense of (1) if both of the following hold:

  (i) The covariate $Z$ used in covariate-adaptive randomisation is a function of all covariates used to construct the test $T$.

  (ii) $T$ is valid in the sense of (1) under any fixed set of treatment allocations $A_1, \ldots, A_n$.

Note that (i) coincides with Forsythe's simulation discovery. But (ii) requires the validity of $T$ under any deterministic allocation $A_1, \ldots, A_n$, which can be realistically achieved only when a correct statistical model is used in constructing $T$. However, correctly specifying a model is difficult. Although, mathematically, (i)–(ii) are only sufficient and not necessary for the validity of a conventional test $T$ under covariate-adaptive randomisation, we can easily find examples in which $T$ is not valid under covariate-adaptive randomisation when either (i) or (ii) fails; see, e.g., Shao et al. (2010) and Shao and Yu (2013).

4.2. Conservativeness of conventional tests

Before we answer Question (A) in Section 1 regarding the development of a valid test according to (1) under covariate-adaptive randomisation, we would like to address Question (B) in Section 1, i.e., whether or not a conventional test $T$ is conservative in the sense of (2). If the answer is yes, then at least the Type I error is not inflated by using conventional tests.

The first result of this kind was obtained by Shao et al. (2010) regarding the two sample t-test under a homogeneous one-way analysis of covariance model. The result is that the conventional two sample t-test for comparing two treatments ($k = 2$) is conservative according to (2) under the stratified biased coin randomisation. Following this work, results about the conservativeness of different conventional tests under different models and covariate-adaptive randomisation methods have been obtained by Hu and Hu (2012), Shao and Yu (2013), Ma et al. (2015), Bugni et al. (2018), Ye (2018), and Ye and Shao (2020). In particular, under Type 1 or 2 randomisation schemes described in Section 3, Ye and Shao (2020) proved the conservativeness of the conventional log-rank and score tests for survival analysis, which is a substantial advance in the theory of this area. Unfortunately, no result is available for the conservativeness of conventional tests under minimisation, except for some unrealistic cases. Furthermore, the available results are for particular conventional tests, i.e., no general result is available.

The reason why conventional tests become conservative under some covariate-adaptive randomisation schemes, as well as why the result is not available for minimisation, can be explained as follows. Many (if not most) conventional tests are ratios with numerators being statistics assessing the plausibility of the null hypothesis $H_0$ and denominators being standard errors estimating the standard deviations of the corresponding numerators. For example, the two sample t-test for testing the effect between two treatments ($k = 2$) is

$$T = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}, \tag{3}$$

where $n_a$ is the number of patients assigned to treatment $a$, and $\bar{Y}_a$ and $S_a^2$ are the sample mean and sample variance, respectively, based on the $Y_i$'s under treatment $a$; the numerator $\bar{Y}_1 - \bar{Y}_2$ of $T$ in (3) assesses the plausibility of the null hypothesis $H_0: E(Y^{(1)} - Y^{(2)}) = 0$, and the denominator $\sqrt{S_1^2/n_1 + S_2^2/n_2}$ estimates the asymptotic standard deviation of $\bar{Y}_1 - \bar{Y}_2$. Under a Type 1 or 2 covariate-adaptive randomisation scheme, it is usually true that the numerator of $T$ in (3) still measures the plausibility of $H_0$, and the denominator of $T$ in (3) is too large because the Type 1 or 2 covariate-adaptive randomisation scheme typically reduces the variation of the numerator after enforcing the balancedness of treatment assignments. Specifically, $\bar{Y}_1$ and $\bar{Y}_2$ are independent under simple randomisation but are positively correlated under Type 1 or Type 2 covariate-adaptive randomisation, since both arms then share nearly the same mix of strata; consequently, the variance of $\bar{Y}_1 - \bar{Y}_2$ is smaller under covariate-adaptive randomisation, while $S_1^2/n_1 + S_2^2/n_2$ still estimates the variance of $\bar{Y}_1 - \bar{Y}_2$ under simple randomisation. The reduction in variation, together with the fact that the denominator of the conventional test does not account for this reduction, leads to the conservativeness of the conventional test.

As we discussed in Section 3, the stratified permuted block and biased coin randomisation schemes are Type 1 and the stratified urn designs are Type 2. Hence conventional tests are conservative under these randomisation schemes.

The minimisation, however, is neither Type 1 nor Type 2 (Ye & Shao, 2020). The only available result on the asymptotic distribution of $\bar{Y}_1 - \bar{Y}_2$ under minimisation was obtained by Ma et al. (2015) under a very restrictive and nearly unrealistic condition that not only the relationship between the observed response $Y_i$ and $Z_i$ is linear, but also all components of $Z_i$ are independent and there is no other covariate in the linear model. Because the minimisation only enforces the marginal balancedness of treatment assignments, its asymptotic properties are very complicated and a general result about the asymptotic distribution of a simple statistic like $\bar{Y}_1 - \bar{Y}_2$ is not available. Some progress has been made in recent work (Hu & Zhang, 2020), but the problem is not completely solved.

4.3. Development of valid tests

We now return to Question (A) in Section 1. Although a conservative test controls the Type I error rate, it may lose power and, thus, may not be appreciated by clinical trialists.

From the discussion in Section 4.1, a conventional test is valid according to (1) if (i)–(ii) hold, but (ii) requires perfect modelling that may be unrealistic, since model misspecification often occurs, especially when there are many covariates. The discussion in Section 4.2 actually suggests that we modify the denominator of a conventional test to develop a valid test under covariate-adaptive randomisation. The first result was also obtained by Shao et al. (2010), who proposed a bootstrap variance estimator for the two sample t-test with a component of re-generating treatment assignments in every bootstrap sample to account for the correct variation under the stratified biased coin randomisation. The resulting bootstrap test replaces the denominator of the two sample t-test in (3) by the square root of the bootstrap variance estimator and is valid according to (1). This bootstrap method can be extended to modify many other conventional tests for Type 1 or 2 covariate-adaptive randomisation schemes (Shao & Yu, 2013; Ye & Shao, 2020).

With some effort on deriving the asymptotic distribution of the numerator of a conventional test under Type 1 or 2 covariate-adaptive randomisation, a valid test can also be constructed by correctly estimating the asymptotic variance of the numerator (Ye, 2018; Ye & Shao, 2020). For the conventional two sample t-test in (3), for example, Ye (2018) showed that a valid test under stratified biased coin randomisation can be obtained by replacing the denominator of the t-test by $2\{\sum_z n(z) S^2(z)\}^{1/2}/n$, where $S^2(z)$ is the sample variance based on the $Y_i$'s in stratum $Z = z$. Compared with the bootstrap, this approach does not require a large amount of computation and has another advantage to be discussed later.
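To make the contrast between the two denominators concrete, here is a hedged Python sketch that computes the conventional two sample t statistic and a version with a stratified denominator. It assumes that the stratified denominator is read as twice the square root of the sum over strata of n(z) times S²(z), divided by n, and that every stratum and every arm contains at least two patients. The function name `t_statistics` and the coding of arms as 1 and 2 are illustrative choices.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def t_statistics(y, arm, z):
    """Return (conventional t, t with stratified denominator) for arms coded 1 and 2."""
    y1 = [v for v, a in zip(y, arm) if a == 1]
    y2 = [v for v, a in zip(y, arm) if a == 2]
    numerator = mean(y1) - mean(y2)
    # Conventional standard error: ignores how assignments were balanced.
    conventional_se = math.sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))
    # Stratified standard error: sample variance within each stratum,
    # pooled over both arms (assumed reading of the formula in the text).
    by_stratum = defaultdict(list)
    for v, s in zip(y, z):
        by_stratum[s].append(v)
    n = len(y)
    stratified_se = 2 * math.sqrt(
        sum(len(v) * variance(v) for v in by_stratum.values())
    ) / n
    return numerator / conventional_se, numerator / stratified_se
```

When the outcome varies strongly between strata but little within them, the stratified denominator is much smaller than the conventional one, which is the source of the conservativeness discussed in Section 4.2.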

Perhaps a better approach is to directly derive a valid test based on a given covariate-adaptive randomisation scheme or a general group of randomisation schemes. This will be discussed in Section 5 when we consider general inference procedures.

Another stream of methods is based on re-randomisation or permutation, e.g., Simon and Simon (2011), Kaiser (2012), and Bugni et al. (2018). In the rest of this section, we discuss in detail the re-randomisation approach in Simon and Simon (2011), which is somewhat similar to the bootstrap method. Consider $k = 2$ and $H_0: Y^{(1)} \sim Y^{(2)}$. Under $H_0$, $Y^{(1)}$ and $Y^{(2)}$ are exchangeable so that we create potential outcomes $\tilde{Y}_i^{(1)} = \tilde{Y}_i^{(2)} = Y_i$ for patient $i$. Let $A = (A_1, \ldots, A_n)$ be the observed treatment assignments under the given covariate-adaptive randomisation scheme. Any test $T$ can be written as $T(A, O)$, where $O = \{\tilde{Y}_i^{(1)}, \tilde{Y}_i^{(2)}, X_i, i = 1, \ldots, n\}$. Let $C = (C_1, \ldots, C_n)$ be randomly generated treatment assignments under the same randomisation scheme, i.e., $C \sim A$ conditioned on $Z$, let $T(C, O)$ be $T(A, O)$ with $A$ replaced by $C$, and let $F_O$ be the conditional cumulative distribution function of $T(C, O)$ given $O$. From probability theory,

$$P\big\{T(C, O) < F_O^{-1}(\alpha/2) \text{ or } T(C, O) > F_O^{-1}(1 - \alpha/2) \,\big|\, O\big\} \le \alpha.$$

Hence, unconditionally, under $H_0$,

$$P\big\{T(C, O) < F_O^{-1}(\alpha/2) \text{ or } T(C, O) > F_O^{-1}(1 - \alpha/2)\big\} \le \alpha,$$

and if we reject $H_0$ if and only if $T$ is outside of the interval $[F_O^{-1}(\alpha/2), F_O^{-1}(1 - \alpha/2)]$, then this $T$ has Type I error rate $\le \alpha$ for every $n$.

Two issues remain to be considered. The first one is that the quantile $F_O^{-1}(r)$ usually has no explicit form and an approximation such as Monte Carlo is needed. The second issue is that this method may be conservative for every $n$, because $T(C, O)$ with random $C$ is discrete. At this stage, it is still unknown whether result (1) holds for this method, since the previous argument shows that the left-hand side of (1) is $\le \alpha$, but we cannot prove that the equality in (1) holds, i.e., we cannot rule out the possibility that (2) actually holds so that the re-randomisation method is conservative.
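The Monte Carlo approximation mentioned as the first issue can be sketched as follows. This is an illustrative sketch, not the procedure of any cited paper: the names `rerandomisation_test`, `permuted_block_assign` and `mean_difference` are hypothetical, the empirical quantiles are computed in a deliberately crude way, and the example scheme is stratified permuted blocks of size 2.

```python
import math
import random


def permuted_block_assign(z, rng):
    """Regenerate assignments (arms 1 and 2) with permuted blocks of size 2 per stratum."""
    pending, arms = {}, []
    for s in z:
        if not pending.get(s):
            block = [1, 2]
            rng.shuffle(block)
            pending[s] = block
        arms.append(pending[s].pop())
    return arms


def mean_difference(y, arms):
    """Difference of sample means between arm 1 and arm 2."""
    y1 = [v for v, a in zip(y, arms) if a == 1]
    y2 = [v for v, a in zip(y, arms) if a == 2]
    return sum(y1) / len(y1) - sum(y2) / len(y2)


def rerandomisation_test(y, z, observed_arms, statistic, assign,
                         alpha=0.05, draws=999, seed=0):
    """Return (reject, lower, upper) using Monte Carlo quantiles of T(C, O)."""
    rng = random.Random(seed)
    t_obs = statistic(y, observed_arms)
    # Under H0 the outcomes do not depend on the arm, so y stays fixed while
    # assignments C are regenerated from the same scheme given the strata z.
    simulated = sorted(statistic(y, assign(z, rng)) for _ in range(draws))
    lower = simulated[math.floor(alpha / 2 * draws)]
    upper = simulated[math.ceil((1 - alpha / 2) * draws) - 1]
    return (t_obs < lower or t_obs > upper), lower, upper
```

A call such as `rerandomisation_test(y, z, arms, mean_difference, permuted_block_assign)` keeps the outcomes fixed and only redraws the assignments, so the reference distribution reflects the randomisation scheme actually used.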

4.4. Tests in survival analysis

We review some available theory for survival analysis, since covariate-adaptive randomisation has a long history of application in survival trials. In fact, all 7 articles in the 2018 New England Journal of Medicine cited in Section 1 are about survival trials.

For simplicity, we focus on the case of k = 2.

The data structure for survival analysis is described in the beginning of Section 2, where the potential outcome is $Y^{(a)} = \min(T^{(a)}, C^{(a)})$, $T^{(a)}$ is the potential survival time, and $C^{(a)}$ is the potential censoring time, under treatment $a$. It is typically assumed that, conditional on covariate $X$, $T^{(a)}$ and $C^{(a)}$ are independent and the ratio $P(C^{(1)} \ge t \mid X)/P(C^{(2)} \ge t \mid X)$ is a function of $t$ only.

The most common analysis in survival trials is testing whether two treatments have different effects on the conditional distributions of $T^{(a)}$ given $X$. Let $\lambda(t, x, a)$ be the underlying hazard function of $T^{(a)}$ given $X = x$, $a = 1, 2$. The null hypothesis of interest is $H_0: \lambda(t, x, 1) = \lambda(t, x, 2)$ for all possible $t$ and $x$.

Without imposing any model, a conventional nonparametric test for $H_0$ is the log-rank test

$$T = \sum_{i=1}^{n} \int_0^\infty \left\{ A_i - \frac{S_1(t)}{S(t)} \right\} dN_i(t) \times \left[ \sum_{i=1}^{n} \int_0^\infty \frac{S_1(t) S_2(t)}{\{S(t)\}^2} \, dN_i(t) \right]^{-1/2}, \tag{4}$$

where $S_a(t) = \sum_{i=1}^{n} I(A_i = e_a) I(Y_i^{(a)} \ge t)$, $I(C)$ is the indicator of event $C$, $S(t) = S_1(t) + S_2(t)$, $N_i(t) = I(A_i = e_1) N_i^{(1)}(t) + I(A_i = e_2) N_i^{(2)}(t)$, and $N_i^{(a)}(t) = I(T_i^{(a)} \le C_i^{(a)}) I(Y_i^{(a)} \le t)$, $a = 1, 2$. Similar to the two sample t-test in (3), the log-rank test that is valid according to (1) under simple randomisation is conservative in the sense of (2) under Type 1 or 2 covariate-adaptive randomisation, because the denominator of $T$ in (4) is too large as a standard error for the numerator of $T$. A valid modified log-rank test is derived by replacing the denominator of $T$ with the square root of a stratified variance estimator given in Formula (20) of Ye and Shao (2020).
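The statistic in (4) can be computed directly from the observed times, event indicators and arm labels. The sketch below is a minimal illustration, not code from the cited papers. It assumes that $A_i$ in the integrand is read as the indicator that patient $i$ receives treatment 1, that each observed event contributes one jump of $N_i(t)$, and that the at-risk counts $S_a(t)$ are evaluated at the event time; the function name `logrank_statistic` is hypothetical.

```python
import numpy as np


def logrank_statistic(time, event, arm1):
    """Unstratified log-rank statistic for two arms.

    time  : observed times Y_i = min(T_i, C_i)
    event : True if the survival time was observed (T_i <= C_i)
    arm1  : True if patient i received treatment 1
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    arm1 = np.asarray(arm1, dtype=bool)
    numerator = 0.0
    variance = 0.0
    for i in np.flatnonzero(event):       # one jump of N_i(t) per observed event
        at_risk = time >= time[i]         # patients with Y_j >= t at the event time
        s1 = int(np.sum(at_risk & arm1))  # S_1(t)
        s2 = int(np.sum(at_risk & ~arm1)) # S_2(t)
        s = s1 + s2                       # S(t) >= 1 because patient i is at risk
        numerator += float(arm1[i]) - s1 / s
        variance += s1 * s2 / s**2
    return numerator / np.sqrt(variance)
```

With this sign convention, a positive value indicates more events in arm 1 than expected under $H_0$. The modified test of Ye and Shao (2020) would replace `np.sqrt(variance)` with their stratified standard error, which is not reproduced here.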

In survival analysis, the following Cox proportional hazards model is very popular:

$$\lambda(t, x, a) = \lambda_0(t) \exp(\theta a + \beta^T x), \tag{5}$$

where $\theta$ is an unknown parameter, $\beta^T$ is the transpose of a vector $\beta$ of unknown parameters, and $\lambda_0(t)$ is an unspecified baseline hazard function. If the Cox model is correct, then the null hypothesis is the same as $H_0: \theta = 0$, and a score test of $H_0$ can be derived using the partial likelihood under the Cox model. The idea is that the score test is more powerful than the log-rank test if the Cox model is correct. Even if the Cox model is misspecified, it can be used as a working model under the model-assisted approach, i.e., a model is used to assist the derivation of an inference procedure that is efficient when the model is correct and is still asymptotically valid when the model is incorrect.

Under simple randomisation, a valid model-assisted score test was derived (DiRienzo & Lagakos, 2002; Kong & Slud, 1997; Lin & Wei, 1989), which is often more powerful than the log-rank test in (4) that does not use any covariates. This conventional score test, however, is shown in Ye and Shao (2020) to be conservative under Type 1 or 2 covariate-adaptive randomisation, for the same reason: the denominator of the score test is too large as a standard error. Again, we can obtain a valid score test by replacing the denominator with the square root of a stratified variance estimator (Ye & Shao, 2020).

One might also consider applying the bootstrap or re-randomisation discussed in Section 4.3 to construct valid tests. However, these methods are not correct in survival analysis unless we assume $P(C^{(1)} \ge t \mid X) = P(C^{(2)} \ge t \mid X)$ for all $t$. The reason is that, to apply the bootstrap or re-randomisation, the observed $(Y_i, X_i)$'s have to be exchangeable across $i$ under $H_0$. Under $H_0$, although the $T_i^{(a)}$'s are exchangeable, the $C_i^{(a)}$'s are not unless $P(C^{(1)} \ge t \mid X) = P(C^{(2)} \ge t \mid X)$ for all $t$. Even if the treatment has no effect on the potential survival time, it may have some effect on the potential censoring for practical reasons.

5. Valid inference

We have already discussed to some extent how to construct valid tests under covariate-adaptive randomisation. There are a few shortcomings in those available results reviewed in Section 4. First, an obvious one is that some results/methods rely on correct specification of a model. Second, all results/methods in Section 4 depend on covariate-adaptive randomisation schemes; in particular, Type 1 or 2 randomisation method is required, which excludes the minimisation. Third, only testing hypotheses is considered, not other inference such as confidence sets. Finally, all methods in Section 4 are modifications of conventional procedures.

In this section, we would like to address the following re-phrased Question (A) in Section 1:

  1. Can we develop an inference procedure valid under covariate-adaptive randomisation with very little model assumption?

5.1. Testing in survival analysis

We begin with the log-rank test for survival data in the case of $k = 2$. The stratified log-rank test (Peto et al., 1976) is simply the log-rank test in (4) stratified with all levels of the discrete covariate $Z$ utilised in covariate-adaptive randomisation:

$$T = \sum_{z} \sum_{i \in L(z)} \int_0^\infty \left\{ A_i - \frac{S_1(t,z)}{S(t,z)} \right\} dN_i(t) \times \left[ \sum_{z} \sum_{i \in L(z)} \int_0^\infty \frac{S_1(t,z) S_2(t,z)}{\{S(t,z)\}^2} \, dN_i(t) \right]^{-1/2}, \tag{6}$$

where $L(z)$ is the stratum of patients with $Z_i = z$, $S_a(t, z) = \sum_{i \in L(z)} I(A_i = e_a) I(Y_i^{(a)} \ge t)$, and $S(t, z) = S_1(t, z) + S_2(t, z)$. Although the stratified log-rank test in (6) exhibits nice empirical properties under covariate-adaptive randomisation (Lachin et al., 1988; Xu et al., 2016) and has been used for a long time, the first proof of its validity according to (1) was given, with some effort, by Ye and Shao (2020). The proof actually shows that the stratified log-rank test is valid for any covariate-adaptive randomisation method, including the minimisation, as long as the minimal conditions (D1)–(D2) are satisfied.
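The stratified statistic in (6) sums the numerator and the variance contributions over strata before forming the ratio, rather than combining per-stratum test statistics. A minimal sketch follows, under the same assumptions as for the unstratified statistic: $A_i$ in the integrand is read as the indicator of treatment 1, and at-risk counts are taken within the patient's own stratum. The names `_logrank_parts` and `stratified_logrank_statistic` are hypothetical.

```python
import numpy as np


def _logrank_parts(time, event, arm1):
    """Numerator and variance contributions of the log-rank statistic
    computed from the patients of a single stratum."""
    numerator = 0.0
    variance = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        s1 = int(np.sum(at_risk & arm1))   # S_1(t, z)
        s2 = int(np.sum(at_risk & ~arm1))  # S_2(t, z)
        s = s1 + s2                        # S(t, z)
        numerator += float(arm1[i]) - s1 / s
        variance += s1 * s2 / s**2
    return numerator, variance


def stratified_logrank_statistic(time, event, arm1, z):
    """Stratified log-rank statistic using all levels of z as strata."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    arm1 = np.asarray(arm1, dtype=bool)
    z = np.asarray(z)
    numerator = 0.0
    variance = 0.0
    for level in np.unique(z):
        in_z = z == level
        num_z, var_z = _logrank_parts(time[in_z], event[in_z], arm1[in_z])
        numerator += num_z
        variance += var_z
    return numerator / np.sqrt(variance)
```

When $Z$ has several components, `z` should encode the joint levels (for example, one label per combination), in line with the requirement of full stratification.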

Why does stratification make so much difference? Recall that in Section 4.1 we comment that a test will be valid if two conditions are satisfied: (i) Z used in randomisation is also used in constructing the test and (ii) a correct model is used to derive the test. Note that the stratification with strata being levels of Z can be viewed as a kind of modelling based on the discrete covariate Z, and such modelling is always correct. Thus (ii) has been met if we stratify using Z. To meet (i), we must fully stratify, i.e., use all strata defined by joint levels of Z, not partially stratify. It can be shown that if we combine some strata in the construction of the stratified log-rank test, then the resulting test is not valid.

The only issue with the stratified log-rank test in (6) is that it is not efficient if $Z \ne X$, i.e., $X$ contains more information than $Z$. In fact, we cannot definitely tell whether the stratified log-rank test is more powerful than the unstratified log-rank test in (4) under simple randomisation, which is similar to the fact that a stratified sample mean is not always more efficient than the unstratified sample mean in survey sampling. Ye and Shao (2020) showed by simulation that a modified log-rank test that replaces the denominator of $T$ in (4) with a stratified standard error may be more powerful than the stratified log-rank test in (6). The efficiency issue will be further considered in Section 6.

5.2. Inference on average or quantile treatment effect

Next, we consider inference on the population mean difference $\theta = E(Y^{(a)} - Y^{(b)})$ with any two fixed treatments $a$ and $b$ in a trial with $k \ge 2$ treatment arms. As the development of inference procedures often starts with finding estimators of the parameter of interest, we first review some available estimators of $\theta$.

The simplest estimator of $\theta$ is the sample mean difference $\bar Y_a - \bar Y_b$, where $\bar Y_a$ is the sample mean of the $Y_i$'s under treatment $a = 1, \ldots, k$. Bugni et al. (2018) proposed another estimator called the strata fixed effect estimator in their Section 4.2. The asymptotic distributions of $\bar Y_a - \bar Y_b$ and the strata fixed effect estimator have been derived under Type 1 or 2 covariate-adaptive randomisation, but they are not available for Type 3 covariate-adaptive randomisation such as minimisation due to the lack of theory on Type 3 methods.

The following post-stratified estimator of $\theta$, similar to the stratified log-rank test in (6), is proposed by Bugni et al. (2019) and Ye et al. (2020):

$$\hat\theta_S = \sum_{z} \frac{n(z)}{n} \{ \bar Y_a(z) - \bar Y_b(z) \}, \tag{7}$$

where $\bar Y_a(z)$ is the sample mean of the $Y_i$'s from patients in post-stratum $L(z)$ under treatment $a = 1, \ldots, k$. If the weight $n(z)/n$ in (7) is replaced by the population weight $P(Z = z)$, then $\hat\theta_S$ in (7) is the stratified estimator in survey sampling. Since $P(Z = z)$ is substituted by $n(z)/n$ and $L(z)$ is formed after $Z$ is observed, the estimator $\hat\theta_S$ is referred to as a post-stratified estimator in survey sampling.

Applying different techniques, Bugni et al. (2019) and Ye et al. (2020) independently established that, if (C1) and (D1)–(D2) hold and the second-order moments of $Y^{(a)}$ and $Y^{(b)}$ are finite, then

$$\sqrt{n} (\hat\theta_S - \theta) \xrightarrow{d} N(0, \sigma_S^2), \tag{8}$$

where

$$\sigma_S^2 = E\left\{ \frac{\mathrm{var}(Y^{(a)} \mid Z)}{\pi_a} + \frac{\mathrm{var}(Y^{(b)} \mid Z)}{\pi_b} \right\} + \mathrm{var}\{ E(Y^{(a)} - Y^{(b)} \mid Z) \}.$$

Result (8) is model free, i.e., only (C1) and the second-order moments of the potential outcomes are required. It is applicable to any covariate-adaptive randomisation method satisfying (D1)–(D2), most notably the minimisation, for which very little is known about its theoretical properties, as the minimisation is neither Type 1 nor Type 2. Another interesting fact is that the limiting variance $\sigma_S^2$ is invariant with respect to randomisation methods. Hence, not only does result (8) hold for any covariate-adaptive randomisation method as long as the minimal conditions (D1)–(D2) are satisfied, but $\hat\theta_S$ in (7) also has the same asymptotic distribution and efficiency regardless of which randomisation scheme is used for treatment assignments. Such a result has not been seen in the literature, except that Ye and Shao (2020) showed that the asymptotic distribution of the stratified log-rank test in (6) is invariant to the randomisation scheme. Existing results in the literature (Bugni et al., 2018; Ma et al., 2015; Shao & Yu, 2013; Shao et al., 2010) typically depend on the randomisation method and many of them are not applicable to Type 3 methods such as the minimisation.

When the covariate-adaptive randomisation scheme is Type 1, result (8) also holds with $\hat\theta_S$ replaced by the strata fixed effect estimator in Bugni et al. (2018). In general, however, $\hat\theta_S$ is asymptotically more efficient than the strata fixed effect estimator or the simple estimator $\bar Y_a - \bar Y_b$.

For inference on $\theta$ under any type of covariate-adaptive randomisation, if $\hat\theta_S$ is adopted to estimate $\theta$, then all we need to do is to derive an estimator $\hat\sigma_S^2$ of $\sigma_S^2$ that is consistent, i.e., $\hat\sigma_S^2 - \sigma_S^2 \xrightarrow{p} 0$ under any type of covariate-adaptive randomisation. This is actually not difficult once we establish a result like (8). It is shown in Ye et al. (2020) that a consistent estimator of $\sigma_S^2$ under any type of covariate-adaptive randomisation is

$$\hat\sigma_S^2 = \frac{1}{n} \sum_{z} n^2(z) \left\{ \frac{S_a^2(z)}{n_a(z)} + \frac{S_b^2(z)}{n_b(z)} \right\} + \frac{1}{n} \sum_{z} n(z) \{ \bar Y_a(z) - \bar Y_b(z) \}^2 - \hat\theta_S^2,$$

where $n_a(z)$ and $S_a^2(z)$ are the sample size and sample variance of the $Y_i$'s, respectively, of the patients in stratum $Z = z$ and under treatment $a$.

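Under any randomisation scheme satisfying (D1)–(D2), an asymptotically valid $100(1-\alpha)\%$ confidence interval for $\theta$ has limits $\hat\theta_S \pm z_{1-\alpha/2} \hat\sigma_S / \sqrt{n}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution; the factor $1/\sqrt{n}$ follows from the $\sqrt{n}$ scaling in result (8).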
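The estimator in (7), the variance estimator $\hat\sigma_S^2$, and the resulting interval can be computed in a few lines. The following is a minimal sketch rather than the authors' code. It assumes that every stratum contains at least two patients in each of arms $a$ and $b$ (so that the sample variances exist), that $n(z)$ counts all patients in the stratum regardless of arm, and that the standard error is $\hat\sigma_S/\sqrt{n}$ as implied by the $\sqrt{n}$ scaling in result (8). The function name `post_stratified_ci` is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def post_stratified_ci(y, arm, z, a, b, alpha=0.05):
    """Post-stratified estimate of E(Y^(a) - Y^(b)), the variance estimate
    sigma_S^2, and a normal-approximation confidence interval."""
    y = np.asarray(y, dtype=float)
    arm = np.asarray(arm)
    z = np.asarray(z)
    n = len(y)
    theta = 0.0
    within = 0.0   # (1/n) sum_z n(z)^2 {S_a^2(z)/n_a(z) + S_b^2(z)/n_b(z)}
    between = 0.0  # (1/n) sum_z n(z) {Ybar_a(z) - Ybar_b(z)}^2
    for level in np.unique(z):
        in_z = z == level
        n_z = int(in_z.sum())               # all patients in the stratum
        ya = y[in_z & (arm == a)]
        yb = y[in_z & (arm == b)]
        diff = ya.mean() - yb.mean()
        theta += n_z / n * diff
        within += n_z**2 / n * (ya.var(ddof=1) / len(ya) + yb.var(ddof=1) / len(yb))
        between += n_z / n * diff**2
    sigma2 = within + between - theta**2
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * np.sqrt(sigma2 / n)
    return theta, sigma2, (theta - half_width, theta + half_width)
```

For instance, with two strata of four patients each, outcomes (1, 3) and (0, 2) for arms $a$ and $b$ in the first stratum and (5, 7) and (2, 4) in the second, the function gives $\hat\theta_S = 2$ and $\hat\sigma_S^2 = 8 + 5 - 4 = 9$.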

More estimators of the average treatment effect θ are considered in Section 6.

We now consider inference on another important parameter, the quantile treatment effect defined as $q_\tau^{(a)} - q_\tau^{(b)}$ in Section 2, where $q_\tau^{(a)}$ is the $\tau$th quantile of the distribution of $Y^{(a)}$ under treatment $a$ and $\tau$ is a fixed fraction.

Unlike the means, for quantiles we cannot use differences as in (7). Instead, we estimate $q_\tau^{(a)}$ and $q_\tau^{(b)}$ separately, and then take a difference of estimates. Under treatment $a$, we estimate the marginal distribution of $Y^{(a)}$ at a fixed point $y$ as

$$\hat F^{(a)}(y) = \frac{1}{n} \sum_{z} \frac{n(z)}{n_a(z)} \sum_{i \in L(z)} I(A_i = e_a) I(Y_i^{(a)} \le y), \tag{9}$$

$a = 1, \ldots, k$. Then, $q_\tau^{(a)}$ is estimated by $\hat q_\tau^{(a)}$, the $\tau$th quantile of $\hat F^{(a)}$, and $q_\tau^{(a)} - q_\tau^{(b)}$ is estimated as $\hat q_\tau^{(a)} - \hat q_\tau^{(b)}$. For inference on quantiles, however, a simple estimator of the asymptotic variance of $\hat q_\tau^{(a)}$ may not be easily obtained. Methods such as the bootstrap or Woodruff's interval may be applied (Shao, 2003).
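The estimator in (9) is a weighted empirical distribution function: each patient in arm $a$ and stratum $z$ receives weight $n(z)/\{n \, n_a(z)\}$, and the weights sum to one. A minimal sketch of the resulting quantile estimate follows. It assumes the usual left-continuous inverse, i.e., the $\tau$th quantile is the smallest observed value at which $\hat F^{(a)}$ reaches $\tau$, and that every stratum contains at least one patient in the arm considered. The names `stratified_quantile` and `quantile_effect` are hypothetical.

```python
import numpy as np


def stratified_quantile(y, arm, z, a, tau):
    """tau-th quantile of the stratified distribution estimate for arm a."""
    y = np.asarray(y, dtype=float)
    arm = np.asarray(arm)
    z = np.asarray(z)
    n = len(y)
    in_arm = arm == a
    weight = np.zeros(n)
    for level in np.unique(z):
        in_z = z == level
        cell = in_z & in_arm
        # weight n(z) / (n * n_a(z)) for each patient in arm a and stratum z
        weight[cell] = in_z.sum() / (n * cell.sum())
    order = np.argsort(y[in_arm])
    y_sorted = y[in_arm][order]
    cdf = np.cumsum(weight[in_arm][order])
    # first index where the estimated distribution function reaches tau;
    # the small tolerance guards against floating-point rounding in cumsum
    k = int(np.searchsorted(cdf, tau - 1e-12))
    return float(y_sorted[min(k, len(y_sorted) - 1)])


def quantile_effect(y, arm, z, a, b, tau):
    """Estimated quantile treatment effect between arms a and b."""
    return stratified_quantile(y, arm, z, a, tau) - stratified_quantile(y, arm, z, b, tau)
```

Interval estimates would still require a bootstrap or Woodruff-type construction, which this sketch does not attempt.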

The stratification in (6), (7) or (9), together with the asymptotic theory, provides a solid foundation for valid and model free inference after covariate-adaptive randomisation and, thus, it largely eliminates the concern and controversy as discussed by regulatory agencies about the use of covariate-adaptive randomisation such as minimisation.

Combining the results and discussions in this section and Section 5.1, we reach a general conclusion that a valid inference procedure can be obtained as long as the covariate Z utilised in covariate-adaptive randomisation is fully used in the construction of inference procedure. A simple way to do this is to use all joint levels of Z as strata.

It can be seen that the conditions needed for this conclusion are much weaker than (i) and (ii) stated in Section 4.1, but (i)–(ii) in Section 4.1 are considered for the validity of a conventional test under covariate-adaptive randomisation.

5.3. Effect of types of randomisation schemes

Result (8) about the asymptotic distribution of $\hat\theta_S$ in (7) is invariant to the type of randomisation scheme described in Section 3. But this does not imply that the stratification in (7) or in (6) is the best way for inference, especially when problems other than the inference on average treatment effect are considered. An example is that the modified log-rank test in Ye and Shao (2020) may be more powerful than the stratified log-rank test in (6), as discussed at the end of Section 5.1.

If an inference procedure is not invariant to different randomisation schemes, then it is interesting to find out which randomisation scheme, or which type, provides better inference procedures. For the modified log-rank test in Ye and Shao (2020), it is more powerful when a Type 1 randomisation scheme is used, rather than a Type 2 or 3 scheme. The same may be true for any inference procedure not invariant to different randomisation schemes. For different Type 1 methods, such as the stratified permuted block and the stratified biased coin methods, so far there is no result indicating that the inference procedures based on these two randomisation schemes have different performances.

6. Efficiency considerations

Question (C) in Section 1 is about whether a test under covariate-adaptive randomisation can be more powerful than it is under simple randomisation. Another question is, if $Z$ is used in randomisation and stratification as in (6) or (7) and if $X$ contains more information than $Z$, can we obtain more powerful tests or more efficient estimators by utilising covariate information in $X$ that is not in $Z$? Note that $X$ may contain a component that is not in $Z$ but is related to the potential responses $Y^{(1)}, \ldots, Y^{(k)}$, or some components of $Z$ are discretised components of $X$ and the remaining information after discretisation is still useful in predicting the potential responses.

6.1. Adjusting for covariates

We first consider the second question in the estimation of $\theta = E(Y^{(a)} - Y^{(b)})$ for two fixed treatments $a$ and $b$. Let $U$ be a function of $X$ that we want to further utilise in improving the efficiency of $\hat\theta_S$ in (7). Since the information generated by $Z$ is not in that of $U$, we assume that $\mathrm{var}(U \mid Z = z)$ is positive definite for every $z$.

For model free estimation and inference, we do not want to impose any model between the potential responses and $U$. In fact, it is hard to find a correct model within each stratum $Z = z$, if we still apply stratification in estimating $\theta$. How do we adjust for covariates without using a model? Ye et al. (2020) adopted the model-assisted generalised regression approach in survey sampling, first discussed in Cassel et al. (1976) and studied extensively in the literature, for example, Särndal et al. (2003), Shao and Wang (2014), and Ta et al. (2020).

In this section, we review some results from Ye et al. (2020). Let $U_i$ be the covariate $U$-value of patient $i$, and for each $z$, let $\bar U_a(z)$ be the sample mean of the $U_i$'s of patients in stratum $L_a(z) = \{ i : Z_i = z \text{ under treatment } a \}$, and

$$\hat\beta_a(z) = \left[ \sum_{i \in L_a(z)} \{ U_i - \bar U_a(z) \} \{ U_i - \bar U_a(z) \}^T \right]^{-1} \sum_{i \in L_a(z)} \{ U_i - \bar U_a(z) \} Y_i.$$

Within treatment $a$ and stratum $L(z) = \{ Z = z \}$, $\hat\beta_a(z)$ is the least squares estimator of the coefficient vector in front of $U$ under a linear model between $Y^{(a)}$ and $U$, but the model is not required to be correct. An estimator of $\theta$ following $\hat\theta_S$ but further adjusting for covariate $U$ is (Ye et al., 2020)

$$\hat\theta_A = \sum_{z} \frac{n(z)}{n} \left[ \bar Y_a(z) - \bar Y_b(z) - \{ \bar U_a(z) - \bar U(z) \}^T \hat\beta_a(z) + \{ \bar U_b(z) - \bar U(z) \}^T \hat\beta_b(z) \right],$$

where $\bar U(z)$ is the sample mean of the $U_i$'s of all patients in stratum $L(z)$.

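An alternative estimator $\hat\theta_B$ of $\theta$ in Ye et al. (2020) is obtained by replacing both $\hat\beta_a(z)$ and $\hat\beta_b(z)$ in the definition of $\hat\theta_A$ with a combined estimator

$$\hat\beta(z) = \left[ \sum_{a=1}^{k} \sum_{i \in L(z), A_i = a} \{ U_i - \bar U_a(z) \} \{ U_i - \bar U_a(z) \}^T \right]^{-1} \sum_{a=1}^{k} \sum_{i \in L(z), A_i = a} \{ U_i - \bar U_a(z) \} Y_i.$$

When $k > 2$, both $\bar U(z)$ and $\hat\beta(z)$ involve data from patients in treatment arms other than $a$ and $b$.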
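The two adjusted estimators differ only in which slope is used within each stratum, which the following sketch makes explicit. It is an illustration of the formulas for $\hat\theta_A$ and $\hat\theta_B$ rather than the code of Ye et al. (2020), and it assumes that `u` is an array with one row per patient and that the within-cell cross-product matrices are invertible (enough patients with varying covariate values in each stratum and arm). The names `_cell_moments` and `adjusted_estimators` are hypothetical.

```python
import numpy as np


def _cell_moments(y, u, cell):
    """Mean of Y, mean of U, and centred cross-products for one arm-stratum cell."""
    u_mean = u[cell].mean(axis=0)
    centred = u[cell] - u_mean
    return y[cell].mean(), u_mean, centred.T @ centred, centred.T @ y[cell]


def adjusted_estimators(y, u, arm, z, a, b):
    """Return (theta_A, theta_B): stratified estimates of E(Y^(a) - Y^(b))
    adjusted for U with arm-specific slopes (A) or a combined slope (B)."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)  # shape (n, p)
    arm = np.asarray(arm)
    z = np.asarray(z)
    n, p = u.shape
    theta_a_adj = 0.0
    theta_b_adj = 0.0
    for level in np.unique(z):
        in_z = z == level
        u_bar = u[in_z].mean(axis=0)  # mean of U over all patients in the stratum
        sxx_pool = np.zeros((p, p))
        sxy_pool = np.zeros(p)
        for t in np.unique(arm[in_z]):  # every arm contributes to the combined slope
            _, _, sxx, sxy = _cell_moments(y, u, in_z & (arm == t))
            sxx_pool += sxx
            sxy_pool += sxy
        beta_pool = np.linalg.solve(sxx_pool, sxy_pool)
        ya, ua, sxx_a, sxy_a = _cell_moments(y, u, in_z & (arm == a))
        yb, ub, sxx_b, sxy_b = _cell_moments(y, u, in_z & (arm == b))
        beta_a = np.linalg.solve(sxx_a, sxy_a)
        beta_b = np.linalg.solve(sxx_b, sxy_b)
        w = in_z.sum() / n
        theta_a_adj += w * (ya - yb - (ua - u_bar) @ beta_a + (ub - u_bar) @ beta_b)
        theta_b_adj += w * (ya - yb - (ua - u_bar) @ beta_pool + (ub - u_bar) @ beta_pool)
    return theta_a_adj, theta_b_adj
```

As a sanity check, if the outcomes are exactly linear in $U$ with a common slope in all arms, both estimators return the difference of the arm intercepts.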

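The following result parallel to result (8) is established in Ye et al. (2020). If (C1) and (D1)–(D2) hold and the second-order moments of $Y^{(a)}$ and $U$ are finite, then

$$\sqrt{n} (\hat\theta_A - \theta) \xrightarrow{d} N(0, \sigma_A^2) \quad \text{and} \quad \sqrt{n} (\hat\theta_B - \theta) \xrightarrow{d} N(0, \sigma_B^2), \tag{10}$$

where

$$\sigma_A^2 = E\left[ \frac{\mathrm{var}\{ Y^{(a)} - U^T \beta_a(Z) \mid Z \}}{\pi_a} + \frac{\mathrm{var}\{ Y^{(b)} - U^T \beta_b(Z) \mid Z \}}{\pi_b} \right] + E\left[ \{ \beta_a(Z) - \beta_b(Z) \}^T \mathrm{var}(U \mid Z) \{ \beta_a(Z) - \beta_b(Z) \} \right] + \mathrm{var}\{ E(Y^{(a)} - Y^{(b)} \mid Z) \},$$

$$\sigma_B^2 = E\left[ \frac{\mathrm{var}\{ Y^{(a)} - U^T \beta(Z) \mid Z \}}{\pi_a} + \frac{\mathrm{var}\{ Y^{(b)} - U^T \beta(Z) \mid Z \}}{\pi_b} \right] + \mathrm{var}\{ E(Y^{(a)} - Y^{(b)} \mid Z) \},$$

$\beta_a(z) = \{ \mathrm{var}(U \mid Z = z) \}^{-1} \mathrm{cov}(U, Y^{(a)} \mid Z = z)$, $a = 1, \ldots, k$, and $\beta(z) = \sum_{a=1}^{k} \pi_a \beta_a(z)$.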

Several conclusions can be made from result (10). First, result (10) is model free and invariant with respect to covariate-adaptive randomisation schemes, as long as the minimal conditions (D1)–(D2) hold.

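Second, from the definitions of $\sigma_S^2$ and $\sigma_A^2$, it is shown in Ye et al. (2020) that

$$\sigma_S^2 - \sigma_A^2 = E\left[ \{ \pi_b \beta_a(Z) + \pi_a \beta_b(Z) \}^T \mathrm{var}(U \mid Z) \{ \pi_b \beta_a(Z) + \pi_a \beta_b(Z) \} \right] \{ \pi_a \pi_b (\pi_a + \pi_b) \}^{-1} + E\left[ \{ \beta_a(Z) - \beta_b(Z) \}^T \mathrm{var}(U \mid Z) \{ \beta_a(Z) - \beta_b(Z) \} \right] \{ (\pi_a + \pi_b)^{-1} - 1 \}$$

and, hence, adjusting for covariate $U$ always gains efficiency, i.e., $\hat\theta_A$ is asymptotically more efficient than $\hat\theta_S$, unless

$$\pi_b \beta_a(z) + \pi_a \beta_b(z) = 0 \quad \text{and} \quad \{ \beta_a(z) - \beta_b(z) \} (1 - \pi_a - \pi_b) = 0 \quad \text{for every } z, \tag{11}$$

in which case $\hat\theta_S$ and $\hat\theta_A$ have the same asymptotic efficiency. When there are more than two treatments, $1 - \pi_a - \pi_b > 0$ and, consequently, (11) holds only when $\beta_a(z) = \beta_b(z) = 0$ for every $z$, i.e., $U$ is uncorrelated with the potential responses $Y^{(a)}$ and $Y^{(b)}$ after conditioning on $Z$ so that adjusting for $U$ is unnecessary. When there are only two treatments, (11) also holds if $\pi_a = \pi_b = 1/2$ and $\beta_a(z) = -\beta_b(z)$ for every $z$.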
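The variance difference above can be checked with a short calculation, sketched here using only the definitions of $\sigma_S^2$, $\sigma_A^2$ and $\beta_a(z)$; the shorthand $V = \mathrm{var}(U \mid Z)$ and $\beta_a = \beta_a(Z)$ is introduced for this illustration. Since $\mathrm{cov}(U, Y^{(a)} \mid Z) = V \beta_a$,

$$\mathrm{var}\{ Y^{(a)} - U^T \beta_a \mid Z \} = \mathrm{var}(Y^{(a)} \mid Z) - \beta_a^T V \beta_a,$$

and similarly for treatment $b$, so that

$$\sigma_S^2 - \sigma_A^2 = E\left[ \frac{\beta_a^T V \beta_a}{\pi_a} + \frac{\beta_b^T V \beta_b}{\pi_b} - (\beta_a - \beta_b)^T V (\beta_a - \beta_b) \right].$$

Expanding the two quadratic forms in the expression from Ye et al. (2020) gives the same quantity. Both of its terms are nonnegative because $V$ is positive definite and $\pi_a + \pi_b \le 1$, and they vanish exactly under the two conditions in (11).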

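Third, from the definitions of $\sigma_A^2$ and $\sigma_B^2$, it can be shown (Ye et al., 2020) that

$$\sigma_B^2 - \sigma_A^2 = E\left[ \{ \beta_a(Z) - \beta(Z) \}^T \mathrm{var}(U \mid Z) \{ \beta_a(Z) - \beta(Z) \} \right] \pi_a^{-1} + E\left[ \{ \beta_b(Z) - \beta(Z) \}^T \mathrm{var}(U \mid Z) \{ \beta_b(Z) - \beta(Z) \} \right] \pi_b^{-1} - E\left[ \{ \beta_a(Z) - \beta_b(Z) \}^T \mathrm{var}(U \mid Z) \{ \beta_a(Z) - \beta_b(Z) \} \right]$$

and, hence, $\hat\theta_A$ is asymptotically more efficient than $\hat\theta_B$ unless

$$\beta(z) = \frac{\pi_b \beta_a(z) + \pi_a \beta_b(z)}{\pi_a + \pi_b} \quad \text{and} \quad \{ \beta_a(z) - \beta_b(z) \} (1 - \pi_a - \pi_b) = 0 \quad \text{for every } z, \tag{12}$$

in which case $\hat\theta_B$ and $\hat\theta_A$ have the same asymptotic efficiency.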
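Why the conditions in (12) characterise equality can be seen from a weighted least squares argument, sketched here with the shorthand $V = \mathrm{var}(U \mid Z)$ (positive definite, as assumed in Section 6.1), $\beta_a = \beta_a(Z)$ and $\beta_b = \beta_b(Z)$, which is introduced for this illustration. For any vector $\beta$,

$$\frac{(\beta_a - \beta)^T V (\beta_a - \beta)}{\pi_a} + \frac{(\beta_b - \beta)^T V (\beta_b - \beta)}{\pi_b} \ge \frac{(\beta_a - \beta_b)^T V (\beta_a - \beta_b)}{\pi_a + \pi_b},$$

with equality if and only if $\beta = (\pi_b \beta_a + \pi_a \beta_b)/(\pi_a + \pi_b)$, the minimiser of the left-hand side. Consequently,

$$\sigma_B^2 - \sigma_A^2 \ge E\left[ (\beta_a - \beta_b)^T V (\beta_a - \beta_b) \right] \{ (\pi_a + \pi_b)^{-1} - 1 \} \ge 0,$$

and both inequalities are equalities exactly when the two conditions in (12) hold.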

Note that $\hat\beta(z)$ used in $\hat\theta_B$ ignores the fact that $\mathrm{cov}(U, Y^{(a)} \mid Z = z)$ may depend on treatment $a$. That is why $\hat\theta_B$ is asymptotically not as efficient as $\hat\theta_A$ in general, and $\sigma_B^2 = \sigma_A^2$ when these covariances are the same for every $a$ and every $z$, i.e., $\beta_1(z) = \cdots = \beta_k(z)$ so that (12) holds. If (12) holds, $\hat\theta_B$ may have better finite sample performance than $\hat\theta_A$, although the two estimators are asymptotically equivalent. An exceptional case for $\sigma_A^2 = \sigma_B^2$ is when $k = 2$ and $\pi_1 = \pi_2 = 1/2$, in which case we do not even need $\beta_a(z) = \beta_b(z)$.

In general, $\hat\theta_B$ may be asymptotically less efficient than $\hat\theta_S$, i.e., covariate adjustment with only the main effects may hurt efficiency, a perspective in Freedman (2008) and Lin (2013). For example, there are scenarios in which (11) holds but (12) does not.

Finally, inference about $\theta$ can be carried out based on (10) and the availability of consistent estimators of $\sigma_A^2$ and $\sigma_B^2$. Some model free consistent variance estimators under any covariate-adaptive randomisation scheme are derived in Ye et al. (2020), which are similar to $\hat\sigma_S^2$ in Section 5.2.

6.2. Can covariate-adaptive randomisation boost efficiency?

We now address Question (C) raised in Section 1 and the beginning of this section: Can a test (or an inference procedure) under covariate-adaptive randomisation be more efficient than it is under simple randomisation?

For the types of covariate-adaptive randomisation schemes described in Section 3, the answer is no, assuming that exactly the same test is used under simple randomisation or under covariate-adaptive randomisation without adjusting for conservativeness. This answer is based on the first-order asymptotic property. With a fixed n, the test or inference procedure under covariate-adaptive randomisation may perform slightly better due to the balancedness of treatment assignments.

In our previous discussions, a conventional procedure may be conservative under covariate-adaptive randomisation, and a valid procedure can often be constructed by modifying the conventional procedure. This modified procedure can be more efficient than the conventional procedure, but the comparison is not fair because the modified procedure makes some adjustment typically depending on the covariate Z.

Then, what is the advantage of applying covariate-adaptive randomisation? It is applied mainly for balancing treatment assignments across prognostic factors, which may be important for reviewing clinical results and other practical considerations.

There is a stream of developments and results in balancing discrete or continuous covariates and increasing estimation efficiency at the same time (Atkinson, 1982, 1999, 2002; Baldi Antognini & Zagoraiou, 2011; Rosenberger & Sverdlov, 2008; Senn et al., 2010). The approaches are typically model-based and the gains in efficiency may be from the second-order asymptotics.

Boosting efficiency can also be achieved by adjusting covariates under simple randomisation with less effort compared with applying covariate-adaptive randomisation, which is discussed next.

6.3. Designing versus modelling

Utilising covariate $Z$ in randomisation can be viewed as a kind of designing for better quality of data, although this is not the same as in traditional experimental design, because in clinical trials we typically cannot control covariate values of patients. Adjusting for covariates, either model-based or model-assisted, fits into the general framework of modelling. In this section, we address the issue of designing versus modelling.

First, consider inference on the average treatment effect $\theta = E(Y^{(a)} - Y^{(b)})$. If $Z$ is the only covariate, i.e., $X = Z$, then the conclusion is that designing and modelling (adjusting for covariate) can achieve the same efficiency asymptotically. In this case, $\hat\theta_S = \hat\theta_A = \hat\theta_B$ and it has the same asymptotic normal distribution under simple randomisation and under any other covariate-adaptive randomisation satisfying (D1)–(D2). The stratification in (7) serves the purpose of modelling under simple randomisation, but it is essential for obtaining easy inference under covariate-adaptive randomisation including minimisation.

Consider next the situation where $X \ne Z$ and the covariate $U$ as discussed in Section 6.1 together with $Z$ are available for modelling (the entire covariate $X$ may still contain more information than that from $U$ and $Z$). The conclusion is that modelling with $Z$ and $U$ achieves more efficiency than designing with $Z$ only, and is the same as designing with $Z$ plus an additional modelling with $U$ (adjusting for $U$). This directly comes from result (10). Under simple randomisation, $\hat\theta_A$ in Section 6.1 is the estimator after modelling with $Z$ and $U$, in view of the fact that $Z$ is discrete so that stratification is the same as modelling with $Z$, and its limiting variance is $\sigma_A^2$ in (10). On the other hand, designing with $Z$ only leads to the estimator $\hat\theta_S$ in (7), which has limiting variance $\sigma_S^2$ in (8) regardless of which covariate-adaptive randomisation is applied, and $\sigma_A^2 \le \sigma_S^2$. Finally, designing with $Z$ plus an additional modelling with $U$ leads to estimator $\hat\theta_A$.

Similar conclusions can be obtained for testing in survival analysis as discussed in Section 4.4. Consider the situation of $X = Z$. Since $Z$ is discrete, the Cox model given by (5) is always correct. Modelling with $Z$ produces the score test under simple randomisation, whereas designing with $Z$ leads to the stratified log-rank test in (6). It is shown in Ye and Shao (2020) that the two tests have the same Pitman's asymptotic efficiency. If $X \ne Z$ and the Cox model (5) with $X$ is correct, then it is shown in Ye and Shao (2020) that the score test under simple randomisation is more efficient than the stratified log-rank test based on designing and stratification with $Z$, in terms of Pitman's asymptotic efficiency. In this case, designing with $Z$ plus an additional modelling leads to the score test. Unlike the case of inference on average treatment effect, however, in survival testing all results for the situation of $X \ne Z$ rely on the correctness of Cox model (5). If model (5) is wrong, then the score test can be less powerful than the unstratified log-rank test.

7. Further research work

We end this review with the following discussion of further research topics in this area.

  1. Although some estimation and inference procedures previously discussed have asymptotic distributions invariant to covariate-adaptive randomisation schemes, it may still be important to study and understand the Type 3 randomisation methods such as the minimisation, whose properties are unclear at this stage. In particular, the asymptotic property of $D^{(a)}(z)$ defined in (D2) deserves attention, and efforts should be made to establish the joint asymptotic distribution of $D^{(a)}(z)$ with $z$ being all levels of $Z$. A different direction is to develop more and better covariate-adaptive randomisation schemes. For example, in Section 5.3 we point out that a Type 1 randomisation scheme may produce more efficient inference procedures than a Type 2 or 3 randomisation scheme. Hu and Hu (2012) modified Pocock and Simon's approach and proposed to use an imbalance measure that is a weighted sum of the overall imbalance, marginal imbalance, and strata imbalance. Some effort should be made to study the implementation of this scheme for practical uses.

  2. To utilise covariates, we considered the model-assisted generalised regression approach for the estimation of average treatment effect and score test under a working Cox model for testing hypotheses in survival analysis. It is interesting to develop other model-assisted approaches to gain efficiency without relying on models.

  3. From result (8), if $\tilde Z$ is another covariate such that the σ-field of $\tilde Z$ contains the σ-field of $Z$, then the $\hat\theta_A$ using $\tilde Z$ in randomisation is asymptotically more efficient than the $\hat\theta_A$ using $Z$ in randomisation. That is, utilising more covariate information in randomisation can increase asymptotic efficiency. On the other hand, using a $Z$ with too many levels may cause sparsity of data. Some guidance on this may be useful for practical users.

  4. The stratification in (6) or (7) uses all levels of $Z$ as strata. In applications, it is possible that some strata contain very few patients or even no patients. Some methods of handling this scenario should be developed to produce asymptotically valid or at least conservative inference procedures, such as combining some strata with small sizes.

  5. The results and discussion on inference about quantile treatment effects are very limited. In survival analysis, due to the presence of censoring, the distribution function estimator in (9) has to be replaced by the Kaplan–Meier product-limit type estimator. Furthermore, how to adjust for covariates has not been considered.

  6. The bootstrap, re-randomisation and permutation methods described in Section 4.3 are promising alternative tools to the approach of asymptotic distribution plus variance estimation for statistical inference. Two issues have to be addressed. The first one is that the re-randomisation and permutation methods are naturally developed for testing. Applying these tools for inference on parameters other than the average treatment effect requires further development. The other issue is what we discussed in the end of Section 4.4, i.e., the development of bootstrap or re-randomisation methods when censoring distributions conditioned on X can be different under the null hypothesis that the survival distributions conditioned on X are identical.

Acknowledgements

Our research was supported by the National Natural Science Foundation of China (11831008) and the U.S. National Science Foundation (DMS-1914411).

Disclosure statement

No potential conflict of interest was reported by the author(s).


Notes on contributors

Jun Shao

Dr Jun Shao holds a PhD in statistics from the University of Wisconsin-Madison. He is a Professor of Statistics at the University of Wisconsin-Madison. His research interests include variable selection and inference with high dimensional data, sample surveys, and missing data problems.

References
