Short Communications

Rejoinder on ‘Inference after covariate-adaptive randomization: aspects of methodology and theory’

Pages 196-199 | Received 11 Mar 2021, Published online: 24 Mar 2021

I would like to thank all discussants for their insightful discussions on the topic of statistical inference after covariate-adaptive randomisation, especially for including reviews of some new results and references that are not in my review written more than a year ago. I hope these discussions together with my review will stimulate further studies in this important area having many applications particularly in clinical trials.

My rejoinder focuses on some main points from four separate groups of discussants.

1. The discussion by Drs. Ma, Zhang and Hu

Drs. Ma, Zhang, and Hu's discussion brings some new results or interesting directions for new studies. These include, but are not limited to, robust inference with the randomisation scheme considered in Hu and Hu (2012), inference after covariate-adaptive randomisation with covariate misclassification, unobserved covariates, missing data or non-compliance, and high dimensional covariates. Covariate-adaptive randomisation can also be combined with other adaptive designs such as sequential monitoring (Zhu & Hu, 2019), sample size re-estimation (Li et al., in press), and seamless phase II/III clinical trials (Ma et al., in press).

As I discussed in Section 7 of my review, the theoretical study of covariate-adaptive randomisation schemes such as Pocock and Simon's minimisation is not complete and remains important, although some asymptotically valid inference procedures after these randomisation schemes have been derived. Let $Z$ be the discrete covariate used in covariate-adaptive randomisation, $z_1,\ldots,z_c$ be all possible categories of $Z$, $\pi_1,\ldots,\pi_k$ be the target assignment proportions in the trial, $n$ be the total sample size of the trial, $n_a(l)$ be the number of patients in treatment $a$ with $Z=z_l$, $a=1,\ldots,k$, $l=1,\ldots,c$, and $n(l)=n_1(l)+\cdots+n_k(l)$. A key result for studying the asymptotic validity of inference procedures after covariate-adaptive randomisation is

(R1)

$$\sqrt{n}\left(\frac{n_a(l)}{n(l)}-\pi_a,\ a=1,\ldots,k,\ l=1,\ldots,c\right)\Big|\,Z_1,\ldots,Z_n \;\to\; N(0,D) \quad \text{in distribution},$$

i.e. conditioned on the $n$ observed values of $Z$, $Z_1,\ldots,Z_n$, the $k\times c$ dimensional vector whose $(a,l)$th component is $n_a(l)/n(l)-\pi_a$, scaled by $\sqrt{n}$, converges in distribution to a multivariate normal with mean 0 and some covariance matrix $D$. Result (R1) holds for type 1 or 2 covariate-adaptive randomisation schemes as described in Section 3 of my review. For Pocock and Simon's minimisation, however, it has not been rigorously shown in general whether or not (R1) holds, although Ma et al. (2015) showed (R1) under restrictive conditions (a correct linear model between the response and $Z$ whose components are independent). Hu and Zhang (2020) derived the asymptotic normality for $n_a(l)/n(l)-\pi_a$, $a=1,\ldots,k$, with a single fixed $l$, but (R1) requires joint asymptotic normality of the entire vector over all $l=1,\ldots,c$.
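To make the quantity in (R1) concrete, the following is a minimal simulation sketch, not part of the original rejoinder. It assumes two arms with $\pi_1=\pi_2=1/2$, a covariate $Z$ with two equally likely levels, and stratified permuted blocks of size 4; all function names are hypothetical. It compares the Monte Carlo variance of $\sqrt{n}\{n_1(l)/n(l)-\pi_1\}$ for one level $l$ under simple randomisation and under stratified permuted blocks, where the within-stratum imbalance is bounded and the scaled quantity should therefore be close to degenerate.

```python
import random
from collections import defaultdict

def simple_randomisation(strata, rng):
    # Each patient gets arm 1 or 2 with probability 1/2, ignoring Z.
    return [rng.choice((1, 2)) for _ in strata]

def stratified_permuted_block(strata, rng, block=(1, 1, 2, 2)):
    # Within each level of Z, assign arms from shuffled blocks of size 4.
    queues = defaultdict(list)
    arms = []
    for z in strata:
        if not queues[z]:
            b = list(block)
            rng.shuffle(b)
            queues[z] = b
        arms.append(queues[z].pop())
    return arms

def scaled_imbalance(strata, arms, level, arm=1, pi=0.5):
    # sqrt(n) * (n_a(l) / n(l) - pi_a) for a single (a, l) pair.
    n = len(strata)
    n_l = sum(1 for z in strata if z == level)
    n_al = sum(1 for z, a in zip(strata, arms) if z == level and a == arm)
    return n ** 0.5 * (n_al / n_l - pi)

def mc_variance(scheme, n=400, reps=500, seed=1):
    # Monte Carlo variance of the scaled imbalance over repeated trials.
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        strata = [rng.randrange(2) for _ in range(n)]  # Z has c = 2 levels
        arms = scheme(strata, rng)
        vals.append(scaled_imbalance(strata, arms, level=0))
    m = sum(vals) / reps
    return sum((v - m) ** 2 for v in vals) / (reps - 1)

var_simple = mc_variance(simple_randomisation)
var_block = mc_variance(stratified_permuted_block)
print(var_simple, var_block)
```

Under these assumptions the variance for simple randomisation should be near $\pi_1(1-\pi_1)/P(Z=z_l)=0.5$, while the value for permuted blocks should be close to zero. The sketch says nothing about Pocock and Simon's minimisation, for which (R1) remains open as discussed above.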

Without (R1), asymptotic validity of inference procedures for most covariate-adaptive randomisation schemes including Pocock and Simon's minimisation can still be established under some conditions, which will be further explained in Section 4.

To construct asymptotically valid inference procedures, sometimes the explicit form of the covariance matrix $D$ in (R1) is required.

I am less excited about balancing continuous covariates with covariate-adaptive randomisation. The reason is that covariate-adaptive randomisation is mainly used to enhance the credibility of the results of the trial (EMA, 2015) in terms of the balancedness of treatments across levels of some common discrete baseline prognostic factors such as institution, disease stage, prior treatment, gender, and age group. In fact, balancedness of marginal levels of these discrete prognostic factors is the main concern of agencies such as the EMA or FDA, and that is why Pocock and Simon's minimisation is popular. I do not see a clear motivation for balancing a continuous baseline covariate at the design stage. If efficiency is the concern, we may simply adjust for this continuous covariate in the inference procedure. It is much easier to construct a valid and efficient inference procedure that adjusts for the covariate than to derive a valid inference procedure after balancing a continuous baseline covariate. So far, valid inference procedures after balancing a continuous baseline covariate are mostly model-based.

2. The discussion by Drs. Wang, Susukida, Mojtabai, Amin-Esmaeili and Rosenblum

My review mostly focuses on differences of sample means (or quantiles) and tests in survival analysis. The discussion by Drs. Wang, Susukida, Mojtabai, Amin-Esmaeili, and Rosenblum and their article (Wang et al., 2020) open up a wide range of robust inference methods to handle nonlinearity, various outcome types, repeated measures, missing outcomes, etc. Wang et al. (2020) also contains three examples of analyses of trial data, illustrating that the gain due to stratified permuted block randomisation and covariate adjustment could be as high as 36%.

In Section 5.3 of my review, empirical distribution estimators and related quantile estimators valid under covariate-adaptive randomisation are considered. Wang et al. (2020) established the asymptotic normality of the Kaplan–Meier estimator under stratified permuted block or biased coin randomisation, which is important for survival analysis. Specifically, they showed that, in a trial with two arms ($k=2$), the process $\{\sqrt{n}(\hat S_n^{(a)}(t)-S^{(a)}(t)),\ t\in[0,\tau]\}$, where $\hat S_n^{(a)}$ is the Kaplan–Meier estimator of the survival function $S^{(a)}$, $a$ is a fixed treatment, and $\tau>0$ is a fixed constant, converges weakly to a mean 0, tight Gaussian process with a covariance function $V^{(a)}$ explicitly given in the Supplementary Material of Wang et al. (2020). They also showed that $V^{(a)}(t,t)=\tilde V^{(a)}(t,t)-\frac{U(t)}{\pi_1\pi_2}$, where $\tilde V^{(a)}$ is the covariance function under simple randomisation and $U(t)>0$ under stratified permuted block or biased coin randomisation. Again, we see the common phenomenon of reducing variance by applying covariate-adaptive randomisation compared with simple randomisation.
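For readers who want the object $\hat S_n^{(a)}$ in concrete form, here is a minimal sketch of the standard product-limit (Kaplan–Meier) computation for the patients in one arm. It is illustrative only: the function name and the toy data are hypothetical, and it does not reproduce the variance formula of Wang et al. (2020).

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: observed times (event or censoring).
    events: 1 if the event was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Events and censored observations both leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical data for the patients assigned to one arm a.
arm_times = [2, 3, 3, 5, 8]
arm_events = [1, 1, 0, 1, 0]
curve_a = kaplan_meier(arm_times, arm_events)
print(curve_a)
```

Applying the function separately to each arm gives the arm-specific curves whose limiting behaviour is described above.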

As commented by Wang et al. (2020), the result can be extended to the estimation of the survival function with adjusted baseline covariates. Alternatively, one may consider a stratified version of the Kaplan–Meier estimator along with the idea in formula (9) of my review.

3. The discussion by Dr. Liu

Dr. Liu's discussion provides useful details and references about asymptotic validity and efficiency of model-assisted inference procedures after covariate-adaptive randomisation and adjustment for covariates. The discussion about the efficiency gain in using ANOVA versus ANCOVA or ANCOVA with treatment-by-covariate interactions started as early as Yang and Tsiatis (2001), was continued later by Freedman (2008), Lin (2013) and Wang et al. (2019), and has been studied under covariate-adaptive randomisation recently by Bugni et al. (2018), Bugni et al. (2019), Liu and Yang (2020), Ma et al. (2020b), Wang et al. (2020) and Ye et al. (in press).

I would like to emphasise two points here. The first one is, as pointed out by Dr. Liu, that when there are only two treatment arms and equal allocation is used ($k=2$ and $\pi_1=\pi_2=1/2$), ANCOVA with and without treatment-by-covariate interaction have the same asymptotic efficiency, and both are guaranteed to be more efficient than ANOVA without adjusting for covariates. However, this phenomenon no longer holds once there are more than two treatment arms, even if equal allocation is applied.
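The first point can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration: simple randomisation with two arms and equal allocation, one continuous covariate $X$, and a hypothetical data-generating model in which the slope of the response on $X$ differs between arms (2 in arm 1, 0.5 in arm 2). It compares $n$ times the sampling variance of the ANOVA estimator, ANCOVA with a pooled slope, and ANCOVA with treatment-by-covariate interaction.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope_parts(x, y):
    # Centred cross-product and sum of squares within one arm.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy, sxx

def three_estimators(x1, y1, x2, y2):
    d_y = mean(y1) - mean(y2)
    d_x = mean(x1) - mean(x2)
    sxy1, sxx1 = slope_parts(x1, y1)
    sxy2, sxx2 = slope_parts(x2, y2)
    anova = d_y
    # ANCOVA without interaction: one pooled within-arm slope.
    b_pool = (sxy1 + sxy2) / (sxx1 + sxx2)
    ancova = d_y - b_pool * d_x
    # ANCOVA with interaction: arm-specific slopes, centred at the overall mean.
    x_bar = mean(x1 + x2)
    b1, b2 = sxy1 / sxx1, sxy2 / sxx2
    inter = (mean(y1) - b1 * (mean(x1) - x_bar)) - (mean(y2) - b2 * (mean(x2) - x_bar))
    return anova, ancova, inter

def simulate(n=200, reps=1000, seed=7):
    rng = random.Random(seed)
    draws = [[], [], []]
    for _ in range(reps):
        x1, y1, x2, y2 = [], [], [], []
        for _ in range(n):
            x = rng.gauss(0, 1)
            e = rng.gauss(0, 1)
            if rng.random() < 0.5:      # arm 1, hypothetical slope 2
                x1.append(x); y1.append(2.0 * x + e)
            else:                       # arm 2, hypothetical slope 0.5
                x2.append(x); y2.append(0.5 * x + e)
        for store, est in zip(draws, three_estimators(x1, y1, x2, y2)):
            store.append(est)
    out = []
    for d in draws:
        m = mean(d)
        out.append(n * sum((v - m) ** 2 for v in d) / (reps - 1))
    return out  # n * variance for ANOVA, ANCOVA, ANCOVA with interaction

v_anova, v_ancova, v_inter = simulate()
print(v_anova, v_ancova, v_inter)
```

Under this particular setup the two ANCOVA variants should show nearly identical scaled variances, both well below that of ANOVA. The sketch covers only $k=2$ with equal allocation and does not address the case of more than two arms mentioned above.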

The second point is that, asymptotically, the most efficient estimator of the treatment difference $\theta=E(Y^{(a)}-Y^{(b)})$ defined in the beginning of Section 5.2 of my review is the $\hat\theta_A$ defined in Section 6.1 of my review, which adjusts for covariates by using a working linear model and the ordinary least squares estimator

$$\hat\beta_a(z)=\Big[\sum_{i\in L_a(z)}\{U_i-\bar U_a(z)\}\{U_i-\bar U_a(z)\}^T\Big]^{-1}\times\sum_{i\in L_a(z)}\{U_i-\bar U_a(z)\}Y_i$$

of covariate effect within each stratum $L_a(z)$ under treatment $a$ and $Z=z$. As Dr. Liu pointed out, however, when there are small strata formed by levels of $Z$, the stratum-specific least squares estimator $\hat\beta_a(z)$ might lead to inferior performance due to over-fitting (Liu & Yang, 2020). One modification is to combine $\hat\beta_a(z)$ and $\hat\beta_b(z)$ within each stratum level $z$, although they may estimate different quantities. Alternatively, utilising the fact that baseline covariates $U_i$'s have the same distribution over all treatment arms, Ye et al. (in press) recommended replacing the matrix inverse in $\hat\beta_a(z)$ by an average over all treatment arms to remedy the issue of small strata, which leads to replacing $\hat\beta_a(z)$ by

$$\tilde\beta_a(z)=\Big[\frac{1}{n(z)}\sum_{a=1}^{k}\sum_{i\in L_a(z)}\{U_i-\bar U_a(z)\}\{U_i-\bar U_a(z)\}^T\Big]^{-1}\times\frac{1}{n_a(z)}\sum_{i\in L_a(z)}\{U_i-\bar U_a(z)\}Y_i,$$

where $n_a(z)$ is the number of units in $L_a(z)$ and $n(z)=n_1(z)+\cdots+n_k(z)$. Note that stability issues related to dimensionality for not very large data sets are mainly in the inverses of estimated covariance matrices. Hence, using the inverse of an average may largely remedy the issue of small strata. Some simulation results in Ye et al. (in press) show that using $\tilde\beta_a(z)$ in the estimation of $\theta$ leads to better finite-sample performance compared with using $\hat\beta_a(z)$ or combining $\hat\beta_a(z)$ and $\hat\beta_b(z)$ when they actually estimate different quantities.
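The difference between the stratum-and-arm-specific slope and its pooled-denominator modification can be seen in a minimal sketch. It assumes a scalar covariate $U$, so the matrix inverse reduces to a division, and uses hypothetical toy data for a single stratum $z$ with two arms; the function names are not from the cited papers.

```python
def centred_sums(u, y):
    # Returns sum of {U_i - U_bar}^2 and sum of {U_i - U_bar} * Y_i within one arm.
    u_bar = sum(u) / len(u)
    suu = sum((ui - u_bar) ** 2 for ui in u)
    suy = sum((ui - u_bar) * yi for ui, yi in zip(u, y))
    return suu, suy

def beta_hat(stratum, arm):
    # Arm-specific least squares slope: uses only the units in L_a(z).
    suu, suy = centred_sums(*stratum[arm])
    return suy / suu

def beta_tilde(stratum, arm):
    # Denominator: within-arm centred sums of squares, pooled over all arms
    # in the stratum and divided by n(z).
    n_z = sum(len(u) for u, _ in stratum.values())
    pooled = sum(centred_sums(u, y)[0] for u, y in stratum.values()) / n_z
    # Numerator: arm-specific centred cross-product divided by n_a(z).
    u, y = stratum[arm]
    return (centred_sums(u, y)[1] / len(u)) / pooled

# Hypothetical stratum: arm -> (covariate values, responses).
stratum = {
    1: ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
    2: ([0.0, 2.0, 0.0, 2.0], [0.0, 1.0, 0.0, 1.0]),
}
for a in (1, 2):
    print(a, beta_hat(stratum, a), beta_tilde(stratum, a))
```

The design point is that the denominator of `beta_tilde` borrows covariate information from every arm in the stratum, which is what stabilises it when an individual arm-by-stratum cell is small.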

Finally, another way to handle many covariates is to apply high-dimensional techniques, as Dr. Liu commented (Ma et al., 2020a), or to use variable selection.

4. The discussion by Drs. Ye and Yi

In their discussion, Drs. Ye and Yi clearly described the working models behind estimators $\hat\theta_S$, $\hat\theta_A$ and $\hat\theta_B$ in Sections 5.2 and 6.1 of my review. This not only provides explanations about the asymptotic relative efficiencies among $\hat\theta_S$, $\hat\theta_A$ and $\hat\theta_B$, but also leads to a general working model (formula (1) in the discussion) that produces a class of model-assisted estimators of $\theta$ (formula (2) in the discussion) including $\hat\theta_S$, $\hat\theta_A$ and $\hat\theta_B$ as special cases.

In the beginning of Section 6.1 of my review, $X$ is considered to be the vector of all available baseline covariates, $Z$ is the discrete baseline covariate vector (part of $X$) used in covariate-adaptive randomisation, and $U$ is the vector of covariates not in $Z$ but in $X$ to be adjusted for efficiency in the analysis stage. I would like to point out that $U$ may contain some components which are interactions between $Z$ and covariates not in $Z$. Drs. Ye and Yi's discussion classifies $(Z,U)$ into two categories or vectors, $W$ and $V$, where $W$ contains covariates having treatment-by-covariate interaction in working model (1) in their discussion and $V$ has no treatment-by-covariate interaction. Note that either $W$ or $V$ could be empty. For example, for ANOVA without using any covariate, both $W$ and $V$ are empty; for classical ANCOVA without considering any treatment-by-covariate interaction, $W$ is empty but $V$ is not; as discussed by Drs. Ye and Yi, $\hat\theta_S$ in Section 5.2 of my review corresponds to $W=Z$ and empty $V$, $\hat\theta_A$ in Section 6.1 corresponds to $W=(Z,U)$ and empty $V$, and $\hat\theta_B$ in Section 6.1 corresponds to $W=Z$ and $V=U$.

In applications, a crucial question is: what is the minimum set of covariates to be included in $W$ or $V$ to ensure that the resulting model-assisted estimator of $\theta$ is asymptotically normal with mean $\theta$ and variance invariant to the covariate-adaptive randomisation schemes (including Pocock and Simon's minimisation)? As pointed out by Drs. Ye and Yi, a simple answer is that $W$ should contain the dummy variables for all joint levels of $Z$, and there is no requirement on $V$. In fact, $V$ is used to keep the dimension of $W$ from becoming too high. Asymptotically, the estimator with $V$ being empty is most efficient, unless some components of $W$ are actually not related to the response. We must balance between adjusting for covariates and over-fitting, for which variable selection may be a useful solution.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Jun Shao

Dr. Jun Shao holds a PhD in statistics from the University of Wisconsin-Madison. He is a Professor of Statistics at the University of Wisconsin-Madison. His research interests include variable selection and inference with high dimensional data, sample surveys, and missing data problems.
