Research Article

Adaptive two-stage seamless sequential design for clinical trials

Received 16 Apr 2023, Accepted 09 Apr 2024, Published online: 05 May 2024

ABSTRACT

We propose an adaptive sequential testing procedure for the selection and testing of multiple treatment options, such as doses/regimens, different drugs, sub-populations, endpoints, or a mixture of them, in a seamlessly combined phase II/III trial. The selection is made at the end of the phase 2 stage. Unlike much of the published literature, the selection rule is not required to be "select the best" and does not need to be pre-specified, which provides flexibility and allows the trial investigators to use any efficacy and safety information or criteria, or a surrogate or intermediate endpoint, to make the selection. Sample size and power calculations are provided, and these calculations have been confirmed to be accurate by simulations. An interim analysis can be performed after the selection, and the sample size can be modified if the observed efficacy deviates from the assumption. Inference after the trial, including the p-value, a median unbiased point estimate, and confidence intervals, is provided. By applying a dominance theorem, the procedure can be applied to normal, binary, Poisson, negative binomial, and time-to-event endpoints, and to a mixture of these distributions (in trials involving endpoint selection).

1. Introduction

Drug development is a lengthy and expensive process. It usually involves phase 2 and phase 3 trials. A phase 2 trial is generally exploratory in nature, in which a number of treatment options (such as treatment [e.g., dose/regimen], patient population, or endpoints) can be investigated. The option with the most potential to be successful is then selected to proceed to a phase 3 trial. Conducting a clinical trial requires a sequence of activities, including protocol development, seeking regulatory and Institutional Review Board (IRB) approvals, budgeting, budget negotiation with contract service providers (commonly referred to as vendors), site negotiation and initiation, and enrollment ramp-up (due to the lengthy process of site initiation, enrollment is typically very slow at the beginning of a trial). Each of these activities can be time and resource consuming. In addition, when phase 2 and 3 trials are conducted separately, there is usually a time gap of at least several months between the completion of the phase 2 trial and the planning and start of the phase 3 trial. Combining phase 2 and phase 3 into one trial reduces two sequences of activities to one and eliminates the time gap between the trials, so the drug development process can be shortened and made more efficient. Trial designs that combine phase 2 and phase 3 trials are commonly referred to as seamless phase II/III designs.

In a seamless design, at the end of the phase 2 stage, an experimental treatment option must be chosen to proceed to the phase 3 stage, to be compared with the control. The main statistical challenges in such trials include: i) Family-wise error (FWE) control due to the selection procedure at the end of the phase 2 stage. This selection is a multiple comparison procedure in nature, and a proper procedure should be included to control the FWE at the intended level. An optimal control is one under which the FWE never exceeds the nominal error level but can equal the nominal level under certain conditions. A conservative control is one under which the FWE never reaches the nominal level in any situation; the Bonferroni method is an example. Conservative control leads to reduced power, so methods that achieve optimal or less conservative control are preferable. ii) Because the trial includes phase 2 and phase 3 stages, it is desirable to combine the data from the two stages and analyze them efficiently. How to combine the data has been a main challenge, since the data from phase 2 are not normally distributed because of the selection. iii) Sample size and power calculations (which, as always, involve assumptions on the effect size) are important to trial planning. The multiplicity issue and complicated methods for combining phase 2 and 3 data can complicate these calculations. iv) The assumptions for the sample size calculation could be inaccurate (due to uncertainty about effect sizes). Thus, adaptive measures, such as sample size re-estimation (SSR), may be desirable to achieve adequate power; to obtain a proper new sample size, a method for calculating conditional power is necessary. v) Inference must be made after the completion of the trial; hence, p-values, estimates of the effect sizes, and confidence intervals are desirable. The multiplicity issue from phase 2 also creates challenges for finding unbiased estimates or exact inference. vi) When adaptations are made to the trial, inference becomes more complicated, but its relevance and importance are not reduced. vii) The usual group sequential design (GSD) testing, comparing one new treatment arm and a control, relies critically on the recursive algorithm of Armitage et al. (Citation1969); computations with multiple comparison adjustment and testing in a sequential procedure are more complicated than in a usual GSD. In this article, we propose a procedure that addresses all of the above-mentioned statistical issues.

2. Background and features of proposed method

2.1. Background

Stallard and Todd (Citation2010) summarized the existing methods into three categories: the group-sequential approach of Stallard and Todd (Citation2003), the combination test approach of Bauer and Kieser (Citation1999) (also Bretz et al. Citation2006), and the adaptive Dunnett method of Koenig et al. (Citation2008). Later, several other methods were published that use partitioning (e.g., Sugitani et al. Citation2013) or a graph-based testing procedure (Glimm et al. Citation2018; Klinglmueller et al. Citation2014). Most methods require that the treatment option with the best observed outcome be selected (i.e., play-the-winner) at the end of the phase 2 stage. No methods for sample size and power calculation have been published.

The published methods use various means to control the FWE, such as the Bonferroni method or the closed testing procedure (Marcus, Peritz, and Gabriel Citation1976) (e.g., Stallard and Todd Citation2003; Bauer and Kieser Citation1999; Bretz et al. Citation2006; Glimm et al. Citation2018), the Dunnett testing procedure (Koenig et al. Citation2008), the partition method (Sugitani, Hamasaki, and Hamada Citation2013), or a graph-based testing procedure (Klinglmueller et al. Citation2014).

More recently, Chen et al. (Citation2018) proposed a 2-in-1 adaptive phase 2/3 design for oncology trials. The focus of that design is to allow a decision to expand an ongoing phase 2 trial into a phase 3 trial, with fixed sample sizes for both trials. The phase 2 trial in this design does not involve multiple comparisons, so the design has no multiplicity issue due to selection; it also does not involve sample size re-estimation for the phase 3 trial. This design was later expanded in two directions: i) including dose selection in phase 2 (e.g., Jin and Zhang Citation2022; Zhang et al. Citation2022, Citation2023) without sample size re-estimation for phase 3, where the type I error (FWE) control due to selection is achieved either by partitioning the α (Zhang et al. Citation2022, Citation2023) or by a graphical method (Jin and Zhang Citation2022); and ii) including sample size re-estimation for phase 3 without dose selection in phase 2 (Li et al. Citation2022), where the type I error control with sample size re-estimation is achieved by the selection of a $C_{min}$.

2.2. Features of proposed method and notations

The majority of the methods in the literature are based on the Wald statistics. Our method is based on the score function (e.g., Jennison and Turnbull Citation1997). Unlike other methods that use the score function (e.g., Stallard and Todd Citation2003), in our method the score function from phase 2 is directly added to the score function from phase 3 to form a combined score function. The combined score function is not normally distributed (because the score function from phase 2 is not normally distributed, due to the selection process), but it is a one-dimensional Markov process, and its mathematical properties can be precisely described.

Like Stallard and Todd (Citation2003), we discuss a seamless design in which only one treatment option will be selected to continue into the phase 3 stage to compare with the control. A more general case in which several treatment options may continue into the phase 3 stage (e.g., Stallard and Friede Citation2008) will be discussed elsewhere in the context of adaptive sequential designs for multiple comparisons.

The control of the FWE is through a dominance theorem (section 3.3) on the Markov process. By utilizing the dominance theorem, our method does not require play-the-winner; hence, the selection does not need to depend solely on the primary outcome, and all information on efficacy and safety can be used for the selection. The selection rule does not need to be pre-specified: any emerging factors that were not foreseen at trial planning can be included in the selection. With such flexibility, it is possible to select an option with the best combination of efficacy (not necessarily the best among all options) and safety profile. The selection can also be based on surrogate or intermediate endpoints. FWE control is conservative if the selection does not follow play-the-winner, but is precise if play-the-winner is followed and the endpoint is normally distributed. In this sense, the FWE control is optimal.

Our method includes a conservative point estimate if the null hypothesis is rejected at the end of phase 2. If the trial enters the phase 3 stage, then a median unbiased point estimate and an exact upper confidence limit are available. We also include a conservative lower confidence limit that is consistent with the hypothesis test, in the sense that the lower limit excludes zero if and only if the null hypothesis is rejected. In addition, our method includes sample size and power calculations that match simulations exactly, confirming the accuracy of these calculations. Finally, our method includes sample size re-estimation in the phase 3 stage to increase power if the observed effect size at the interim analysis is smaller than assumed.

2.3. Comparison with methods in the literature

Our method is based on the Markov process properties of the combined score function. No method in the literature utilizes Markov process properties. However, the presentation in this article does not involve Markov process theory, and readers are not required to be familiar with it. The dominance theorem (section 3.3.1), together with Slepian's lemma (Huffer Citation1986; Slepian Citation1962), is fundamental to our approach. Its application assumes that the correlation coefficients between the score function statistics at the end of the phase 2 stage are non-negative (see section 3.3.2 for a discussion of this assumption). This assumption is usually satisfied in practice. For example, in trials involving selection among different doses of the same drug, the endpoint is the same for all doses, and the dose effects are expected to trend in the same direction; hence the correlations are expected to be non-negative. In trials involving comparisons in different sub-populations, the endpoint is the same for all sub-populations, and the effects are expected to trend in the same direction, with non-negative correlations. Multiple efficacy endpoints are usually assessed in any clinical trial. Suppose that all endpoints are standardized such that a positive mean value indicates better efficacy. In most cases, it is unlikely (although not entirely impossible) that an experimental drug will have positive efficacy on one endpoint but negative efficacy on another; hence the assumption of non-negative correlation is reasonable in trials involving endpoint selection as well. However, if there is a possibility that some of the endpoints are negatively correlated, our method may not be appropriate.

A major difference between our method and those in the literature is how the data from the phase 2 and 3 stages are combined. All published methods (e.g., see Stallard and Todd Citation2010) apply a multiplicity adjustment procedure, such as the Bonferroni method, the closed testing procedure, Dunnett's procedure, or another multiple testing procedure (e.g., Bauer et al. Citation2015; Klinglmueller et al. Citation2014; Sugitani et al. Citation2013), to the phase 2 data and obtain a p-value. Then the p-value (usually the smallest) is converted into a normally distributed variable through the inverse normal transformation. Using the smallest p-value corresponds to the play-the-winner rule. This normalization conversion has been a necessary step in all published methods: with this conversion, data from both phases 2 and 3 are normally distributed and can be added together. However, the precise description of the phase 2 data is lost in the inverse normal conversion, and it is not possible to use the combined data for parameter estimation (either point estimates or confidence limits). This conversion is also conservative, since essentially all multiplicity adjustment methods control the FWE conservatively. Our method preserves the precise description of the phase 2 data, since the score function from the phase 2 data is directly added (without any conservative data conversion) to the score function from the phase 3 data to form a Markov process. FWE control with our method is optimal (i.e., it is conservative in general, but precise under the play-the-winner rule when the endpoint is normally distributed). The combined data (as a Markov process) can be used for estimating the lower confidence limit of the efficacy of the selected option. Under the play-the-winner rule, all calculations of sample size, power, critical boundaries, and selection probabilities are accurate for normally distributed endpoints and can be confirmed by simulations (e.g., the simulations reported in section 4 confirmed the calculated results). Further, group sequential designs and adaptive sequential designs can also be described using the score function, based on the Markov process properties. With these shared Markov process properties, we incorporate results from Gao et al. (Citation2008) to propose sample size re-estimation with exact type I error control, and results from Gao et al. (Citation2013) to propose an optimally exact point estimate and confidence interval (i.e., the point estimate is median unbiased if the trial proceeds into phase 3 and conservative if the trial stops at the end of phase 2; we derive an exact upper confidence limit as well as a lower confidence limit that is consistent with the hypothesis test).

The dominance theorem provides rigorous theoretical justification for a flexible selection rule. In many scenarios (e.g., when the selection is based on the primary endpoint and there is no safety concern), it is most logical to select the option with the best efficacy, i.e., play the winner. However, there may be situations in which the winner is not the most desirable option. For example, in a trial with dose selection, the dose with the smallest p-value may be the highest dose, which may be associated with higher rates of adverse events; a lower dose with an acceptable effect size and lower rates of adverse events (a better safety profile) may be more desirable. Another example is a trial in which a "surrogate or intermediate endpoint" (FDA Citation2019) is used for the selection. Due to randomness, the treatment option with the best observed surrogate or intermediate endpoint may not be the option with the best observed primary outcome endpoint, which precludes the use of the play-the-winner rule. Hence, a method that does not require play-the-winner provides useful flexibility.

By using the dominance theorem, the seamless sequential design becomes completely parallel to the traditional group sequential/adaptive sequential design comparing a test treatment with a control: critical boundaries, sample size and power calculation, and interim analyses are handled in exactly the same way as in Gao, Ware, and Mehta (Citation2008), and the final analysis, with point estimate and two-sided confidence interval, is obtained similarly to Gao, Liu, and Mehta (Citation2013). If the play-the-winner strategy is used and the endpoint is normally distributed, then the critical boundaries control the type I error (FWE) exactly (in the strong sense), and the sample size and power calculations are exact. Otherwise, the FWE control is slightly conservative, per the dominance theorem and Slepian's lemma (Huffer Citation1986; Slepian Citation1962).

We note that the correlation coefficients of a multivariate normal distribution are critical in both Chen et al. (Citation2018) and our approach: Chen et al. (Citation2018) used the correlation coefficient of a bivariate normal distribution to control the type I error for expanding a phase 2 trial into a phase 3 trial, and Slepian's lemma (for the multivariate normal distribution) is critical to our dominance theorem.

Due to the application of the dominance theorem and Slepian's lemma (Huffer Citation1986; Slepian Citation1962), there is an important difference between Stallard and Todd (Citation2003) and our method: we make no assumption on the covariance matrix, while Stallard and Todd (Citation2003) assume that the off-diagonal coefficients of the covariance matrix are 1/2, which holds only when the endpoint is normally distributed. Thus, our procedure can be applied to more general cases, such as non-normally distributed endpoints (e.g., survival analysis) and different therapies for which the covariance coefficients may be unknown.

The proposed method is broad in the sense that it can be applied to any of the designs that have been discussed in the literature, including designs for oncology trials such as those of Chen et al. (Citation2018), Jin and Zhang (Citation2022), Zhang et al. (Citation2022, Citation2023), and Li et al. (Citation2022).

We refer to our method as the two-stage seamless sequential design/adaptive two-stage seamless sequential design (TSSSD/ATSSSD). A seamless phase II/III design is an adaptive design. The FDA guidance on adaptive designs (2019) summarized the major potential advantages of adaptive designs: a) statistical efficiency; b) ethical considerations; c) improved understanding of drug effects; d) acceptability to stakeholders. In the same guidance, the FDA (2019) recommended that three statistical principles be satisfied: a) controlling the chance of erroneous conclusions (type I error control); b) estimating treatment effects (valid inference, including point estimate, two-sided confidence interval, and p-value); c) trial planning (including sample size/power calculation). (The guidance also included a fourth, operational principle: d) maintaining trial conduct and integrity.) Our method, the TSSSD/ATSSSD, satisfies all three statistical principles in the FDA guidance (2019). The TSSSD/ATSSSD is supported by the DACT (Design and Analysis for Clinical Trials) software at https://www.innovatiostat.com/software.html (the software is free for academic researchers and research institutions). Computation codes are available upon request.

3. Method

3.1. Score function and GSD

To demonstrate the similarities between the GSD and the TSSSD, we first describe the GSD. In a GSD, one experimental treatment is compared with a control. Let θ be the parameter of interest, with larger θ indicating better efficacy. Let the null hypothesis be $H_0: \theta = 0$ and the alternative hypothesis $H_a: \theta > 0$. The GSD (O'Brien and Fleming Citation1979; Pocock Citation1977) is based on the Wald statistics. Interim analyses are planned at information time points $t_1 < \cdots < t_K = T$. Critical boundaries $c_1, \ldots, c_K$ are chosen such that $H_0$ is rejected in favor of $H_a$ if the event $\cup_{i=1}^{K}\{Z_i \ge c_i\}$ is observed, where $Z_i = Z(t_i)$ is the Wald statistic at $t_i$. Type I error is controlled by requiring $P(\cup_{i=1}^{K}\{Z_i \ge c_i\}) \le \alpha/2$. Let $\hat\theta$ be an unbiased estimate and $s.e.(\hat\theta)$ its standard error. The estimated Fisher information time is $t = s.e.(\hat\theta)^{-2}$. Then $Z(t) = \hat\theta\sqrt{t}$ is the Wald statistic, and the score function is $S(t) = \hat\theta t$. Approximately, $S(t) \approx W(t) = B(t) + \theta t \sim N(\theta t, t)$ (Jennison and Turnbull Citation1997). Let $e_i = c_i\sqrt{t_i}$, and call the $e_i$'s the "exit" boundaries. The rules of a GSD can be equivalently stated as $P(\cup_{i=1}^{K}\{S(t_i) \ge e_i\}) \le \alpha/2$.
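As a small numerical illustration (our sketch, with assumed information fractions; this is not the DACT implementation), the joint normality of the Wald statistics, with $cov(Z_i, Z_j) = \sqrt{t_i/t_j}$ for $t_i \le t_j$, allows the boundary-crossing probability to be computed directly:

```python
# A minimal sketch of GSD boundary computation: find the constant exit
# boundary e (O'Brien-Fleming type) such that P(union_i {S(t_i) >= e}) = 0.025.
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import brentq

def crossing_prob(e, info_times):
    """P(union_i {Z_i >= e / sqrt(t_i)}) under H0 via the joint normal CDF."""
    t = np.asarray(info_times, dtype=float)
    K = len(t)
    # cov(Z_i, Z_j) = sqrt(min(t_i, t_j) / max(t_i, t_j))
    cov = np.array([[np.sqrt(min(t[i], t[j]) / max(t[i], t[j]))
                     for j in range(K)] for i in range(K)])
    c = e / np.sqrt(t)  # critical boundaries on the Wald scale
    return 1.0 - multivariate_normal(mean=np.zeros(K), cov=cov).cdf(c)

info_times = [0.25, 0.5, 0.75, 1.0]   # assumed information fractions
e = brentq(lambda x: crossing_prob(x, info_times) - 0.025, 1.0, 5.0)
print("exit boundary e =", round(e, 4))
print("critical boundaries c_i =", np.round(e / np.sqrt(info_times), 4))
```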

3.2. The TSSSD

3.2.1. Data description

The score function in a trial that compares one experimental treatment with a control arm is approximately a Brownian motion (Jennison and Turnbull Citation1997). In a seamless design that compares several treatment options with a control, there are several score functions in the phase 2 stage, each corresponding to the comparison of one experimental treatment option with the control. Take, for example, a trial that tests multiple doses in the phase 2 stage; other situations can be dealt with similarly. Suppose that there are M dose groups and a control arm in the phase 2 stage. Let $\theta_m$ (m = 1, …, M) be the efficacy of each dose being considered, with larger $\theta_m$ indicating better efficacy, and let $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_M)$. Let the related hypotheses for each dose comparison be the null hypothesis $H_{0,m}: \theta_m = 0$ and the one-sided alternative $H_{a,m}: \theta_m > 0$. At the end of phase 2, one dose (say the n-th dose) is selected to proceed to phase 3. The overall null hypothesis of the trial is the null hypothesis for the selected dose, $H_{0,n}: \theta_n = 0$, with one-sided alternative $H_{a,n}: \theta_n > 0$. Each of the M parameters $\theta_1, \ldots, \theta_M$ is associated with an estimate $\hat\theta_m^{(0)}$ and standard error $s.e.(\hat\theta_m^{(0)})$ from the phase 2 data, $m = 1, \ldots, M$. There are M Wald statistics $Z_m^{(0)} = \hat\theta_m^{(0)}/s.e.(\hat\theta_m^{(0)})$ and M score functions $S_m^{(0)} = \hat\theta_m^{(0)}/s.e.(\hat\theta_m^{(0)})^2$. The superscript (0) indicates that a statistic uses only data from the phase 2 stage; statistics using only data from the phase 3 stage are denoted with the superscript (1), and statistics using both phase 2 and 3 data with the superscript (0,1). Let $\hat t_m^{(0)} = s.e.(\hat\theta_m^{(0)})^{-2}$ be the estimator of $t_m^{(0)}$, the information time for the m-th comparison. Suppose that the n-th option is selected. Let $S_n^{(0,1)} = S_n^{(0)} + S_n^{(1)}$ be the combined score function of the selected treatment option vs. the control, where $S_n^{(0)}$ is from the phase 2 stage and $S_n^{(1)}$ is from the phase 3 stage. $S_n^{(1)}$ is independent of all phase 2 data and is approximately a Brownian motion. $S_n^{(0)}$ is normally distributed in isolation, but not as the one chosen from $S_1^{(0)}, \ldots, S_M^{(0)}$, due to the selection procedure (which is a multiple comparison procedure). However, $S_n^{(0,1)}$, as the sum of a random variable $S_n^{(0)}$ and a Brownian motion $S_n^{(1)}$, is a Markov process. We note that a Brownian motion is a special Markov process, but a Markov process is not in general a Brownian motion.

A fundamental property of Brownian motion is that it has independent increments. The increments of $S_n^{(0,1)}$ are the same as those of $S_n^{(1)}$ (which is approximately a Brownian motion) and thus are independent. This property is essential to our method. Because our method is based on the score function and its Brownian motion properties, we refer to it as the Brownian motion approximation (BMA) approach. The BMA approach is motivated by thinking from the theory of Markov processes, but that thinking has been re-expressed entirely in terms of multivariate normal distributions and uses only calculus and four basic (widely known) properties of Brownian motion: i) it is additive: for independent Brownian motions $B$ and $\tilde B$, $B(t) + \tilde B(u) \overset{d}{=} B(t+u)$; ii) it has independent increments; iii) $cov(B(t_i), B(t_j)) = \min(t_i, t_j)$; iv) for $0 < t_1 < \cdots < t_K$, $(B(t_1), \ldots, B(t_K))$ has a joint multivariate normal distribution (e.g., Jennison and Turnbull Citation2000). No prior knowledge of Markov processes is required to follow our method.

3.2.2. Properties of the combined score function

Let the score function $S_m^{(0)} = S_m^{(0)}(\hat\theta_m^{(0)}) \approx W_m^{(0)}(t_m^{(0)}) = B_m^{(0)}(t_m^{(0)}) + \theta_m t_m^{(0)} \sim N(\theta_m t_m^{(0)}, t_m^{(0)})$. Let $W^{(0)} = (W_1^{(0)}(t_1^{(0)}), \ldots, W_M^{(0)}(t_M^{(0)}))$ be the vector of score functions at the end of phase 2, $t^{(0)} = (t_1^{(0)}, \ldots, t_M^{(0)})$, and $t^{(0)}\boldsymbol{\theta} = (\theta_1 t_1^{(0)}, \ldots, \theta_M t_M^{(0)})$. The joint density function of $W^{(0)}$ is
$$u(\boldsymbol{\theta}, x, \Sigma, t^{(0)}) = (2\pi)^{-M/2}\det(\Sigma)^{-1/2}\exp\left(-\frac{(x - t^{(0)}\boldsymbol{\theta})'\,\Sigma^{-1}(x - t^{(0)}\boldsymbol{\theta})}{2}\right),$$
where $\Sigma = \Sigma(t^{(0)}) = (\sigma_{ij})$, $i, j = 1, \ldots, M$, with $\sigma_{ij} = cov(W_i^{(0)}(t_i^{(0)}), W_j^{(0)}(t_j^{(0)}))$. The derivation of the covariance matrix for dose selection and sub-population enrichment has been discussed in Gao et al. (Citation2014). For simplicity of notation, we assume $t_1^{(0)} = \cdots = t_M^{(0)} = t^{(0)}$; unequal $t_m^{(0)}$'s can be dealt with similarly. At the end of phase 2, an option (with corresponding $W_n^{(0)}(t_n^{(0)})$, a component of $W^{(0)}$) is selected to proceed to phase 3. As a result of the selection, the data from phase 2 is $X_0 = W_n^{(0)}(t_n^{(0)})$. The index n depends on the selection and is hence a random number; $X_0$ does not have the same distribution as $W_m^{(0)}(t_m^{(0)})$ for any fixed $m = 1, \ldots, M$, and $X_0$ is not normally distributed. For example, under the play-the-winner rule, $X_0 = \max(W_1^{(0)}(t_1^{(0)}), \ldots, W_M^{(0)}(t_M^{(0)}))$, n is such that $W_n^{(0)}(t_n^{(0)}) = X_0$, and the distribution of $X_0$ is
$$P(X_0 \le x) = P\left(\cap_{m=1}^{M}\{W_m^{(0)}(t_m^{(0)}) \le x\}\right) = \int_{-\infty}^{x}\cdots\int_{-\infty}^{x} u(\boldsymbol{\theta}, x, \Sigma, t^{(0)})\,dx_1\cdots dx_M.$$
The distribution can be precisely calculated if $\Sigma$ is known (e.g., if the phase 2 stage involves dose selection with a common control, then in standardized form the off-diagonal elements equal 0.5 and the diagonal elements equal 1). The cases in which $\Sigma$ is unknown are discussed in section 3.3. On the other hand, the score function using only the phase 3 data for the selected option (say the n-th, denoted $W_n(t^{(1)}) = B(t^{(1)}) + \theta_n t^{(1)}$) is still approximately a Brownian motion. The cumulative data from phases 2 and 3 is represented as $X(t^{(1)}) = X_0 + W_n(t^{(1)}) = W_n(t_n^{(0)}) + W_n(t^{(1)}) = W_n(t_n^{(0)} + t^{(1)})$. Accordingly, $X(t) = X_0 + W(t) = X_0 + B(t) + \theta t$ (where $W(t) = W_n(t^{(1)})$ if the n-th option is selected) is the sum of a random variable $X_0$ and a Brownian motion with a (random) drift $\theta_n$. To simplify notation, we drop the subscript of the selected option.
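For instance (a minimal sketch with assumed values, not the authors' code), when Σ is known the play-the-winner distribution $P(X_0 \le x)$ is a multivariate normal rectangle probability and can be evaluated directly:

```python
# Distribution of X_0 = max_m W_m^(0) under play-the-winner with known Sigma.
import numpy as np
from scipy.stats import multivariate_normal

M, t0 = 3, 1.0                      # three doses; phase 2 information time (assumed)
theta = np.array([0.0, 0.2, 0.3])   # assumed efficacy configuration
mean = theta * t0                   # E[W_m^(0)] = theta_m * t^(0)
# Common-control dose selection: variance t^(0), correlation 1/2 off-diagonal.
cov = t0 * (0.5 * np.ones((M, M)) + 0.5 * np.eye(M))

def cdf_X0(x):
    """P(X_0 <= x) = P(all W_m^(0) <= x)."""
    return multivariate_normal(mean=mean, cov=cov).cdf(np.full(M, x))

print(cdf_X0(1.96))  # probability that no phase 2 score statistic exceeds 1.96
```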

3.2.3. Hypothesis test and type I error control

In a TSSSD, interim analyses may be performed on $X(t)$ at information time points $0 < t^{(0)} < t_1^{(0,1)} < \cdots < t_K^{(0,1)}$, where $t_i^{(0,1)} = t^{(0)} + t_i^{(1)}$ and $t_i^{(1)}$ is the information time in the phase 3 stage. Let $r_0 = t^{(0)}/t_K^{(0,1)}$, $r_i^{(0,1)} = t_i^{(0,1)}/t_K^{(0,1)}$, $r_i^{(1)} = t_i^{(1)}/t_K^{(0,1)}$, and $s_i^{(1)} = t_i^{(1)}/t_K^{(1)}$. The null hypothesis $H_0$ is rejected if the event $\{X_0 \ge e_0^{(0)}\} \cup (\cup_{i=1}^{K}\{X(t_i^{(0,1)}) \ge e_i^{(0,1)}\})$ is observed. In a TSSSD, sample size and power calculations, and inferences such as the p-value, confidence interval, and estimates, all require the calculation of probabilities of the form $P(\{X_0 \ge e_0^{(0)}\} \cup (\cup_{i=1}^{I-1}\{X(t_i^{(0,1)}) \ge e_i^{(0,1)}\}) \cup \{X(t_I^{(0,1)}) \ge x_I^{(0,1)}\})$. The details of the calculation are provided in the online supporting material. The exit boundaries $e_0^{(0)}, e_1^{(0,1)}, \ldots, e_K^{(0,1)}$, or the critical boundaries $c_0^{(0)} = e_0^{(0)}/\sqrt{t^{(0)}}$ and $c_i^{(0,1)} = e_i^{(0,1)}/\sqrt{t_i^{(0,1)}}$, i = 1, …, K, can be chosen to satisfy $P(\{X_0 \ge e_0^{(0)}\} \cup (\cup_{i=1}^{K}\{X(t_i^{(0,1)}) \ge e_i^{(0,1)}\}) \mid \boldsymbol{\theta} = 0) \le \alpha/2$. Figure 1 provides a flowchart of the design, and Figure 2 illustrates the critical boundaries.

Figure 1. Trial design flowchart.

Figure 2. Cumulative data X(t) (using the play-the-winner rule) and critical boundaries.

In Figure 1, $S_m^{(0)}$ is the score function for option m, and $X_0 = S_n^{(0)} = \max(S_m^{(0)}, m = 1, \ldots, M)$ (i.e., $X_0$ is obtained by the play-the-winner rule).

3.2.4. O’Brien-Fleming boundary

The O'Brien-Fleming boundaries satisfy $e_0^{(0)} = \cdots = e_K^{(0,1)}$, which is equivalent to $c_0^{(0)}\sqrt{r_0} = c_1^{(0,1)}\sqrt{r_1^{(0,1)}} = \cdots = c_K^{(0,1)}\sqrt{r_K^{(0,1)}}$. Let $f(e) = P(\{X_0 \ge e\} \cup (\cup_{i=1}^{K}\{X(t_i^{(0,1)}) \ge e\}))$. Then $f(e)$ is a decreasing function of e. Let $e_{\alpha/2} = f^{-1}(\alpha/2)$. Then $c_0 = e_{\alpha/2}/\sqrt{r_0}$ and $c_i = e_{\alpha/2}/\sqrt{r_i^{(0,1)}}$, i = 1, …, K, are the desired O'Brien-Fleming boundaries (O'Brien and Fleming Citation1979). The details of the calculation are provided in the online supporting material.
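A Monte Carlo sketch of this boundary search (assumed design values; DACT evaluates the probabilities by numerical integration instead):

```python
# Find the constant exit boundary e with f(e) = alpha/2 under theta = 0,
# simulating X_0 (play-the-winner) plus independent phase 3 increments.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
M, t0 = 2, 50.0                       # two doses vs a common control (assumed)
t_phase3 = np.array([50.0, 100.0])    # phase 3 information times (assumed)

cov0 = t0 * (0.5 * np.ones((M, M)) + 0.5 * np.eye(M))
n_sim = 200_000
W0 = rng.multivariate_normal(np.zeros(M), cov0, size=n_sim)
X0 = W0.max(axis=1)                   # play-the-winner selection
increments = rng.normal(0.0, np.sqrt(np.diff(np.concatenate(([0.0], t_phase3)))),
                        size=(n_sim, len(t_phase3)))
X = X0[:, None] + np.cumsum(increments, axis=1)

def f(e):                             # Monte Carlo estimate of the exit probability
    return np.mean((X0 >= e) | (X >= e).any(axis=1))

e = brentq(lambda x: f(x) - 0.025, 5.0, 60.0)
print("constant exit boundary e =", round(e, 2))
```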

3.2.5. α-spending functions

Let $A_0 = \{X_0 \ge e_0^{(0)}\}$ and $A_i = \{X_0 < e_0^{(0)}\} \cap (\cap_{j=1}^{i-1}\{X(t_j^{(0,1)}) < e_j^{(0,1)}\}) \cap \{X(t_i^{(0,1)}) \ge e_i^{(0,1)}\}$. $A_i$ is the event that the critical boundaries were not crossed at any interim analysis before $t_i^{(0,1)}$ and were crossed at $t_i^{(0,1)}$. Note that the events $A_i$ are mutually exclusive, i.e., $A_i \cap A_j = \emptyset$ for $i \ne j$. Hence $P(\cup_{i=0}^{K} A_i) = \sum_{i=0}^{K} P(A_i)$. Let $P(A_i) = \alpha_i$. Then $\alpha_i$ is the type I error "spent" at $t_i^{(0,1)}$. The only requirement on the $\alpha_i$'s is that $\sum_{i=0}^{K} \alpha_i = \alpha/2$. The $c_i$'s can be successively chosen to satisfy $P(A_i) = \alpha_i$. The details of the calculation are provided in the online supporting material.

3.2.6. Power calculation and sample size determination

Suppose that the information fractions $r_i^{(0,1)}$ and the critical boundaries $c_0^{(0)}, c_1^{(0,1)}, \ldots, c_K^{(0,1)}$ have been determined. Let the Fisher information time at the final analysis be denoted $T = t_K^{(0,1)}$. Under the alternative hypothesis, $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_M) \ne 0$. The power is calculated as $P(\boldsymbol{\theta}, T) = P(\{X_0 \ge e_0^{(0)}\} \cup (\cup_{i=1}^{K}\{X(t_i^{(0,1)}) \ge e_i^{(0,1)}\}) \mid \boldsymbol{\theta})$ (the details for calculating $P(\boldsymbol{\theta}, T)$ are in the online supporting material). The power of the trial depends on the selection rule and the configuration $\boldsymbol{\theta}$. The desired power can be obtained by choosing an information time T large enough that $P(\boldsymbol{\theta}, T) \ge 1 - \beta$. The sample size $n_K$ can then be determined from the relationship $n_K \approx aT$ for some distribution-specific constant a (e.g., Whitehead Citation1997).
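As a concrete illustration of the constant a (our worked example; the paper cites Whitehead Citation1997 for the general relationship): for a two-arm comparison of normal means with known common variance $\sigma^2$ and n patients per arm,
$$s.e.(\hat\theta)^2 = \frac{2\sigma^2}{n}, \qquad t = s.e.(\hat\theta)^{-2} = \frac{n}{2\sigma^2}, \qquad \text{so} \qquad n_K = 2\sigma^2\,T \text{ per arm},$$
i.e., $a = 2\sigma^2$ on a per-arm basis; other distributions yield analogous constants through their asymptotic variances.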

3.3. General situations

The calculations of type I error, exit/critical boundaries, and sample size/power discussed above assume that the covariance matrix is known, which is possible only in the special situation of dose selection with a common control and a normally distributed variable, and that the selection rule is strictly play-the-winner. We refer to all other situations as general situations, such as treatment selection with a non-normally distributed variable (e.g., time-to-event), population selection, endpoint selection, or mixed selections (FDA guidance on adaptive designs, 2019). Further, the selection rule may not be play-the-winner, as when a "surrogate or intermediate endpoint" (FDA 2019) is used for the selection (e.g., when the primary endpoint requires long-term follow-up, biomarkers and other surrogate endpoints may be used; such situations are common in rare disease trials), or when the safety profile of the option with the best efficacy is not satisfactory and the option with the next best efficacy needs to be selected. The FDA guidance (2019) noted that "It may be particularly difficult to estimate Type I error probability and other operating characteristics for designs that incorporate multiple adaptive features." The guidance hypothesized that, for the general situation, "it can be argued that assuming independence among multiple endpoints will provide an upper bound on the Type I error probability." We provide a mathematically rigorous proof of this hypothesis using the dominance theorem and Slepian's lemma (Huffer Citation1986; Slepian Citation1962). We provide an upper bound of the type I error that does not exceed the desired level, a conservative lower confidence limit, and an upper bound of the p-value. These conservative statistics satisfy the three statistical principles in the FDA guidance (2019). The calculations are confirmed through simulations shown in later sections.

3.3.1. The “Dominance” Theorem and Slepian’s lemma

As discussed in section 3.2.1, the combined score function has the form $X(t) = X_0 + W(t) = X_0 + B(t) + \theta t$. The dominance theorem establishes that the probabilities related to $X(t)$ are largely determined by the distribution of $X_0$.

Definition:

Let $X_0$ and $Y_0$ be random variables such that $P(Y_0 \ge x) \ge P(X_0 \ge x)$ for all x. Then $Y_0$ is said to dominate $X_0$.

Theorem

(The Dominance Theorem): Suppose that $Y_0$ dominates $X_0$. Let $X(t) = X_0 + B(t) + \theta t$ and $Y(t) = Y_0 + \tilde B(t) + \theta t$, where $B(t)$ and $\tilde B(t)$ are both Brownian motions. Let $t_1 < \cdots < t_K$ be a sequence of time points and $u_0, u_1, \ldots, u_K$ be exit boundaries. Then

$$P\left(\{Y_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{Y(t_i) \ge u_i\}\right)\right) \ge P\left(\{X_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{X(t_i) \ge u_i\}\right)\right).$$

Lemma 1.

Let $X(t) = X_0 + B(t) + \theta t$ and $Y(t) = Y_0 + \tilde B(t) + \theta t$, where $B(t)$ and $\tilde B(t)$ are both Brownian motions, and $Y_0 \ge X_0$. Let $t_1 < \cdots < t_K$ be a sequence of time points and $e_0, e_1, \ldots, e_K$ be exit boundaries. Then

$$P\left(\{Y_0 \ge e_0\} \cup \left(\cup_{i=1}^{K}\{Y(t_i) \ge e_i\}\right)\right) \ge P\left(\{X_0 \ge e_0\} \cup \left(\cup_{i=1}^{K}\{X(t_i) \ge e_i\}\right)\right).$$

Proof: Let $X(t) = X_0 + B(t) + \theta t$ and $\tilde Y(t) = Y_0 + B(t) + \theta t$, where $B(t)$ is the same Brownian motion. Then $\tilde Y(t) \ge X(t)$ for all t. For any $e_0, e_1, \ldots, e_K$, $\{X_0 \ge e_0\} \subseteq \{Y_0 \ge e_0\}$, $\{X(t_i) \ge e_i\} \subseteq \{\tilde Y(t_i) \ge e_i\}$, and $\{X_0 \ge e_0\} \cup (\cup_{i=1}^{K}\{X(t_i) \ge e_i\}) \subseteq \{Y_0 \ge e_0\} \cup (\cup_{i=1}^{K}\{\tilde Y(t_i) \ge e_i\})$. Hence

$$P\left(\{Y_0 \ge e_0\} \cup \left(\cup_{i=1}^{K}\{\tilde Y(t_i) \ge e_i\}\right)\right) \ge P\left(\{X_0 \ge e_0\} \cup \left(\cup_{i=1}^{K}\{X(t_i) \ge e_i\}\right)\right).$$

Since $\tilde Y(t)$ and $Y(t)$ have the same distribution,

$$P\left(\{Y_0 \ge e_0\} \cup \left(\cup_{i=1}^{K}\{Y(t_i) \ge e_i\}\right)\right) = P\left(\{Y_0 \ge e_0\} \cup \left(\cup_{i=1}^{K}\{\tilde Y(t_i) \ge e_i\}\right)\right).$$

Hence,

$$P\left(\{Y_0 \ge e_0\} \cup \left(\cup_{i=1}^{K}\{Y(t_i) \ge e_i\}\right)\right) \ge P\left(\{X_0 \ge e_0\} \cup \left(\cup_{i=1}^{K}\{X(t_i) \ge e_i\}\right)\right).$$

The lemma is proved. ■

Proof of the Dominance Theorem: Let $F(x) = P(Y_0 < x)$ and $G(x) = P(X_0 < x)$. Since $P(Y_0 \ge x) \ge P(X_0 \ge x)$, we have $F(x) \le G(x)$, and hence $F^{-1}(y) \ge G^{-1}(y)$. Let $U \in [0, 1]$ be a random variable with a uniform distribution. Let $\tilde Y_0 = F^{-1}(U)$ and $\tilde X_0 = G^{-1}(U)$. Then $\tilde Y_0 \ge \tilde X_0$, and
$$P(\tilde Y_0 \ge x) = P(F^{-1}(U) \ge x) = P(U \ge F(x)) = 1 - F(x) = P(Y_0 \ge x),$$
$$P(\tilde X_0 \ge x) = P(G^{-1}(U) \ge x) = P(U \ge G(x)) = 1 - G(x) = P(X_0 \ge x).$$

Let $B(t)$ be a Brownian motion, $\tilde X(t) = \tilde X_0 + B(t) + \theta t$, and $\tilde Y(t) = \tilde Y_0 + B(t) + \theta t$. Then by Lemma 1,

$$P\left(\{\tilde Y_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{\tilde Y(t_i) \ge u_i\}\right)\right) \ge P\left(\{\tilde X_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{\tilde X(t_i) \ge u_i\}\right)\right).$$

Since $\tilde X(t)$ and $X(t)$ have the same distribution,

$$P\left(\{X_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{X(t_i) \ge u_i\}\right)\right) = P\left(\{\tilde X_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{\tilde X(t_i) \ge u_i\}\right)\right),$$

and since $\tilde Y(t)$ and $Y(t)$ have the same distribution,

$$P\left(\{Y_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{Y(t_i) \ge u_i\}\right)\right) = P\left(\{\tilde Y_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{\tilde Y(t_i) \ge u_i\}\right)\right).$$

Therefore,

$$P\left(\{Y_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{Y(t_i) \ge u_i\}\right)\right) \ge P\left(\{X_0 \ge u_0\} \cup \left(\cup_{i=1}^{K}\{X(t_i) \ge u_i\}\right)\right).$$

The theorem is proved. ■

Slepian's lemma (Huffer Citation1986; Slepian Citation1962): Suppose that $X = (X_1, \ldots, X_n) \in \mathbb{R}^n$ and $Y = (Y_1, \ldots, Y_n) \in \mathbb{R}^n$ are Gaussian vectors with $E[X_i] = E[Y_i] = 0$, $E[X_i^2] = E[Y_i^2]$, i = 1, …, n, and $E[Y_i Y_j] \le E[X_i X_j]$ for i ≠ j. Then the following inequality holds for all real numbers $u_1, \ldots, u_n$: $P(\cup_{i=1}^{n}\{Y_i > u_i\}) \ge P(\cup_{i=1}^{n}\{X_i > u_i\})$.
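A quick Monte Carlo check of the lemma for n = 2 (our illustration): lowering the cross-correlation from 0.5 to 0 increases the exceedance probability of the maximum.

```python
# Slepian's lemma, checked by simulation for bivariate normals with zero
# means and unit variances.
import numpy as np

rng = np.random.default_rng(7)
n_sim, u = 1_000_000, 1.5

def p_union(rho):
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n_sim)
    return np.mean((z > u).any(axis=1))

print("rho = 0.5:", p_union(0.5))   # correlated: smaller exceedance probability
print("rho = 0.0:", p_union(0.0))   # independent: larger exceedance probability
```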

3.3.2. Conservative exit/critical boundary selection

Suppose that it is known that $P(Y_0 \ge x) \ge P(X_0 \ge x)$ for all x, that the distribution of $Y_0$ is known, but that the distribution of $X_0$ is not. Per the dominance theorem, $Y(t) = Y_0 + B(t) + \theta t$ is more likely to exit a given set of exit boundaries than $X(t)$. Then $Y(t)$ can be used to select the exit/critical boundaries, and these boundaries can be applied to $X(t)$ to control the type I error. Consider $W_m^{(0)} = B_m^{(0)} + \theta_m r_0 \sim N(\theta_m r_0, r_0)$, m = 1, …, M, as the $X_i$'s in Slepian's lemma (the boundaries are set under the null hypothesis $\boldsymbol{\theta} = 0$, so the means are zero as the lemma requires). Then $E[X_i^2] = E[B_i^{(0)}B_i^{(0)}] = r_0$, and $E[X_m X_l] = E[B_m^{(0)}B_l^{(0)}]$ for $m \ne l$. Let $B_{m,ind}^{(0)}(r_0)$, m = 1, …, M, be independent Brownian motions, let $W_{m,ind}^{(0)} = B_{m,ind}^{(0)} + \theta_m r_0$, and take the $W_{m,ind}^{(0)}$'s as the $Y_i$'s in Slepian's lemma. Then $E[Y_i^2] = E[B_{i,ind}^{(0)}B_{i,ind}^{(0)}] = r_0 = E[X_i^2]$, and $E[Y_m Y_l] = E[B_{m,ind}^{(0)}B_{l,ind}^{(0)}] = 0$ for $m \ne l$. For all potential options/comparisons to be considered in a two-stage trial, it is reasonable to assume that $E[B_m^{(0)}B_l^{(0)}] \ge 0$, i.e., that the endpoints in the phase 2 stage are non-negatively correlated. This assumption is in agreement with the opinion in the FDA guidance (2019) that "Most secondary endpoints in clinical trials are correlated with the primary endpoint, often very highly correlated," and it is independent of the selection rule. Hence $E[Y_m Y_l] = 0 \le E[B_m^{(0)}B_l^{(0)}] = E[X_m X_l]$, and the conditions of Slepian's lemma are met. Thus, for all real numbers $u_1, \ldots, u_M$, $P(\cup_{m=1}^{M}\{Y_m > u_m\}) \ge P(\cup_{m=1}^{M}\{X_m > u_m\})$, i.e., $P(\cup_{m=1}^{M}\{W_{m,ind}^{(0)} > u_m\}) \ge P(\cup_{m=1}^{M}\{W_m^{(0)} > u_m\})$, which leads to $P(\max(W_{1,ind}^{(0)}, \ldots, W_{M,ind}^{(0)}) > x) \ge P(\max(W_1^{(0)}, \ldots, W_M^{(0)}) > x)$. Let $X_{0,ind} = \max(W_{1,ind}^{(0)}, \ldots, W_{M,ind}^{(0)})$ and $X_{0,max} = \max(W_1^{(0)}, \ldots, W_M^{(0)})$, and let $X_0$ be the selected $W_n^{(0)}$ at the end of phase 2 (regardless of the selection rule). Then $X_{0,ind}$ dominates $X_{0,max}$ per the above discussion, and $X_{0,max}$ dominates $X_0$ (regardless of the selection rule). Type I error (FWE) can be controlled in all situations (treatment selection with any distribution, population enrichment, endpoint selection, or mixed selection) as follows: i) if the covariance matrix is known but the selection rule is not strict play-the-winner, let $Y_0 = X_{0,max}$ and apply the dominance theorem; ii) if the covariance matrix is unknown and the selection rule is not strict play-the-winner, let $Y_0 = X_{0,ind}$ and apply the dominance theorem. The critical boundaries $c_0^{(0)}, c_1^{(0,1)}, \ldots, c_K^{(0,1)}$ for sequential testing on $X(t)$ can be determined by requiring
$$P\left(\{Y_0 \ge c_0^{(0)}\sqrt{r_0}\} \cup \left(\cup_{i=1}^{K}\{Y(r_i^{(0,1)}) \ge c_i^{(0,1)}\sqrt{r_i^{(0,1)}}\}\right)\right) \le \alpha/2.$$
Then, regardless of the selection rule or the covariance matrix,
$$P\left(\{X_0 \ge c_0^{(0)}\sqrt{r_0}\} \cup \left(\cup_{i=1}^{K}\{X(r_i^{(0,1)}) \ge c_i^{(0,1)}\sqrt{r_i^{(0,1)}}\}\right)\right) \le P\left(\{Y_0 \ge c_0^{(0)}\sqrt{r_0}\} \cup \left(\cup_{i=1}^{K}\{Y(r_i^{(0,1)}) \ge c_i^{(0,1)}\sqrt{r_i^{(0,1)}}\}\right)\right) \le \alpha/2.$$
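Under the null hypothesis, the dominating variable $X_{0,ind}$ has a simple closed-form CDF, which makes the conservative boundary computation straightforward (a sketch; M, $r_0$, and the first-stage α-spend below are assumed illustration values):

```python
# P(X_{0,ind} <= x) = Phi(x / sqrt(r0))^M for the max of M independent
# N(0, r0) phase 2 score statistics; invert it for a first-stage threshold.
from scipy.stats import norm
from scipy.optimize import brentq

def cdf_X0_ind(x, M, r0):
    return norm.cdf(x / r0 ** 0.5) ** M

# First-stage exit threshold e0 with P(X_{0,ind} >= e0) = 0.005 (assumed
# alpha-spend at the end of phase 2), for M = 3 options and r0 = 0.4.
e0 = brentq(lambda x: (1 - cdf_X0_ind(x, 3, 0.4)) - 0.005, 0.0, 10.0)
print(round(e0, 4))
```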

3.4. Adaptive sequential testing (ATSSSD)

In a TSSSD, the power calculation is based on assumptions that may or may not reflect the true efficacy, and the comparison with the best efficacy may not be selected even if the rule is play-the-winner (the design outputs in section 4.1 provide examples of such probabilities). Hence, it may be desirable to perform sample size re-estimation at an interim analysis in the phase 3 stage (preferably using the effect size estimated from phase 3 data only, which is unbiased). To facilitate the discussion of the sample size change, we use the superscript (2) to indicate statistics and parameters after the sample size change in the phase 3 stage. For example, the information times in the phase 3 stage after the sample size change are $t_i^{(2)}$, the combined phase 2 and 3 time points are denoted $t_i^{(0,2)}$, and the new critical and exit boundaries are denoted $c_i^{(0,2)}$ and $e_i^{(0,2)}$, respectively. The exit boundaries $e_1^{(0,2)}, \ldots, e_{K_2}^{(0,2)}$ should maintain type I error control; their selection is mathematically identical to sample size modification for adaptive sequential designs (Gao et al. Citation2008), the only difference being notation. Details of the calculations are provided in the online supporting material.

Figure 3. Trial design with two treatments and three looks, common control, and normal distribution (DACT).

3.4.1. Conditional power and conditional type I error

Suppose that $X(t_L^{(0,1)}) = x_L^{(0,1)}$ is observed at the L-th interim analysis. Let θ be the effect size of the selected comparison. The first step in sample size re-estimation is to calculate the conditional power with the planned sample size.

  • Under the alternative hypothesis $H_a: \theta > 0$, $W(t) = B(t) + \theta t \sim N(\theta t, t)$. The conditional power at information times $t_{L+1}^{(0,1)} < \cdots < t_{K_1}^{(0,1)}$ is
$$P(\theta, e_{L+1}^{(0,1)}, \ldots, e_{K_1}^{(0,1)} \mid X(t_L^{(0,1)}) = x_L^{(0,1)}) = P\left(\cup_{i=1}^{K_1-L}\{X(t_{L+i}^{(0,1)}) \ge e_{L+i}^{(0,1)}\} \,\Big|\, X(t_L^{(0,1)}) = x_L^{(0,1)}, \theta\right).$$
The conditional type I error is obtained by setting θ = 0: $\alpha_c^{(1)} = P(\cup_{i=L+1}^{K_1}\{X(t_i^{(0,1)}) \ge e_i^{(0,1)}\} \mid X(t_L^{(0,1)}) = x_L^{(0,1)}, \theta = 0)$. If the conditional power is too low or too high, the sample size can be increased or decreased accordingly.

  • Suppose that the sample size is to be modified, with new information times $t_1^{(0,2)}, \ldots, t_{K_2}^{(0,2)}$ and new exit boundaries $e_1^{(0,2)}, \ldots, e_{K_2}^{(0,2)}$ chosen. The conditional power (with $t_i^{(0,2)} = t^{(0)} + t_i^{(2)}$) is, similarly,
$$P(\theta, e_1^{(0,2)}, \ldots, e_{K_2}^{(0,2)} \mid X(t_L^{(0,1)}) = x_L^{(0,1)}) = P\left(\cup_{i=1}^{K_2}\{X(t_i^{(0,2)}) \ge e_i^{(0,2)}\} \,\Big|\, X(t_L^{(0,1)}) = x_L^{(0,1)}, \theta\right).$$
This is an increasing function of $T^{(2)} = t_{K_2}^{(0,2)}$. The desired power (e.g., $1-\beta$) can be obtained by choosing an appropriately large $T^{(2)}$. The new conditional type I error is obtained by setting θ = 0: $\alpha_c^{(2)} = P(\cup_{i=1}^{K_2}\{X(t_i^{(0,2)}) \ge e_i^{(0,2)}\} \mid X(t_L^{(0,1)}) = x_L^{(0,1)}, \theta = 0)$.

3.4.2. Adjusting the exit boundaries after sample size modification

To maintain the total type I error, it is sufficient to maintain the conditional type I error (as in Gao, Ware, Mehta Citation2008): if the future exit boundaries $e_1^{(0,2)}, \ldots, e_{K_2}^{(0,2)}$ are chosen such that $\alpha_c^{(2)} = \alpha_c^{(1)}$, then the overall type I error is maintained. Details of the calculation are provided in the online supporting material.

3.5. Inference

3.5.1. Conservative lower confidence limit and P-value

Let $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_M) = (\theta, \ldots, \theta)$, and denote it as $\theta^{=} = (\theta, \ldots, \theta)$. Suppose that the trial ended at $t_I^{(0,1)}$ with $X(t_I^{(0,1)}) = x_I^{(0,1)} = z_I^{(0,1)}\sqrt{t_I^{(0,1)}}$. Let
$$f_X(\theta) = P\left(\{X_0 \ge c_0^{(0)}\sqrt{t^{(0)}}\} \cup \left(\cup_{i=1}^{I-1}\{X(t_i^{(0,1)}) \ge c_i^{(0,1)}\sqrt{t_i^{(0,1)}}\}\right) \cup \{X(t_I^{(0,1)}) \ge z_I^{(0,1)}\sqrt{t_I^{(0,1)}}\} \,\Big|\, \boldsymbol{\theta} = \theta^{=}\right),$$
$$f_Y(\theta) = P\left(\{Y_0 \ge c_0^{(0)}\sqrt{t^{(0)}}\} \cup \left(\cup_{i=1}^{I-1}\{Y(t_i^{(0,1)}) \ge c_i^{(0,1)}\sqrt{t_i^{(0,1)}}\}\right) \cup \{Y(t_I^{(0,1)}) \ge z_I^{(0,1)}\sqrt{t_I^{(0,1)}}\} \,\Big|\, \boldsymbol{\theta} = \theta^{=}\right),$$
where $Y(t) = Y_0 + W(t)$, with $Y_0 = X_{0,max}$ if the covariance matrix is known and $Y_0 = X_{0,ind}$ if it is not. Then both $f_X(\theta)$ and $f_Y(\theta)$ are increasing functions of θ, and $f_X(\theta) \le f_Y(\theta)$ (per the dominance theorem). Hence, for any $0 < \gamma < 1$, $f_Y^{-1}(\gamma) \le f_X^{-1}(\gamma)$; in particular, $f_Y^{-1}(\alpha/2) \le f_X^{-1}(\alpha/2)$. Therefore, $\theta_{\alpha/2,two\text{-}stage} = \theta_{\alpha/2} = f_Y^{-1}(\alpha/2)$ is a conservative lower confidence limit, and $p = f_Y(0)$ is a conservative p-value. Further, $p < \alpha/2$ if and only if $\theta_{\alpha/2} > 0$.

3.5.2. Unbiased point estimate and exact upper confidence limit

An unbiased estimate of $\theta = \theta_{selected} = \theta_n$ from the phase 2 data (hence from the combined phase 2/3 data) is not possible due to the bias from the selection (a multiple comparison) procedure. However, if only the phase 3 data is used, a median unbiased point estimate $\hat\theta_{selected} = \hat\theta_{phase3}$ and an exact upper confidence limit $\theta_{1-\alpha/2,phase3}$ can be obtained (Gao et al. Citation2013). Therefore, we propose to use $\hat\theta_{phase3}$ as the point estimate and $(\theta_{\alpha/2,two\text{-}stage}, \theta_{1-\alpha/2,phase3})$ as the conservative two-sided confidence interval. Mathematically, the method is exactly the same as in Gao, Liu, Mehta (Citation2013) (see section 5.2 for explanations). To obtain $\hat\theta_{phase3}$, it is necessary to associate a separate group sequential design with phase 3. The design includes interim information time points in the phase 3 stage, $t_1^{(1)} < \cdots < t_{K_1}^{(1)}$, and pre-selected separate phase 3 exit/critical boundaries $e_1^{(1)}, \ldots, e_{K_1}^{(1)}$ and $c_1^{(1)}, \ldots, c_{K_1}^{(1)}$. The relative information fractions for phase 3 are $s_i^{(1)} = t_i^{(1)}/t_{K_1}^{(1)}$ (see the trial designs in section 4.1). Let
$$\Sigma(s^{(1)}, i) = \begin{pmatrix} s_1^{(1)} & s_1^{(1)} & \cdots & s_1^{(1)} \\ s_1^{(1)} & s_2^{(1)} & \cdots & s_2^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ s_1^{(1)} & s_2^{(1)} & \cdots & s_i^{(1)} \end{pmatrix},$$
i.e., the (j, k) element is $\min(s_j^{(1)}, s_k^{(1)})$.

Suppose that the trial terminated at the I-th interim analysis in the phase 3 stage (information time $t_I^{(1)}$), with $Z(t_I^{(1)}) = z_I^{(1)}$. Let
$$g(\theta) = 1 - \int_{-\infty}^{c_1^{(1)}\sqrt{s_1^{(1)}} - \theta\sqrt{t_{K_1}^{(1)}}\,s_1^{(1)}} \cdots \int_{-\infty}^{c_{I-1}^{(1)}\sqrt{s_{I-1}^{(1)}} - \theta\sqrt{t_{K_1}^{(1)}}\,s_{I-1}^{(1)}} \int_{-\infty}^{z_I^{(1)}\sqrt{s_I^{(1)}} - \theta\sqrt{t_{K_1}^{(1)}}\,s_I^{(1)}} u(0, x, \Sigma(s^{(1)}, I))\,dx_1\cdots dx_I.$$
$g(\theta)$ is an increasing function of θ. Let $\hat\theta_{phase3} = g^{-1}(0.5)$ be the point estimate and $\theta_{1-\alpha/2,phase3} = g^{-1}(1-\alpha/2)$ the upper limit of the $(1-\alpha)\times 100\%$ confidence interval. Then $\hat\theta_{phase3}$ is a median unbiased estimator, and $P(\theta \ge \theta_{1-\alpha/2,phase3}) = \alpha/2$, i.e., the upper limit is exact (e.g., see Gao, Liu, Mehta Citation2013).
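A sketch of how $g(\theta)$ can be evaluated and inverted (assumed design values; the multivariate normal CDF plays the role of the multiple integral above):

```python
# Median unbiased estimate and exact upper limit from the phase 3 data only:
# solve g(theta) = 0.5 and g(theta) = 1 - alpha/2.
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import brentq

s = np.array([0.5, 1.0])      # phase 3 information fractions (assumed), I = K1 = 2
tK1 = 100.0                   # phase 3 final information time (assumed)
c = np.array([2.7])           # interim critical boundary c_1^(1) (assumed)
zI = 2.3                      # observed Wald statistic at termination (assumed)

def g(theta):
    cov = np.minimum.outer(s, s)                  # Sigma(s^(1), I)
    upper = np.append(c * np.sqrt(s[:-1]), zI * np.sqrt(s[-1])) \
            - theta * np.sqrt(tK1) * s
    return 1.0 - multivariate_normal(mean=np.zeros(len(s)), cov=cov).cdf(upper)

theta_hat = brentq(lambda th: g(th) - 0.5, -1.0, 1.0)
theta_upper = brentq(lambda th: g(th) - 0.975, -1.0, 1.0)
print("median unbiased estimate:", round(theta_hat, 4))
print("upper 97.5% confidence limit:", round(theta_upper, 4))
```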

3.6. Inference with adaptation

3.6.1. Backward image

To obtain the statistics for inference, it is necessary to first derive a backward image (see Gao, Liu, Mehta Citation2013, section 4). The backward image can be obtained in exactly the same way as in Gao et al. (Citation2013), as follows (see section 5.2 for explanations): Suppose that the trial terminated at the $I_2$-th interim analysis (information time $t_{I_2}^{(0,2)}$) with $X(t_{I_2}^{(0,2)}) = x_{I_2}^{(0,2)}$. Then there is a unique $J_\theta^{(0,1)}$ and a unique $x_{J_\theta^{(0,1)}}^{(0,1)}$ such that
$$P\left(\cup_{i=1}^{I_2-1}\{X(t_i^{(0,2)}) \ge e_i^{(0,2)}\} \cup \{X(t_{I_2}^{(0,2)}) \ge x_{I_2}^{(0,2)}\} \,\Big|\, X(t_L^{(0,1)}) = x_L^{(0,1)}\right) = P\left(\cup_{i=L+1}^{J_\theta^{(0,1)}-1}\{X(t_i^{(0,1)}) \ge e_i^{(0,1)}\} \cup \{X(t_{J_\theta^{(0,1)}}^{(0,1)}) \ge x_{J_\theta^{(0,1)}}^{(0,1)}\} \,\Big|\, X(t_L^{(0,1)}) = x_L^{(0,1)}\right).$$
The pair $(t_{J_\theta^{(0,1)}}^{(0,1)}, x_{J_\theta^{(0,1)}}^{(0,1)})$ is the backward image of $(t_{I_2}^{(0,2)}, x_{I_2}^{(0,2)})$, given $X(t_L^{(0,1)}) = x_L^{(0,1)}$.

3.6.2. Lower confidence limit and p-value

Let
$$f_X(\theta) = P\left(\{X_0 \ge e_0^{(0)}\} \cup \left(\cup_{i=1}^{J_\theta^{(0,1)}-1}\{X(t_i^{(0,1)}) \ge e_i^{(0,1)}\}\right) \cup \{X(t_{J_\theta^{(0,1)}}^{(0,1)}) \ge x_{J_\theta^{(0,1)}}^{(0,1)}\}\right).$$

Let $Y(t) = Y_0 + W(t)$, where $Y_0$ is chosen as above, and define $f_Y(\theta)$ similarly. Per the earlier discussion of the dominance theorem, for any $0 < \gamma < 1$, $f_Y^{-1}(\gamma) \le f_X^{-1}(\gamma)$. A conservative p-value and lower confidence limit are obtained as $p = f_Y(0)$ and $\theta_{\alpha/2,two\text{-}stage} = \theta_{\alpha/2} = f_Y^{-1}(\alpha/2)$. The p-value is consistent with $\theta_{\alpha/2}$ in the sense that $p \le \alpha/2$ if and only if $\theta_{\alpha/2} \ge 0$.

3.6.3. Unbiased point estimate and exact upper confidence limit

An unbiased point estimate and an exact upper confidence limit can be obtained in exactly the same way as in Gao, Liu, Mehta (Citation2013), as follows (see section 5.2 for explanations): Let $t_i^{(2)} = t_i^{(0,2)} - t^{(0)}$ be the information times for the phase 3 stage. Let $X(t_{I_2}^{(2)}) = x_{I_2}^{(2)} = x_{I_2}^{(0,2)} - x_0$ be the observed value of the phase 3 stage score function, and let $e_i^{(2)} = e_i^{(0,2)} - x_0$, where $x_0$ is the observed value of $X_0$. Then there is a unique $J_\theta^{(1)}$ and a unique $x_{J_\theta^{(1)}}^{(1)}$ such that
$$P\left(\cup_{i=1}^{I_2-1}\{X(t_i^{(2)}) \ge e_i^{(2)}\} \cup \{X(t_{I_2}^{(2)}) \ge x_{I_2}^{(2)}\} \,\Big|\, X(t_L^{(1)}) = x_L^{(0,1)} - x_0\right) = P\left(\cup_{i=L+1}^{J_\theta^{(1)}-1}\{X(t_i^{(1)}) \ge e_i^{(1)}\} \cup \{X(t_{J_\theta^{(1)}}^{(1)}) \ge x_{J_\theta^{(1)}}^{(1)}\} \,\Big|\, X(t_L^{(1)}) = x_L^{(0,1)} - x_0\right).$$
The pair $(t_{J_\theta^{(1)}}^{(1)}, x_{J_\theta^{(1)}}^{(1)})$ is the phase 3 stage backward image of $(t_{I_2}^{(2)}, x_{I_2}^{(2)})$, given $X(t_L^{(1)}) = x_L^{(0,1)} - x_0$. Let $Z(t_{J_\theta^{(1)}}^{(1)}) = z_{J_\theta^{(1)}}^{(1)} = x_{J_\theta^{(1)}}^{(1)}/\sqrt{t_{J_\theta^{(1)}}^{(1)}}$. Let
$$g(\theta) = 1 - \int_{-\infty}^{c_1^{(1)}\sqrt{s_1^{(1)}} - \theta\sqrt{t_{K_1}^{(1)}}\,s_1^{(1)}} \cdots \int_{-\infty}^{c_{J_\theta^{(1)}-1}^{(1)}\sqrt{s_{J_\theta^{(1)}-1}^{(1)}} - \theta\sqrt{t_{K_1}^{(1)}}\,s_{J_\theta^{(1)}-1}^{(1)}} \int_{-\infty}^{z_{J_\theta^{(1)}}^{(1)}\sqrt{s_{J_\theta^{(1)}}^{(1)}} - \theta\sqrt{t_{K_1}^{(1)}}\,s_{J_\theta^{(1)}}^{(1)}} u(0, x, \Sigma(s^{(1)}, J_\theta^{(1)}))\,dx_1\cdots dx_{J_\theta^{(1)}}.$$
$g(\theta)$ is an increasing function of θ. Let $\hat\theta_{phase3} = g^{-1}(0.5)$ be the point estimate and $\theta_{1-\alpha/2,phase3} = g^{-1}(1-\alpha/2)$ the upper limit of the $(1-\alpha)\times 100\%$ confidence interval. Then $\hat\theta_{phase3}$ is a median unbiased estimator, and $P(\theta \ge \theta_{1-\alpha/2,phase3}) = \alpha/2$, i.e., the upper limit is exact (e.g., see Gao, Liu, Mehta Citation2013).

3.6.4. Repeated sample size modification

If desired, and similar to the situation in adaptive sequential designs (e.g., Gao, Ware, Mehta Citation2008), sample size modifications can be repeated any number of times. The details are provided in the online supporting material.

3.7. Special case: $L = K_1 - 1$ and $K_2 = 1$

In general, the calculation of conditional probabilities requires numerical integration. In a special, and likely the most common, case, all statistics for sample size modification, such as the conditional power, the conditional type I error at the interim analysis, the new sample size, and the new critical boundaries, can be expressed in closed form. This is the case in which the sample size modification is performed at the penultimate interim analysis and there is no further interim analysis after the sample size change; hence $L = K_1 - 1$ and $K_2 = 1$. The calculations are exactly the same as in Gao, Ware, Mehta (Citation2008). The backward image can also be calculated with a formula, exactly as in Gao et al. (Citation2013). Details of the calculations are provided in the online supporting material.

3.7.1. Sample size re-estimation

Suppose that $X(t_{K_1-1}^{(0,1)}) = x_{K_1-1}^{(0,1)}$ has been observed. Let $\hat\theta$ and $s.e.(\hat\theta)$ be the naïve point estimate and its standard error; then $x_{K_1-1}^{(0,1)} = \hat\theta\, s.e.(\hat\theta)^{-2}$. The conditional power is
$$\Phi\left(\hat\theta\sqrt{t_{K_1}^{(0,1)} - t_{K_1-1}^{(0,1)}} - \frac{e_{K_1}^{(0,1)} - x_{K_1-1}^{(0,1)}}{\sqrt{t_{K_1}^{(0,1)} - t_{K_1-1}^{(0,1)}}}\right).$$
The new information time required for $(1-\beta)\times 100\%$ conditional power is
$$t_1^{(0,2)} = \hat\theta^{-2}\left(\frac{e_{K_1}^{(0,1)} - x_{K_1-1}^{(0,1)}}{\sqrt{t_{K_1}^{(0,1)} - t_{K_1-1}^{(0,1)}}} + \Phi^{-1}(1-\beta)\right)^2 + t_{K_1-1}^{(0,1)}.$$
(Preferably, the $\hat\theta$ in these formulas should be derived using the phase 3 data only.) The new adjusted exit boundary is
$$e_1^{(0,2)} = \sqrt{\frac{t_1^{(0,2)} - t_{K_1-1}^{(0,1)}}{t_{K_1}^{(0,1)} - t_{K_1-1}^{(0,1)}}}\left(e_{K_1}^{(0,1)} - x_{K_1-1}^{(0,1)}\right) + x_{K_1-1}^{(0,1)},$$
and the new critical boundary is $c_1^{(0,2)} = e_1^{(0,2)}/\sqrt{t_1^{(0,2)}}$.
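These closed-form formulas are straightforward to implement; a sketch with assumed interim values (not taken from any table in this article):

```python
# Conditional power, new information time, and adjusted boundary for the
# special case L = K1 - 1, K2 = 1.
from scipy.stats import norm

def ssr_special_case(theta_hat, x_prev, t_prev, t_final, e_final, power=0.9):
    delta = t_final - t_prev                     # remaining information
    drift = (e_final - x_prev) / delta ** 0.5
    cond_power = norm.cdf(theta_hat * delta ** 0.5 - drift)
    # New final information time achieving the target conditional power.
    t_new = (drift + norm.ppf(power)) ** 2 / theta_hat ** 2 + t_prev
    # Adjusted exit boundary preserving the conditional type I error.
    e_new = ((t_new - t_prev) / delta) ** 0.5 * (e_final - x_prev) + x_prev
    return cond_power, t_new, e_new, e_new / t_new ** 0.5

print(ssr_special_case(theta_hat=0.25, x_prev=20.0, t_prev=80.0,
                       t_final=120.0, e_final=24.0))
```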

3.7.2. Backward image

In this case, the trial ends at $t_1^{(0,2)}$. Let the point estimate (using data from both stages) be $\hat\theta^{(0,1)}$ and its standard error $s.e.(\hat\theta^{(0,1)})$. The final observation (using data from both stages) is $X(t_1^{(0,2)}) = x_1^{(0,2)} = \hat\theta^{(0,1)}\, s.e.(\hat\theta^{(0,1)})^{-2}$. Also, $J_\theta^{(0,1)} = K_1$. For any θ, the backward image is given by
$$x_{K_1,\theta}^{(0,1)} = \sqrt{\frac{t_{K_1}^{(0,1)} - t_{K_1-1}^{(0,1)}}{t_1^{(0,2)} - t_{K_1-1}^{(0,1)}}}\left(x_1^{(0,2)} - x_{K_1-1}^{(0,1)} - \theta(t_1^{(0,2)} - t_{K_1-1}^{(0,1)})\right) + x_{K_1-1}^{(0,1)} + \theta(t_{K_1}^{(0,1)} - t_{K_1-1}^{(0,1)}).$$
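A short sketch of the backward image formula (illustration values assumed): the post-SSR increment is standardized under drift θ and mapped back onto the original design's final look.

```python
# Closed-form backward image x_{K1, theta}^{(0,1)} of the observed final point.
def backward_image(theta, x_final, x_prev, t_prev, t_new, t_final_orig):
    d_orig = t_final_orig - t_prev   # original remaining information
    d_new = t_new - t_prev           # remaining information after the SSR
    return ((d_orig / d_new) ** 0.5 * (x_final - x_prev - theta * d_new)
            + x_prev + theta * d_orig)

print(backward_image(theta=0.25, x_final=33.0, x_prev=20.0,
                     t_prev=80.0, t_new=138.6, t_final_orig=120.0))
```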

4. Examples and simulations

Probability calculations (e.g., critical boundaries, power, and sample size) and simulations require clearly pre-set rules. Play-the-winner is a clear rule, and all calculations and simulations under this rule can be performed using the Markov process properties. If the rule is not play-the-winner, probability calculations may or may not be possible, depending on the complexity of the rules, even if the rule can be pre-set. For these reasons, probability calculations are provided in DACT only for the play-the-winner rule. If a non-play-the-winner rule is clearly specified, then customized simulations can be conducted to assess the probabilities and operating characteristics. There can be various reasons for not using the play-the-winner rule in practice; the most common may be the need to weigh the safety profile (i.e., rates of adverse events) together with efficacy in trials involving dose selection. However, safety considerations often involve clinical judgment, which can be difficult to pre-specify. For trials involving a "surrogate or intermediate endpoint" (FDA 2019), using play-the-winner on observations of the primary endpoint is precluded. Hence, only simulations using the play-the-winner rule are presented in this article. Type I error (FWE) control and the conservativeness of the probability calculations for non-play-the-winner rules are ensured by the dominance theorem.

If the play-the-winner rule is adopted and the endpoint is normally distributed, then the FWE control is exact with the critical boundaries, and the power and sample size calculations are precise. If the selection rule is not play-the-winner, or if the endpoint is not normally distributed, then the FWE is conservatively controlled per the dominance theorem. For non-play-the-winner rules, due to the arbitrariness of the selection rule, the power and sample size cannot be precisely calculated; the true power could be smaller than the calculated power, and the planned sample size may yield power less than $1-\beta$. To ensure adequate power, sample size re-estimation may be conducted (and is recommended) in the phase 3 stage (where there is no bias due to selection).

4.1. Critical boundary selection and sample size calculation in trial design

In this section, two trial designs with three looks are presented; the first look is at the end of phase 2. The first design is for a trial that compares two treatments with a common control, where the efficacy variable is normally distributed. The second design is for a survival analysis endpoint. The covariance matrix for the first trial is known, with off-diagonal covariance coefficients equal to 0.5, while that for the second trial is not. The critical boundaries for the second example are obtained by applying the dominance theorem with a covariance matrix whose off-diagonal coefficients equal 0. These boundaries are therefore conservative, and hence also applicable to any two comparisons, such as treatment selection with any non-normal distribution, endpoint selection, or sub-population selection. In a TSSSD, several pieces of information are important: i) critical boundaries for the combined phase II/III testing (for overall hypothesis testing, overall type I error (FWE) control, and lower confidence limit calculation); ii) critical boundaries for phase 3 only, as a standalone trial (for the unbiased point estimate); iii) sample size per arm; iv) the probability of each test arm being observed as the best at the end of phase 2 (whether or not the selection rule is play-the-winner, understanding this probability is helpful for trial planning); v) the power for each arm (the probability of the arm being the best at the end of phase 2 and the null hypothesis being rejected with this arm in either phase 2 or 3); vi) the conditional power for each arm given that the arm showed the best efficacy at the end of phase 2 (the probability of rejecting the null hypothesis conditional on that arm being selected for phase 3 testing). All of this information is provided in the DACT output.

The information in Figure 3 involves many complicated calculations, and it is reassuring that these calculations can be confirmed through simulations. In Figure 4, simulation results are presented for the design in Figure 3. The design parameters (such as critical boundaries and sample size) are as specified in Figure 3; the sample size for the control arm is 110, and the simulation was repeated 100,000 times. The simulated probabilities (i.e., the overall power, the probability of each treatment being the best at the end of phase 2, the power of each arm [the probability of being selected as the winner and rejecting the null hypothesis at the completion of the trial], and the conditional power of each arm after being selected as the best) match the calculated ones within the random error of the simulations, confirming that the BMA approach precisely describes the probability events of seamless sequential designs.

Figure 4. Simulated probabilities (DACT).

It is noted that: i) the two-stage boundaries in Figure 5 are more conservative (higher) than those in Figure 3, for conservative type I error control, since the covariance matrix for trial 2 (Figure 5) is not known and the dominance theorem and Slepian's lemma (Huffer Citation1986; Slepian Citation1962) are used; ii) the probability of the arm with lesser efficacy showing better efficacy is greater than zero in both designs; this is the probability of the "loser" being selected as the "winner". The observed efficacy of the selected arm will very likely be biased, which suggests that an unbiased estimate should not include phase 2 data (and is why there are critical boundaries for phase 3 only).

Figure 5. Trial design with two arms/regimens, survival analysis, three looks.

4.2. Type I error simulations (with sample size re-estimation)

To simulate type I error control, the critical boundaries in Figure 3 are used for the normal distribution and those in Figure 5 for survival analysis. Let the planned sample size be $N_p$ and the maximum sample size be $N_{max}$. Sample size is re-estimated at the second interim analysis. Let the futility threshold be $\theta_{futi}$. If the observed efficacy $\hat\theta$ is less than $\theta_{futi}$, the trial continues to the planned sample size (immediate termination could be another option, but it is not used in the simulations here). If the conditional power is no less than $1-\beta = 0.9$, the trial proceeds to the planned sample size. If the conditional power is less than $1-\beta = 0.9$ and the re-calculated sample size is less than $N_{max}$, then the trial proceeds to the new sample size; if the re-estimated sample size is greater than $N_{max}$, the trial proceeds to the planned sample size (proceeding to $N_{max}$ could be another option). For the trial with the normal distribution, $\theta_{futi} = 0.05$, $N_p = 100$ per arm, and $N_{max} = 500$ per arm were used in the simulations. For the trial with the survival analysis endpoint, $\theta_{futi} = -\log(0.95)$ (i.e., the trial is futile if the observed hazard ratio is greater than 0.95), the planned sample size $N_p$ is 200 events, and $N_{max}$ is 500 events. The simulation was repeated 100,000 times for the normal distribution and 10,000 times for survival analysis. We note that the sample size modification rules in the simulations are chosen as an example for illustration; the type I error control remains valid under any other modification rules.
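A condensed sketch of such a simulation under the full null hypothesis (the boundaries and design values below are assumed placeholders, not the DACT output used for the tables):

```python
# Type I error simulation: play-the-winner at the end of phase 2, one phase 3
# interim look with the SSR rule above, then the final look.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_sim = 100_000
t0, t1, t2 = 50.0, 100.0, 150.0        # info times: end of phase 2, interim, final
e0, e1, e2 = 28.0, 27.5, 27.0          # assumed exit boundaries
theta_futi, t_max, target = 0.05, 400.0, 0.9

rejections = 0
for _ in range(n_sim):
    W0 = rng.multivariate_normal([0, 0], t0 * np.array([[1, .5], [.5, 1]]))
    x0 = W0.max()                                    # play the winner
    if x0 >= e0:
        rejections += 1
        continue
    x1 = x0 + rng.normal(0, np.sqrt(t1 - t0))        # first phase 3 look
    if x1 >= e1:
        rejections += 1
        continue
    theta_hat = x1 / t1                              # naive estimate at the look
    drift = (e2 - x1) / np.sqrt(t2 - t1)
    t_new, e_new = t2, e2
    if theta_hat >= theta_futi and norm.cdf(theta_hat * np.sqrt(t2 - t1) - drift) < target:
        t_cand = (drift + norm.ppf(target)) ** 2 / theta_hat ** 2 + t1
        if t_cand <= t_max:                          # otherwise keep the plan
            t_new = t_cand
            e_new = np.sqrt((t_new - t1) / (t2 - t1)) * (e2 - x1) + x1
    x_final = x1 + rng.normal(0, np.sqrt(t_new - t1))
    rejections += x_final >= e_new
print("simulated type I error:", rejections / n_sim)
```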

When $(\theta_1, \theta_2) = (0, 0)$ (or $(HR_1, HR_2) = (1, 1)$), any rejection of the null hypothesis results in a type I error. If $\theta_1 = 0$ (or $HR_1 = 1$) but $\theta_2 > 0$ (or $HR_2 < 1$), a rejection of the null hypothesis results in a type I error only when the dose associated with $\theta_1 = 0$ (or $HR_1 = 1$) was selected at the end of the phase 2 stage. In Table 1, the selection rule is play-the-winner: the treatment with the largest Wald statistic was chosen for the phase 3 stage.

Table 1. Type I error simulations-play the winner.

In Table 2, three selection rules were simulated. One is play-the-winner; another is to select the treatment with the second largest Wald statistic from the phase 2 stage. Such a scenario could happen in a dose selection trial in which the dose with the larger Wald statistic is the higher dose but has higher toxicity, so the lower dose is chosen for its lower toxicity. The third rule is random selection, which could happen when the selection is based on a surrogate or intermediate endpoint whose order of efficacy does not exactly match that of the primary endpoint. By the dominance theorem, playing the winner results in the highest rejection rate among all possible selection methods (see section 5.2 for explanations) under the full null hypothesis of $\theta_i = 0$ for all i. Hence, under the full null hypothesis, play-the-winner is associated with the highest type I error. Under a partial null hypothesis, with some $\theta_i = 0$ and other $\theta_i > 0$, there is no dominance relationship of the type I error between different selection rules, but all type I errors will be less than that under the full null hypothesis (per the dominance theorem). Detailed explanations would be too tedious and are not presented here; however, proper type I error control is demonstrated through the simulation results in Table 2.

Table 2. Type I error simulations: play the winner, selecting the 2nd best, or random selection.

The simulations in Table 1 can be confirmed using the DACT software. The simulations in Table 2 require specific programming (included in the online supporting material).

4.3. Point estimates and confidence interval

Simulations were performed for dose selection studies. The selection rule for the simulations in Table 3 is for a design with two active doses and one control, in which the dose with the largest observed effect size (i.e., $\hat\theta_n^{(0)} = \hat\theta_{max}^{(0)} = \max\{\hat\theta_1^{(0)}, \hat\theta_2^{(0)}\}$) is selected at the end of the phase 2 stage. Point estimates and confidence interval coverage for $\theta_{selected} = \theta_n$ are presented. In a three-look design, one analysis is performed at the end of the phase 2 stage, one interim analysis during the phase 3 stage, and one final analysis at the end of the trial. In our simulations, sample size re-estimation is performed after the second look of the three-look design.

Table 3. Bias and CI coverage, three looks, with sample size re-estimation at second look.

The simulations confirm: i) The lower limit of the confidence interval is conservative, but consistent with the hypothesis testing; ii) The point estimate is median unbiased; iii) The upper confidence limit is exact.

4.4. A trial example

We use a hypothetical trial to demonstrate how to perform sample size re-estimation and the final analysis. Suppose that an oncology trial is being conducted (see Figure 5 for the design), in which two treatments are compared to a common control. Suppose that the endpoint is PFS (progression-free survival) and the planned total number of events is 140. Note that the covariance matrix for the phase 2 stage is not known, and the selection rule does not require play-the-winner. Suppose that the trial is designed according to Figure 5 (for survival analysis, $\theta = -\log(HR)$). Note that $K_1 = 2$ ($K_1$ is the number of looks in phase 3; the total number of looks in the whole trial is $K_1 + 1$). Suppose that a sample size re-estimation is conducted at the second look of the TSSSD (the first look in the phase 3 stage); hence $L = 1$ in our notation (however, the number 2 should be entered when using DACT, since it is the second interim analysis of the TSSSD). Suppose that the total number of events for the selected regimen and the control at the interim analysis is 80, and that $\hat\theta^{(0,1)} = 0.4$ ($\widehat{HR} = 0.67$) with $s.e.(\hat\theta^{(0,1)}) = 0.18$ (using both phase 2 and 3 data), and $\hat\theta^{(1)} = 0.37$ ($\widehat{HR} = 0.691$) with $s.e.(\hat\theta^{(1)}) = 0.2$ (using phase 3 data only) have been observed. Accordingly, $z_1^{(0,1)} = 0.4/0.18 = 2.222$. Since $z_1^{(0,1)} < c_1^{(0,1)} = 2.869285$ at the interim analysis, the trial continues, with a sample size re-estimation. We note that $\hat\theta^{(0,1)}$ can be biased due to the selection process at the end of phase 2, while $\hat\theta^{(1)}$ is unbiased since it is obtained using only phase 3 data; the effect size is therefore assumed to be $\hat\theta^{(1)} = 0.37$. Using DACT, the conditional power with the planned sample size is 0.7716, and the new total number of events (from the remaining treatment and the control) required to achieve a conditional power of 0.9 is 156 (see Figure S1 in the online supporting material). The new critical boundary is $c_1^{(0,2)} = 2.09$. Suppose that at the end of the trial with the new sample size, $\hat\theta^{(0,2)} = 0.39$ with $s.e.(\hat\theta^{(0,2)}) = 0.0923$ (using both phase 2 and 3 data), and $\hat\theta^{(2)} = 0.365$ with $s.e.(\hat\theta^{(2)}) = 0.1$ (using phase 3 data only). Accordingly, $z_1^{(0,2)} = 0.39/0.0923 = 4.2253$. Since $z_1^{(0,2)} > c_1^{(0,2)}$, the null hypothesis is rejected. The final p-value is $p = 0.000176$. The point estimate and the conservative two-sided 95% confidence interval for $\theta = -\log(HR)$ are 0.3657 and (0.186661, 0.568683), respectively (readers can repeat these calculations using the DACT software).
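The Wald statistics and implied information times of this example can be reproduced in a few lines (values taken from the text above):

```python
# z = theta_hat / s.e. and Fisher information time t = s.e.^{-2}.
estimates = {"interim, combined": (0.40, 0.18), "interim, phase 3 only": (0.37, 0.20),
             "final, combined": (0.39, 0.0923), "final, phase 3 only": (0.365, 0.10)}
for name, (theta_hat, se) in estimates.items():
    print(f"{name}: z = {theta_hat / se:.4f}, t = {1 / se ** 2:.1f}")
```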

5. Discussion

5.1. The role of the score function

The properties of the score function are well known (Jennison and Turnbull 1997): it is approximately a Brownian motion. Because of this property, it serves as a rigorous mathematical foundation for an ATSSSD procedure that satisfies the three statistical principles for adaptive designs (FDA guidance, 2019). By utilizing the dominance theorem and Slepian's lemma, the procedure can be applied to a wide range of trials, including dose, treatment, or endpoint selection, or a mixture of these, for a wide range of endpoint distributions, and it provides complete inference.

The Brownian motion property of the score function is based on large-sample theory. The property may not hold well when the sample size is small and/or when the endpoint is not Gaussian distributed, and there is no clear sample-size threshold above which the Brownian motion approximation holds. Simulations (which can be conducted using DACT) may therefore be used to examine the operating characteristics of the design for non-Gaussian endpoints and/or when the sample size is not large, as sketched below.
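As a minimal sketch of such a check (all design parameters here are assumed), one can simulate a binary-endpoint trial, form the score-scale statistic $S_k \approx \hat\theta_k I_k$ at each look, and verify that $\mathrm{Var}(S_k) \approx I_k$ and $\mathrm{Cov}(S_j, S_k) \approx \min(I_j, I_k)$, which is the Brownian-motion covariance structure used above:

```python
import numpy as np

rng = np.random.default_rng(7)
n_sim = 50_000
looks = np.array([50, 100, 150])           # per-arm sample sizes at each look (assumed)
p_trt, p_ctl = 0.45, 0.30                  # assumed response rates

# Cumulative responders at each look for both arms.
new_n = np.diff(np.concatenate(([0], looks)))
trt = np.cumsum(rng.binomial(new_n, p_trt, size=(n_sim, len(looks))), axis=1)
ctl = np.cumsum(rng.binomial(new_n, p_ctl, size=(n_sim, len(looks))), axis=1)

theta_hat = trt / looks - ctl / looks      # estimated risk difference at each look
info = looks / (p_trt*(1-p_trt) + p_ctl*(1-p_ctl))  # Fisher information (known p's)
S = theta_hat * info                       # score-scale statistics S_k

# Brownian-motion check: empirical covariance of (S_1, S_2, S_3)
# versus the theoretical min(I_j, I_k).
print(np.round(np.cov(S.T), 1))
print(np.round(np.minimum.outer(info, info), 1))
```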

5.2. Parallels and differences between seamless sequential and usual sequential designs

The adaptive seamless sequential design is fundamentally parallel to the usual adaptive sequential design for phase 3 trials, but with some important differences.

Parallels:

  • Both procedures are based on the score function of the accumulated data. The score function in a usual adaptive sequential design is $W(t) = B(t) + \theta t \sim N(\theta t, t)$, where $t$ is the Fisher information time from the beginning of the trial and $B(t) \sim N(0, t)$ is a Brownian motion and a Markov process; thus $W(t)$ is a Brownian motion with drift $\theta t$. The accumulated data in a seamless sequential design, on the other hand, follow $X(t) = X_0 + W(t) = X_0 + B(t) + \theta t$, where $X_0$ is the random variable carried over from the phase 2 stage, $B(t) \sim N(0, t)$, and $t$ is the Fisher information time from the beginning of the phase 3 stage. $X(t)$ is a Markov process, but not a Brownian motion.

  • Let $\theta = (\theta_1, \ldots, \theta_M)$ (for the seamless design), $a = (a_1, \ldots, a_K)$, $x = (x_1, \ldots, x_K)$, and $t = (t_1, \ldots, t_K)$. Both procedures are fundamentally based on an exit function of the form

    $$g(\theta, a, t) = 1 - \int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_K} u(\theta, x, t)\, dx_1 \cdots dx_K,$$
    $$f(\theta, a, t) = 1 - \int_{-\infty}^{a_1} \cdots \int_{-\infty}^{a_K} u(\theta, x, t)\, dx_1 \cdots dx_K,$$

    where $f(\cdot)$ is for the seamless design, $g(\cdot)$ is for the usual sequential design, and $u(\cdot)$ is the joint density function of either $W(t) = (W(t_1), \ldots, W(t_K))$ or $X(t) = (X(t_1), \ldots, X(t_K))$. $f(\theta, a, t)$ is the probability of $X(t)$ exiting the upper boundary $a = (a_1, \ldots, a_K)$ at any $a_i$, and $g(\theta, a, t)$ is the corresponding exit probability of $W(t)$. For any fixed $a$ and $t$, $f(\cdot, a, t)$ is an increasing function of each component of $\theta = (\theta_1, \ldots, \theta_M)$, and $g(\cdot, a, t)$ is an increasing function of $\theta$. By setting $\theta = (\theta_1, \ldots, \theta_M)$ to $(\theta, \ldots, \theta)$ in a seamless sequential design, this property is used to obtain $\theta_\gamma = f^{-1}(\gamma)$ ($g^{-1}(\gamma)$ for the usual sequential design) for $0 < \gamma < 1$; $\theta_{0.5}$ serves as the point estimate for $\theta$, and $(\theta_{\alpha/2}, \theta_{1-\alpha/2})$ forms the $(1-\alpha) \times 100\%$ confidence interval for $\theta$. For any fixed $\theta$, $f(\theta, \cdot, t)$ and $g(\theta, \cdot, t)$ are decreasing functions of each $a_i$; setting $\theta = 0$, or $\theta = (0, \ldots, 0)$, this property is used to obtain the critical boundaries for both sequential designs (a numerical sketch of this monotone inversion is given after this list).

  • Sample size re-estimation (SSR): the SSR for the seamless sequential design involves only conditional probabilities for phase 3, and hence only $W(t)$, the tail of $X(t) = X_0 + W(t)$. Similarly, the SSR of a usual adaptive design also involves only $W(t)$. Hence, the SSR computations for the two adaptive sequential designs are exactly the same (e.g., Gao, Ware, and Mehta 2008); the only difference is in notation.
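The following is a minimal numerical sketch of the exit function and its monotone inversion for the usual sequential design case $g(\theta, a, t)$ (a Brownian motion with drift observed at $K$ looks). The information times and boundary values are assumed for illustration; the production calculations in this article are performed with DACT.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import brentq

def exit_prob(theta, a, t):
    """P(W(t_k) >= a_k for some k), W a Brownian motion with drift theta,
    computed as 1 minus the lower-orthant probability of the joint normal
    vector (W(t_1), ..., W(t_K)) with Cov(W(t_i), W(t_j)) = min(t_i, t_j)."""
    t, a = np.asarray(t, dtype=float), np.asarray(a, dtype=float)
    mean = theta * t
    cov = np.minimum.outer(t, t)
    return 1.0 - multivariate_normal.cdf(a, mean=mean, cov=cov)

def invert_exit(gamma, a, t, lo=-10.0, hi=10.0):
    """Solve exit_prob(theta) = gamma for theta; well-defined because the
    exit function is strictly increasing in the drift."""
    return brentq(lambda th: exit_prob(th, a, t) - gamma, lo, hi)

t = [10.0, 20.0, 30.0]          # information times (assumed)
a = [9.0, 12.0, 11.0]           # upper boundaries on the score scale (assumed)
print(exit_prob(0.0, a, t))     # type I error spent by these boundaries
print(invert_exit(0.5, a, t))   # median-type inversion, theta_{0.5}
```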

Differences:

  • The distribution of $W(t)$ is much simpler because $W(t)$ is a Brownian motion. In a seamless design, the distribution of $X_0$ can be complex, depending on the configuration of $\theta = (\theta_1, \ldots, \theta_M)$, the correlation between the components of $W^0 = (W_1(t_1^0), \ldots, W_M(t_M^0))$, and the selection procedure in the phase 2 stage. In this article, a conservative approach is taken by assuming that the selection follows "play the winner", which yields $X_{0,\max} = \max\{W_1(t_1^0), \ldots, W_M(t_M^0)\}$. This guarantees that $\tilde{X}_0 \leq X_{0,\max}$ if $\tilde{X}_0$ is the result of any other selection procedure; $X_0$ may be influenced by unpredictable factors and hence may itself be unpredictable and unknown. Denote by $f_{X_0}(\theta, a, t)$ the exit function for $X(t) = X_0 + W(t)$. By the dominance theorem, $f_{\tilde{X}_0}(\theta, a, t) \leq f_{X_{0,\max}}(\theta, a, t)$. Hence, the critical boundaries obtained using $f_{X_{0,\max}}(\theta, a, t)$ are larger (and thus more conservative) than those obtained using $f_{\tilde{X}_0}(\theta, a, t)$, and the $\theta_\gamma$'s obtained using $f_{X_{0,\max}}(\theta, a, t)$ are smaller (thus more conservative) than those obtained using $f_{\tilde{X}_0}(\theta, a, t)$. The dominance theorem is the foundation of the seamless sequential design (a Monte Carlo illustration is given after this list).

  • In a usual sequential design that compares a test arm with a control arm, $g(\theta, a, t)$ is an increasing function of the single parameter $\theta$, and the estimate $\theta_\gamma = g^{-1}(\gamma)$ is straightforward. In a seamless sequential design, $f(\theta, a, t)$ is an increasing function of every component of $\theta = (\theta_1, \ldots, \theta_M)$. If $\theta_n$ is the parameter for the selected option, then $X(t) = X_0 + W(t) = X_0 + B(t) + \theta_n t$. Note that $X_0$ depends on the configuration of $\theta = (\theta_1, \ldots, \theta_M)$ and on the correlation between the components of $W^0 = (W_1(t_1^0), \ldots, W_M(t_M^0))$, the vector of score functions at the end of phase 2; in practice, both the configuration and the correlations are unknown. If the phase 2 data were used in the estimation of $\theta_n$, the estimate could not be unbiased because of these unknowns. Hence, a conservative approach is taken by assuming that $\theta = (\theta_1, \ldots, \theta_M) = (\theta_n, \ldots, \theta_n) = (\theta, \ldots, \theta)$ and that the components of $W^0$ are independent, so that $f((\theta, \ldots, \theta), a, t) = f(\theta, a, t)$ becomes a function of the single parameter $\theta$. By the dominance theorem, the estimator $\theta_\gamma$ is conservatively biased (i.e., $\text{median}(\theta_{0.5}) \leq \theta_n$). An important feature of this estimator is its consistency with the hypothesis test: $\theta_{\alpha/2} > 0$ if and only if the test rejects the full null hypothesis $(\theta_1, \ldots, \theta_M) = (0, \ldots, 0)$ at the one-sided $\alpha/2$ level. To obtain an unbiased estimate of $\theta_n$, the phase 2 data must be excluded (Sections 3.5.2 and 3.6.3).

  • By the dominance theorem, conservative estimates of $\theta_\gamma$ and conservative p-values can be obtained using $f((\theta, \ldots, \theta), a, t) = f(\theta, a, t)$, either with a sample size change (Section 3.6.2) or without one (Section 3.5.1). But unbiased estimates may be more desirable than conservative ones. Note that the tail of $X(t) = X_0 + W(t)$ is $W(t)$, which is exactly the Brownian motion of a usual sequential design. Hence, this tail can be used to obtain unbiased $\theta_\gamma$, either with a sample size change (Section 3.6.3) or without one (Section 3.5.2).

  • When there is a sample size change, a backward image is necessary for the estimates. The backward image depends only on the tail $W(t)$; hence the algorithm for obtaining the backward image is exactly as in a usual adaptive sequential design (Gao, Liu, and Mehta 2013).
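As a check on the dominance argument in the first bullet above, the following Monte Carlo sketch (with assumed information times, boundaries, and $M = 2$ options under the global null) compares the exit probability of $X(t) = X_0 + W(t)$ when $X_0$ is the play-the-winner maximum with the exit probability under a random selection. The former bounds the latter, so boundaries calibrated to the maximum control the type I error for any selection rule.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_sim, t0 = 200_000, 10.0
looks = np.array([10.0, 20.0])   # phase 3 information times (assumed)
bounds = np.array([8.0, 10.0])   # upper boundaries on the score scale (assumed)

# Phase 2 score statistics for M = 2 options under the global null,
# with independent components (the conservative assumption of the paper).
W0 = rng.normal(0.0, np.sqrt(t0), size=(n_sim, 2))

def exit_rate(X0):
    """Monte Carlo estimate of P(X(t_k) >= bound_k for some k),
    where X(t) = X0 + B(t) under the null (no drift)."""
    dt = np.diff(np.concatenate(([0.0], looks)))
    B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_sim, len(looks))), axis=1)
    return ((X0[:, None] + B >= bounds).any(axis=1)).mean()

x0_max = W0.max(axis=1)                                    # play-the-winner
x0_rand = W0[np.arange(n_sim), rng.integers(0, 2, n_sim)]  # random selection
print(exit_rate(x0_max), exit_rate(x0_rand))               # first dominates second
```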


Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/10543406.2024.2342518

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

References

  • Armitage, P., C. K. McPherson, and B. C. Rowe. 1969. Repeated significance tests on accumulating data. Journal of the Royal Statistical Society Series A (General) 132 (2):235–244. doi:10.2307/2343787.
  • Bauer, P., F. Bretz, V. Dragalin, F. König, and G. Wassmer. 2016. Twenty-five years of confirmatory adaptive designs: Opportunities and pitfalls. Statistics in Medicine 35 (3):325–347. doi:10.1002/sim.6472.
  • Bauer, P., and M. Kieser. 1999. Combining different phases in the development of medical treatments within a single trial. Statistics in Medicine 18 (14):1833–1848. doi:10.1002/(SICI)1097-0258(19990730)18:14<1833:AID-SIM221>3.0.CO;2-3.
  • Bretz, F., H. Schmidli, F. Koenig, A. Racine, and W. Maurer. 2006. Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: General concepts. Biometrical Journal 48 (4):623–634. doi:10.1002/bimj.200510232.
  • Chen, C., K. Anderson, D. V. Mehrotra, E. H. Rubin, and A. Tse. 2018. A 2-in-1 adaptive phase 2/3 design for expedited oncology drug development. Contemporary Clinical Trials 64:238–242. doi:10.1016/j.cct.2017.09.006.
  • Gao, P., L. Y. Liu, and C. Mehta. 2013. Exact inference for adaptive group sequential designs. Statistics in Medicine 32 (23):3991–4005. doi:10.1002/sim.5847.
  • Gao, P., L. Y. Liu, and C. Mehta. 2014. Adaptive sequential testing for multiple comparisons. Journal of Biopharmaceutical Statistics 24 (5):1035–1058. doi:10.1080/10543406.2014.931409.
  • Gao, P., J. H. Ware, and C. Mehta. 2008. Sample size re-estimation for adaptive sequential designs. Journal of Biopharmaceutical Statistics 18 (6):1184–1196. doi:10.1080/10543400802369053.
  • Glimm, E., M. Bezuidenhoudt, A. Caputo, and W. Maurer. 2018. A testing strategy with adaptive dose selection and two endpoints. Statistics in Biopharmaceutical Research 10 (3):196–203. doi:10.1080/19466315.2018.1497531.
  • Huffer, F. 1986. Slepian’s inequality via the central limit theorem. Canadian Journal of Statistics 14 (4):367–370. doi:10.2307/3315195.
  • Jennison, C., and B. W. Turnbull. 1997. Group sequential analysis incorporating covariance information. Journal of the American Statistical Association 92 (440):1330–1341. doi:10.1080/01621459.1997.10473654.
  • Jennison, C., and B. W. Turnbull. 2000. Group sequential methods with applications to clinical trials. Boca Raton, FL: Chapman & Hall/CRC.
  • Jin, M., and P. Zhang. 2022. A seamless adaptive 2-in-1 design expanding a phase 2 trial for treatment or dose selection into a phase 3 trial. Statistics in Biopharmaceutical Research 14 (3):334–341. doi:10.1080/19466315.2021.1914717.
  • Klinglmueller, F., M. Posch, and F. Koenig. 2014. Adaptive graph-based multiple testing procedures. Pharmaceutical Statistics 13 (6):345–356. doi:10.1002/pst.1640.
  • Koenig, F., W. Brannath, F. Bretz, and M. Posch. 2008. Adaptive Dunnett tests for treatment selection. Statistics in Medicine 27 (10):1612–1625. doi:10.1002/sim.3048.
  • Li, R., L. Wu, R. Liu, and J. Lin. 2022. Flexible seamless 2-in-1 design with sample size adaptation. arXiv preprint. https://arxiv.org/abs/2212.11433.
  • Marcus, R., E. Peritz, and K. R. Gabriel. 1976. On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63 (3):655–660. doi:10.1093/biomet/63.3.655.
  • O’Brien, P. C., and T. R. Fleming. 1979. A multiple testing procedure for clinical trials. Biometrics 35 (3):549–556. doi:10.2307/2530245.
  • Pocock, S. J. 1977. Group sequential methods in the design and analysis of clinical trials. Biometrika 64 (2):191–199. doi:10.1093/biomet/64.2.191.
  • Slepian, D. 1962. The one-sided barrier problem for Gaussian noise. Bell System Technical Journal 41 (2):463–501. doi:10.1002/j.1538-7305.1962.tb02419.x.
  • Stallard, N., and T. Friede. 2008. A group-sequential design for clinical trials with treatment selection. Statistics in Medicine 27 (29):6209–6227. doi:10.1002/sim.3436.
  • Stallard, N., and S. Todd. 2003. Sequential designs for phase III clinical trials incorporating treatment selection. Statistics in Medicine 22 (5):689–703. doi:10.1002/sim.1362.
  • Stallard, N., and S. Todd. 2010. Seamless phase II/III designs. Statistical Methods in Medical Research 20 (6):623–634. doi:10.1177/0962280210379035.
  • Sugitani, T., T. Hamasaki, and C. Hamada. 2013. Partition testing in confirmatory adaptive designs with structured objectives. Biometrical Journal 55 (3):341–359. doi:10.1002/bimj.201200218.
  • U.S. Department of Health and Human Services, Food and Drug Administration (CDER and CBER). 2019. Adaptive design clinical trials for drugs and biologics: Guidance for industry.
  • Whitehead, J. 1997. The design and analysis of sequential clinical trials. Rev. 2nd ed. West Sussex, England: John Wiley & Sons.
  • Zhang, P., X. Li, K. Lu, and C. Wu. 2022. A 2-in-1 adaptive design to seamlessly expand a selected dose from a phase 2 trial to a phase 3 trial for oncology drug development. Contemporary Clinical Trials 122:106931. doi:10.1016/j.cct.2022.106931.
  • Zhang, P., X. Li, K. Lu, C. Wu, and M. Ge. 2023. A variation of a 2-in-1 adaptive design to seamlessly expand a selected dose from a phase 2 trial to a phase 3 trial for oncology drug development. Contemporary Clinical Trials 127:107119. doi:10.1016/j.cct.2023.107119.