1,815
Views
1
CrossRef citations to date
0
Altmetric
Statistical Innovation in Healthcare: Celebrating the Past 40 Years and Looking Toward the Future - Special issue for the 2021 Regulatory-Industry Statistics Workshop

Power and Sample Size Calculations for the Restricted Mean Time Analysis of Prioritized Composite Endpoints

ORCID Icon
Pages 540-548 | Received 23 Dec 2021, Accepted 25 Jul 2022, Published online: 03 Oct 2022

Abstract

As a new way of reporting treatment effect, the restricted mean time in favor (RMT-IF) of treatment measures the net average time the treated have had a less serious outcome than the untreated over a specified time window. With multiple outcomes of differing severity, this offers a more interpretable and data-efficient alternative to the prototypical restricted mean (event-free) survival time. To facilitate its adoption in actual trials, we develop simple approaches to power and sample size calculations and implement them in user-friendly R programs. In doing so we model the bivariate outcomes of death and a nonfatal event using a Gumbel–Hougaard copula with component-wise proportional hazards structures, under which the RMT-IF estimand is derived in closed form. In a standard set-up for censoring, the variance of the nonparametric effect-size estimator is simplified and computed via a hybrid of numerical and Monte Carlo integrations, allowing us to compute the power and sample size as functions of component-wise hazard ratios. Simulation studies show that these formulas provide accurate approximations in realistic settings. To illustrate our methods, we consider designing a new trial to evaluate treatment effect on the composite outcomes of death and cancer relapse in lymph node-positive breast cancer patients, with baseline parameters calculated from a previous study.

1 Introduction

Robust metrics like the restricted mean survival time (RMST) are increasingly welcomed in the design and analysis of clinical trials (Royston and Parmar Citation2011, Citation2013; Uno et al. Citation2014; McCaw et al. Citation2019). Unlike the traditionally used hazard ratio, the RMST, which measures the average survival time gained within a fixed time window, remains interpretable and nonparametrically estimable whether or not the hazards are proportional over time. When nonfatal events like hospitalization or cancer relapse are added to the survival endpoint, Mao (Citation2021) considered an extension called the restricted mean time in favor (RMT-IF) of treatment, which measures the average time gained having had a less severe outcome. The RMT-IF draws on all events but also properly differentiates fatal versus nonfatal ones in their severity. As a result, it is statistically more efficient and clinically more interpretable than the standard restricted mean event-free survival time (RMEST), the RMST applied to the first composite event (Freemantle et al. Citation2003; Anker and McMurray Citation2012).

To adopt a new metric in real trials, we need to be able to calculate its power and sample size. While the specifics necessarily depend on the metric itself, such calculations always revolve around two central quantities: an effect-size estimand for the between-group difference and a variance measure for the within-group variation. Collectively the “signal” and “noise” give rise to the noncentrality parameter under the alternative hypothesis, upon which the power and sample size depend (see e.g., Ch. 14 of van der Vaart Citation1998). In this regard the RMST enjoys a natural advantage—its estimand is a simple integral of the survival function, and the corresponding plug-in Kaplan–Meier estimator can be easily tackled in the martingale framework (Fleming and Harrington Citation1991), resulting in a closed-form variance (Tian et al. Citation2018; Yung and Liu Citation2020; Eaton et al. Citation2020).

Unfortunately, the RMST’s ease with such derivations does not extend to composite endpoints. Although the RMT-IF estimand can be similarly expressed as an integral of the survival functions for the component events, the correlations between them make the resulting estimator resistant to standard martingale theory (Wei et al. Citation1989; Lin et al. Citation2000). In fact, Mao (Citation2021) derived the variance of the RMT-IF estimator only through its robust influence functions (Tsiatis Citation2006), without an analytic form. Moreover, to describe the multiple outcomes realistically, one would need a joint model that accommodates not only between-component correlations but also, ideally, component-specific treatment effects.

Similar challenges exist in the power and sample size calculations for the win ratio (Pocock et al. Citation2012), a pairwise comparison scheme to analyze composite endpoints, and have recently been addressed by Mao et al. (Citation2021) via a combination of copula models (Oakes Citation1989), U-statistic theory (Hoeffding Citation1948), and numerical computations. Although the RMT-IF is also pairwise defined, its estimand is reducible to individual-based quantities (namely, the component-wise survival functions). This makes variance calculation easier than it is for win-ratio-like U-statistics (Mao et al. Citation2021). More importantly, unlike the win ratio (Bebu and Lachin Citation2016; Mao Citation2019; Dong et al. Citation2020), the RMT-IF as an estimand is defined through the uncensored outcomes (as is the RMST) (Akacha et al. Citation2017; Akacha, Bretz and Ruberg Citation2017; ICH Citation2020). This means that we can calculate the effect size directly from the outcome distributions without numerical averaging over the censoring distribution (Mao et al. Citation2021). On the other hand, the RMT-IF involves an extra parameter, that is, the restricting time, which affects the interpretation of the effect size as well as the operating characteristics of the test. A flexible framework for trial design should allow the user to vary the choice of this important parameter.

We aim to develop tools that are fully aligned with the above features of the RMT-IF. Section 2 lays out the main steps. We start off with a bivariate copula model under which the survival and nonfatal event times each follow a proportional hazards model with constant baseline rate. We allow the treatment hazard ratio (HR) to be different on death and the nonfatal event. The component-specific HRs are translated to the effect size on the RMT-IF by an analytic function, yielding the “signal” of the test. For the “noise” part, we simplify the robust influence functions (Mao Citation2021) into an integral of martingale-like processes under a standard set-up for censoring. The restricting time appears in the upper limit of this integral. Thanks to this expression, we only need a one-off numerical routine to compute the variances for a spectrum of restricting times. These quantities not only provide us ready estimates of power and sample size, but can also be used to retrieve other design information such as the length of patient accrual, which simultaneously affects the power (via administrative censoring) and sample size. The proposed methods are vetted by simulations in Section 3 and are illustrated on the design of a future breast cancer trial in Section 4 using baseline parameters estimated from an existing study. Section 5 concludes the article with some practical guidelines and discussions of future work.

2 Methods

2.1 Set-Up

Let D(a) denote the survival time and T(a) the nonfatal event time for a generic patient in group a, where a = 1 and 0 indicate the treatment and control, respectively. To derive usable formulas for RMT-IF’s effect size and variance, we need a simple and realistic joint model for (D(a),T(a)). To that purpose, consider the Gumbel–Hougaard copula model, a popular choice for composite outcomes (Oakes Citation1989; Luo et al. Citation2015; Oakes Citation2016; Mao et al. Citation2021). Specifically, let(1) Pr(D(a)>s,T(a)>t)=exp([{exp(aθD)λDs}κ+{exp(aθH)λHt}κ]1/κ),(1) where (λD,θD) and (λH,θH) are parameters related to the marginal distributions of D(a) and T(a), respectively, and κ1 determines their Kendall’s rank correlation C by C=1κ1 (Oakes Citation1989). Clearly, model (1) implies marginal exponential distributions for both death and the nonfatal event, the former with baseline hazard rate λD and treatment-to-control HR exp(θD), and the latter with baseline hazard rate λH and treatment-to-control HR exp(θH). Of note, we posit (1) only as a working model to derive the properties of the original nonparametric estimator for the RMT-IF (Mao Citation2021), not to construct a parametric one based on this model.

2.2 Expression of Effect Size

The RMT-IF was originally formulated on multistate outcomes Y(a)(t) from group a (a = 1, 0) with state space {0,1,,K,} for some K+. A higher number indicates a worse state and indicates death. Let τ>0 be a prespecified restricting time. By generalized pairwise comparison (Buyse Citation2010), the average time patients in group a fare better than those in group 1a over [0,τ] is(2) Wa,1a(τ)=E[0τI{Y(a)(t)<Y(1a)(t)}dt]=0τPr{Y(a)(t)<Y(1a)(t)}dt,(2) where I(·) is the indicator function. To measure the net favorability of the treatment against control, the RMT-IF considers the difference μ(τ)=W1,0(τ)W0,1(τ). We can interpret μ(τ) as the net average time gained by the treatment in a more favorable state as compared to the control. For a two-state life-death model, we can show that μ(τ) reduces to the group difference in the RMST. Like the standard RMST, μ(τ) is defined through uncensored outcomes. This ensures that the estimand has a scientifically consistent meaning that does not depend on specific studies.

In our set-up with death prioritized over the nonfatal event, we may write Y(a)(t)=I(T(a)t)+·I(D(a)t). Hence, Y(a)(t)=0,1, and if the patient has been event-free, experienced the nonfatal event but remained alive, and died, respectively, by time t. As a result, Y(a)(t)<Y(1a)(t) means one of two mutually exclusive scenarios: (i) event-free versus having experienced the nonfatal event but alive, that is, T˜(a)>t and T(1a)t<D(1a), where T˜(a)=D(a)T(a) (with bc=min(b,c)); (ii) alive (regardless of the nonfatal event) versus dead, that is, D(a)>t and D(1a)t. By (2), we have thatWa,1a(τ)=0τPr(T˜(a)>t,T(1a)t<D(1a))dt+0τPr(D(a)>t,D(1a)t)dt=0τS˜(a)(t){S(1a)(t)S˜(1a)(t)}dt+0τS(a)(t){1S(1a)(t)}dt,where S(a)(t)=Pr(D(a)>t) and S˜(a)(t)=Pr(T˜(a)>t). Hence,(3) μ(τ)=W1,0(τ)W0,1(τ)=0τ{S˜(1)(t)S(0)(t)S˜(0)(t)S(1)(t)}dt+0τ{S(1)(t)S(0)(t)}dt.(3)

The second term on the far right hand side of (3) is just an alternate expression of the group difference in the standard RMST, that is, E(D(1)τ)E(D(0)τ) (Royston and Parmar Citation2011).

We have already established that S(a)(t)=exp{exp(aθD)λDt}. It is also easy to see from (1) that S˜(a)(t)=Pr(D(a)>t,T(a)>t)=exp{exp(aθ)λt}, where λ=(λDκ+λHκ)1/κ and θ=κ1log{exp(κθD)λDκ+exp(κθH)λHκ}log(θ). Inserting these expressions into (3) and carrying out the integration, we find that(4) μ(τ)=f{τ|exp(θ)λ+λD,λ+exp(θD)λD}+f{τ|exp(θD)λD,λD},(4) where f(τ|x,y)=x1{1exp(xτ)}y1{1exp(yτ)}. By similar derivation, we can express the group difference in RMST and RMEST as(5) μD(τ)=E(D(1)τ)E(D(0)τ)=f{τ|exp(θD)λD,λD}andμE(τ)=E(T˜(1)τ)E(T˜(0)τ)=f(τ|exp(θ)λ,λ),(5) respectively.

2.3 Test Statistic and Null Variance

In practice, the outcome events may be censored due to study termination or random loss to follow-up. Denote the censoring time of the generic patient in group a by C(a) (a = 1, 0), which is assumed to be independent of (D(a),T(a)). We then observe (X˜(a),δ˜(a),X(a),δ(a)), where X˜(a)=T˜(a)C(a),δ˜(a)=I(T˜(a)C(a)),X(a)=D(a)C(a), and δ(a)=I(D(a)C(a)). Given a random na -sample of (X˜(a),δ˜(a),X(a),δ(a)) for group a (a = 1, 0), one can estimate S˜(a)(t) and S(a)(t) by the usual Kaplan–Meier estimator using the subsamples for (X˜(a),δ˜(a)) and (X(a),δ(a)), respectively, and then insert the estimates in (3) to obtain a nonparametric estimator μ̂(τ) for μ(τ).

Let σ̂2 denote the influence function-based variance estimator for n1/2{μ̂(τ)μ(τ)} given in Proposition 1 of Mao (Citation2021), where n=n1+n0 is the total sample size. Then, a two-sided level-α test (0<α<1) rejects the null hypothesis of no treatment effect (under which θD=θH=0 so that μ(τ)=0) if Sn:=n1/2σ̂(τ)1|μ̂(τ)|>z1α/2, where z1α/2=Φ1(1α/2) and Φ(·) is the cumulative distribution function of the standard normal distribution.

The power of this test depends on the distribution of Sn under the alternative hypothesis HA , when θD and/or θH are nonzero. Under a standard asymptotic setting in which θD,θH=O(n1/2), we can use the normal approximation SnHAN{n1/2σ0(τ)1μ(τ),1}, where σ02(τ) is the asymptotic variance of n1/2μ̂(τ) under the null (see e.g., sec. 14.1 of van der Vaart Citation1998). Assume without loss of generality that μ(τ)>0. Then the rejection rate under HA is approximated by(6) PrHA(Sn>z1α/2)=Φ{n1/2σ0(τ)1μ(τ)z1α/2}.(6)

Conversely, we can use (6) to solve for the sample size n needed for a target power 1β:(7) n=σ02(τ)(z1α/2+z1β)2μ(τ)2.(7)

In both (6) and (7), μ(τ) can be calculated through (4) as a function of θD,θH,λD,λH, and κ. It thus remains to calculate σ02(τ). Unlike the estimand, the variance depends on the censoring distribution, which is heavily influenced by trial design. As a standard set-up, suppose that patients are recruited uniformly over an initial period [0,τb] and are followed until time τc , where τc>τb. This results in administrative censoring times (those due to study termination) distributed as Un[τcτb,τc]. In addition, we assume a constant hazard λL for random loss to follow-up. Combining the two mechanisms together, we have that C(a)Un[τcτb,τc]Expn(λL) so that Pr(C(a)>t)=[{τb1(τct)}1]exp(λLt).

With that we can express in closed form the key quantities in the influence functions for variance estimation. Of particular importance are the “at-risk” probabilitiesπ˜(0)(t)=Pr(X˜(0)>t)=(τctτb1)exp{(λL+λ)t}andπ(0)(t)=Pr(X(0)>t)=(τctτb1)exp{(λL+λD)t}.

With details relegated to the supplementary materials, we can reduce the null variance to σ02(τ)=q1(1q)1ζ02(τ), where q=n1/n is the proportion of patients randomized to the treatment and(8) ζ02(τ)=E(0τ[exp{(λ+λD)t}{δ˜(0)I(X˜(0)t)π˜(0)(X˜(0))1λ0X˜(0)tπ˜(0)(s)1ds}+exp(λDt){1exp(λt)}{δ(0)I(X(0)t)π(0)(X(0))1λD0X(0)tπ(0)(s)1×ds}]dt)2.(8)

The expectand on the right hand side of (8) can be computed via numerical integration, as the integrand consists only of analytic functions. The same strategy can be applied to the outer expectation using closed-form density functions of the data. But that is less convenient than the Monte Carlo approach—generate a large sample of (X˜(0),δ˜(0),X(0),δ(0)) and approximate the expectation by the empirical average. Our numerical experiment shows that a Monte Carlo sample size of N=10,000 combined with M = 500 knots for numerical integration mostly controls the error within ±1%. That level of precision is more than needed for subsequent use. Moreover, a single Monte Carlo sample can be used to evaluate ζ02(τ) for multiple τ (with a variable upper limit for the integral). In R, it takes only about 0.5 sec on a Dell XPS desktop with Intel Core i9-10900X (3.70 GHz) to evaluate a typical instance of ζ02(τ) for a whole spectrum of τ.

2.4 Practical Implementations

So far we have derived the “signal” μ(τ) and “noise” ζ02(τ) needed to calculate the power and sample size. To highlight their dependence on the underlying parameters, we switch to the notation μ(τ;θD,θH,λD,λH,κ) and ζ02(τ;λD,λH,κ,τb,τc,λL), respectively. Thus, (6) and (7) become(9) Power: 1β=Φ{nq(1q)μ(τ;θD,θH,λD,λH,κ)ζ0(τ;λD,λH,κ,τb,τc,λL)z1α/2}(9) and(10) Sample size: n=ζ02(τ;λD,λH,κ,τb,τc,λL)(z1α/2+z1β)2q(1q)μ2(τ;θD,θH,λD,λH,κ),(10) respectively. Usually patients are randomized 1:1 into the treatment and control groups, in which case q = 0.5. Nevertheless, the formulas can also handle unequal allocations should that be desirable in practice. The evaluation of ζ02(τ;λD,λH,κ,τb,τc,λL) is the rate-limiting step. But since it is independent of (θD,θH), we only need a one-off computation to get the power and sample size for different HRs.

Much hassle is saved by using our R functions rmtpower() and rmtss(), which carry out the calculations in (9) and (10), respectively. The rmtpower() function accepts vector-valued n, τ, exp(θD), and exp(θH) (the last two of which must be commensurate), along with scaler-valued λD,λH,κ,τb,τc,λL, and α, and returns a three-dimensional array of power for every combination of the specified sample size, restricting time, and HRs. The rmtss() function works in a similar way except that a vector-valued 1β should be supplied in place of n and a three-dimensional array of sample size will be returned for every combination of the specified target power, restricting time, and HRs. In both functions, the user can choose whether to apply the same to the RMST and RMEST, whose effect sizes are given in (5) and whose variances are obtained as a byproduct of the computation in Section 2.3. More details on this can be found in the supplementary materials.

Flexible use of the power and sample size programs can provide other useful information about trial design. For example, given a plausible estimate of patient accrual rate ρ, the investigator may need to know how long the accrual should last in order to have enough patients to achieve the target power. One can answer that question by plotting n as a function of τb[0,τc], using rmtss(), with other parameters held constant, and then intersecting the curve with the straight line n=ρτb. The x-coordinate of the intersection point is the smallest τb needed to achieve the target power and the y-coordinate is the corresponding sample size. Nonintersecting curves mean that the target power is unattainable under the specified τc and ρ. To attain it, the investigator must either lengthen the trial or hasten recruitment.

Finally, the baseline parameters (λD,λH,κ) for the outcome distribution can be estimated from an external “pilot” study, that is, one conducted earlier with a similar patient population. We can do this using the simple estimators proposed in Mao et al. (Citation2021). Specifically, let (X˜i(0),δ˜i(0),Xi(0),δi(0)) (i=1,,N0) denote an N0-random sample of (X˜(0),δ˜(0),X(0),δ(0)) to be used as “pilot” data. We can then estimate λD , κ, and λH byλ̂D=i=1N0δi(0)i=1N0Xi(0),κ̂=log(1λ̂H#/λ̂)/log(λ̂D/λ̂),andλ̂H=λ̂H#1/κ̂λ̂11/κ̂, respectively, whereλ̂=i=1N0δ˜i(0)i=1N0X˜i(0) andλ̂H#=i=1N0I(X˜(0)<X(0))i=1N0X˜i(0).

These estimators are justified in sec. 3.2 of Mao et al. (Citation2021) and implemented in the R function gumbel.est().

3 Simulation Studies

We used simulations to assess the accuracy of the power and sample size formulas. We first focused on the power function in (9) for the RMT-IF in comparison with those for the RMST and RMEST given in Section S1.2 of the supplementary materials. Let τb=3,τc=4, and λL=0.01. For the outcome model (1), we set λD=0.05,0.4, λH=0.8, and κ=1.5 (giving rise to Kendall’s rank correlation 33.3%). With λD=0.05, the observed mortality rate is about 10% and nonfatal event rate about 80%; with λD=0.4, the observed mortality rate is about 55% and nonfatal event rate about 65%.

Fixing exp(θD)=0.8, we plot the theoretical power curves of the RMT-IF, RMST, and RMEST as functions of exp(θH) in under sample size n = 2000, Type I error rate α=0.05, and treatment-to-control randomization ratio 1:1 (q = 0.5). Whether λD=0.05 or 0.4, the power curve of the RMST stays flat with regard to exp(θH) as the test does not draw on the nonfatal event at all. When mortality rate is low, the RMST is easily outperformed by both RMT-IF and RMEST as soon as the nonfatal event takes on a moderate treatment effect. When mortality rate is high, death alone is sufficient to power the RMST. On the other hand, the RMEST can be severely underpowered when the treatment has minimal effect on the nonfatal event. This is not surprising as the composite first event is still dominated by the nonfatal type, which can mask the beneficial effect on overall survival. By contrast, the RMT-IF, with overall survival prioritized, stays largely robust to the varying effect size on the nonfatal component. The RMT-IF’s general power advantage over the two competitors also owes to its using more events per patient.

Fig. 1 Theoretical and empirical power of the RMT-IF, RMST, and RMEST under model (1) with λH=0.8, exp(θD)=0.8 and n = 2000 as a function of exp(θH). Dashed lines, theoretical power curve; round dots, empirical power based on 5000 replicates.

These comparisons are based on theoretical formulas, albeit numerically evaluated. We also computed the empirical rejection rates of the three tests under different exp(θ) based on 5000 replicates of simulated data, and then overlaid them on the theoretical curves in . We can see that the empirical values roughly coincide with the theoretical predictions, showing the validity of the asymptotic formulas in finite samples.

Next, we evaluated the sample size formula (10) for the RMT-IF. We used the same set-up as in the first simulations except that we set a moderate death hazard λD=0.2 and varied κ over 1.0, 2.0, and 3.0 (corresponding to between-component Kendall’s rank correlations 0,50%, and 66.7%, respectively, Oakes Citation1989). The observed mortality rate under this set-up is about 35% (thus, censoring rate 65%) and nonfatal event rate about 75%. We considered randomization ratios 1:1 (q = 0.5) and 2:1 (q = 0.67). For each scenario, the sample size n required to achieve power 1β=0.8 or 0.9 is computed by (10), and then 5000 samples of size n are generated to compute the empirical power of the RMT-IF test. shows strong agreement between the empirical and target power in all scenarios, especially for large n, indicating the overall accuracy of the formula. We also considered set-ups with lower and higher censoring rates in the supplementary materials and the results are comparable to those in .

Table 1 Simulation results on the empirical power of sample sizes computed from (10) under model (1) with HRs exp(θD) and exp(θH).

Finally, we turned to the problem of finding the minimum length of accrual τb given a fixed accrual rate ρ and total length of follow-up τc . We used the same set-up as that of except that we considered a variable τb while fixing exp(θD)=exp(θH)=0.8,κ=1.5, and τ = 3 for simplicity. For target power 1β=0.8 and 0.9, we plotted the sample size n computed by (10) as a function of τb in . As τb approaches τc , the administrative censoring time becomes stochastically shorter, driving up the censoring rate and thus the total number of patients needed. Then, as suggested in Section 2.4, we found the intersection points of the sample size curve with n=ρτb for ρ=400,800 per unit time, as shown in . Based on the coordinates (τb,n) of each intersection point, we simulated 5000 samples to compute the empirical power of the RMT-IF test. As shown in , the empirical power is approximately equal to the target power under which τb and n are derived.

Fig. 2 Finding the minimum τb and n under fixed accrual rate ρ and total trial duration τc=4. Dashed line, ρ = 400; dotted line, ρ = 800.

Table 2 Simulation results on the empirical power of n and τb computed from .

4 A Real Example

Lymph node-positive breast cancer occurs when cancerous cells spread from the primary location to nearby lymph nodes. Left untreated, the malignancy is likely to metastasize to other parts of the body, threatening the patient’s life. The standard treatment of node-positive breast cancer is chemotherapy, sometimes with hormonal supplements. In the late 1980s, the German Breast Cancer (GBC) study group evaluated the effectiveness of different cycles of chemotherapy and of hormonal treatment with tamoxifen on 720 node-positive breast cancer patients in a 2 × 2 design (Schmoor et al. Citation1996). The study showed moderate benefits of longer duration of chemotherapy and hormonal treatment on the patient’s relapse-free survival time. On the other hand, both chemotherapy and hormonal treatment are associated with severe side effects such as hair loss, nausea, diarrhea, and heart problems.

Suppose that an investigational treatment is to be tested against the standard care of node-positive breast cancer patients. Patient outcomes can be analyzed using the RMT-IF with cancer relapse as the nonfatal event along with death. We can use our power and sample size formulas to design the new trial, with baseline parameters estimated from the GBC study. To do so, we consider the same subset of 686 patients analyzed in Sauerbrei and Royston (Citation1999). With median and maximum lengths of follow-up of 3.8 and 7.2 years, respectively, the observed mortality rate is about 25% and relapse rate about 38%. More formally, using the procedures described in Section 2.4, we find that λ̂D=0.069 year– 1, λ̂H=0.131 year– 1, and κ̂=3.9 (the last of which means that Kendall’s rank correlation between death and relapse is 1κ̂175%). Collectively they lead to a relapse-free survival hazard of λ̂=0.134 year– 1. To check the validity of (1) as a working model, especially the assumption of constant hazard rates, we plot the Nelsen–Aalen estimates of the component-wise and overall cumulative hazards in Figure S1 of the supplementary materials. The nonparametric curves appear well approximated by the model-based straight lines, indicating a reasonable fit of the Gumbel–Hougaard model. Based on the hazard rate estimates, the 5-year overall and relapse-free survival rates are 71% and 51%, respectively; the 5-year RMST and RMEST are 4.2 and 3.6 years, respectively. These estimates are roughly in line with common epidemiologic evidence on node-positive breast cancer patients at large (Chu et al. Citation1996).

To size the new trial for a 5-year RMT-IF test with level 0.05, suppose that the trial will last τc=7 years in total, with the first τb=3 years devoted to patient enrollment. With randomization ratio 1:1 and 2:1, the sample size as a function of the HRs exp(θD) and exp(θH) is computed by (10) and displayed as surface plots in . In particular, under randomization ratio 1:1 and HRs {exp(θD),exp(θH)}=(0.6,0.6) and (0.9,0.9), we need n = 471 and 8450 to achieve 80% power, and n = 630 and 11,312 to achieve 90% power, respectively.

Fig. 3 Sample size needed to power a 5-year RMT-IF test as a function of the HRs on death and relapse with baseline parameters estimated from the GBC data.

To explore the relationship between the rate and duration of patient accrual, we compute τb as a function of ρ under the fixed τc=7 years using the method described in Section 2.4. For varying effect sizes, the corresponding τb and associated sample size n are tabulated in . Given a plausible estimate of the accrual rate, the investigator can use this table to look up the length of accrual needed to reach the target power.

Table 3 Duration of accrual τb (years) and sample size n under different accrual rate ρ (year– 1) with baseline parameters estimated from the GBC data.

Finally, it should be noted that in some studies a new line of treatment is initiated for patients upon relapse of cancer. The treatment switch as an intercurrent event (ICH Citation2020) is better accounted for as a competing risk (Conner and Trinquart Citation2021). In such cases, the RMT-IF can be estimated using the cumulative incidence functions of the component events as suggested in sec. 3.2 of Mao (Citation2021). Meanwhile, sample size calculation needs to be redesigned accordingly.

5 Concluding Remarks

Under a bivariate copula model, we have derived power and sample size formulas for the RMT-IF, a new way to analyze prioritized composite endpoints of death and nonfatal events. Our approach allows for between-component correlation (quantifiable through data on previous studies) as well as component-specific treatment effects. The simplicity of the developed routines owes to the analytic expression of the estimand and the simplified form of the null variance, which together make it possible to compute the noncentrality parameter efficiently for varying restricting times. Investigators can now use the R-package rmt (https://cran.r-project.org/package=rmt) to design new trials based on the RMT-IF analysis.

The Gumbel–Hougaard copula in (1) is particularly convenient working model for the RMT-IF as it leads to a simple expression for the estimand. Other copulas may not be nearly as nice in that regard (Nelsen Citation2007). However, the model is also highly restrictive, especially concerning the time-constancy of hazard rates. In practice, it is recommended that the investigator perform model checking using data from similar studies, like we did in Section 4, before using the formulas to design new trials. In the future, it will be of interest to relax the working model to allow for time-dependent and non-proportional hazard structures, similarly to Yung and Liu (Citation2020) in the univariate case, and to try out other joint models, for example, Clayton copulas (Clayton Citation1978), for sensitivity analysis.

We have considered only the case with a single occurrence of nonfatal event besides death. In some trials, especially those in chronic diseases, one patient can experience an adverse event repeatedly (Rogers et al. Citation2014; Mao and Kim Citation2021). In the recent INfluenza Vaccine to Effectively Stop CardioThoracic Events and Decompensated Heart Failure (INVESTED) trial (NCT02787044), for example, a high-risk cardiovascular patient could be hospitalized up to eight times per influenza season (Vardeny et al. 2021). Such recurrent events can be analyzed by the RMT-IF equally well, only with a slightly modified definition of “a better outcome” (i.e., being alive or having had fewer recurrent events). However, joint models like (1) are ill suited for recurrent events because they rarely hold when the event times are inherently ordered (Mao et al. Citation2022). Consequently, a new framework is needed for trial design.

We have focused on the power and sample size calculations under a fixed design. For ethical as well as financial considerations, real trials often prefer, or even mandate, interim analysis in the course of follow-up to assess whether the trial should be stopped early (Jennison and Turnbull Citation1999). With a model-free metric like the RMST and RMT-IF, group sequential analysis is complicated by the lack of an explicit correlation structure between the interim test statistics. Recently, Luo et al. (Citation2019) obtained simple results for group sequential RMST tests assuming piecewise exponential distributions for the event and censoring times. Lu and Tian (Citation2021) detailed a fully nonparametric approach which determines the rejection regions empirically using the data available at each interim look. The design and implementation of group sequential RMT-IF analysis remain an open problem.

Supplemental material

Supplements.zip

Download Zip (90.5 KB)

Supplementary Materials

Supplementary Materials online include technical details and additional simulation results referenced in Sections 2 and 3, as well as the R-code and data to reproduce the analysis in Section 4. The methods proposed are implemented in the R-package rmt openly available on the Comprehensive R Archive Network (https://cran.r-project.org/package=rmt).

Additional information

Funding

This research was supported by the National Institutes of Health grant R01HL149875.

References

  • Akacha, M., Bretz, F., Ohlssen, D., Rosenkranz, G., and Schmidli, H. (2017), “Estimands and their Role in Clinical Trials,” Statistics in Biopharmaceutical Research, 9, 268–271. DOI: 10.1080/19466315.2017.1302358.
  • Akacha, M., Bretz, F., and Ruberg, S. (2017), “Estimands in Clinical Trials–Broadening the Perspective,” Statistics in Medicine, 36, 5–19. DOI: 10.1002/sim.7033.
  • Anker, S. D., and McMurray, J. V. (2012), “Time to Move on from “Time-to-First”: Should All Events be Included in the Analysis of Clinical Trials,” European Heart Journal, 33, 2764–2765.
  • Bebu, I., and Lachin, J. M. (2016), “Large Sample Inference for a Win Ratio Analysis of a Composite Outcome Based on Prioritized Components,” Biostatistics, 17, 178–187. DOI: 10.1093/biostatistics/kxv032.
  • Buyse, M. (2010), “Generalized Pairwise Comparisons of Prioritized Outcomes in the Two-sample Problem,” Statistics in Medicine, 29, 3245–3257. DOI: 10.1002/sim.3923.
  • Chu, K. C., Tarone, R. E., Kessler, L. G., Ries, L. A., Hankey, B. F., Miller, B. A., and Edwards, B. K. (1996), “Recent Trends in US Breast Cancer Incidence, Survival, and Mortality Rates,” JNCI: Journal of the National Cancer Institute, 88, 1571–1579. DOI: 10.1093/jnci/88.21.1571.
  • Clayton, D. G. (1978), “A Model for Association in Bivariate Life Tables and its Application in Epidemiological Studies of Familial Tendency in Chronic Disease Incidence,” Biometrika, 65, 141–151. DOI: 10.1093/biomet/65.1.141.
  • Conner, S. C., and Trinquart, L. (2021), “Estimation and Modeling of the Restricted Mean Time Lost in the Presence of Competing Risks,” Statistics in Medicine, 40, 2177–2196. DOI: 10.1002/sim.8896.
  • Dong, G., Huang, B., Chang, Y.-W., Seifu, Y., Song, J., and Hoaglin, D. C. (2020), “The Win Ratio: Impact of Censoring and Follow-up Time and Use with Nonproportional Hazards,” Pharmaceutical Statistics, 19, 168–177. DOI: 10.1002/pst.1977.
  • Eaton, A., Therneau, T., and Le-Rademacher, J. (2020), “Designing Clinical Trials with (Restricted) Mean Survival Time Endpoint: Practical Considerations,” Clinical Trials, 17, 285–294. DOI: 10.1177/1740774520905563.
  • Fleming, T. R., and Harrington, D. P. (1991), Counting Processes and Survival Analysis, Hoboken, NJ: Wiley.
  • Freemantle, N., Calvert, M., Wood, J., Eastaugh, J., and Griffin, C. (2003), “Composite Outcomes in Randomized Trials: Greater Precision but With Greater Uncertainty,” Journal of the American Medical Association, 289, 2554–2559. DOI: 10.1001/jama.289.19.2554.
  • Hoeffding, W. (1948), “A Class of Statistics With Asymptotically Normal Distributions,” Annals of Mathematical Statistics, 19, 293–325. DOI: 10.1214/aoms/1177730196.
  • ICH. (2020), ICH E9 (R1) Addendum on Estimands and Sensitivity Analysis in Clinical Trials to the Guideline on Statistical Principles for Clinical Trials, Step 5, London: European Medicines Evaluation Agency. Accessed November 29, 2019.
  • Jennison, C., and Turnbull, B. W. (1999), Group Sequential Methods with Applications to Clinical Trials, Boca Raton, FL: Chapman & Hall/CRC.
  • Lin, D. Y., Wei, L.-J., Yang, I., and Ying, Z. (2000), “Semiparametric Regression for the Mean and Rate Functions of Recurrent Events,” Journal of the Royal Statistical Society, Series B, 62, 711–730. DOI: 10.1111/1467-9868.00259.
  • Lu, Y., and Tian, L. (2021), “Statistical Considerations for Sequential Analysis of the Restricted Mean Survival Time for Randomized Clinical Trials,” Statistics in Biopharmaceutical Research, 13, 210–218. DOI: 10.1080/19466315.2020.1816491.
  • Luo, X., Huang, B., and Quan, H. (2019), “Design and Monitoring of Survival Trials based on Restricted Mean Survival Times,” Clinical Trials, 16, 616–625. DOI: 10.1177/1740774519871447.
  • Luo, X., Tian, H., Mohanty, S., and Tsai, W. Y. (2015), “An Alternative Approach to Confidence Interval Estimation for the Win Ratio Statistic,” Biometrics, 71, 139–145. DOI: 10.1111/biom.12225.
  • Mao, L. (2019), “On the Alternative Hypotheses for the Win Ratio,” Biometrics, 75, 347–351. DOI: 10.1111/biom.12954.
  • Mao, L. (2021), “On Restricted Mean Time in Favor of Treatment,” Biometrics. DOI: 10.1111/biom.13570..
  • Mao, L., and Kim, K. (2021), “Statistical Models for Composite Endpoints of Death and Non-fatal Events: A Review,” Statistics in Biopharmaceutical Research, 13, 260–269. DOI: 10.1080/19466315.2021.1927824.
  • Mao, L., Kim, K., and Li, Y. (2022), “On Recurrent-Event Win Ratio,” Statistical Methods in Medical Research, 31, 1120–1134. DOI: 10.1177/09622802221084134.
  • Mao, L., Kim, K., and Miao, X. (2021), “Sample Size Formula for General win Ratio Analysis,” Biometrics. DOI: 10.1111/biom.13501..
  • McCaw, Z. R., Yin, G., and Wei, L.-J. (2019), “Using the Restricted Mean Survival Time Difference as an Alternative to the Hazard Ratio for Analyzing Clinical Cardiovascular Studies,” Circulation, 140, 1366–1368. DOI: 10.1161/CIRCULATIONAHA.119.040680.
  • Nelsen, R. B. (2007), An Introduction to Copulas, New York: Springer.
  • Oakes, D. (1989), “Bivariate Survival Models Induced by Frailties,” Journal of the American Statistical Association, 84, 487–493. DOI: 10.1080/01621459.1989.10478795.
  • Oakes, D (2016), “On the Win-Ratio Statistic in Clinical Trials With Multiple Types of Event,” Biometrika, 103, 742–745.
  • Pocock, S., Ariti, C., Collier, T., and Wang, D. (2012), “The Win Ratio: A New Approach to the Analysis of Composite Endpoints in Clinical Trials Based on Clinical Priorities,” European Heart Journal, 33, 176–182. DOI: 10.1093/eurheartj/ehr352.
  • Rogers, J. K., Pocock, S. J., McMurray, J. J., Granger, C. B., Michelson, E. L., Östergren, J., Pfeffer, M. A., Solomon, S. D., Swedberg, K., and Yusuf, S. (2014), “Analysing Recurrent Hospitalizations in Heart Failure: A Review of Statistical Methodology, With Application to CHARM-Preserved,” European Journal of Heart Failure, 16, 33–40. DOI: 10.1002/ejhf.29.
  • Royston, P., and Parmar, M. K. (2011), “The Use of Restricted Mean Survival Time to Estimate the Treatment Effect in Randomized Clinical Trials when the Proportional Hazards Assumption is in Doubt,” Statistics in Medicine, 30, 2409–2421. DOI: 10.1002/sim.4274.
  • Royston, P., and Parmar, M. K. (2013), “Restricted Mean Survival Time: An Alternative to the Hazard Ratio for the Design and Analysis of Randomized Trials with a Time-to-Event Outcome,” BMC Medical Research Methodology, 13, 152.
  • Sauerbrei, W., and Royston, P. (1999), “Building Multivariable Prognostic and Diagnostic Models: Transformation of the Predictors by using Fractional Polynomials,” Journal of the Royal Statistical Society, Series A, 162, 71–94. DOI: 10.1111/1467-985X.00122.
  • Schmoor, C., Olschewski, M., and Schumacher, M. (1996), “Randomized and Non-randomized Patients in Clinical Trials: Experiences with Comprehensive Cohort Studies,” Statistics in Medicine, 15, 263–271. DOI: 10.1002/(SICI)1097-0258(19960215)15:3<263::AID-SIM165>3.0.CO;2-K.
  • Tian, L., Fu, H., Ruberg, S. J., Uno, H., and Wei, L.-J. (2018), “Efficiency of Two Sample Tests via the Restricted Mean Survival Time for Analyzing Event Time Observations,” Biometrics, 74, 694–702. DOI: 10.1111/biom.12770.
  • Tsiatis, A. (2006), Semiparametric Theory and Missing Data, New York: Springer.
  • Uno, H., Claggett, B., Tian, L., Inoue, E., Gallo, P., Miyata, T., Schrag, D., Takeuchi, M., Uyama, Y., Zhao, L., Skali, H., Solomon, S., Jacobus, S., Hughes, M., Packer, M., and Wei, L.-J. (2014), “Moving Beyond the Hazard Ratio in Quantifying the between-Group Difference in Survival Analysis,” Journal of Clinical Oncology, 32, 2380–2385. DOI: 10.1200/JCO.2014.55.2208.
  • van der Vaart, A. W. (1998), Asymptotic Statistics, Cambridge: Cambridge University Press.
  • Vardeny, O., Kim, K., Udell, J. A., Joseph, J., Desai, A. S., Farkouh, M. E., Hegde, S. M., Hernandez, A. F., McGeer, A., Talbot, H. K., Anand, I., Bhatt, D. L., Cannon, C. P., DeMets, D., Michael Gaziano, J., Goodman, S. G., Nichol, K., Tattersall, M. C., Temte, J. L., Wittes, J., Yancy, C., Claggett, B., Chen, Y., Mao, L., Havighurst, T. C., Cooper, L. S., Solomon, S. D., and INVESTED Committees and Investigators. (2021), “Effect of High-Dose Trivalent vs Standard-Dose Quadrivalent Influenza Vaccine on Mortality or Cardiopulmonary Hospitalization in Patients with High-Risk Cardiovascular Disease: A Randomized Clinical Trial,” JAMA, 325, 39–49. DOI: 10.1001/jama.2020.23649.
  • Wei, L.-J., Lin, D. Y., and Weissfeld, L. (1989), “Regression Analysis of Multivariate Incomplete Failure Time Data by Modeling Marginal Distributions,” Journal of the American Statistical Association, 84, 1065–1073. DOI: 10.1080/01621459.1989.10478873.
  • Yung, G., and Liu, Y. (2020), “Sample Size and Power for the Weighted Log-rank Test and Kaplan-Meier based Tests With Allowance for Nonproportional Hazards,” Biometrics, 76, 939–950. DOI: 10.1111/biom.13196.