533
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Weighted Hazard Ratio Estimation for Delayed and Diminishing Treatment Effect

ORCID Icon &
Received 02 Aug 2022, Accepted 10 Nov 2023, Published online: 05 Jan 2024

Abstract

Nonproportional hazards (NPH) have been observed in confirmatory clinical trials with time to event outcomes. Under NPH, the hazard ratio does not stay constant over time and the log rank test is no longer the most powerful test. The weighted log rank test (WLRT) has been introduced to deal with the presence of nonproportionality. We focus our attention on the WLRT and the complementary Cox model based on time varying treatment effect proposed by Lin and León. We investigate whether the proposed weighted hazard ratio (WHR) approach is unbiased in scenarios where the WLRT statistic is the most powerful test. In the diminishing treatment effect scenario where the WLRT statistic would be most optimal, the time varying treatment effect estimated by the Cox model estimates the treatment effect very close to the true one. However, when the true hazard ratio is large the proposed model overestimates the treatment effect and the treatment profile over time. In the delayed treatment scenario, the estimated treatment effect profile over time is typically close to the true profile. For both scenarios, we have demonstrated analytically that the hazard ratio functions are approximately equal under small treatment effects. When the assumed rate of how quickly the treatment effect profile is diminishing or delaying differs in the analysis from that in the true data generating mechanism, the estimated hazard ratio profile from the WHR approach is biased. Since in practice the true HR time profile may differ from that assumed in the WHR analysis, it may be preferable to use alternative approaches for effect estimation.

1 Introduction

Randomised clinical trials (RCTs) are regarded as the gold standard for evaluating the effectiveness of a new treatment. In RCTs with time to event outcomes, the primary outcome is to measure the differences in the survival curves of the different arms and report the treatment effect to quantify treatment differences. The analysis of such trials are routinely accomplished by the log rank test and the Cox proportional hazard (PH) model. The hazard ratio is obtained from the Cox PH model to measure the effectiveness of the new treatment compared to the control group. The Cox PH model assumes that the hazard ratio between the treatment and control group stays constant from the start of study period until the end of the follow up period. Under PH, the log rank test is the most powerful test to detect differences in the survival curves (Fleming and Harrington Citation1991). However, nonproportional hazards (NPH) are commonly observed in RCTs and have been reported in many published trials for example, the IPASS trial in lung cancer (Mok et al. Citation2009) and the ICON7 trial in ovarian cancer (Kristensen et al. Citation2011). When the PH assumption does not hold, the log rank test is still valid but may suffer substantial power loss and the interpretation of the hazard ratio becomes difficult (Hernán Citation2010). There may be many reasons for non PH to exist in clinical trials, for instance due to delayed clinical effect in the case of immuno-oncology trials.

Various statistical methods have been proposed to analyze time to event outcomes in clinical trials under nonproportional hazards scenarios including restricted mean survival time (RMST) (Royston and Parmar Citation2013), weighted log rank tests (Fleming and Harrington Citation1981) and maximum combination test (MaxCombo) (Lin et al. Citation2020). The RMST measures the average survival time from time t = 0 to some pre-specified time horizon t=τ and it may be estimated as the area under the survival curve up to that time point. The weighted log rank test based on the Fleming-Harrington class of weights Gρ,γ with weight S(t)ρ (1S(t))γ where S(t) is the survival function of the pooled patients from control and treatment group; the parameters ρ, γ allows one to down weight early, late or middle events. The MaxCombo test uses the Fleming-Harrington weight function to analyze time to event outcomes in the presence of non PH. The test avoids having to pre-specify the weight function and considers a procedure which selects the best test from the set of test statistics. The procedure consist of selecting the best combination of weighted log rank test statistics with different choices of ρ and γ. When the MaxCombo approach is used, Lin et al. (Citation2020) suggest using the weighted hazard ratio for effect estimation. The estimated effect is obtained from a Cox model with a time-varying effect, where the weights are those associated with the weighted log rank test (Lin and León Citation2017).

This article focuses on the estimation of treatment effect under nonproportional hazards, particularly the weighted hazard ratio method proposed by Lin and León (Citation2017). They propose fitting a particular Cox model with a time-varying covariate/effect for which the score test corresponds to a weighted log rank test. The model proposes an effect adjustment factor A(t) = w(t)max(w(t)) where w(t) is the weight function from the chosen weighted log rank test. This is incorporated in to the Cox PH model to provide time varying effect which can be viewed as the treatment coefficient β weighted by the adjustment factor A(t). The hazard ratio obtained from the model is expressed as HR(t)=eβA(t) = [HRF]A(t) where HRF represents the maximal effect at time t with A(t) = 1. In this article, we are interested in investigating whether the estimated treatment effect is unbiased under scenarios in which the associated weighted log rank test is the most powerful.

In Section 2, we describe the standard weighted log rank test and the proposed weighted hazard ratio estimation (Lin and León Citation2017) and its estimation of the time-varying HR. In Section 3, we compare the hazard ratio functions obtained from the proposed model with the true hazard ratio functions analytically. In Section 4, we assess, through simulations, the performance of the proposed model under the two non PH scenario with different choices of ρ and γ. Section 5 is a discussion.

2 Weighted Log Rank Tests and Effect Estimation

In this section, we will provide a brief overview of the log rank test and the weighted log rank test, as well as introduce weighted hazard ratio for effect estimation proposed by Lin and León (Citation2017).

2.1 Weighted Log Rank Test

Let t1,t2,…,tr be the ordered failure times across both treatment arms, d1j and d2j be the number of deaths at tj in treatment and control arm, respectively with dj= d1j +d2j and n1j and n2j be the number of individuals at risk before tj with nj = n1j + n2j. Let X1,X2,,Xr denote the treatment assignment where Xj=1 if the jth subject is assigned to treatment arm and Xj=0 if assigned to control arm. The log-rank test statistic is defined as, Z=j=1r(d1je1j)(j=1rVL)1/2,where e1j= n1jdjnj is the expected number of failure at time t(j) under the null hypothesis of equal survival curves. VL is the variance of the observed number of failures, VL = j=1rn1jn2jdj(njdj)nj2(nj1). Under the null hypothesis of equal survival functions in the two treatment groups, asymptotically the test statistic has a Chi-squared distribution on one degree of freedom. Although the log rank test is still a valid test under nonproportional hazards it is not the most powerful test (Royston and Parmar Citation2013). Therefore, the weighted version of log rank test is sometimes used when the ratio of the two hazard functions are not constant over time (Fleming and Harrington Citation1981).

The weighted log rank test (WLRT) assigns a weight function to different time points depending on the expected type of nonproportional hazard scenario. For instance, in immuno-oncology trials there may be a delayed treatment effect and therefore the survival curve for the treatment group will only emerge to separate from the control survival curve after a certain period of time. In this case, higher weights can be allocated to later time points (Lin et al. Citation2020). Formally, the WLRT is defined as, (1) WK=j=1rw(tj)(d1je1j)(j=1rVL)1/2,(1) with variance VK=j=1rw(t(j))2(n1jn2jdj(njdj)nj2(nj1)) where w(·) is the nonnegative weight function. We will consider the Fleming-Harrington family of weighted log rank test, commonly denoted as Gρ,γ with weight, (2) w(t)=Ŝ(t)ρ(1Ŝ(t))γ(2) where Ŝ(t) is the Kaplan- Meier estimate of the survival function of the pooled patients from the treatment and control arm. The parameters ρ and γ control the shape of the weight function. G0,0 represents the log-rank test with w(t) = 1 that has constant weight over time; delayed treatment effects can be tested using G0,γ with w(t)=(1Ŝ(t))γ that allocates higher weight at later time points to detect late survival difference, G1,0 represents diminishing effect with w(t)=Ŝ(t) that gives more weight the earlier time points to detect early separation; G1,1 represents mid separation with w(t)=Ŝ(t)(1Ŝ(t)) that puts more weight at the middle of the follow-up period than at the ends.

2.2 Weighted Hazard Ratios

Lin and León (Citation2017) proposed fitting a Cox model to provide a time varying treatment effect estimate to complement the weighted log rank test. Lin and León propose fitting a Cox model with, λ(t;X)=λ0(t)eA(t)βXwhere A(t) is defined by A(t)=w(t)max(w(s)) max(w(s)) is evaluated over the values of t in the observed dataset and A(t) thus has a maximum value of 1. The time varying covariate X*(t)=A(t)X represents the treatment assignment weighted by the adjustment factor. Once X*(t) has been determined, the constant coefficient β can be estimated. It is shown in the paper that the score test of the null hypothesis that β = 0 in the proposed model above is equivalent to the weighted log rank test with the corresponding choice of weight function (Lin and León Citation2017). Lin and León used the fact that the score test from this Cox model corresponds to a particular weighted log rank to motivate estimating the time-varying treatment effect using this Cox model. The coefficient β can be interpreted as the maximal effect where the model assumes the patients experience the most benefit (A(t) = 1). To differentiate the time varying hazard ratio derived from the proposed method, we will denote this as HRLL and it can be expressed as, (3) HRLL(t)=λ0(t)eA(t)β×1λ0(t)eA(t)β×0=eβA(t)=[HRF]A(t)(3) where HRF=eβ represents the maximal effect (i.e., at time t with A(t)=1). The time profile of the treatment effect can be observed as the treatment coefficient β weighted by the adjustment factor A(t). The estimate derived from the model provides a consistent time profile of the hazard ratio over time given that the true hazard ratio also varies over time as specified by A(t). If this does not hold then the treatment effect measure will in general no longer be unbiased for the true value of the hazard ratio at a given time.

3 Evaluation of Weighted Hazard Ratio Estimation Method

In this section we give expressions for the hazard ratio function for which the Fleming-Harrington (Fleming and Harrington Citation1991) Gρ,γ test statistics are most efficient. We will also be exploring whether, for a given weight function, the Lin and León estimation method (Lin and León Citation2017) gives you the correct treatment effect profile if the true data generating mechanism is such that the corresponding weighted log rank test is the most powerful.

3.1 Diminishing Effect

Fleming and Harrington (Citation1991) showed that the Gρ test statistic is the most powerful test statistic when the hazard ratio at time t is of the following form, (4) λ2(t)λ1(t)=eΔ{S1(t)}ρ+eΔ[1{S1(t)}ρ](4) where S1(t)=exp(0tλ1(x)dx) is the survival function for the control arm. The parameter eΔ represents the hazard ratio at t = 0. The parameter ρ controls how quickly the effect diminishes. Note that the parameter γ is 0 in the (2) under the diminishing effect scenario. The above hazard ratio function (4) has its full treatment effect initially eβ and decreases monotonically to 1. Later, we will demonstrate various examples with ρ=0.5,1,2 to further develop our understanding about the behavior of the weighted logrank test and the proposed model. We now calculate the weighted hazard ratio function proposed by Lin and León (Citation2017) under the set up where the Gρ test statistic would be optimal. In order to calculate the proposed weighted hazard ratio function under the set up where WLRT is optimal, we will need to derive the true pooled survival function which is determined by finding the survival function of the two treatment arms. The hazard function for the treatment arm λ2(t) can be derived, assuming that the true data generating mechanism is such that that WLRT is optimal, by using (4) λ2(t)=λ1(t)eΔ{S1(t)}ρ+eΔ[1{S1(t)}ρ].

Thus, the survival function for the treatment arm is, (5) S2(t)=exp(0tλ2(x)dx)=exp(0tλ1(x)eΔ{S1(x)}ρ+eΔ[1{S1(x)}ρ]dx).(5)

We can replace the adjustment factor A(t) with the weight function that the Lin and León method converges to (as n tends to ) (3), in this case S(t). For the Lin and León (Citation2017) method, we estimate the pooled survival function S(t) by the Kaplan-Meier estimator for the pooled sample which is consistent for the true pooled survival function. Therefore, the Lin and León method consistently estimates, (6) HRLL(t)=eβA(t)=eβ(0.5S1(t)+0.5S2(t))ρ.(6)

Substituting S2(t) in the hazard ratio function (6) for which this WLRT is optimal for gives, (7) HRLL(t)=exp[β(12S1(t))    +12exp(0tλ1(x)eΔ{S1(x)}ρ+eΔ[1{S1(x)}ρ]dx))ρ].(7)

When fitting the proposed model, the β is estimated from the Cox PH model once the adjustment factor A(t) is determined. Comparing the true hazard ratio function (4) with the proposed hazard ratio function (7), we see that the two expressions do not appear to be equal to each other. In Appendix B, we demonstrate that the two hazard ratio functions are approximately equal to each other when Δ is close to 0. We also demonstrate this equality graphically in the next section where we choose the value of eΔ to compare the hazard ratio functions. We also compare the hazard ratio expressions analytically for some particular choices of λ1(t) and eΔ to illustrate in Appendix A.

3.2 Delayed Treatment Effect

The paper by Garès et al. (Citation2017) has provided an expression for the hazard function for which the Fleming-Harrington test Gγ for detecting late effects is optimal. The parameter ρ is 0 in the weight function (2) under the delayed effect scenario. Assume the survival time in the control arm is exponentially distributed with some constant hazard λ1. Then Garès et al. (Citation2017) show that the Gγ test is optimal for testing the hypothesis, (8) H0:λCT=λTRT=λ1H1:λTRT(t)=λCTLγ((Lγ)1(Lγ(eλ1t)+φ))Lγ(eλ1t)(8) where L(x)=x1(1s)sds=1+ln(x)x and L(x)=0.5x1slog(s)+ss2ds. The expression for the hazard ratio has no closed form and needs numerical integration techniques to compute. Garès et al. (Citation2017) has provided the definition for the parameter φ which is evaluated at some chosen time point τ and is of the following form, (9) φ=Lγ(r(1S1(τ))+S1(τ))Lγ(S1(τ))=Lγ(S2(τ))Lγ(S1(τ))(9) where r is the discrepancy rate, r=S2(τ)S1(τ)1S1(τ).

Similar to the diminishing effect scenario, we can check whether the true hazard ratio for which the Gγ test is optimal, as given in (8), agrees with the hazard ratio estimation proposed by Lin and León (Citation2017). The true hazard ratio under which the Gγ test statistic is optimal is shown to be the second equation in (8). Now the weighted hazard ratio model proposed by Lin and León corresponds to, HRLL(t)=eβA(t)=eβw(t)max(w(s))=eβ((1S(t))γmax(w(s)))=eβ((10.5S1(t)0.5S2(t))γmax(w(s))).

From Garès et al. the survival in the treatment and control arm is defined as S2(t)=(L)1(L(eλ1t)+L(S2(τ))L(S1(τ)))=(L)1(L(eλ1t)+φ)and S1(t)=eλ1t (Garès et al. Citation2017). We now substitute in for S1(t) and S2(t), (10) HRLL(t)=eβ((10.5eλ1t0.5L1(L(eλ1t)+φ))γ(10.5eλ1τ0.5L1(L(eλ1τ)+φ))γ).(10)

As in Section 3.1, we show in Appendix C that if the Lin and León method estimates the correct hazard ratio at the time point when it is largest (at t=τ), then it will estimate the HR at all other times correctly too under the condition that φ is small.

4 Simulation Studies

This section investigates the weighted hazard ratio method under diminishing and delayed effect using simulations Lin and León (Citation2017). We will focus on investigating the method proposed by Lin and León and its properties when nonproportional hazards are present in the data. Specifically, we will empirically check the analytical results of Section 3 that showed that the Lin and León approach is approximately unbiased when the treatment effect is small, and how large the bias might be when the treatment effect is larger.

4.1 Scenario I: Diminishing Effect

We simulated the survival data where the treatment is assumed to have its maximal effect initially and decrease monotonically to 1. The survival function for the treatment arm is given in (5). Three different scenarios ρ=0.5,1,2 were considered to understand the behavior of the proposed model under various rates at which the effect diminishes. We also chose three different values of the maximum effect eΔ=0.75,0.50,0.25 to investigate whether substantial bias occurs when the treatment effect is larger. We fitted the Cox model proposed by Lin and León (Citation2017) where the weight function w(t)=Ŝ(t)ρ for the non PH data we simulated is specified for three scenarios ρ=0.5,1,2. The study enrolls 200 patients, equally allocated in the two arms. The survival of the control arm follows an exponential distribution with hazard rate λ1(t)=0.5. The survival in the treatment arm is as per Equationequation (5). The follow up time is assumed to be 3 years; patients whose event time exceeds 3 years are censored. We conducted 5000 runs for each scenario.

shows the true hazard ratio over time for which the Gρ statistic is optimal and the average hazard ratio profile over time estimated by the Lin and León method. The plot displays eβ̂¯S(t)ρ where β̂¯ is the average of the log hazard ratio at t = 0 estimated from the Lin and León method and S(t) is the true pooled survival function. The proposed model slightly overestimates the maximal effect when the true HR at t = 0, that is eΔ, is large, and continues to slightly overestimate during the follow up period. Note that when the treatment effect is small, the proposed model estimates the maximal effect very close to the true one, in line with our analytical results in Appendix B. We observe through our work that when the treatment effect is large, the proposed model is not able to estimate the maximal effect at t = 0 correctly which has an impact on the treatment effect profile over time, in that case the implied profile of treatment effect over time would not match the true one (as given by (4)).

Fig. 1 Hazard ratio over time targeted by weighted hazard ratio model and true hazard ratio over time under a data generating mechanism where the Gρ test is optimal. The solid lines show the true hazard ratio over time for ρ=0.5,1,2. The true hazard ratio for the three scenarios at t = 0 are eΔ=0.75,0.50,0.25 and the hazard rate in the control arm stays the same in all the scenarios λ1(t)=0.5. The dotted lines show the proposed hazard ratio targeted by the weighted hazard ratio model. The hazard ratio for the dotted lines is the average of the hazard ratios estimated from fitting the Lin and León method and is as follows; the average of the log hazard ratio across 5000 simulations is eβ̂¯=0.75,0.497,0.247, eβ̂¯=0.749,0.499,0.246, eβ̂¯=0.748,0.495,0.243 for ρ=0.5,1,2.

Fig. 1 Hazard ratio over time targeted by weighted hazard ratio model and true hazard ratio over time under a data generating mechanism where the Gρ test is optimal. The solid lines show the true hazard ratio over time for ρ=0.5,1,2. The true hazard ratio for the three scenarios at t = 0 are eΔ=0.75,0.50,0.25 and the hazard rate in the control arm stays the same in all the scenarios λ1(t)=0.5. The dotted lines show the proposed hazard ratio targeted by the weighted hazard ratio model. The hazard ratio for the dotted lines is the average of the hazard ratios estimated from fitting the Lin and León method and is as follows; the average of the log hazard ratio across 5000 simulations is eβ̂¯=0.75,0.497,0.247, eβ̂¯=0.749,0.499,0.246, eβ̂¯=0.748,0.495,0.243 for ρ=0.5,1,2.

Next we investigated how the proposed method behaves when the value of ρ is misspecified that is the value used in the analysis differs from that used when simulating data. We set the true maximal HR at t = 0 as 0.75. We chose four different values of ρ for data generating purposes, namely, ρ=0.5,1,1.5,2. We fitted the Cox model proposed by Lin and León (Citation2017) where the weight function w(t)=Ŝ(t) was used, that is, analyzing in all four scenarios assuming ρ = 1. The survival distribution for the control and treatment arm remains the same as in the previous diminishing effect scenario. shows the hazard ratio profile targeted by the weighted hazard ratio approach proposed by Lin and León and the true hazard ratio over time. The solid lines represent the true hazard ratio profiles under the different ρ values, while the dashed lines represent the Lin and León estimated hazard ratio profiles estimated assuming ρ = 1. Note from the black lines that when the data generating value of ρ matches with the value of ρ used in the analysis, the hazard ratio profiles are approximately the same. However, when the value of ρ used in the analysis is incorrect, we can see that the estimated treatment effect differs from the true maximal effect of 0.75. When the data generating value of ρ=0.5 was used, the Lin and León HR profile overestimates until 6 months and then starts to underestimate until the end of the study. Similarly, when the data generating value of ρ is set to 1.5 and 2, the Lin and León method underestimates the maximal effect (0.783 and 0.806, respectively) and the HR profile at the start of the study but quickly starts to overestimate after about 6 months. The model misspecification differs depending on how quickly or slowly the treatment effect is diminishing over time. also shows that when the true treatment effect is diminishing slowly that is when ρ=0.5, then the method overestimates the initial effect, but then underestimates the effect at later times; however, when the true treatment effect is diminishing faster, then it underestimates the initial effect, but then overestimates the effect at later times.

Fig. 2 Hazard ratio over time targeted by weighted hazard ratio model and true hazard ratio over time under a data generating mechanism where the Gρ test is optimal. The solid lines show the true hazard ratio over time for ρ=0.5,1,1.5,2. The true hazard ratio for the four scenarios ρ=0.5,1,1.5,2 at t = 0 is eΔ=0.75 and the hazard rate in the control arm stays the same in all the scenarios λ1(t)=0.5. The dotted lines show the proposed hazard ratio targeted by the weighted hazard ratio model estimated assuming that ρ = 1. The hazard ratio for the dotted lines is the average of the hazard ratios estimated from fitting the Lin and León method and is as follows: eβ̂¯=0.715,0.747,0.783,0.806 for ρ=0.5,1,1.5,2.

Fig. 2 Hazard ratio over time targeted by weighted hazard ratio model and true hazard ratio over time under a data generating mechanism where the Gρ test is optimal. The solid lines show the true hazard ratio over time for ρ=0.5,1,1.5,2. The true hazard ratio for the four scenarios ρ=0.5,1,1.5,2 at t = 0 is eΔ=0.75 and the hazard rate in the control arm stays the same in all the scenarios λ1(t)=0.5. The dotted lines show the proposed hazard ratio targeted by the weighted hazard ratio model estimated assuming that ρ = 1. The hazard ratio for the dotted lines is the average of the hazard ratios estimated from fitting the Lin and León method and is as follows: eβ̂¯=0.715,0.747,0.783,0.806 for ρ=0.5,1,1.5,2.

These results demonstrate that misspecification of ρ in the analysis can lead to quite biased estimates of the HR profile over time. We note that when ρ (and later γ) is misspecified, the HR profile over time which is estimated (on average) will depend on the censoring distribution.

4.2 Scenario II: Delayed Treatment Effect

In this scenario, the data were simulated based on the assumption that the treatment will have no effect at the start of the follow up and gradually increases to have its maximal effect at the final time point of follow up period. The study enrolls the same number of patients as in the diminishing effect scenario that is 200. The maximal treatment effect occurs at the final time point τ = 2. The survival times in the control arm are exponentially distributed with λ1(t)=0.5. The survival times in the active arm can then be simulated using the expression for the hazard function given in (8). To simulate the survival times in the active arm, we calculated the value of S2(t) for t from 0 to τ in increments of 0.0005. We then simulated UU(0,1) and found the value of t such that S2(t) was closest to U. We consider two scenarios of S2(τ)=0.6 and 0.45 which provides the values of discrepancy rate r = 0.37 and 0.13 for γ=0.5,1,2 to verify our results obtained in Section 3. The mechanism behind simulating the nonproportional data in this way would ensure that the weight function (1S(t))γ would provide the most powerful test statistic. Similar to the previous scenario, we applied the Lin and León Cox PH model to our simulated datasets and compared the estimated HR profile with the true HR profile over time. We again considered three scenarios of γ where it is varied to control the rate of how quickly the treatment effect reaches its maximum effect.

The true hazard ratio over time for which the Gγ statistic is optimal and the average hazard ratio profile over time estimated by the Lin and León method is displayed in . The plot displays the three HR profiles over time for γ=0.5,1,2 for two different values of S2(τ). The solid lines represents the true HR profile and the dashed lines represents eβ̂¯(1S(t)γ. Note that for both values of S2(τ), the estimated average hazard ratio is overestimated for γ=1,2 and this deviation is reflected on the HR time profile. Though for other scenarios, we see that the Lin and León method estimates the treatment effect profile over time with minimal bias, in line with our analytical results in Appendix C. The proposed model closely estimates the maximal treatment effect for both scenarios where γ=0.5 and r is relatively small.

Fig. 3 Hazard ratio over time targeted by weighted hazard ratio model superimposed true hazard ratio over time under a data generating mechanism where the Gγ test is optimal. The solid line shows the true hazard ratio over time for γ=0.5,1,2. The true HR for r = 0.37 at τ = 2 is eβ=0.390,0.301,0.183 and for r = 0.13 is eβ=0.733,0.676,0.579 for γ=0.5,1,2, respectively. The dashed lines show the proposed hazard ratio targeted by the weighted hazard ratio model. The hazard ratio for the dashed lines is the average of the hazard ratios across 5000 simulations estimated from fitting the Lin and León method and is as follows; eβ̂¯=0.387,0.294,0.166, eβ̂¯=0.732,0.668,0.573 for r=0.37,0.13 and γ=0.5,1,2, respectively

Fig. 3 Hazard ratio over time targeted by weighted hazard ratio model superimposed true hazard ratio over time under a data generating mechanism where the Gγ test is optimal. The solid line shows the true hazard ratio over time for γ=0.5,1,2. The true HR for r = 0.37 at τ = 2 is eβ=0.390,0.301,0.183 and for r = 0.13 is eβ=0.733,0.676,0.579 for γ=0.5,1,2, respectively. The dashed lines show the proposed hazard ratio targeted by the weighted hazard ratio model. The hazard ratio for the dashed lines is the average of the hazard ratios across 5000 simulations estimated from fitting the Lin and León method and is as follows; eβ̂¯=0.387,0.294,0.166, eβ̂¯=0.732,0.668,0.573 for r=0.37,0.13 and γ=0.5,1,2, respectively

5 Discussion

In this article, we considered the Fleming-Harrington class of weights to simulate nonproportional hazard data for the which the same weight function would give the most optimal test. The Lin and León method consists of fitting a Cox model based on a time-varying treatment covariate. The score test from the proposed model gives a p-value which is equivalent to the weighted log rank test and the estimate derived from the model provides a time-profile of the treatment effect. We considered two nonproportional hazard scenarios to investigate its ability to correctly estimate the maximal treatment effect and the corresponding treatment effect profile over time when the data were simulated such that the corresponding weighted log rank test would be most powerful.

We showed analytically that the Lin and León method is approximately unbiased in both scenarios when the treatment effect is small and the assumed weight function is correctly specified. Our simulation results for the diminishing treatment effect scenario showed that the Lin and León method slightly overestimates the maximal treatment effect when the true treatment effect at t = 0 is large as well as when ρ is large. For scenarios where the treatment effect is small, the proposed model estimates the maximal effect very close to the true one. For the delayed treatment effect scenario, the Lin and León method slightly overestimates the maximal effect for γ=1,2 but in our simulations not by very much. The Lin and León method gave essentially unbiased estimates of the full effect and treatment effect profile over time for γ=0.5.

We have shown through simulations that under the delayed treatment effect scenarios, the proposed model is giving close to the correct profile over time when the treatment effect is small and the γ parameter is correctly chosen. However, this does not guarantee that the proposed model would provide unbiased results had the data generating mechanism been different, for example if the hazard in the control arm was not constant and changes over time.

Use of the weighted hazard ratio approach requires specification of ρ and γ. Although, the Type I error is controlled under an incorrect choice of ρ and γ, the power of the test will be to some extent adversely affected. For estimation of the treatment effect profile over time, the use of incorrect values would lead to biased estimates, as we have demonstrated in the results shown in , while if the values of ρ and γ are close to the true values then the bias is expected to be minimal. Given the difficulty of specifying the value of ρ and γ, it may be preferable to use methods such as restricted mean survival time and milestone survival probabilities, which do not rely on correctly specifying how quickly or slowly the treatment effect changes over time, and for which the interpretation of the treatment effect is clinically meaningful to the clinicians and patients (Royston and Parmar Citation2013).

Supplementary Materials

The R files containing all code to replicate the simulation studies for delayed and diminishing treatment effect scenario.

Supplemental material

Supplemental Material

Download Zip (5 KB)

Disclosure Statement

Jonathan Bartlett’s current/past institutions have received consultancy fees for the author’s advice on statistical methodology from AstraZeneca, Bayer, Novartis, Roche. J.B. has received consultancy fees from Bayer and Roche, and fees for provision of online courses from Roche.

Additional information

Funding

Bharati Kumar’s research was funded by a UK EPSRC studentship (2281147). Jonathan Bartlett was supported by the UK MRC (grant MR/T023953/1).

References

  • Fleming, T. R., and Harrington, D. P. (1981), “A Class of Hypothesis Tests for One and Two Sample Censored Survival Data,” Communications in Statistics-Theory and Methods, 10, 763–794. DOI: 10.1080/03610928108828073.
  • Fleming, T. R., and Harrington, D. P. (1991), Counting Processes and Survival Analysis, pp. 255–277, New York: Wiley.
  • Garès, V., Andrieu, S., Dupuy, J.-F., and Savy, N. (2017), “On the Fleming–Harrington Test for Late Effects in Prevention Randomized Controlled Trials,” Journal of Statistical Theory and Practice, 11, 418–435. DOI: 10.1080/15598608.2017.1295889.
  • Hernán, M. A. (2010), “The Hazards of Hazard Ratios,” Epidemiology, 21, 13–15. DOI: 10.1097/EDE.0b013e3181c1ea43.
  • Kristensen, G., Perren, T., Qian, W., Pfisterer, J., Ledermann, J. A., Joly, F., Carey, M. S., Beale, P. J., Cervantes, A., Oza, A. M., et al. (2011), “Result of Interim Analysis of Overall Survival in the GCIG ICON7 Phase III Randomized Trial of Bevacizumab in Women with Newly Diagnosed Ovarian Cancer,” Journal of Clinical Oncology, 29, LBA5006. DOI: 10.1200/jco.2011.29.15_suppl.lba5006.
  • Lin, R. S., and León, L. F. (2017), “Estimation of Treatment Effects in Weighted Log-Rank Tests,” Contemporary Clinical Trials Communications, 8, 147–155. DOI: 10.1016/j.conctc.2017.09.004.
  • Lin, R. S., Lin, J., Roychoudhury, S., Anderson, K. M., Hu, T., Huang, B., Leon, L. F., Liao, J. J. Z., Liu, R., Luo, X., et al. (2020), “Alternative Analysis Methods for Time to Event Endpoints Under Nonproportional Hazards: A Comparative Analysis,” Statistics in Biopharmaceutical Research, 12, 187–198. DOI: 10.1080/19466315.2019.1697738.
  • Mok, T. S., Wu, Y.-L., Thongprasert, S., Yang, C.-H., Chu, D.-T., Saijo, N., Sunpaweravong, P., Han, B., Margono, B., Ichinose, Y., et al. (2009), “Gefitinib or Carboplatin–Paclitaxel in Pulmonary Adenocarcinoma,” The New England Journal of Medicine, 361, 947–957. DOI: 10.1056/NEJMoa0810699.
  • Royston, P., and Parmar, M. K. B. (2013), “Restricted Mean Survival Time: An Alternative to the Hazard Ratio for the Design and Analysis of Randomized Trials with a Time-to-Event Outcome,” BMC Medical Research Methodology, 13, 152. DOI: 10.1186/1471-2288-13-152.

Appendix A:

Computation of the True Hazard Ratio Function and Proposed Hazard Ratio Function for the Diminishing Effect Scenario for General Value of ρ

We let λ1(t)=λ1 and calculate the true hazard ratio function (4) and the proposed hazard ratio (7). The survival function for the control arm, S1(t)=exp(0tλ1(x))dx=exp(λ1t)

The hazard function for the treatment arm λ2(t) can be derived by the relationship in (4) λ2(t)=λ1×eΔ[exp(λ1t)ρ+[1{exp(λ1t)}ρ]×eΔ]1=λ1×eΔ[eΔ+exp(λ1t)ρ{1eΔ}]1

Thus, the survival function for the treatment arm is, S2(t)=exp(0tλ2(x)dx)0tλ2(x)dx=0tλ1×eΔ[eΔ+exp(λ1x)ρ{1eΔ}]1dx=[1ρlog(1eΔ+eΔ{exp(λ1x)}ρ)]0tS2(t)=exp(1ρlog(1eΔ+eΔ{exp(λ1t)}ρ))=1[1eΔ+eΔ{exp(λ1t)}ρ]1/ρ

Recall, the model proposed by Lin and León (Citation2017) states that the hazard ratio between the two arms can be expressed as, HRLL(t)=λ0eA(t)β×1λ0eA(t)β×0=eβA(t)

Let us substitute the pooled survival function, instead of A(t), (A1) HRLL(t)=eβA(t)=eβ(0.5S1(t)+0.5S2(t))ρ(A1)

For our example, the true hazard ratio under our data generating mechanism is expressed as, HR(t)=eΔeΔ+eΔ(1{exp(λ1t)}ρ)

Now the hazard ratio described in (A1) is expressed as, 12S1(t)+12S2(t)=12exp(λ1t)+121[1eΔ+eΔ{exp(λ1t)}ρ]1/ρHRLL(t)=exp[β·(12exp(λ1t)      +121[1eΔ+eΔ{exp(λ1t)}ρ]1/ρ)ρ]

Appendix B:

Taylor Series Expansion of the True Hazard Ratio Function and the Proposed Hazard Ratio Function for the Diminishing Effect

In this section, we demonstrate that the true hazard ratio function and the proposed hazard ratio functions are approximately equal when eΔ is close to 1 that is, when Δ is small. The evaluation of these results are executed generally for ρ. Recall the true hazard ratio function and the proposed hazard ratio function are of the following form, HR(t,Δ)=eΔS1(t)ρ+eΔ(1S1(t)ρ)HRLL(t,Δ)=exp[β·(12S1(t)+12S2(t))ρ]

We first assume β=Δ in order to check whether the Lin and León model estimates the treatment effect over time correctly if it estimates the initial effect correctly.

Suppose we let f(Δ)=HR(t,Δ)HRLL(t,Δ). We now proceed with a Taylor series expansion for f(Δ), f(x)=n=0f(n)(a)n!(xa)n=f(a)+f(a)(xa)+f(a)(xa)22!+···

We shall expand the first two terms of the series about Δ = 0, f(Δ)f(0)+f(0)(Δ0)

When a = 0, f(0)=HR(t,0)HRLL(t,0)=1. We now find the first order derivative of f(Δ) with respect to Δ using the quotient rule, so we have HRLL(t,0)HR(t,0)ΔHR(t,0)HRLL(t,0)Δ(HRLL(t,0))2=HR(t,0)ΔHRLL(t,0)Δ

We derive the following derivatives for the true hazard ratio function using the product and chain rule, HR(t,Δ)Δ=eΔS1(t)ρ+eΔ(1S1(t)ρ)eΔ(eΔ(1S1(t)ρ))(S1(t)ρ+eΔ(1S1(t)ρ))2HR(t,0)Δ=1S1(t)ρ+1S1(t)ρ1S1(t)ρ(S1(t)ρ+1S1(t)ρ)2=S1(t)ρ

Now, we find the derivative for the proposed hazard ratio function, HRLL(t,Δ)=exp[Δ·(12S1(t)+12S2(t,Δ))ρ],where we now emphasize the dependence of S2(t,Δ) on Δ. HRLL(t,0)Δ={(12S1(t)+12S2(t,Δ))ρ+Δρ[(12S1(t)+12S2(t,Δ))ρ1(12S2(t,Δ)Δ)]}×exp[Δ·(12S1(t)+12S2(t,Δ))ρ]

We now evaluate the derivative of the proposed hazard ratio function at Δ = 0, HRLL(t,0)Δ=(12S1(t)+12S2(t,0))ρ=S1(t)ρ,using the fact that S2(t,0)=S1(t). We can substitute the results in the Taylor series expansion, f(Δ)1+Δ(S1(t)ρS1(t)ρ)=1

The results confirms that the hazard ratio functions are approximately equal to each other when the treatment effect eΔ is small.

Appendix C:

Taylor Series Expansion of the True Hazard Ratio Function and the Proposed Hazard Ratio Function for the Delayed Effect

We demonstrate the approximate equality of the two hazard ratio functions for the delayed effect when φ is small. We show the results for a general γ. Recall, the true hazard ratio and the proposed hazard ratio are of the following form, HR(t,φ)=Lγ((Lγ)1(Lγ(eλ1t)+φ))Lγ(eλ1t)HRLL(t,φ)=eβ((10.5eλ1t0.5L1(L(eλ1t)+φ))γ(10.5eλ1τ0.5L1(L(eλ1τ)+φ))γ)where Lγ(x)=x1(1s)γsds and Lγ(x)=0.5x1sLγ(s)ds. Recall S2(t)=(L)1(L(eλ1t)+φ) and S1(t)=eλ1t. We again assume that the Lin and León model estimates the maximal treatment effect β correctly. For simplicity we write β as some function of φ and note that the maximal treatment effect occurs at t=τ, so we have g(φ)=log(Lγ(S2(τ,φ))Lγ(S1(τ))), HRLL(t,φ)=eg(φ)((10.5S1(t)0.5S2(t))γ(10.5S1(t)0.5S2(t))γ)

We let f(φ)=HR(t,φ)HRLL(t,φ) and proceed with the first order Taylor series expansion for the ratio of the hazard ratio functions, that is f(φ)f(0)+f(0)(φ0) for small φ. We check for φ=0,HR(t,0)=Lγ(S2(t,0))Lγ(S1(t))=1 and HRLL(t,0)=1, hence f(0)=11=1. We now find the first order derivative of f(φ) with respect to φ and we get the same expression as for the diminishing effect, that is f(0)=HR(t,0)φHRLL(t,0)φ

The steps to determine the first order derivative using the quotient rule for the true hazard ratio functions are as follows, HR(t,φ)=Lγ(S2(t,φ))Lγ(S1(t))HR(t,φ)φ=Lγ(S2(t,φ))S2(t,φ)φLγ(S1(t))0(Lγ(S1(t)))2,

Using the inverse function theorem, we have S2(t,φ)φ=1Lγ((Lγ)1(Lγ(S1(t))+φ)). Note that Lγ(x)=1xLγ(x), so we have S2(t,φ)φ=(Lγ)1(Lγ(S1(t))+φ)Lγ((Lγ)1(Lγ(S1(t))+φ))

We now evaluate HR(t,φ)φ at φ=0, HR(t,0)φ=Lγ(S2(t,0))S1(t)Lγ(S1(t))Lγ(S1(t))(Lγ(S1(t)))2=Lγ(S1(t))S1(t)

Note that Lγ(x)=(1x)γx. We then have, HR(t,0)φ=(1S1(t))γS1(t)S1(t)=(1S1(t))γ

Now we determine the first order derivative for the proposed hazard ratio function using the product and chain rule, HRLL(t,φ)=eg(φ)((10.5S1(t)0.5S2(t,φ))γ(10.5S1(τ)0.5S2(τ,φ))γ)HRLL(t,φ)φ=HRLL(t,φ)×{g(φ)(10.5S1(t)0.5S2(t,φ))γ(10.5S1(τ)0.5S2(τ,φ))γ+g(φ)((10.5S1(t)0.5S2(t,φ))γ(10.5S1(τ)0.5S2(τ,φ))γ)}HRLL(t,0)φ=HRLL(t,0)×{g(0)(1S1(t))γ(1S1(τ))γ+g(0)((10.5S1(t)0.5S2(t,φ))γ(10.5S1(τ)0.5S2(τ,φ))γ)}

Note that g(0)=log(Lγ(S2(τ,0))Lγ(S1(τ)))=log(1)=0. We now determine g(0), g(φ)=log(Lγ(S2(τ,φ)))log(Lγ(S1(τ)))g(φ)=1Lγ(S2(τ,φ))×Lγ(S2(τ,φ))×(Lγ)1(Lγ(S1(τ))+φ)Lγ((Lγ)1(Lγ(S1(τ))+φ))g(0)=1Lγ(S1(τ))×Lγ(S2(τ,0))×S1(τ)Lγ(S1(τ))=(1S1(τ))γS1(τ)S1(τ)=(1S1(τ))γ

Substituting the results in HRLL(t,0)φ gives, HRLL(t,0)φ=e0×{(1S1(τ))γ(1S1(t))γ(1S1(τ))γ+0}=(1S1(t))γ

We can substitute the results in the Taylor series expansion, f(φ)1+φ{(1S1(t))γ(1S1(t))γ}=1

This result confirms that the hazard ratio functions are approximately equal to each other when the treatment effect is small in the case of delayed effect.