334
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Joint TITE-CRM: A Design for Dose Finding Studies for Therapies with Late-Onset Safety and Activity Outcomes

, , &
Received 26 Jul 2023, Accepted 06 Mar 2024, Published online: 17 Apr 2024

Abstract

In Phase I/II dose-finding trials, the objective is to find the Optimal Biological Dose (OBD), a dose that is both safe and shows sufficient activity that maximizes some optimality criterion based on safety and activity. In cancer, treatment is typically given over several cycles, complicating the identification of the OBD as both toxicity and activity outcomes may occur at any point throughout the follow up of multiple cycles. In this work we present and assess the Joint TITE-CRM, a model-based design for late onset toxicities and activity based on the well-known TITE-CRM. It is found to be superior to the currently available alternative designs that account for late onset bivariate outcomes, as well as being both intuitive and computationally feasible.

1 Introduction

In traditional drug development, safety and therapeutic activity of potential new drugs have been evaluated separately. A phase I trial first finds the maximum tolerated dose, the dose associated with some predetermined probability of observing a Dose-Limiting Toxicity (DLT). This dose is then carried forward to phase II, where therapeutic activity is evaluated, with limited borrowing of information between the evaluation of safety and activity. An alternative option is a seamless Phase I/II trial where safety and activity are evaluated simultaneously, with the aim to find the Optimum Biological Dose (OBD). The main advantage of collecting information on both outcomes is the increased chance of finding a dose that is both safe and has potential to be efficacious, by allowing for more sharing of information.

There are multiple methods that have been proposed to design such a trial with a binary safety and binary activity endpoint that range in complexity. For example the relatively simple model-assisted toxicity and efficacy interval design (STEIN) of Lin and Yin (Citation2017), or more complex model-based designs that can jointly model the dose–toxicity and dose–activity relationship such as the utility contour of Thall and Cook (Citation2004), or the approach based on toxicity and activity odds ratios (bCRM) by Yin, Li, and Ji (Citation2006), or the bivariate Continual Reassessment Method for competing outcomes of Braun (Citation2002) amongst others. These methods differ in both their approaches to inference of the bivariate (or trinary) outcomes, and the decision criteria based on this inference. In addition, approaches that model safety as binary and use a continuous marker of activity have been proposed (e.g., Nebiyou Bekele and Shen Citation2005; Yeung et al. Citation2015, Citation2017). Common to those proposals is the assumption that as each cohort of patients enters the trial, responses from all previous cohorts are available to inform the decision of the next dose assignment. The next cohort is assigned the best dose in order to collect more information at the current estimate of the OBD.

In oncology, where there are multiple cycles of treatment, both safety and activity outcomes may have a delayed onset. There are a limited number of phase I designs that account for late onset toxicities, but an even more limited number of designs that account for late onset outcomes in both safety and activity. The main contributions to model-based dose-finding trial designs incorporating late onset toxicities include the Time-to-Event version of the Continual Reassessment Method (TITE-CRM) by Cheung and Chappell (Citation2000), the interval censored approach (ISCDP) of Sinclair and Whitehead (Citation2014), and the approach of including a cycle effect in a proportional odds mixed effects model (POMM) by Doussau, Thiébaut, and Paoletti (Citation2013). A comprehensive review of these was undertaken by Barnett et al. (Citation2022). However, these do not take into account the therapeutic activity, the inclusion of which is not a straightforward matter. When including late-onset activity as well, there are therefore a very limited number of approaches. For example, Koopmeiners and Modiano (Citation2014) consider binary late-onset toxicity outcomes with activity measured as survival beyond the observation window, and Altzerinakou and Paoletti (Citation2020) consider a binary late-onset toxicity outcome with a continuous longitudinal activity endpoint. In many trial settings, both toxicity and activity are binary late-onset outcomes, with both being observable in any of the multiple cycles of treatment. For this setting, the method of jointly modeling the time-to-event of activity and toxicity (AT /AE ) using survival models of Yuan and Yin (Citation2009), the model assisted approach of Liu and Johnson (Citation2016), and the TITE-B design of Yan et al. (Citation2019) are available designs. However, the model-assisted design lacks the flexibility of the model based designs and neither of the two model-based designs, the AT /AE or the TITE-B, target specific toxicity or activity levels.

In this article we introduce a joint time-to-event CRM (Joint TITE-CRM), a model-based design for the implementation of a phase I/II trial with delayed onset binary outcomes for both safety and activity, in comparison to the existing methods. This exploration is motivated by the Targeted Alpha Therapy (TAT) platform. TAT is an emerging modality within the field of radiotherapy, combining tumor-targeting molecules linked to alpha particle-emitting radioisotopes that aim to offer a new approach to treating cancers and potentially overcoming resistance.

Several first in human studies have been initiated in recent years to support the development of TAT treatments targeting a broad range of cancer indications (U.S. National Library of Medicine Citation2015, 2018a, 2018b, 2020). Clinical and practical experience accumulated during the conduct of these trials has highlighted the importance of long-term toxicity and activity outcomes and therefore a potential benefit of incorporating these features into the statistical design.

In addition, the need for reliable designs for this purpose directly relates to one of the goals of the FDA’s Project Optimus (U.S. Food & Drug Administration Citation2023), to “develop strategies for dose finding and dose optimization that leverages nonclinical and clinical data in dose selection, including randomized evaluations of a range of doses in trials. An emphasis of such strategies will be placed on performing these studies as early as possible in the development program and as efficiently as possible to bring promising new therapies to patients.”

The article is outlined as follows. In Section 2 we introduce the methods and setting, presenting results of their application in Section 3. We consider a realistic setting with stopping and enforcement rules, a theoretical setting where these rules are relaxed, and a setting with varying activity time trends, before concluding with a discussion in Section 4.

2 Methods

In this section, we outline the proposed Joint TITE-CRM design, the model-assisted design of Liu and Johnson (Citation2016), the AT /AE design of Yuan and Yin (Citation2009) and the TITE-B design of Yan et al. (Citation2019). We first consider the general setting of the trial, before detailing how the designs differ.

We consider the setting where there are J dose levels, d1,,dj,,dJ available for exploration. Patients enter the trial in cohorts, with all patients in each cohort assigned to the same dose and followed up for τ cycles of treatment. A DLT (yes/no) may occur at any time during the follow up period, in which case the patient leaves the trial. The patient may also observe an activity response (yes/no) at any time during their follow up, which is censored at the time of DLT if the patient observes a DLT and no activity response. This censoring is used for all methods, since it is based on the assumption that the patient leaves the trial following a DLT response

The choice of dose to assign to the next cohort and the final dose recommendation are both chosen based on the design. For comparability, in each case, the trial proceeds in the following way:

  1. The first cohort of patients is assigned to the lowest dose.

  2. After one cycle of follow up, if no DLT is observed, escalate to next highest dose and continue after each cycle until a DLT is observed or the highest dose is reached. If a DLT is observed, the models from the design take over the dosing decision (there is no requirement on activity responses to start using the design model).

  3. The relevant models are fitted to the currently observed responses (which will be from one cycle of follow up for the last cohort, two cycles for the last but one cohort etc.) and posterior distributions are updated.

  4. A set of “admissible” doses are calculated based on the models fitted.

  5. The best dose of the “admissible” set according to some criterion of the design is chosen to assign to the next cohort (subject to certain pre-specified rules discussed in Section 2.5).

  6. After the next cycle of follow up, return to step 3. The trial is stopped when one of the pre-specified stopping rules is triggered.

It is worthwhile to note that the form of the prior and posterior in step 3, the admissible doses in step 4 and criterion in step 5 are the only aspects unique to each design.

2.1 Joint TITE-CRM

Based on the TITE-CRM (Cheung and Chappell Citation2000) and similar to the bCRM (Braun Citation2002), we modify the procedure to include both safety and activity, whilst keeping the structure of the TITE-CRM.

We use a two-parameter logistic model for both safety and activity outcomes: F(d,βA)=exp(βA,0+βA,1d)1+exp(βA,0+βA,1d),F(d,βT)=exp(βT,0+βT,1d)1+exp(βT,0+βT,1d), where βA=(βA,0,βA,1) and βT=(βT,0,βT,1) are the parameter vectors for activity and toxicity, respectively.

We then must give weights to both toxicity and activity observations based on their follow up times, and a weighted dose response model, G, is used for each: G(d,w(A),βA)=w(A)F(d,βA),G(d,w(T),βT)=w(T)F(d,βT), where 0w(A),w(T)1 are functions of time-to-event of a patient response. We use the proposed weights of Cheung and Chappell (Citation2000), that is the proportion of total follow-up. In this context, this simplifies to using wi(T)=ui/τ, where ui is the current number of cycles patient i has been observed for, unless a DLT is observed in which case wi(T)=1. In a similar fashion, we use wi(A)=ui/τ, where ui is the current number of cycles patient i has been observed for, unless an activity outcome is observed, in which case wi(A)=1. The exception to this is if a DLT is observed before an activity outcome can be observed, in which case the activity observation is censored by DLT time and so wi(A)=(DLT time ‐ entry time)/τ. We note that although here we use discrete numbers of cycles for the weights unless censoring occurs, this is a consequence of the setup of the trial and not the Joint TITE-CRM itself, which can use any weights.

We consider all binary combinations of toxicity and activity outcomes (as opposed to trinary sometimes used in such applications), related by a Gumbel Model (Thall and Cook Citation2004): πa,b(d,β,w)=(G(d,w(A),βA))a(1G(d,w(A),βA))1a(G(d,w(T),βT))b(1G(d,w(T),βT))1b+(1)a+b(G(d,w(A),βA))(1G(d,w(A),βA))G(d,w(T),βT)(1G(d,w(T),βT))(eψ1eψ+1), where πa,b is the probability of observing each of the four combinations of binary activity outcome (a = 0 for no activity observed, a = 1 for activity observed) and binary toxicity outcome (b = 0 for no DLT observed, b = 1 for DLT observed), β=(βA,βT), w=(w(A),w(T)) and ψ(,) is a correlation parameter of the distribution, with positive values of ψ corresponding to a positive correlation between activity and toxicity, negative values corresponding to a negative correlation, and larger absolute values of ψ corresponding to stronger correlations.

The likelihood is then: (1) L(β)=i=1na=01b=01{πa,b(d[i],β,wi)}I(Yi=(a,b)),(1) for patients i=1,,n, where d[i] is the dose assigned to patient i and Yi is the observed response for patient i.

Priors are elicited on ψ, βA,0, βA,1, βT,0, βT,1, and the likelihood (1) is used to update the joint posterior for all of the parameters using MCMC methods. The next dose is then chosen based on a utility function of πA and πT , the probability of activity and toxicity, respectively.

We use the following utility function, as a result of a sensitivity analysis conducted on this choice: (2) U(πA,πT)=πAω1πTω2πT I(πT>ϕT).(2)

This uses two weights, ω1 and ω2 and a toxicity threshold ϕT, whereby we penalize doses that have a higher probability of DLT than this threshold. The next dose is chosen to maximize the utility out of a set of admissible doses.

We varied the weights ω1 and ω2 in the linear utility function, and also consider the more complex utility contour method suggested by Thall and Cook (Citation2004). It was found that there was negligible difference in the outcomes for any of the alternative utilities. This led to use using the simple linear utility used by Liu and Johnson (Citation2016) in our implementation of the Joint TITE-CRM, with the same weights, ω1=0.33 and ω2=1.09.

This utility is simple to implement and to interpret. Using the same utility function for the two different designs also allows the results to be interpreted in terms of the inference used.

2.2 Model-Assisted Approach (Liu and Johnson Citation2016)

The model-assisted method by Liu and Johnson (Citation2016) uses a Bayesian dynamic model for binary responses, y, as follows: yi(T)|d=djBern(pT,j)pT,j=pT,j1+(1pT,j1)βT,j,j=2,JpT,1=βT,1βT,jBeta(aT,j,bT,j)j=1,J.

The subscript T indicates toxicity, with the equivalent for activity labeled A. This is described as a model-assisted approach since there is no dose–response model, and as such, the specification of the pT,j ensures monotonicity.

The likelihood is given as L(y|βT,βA)=k={A,T}i=1n{wi(k)(1r=1j[i](1βk,r))}yi(k){1wi(k)(1r=1j[i](1βk,r))}1yi(k), where j[i] is the dose level assigned to patient i. This likelihood uses the same weights as in the Joint TITE-CRM and is used to update the posterior distribution. The same utility function as the Joint TITE-CRM (2) is used to assign the next dose.

2.3 AT /AE Design (Yuan and Yin Citation2009)

We here give an overview of the AT /AE design, for further details we refer the reader to the original proposal by Yuan and Yin (Citation2009).

This design fits survival models to the time-to-event data for both toxicity (tT ) and activity (tA ) outcomes, assuming a Weibull distribution: ST(tT|d)=exp{λTtTαTexp(βTd)},SA(tA|d)=exp{λAtAαAexp(βAd)},SA*(tA|d)=1π+πSA(tA|d), where S* is the improper survival with the proportion of the population susceptible to the activity outcome denoted as π. This improper survival is therefore used to calculate the probability of observing activity outcomes at any given dose, as well as in the AT /AE criterion.

The bivariate time-to-event data is then modeled as S(tT,tA|d)={ST(tT|d)1/ϕ+SA(tA|d)1/ϕ1}ϕ, where ϕ is the correlation between the times to toxicity and activity. In order to compute the likelihood, define yT=min(tT,cT) and ΔT=I(tTcT) where cT is the censoring time, and define yA and ΔA similarly for activity. The likelihood is then L(θ|datai)=L1ΔTΔAL2ΔT(1ΔA)L3(1ΔT)ΔAL4(1ΔT)(1ΔA), where, for the π and S defined as above: L1=π2S(yT,yA|d)yTyA,L2=(1π)ST(yT|d)yTπS(yT,yA|d)yT,L3=πS(yT,yA|d)yA,L4=(1π)ST(yT|d)+πS(yT,yA|d).

The parameters λA , αA , βA , λT , αT , and βT are assigned independent gamma priors and π and ϕ are assigned uniform priors. The joint posterior distribution is then updated using a Gibbs Sampler.

The criterion used for decision making is a ratio of the area under the curve (AUC) for the survival for toxicity and activity: ATAE=αT1{λTexp(βTd)}1/αTΓ{αT1,λTexp(βTd)ταT}(1π)τ+παA1{λAexp(βAd)}1/αAΓ{αA1,λAexp(βAd)ταA}, where Γ(a,b) is the incomplete gamma function and τ is the follow up time. The dose in the admissible dose set that maximizes the AT /AE is chosen for the next cohort of patients.

2.4 TITE-B Design (Yan et al. Citation2019)

The TITE-B design by Yan et al. (Citation2019) also expands the TITE-CRM to include late-onset activity endpoints, however, the approach of both estimating the probabilities of toxicity and activity and choosing the OBD is different to that of the proposed Joint TITE-CRM. The TITE-B uses a one-parameter power model for both toxicity and activity, and fits the models independently of one another. The toxicity model is ψ(dj,θ)=pjexp(θ), where p1<p2<pJ<1 are pre-specified constants of the skeleton, corresponding to doses d1,,dJ. Using the same weights as in the Joint TITE-CRM, the following likelihood is constructed: L(θ)=i=1nwi(T)ψ(d[i],θ)yi(T)(1wi(T)ψ(d[i],θ))1yi(T).

The posterior distribution for θ is updated using this likelihood and the prior distribution for θ, and used to calculate the posterior estimate of toxicity at each dose. For activity, multiple working models, labeled l=1,,L, are used: ϕl(dj,βl)=qjlexp(βl), with qjl being the pre-specified constants in the skeleton for working model l. Again, using the same weights as in the Joint TITE-CRM, the following likelihood is constructed for each working model l=1,,L: (3) L(βl)=i=1nwi(A)ϕ(d[i],βl)yi(A)(1wi(A)ϕ(d[i],βl))1yi(A).(3)

Prior probabilities h(1),,h(L) are assigned to each of the working models, and the posterior probability of each working model is calculated: κl=h(l)L(βl)g(βl)dβll=1Lh(l)L(βl)g(βl)dβl, where g(βl) is the prior distribution elicited on βl.

The criterion used to make the next dose recommendation is to first define a set of safe doses using the toxicity model, such that the posterior estimate of toxicity is below a pre-specified level, then to randomize the dosing choice from this set based on the posterior probabilities of a subset of the working models and the recommended dose from each candidate model, S(l). The randomization probability, Rj*, for dose dj is given as Rj*=Rj**j=1JRj**whereRj**=l=12J1κn(l) I(dj=S(l)andκn(l)κ(LL+1)).

Here, κ(1)κ(L) denote the ordered posterior model probabilities κn(l). L=2J1 is the total number of working models and L is the number of candidate models considered. The number of candidate models used in the randomization reduces as more patients enter the trial: L=(NnN)δL, where δ is a pre-specified constant.

2.5 Rules

As well as ensuring any assigned dose is admissible, we also apply a set of enforcement and stopping rules. For any given dose dj , p1,dj is the P(DLT) in the first cycle, pτ,dj is the P(DLT) in the full follow up of τ.

2.5.1 Admissible Dose Set

For the Joint TITE-CRM, Model-Assisted Approach and AT /AE design, the admissible dose set is calculated as all doses satisfying the following criteria: P(πT<πT*)>qT  &  P(πA>πA*)>qA. for some thresholds qT and qA , where πT* and πA* are the target probabilities for toxicity and activity in the entire follow-up τ. These constraints ensure that the escalation proceeds to a promising dose that is considered neither futile nor unsafe.

For the TITE-B design, following the specification by Yan et al. (Citation2019), the admissible set is defined only by safety, and as those doses that satisfy π̂T(dj)<ξ, where π̂T(dj) is the estimate of DLT rate at dose dj and ξ is the maximum acceptable DLT rate for the entire follow up.

2.5.2 Enforcement Rules

Enforcement rules are to ensure the safety within the trial. The hard safety rule ensures safety in the first cycle of treatment, and as such, the safety cutoff of 0.3 is used as opposed to πT*. The K-fold skipping doses rule ensures escalation is not too aggressive and is based on dose values as opposed to levels so that it is meaningful for the model-based and model-assisted designs.

  1. Hard Safety: This rule ensures that if there is a very high probability that the toxicity of an experimented dose exceeds 0.3 in the first cycle, then that dose and all above are excluded from any further experimentation (i.e., dose dj and all above are excluded when P(p1,dj>0.3)>ζ for some threshold ζ). In this implementation we use a threshold for excessive toxicity of ζ=0.95, with a Beta(1,1) prior for Binomial responses. This is equivalent to the case where there are at least 3 DLT responses out of 3 patients, at least 4 DLT responses out of 6 patients, or at least 5 DLT responses out of 9 patients, then all dose assignments must be strictly lower than that dose for the rest of the study. If the lowest dose is excluded then the trial stops with no dose recommendation made.

  2. K-fold Skipping Doses: The next dose assignment must be no more than a 2-fold increase in the value of the highest experimented dose so far.

2.5.3 Stopping Rules

Stopping rules are to define when we may stop the trial. This may be because we have a level of certainty about the estimated OBD or for futility/safety issues. Either that the dose range is unsafe and hence entirely too high, the dose range is all below target toxicity and hence entirely too low, or that the dose range is all below target activity. It is useful to be able to recommend stopping the trial when all doses are deemed too safe so that a new dose set can be established and investigated.

  1. No Admissible Doses: If no doses satisfy the two constraints in Section 2.5.1, then the trial is stopped, either for futility, safety or both.

  2. Lowest Dose Deemed Unsafe: If P(p1,d1>30%)>0.80 and at least one cohort of patients has been assigned to dose d1, the trial is stopped.

  3. Highest Dose Deemed Very Safe: If P(p1,dJ30%)>0.80 and at least one cohort of patients has been assigned to dose dJ , the trial is stopped.

  4. Sufficient Information: For a pre-defined cutoff value, Csuff , if a dose is recommended for the next cohort on which Csuff patients have already been assigned in the escalation, the trial is stopped.

  5. Precision: If the safety and activity profile are both estimated precisely enough, the trial is stopped. This precision is defined as CV(MTD)<30% and CV(d[πA*])<30%, with the coefficient of variation calculated as an adjusted median absolute deviation divided by the median and d[πA*] is the dose associated with the target activity. Both of these must be satisfied before the stopping rule is enforced. This stopping rule is only used once at least Csuff patients have had at least one cycle of treatment in the escalation, on any dose. This rule is not applicable for the model-assisted method.

  6. Hard Safety: If the lowest dose is considered unsafe according to the hard safety enforcement rule, the trial is stopped.

  7. Maximum Patients: If the maximum number of patients (n=nmax) have been recruited, the trial is stopped.

We consider two settings; a realistic setting where all enforcement and stopping rules are applied, and a theoretical setting where stopping rules 2, 3, 4, 5 are not applied. This second setting is to investigate the behavior of the designs without restrictions.

3 Simulations

In order to compare the Joint TITE-CRM to the model-assisted design, the AT /AE design and the TITE-B design, we conduct simulation studies in a range of scenarios. For comparison, we also consider a non-time-to-event method, the Joint CRM. For this implementation, all τ cycles must be observed from the previous cohort before the next cohort is assigned. This is equivalent to setting w(A)=w(T)=1 for each patient in the Joint TITE-CRM, effectively removing the time-to-event element of the design.

3.1 Setting

We consider the setting where six doses are investigated: 1.5MBq, 2.5MBq, 3.5MBq, 4.5MBq, 6.0MBq, 7.0MBq with a follow-up period of τ = 3 cycles, with each cycle lasting 6 weeks. The trial proceeds as outlined in Section 2, with cohorts of size 3. In addition the following parameter values are used for the rules: nmax=60 and Csuff = 30.

We specify πT*=0.391 as the target probability of DLT for the full follow up period of 3 cycles, corresponding to P(DLT)=0.3 in cycle 1. Following Barnett et al. (Citation2022), this assumes the probability of DLT decreases by a factor of 1/3 in subsequent cycles, conditional on no DLT in the previous cycles. Note that this is also reflective of cumulative toxicity, in line with results from Altzerinakou, Collette, and Paoletti (Citation2019). This choice is made to represent a realistic setting, and can be different based on assumptions on the setting.

We use a combination of five safety scenarios (T1, T2, T3, T4, T5) and four activity scenarios (A1, A2, A3, A4), giving 20 scenarios in total. We then extend to consider different activity patterns in Section 3.6. The five safety scenarios represent four scenarios where there is at least one safe dose and one where no doses are safe. In T1, all doses are safe; in T2, only the highest dose is unsafe; in T3, dose levels 4 and above are unsafe; in T4, dose levels 2 and above are unsafe; and in T5 all doses are unsafe. The four activity scenarios are described by the probability of activity in the whole follow up period, with the lower bound on target activity as πA*=0.2. In A1, all dose levels are active with a plateau reached at dose level 3; in A2, all dose levels are active, starting at a lower activity at dose 1 than A1 and increasing with dose; in A3, dose levels 3 and above are active; and in A4, no dose levels are active. Scenarios with lower numbers are therefore most safe/active. Here we assume the probability of activity in the total follow up is three times the probability of activity in the first cycle. In Section 3.6, we alter this assumption.

Scenarios are referred to as Tx.Ay where x is the safety scenario, y is the activity scenario. gives the individual safety and activity scenarios, while gives the utility and AT /AE values for each of the 20 combination scenarios.

Table 1 Scenario definitions for activity and toxicity.

Table 2 Utility for the 20 scenarios, with OBD highlighted in bold and acceptable doses highlighted in italics.

In , it can be seen that the utility criterion and AT /AE criterion do not always agree on the OBD. For example in scenarios T3.A2 and T3.A1, the 3.5MBq dose is the OBD according to the utility criterion, and the 1.5MBq dose is the OBD according to the AT /AE criterion. We have chosen to designate the 3.5MBq dose as the true OBD as this is more realistic, with a P(DLT) equal to target and a higher P(Activity) than the lower dose. This does, however, highlight an issue concerning the AT /AE criteria, in that in maximizing the ratio between the two areas under the curves, it does not target any specific safety level. Hence, when ratios are similar across doses, even when values are not, there may be difficulties in selection of the optimal dose.

3.2 Data Generation

To generate the event times for toxicities and activity, tT and tA , we use a model that is not based on the assumptions of any of the methods. We use a bivariate log-normal Lognormal2) distribution for event times, with parameters matched to the first cycle and full follow up probabilities of toxicity and activity, with a correlation parameter of–1/2. In the range of scenarios considered, all implicitly assume a positive association between activity and toxicity. The negative association is between activity and toxicity event times only, under the specifications of the probabilities described. This reflects the likelihood that early activity outcomes are associated with DLT outcomes later, although this is not a strong association. This is a simple yet effective mechanism to generate data that allows for easy specification of probabilities at dose levels that do not themselves follow any specific parametric dose-response model. (tTtA)Lognormal2((μTμA),(σT2σTσA2σTσA2σA2)).

3.3 Prior Distributions

Since all methods are Bayesian, we must specify prior distributions for the parameters of the models.

3.3.1 Joint TITE-CRM and Joint CRM

Here we use priors (β0log(β1))N2((c1c2),(v100v2)). for both toxicity and activity. Values of hyper-parameters used are: c1,T=log(1/16), c2,T=log(1/4), v1,T=1, v2,T=2 and c1,A=3, c2,A=0.2, v1,A=1, v2,A=1. The values for toxicity are calibrated over a range of scenarios (see Mozgunov et al. Citation2021). The values for activity are chosen so that all doses have prior mean (πA)> 0.3 and this increases with dose, with average effective prior sample size of one patient per dose level. We used a vague prior for ψN(0,100).

3.3.2 Model-Assisted Approach (Liu and Johnson Citation2016)

Here we use the priors aA=(0.2,0.3,0.4,0.45,0.5,0.6), bA=(0.95, 0.9, 0.8, 0.75, 0.7, 0.65), aT=(0.05, 0.1, 0.2, 0.25, 0.3, 0.35) and bT=(0.95, 0.9, 0.8, 0.75, 0.7, 0.65), as suggested by Liu and Johnson (Citation2016).

3.3.3 AT /AE Design (Yuan and Yin Citation2009)

Here, we follow the priors implemented by Yuan and Yin (Citation2009), with some minor modification to fit our setting. λA , αA , βA , λT , αT , and βT are assigned independent Gamma priors with shape and rate parameters of 0.1. These are constrained to be less than 3, 4, 1, 2, 3, and 1, respectively. The modification is of the upper bound of the β to account for the differing values of doses in our implementation to the original implementation. The uniform priors for π and ϕ are πU(0.6,1) and ϕU(0,5).

3.3.4 TITE-B Design (Yan et al. Citation2019)

For both θ and βl, a N(0,1.34) prior is used. Following Yan et al. (Citation2019), the skeleton for the toxicity model is (0.02, 0.06, 0.12, 0.20, 0.30, 0.40) and the L = 11 efficacy skeletons used are: Q=(q1q2q3q4q5q6q7q8q9q10q11)=(0.590.500.400.300.200.120.500.590.500.400.300.200.400.500.590.500.400.300.300.400.500.590.500.400.200.300.400.500.590.500.120.200.300.400.500.590.200.300.400.500.590.590.300.400.500.590.590.590.400.500.590.590.590.590.500.590.590.590.590.590.590.590.590.590.590.59)

Equal prior weights are given to all activity working models, and a value of δ = 2 is used to determine the number of candidate models to include in the randomization.

3.4 Results

We conducted 1000 simulations in each of the 20 scenarios. shows the percentage of simulations that select the true OBD if there is one, or the trial is stopped correctly when no true OBD exists in the dose set. The most noticeable feature is that the model-assisted method and the TITE-B design give a lower proportion of the correct recommendations than the two other methods that account for late-onset outcomes (Joint TITE-CRM and AT /AE ). In scenarios where all doses are very safe, the model-assisted method performs poorly, a reflection on the cautious escalation often apparent in model-assisted designs. The TITE-B design also finds such scenarios challenging. The overall performance of the other two model based methods is similar, although with some differences in individual scenarios. For example the performance in T4.A1, where the lowest dose is the OBD, the Joint TITE-CRM method vastly outperforms the AT /AE design, which stops too often early for no admissible doses.

Fig. 1 Proportion of correct and acceptable selections across scenarios.

Fig. 1 Proportion of correct and acceptable selections across scenarios.

Interestingly, the percentage of acceptable selections does not follow the same pattern (). Here we define an “acceptable” selection as one that is both safe and active, regardless of if it is the best in terms of the utility criterion. The Joint TITE-CRM is the best performing overall out of the designs accounting for late-onset outcomes. The model-assisted method performs poorly for example in T1.A4, where the acceptable selections are either the highest dose or stopping because the highest dose is too safe.

In terms of sample size, the model assisted method and TITE-B have a higher sample size on average across scenarios, as shown in while the other three methods have very similar sample sizes.

Fig. 2 Mean sample size and trial duration across scenarios.

Fig. 2 Mean sample size and trial duration across scenarios.

Although the performance of the Joint CRM is similar to that of the Joint TITE-CRM, with some slight advantages, this is at the cost of a much greater average trial length. indicates this relationship, with an average trial duration of more than twice that of the Joint TITE-CRM.

The patterns are similar between the model-based methods in terms of total sample sizes, correct and acceptable selections, however, there is a difference in the number of patients treated at unsafe doses, shown in , with the AT /AE method assigning more patients to unsafe doses than both other methods in almost all scenarios. The model assisted method and TITE-B have much fewer patients assigned to unsafe doses, due to the cautious escalation behavior of model-assisted methods and the more strict safety criterion of the TITE-B.

Fig. 3 Number of patients assigned to unsafe doses across scenarios.

Fig. 3 Number of patients assigned to unsafe doses across scenarios.

The overall performance of the five methods in the setting of realistic stopping and enforcement rules can be summarized as follows. The cautious escalation of the model-assisted method gives rise to a poorer performance in terms of selections of correct and acceptable doses, with larger sample sizes and fewer patients exposed to unsafe doses, with the TITE-B having similar operating characteristics. The other two model based designs that account for late onset outcomes perform better, with similar percentages of correct and acceptable recommendations as each other, but with the AT /AE design assigning more patients to unsafe doses, showing a more aggressive escalation that is not rewarded with better selection performance. Additionally, the AT /AE design is substantially more complex and computationally intensive than the other methods, a cost that does not increase the performance. All four approaches considering partial information are vastly superior to the approach not allowing for such information in terms of trial duration.

3.5 Relaxed Stopping Rules

When the stopping rules are relaxed to investigate the operating characteristics of the designs more closely, note that since stopping rule 3 is no longer enforced, there is no longer a correct selection in T1 and so all methods give 0% correct selection. We see in that in most scenarios where there is a true OBD (e.g., T2.A2, T4.A1, T2.A3) the Joint TITE-CRM gives a better performance of correct selections. When there are no admissible doses (e.g., T3.A4, T2.A4) the Joint TITE-CRM performs worse than the AT /AE design and the model assisted approach, but better than the TITE-B design.

Fig. 4 Proportion of correct and acceptable selections across scenarios with relaxed stopping rules.

Fig. 4 Proportion of correct and acceptable selections across scenarios with relaxed stopping rules.

It is noticeable that when considering acceptable selections, the Joint TITE-CRM performs the best out of the methods accounting for partial information in most scenarios. In the absence of the early stopping rules, this provides evidence that the Joint TITE-CRM is benefiting from the model-based inference that the model-assisted method lacks. However, the AT /AE method and TITE-B design also perform worse than the Joint TITE-CRM. For the AT /AE method, this may be due to the deviations from assumptions on the survival model used in the inference and data generation. For the TITE-B method, activity scenarios where many or all doses are not active give a low percentage of acceptable recommendations, since the design weights the dose recommendation based on activity, and does not specify an activity target or include activity in the choice of admissible doses.

shows that the sample size is of course considerably larger when the stopping rules are relaxed, with many scenarios reaching nearly the maximum sample size on average. The Joint TITE-CRM has a larger sample size on average, which tallies up to the observation that this design had fewer correct stoppings for no admissible doses. This suggests that in general, this method is more willing to label a dose as admissible than the other methods.

Fig. 5 Mean sample size and trial duration across scenarios with relaxed stopping rules.

Fig. 5 Mean sample size and trial duration across scenarios with relaxed stopping rules.

The mean trial duration is also drastically increased for all methods, with the Joint CRM having an average duration of up to seven years, compared to the other methods giving less than three years.

With the relaxed stopping rules, the AT /AE method once again has a large number of patients assigned to unsafe doses, owing to a more aggressive escalation, as seen in . The model-assisted method and the TITE-B have much fewer patients assigned to unsafe doses, due to the more cautious escalation.

Fig. 6 Number of patients assigned to unsafe doses across scenarios with relaxed stopping rules.

Fig. 6 Number of patients assigned to unsafe doses across scenarios with relaxed stopping rules.

3.6 Activity Time Trend

In the main evaluation above, the data generation of the activity times assumed that the probability of activity in the first cycle is one third of the probability of activity in the entire follow up of three cycles. However, it may be the case that this is actually higher or lower than one third.

To investigate the effect of activity occurring at various points in the follow up, we consider three activity patterns. In activity pattern 1, the probability of activity in cycle 1 is 1/3 of the total probability; in activity pattern 2, it is 1/6 and in activity pattern 3 in is 1/2. This is to investigate the effect of varying time trends in activity.

It is possible that this change in activity time trend will affect the performance of the three methods that use the time-to-event outcomes, and so we investigate the difference that this can make in a selection of scenarios.

shows the operating characteristics of the four time-to-event methods when the time trend is varied. The Joint CRM is not included here since it cannot account for time trends. There is no difference in the number of patients assigned to unsafe doses, and very little difference to the overall sample size. In terms of selections, the AT /AE design is most variable to the time trend, since this method is more heavily dependent on the survival curve. The Joint TITE-CRM, model-assisted method of Liu & Johnson and TITE-B are more robust to activity time trends.

Fig. 7 Operating characteristics for the three activity time trends.

Fig. 7 Operating characteristics for the three activity time trends.

4 Discussion

In this work, we have compared our designs for a dose-finding trial that uses both toxicity and activity outcomes in the form of time-to-event responses. The comparison has highlighted the impact of both the inference of dose–response relationship and the decision criteria. The challenge presented by a trial using both toxicity and activity outcomes is compounded by the late onset nature of the responses.

The decision criteria used in the AT /AE design, whilst intuitive for survival outcomes, does somewhat deviate from the objective of the dose-finding trial. Without penalty for unsafe doses, it gave an aggressive escalation path in many scenarios. This leads on from the point made regarding , that without targeting any safety level, scenarios with similar true ratios across levels present very challenging for this design. Additionally, the increased computational intensity of this design is not rewarded by an increase in performance.

The decision criteria used in the TITE-B design is less complex than the AT /AE design, but without specification of minimum efficacy, or requiring closeness to target toxicity, it is more challenging to recommend the true OBD. The stricter safety criteria for the admissible dose set leads to a more cautious escalation.

The model-assisted method and the Joint TITE-CRM implemented both used the same utility criterion, which allowed a comparison between the simple inference of Liu and Johnson (Citation2016) and the joint logistic model. Here the increase in complexity is rewarded with increased performance. The Joint TITE-CRM is recommended as an alternative that balances complexity and performance across a range of scenarios.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability Statement

Software in the form of R code used to produce the presented results is available at https://github.com/helenyb/DF_Late_Activity_Safety

Additional information

Funding

T Jaki and H Barnett received funding from UK Medical Research Council (MC_UU_00002/14).

References

  • Altzerinakou, M.-A., Collette, L., and Paoletti, X. (2019), “Cumulative Toxicity in Targeted Therapies: What to Expect at the Recommended Phase II Dose,” JNCI: Journal of the National Cancer Institute, 111, 1179–1185. DOI: 10.1093/jnci/djz024.
  • Altzerinakou, M.-A., and Paoletti, X. (2020), “An Adaptive Design for the Identification of the Optimal Dose Using Joint Modeling of Continuous Repeated Biomarker Measurements and Time-to-Toxicity in Phase I/II Clinical Trials in Oncology,” Statistical Methods in Medical Research, 29, 508–521. DOI: 10.1177/0962280219837737.
  • Barnett, H., Boix, O., Kontos, D., and Jaki, T. (2022), “Dose Finding Studies for Therapies with Late-Onset Toxicities: A Comparison Study of Designs,” Statistics in Medicine, 41, 5767–5788. DOI: 10.1002/sim.9593.
  • Braun, T. M. (2002), “The Bivariate Continual Reassessment Method: Extending the CRM to Phase I Trials of Two Competing Outcomes,” Controlled Clinical Trials, 23, 240–256. DOI: 10.1016/s0197-2456(01)00205-7.
  • Cheung, Y. K., and Chappell, R. (2000), “Sequential Designs for Phase I Clinical Trials with Late-Onset Toxicities,” Biometrics, 56, 1177–1182. DOI: 10.1111/j.0006-341x.2000.01177.x.
  • Doussau, A., Thiébaut, R., and Paoletti, X. (2013), “Dose-Finding Design Using Mixed-Effect Proportional Odds Model for Longitudinal Graded Toxicity Data in Phase I Oncology Clinical Trials,” Statistics in Medicine, 32, 5430–5447. DOI: 10.1002/sim.5960.
  • Koopmeiners, J. S., and Modiano, J. (2014), “A Bayesian Adaptive Phase i-ii Clinical Trial for Evaluating Efficacy and Toxicity with Delayed Outcomes,” Clinical Trials, 11, 38–48. DOI: 10.1177/1740774513500589.
  • Lin, R., and Yin, G. (2017), “STEIN: A Simple Toxicity and Efficacy Interval Design for Seamless Phase I/II Clinical Trials,” Statistics in Medicine, 36, 4106–4120. DOI: 10.1002/sim.7428.
  • Liu, S., and Johnson, V. E. (2016), “A Robust Bayesian Dose-Finding Design for Phase I/II Clinical Trials,” Biostatistics, 17, 249–263. DOI: 10.1093/biostatistics/kxv040.
  • Mozgunov, P., Knight, R., Barnett, H., and Jaki, T. (2021), “Using an Interaction Parameter in Model-based Phase I Trials for Combination Treatments? A Simulation Study,” International Journal of Environmental Research and Public Health, 18, 1–19. DOI: 10.3390/ijerph18010345.
  • Nebiyou Bekele, B., and Shen, Y. (2005), “A Bayesian Approach to Jointly Modeling Toxicity and Biomarker Expression in a Phase I/II Dose-Finding Trial,” Biometrics, 61, 343–354. DOI: 10.1111/j.1541-0420.2005.00314.x.
  • Sinclair, K., and Whitehead, A. (2014), “A Bayesian Approach to Dose-Finding Studies for Cancer Therapies: Incorporating Later Cycles of Therapy,” Statistics in Medicine, 33, 2665–2680. DOI: 10.1002/sim.6132.
  • Thall, P. F., and Cook, J. D. (2004), “Dose-Finding Based on Efficacy–Toxicity Trade-Offs,” Biometrics, 60, 684–693. DOI: 10.1111/j.0006-341X.2004.00218.x.
  • U.S. Food & Drug Administration. (2023), “Project Optimus,” available at https://www.fda.gov/about-fda/oncology-center-excellence/project-optimus, Accessed 2023-06-05.
  • U.S. National Library of Medicine. (2015), “Safety and Tolerability of BAY1862864 Injection in Subjects With Relapsed or Refractory CD22-positive Non-Hodgkin’s Lymphoma,” NCT02581878, available at https://beta.clinicaltrials.gov/study/NCT02581878.
  • ——— (2018a), “First-in-Human Study of BAY2287411 Injection, a Thorium-227 Labeled Antibody-chelator Conjugate, in Patients With Tumors Known to Express Mesothelin,” NCT03507452, available at https://beta.clinicaltrials.gov/study/NCT03507452.
  • ——— (2018b), “Study to Evaluate the Safety, Tolerability, Pharmacokinetics, and Antitumor Activity of a Thorium-227 Labeled Antibody-chelator Conjugate Alone and in Combination With Darolutamide, in Patients With Metastatic Castration Resistant Prostate Cancer,” NCT03724747, available at https://beta.clinicaltrials.gov/study/NCT03724747.
  • ——— (2020), “A First in Human Study of BAY2701439 to Look at Safety, How the Body Absorbs, Distributes and Excretes the Drug, and How Well the Drug Works in Participants With Advanced Cancer Expressing the HER2 Protein,” NCT04147819. Available at https://beta.clinicaltrials.gov/study/NCT04147819.
  • Yan, D., Tait, C., Wages, N. A., Kindwall-Keller, T., and Dressler, E. V. (2019), “Generalization of the Time-to-Event Continual Reassessment Method to Bivariate Outcomes,” Journal of Biopharmaceutical Statistics, 29, 635–647. DOI: 10.1080/10543406.2019.1634087.
  • Yeung, W. Y., Reigner, B., Beyer, U., Diack, C., Sabanés bové, D., Palermo, G., and Jaki, T. (2017), “Bayesian Adaptive Dose-Escalation Designs for Simultaneously Estimating the Optimal and Maximum Safe Dose based on Safety and Efficacy,” Pharmaceutical Statistics, 16, 396–413. DOI: 10.1002/pst.1818.
  • Yeung, W. Y., Whitehead, J., Reigner, B., Beyer, U., Diack, C., and Jaki, T. (2015), “Bayesian Adaptive Dose-Escalation Procedures for Binary and Continuous Responses Utilizing a Gain Function,” Pharmaceutical Statistics, 14, 479–487. DOI: 10.1002/pst.1706.
  • Yin, G., Li, Y., and Ji, Y. (2006), “Bayesian Dose-Finding in Phase I/II Clinical Trials Using Toxicity and Efficacy Odds Ratios,” Biometrics, 62, 777–787. DOI: 10.1111/j.1541-0420.2006.00534.x.
  • Yuan, Y., and Yin, G. (2009), “Bayesian Dose Finding by Jointly Modelling Toxicity and Efficacy as Time-to-Event Outcomes,” Journal of the Royal Statistical Society, Series C, 58, 719–736. DOI: 10.1111/j.1467-9876.2009.00674.x.