Clinical Measurement

Incorporating patient preferences and burden-of-disease in evaluating ALS drug candidate AMX0035: a Bayesian decision analysis perspective

Pages 281-288 | Received 23 Mar 2022, Accepted 06 Oct 2022, Published online: 26 Oct 2022

Abstract

Objective

Provide the US FDA and the amyotrophic lateral sclerosis (ALS) community with a systematic, transparent, and quantitative framework for evaluating the efficacy of the ALS therapeutic candidate AMX0035 in its phase 2 trial, which showed a statistically significant effect (p-value of 3%) in slowing the rate of ALS progression in a relatively small sample of 137 patients.

Methods

We apply Bayesian decision analysis (BDA) to determine the optimal type I error rate (p-value threshold) for deciding whether the clinical evidence of AMX0035 supports FDA approval. Using rigorous estimates of ALS disease burden, our BDA framework strikes the optimal balance between the FDA's need to limit adverse effects (type I error) and patients' need for expedited access to a potentially effective therapy (type II error). We also apply BDA to a hypothetical long-term survival endpoint, calibrated using clinical evidence from AMX0035 and Riluzole.

Results

The BDA-optimal type I error for approving AMX0035 is higher than the 3% p-value reported in the phase 2 trial if the probability of the therapy being effective is at least 30%. Assuming a 50% probability of efficacy and a signal-to-noise ratio of treatment effect between 25% and 50% (benchmark: 33%), the optimal type I error rate ranges from 2.6% to 26.3% (benchmark: 15.4%). The BDA-optimal type I error rate is robust to perturbations in most assumptions except for a probability of efficacy below 5%.

Conclusion

BDA provides a useful framework to incorporate subjective perspectives of ALS patients and objective burden-of-disease metrics to evaluate the therapeutic effects of AMX0035 in its phase 2 trial.

Introduction

Amyotrophic lateral sclerosis (ALS) is a fatal motor neuron disease with no curative treatments. Typically, ALS progresses from muscle weakness to death by respiratory paralysis in three to five years (Citation1). In 2015, ALS affected more than 16,500 patients in the US (Citation2), and the nationwide economic burden of ALS is over 1 billion USD (Citation3). In a statement on March 2, 2021, the US Food and Drug Administration (FDA) stated that it is ‘prepared to use all expedited development and approval pathways available’ to facilitate the development and approval of ALS therapeutics (Citation4).

On November 2, 2021, the biotech company Amylyx submitted a New Drug Application (NDA) to the US FDA for its investigational ALS therapeutic AMX0035 (Citation5). This NDA was controversial because the drug candidate had not yet completed its phase 3 clinical trial (Citation6), although its phase 2 trial, with a sample size of 137, showed a statistically significant reduction in the rate of disease progression (Citation7,Citation8). Encouraged by the phase 2 results, two ALS patient advocacy groups submitted over 50,000 signatures to the FDA, calling on the agency to approve AMX0035 (Citation9). In December 2021, the FDA granted Priority Review to the NDA and, in March 2022, held a first advisory committee meeting at which the members voted 6–4 against AMX0035, citing inconclusive clinical evidence of treatment efficacy. After Amylyx supplied additional clinical data and updated analyses, the FDA convened a second advisory committee meeting in September 2022, at which the members voted 7–2 in favor of approving the drug, amid continued controversy over its efficacy and long-term benefits to patients.

To determine whether the clinical evidence justified the NDA for AMX0035, we apply Bayesian decision analysis (BDA) to find the optimal tradeoff between a false positive, in which patients are exposed to the potential side effects of an ineffective drug (type I error), and a false negative, in which patients cannot access an effective drug (type II error). We find that the BDA-optimal type I error most closely associated with the AMX0035 phase 2 trial parameters ranges from 2.6% to 26.3% (benchmark: 15.4%) when the signal-to-noise ratio ranges from 25% to 50% (benchmark: 33%). The benchmark value is roughly five times the 3% p-value reported in that trial (Citation7), and the optimal threshold is even higher for other plausible measures of burden-of-disease.

Literature review

The traditional approach to assessing the statistical evidence of a randomized clinical trial (RCT) is to compare the p-value of the standardized test statistic associated with the trial outcome against the desired type I error (false positive) rate, usually 2.5% for a one-tailed hypothesis test and 5% for a two-tailed test. Trial outcomes with p-values below this threshold are deemed statistically significantly different from the null hypothesis of no effect, supporting regulatory approval, while those above it are deemed statistically indistinguishable from the null hypothesis and do not support approval. In practice, regulators also weigh contextual factors other than statistical significance when deciding whether to authorize a drug, including clinical significance and key secondary endpoints.
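As a minimal sketch of this traditional decision rule, assuming the standardized test statistic is standard normal under the null hypothesis, the Python below compares one- and two-tailed p-values against the 2.5% and 5% thresholds. The statistic z = 2.1 is a hypothetical value, not taken from the AMX0035 trial, and the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1) under the null hypothesis


def one_sided_p_value(z: float) -> float:
    """Upper-tail p-value of a standardized test statistic z."""
    return 1.0 - STD_NORMAL.cdf(z)


def two_sided_p_value(z: float) -> float:
    """Two-tailed p-value of a standardized test statistic z."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def is_significant(p_value: float, alpha: float) -> bool:
    """Traditional rule: reject the null of no effect when p < alpha."""
    return p_value < alpha


# Hypothetical test statistic, for illustration only.
z = 2.1
p1 = one_sided_p_value(z)  # about 0.018
p2 = two_sided_p_value(z)  # about 0.036
print(is_significant(p1, 0.025), is_significant(p2, 0.05))
```

For a positive statistic the two-tailed p-value is exactly twice the one-tailed one, so the one-tailed 2.5% rule and the two-tailed 5% rule lead to the same decision.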

The question that BDA raises and answers is: 'Are patient preferences consistent with a fixed 2.5% significance threshold for the type I error rate?' For fatal diseases with no existing treatments, patients may be willing to accept a much higher false positive rate, especially if, as is often the case, doing so lowers the false negative (type II error) rate. For example, suppose the conventional 2.5% type I error is associated with a type II error of 25%. A glioblastoma patient who has exhausted the standard of care may accept a type I error of 20% if it is associated with a type II error of 10%. Since such patients have no other recourse, the relative importance of false positives and negatives should reflect their circumstances. On the other hand, diabetes patients may require a much lower type I error threshold of 1.3% (Citation10), as treatments to enhance their quality of life already exist.

Regulatory authorities recognize the challenge facing patients with no treatment options and have developed mechanisms for expediting the approval process for potentially effective treatments. FDA offers four special designations—fast track, breakthrough therapy, accelerated approval, and priority review—that involve faster reviews and/or use surrogate endpoints to judge efficacy. However, published descriptions (Citation11,Citation12) do not indicate any differences in the statistical thresholds used in these programs versus the standard approval process, nor do they mention adapting these thresholds to the severity of the disease. One reason is that, even under the traditional thresholds of statistical significance, drugs with severe side effects still manage to receive regulatory approval (Citation13–16). Therefore, regulatory agencies mandated to protect the public’s health are understandably reluctant to adopt more risk-tolerant statistical criteria for drug approvals.

However, under the simplifying assumption that FDA approval depends solely on the statistical significance of a drug's efficacy, there is an inherent tradeoff between the risk of false positives and the risk of false negatives for fixed clinical trial properties (sample size, effect size, inclusion criteria, etc.): being risk averse with respect to one criterion often means being risk tolerant with respect to the other. BDA seeks to balance the two risks by minimizing the expected loss to patients, defined as the sum of the measured impacts of false positives and false negatives, each weighted by its probability. This yields optimal false positive and false negative rates that reflect the different costs of each type of error, offering the 'greatest good for the greatest number.'
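The tradeoff can be illustrated with a minimal sketch, assuming a one-sided z-test whose statistic is distributed as N(δ, 1) when the drug is effective, with the non-centrality δ fixed by the trial design. Here δ is backed out from the 2.5%/25% pairing in the glioblastoma example above; the helper functions are hypothetical and are not the authors' BDA software.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_error(alpha: float, delta: float) -> float:
    """beta of a one-sided z-test whose statistic is N(delta, 1) under efficacy."""
    critical_value = STD_NORMAL.inv_cdf(1.0 - alpha)  # z_{1 - alpha}
    return STD_NORMAL.cdf(critical_value - delta)


def delta_for(alpha: float, beta: float) -> float:
    """Non-centrality implied by a fixed design that achieves (alpha, beta)."""
    return STD_NORMAL.inv_cdf(1.0 - alpha) + STD_NORMAL.inv_cdf(1.0 - beta)


# Hypothetical fixed design: 2.5% type I error paired with 25% type II error.
delta = delta_for(0.025, 0.25)  # about 2.63
for alpha in (0.01, 0.025, 0.05, 0.10, 0.20):
    print(f"alpha={alpha:.3f}  beta={type_ii_error(alpha, delta):.3f}")
```

Under these assumptions, loosening α from 2.5% to 20% lowers β from 25% to under 4%; the BDA question is where along this curve the expected loss to patients is smallest.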

Montazerhodjat et al. (Citation17) find that the optimal type I errors determined by BDA are often much larger than 2.5% for terminal cancers with short survival times and no effective therapies (e.g., a 47.5% BDA-optimal type I error for glioblastoma), and smaller than 2.5% for less serious cancers with longer survival times and multiple effective therapies (e.g., a 0.9% BDA-optimal type I error for early-stage prostate cancer). Isakov et al. (Citation10) provide corresponding results for the 25 most lethal diseases in the US.

Chaudhuri et al. (Citation18) apply Bayesian patient-centered models to anti-infective therapeutics, incorporating epidemiological models to determine the optimal type I errors during outbreaks of infectious disease. Most recently, a survey of over 2,700 Parkinson's disease (PD) patients (Citation19) found that risk thresholds in a BDA framework for new neurostimulation devices for PD increase markedly with the perceived benefit of the device to the patient. BDA has also been applied retrospectively to medical devices for treating obesity (Citation20) and to adaptive platform trials (Citation21), and is being considered as a prospective input for designing trials of kidney replacement therapies (Citation22).

BDA applications require more information than the traditional approach; for example, the losses under both types of error must be specified. Several metrics have been proposed in the health technology assessment literature, including quality-adjusted life years (QALYs) to assess burden-of-disease and patient survey tools that directly gauge patient preferences (Citation22).

The most challenging issue in implementing this framework is the consequence of a larger number of false positives. This can be addressed by creating a temporary license to market ‘speculative’ therapies that expires after a short period (e.g. two to three years) (Citation10). During this period, the licensee is required to collect and share data on the safety and efficacy of its therapy. If the results are positive, the license converts to a standard approval. Otherwise the therapy is withdrawn upon expiration. Regulators should have the right to terminate the temporary license at any time in response to adverse events or significantly negative data. Such licenses would greatly accelerate the pace of therapeutic development for many underserved medical needs without limiting regulatory flexibility.

Methods

We adopt the BDA model proposed in (Citation10,Citation20) and calibrate its parameters using the available data for ALS and AMX0035. The assumed values of the parameters are listed in Table 1. We calibrate our BDA model to follow the phase 2 trial design of AMX0035 (Citation7) as closely as possible by assuming an imbalanced two-arm RCT in which participants are randomly assigned to the treatment and control arms in a 2:1 ratio. We denote the sizes of the treatment and control arms by $2n$ and $n$, respectively.

Table 1 Assumed values of parameters in the Bayesian clinical trial model.

The Bayesian cost matrix is shown in Table 2. Here, $N = 16{,}583$ denotes the estimated prevalence of ALS in the US in 2015 (Citation2), and $DF_t = e^{-rt}$ is a discount factor (Citation20) that models the patient's temporal preferences, where $r$ is the annual discount rate; higher values of $r$ reflect a lower willingness to wait. The quantity $t$ is the estimated length of the regulatory process, given by $t = s + n_{\text{trial}}/\eta + f + \tau$ (Table 1), where $n_{\text{trial}}$ is the number of patients enrolled in the clinical trial (optimized by the BDA), $n_{\text{trial}}/\eta$ is the patient enrollment time, and $n_{\text{trial}}/\eta + f$ is the clinical trial duration from the first patient to the last patient.

The cost of type I error, $c_1$, is proportional to the loss per patient due to the adverse effects of ALS treatment. We assume $c_1 = 0.07$, the value used in (Citation10), which accounts for the adverse effects of all medical treatments. This is likely an overestimate of $c_1$ for AMX0035 for two reasons. First, most adverse effects reported in the phase 2 trial of AMX0035 are gastrointestinal events (Citation7), which are milder than many of the adverse medical effects used to estimate $c_1$ (e.g., amputation of a limb or traumatic brain injury). Second, there is abundant clinical evidence supporting the safety of the two drugs in the AMX0035 combination therapy (sodium phenylbutyrate and TUDCA): sodium phenylbutyrate was approved by the FDA in 1996 to treat urea cycle disorders, and TUDCA was tested in a small phase 2 trial (34 patients) in 2012 to treat ALS, with no significant adverse effects reported (Citation23).
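The timing terms can be sketched in Python as follows. Reading $s$ as trial start-up time, $\eta$ as the enrollment rate (patients per year), $f$ as follow-up time, and $\tau$ as regulatory review time is our assumption, and every numeric value below is hypothetical; the calibrated values are those in Table 1.

```python
import math


def regulatory_delay(n_trial, s, eta, f, tau):
    """t = s + n_trial/eta + f + tau (years); n_trial/eta is the enrollment time."""
    return s + n_trial / eta + f + tau


def discount_factor(t, r):
    """DF_t = exp(-r t); a larger annual rate r means patients are less willing to wait."""
    return math.exp(-r * t)


# Hypothetical inputs for illustration, NOT the Table 1 calibration.
s, eta, f, tau, r = 0.5, 100.0, 0.5, 1.0, 0.1
for n_trial in (90, 137, 300):
    t = regulatory_delay(n_trial, s, eta, f, tau)
    print(f"n_trial={n_trial}: t={t:.2f} years, DF_t={discount_factor(t, r):.3f}")
```

Because $t$ grows with $n_{\text{trial}}$, a larger trial lowers $DF_t$; this is one way sample size can enter the cost matrix, trading statistical precision against a later decision.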

Table 2 Assumed cost matrix $C_{ij}$ for Bayesian decision analysis. Here $N$ denotes the prevalence of ALS in the US, $DF_t$ the time discount factor, and $c_1$ and $c_2$ the costs of type I and II errors, respectively.

Similarly, the cost of type II error, $c_2$, is proportional to the loss due to the disease burden of ALS suffered by each patient. We use the heuristic proposed in (Citation10) to estimate $c_2$:

$$c_2 = \frac{D + \mathrm{YLD}}{D + N} \qquad (1)$$

Here, $D$ denotes the number of deaths caused by the disease, YLD the number of years lived with disability, and $N$ the disease prevalence, all measured as age-standardized values. We have not identified any study in the literature that estimates disease severity specifically for ALS. Instead, we use the corresponding values for all motor neuron diseases (Citation24) as a proxy; this category includes ALS, spinal muscular atrophy, hereditary spastic paraplegia, primary lateral sclerosis, progressive muscular atrophy, and pseudobulbar palsy. The age-standardized values of the parameters are $D = 0.46$, $\mathrm{YLD} = 1.0$, and $N = 4.5$ (per 100,000 individuals). Using Equation 1, we have $c_2 = 1.46/4.96 \approx 0.29$. For comparison, the authors of (Citation10) estimate the disease severity of brain cancer ($c_2 = 0.30$) and leukemia ($c_2 = 0.21$).

We also consider an alternative definition of disease severity that uses disability-adjusted life years (DALY) instead of YLD in Equation 1. The motivation for using DALY is to account for the physical and mental afflictions that ALS patients experience as muscular atrophy and respiratory paralysis worsen over three to five years. We find that the DALY of medical adverse effects is 53.1, while that of ALS is 13.2 (per 100,000 individuals) (Citation10,Citation24). Consequently, the DALY-based severity estimates are $\tilde{c}_1 = 0.16$ for medical adverse effects and $\tilde{c}_2 = 2.75$ for ALS. Since the ratio of the costs of type II and type I errors is much higher for the DALY estimates ($\tilde{c}_2/\tilde{c}_1 = 16.81$) than for the YLD estimates ($c_2/c_1 = 4.37$), we expect the corresponding Bayesian optimal type I error rates to be higher for the DALY estimates as well.
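Equation 1 and its DALY variant can be checked with a few lines of Python. The inputs are the age-standardized motor neuron disease values (per 100,000) quoted above; `disease_severity` is a hypothetical helper name.

```python
def disease_severity(deaths: float, burden: float, prevalence: float) -> float:
    """Equation 1: (D + burden) / (D + N), where burden is YLD or DALY."""
    return (deaths + burden) / (deaths + prevalence)


# Age-standardized motor neuron disease values per 100,000, as quoted in the text.
D, YLD, DALY, N = 0.46, 1.0, 13.2, 4.5

c2_yld = disease_severity(D, YLD, N)    # 1.46 / 4.96, about 0.29
c2_daly = disease_severity(D, DALY, N)  # 13.66 / 4.96, about 2.75
print(round(c2_yld, 2), round(c2_daly, 2))
```

Replacing YLD by DALY in the numerator is the only change, which is why the DALY calibration raises the estimated cost of a type II error by almost an order of magnitude.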

The objective of BDA is to minimize the expected loss incurred by patients, given by

$$C(n, \lambda) = p_0\left(\alpha C_{10} + (1-\alpha) C_{00}\right) + p_1\left(\beta C_{01} + (1-\beta) C_{11}\right) \qquad (2)$$

where $C_{00}$, $C_{01}$, $C_{10}$, and $C_{11}$ are shown in Table 2, $p_1 = 1 - p_0$ is the prior probability of having an effective drug, and $\alpha$ and $\beta$ are the type I and type II error rates, respectively. In line with the clinical equipoise principle (Citation25), we use a non-informative prior by setting $p_0 = p_1 = 50\%$ as the baseline value of our BDA model. We verify the robustness of our results over a wide range of $p_0$, extending to 95% to match the historical probability of failure of ALS drugs (Citation26). The mathematical details are available in the Supplementary Materials.
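The Python below is a minimal sketch of how Equation 2 yields an optimal α at a fixed sample size. Several ingredients are our assumptions rather than the paper's: the efficacy test is a one-sided z-test with a hypothetical non-centrality δ = 2.63; the cost matrix is a simplified per-patient version that omits the $N$ and $DF_t$ factors of Table 2 and assigns zero loss to correct decisions; and α* is found by grid search rather than by the optimization in the Supplementary Materials.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_loss(alpha, beta, p0, C):
    """Equation 2. C[i][j] is the loss when decision i (1 = approve, 0 = reject)
    is made and the truth is j (1 = effective, 0 = ineffective)."""
    p1 = 1.0 - p0
    return (p0 * (alpha * C[1][0] + (1.0 - alpha) * C[0][0])
            + p1 * (beta * C[0][1] + (1.0 - beta) * C[1][1]))


def type_ii_error(alpha, delta):
    """beta of a one-sided z-test whose statistic is N(delta, 1) under efficacy."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - alpha) - delta)


def optimal_alpha(p0, C, delta, grid=None):
    """Grid search for the alpha that minimizes Equation 2 at a fixed sample size."""
    grid = grid or [k / 1000 for k in range(1, 500)]
    return min(grid, key=lambda a: expected_loss(a, type_ii_error(a, delta), p0, C))


# Hypothetical per-patient costs: c1 for approving an ineffective drug,
# c2 for rejecting an effective one, zero loss for correct decisions.
c1, c2 = 0.07, 0.29
C = [[0.0, c2],  # reject:  [truth ineffective, truth effective]
     [c1, 0.0]]  # approve: [truth ineffective, truth effective]
alpha_star = optimal_alpha(p0=0.5, C=C, delta=2.63)
print(f"alpha* = {alpha_star:.3f}")
```

In this toy setting, raising $c_2$ relative to $c_1$, or lowering $p_0$, pushes α* upward, which is the qualitative mechanism the Results quantify with the full model.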

As shown in Equation 2, the BDA framework requires two separate types of input. The first input corresponds to the patient’s costs, which we estimate using burden-of-disease data, potential treatment side effects, and patient preferences. This input should depend only on the disease (through objective burden-of-disease metrics) and the patient’s subjective preferences for this disease. The second input concerns the expected efficacy of the treatment, which affects Equation 2 through α and β (Section S1.3 of the Supplementary Materials). Hence, unlike the patient costs, α and β depend specifically on the particular efficacy endpoint defined in the RCT. Since our objective is to study the AMX0035 drug, we assess the efficacy of the drug using the primary endpoint for its phase 2 trial, namely the revised ALS functional rating scale (ALSFRS-R) score, and calibrate the BDA model using data from the trial (Citation7).

In addition, we generalize the approach and use the BDA framework to calculate the optimal significance level in the hypothetical scenario that the trial uses survival time (rather than the ALSFRS-R score) as its primary endpoint. We calibrate the model using the survival properties (hazard rate, hazard ratio, and observation period) reported for the AMX0035 trial (Citation8). To test the robustness of our results, we compare the results against a calibration that uses data from the Riluzole trial (Citation27).

Results

We calculate the BDA-optimal type I error rate $\alpha^*$ under an ALSFRS-R primary endpoint when the sizes of the treatment arm ($n_1 = 89$) and control arm ($n_2 = 48$) are fixed to match the actual sizes of the phase 2 trial of AMX0035. Table 3 summarizes the results, with the disease severities $c_1$ and $c_2$ calibrated using YLD and DALY.
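To see how α and β depend on this fixed design, the sketch below computes power for the 89/48 split under two assumptions that may differ from the Supplementary Materials: the ALSFRS-R comparison is treated as a two-sample z-test, and the signal-to-noise ratio ρ is read as the standardized effect size (mean difference divided by the outcome's standard deviation). The helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def noncentrality(rho, n1, n2):
    """Expected z-statistic under efficacy, assuming rho = mean difference / SD."""
    return rho / sqrt(1.0 / n1 + 1.0 / n2)


def power(alpha, rho, n1=89, n2=48):
    """One-sided power 1 - beta of an assumed two-sample z-test."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z_crit - noncentrality(rho, n1, n2))


for rho in (0.25, 0.33, 0.50):
    # Power at the conventional 2.5% threshold and at the 15.4% baseline alpha*.
    print(rho, round(power(0.025, rho), 3), round(power(0.154, rho), 3))
```

Under these assumptions, the conventional 2.5% threshold gives less than 50% power at the benchmark ρ = 0.33, which illustrates why a larger α can be attractive when a false negative is costly.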

Table 3 Optimal type I error rate for phase 2 trial of AMX0035 with 89 patients in the treatment arm and 48 in the control arm.

For both YLD and DALY measures, we find $\alpha^*$ above 10% for $p_0 \le 70\%$ and a signal-to-noise ratio $\rho < 0.5$. When the trial size is fixed, the YLD and DALY calibrations yield identical results whenever the optimal power reaches its 80% upper bound (Equation S8 of the Supplementary Materials). In the baseline model ($p_0 = 50\%$ and $\rho = 0.33$), we obtain $\alpha^* = 15.4\%$ for both calibrations, roughly five times the reported p-value of 3% for AMX0035. However, under a higher prior probability of failure, $p_0 = 95\%$, we find $\alpha^* = 1.5\%$ (YLD) and $\alpha^* = 2.6\%$ (DALY). A prior probability of success of only 5% may be overly conservative in this case, given the preliminary clinical evidence on the safety and efficacy of AMX0035 (Citation7,Citation23).

We also apply BDA to a hypothetical ALS drug evaluated on a survival time endpoint and report the results in Table 4 and Tables 5a and 5b. Table 4 compares BDA outputs calibrated with AMX0035 or Riluzole trial data, using the same control and treatment arm sizes as AMX0035's phase 2 trial. The $\alpha^*$ for such a hypothetical drug is always above 3% when $p_0 \le 95\%$.

Table 4 Optimal type I error rate for a hypothetical ALS therapy under a phase 2 trial of AMX0035 with 89 patients in the treatment arm and 48 in the control arm.

Table 5a Optimal sample size and type I error rate for a hypothetical ALS therapy with randomization ratio 2:1 and $h_0$ calibrated with AMX0035 survival data. The disease severity measures, $c_1$ and $c_2$, are calibrated using YLD and DALY. $hr$ denotes the hazard ratio of ALS treatment versus placebo; $t_{\text{obs}}$ denotes the observation time of the trial; and $n^*$ and $\alpha^*$ denote the Bayesian optimal sample size and type I error rate, respectively.

Tables 5a and 5b summarize the results, with the burden-of-disease measures $c_1$ and $c_2$ calibrated using YLD (top panel) and DALY (bottom panel). The baseline hazard rate $h_0$ is calibrated using survival data from the AMX0035 (Table 5a) and Riluzole (Table 5b) trials, and $p_0$ is fixed at 50%. To simulate different scenarios of treatment effect, we simultaneously vary the hazard ratio, $hr$, and the observation period of the clinical trial, $t_{\text{obs}}$.
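How power for a survival endpoint depends on $hr$ and $t_{\text{obs}}$ can be sketched with Schoenfeld's approximation for the log-rank test. This is a swapped-in standard technique, not necessarily the paper's model: it assumes exponential survival, every patient followed for the full $t_{\text{obs}}$, the 89/48 arm sizes, and a hypothetical control hazard $h_0 = 0.3$ per year rather than the AMX0035 or Riluzole calibration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_deaths(h0, hr, t_obs, n_treat, n_ctrl):
    """Expected deaths by t_obs under exponential survival, no other censoring."""
    return (n_ctrl * (1.0 - exp(-h0 * t_obs))
            + n_treat * (1.0 - exp(-hr * h0 * t_obs)))


def logrank_power(alpha, h0, hr, t_obs, n_treat=89, n_ctrl=48):
    """Schoenfeld approximation: one-sided power of the log-rank test for hr < 1."""
    d = expected_deaths(h0, hr, t_obs, n_treat, n_ctrl)
    share_treat = n_treat / (n_treat + n_ctrl)
    drift = -log(hr) * sqrt(d * share_treat * (1.0 - share_treat))
    return STD_NORMAL.cdf(drift - STD_NORMAL.inv_cdf(1.0 - alpha))


h0 = 0.3  # hypothetical control-arm hazard per year, NOT the calibrated value
for hr in (0.6, 0.8):
    for t_obs in (1.0, 2.0):
        print(hr, t_obs, round(logrank_power(0.025, h0, hr, t_obs), 3))
```

In this example, a longer $t_{\text{obs}}$ accrues more deaths and a smaller $hr$ strengthens the signal per death; both raise power at a given α, and setting $hr = 1$ returns power equal to α, as it should.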

Table 5b Optimal sample size and type I error rate for a hypothetical ALS therapy with randomization ratio 2:1 and $h_0$ calibrated with Riluzole survival data. The disease severity measures, $c_1$ and $c_2$, are calibrated using YLD and DALY. $hr$ denotes the hazard ratio of ALS treatment versus placebo; $t_{\text{obs}}$ denotes the observation time of the trial; and $n^*$ and $\alpha^*$ denote the Bayesian optimal sample size and type I error rate, respectively.

When jointly optimizing the sample size and the type I error rate, we find that α* exceeds 3% for all combinations of parameter values of hazard ratio and observation period. When the hazard ratio of an effective drug is 0.6 (the hazard ratio of AMX0035), BDA recommends using a significance level above 15% and a smaller sample size than in the AMX0035 phase 2 trial, reflecting the urgency of the unmet medical needs of ALS patients.

As expected, the $\alpha^*$ values computed using the YLD burden-of-disease measure are more conservative (i.e., lower) than the corresponding values computed using the DALY measure, since the latter reflects the afflictions caused by the progression of ALS and yields a higher cost ratio, $c_2/c_1$.

Discussion

The BDA framework formalizes the notion that potentially effective treatments for terminal diseases without effective treatments, such as ALS, should be evaluated with a higher p-value threshold than the traditional 2.5% or 5%. Under the baseline model, the optimal type I error for the ALSFRS-R clinical endpoint is $\alpha^* = 15.4\%$ using the conservative YLD estimates of ALS severity. This value lies between the corresponding values for lung cancer (13.7%) and pancreatic cancer (23.9%) found in (Citation10) and is a reasonable reflection of the prevalence and severity of ALS.

Our analysis has limitations that need to be addressed in future work. A more accurate and rigorous procedure is needed to estimate the loss from a type II error (i.e., ALS disease severity, $c_2$). The results of our BDA model are highly sensitive to disease severity estimates, so input from ALS medical experts and patient advocates should be incorporated into the final values used by decision makers (Citation15). Our heuristic for estimating $c_2$ (Equation 1) uses the mortality rate and YLD of ALS, while the phase 2 trial of AMX0035 reports the reduction in ALS disease progression (measured by the ALSFRS-R) as its primary outcome (Citation7). This reduction in ALS progression must be translated into an equivalent reduction in YLD or DALY to gauge the benefits of an effective treatment to patients more accurately.

Conclusion

Based on the phase 2 trial results of AMX0035, BDA suggests that the benefits of the therapeutic effects outweigh the risks of adverse effects. BDA strikes an optimal tradeoff between type I and type II errors, yielding a BDA-optimal p-value threshold that, under a wide range of realistic assumptions, is consistently higher than the 3% p-value reported for the trial. While we recognize the complexity of the factors involved in the regulatory decision, BDA provides a systematic, repeatable, and transparent framework for incorporating both the subjective perspectives of ALS patients and objective burden-of-disease metrics when evaluating the therapeutic effects observed in the phase 2 clinical trial of AMX0035.

Acknowledgements

The authors thank two anonymous reviewers for their valuable comments and suggestions. Research support from MIT Laboratory for Financial Engineering is gratefully acknowledged. The views and opinions expressed in this article are those of the authors only, and do not necessarily represent the views and opinions of any institution or agency, any of their affiliates or employees, or any of those acknowledged above.

Declaration of interest

QX reports personal investments in publicly traded pharmaceutical companies. JC and ZBC have no conflict of interest to disclose. AWL reports personal investments in private biotech companies, biotech venture capital funds, and mutual funds. AWL is a co-founder and principal of QLS Advisors LLC, a healthcare investments advisor, and QLS Technologies LLC, a healthcare analytics and consulting company; an advisor to Apricity Health, Aracari Bio, BrightEdge Impact Fund, Enable Medicine, FINRA, Health at Scale, Lazard, NIH/NCATS, Quantile Health, Roivant Social Ventures, SalioGen Therapeutics, Swiss Finance Institute, and Thalēs; and a director of AbCellera, Annual Reviews, Atomwise, BridgeBio Pharma, and Roivant Sciences. During the most recent six-year period, AWL has received speaking/consulting fees, honoraria, or other forms of compensation from: AbCellera, AlphaSimplex Group, Annual Reviews, Apricity Health, Aracari Bio, Atomwise, Bernstein/Fabozzi Jacobs Levy Award, BridgeBio Pharma, Cambridge Associates, Chicago Mercantile Exchange, Enable Medicine, Financial Times, Harvard Kennedy School, IMF, Journal of Investment Management, Lazard, National Bank of Belgium, New Frontier Advisors/Markowitz Award, Oppenheimer, Princeton University Press, Q Group, QLS Advisors, Quantile Health, Research Affiliates, Roivant Sciences, SalioGen Therapeutics, Swiss Finance Institute, and WW Norton.

Data availability statement

The data supporting the results reported in the article can be found in the cited references. The software for Bayesian decision analysis is available upon reasonable request to the corresponding author.

Additional information

Funding

QX, JC, and ZBC gratefully acknowledge research support from MIT Laboratory for Financial Engineering. No direct funding was received for this study and no funding bodies had any role in study design, data collection and analysis, decision to publish, or preparation of this manuscript. The authors were personally salaried by their institutions during the period of writing (though no specific salary was set aside or given for the writing of this manuscript).

References