Abstract
Objective: Uniform data collection is fundamental to multicentre clinical trials. We aim to determine the variability between ALS trial centers in the prevalence of unexpected or implausible improvements in the revised ALS functional rating scale (ALSFRS-R) score, and the association of these improvements with individual patient and item characteristics.
Methods: We used data from two multicentre studies to estimate the prevalence of an unexpected increase or implausible improvement in the ALSFRS-R score, defined as an increase of 5 points or more between two consecutive, monthly visits. For each patient with a 5-point or more increase, we evaluated the individual contribution of each ALSFRS-R item.
Results: Longitudinal ALSFRS-R scores, originating from 114 trial centers enrolling a total of 1,240 patients, were analyzed. A 5-point or more increase in ALSFRS-R total score was found in 151 (12.2%) patients, with prevalence per study center ranging from 0% to 83%. Bulbar onset, faster disease progression at enrollment, and a lower ALSFRS-R score at baseline were associated with a sudden 5-point or more increase in the ALSFRS-R total score. ALSFRS-R items 2 (saliva), 9 (stairs), 10 (dyspnea), and 11 (orthopnea) were the primary drivers when a 5-point or more increase occurred.
Conclusions: Sudden increases of 5 points or more in ALSFRS-R total scores between two consecutive visits are relatively common. These sudden increases did not occur with equal frequency across trial centers, which underscores the need in multicentre research to amend existing standard operating procedures toward a universal version and to monitor data quality during the study.
Introduction
Amyotrophic lateral sclerosis (ALS) is characterized by unrelenting functional loss over time with extensive variation between patients (Citation1). Currently, the revised ALS functional rating scale (ALSFRS-R) is the most commonly used primary endpoint in clinical trials (Citation2). Regulators (Citation3,Citation4) encourage the use of the ALSFRS-R, which has proven validity and reliability (Citation5,Citation6). The ALSFRS-R contains 12 items evaluating different aspects of the patient’s daily physical functioning and symptomatology, and the score has been shown to be related to overall survival time (Citation7).
Although the ALSFRS-R is widely adopted in both clinical trials and clinical care, rating it in a clinical trial context may not be straightforward and requires specific training (Citation8,Citation9). Item categories may be interpreted differently depending on the rater. For example, the natural history of item 10 (dyspnea) should be a uniformly declining function over time, as ALS is a progressive disorder and the ALSFRS-R is intended to monitor disease progression (Citation5). At a certain point in the disease, however, noninvasive ventilation may be initiated, resolving symptoms of dyspnea. This increases the score of item 10, which may falsely suggest a true improvement in the patient’s respiratory function; it also changes the interpretation of item 10, which now reflects the residual presence of symptoms under respiratory support.
This scenario should be resolved, preferably through standard operating procedures (SOPs) and by training evaluators to ensure adequate and uniform scoring strategies across raters and centers. Nevertheless, training may be suboptimal, or the SOP may overlook certain clinical scenarios, which could result in unnatural, sudden changes in ALSFRS-R scores. Long-term improvements in the ALSFRS-R, i.e. reversals, have been reported (Citation10–12), but the prevalence of sudden increases in ALSFRS-R trajectories remains unknown. Such increases may highlight limitations in the current SOPs and scoring strategies, which could impact the accuracy of patient monitoring. In this study, therefore, we aim to determine the prevalence of sudden, unexpected increases, or implausible improvements, in the ALSFRS-R between two consecutive visits, and to evaluate the variability between centers in clinical trials. In addition, we explore which patient- and item-related characteristics are associated with the prevalence of these sudden increases in the total score.
Methods
Individual participant data
For this study, we used data from two multicentre clinical trials to estimate the prevalence of sudden increases in ALSFRS-R total scores. The EMPOWER clinical trial was a randomized, placebo-controlled trial that evaluated the safety and efficacy of dexpramipexole in patients with ALS (Citation13). A total of 942 patients were enrolled in 80 trial centers across 11 countries between March 2011 and September 2011. The primary outcome was the Combined Assessment of Function and Survival (CAFS), a joint-rank score of the ALSFRS-R and survival time, at 12 months (Citation14). The ALSFRS-R was measured at monthly intervals for at least 12 months. The second trial assessed the safety and efficacy of ozanezumab compared with placebo in patients with ALS (Citation15). A total of 303 patients were enrolled in 34 trial centers across 11 countries between December 2012 and November 2013. The primary outcome was a joint-rank analysis of function (ALSFRS-R) and survival at week 48, with ALSFRS-R scores obtained at monthly intervals. Neither study demonstrated efficacy; therefore, anonymised individual patient data from both the placebo and active arms were used in the current study. In both studies, raters and centers were certified through ALSFRS-R outcome measure training and followed the SOP guidance provided by the Northeast ALS Consortium (NEALS) (Citation16).
Statistical analysis
We considered an increase of 5 points or more between two consecutive monthly visits to be an unnatural, sudden change. The cutoff was based on previously published test-retest reliability data (Citation8), indicating that scores within patients may vary by up to 4.3 points due to random variability. For each patient, we calculated the sequential difference between consecutive longitudinal ALSFRS-R measurements. To illustrate, if the ALSFRS-R total score was 43 at screening, 42 at baseline, and 40 at month 1, the sequential differences between visits were −1 (baseline − screening) and −2 (month 1 − baseline). Subsequently, we evaluated whether a patient had any sequential difference equal to or larger than +5 points and flagged these patients as having an unnatural, sudden change. Finally, we determined the number of patients with at least one sudden increase of 5 points or more per trial and per trial center. As a sensitivity analysis, we adjusted the sequential difference for the time between two visits, because visits may not occur at exactly monthly intervals (adjusted difference = difference / time between visits in months). A patient was then flagged as having an unnatural, sudden change if the adjusted difference was larger than or equal to +5 points per month.
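The flagging rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: it assumes each patient's visits are available as a chronologically ordered list of ALSFRS-R total scores, with visit times in months for the time-adjusted sensitivity analysis. The function names and the second patient's data are hypothetical.

```python
from typing import List, Sequence

THRESHOLD = 5.0  # points (or points per month for the time-adjusted variant)


def sequential_differences(scores: Sequence[float]) -> List[float]:
    """Difference between each ALSFRS-R total score and the preceding visit's score."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def time_adjusted_differences(scores: Sequence[float], months: Sequence[float]) -> List[float]:
    """Sequential differences divided by the time between the two visits (in months)."""
    if len(scores) != len(months):
        raise ValueError("scores and months must have the same length")
    adjusted = []
    for i in range(1, len(scores)):
        gap = months[i] - months[i - 1]
        if gap <= 0:
            raise ValueError("visit times must be strictly increasing")
        adjusted.append((scores[i] - scores[i - 1]) / gap)
    return adjusted


def has_sudden_increase(differences: Sequence[float], threshold: float = THRESHOLD) -> bool:
    """True if any consecutive-visit difference is +threshold or more."""
    return any(d >= threshold for d in differences)


# Worked example from the text: screening 43, baseline 42, month 1 40.
print(sequential_differences([43, 42, 40]))  # [-1, -2]

# Hypothetical patient with a 6-point jump spread over a 1.5-month interval.
scores, months = [38, 37, 43], [0.0, 1.0, 2.5]
print(has_sudden_increase(sequential_differences(scores)))             # True
print(has_sudden_increase(time_adjusted_differences(scores, months)))  # False (4.0 points/month)
```

Note that the time adjustment works in both directions: a long interval shrinks the per-month change, while visits less than a month apart inflate it.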
To distinguish the variability in prevalence between centers from random noise and from potential underperformance of a particular center, we calculated for each center the probability of observing its number of patients with a 5-point or more increase, out of the total number of patients enrolled in that center, given the average background prevalence observed in the other trial centers. For example, with a background prevalence of 15%, the binomial probability of observing exactly 2 patients with a 5-point or more increase among 4 enrolled patients is 0.098. Centers with a probability below 0.05 were flagged as potential outliers. In case of significant variability between centers, logistic regression models were used to evaluate whether the center prevalence of patients with a sudden increase depended on the center’s number of enrolled patients.
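The center-level check can be sketched as follows, under stated assumptions. The background prevalence for a center is taken here as the pooled proportion of flagged patients in all other centers; the text does not specify pooled versus averaged center prevalences. The default `mode="point"` uses the probability of observing exactly the center's count, which reproduces the 0.098 example. A one-sided upper-tail probability, P(X ≥ k), is offered as an alternative because it only flags centers with an excess of sudden increases; which variant the authors used is not stated. The center identifiers and counts are hypothetical.

```python
from math import comb
from typing import Dict, Tuple


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))


def flag_centers(
    centers: Dict[str, Tuple[int, int]], alpha: float = 0.05, mode: str = "point"
) -> Dict[str, Tuple[float, bool]]:
    """centers maps a center id to (patients with a >= 5-point increase, patients enrolled).

    Returns, per center, the probability of its count under the leave-one-out
    background prevalence and whether that probability falls below alpha.
    """
    total_flagged = sum(k for k, _ in centers.values())
    total_enrolled = sum(n for _, n in centers.values())
    results = {}
    for center_id, (k, n) in centers.items():
        other_enrolled = total_enrolled - n
        if other_enrolled == 0:
            raise ValueError("need at least one other center with enrolled patients")
        # Background prevalence pooled over all other centers (assumption).
        p_background = (total_flagged - k) / other_enrolled
        if mode == "point":
            prob = binomial_pmf(k, n, p_background)
        elif mode == "upper":
            prob = binomial_upper_tail(k, n, p_background)
        else:
            raise ValueError("mode must be 'point' or 'upper'")
        results[center_id] = (prob, prob < alpha)
    return results


print(round(binomial_pmf(2, 4, 0.15), 3))  # 0.098, as in the text

# Hypothetical trial: center "A" has 10 flagged patients out of 12 enrolled.
example = {"A": (10, 12), "B": (5, 40), "C": (6, 48)}
for center, (prob, flagged) in flag_centers(example, mode="upper").items():
    print(center, f"{prob:.2e}", flagged)
```

With the point probability, a large center with an unusually *low* count can also fall below 0.05, which is one reason to consider the upper-tail variant when the goal is to detect excess sudden increases.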
Descriptive statistics were used to compare patients with and without a sudden 5-point or more increase. Baseline data were summarized using the mean and standard deviation (SD) for continuous variables, or number and percentage for categorical variables. Means or proportions were compared using Student’s t or Chi-square tests, respectively. Finally, for each patient with a 5-point or more increase, we evaluated the individual contribution of each ALSFRS-R item by calculating the change per item and expressing it as a proportion of the total change.
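The item-level decomposition can be illustrated with a short Python sketch. It assumes the 12 item scores (each 0 to 4) are available for the two visits that bracket a flagged increase; the function name and the example scores are hypothetical. An item that declined while the total rose receives a negative share, so the proportions always sum to 1.

```python
from typing import List, Sequence

N_ITEMS = 12  # the ALSFRS-R has 12 items, each scored 0-4


def item_contributions(before: Sequence[int], after: Sequence[int]) -> List[float]:
    """Change in each item expressed as a proportion of the change in the total score."""
    if len(before) != N_ITEMS or len(after) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores per visit")
    changes = [a - b for b, a in zip(before, after)]
    total_change = sum(changes)
    if total_change == 0:
        raise ValueError("total score did not change; proportions are undefined")
    return [c / total_change for c in changes]


# Hypothetical flagged interval: the total score rises by 5 points.
before = [3, 1, 3, 3, 3, 3, 3, 3, 1, 1, 2, 4]
after = [2, 3, 3, 3, 3, 3, 3, 3, 1, 4, 3, 4]
shares = item_contributions(before, after)
for item, share in enumerate(shares, start=1):
    if share:
        print(f"item {item}: {share:+.0%}")
```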
Results
In total, 14,297 longitudinal ALSFRS-R scores from 1,240 patients enrolled in 114 trial centers were analyzed. Five patients in the dexpramipexole trial were excluded from the analysis because only one ALSFRS-R measurement was available. In the dexpramipexole trial, the number of enrolled patients ranged from 1 to 37 per center (median 9), whereas in the ozanezumab trial it ranged from 2 to 23 per center (median 8).
Prevalence of sudden increases in the dexpramipexole and ozanezumab trials
Pooled across trials, we identified 151 patients with at least one 5-point or more increase between two consecutive monthly visits. In the dexpramipexole trial, 123 of 937 (13.1%) patients had at least one 5-point or more increase, compared with 28 of 303 (9.2%) patients in the ozanezumab trial, resulting in a pooled prevalence of 12.2% (95% CI 10.4% to 14.2%). The prevalence was similar when adjusted for the time between visits (12.1% in the dexpramipexole trial and 9.9% in the ozanezumab trial). Importantly, there was extensive variability in the prevalence of sudden increases among trial centers, illustrated in for the 36 largest trial centers of the dexpramipexole trial. Prevalence per center ranged from 0% to 83% in the dexpramipexole trial and from 0% to 40% in the ozanezumab trial. In Figure e1 (dexpramipexole) and Figure e2 (ozanezumab) we present the observed prevalence for each center.
Exploring between-center variability: chance vs. underperformance
To distinguish the variability between centers from random noise and from potential underperformance of a particular center, we calculated for each center the probability of observing its number of sudden increases, given the number of enrolled patients and the average background prevalence observed in the other trial centers. These probabilities are presented in . To illustrate: the probability of observing 10 sudden increases in 12 enrolled patients is about 4 in 100 million, which makes it highly unlikely that this number of sudden increases is due to chance and may suggest underperformance of that trial center compared with the other centers. In total, we identified 13 (16.3%) sites in the dexpramipexole trial with a probability lower than 5%, suggesting systematic differences between centers. The variability between centers in the dexpramipexole trial was related to the number of enrolled patients (odds ratio per patient 0.88, 95% CI 0.1678 to 0.2099, p = 0.029, ): sites that enrolled more patients had, on average, a lower prevalence of patients with a 5-point or more increase. In contrast, between-center variability in the ozanezumab trial fell within the expected range, suggesting that centers performed similarly; no clearly underperforming centers could be identified.
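The association between enrollment size and prevalence can be sketched as a logistic regression of a patient's sudden-increase status on the number of patients enrolled at their center. The text does not state the exact model specification, so the Python sketch below is an assumption: it fits the equivalent grouped (binomial) form, with one row per center, by Newton-Raphson, and reports the odds ratio per additional enrolled patient with a Wald 95% confidence interval. Function names and the example counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _score_and_information(b0, b1, k, n, x):
    """Gradient of the binomial log-likelihood and the entries of the 2x2 Fisher information."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for ki, ni, xi in zip(k, n, x):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        resid = ki - ni * p        # observed minus expected flagged patients
        w = ni * p * (1.0 - p)     # binomial variance weight
        g0 += resid
        g1 += resid * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return g0, g1, i00, i01, i11


def fit_center_logistic(
    k: Sequence[float], n: Sequence[float], x: Sequence[float],
    max_iter: int = 100, tol: float = 1e-10,
) -> Tuple[float, float, float]:
    """Fit logit(p_center) = b0 + b1 * x, with k flagged of n enrolled per center.

    Returns (b0, b1, standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = _score_and_information(b0, b1, k, n, x)
        det = i00 * i11 - i01 * i01
        if det <= 0:
            raise ValueError("singular information matrix; need at least two distinct x values")
        # Newton-Raphson step: beta_new = beta + I^{-1} g
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Standard error from the information matrix at the final estimate.
    _, _, i00, i01, i11 = _score_and_information(b0, b1, k, n, x)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return b0, b1, se_b1


def odds_ratio_with_ci(b1: float, se_b1: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio per unit of x with a Wald confidence interval."""
    return math.exp(b1), math.exp(b1 - z * se_b1), math.exp(b1 + z * se_b1)


# Hypothetical centers: flagged patients and enrollment; the covariate is enrollment itself.
flagged = [4, 3, 2, 1, 2]
enrolled = [6, 8, 12, 20, 30]
b0, b1, se = fit_center_logistic(flagged, enrolled, enrolled)
print("OR per enrolled patient: %.2f (95%% CI %.2f to %.2f)" % odds_ratio_with_ci(b1, se))
```

Because a Wald interval is built symmetrically around b1 on the log-odds scale, the reported odds ratio always lies inside its own interval.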
Patient characteristics associated with potential measurement errors
In , we provide the individual characteristics of patients with and without a 5-point or more increase. Patients with high baseline ALSFRS-R scores had fewer 5-point or more increases during follow-up, possibly because they are less likely to gain 5 points or more in their ALSFRS-R total score. Patients with bulbar onset and faster disease progression (expressed as ΔFRS) (Citation7) were more likely to have a 5-point or more increase.
Item characteristics associated with potential measurement errors
Finally, in we show the average change score for each individual ALSFRS-R item in which no sudden increase occurred, averaged over the measurements that were flagged as a 5-point or more increase. Despite differences in the observed frequency of sudden increases between the dexpramipexole and ozanezumab trials, items 2, 9, 10, and 11 were consistently identified as the primary drivers of a 5-point or more increase in ALSFRS-R total scores, which may reflect scoring difficulties for these items.
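Averaging item-level changes over all flagged intervals amounts to a per-item mean followed by a ranking. The Python sketch below assumes each flagged interval is stored as a pair of 12-item score lists (before, after); the helper names and the example data are hypothetical and are not the trial data.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

N_ITEMS = 12


def mean_item_changes(flagged_pairs: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> Dict[int, float]:
    """Average change per ALSFRS-R item (numbered 1-12) over flagged intervals."""
    if not flagged_pairs:
        raise ValueError("no flagged intervals supplied")
    per_item: List[List[int]] = [[] for _ in range(N_ITEMS)]
    for before, after in flagged_pairs:
        for i in range(N_ITEMS):
            per_item[i].append(after[i] - before[i])
    return {i + 1: mean(changes) for i, changes in enumerate(per_item)}


def primary_drivers(mean_changes: Dict[int, float], top: int = 4) -> List[int]:
    """Item numbers with the largest average increase, largest first."""
    return sorted(mean_changes, key=mean_changes.get, reverse=True)[:top]


# Two hypothetical flagged intervals (item scores at consecutive visits).
pairs = [
    ([3, 1, 3, 3, 3, 3, 3, 2, 1, 1, 2, 4], [3, 3, 3, 3, 3, 3, 3, 2, 2, 3, 3, 4]),
    ([2, 2, 2, 3, 3, 3, 2, 2, 0, 1, 1, 4], [2, 3, 2, 3, 3, 3, 2, 2, 2, 3, 3, 4]),
]
print(primary_drivers(mean_item_changes(pairs)))
```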
Discussion
In this study, we show the relatively common occurrence of sudden, large increases in ALSFRS-R total scores between two consecutive monthly measurements. These sudden increases did not occur with equal frequency across trial centers, which points to a potential external cause, such as differences in scoring strategies among trial centers or between raters. We found several patient- and item-related characteristics that were associated with the prevalence of sudden increases. The identified patient characteristics were bulbar onset, faster disease progression at enrollment, and a lower ALSFRS-R score at baseline. The ALSFRS-R items associated with sudden increases were items 2 (saliva), 9 (stairs), 10 (dyspnea), and 11 (orthopnea). Given these patient- and item-related characteristics, an important source of these sudden increases may be the initiation of symptomatic interventions, especially for respiratory and bulbar symptomatology. Although the study staff of both clinical trials were well trained, our results indicate that the current SOPs may leave room for improvement and highlight the potential benefit of real-time monitoring of data quality to ensure SOP conformity.
The unnatural, large, sudden increases in the ALSFRS-R total score, and especially their uneven distribution among trial centers, suggest that these increases most likely result from a limitation in the ALSFRS-R itself rather than from a biological mechanism. Several items in the bulbar and respiratory domains were identified as important drivers of sudden increases. The vulnerability of these two domains to sudden increases is further substantiated by the finding that the increases occurred significantly more often in patients with low respiratory and bulbar scores at baseline. Because several symptomatic treatment options are available, especially for these domains, a main challenge for ALSFRS-R scoring is how to handle symptomatic interventions. Both the dexpramipexole and the ozanezumab study employed the SOP guidance provided by the Northeast ALS Consortium (NEALS) (Citation16). In the NEALS SOP, items 2 and 10 are rated irrespective of treatment, which may cause sudden increases in scores when treatments are initiated. A straightforward solution could be to score these items 0 as soon as a symptomatic treatment is started. A disadvantage is that, if interest lies in the benefit of the symptomatic treatment, the ALSFRS-R is no longer a feasible endpoint. The ALSFRS-R, however, was developed to monitor disease progression and to quantify the efficacy of disease-modifying experimental treatments in clinical trials. From a clinical trial perspective, therefore, it would be preferable if an improvement in ALSFRS-R score resulted only from an experimental intervention. It is thus important to separate the effect of a potential symptomatic intervention from the effect of the experimental intervention; furthermore, the initiation of a symptomatic intervention should be scored as a reflection of natural disease progression.
A sudden increase in the ALSFRS-R might be caused not only by the initiation of a symptomatic treatment (e.g. salivation therapy improving the patient’s condition from severe drooling to no salivary excess), but also by day-to-day variation in the patient’s symptomatology. In this study, we did not examine small sudden changes in the ALSFRS-R items and subdomains, but given that the random variation for the subdomains ranges from 1.6 to 2.4 points (Citation8), a few items coincidentally improving between two consecutive visits could also result in a 5-point or more increase. As with symptomatic interventions, the effect of these natural improvements should be minimized, so that the natural trajectory of the ALSFRS-R becomes a uniformly declining function over time. This highlights the importance not only of facilitating uniform scoring strategies, but also of continuously evaluating the accuracy of the ALSFRS-R items (Citation17,Citation18). A targeted adjustment of the ALSFRS-R SOP might be justified to develop one universal version that prevents differences in scoring. To ensure broad consensus, this requires a collaborative effort between large ALS trial networks, such as the Northeast ALS Consortium (NEALS) (Citation16), the Treatment Research Initiative to Cure ALS (TRICALS) (Citation19), and the Motor Neurone Disease group Australia (Citation20).
Although sudden increases might be related to limitations in the ALSFRS-R, underperformance of individual trial centers may also play a role. By calculating the probability of the observed proportion of sudden increases, we were able to get an impression of which individual centers could have been flagged for suspected underperformance during trial conduct. The results demonstrate that the number of outlier centers and their degree of deviation were higher in the dexpramipexole trial than in the ozanezumab trial. These differences could well be due to improved training and refined standardization of ALSFRS-R scoring strategies, as the ozanezumab study was conducted four years after the dexpramipexole study (in particular in the case of overlapping sites or raters). In addition, the ozanezumab study employed a central, in-stream, blinded monitoring system during the study to identify outlier efficacy data values at the patient or site level, triggering data queries to the sites. Interestingly, high proportions of sudden increases in the dexpramipexole trial occurred more often in centers with a low number of enrolled patients. This finding is consistent with existing literature indicating that factors such as reaching enrollment goals may be related to center performance in data quality, highlighting the importance of recognizing centers of excellence via disease networks (Citation17,Citation21).
Our study has several limitations. First, a true improvement cannot be entirely ruled out in individual cases (Citation10). For example, dietary supplements or other experimental treatments may have led to a real improvement in function (Citation11). Second, although our analysis indicated that the number of enrolled patients per trial center was an explanatory factor for the occurrence of sudden increases, the available data did not allow us to analyze other center characteristics potentially associated with sudden increases, such as previous trial experience. However, our results, supported by previous literature, indicate that preliminary selection and interim assessment of participating trial centers could contribute to improved data quality (Citation17). Finally, the influence of different raters for the same patient, and of unknown placebo effects, as sources of unwanted variability could not be estimated. However, longitudinal scoring by the same rater, possibly supported by video review by expert raters (Citation22), could help optimize data quality. Since adjusted SOPs cannot prevent all sources of variation, such as inadequate training of raters, video review and other methods for monitoring data quality (including real-time monitoring) are likely to be of considerable added value.
In conclusion, the results of this study suggest that sudden increases in consecutive ALSFRS-R total scores occur relatively frequently in multicentre studies. These sudden increases did not occur with equal frequency across trial centers. In addition, multiple ALSFRS-R items were related to sudden increases, especially items in the bulbar and respiratory domains, which can be affected by available symptomatic treatments. Patients with bulbar onset, a low ALSFRS-R baseline score, and faster disease progression were more likely to have a sudden increase. To facilitate adequate and uniform handling of improvements due to symptomatic treatment, a targeted adjustment of the SOP and corresponding skills training are warranted, requiring a global effort to define one universal version. In addition, multicentre research could benefit from methodology to monitor data quality, as well as from interim video reviews by expert raters.
Supplemental Material
Acknowledgements
The authors acknowledge the ALS patients who participated in the dexpramipexole and ozanezumab trials, and their contribution to the search for a cure for ALS. We would also like to thank Biogen and GlaxoSmithKline for sharing their clinical research data.
Declaration of interest
L. Kendall and N. Epstein are current and S.S. Han and A. Lavrov former employees of GlaxoSmithKline and held shares in the company. The other authors report no conflict of interest.
Additional information
Funding
References
- van Es MA, Hardiman O, Chio A, Al-Chalabi A, Pasterkamp RJ, Veldink JH, et al. Amyotrophic lateral sclerosis. Lancet. 2017;390:2084–98.
- van Eijk RPA, Kliest T, van den Berg LH. Current trends in the clinical trial landscape for amyotrophic lateral sclerosis. Curr Opin Neurol. 2020;33:655–61.
- U.S. Food and Drug Administration, Center for Drug Evaluation and Research. Amyotrophic lateral sclerosis: developing drugs for treatment. Guidance for industry. 2019. Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/amyotrophic-lateral-sclerosis-developing-drugs-treatment-guidance-industry.
- European Medicines Agency. Guideline on clinical investigation of medicinal products for the treatment of amyotrophic lateral sclerosis (ALS). 2016. Available from: https://www.ema.europa.eu/en/documents/scientific-guideline/guideline-clinical-investigation-medicinal-products-treatment-amyotrophic-lateral-sclerosis_en.pdf.
- Cedarbaum JM, Stambler N, Malta E, Fuller C, Hilt D, Thurmond B, et al. The ALSFRS-R: a revised ALS functional rating scale that incorporates assessments of respiratory function. BDNF ALS Study Group (Phase III). J Neurol Sci. 1999;169:13–21.
- Gordon PH, Miller RG, Moore DH. ALSFRS-R. Amyotroph Lateral Scler Other Motor Neuron Disord. 2004;5(Suppl 1):90–3.
- Kimura F, Fujimura C, Ishida S, Nakajima H, Furutama D, Uehara H, et al. Progression rate of ALSFRS-R at time of diagnosis predicts survival time in ALS. Neurology. 2006;66:265–7.
- Bakker LA, Schroder CD, Tan HHG, Vugts S, van Eijk RPA, van Es MA, et al. Development and assessment of the inter-rater and intra-rater reproducibility of a self-administration version of the ALSFRS-R. J Neurol Neurosurg Psychiatry. 2020;91:75–81.
- Franchignoni F, Mandrioli J, Giordano A, Ferro S; ERRALS Group. A further Rasch study confirms that ALSFRS-R does not conform to fundamental measurement requirements. Amyotroph Lateral Scler Frontotemporal Degener. 2015;16:331–7.
- Bedlack RS, Vaughan T, Wicks P, Heywood J, Sinani E, Selsov R, et al. How common are ALS plateaus and reversals? Neurology. 2016;86:808–12.
- Harrison D, Mehta P, van Es MA, Stommel E, Drory VE, Nefussy B, et al. "ALS reversals": demographics, disease characteristics, treatments, and co-morbidities. Amyotroph Lateral Scler Frontotemporal Degener. 2018;19:495–9.
- Vasta R, D'Ovidio F, Canosa A, Manera U, Torrieri MC, Grassano M, et al. Plateaus in amyotrophic lateral sclerosis progression: results from a population-based cohort. Eur J Neurol. 2020;27:1397–404.
- Cudkowicz ME, van den Berg LH, Shefner JM, Mitsumoto H, Mora JS, Ludolph A, et al. Dexpramipexole versus placebo for patients with amyotrophic lateral sclerosis (EMPOWER): a randomised, double-blind, phase 3 trial. Lancet Neurol. 2013;12:1059–67.
- Berry JD, Miller R, Moore DH, Cudkowicz ME, van den Berg LH, Kerr DA, et al. The combined assessment of function and survival (CAFS): a new endpoint for ALS clinical trials. Amyotroph Lateral Scler Frontotemporal Degener. 2013;14:162–8.
- Meininger V, Genge A, van den Berg LH, Robberecht W, Ludolph A, Chio A, et al. Safety and efficacy of ozanezumab in patients with amyotrophic lateral sclerosis: a randomised, double-blind, placebo-controlled, phase 2 trial. Lancet Neurol. 2017;16:208–16.
- Northeast Amyotrophic Lateral Sclerosis Consortium (NEALS). Available from: https://www.neals.org/.
- Dombernowsky T, Haedersdal M, Lassen U, Thomsen SF. Criteria for site selection in industry-sponsored clinical trials: a survey among decision-makers in biopharmaceutical companies and clinical research organizations. Trials. 2019;20:708.
- Walker R, Morris DW, Greer TL, Trivedi MH. Research staff training in a multisite randomized clinical trial: methods and recommendations from the stimulant reduction intervention using dosed exercise (STRIDE) trial. Addict Res Theory. 2014;22:407–15.
- van Eijk RPA, Kliest T, McDermott CJ, Roes KCB, Van Damme P, Chio A, et al. TRICALS: creating a highway toward a cure. Amyotroph Lateral Scler Frontotemporal Degener. 2020;21:496–501.
- Motor Neurone Disease group Australia. Available from: https://www.sydney.edu.au/brain-mind/our-research/forefront-ageing-and-neurodegeneration/motor-neurone-disease.html.
- Gehring M, Taylor RS, Mellody M, Casteels B, Piazzi A, Gensini G, et al. Factors influencing clinical trial site selection in Europe: the survey of attitudes towards trial sites in Europe (the SAT-EU Study). BMJ Open. 2013;3:e002957.
- Sutherland AE, Stickland J, Wee B. Can video consultations replace face-to-face interviews? Palliative medicine and the Covid-19 pandemic: rapid review. BMJ Support Palliat Care. 2020;10:271–5.