Using the ALSFRS-R in multicentre clinical trials for amyotrophic lateral sclerosis: potential limitations in current standard operating procedures

Pages 500-507 | Received 31 Oct 2021, Accepted 05 Dec 2021, Published online: 24 Dec 2021

Abstract

Objective: Uniform data collection is fundamental for multicentre clinical trials. We aim to determine the variability, between ALS trial centers, in the prevalence of unexpected or implausible improvements in the revised ALS functional rating scale (ALSFRS-R) score, and its associations with individual patient and item characteristics.

Methods: We used data from two multicentre studies to estimate the prevalence of an unexpected increase or implausible improvement in the ALSFRS-R score, defined as an increase of 5 points or more between two consecutive, monthly visits. For each patient with a 5-point or more increase, we evaluated the individual contribution of each ALSFRS-R item.

Results: Longitudinal ALSFRS-R scores, originating from 114 trial centers enrolling a total of 1,240 patients, were analyzed. A 5-point or more increase in ALSFRS-R total score was found in 151 (12.2%) patients, with prevalence per study center ranging from 0% to 83%. Bulbar onset, faster disease progression at enrollment, and a lower ALSFRS-R score at baseline were associated with a sudden 5-point or more increase in the ALSFRS-R total score. ALSFRS-R items 2 (saliva), 9 (stairs), 10 (dyspnea), and 11 (orthopnea) were the primary drivers when a 5-point or more increase occurred.

Conclusions: Sudden 5-point or more increases in ALSFRS-R total scores between two consecutive visits are relatively common. These sudden increases did not occur with equal frequency across trial centers, which underscores the need, in multicentre research, to amend existing standard operating procedures toward a universal version and to monitor data quality during the study.

Introduction

Amyotrophic lateral sclerosis (ALS) is characterized by unrelenting functional loss over time with extensive variation between patients (Citation1). Currently, the revised ALS functional rating scale (ALSFRS-R) is the most commonly used primary endpoint in clinical trials (Citation2). Regulators (Citation3,Citation4) encourage the use of the ALSFRS-R, which has proven validity and reliability (Citation5,Citation6). The ALSFRS-R contains 12 items evaluating different aspects of the patient’s daily physical functioning and symptomatology, and the score has been shown to be related to overall survival time (Citation7).

Despite being widely adopted in both clinical trials and care, rating of the ALSFRS-R in a clinical trial context may not be straightforward and requires specific training (Citation8,Citation9). Item categories may be interpreted differently depending on the rater. For example, the natural history for item 10 (dyspnea) should be a uniformly declining function over time, as ALS is a progressive disorder and the ALSFRS-R is intended to monitor disease progression (Citation5). At a certain point in the disease, however, noninvasive ventilation may be initiated, resolving symptoms of dyspnea. This increases the score of item 10, which may falsely indicate a true improvement in the patient’s respiratory function, and it also changes the interpretation of item 10, which now reflects the residual presence of symptoms under respiratory support.

The above scenario should be resolved, preferably using standard operating procedures (SOPs), and by training evaluators to ensure adequate and uniform scoring strategies across raters and centers. Nevertheless, training may be suboptimal, or the SOP may overlook certain clinical scenarios, which could result in unnatural sudden changes in ALSFRS-R scores. Long-term improvements in the ALSFRS-R, i.e. reversals, have been reported (Citation10–12), but the prevalence of sudden increases in ALSFRS-R trajectories remains unknown. These may highlight limitations in the current SOPs and scoring strategies, which could impact the accuracy of patient monitoring. In this study, therefore, we aim to determine the prevalence of sudden unexpected increases, or implausible improvements, in ALSFRS-R between two subsequent visits, and evaluate variability between centers in clinical trials. In addition, we explore which patient- and item-related characteristics are associated with the prevalence of these sudden increases in the total score.

Methods

Individual participant data

For this study, we used data from two multicentre clinical trials to estimate the prevalence of sudden increases in ALSFRS-R total scores. The EMPOWER clinical trial was a randomized placebo-controlled clinical trial that evaluated the safety and efficacy of dexpramipexole in patients with ALS (Citation13). A total of 942 patients were enrolled in 80 trial centers across 11 countries between March 2011 and September 2011. The primary outcome was the Combined Assessment of Function and Survival (CAFS), a joint-rank score of the ALSFRS-R and survival time, at 12 months (Citation14). The ALSFRS-R was measured at monthly intervals for at least 12 months. The second trial assessed the safety and efficacy of ozanezumab compared to placebo in patients with ALS (Citation15). A total of 303 patients were enrolled in 34 trial centers across 11 countries between December 2012 and November 2013. The primary outcome was the joint-rank analysis of function (ALSFRS-R) and survival at week 48, with ALSFRS-R scores obtained at monthly intervals. Both studies concluded a lack of efficacy; therefore, anonymised individual patient data from both the placebo and active arms were used in the current study. For both studies, raters and centers were certified by ALSFRS-R outcome measure training, and employed SOP guidance as provided by the Northeast ALS Consortium (NEALS) (Citation16).

Statistical analysis

We considered an increase of 5 points or more between two consecutive monthly visits to be an unnatural, sudden change. The cutoff was based on previously published test-retest reliability data (Citation8), indicating that scores within patients may vary up to 4.3 points due to random variability. Per patient, we calculated the sequential difference between each pair of consecutive ALSFRS-R measurements. To illustrate, if the ALSFRS-R total score was 43 at screening, 42 at baseline, and 40 at month 1, the sequential differences between visits were −1 (baseline − screening) and −2 (month 1 − baseline). Subsequently, we evaluated whether a patient had any sequential difference of +5 points or more and flagged these patients as having an unnatural, sudden change. Finally, we determined the number of patients with at least one sudden increase of 5 points or more per trial and per trial center. As a sensitivity analysis, we adjusted the sequential difference for the time between the two visits, because visits may not occur exactly at monthly intervals (adjusted difference = difference / time between visits in months). A patient was subsequently flagged as having an unnatural, sudden change if the adjusted difference was +5 points per month or more.
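The flagging rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the input layout (one list of total scores per patient, with optional visit times in months) are hypothetical and are not taken from the study's analysis code.

```python
def sequential_differences(scores):
    """Difference between each visit and the visit directly before it."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def has_sudden_increase(scores, threshold=5):
    """True if any consecutive pair of visits shows an increase >= threshold."""
    return any(d >= threshold for d in sequential_differences(scores))


def has_sudden_increase_adjusted(scores, months, threshold=5.0):
    """Sensitivity analysis: divide each difference by the time between visits.

    `months` holds the visit times in months (hypothetical input format),
    in the same order and of the same length as `scores`.
    """
    for i in range(1, len(scores)):
        gap = months[i] - months[i - 1]
        if gap > 0 and (scores[i] - scores[i - 1]) / gap >= threshold:
            return True
    return False


# Worked example from the text: screening 43, baseline 42, month 1 40.
print(sequential_differences([43, 42, 40]))  # [-1, -2]
```

With the time adjustment, a 5-point increase observed over 1.5 months corresponds to 3.3 points per month and would not be flagged, whereas the same increase over exactly one month would be.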

To distinguish random noise from potential underperformance of a particular center as the source of variability in prevalence, for each center we calculated the probability of observing the number of patients with a 5-point or more increase out of the total number of patients enrolled in that center, given the average background prevalence observed in the other trial centers. For example, based on the binomial distribution, the probability of observing 2 patients with a 5-point or more increase out of 4 patients enrolled in a center, with a background prevalence of 15%, is 0.098. Centers with a probability less than 0.05 were flagged as potential outliers. In case of significant variability between centers, logistic regression models were used to evaluate whether the center prevalence of patients with a sudden increase depended on the center’s number of enrolled patients.
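The binomial calculation in the example above can be reproduced with the standard library. The sketch below is illustrative: it assumes that the background prevalence for a center is the pooled prevalence over all other centers, which is one plausible reading of "average background prevalence", and the `center_probabilities` helper and its input format are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k events in n patients at prevalence p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def center_probabilities(centers):
    """Map each center to the probability of its observed count.

    `centers` maps a center label to (cases, enrolled). The background
    prevalence is pooled over all *other* centers (an assumption), so at
    least two centers are needed.
    """
    total_cases = sum(cases for cases, _ in centers.values())
    total_enrolled = sum(n for _, n in centers.values())
    result = {}
    for name, (cases, n) in centers.items():
        background = (total_cases - cases) / (total_enrolled - n)
        result[name] = binomial_pmf(cases, n, background)
    return result


# Worked example from the text: 2 of 4 patients, background prevalence 15%.
print(round(binomial_pmf(2, 4, 0.15), 3))  # 0.098
```

Centers whose probability falls below 0.05 would then be flagged as potential outliers, in line with the threshold stated in the text.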

Descriptive statistics were used to compare patients with and without a sudden 5-point or more increase. Baseline data were summarized using the mean and standard deviation (SD) for continuous variables, or number and percentage for categorical variables. Means or proportions were compared using Student’s t or Chi-square tests, respectively. Finally, for each patient with a 5-point or more increase, we evaluated the individual contribution of each ALSFRS-R item by calculating the change per item and expressing it as a proportion of the total change.
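As an illustration of the item-level step, the sketch below expresses the change in each of the 12 items as a proportion of the total change, pooled over all flagged visit pairs. The function name and the input format (pairs of 12-item score lists before and after a flagged increase) are hypothetical, and the pooling across flagged pairs is an assumption about how the averaging was done.

```python
def mean_item_contributions(flagged_pairs):
    """Proportional contribution of each item to flagged increases.

    `flagged_pairs` is a list of (before, after) tuples, each holding the
    12 item scores at the two consecutive visits of a flagged increase.
    Returns 12 proportions that sum to 1.
    """
    n_items = len(flagged_pairs[0][0])
    item_change = [0] * n_items
    for before, after in flagged_pairs:
        for i in range(n_items):
            item_change[i] += after[i] - before[i]
    total_change = sum(item_change)
    return [change / total_change for change in item_change]


# Hypothetical patient: item 2 gains 2 points; items 9, 10 and 11 gain 1 each.
before = [4, 1, 4, 4, 3, 3, 3, 3, 1, 2, 2, 4]
after = [4, 3, 4, 4, 3, 3, 3, 3, 2, 3, 3, 4]
contributions = mean_item_contributions([(before, after)])
```

In this made-up example, item 2 accounts for two of the five points gained, well above the 1/12 share expected if every item contributed equally.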

Results

In total, 14,297 longitudinal ALSFRS-R scores were analyzed; these originated from 114 trial centers enrolling a total of 1,240 patients. Five patients in the dexpramipexole trial were excluded from the analysis as only one ALSFRS-R measurement was available. In the dexpramipexole trial, the number of enrolled patients per center varied from 1 to 37 per center (median 9 patients per center), whereas in the ozanezumab trial, this ranged from 2 to 23 per center (median 8 patients per center).

Prevalence of sudden increases in the dexpramipexole and ozanezumab trials

Pooled across trials, we identified a total of 151 patients with at least one 5-point or more increase between two consecutive, monthly visits. In the dexpramipexole trial, 123 out of 937 (13.1%) patients had at least one 5-point or more increase, and in the ozanezumab trial, 28 out of 303 (9.2%) patients, resulting in an average prevalence of 12.2% (95% CI 10.4% to 14.2%). This percentage was similar when adjusted for the time between visits (12.1% in the dexpramipexole trial and 9.9% in the ozanezumab trial). Importantly, there was extensive variability among trial centers in the prevalence of sudden increases, illustrated in Figure 1 for the 36 largest trial centers of the dexpramipexole trial. Prevalence per center ranged from 0% to 83% in the dexpramipexole trial and from 0% to 40% in the ozanezumab trial. In Figure e1 (dexpramipexole) and Figure e2 (ozanezumab) we present the observed prevalence for each center.

Figure 1 Center variability in the prevalence of a 5-point or greater increase in ALSFRS-R total score. Raw 12-month ALSFRS-R data from the 36 trial centers with the largest number of enrolled patients in the dexpramipexole trial. Per center, the number of patients with a 5-point or more increase in ALSFRS-R total score are highlighted in red. The percentage per center indicates the proportion of patients with an increase, which ranges from 0% to 83%.

Exploring between-center variability: chance vs. underperformance

To distinguish random noise from potential underperformance of a particular center as the source of variability between centers, for each center we calculated the probability of observing the number of sudden increases given the average background prevalence observed in the other trial centers and the number of enrolled patients. These probabilities are presented in Figure 2. To illustrate: the probability of observing 10 sudden increases in 12 enrolled patients is about 4 in 100 million, which makes it highly unlikely that this number of sudden increases is due to chance and may suggest underperformance of that trial center compared to the other centers. In total, we identified 13 (16.3%) sites in the dexpramipexole trial that had a probability lower than 5%, suggesting a high likelihood of systematic differences between centers. The variability between centers in the dexpramipexole trial was related to the number of enrolled patients (odds ratio per patient 0.88, 95% CI 0.78 to 0.99, p = 0.029, Figure 3): sites that enrolled a higher number of patients had, on average, a lower prevalence of patients with a 5-point or more increase. In contrast, between-center variability in the ozanezumab trial fell within the expected range, suggesting that centers were performing similarly and that no clear underperforming centers could be identified.
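To make the odds ratio per enrolled patient concrete, the following sketch fits a logistic regression of each center's case count on its number of enrolled patients and converts the slope into an odds ratio with a Wald 95% confidence interval. It is a minimal illustration, not the study's analysis: the grouped binomial formulation, the hand-written Newton-Raphson fit, the function names, and the example data are all assumptions made for the example.

```python
import math


def fit_grouped_logistic(sizes, cases, iterations=25):
    """Fit logit(p_i) = b0 + b1 * size_i to per-center counts.

    `sizes[i]` is the number of enrolled patients in center i and
    `cases[i]` the number with a sudden increase. Returns the slope b1
    and its standard error, estimated by Newton-Raphson.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for n, k in zip(sizes, cases):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * n)))
            weight = n * p * (1.0 - p)   # binomial variance term
            residual = k - n * p         # observed minus expected cases
            g0 += residual
            g1 += residual * n
            h00 += weight
            h01 += weight * n
            h11 += weight * n * n
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 information matrix against the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b1, math.sqrt(h00 / det)


def odds_ratio_with_ci(b1, se, z=1.96):
    """Odds ratio per additional enrolled patient with a Wald 95% CI."""
    return math.exp(b1), math.exp(b1 - z * se), math.exp(b1 + z * se)


# Hypothetical centers: smaller centers have a higher proportion of cases.
sizes = [2, 4, 6, 8, 10, 20, 30]
cases = [1, 2, 2, 2, 2, 2, 2]
slope, se = fit_grouped_logistic(sizes, cases)
odds_ratio, lower, upper = odds_ratio_with_ci(slope, se)
```

An odds ratio below 1 in such a fit means that the odds of a patient having a sudden increase fall with each additional patient enrolled at the center, which is the direction of association reported for the dexpramipexole trial.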

Figure 2 Probability of observing the number of patients with a 5-point or more increase per trial center. To distinguish the variability in prevalence between centers from random noise and potential underperformance of a particular center, for each center we calculated the probability of observing a particular number of patients with a 5-point or more increase (cases) out of the total number of patients enrolled in that center, given the average background prevalence observed in other trial centers. Centers with a probability less than 5% (dotted line) were flagged as potential outliers.

Figure 3 Number of enrolled patients vs. prevalence of sudden 5-point or more increase. Relationship between the number of enrolled patients per trial center who participated during the dexpramipexole trial, and their association with the prevalence of a sudden 5-point or more increase. Darker dots represent overlapping centers. Solid line: regression line estimate with 95% confidence interval. Dashed line: average prevalence in the dexpramipexole trial. OR: odds ratio; CI: confidence interval; No: number.

Patient characteristics associated with potential measurement errors

In Table 1, we provide the individual characteristics of patients with and without a 5-point or more increase. Patients with high baseline ALSFRS-R scores had fewer 5-point or more increases during follow-up, possibly because they have less room to gain 5 points or more in their ALSFRS-R total score. Patients with bulbar onset and faster disease progression (expressed as ΔFRS) (Citation7) were more likely to have a 5-point or more increase.

Table 1 Baseline characteristics of patients with and without a 5-point or more increase in ALSFRS-R total score.

Item characteristics associated with potential measurement errors

Finally, in Figure 4 we show the average change score for each individual ALSFRS-R item, averaged over the measurements that were flagged as a 5-point or more increase. Despite differences in the observed frequency of sudden increases between the dexpramipexole and ozanezumab trials, items 2, 9, 10, and 11 were consistently identified as the primary drivers of a 5-point or more increase in ALSFRS-R total scores; this may reflect potential scoring difficulties for these questions.

Figure 4 Contribution of individual items to a sudden 5-point or more increase. Mean, proportional contribution of individual ALSFRS-R items to a sudden 5-point or more increase. For example, in the dexpramipexole trial, there were 123 patients with a 5-point or more increase during follow-up with a mean increase in total score of 6.3 points, of which 0.3 points (5%) were due to an increase in Item 1. If each item was equally responsible for the mean increase in total score, one would expect that each item would be accountable for 1/12 (8.3%, dashed line).

Discussion

In this study, we show the relatively common occurrence of sudden, large increases in ALSFRS-R total scores between two consecutive monthly measurements. These sudden increases did not occur with equal frequency across trial centers, which underscores the potential presence of an external cause, such as a difference in scoring strategies among trial centers, or dissimilarities between raters. We found several patient- and item-related characteristics that were associated with the prevalence of sudden increases. Identified patient characteristics were bulbar onset, faster disease progression at enrollment, and a lower ALSFRS-R score at baseline. ALSFRS-R items that were associated with sudden increases were items 2 (saliva), 9 (stairs), 10 (dyspnea), and 11 (orthopnea). Given the identified patient- and item-related characteristics, an important source of these sudden increases may be the initiation of symptomatic interventions, especially for respiratory and bulbar symptomatology. Although the study staff of both clinical trials were well-trained, our results indicate that the current SOPs may leave room for improvement and highlight the potential benefit of real-time monitoring of data quality to ensure SOP conformity.

The unnatural, large, sudden increases in the ALSFRS-R total score, and especially their imbalanced distribution among trial centers, suggest that these increases most likely result from a limitation in the ALSFRS-R itself rather than from a biological mechanism, given that several items in the bulbar and respiratory domains were marked as important drivers of sudden increases. The vulnerability of these two domains to sudden increases is further substantiated by the finding that the increases occurred significantly more often in patients with low respiratory and bulbar scores at baseline. Because several symptomatic treatment options are available for these domains in particular, a main challenge for ALSFRS-R scoring is how to handle symptomatic interventions. Both the dexpramipexole and the ozanezumab study employed SOP guidance as provided by the Northeast ALS Consortium (NEALS) (Citation16). In the NEALS SOP, items 2 and 10 are rated irrespective of treatment, which may cause sudden increases in scores when treatments are initiated. A straightforward solution could be to score these items 0 as soon as a symptomatic treatment is started. A disadvantage is that, if interest lies in the benefit of the symptomatic treatment, the ALSFRS-R is no longer a feasible endpoint. The ALSFRS-R, however, was developed to monitor disease progression and to quantify the efficacy of experimental, disease-modifying treatments in clinical trials. From a clinical trial perspective, therefore, it would be preferable if an improvement in ALSFRS-R score resulted only from an experimental intervention. It is thus important to separate the effect caused by a potential symptomatic intervention from the effect caused by the experimental intervention; furthermore, initiation of a symptomatic intervention should reflect natural disease progression.

A sudden increase in ALSFRS-R might not only result from the initiation of a symptomatic treatment (e.g. salivation therapy improving the patient’s condition from severe drooling to no salivary excess), but may also reflect day-to-day variation in the patient’s symptomatology. In this study, we did not look at small sudden changes in the ALSFRS-R items and subdomains, but given that the random variation for the subdomains ranges from 1.6 to 2.4 points (Citation8), a few items coincidentally improving between two consecutive visits could also result in a 5-point or more increase. Just as with the symptomatic interventions, the effect of these natural improvements should be minimized, so that the natural trajectory of the ALSFRS-R becomes a uniformly declining function over time. This highlights not only the importance of facilitating uniform scoring strategies, but also of continuously evaluating the accuracy of the ALSFRS-R items (Citation17,Citation18). A targeted adjustment of the ALSFRS-R SOP might be justified to develop one universal version to prevent differences in scoring. To ensure broad consensus, this requires a collaborative effort between large ALS trial networks, such as the Northeast ALS Consortium (NEALS) (Citation16), Trial Research Initiative to Cure ALS (TRICALS) (Citation19), and the Motor Neurone Disease group Australia (Citation20).

Although sudden increases might be related to limitations in the ALSFRS-R, underperformance of individual trial centers may also play a role. By calculating the probability of the observed proportion of sudden increases, we were able to get an impression of which individual centers could have been flagged for suspected underperformance during trial conduct. The results demonstrate that the number and the degree of deviation of the outlier centers were higher in the dexpramipexole trial than in the ozanezumab trial. These differences could very well be due to an improvement in training and refined standardization of the ALSFRS-R scoring strategies, as the ozanezumab study was conducted roughly two years after the dexpramipexole study (in particular in case of overlapping sites or raters). However, the ozanezumab study additionally employed a central in-stream blinded monitoring system during the study to identify outlier efficacy data values at patient or site level, which triggered data queries to the sites. Interestingly, we found that high proportions of sudden increases in trial centers of the dexpramipexole trial occurred more often in centers with a low number of enrolled patients. This finding is consistent with existing literature indicating that factors such as reaching enrollment goals may be related to center performance in data quality, highlighting the importance of recognizing centers of excellence via disease networks (Citation17,Citation21).

Our study has several limitations. First, a true improvement cannot be entirely ruled out in individual cases (Citation10). For example, dietary supplements or other experimental treatments may have led to a real improvement in function (Citation11). Second, although our analysis indicated that the number of enrolled patients per trial center was an explanatory factor for the occurrence of sudden increases, the available data did not allow us to analyze other center characteristics that were potentially associated with sudden increases, such as previous trial experience. However, our results, supported by previous literature, indicate that preliminary selection and interim assessment of participating trial centers could contribute to improving data quality (Citation17). Finally, the influence of different raters for the same patient, and of unknown placebo effects, as sources of unwanted variability could not be estimated. However, longitudinal scoring by the same rater, possibly supported by video review (Citation22) by expert raters, could contribute to optimizing data quality. Since adjusted SOPs cannot prevent all sources of variation, for example inadequate training of raters, video review and other methods for monitoring data quality (including real-time monitoring) are likely to be of important added value.

In conclusion, the results of this study suggest that sudden increases in consecutive ALSFRS-R total scores occur relatively frequently in multicentre studies. We found that these sudden increases did not occur with equal frequency across trial centers. In addition, multiple ALSFRS-R items were related to sudden increases, especially items in the bulbar and respiratory domains, whose scores can be affected by available symptomatic treatments. Patients with bulbar onset, a low ALSFRS-R baseline score, and faster disease progression were more likely to have a sudden increase. To facilitate adequate and uniform handling of improvements due to symptomatic treatment, a targeted adjustment of the SOP and corresponding skill training are warranted, requiring a global effort to define one universal version. In addition, multicentre research could benefit from methodology to monitor data quality, as well as from interim video reviews by expert raters.

Acknowledgements

The authors acknowledge the ALS patients who participated in the dexpramipexole and ozanezumab trials, and their contribution to the search for a cure for ALS. We would also like to thank Biogen and GlaxoSmithKline for sharing their clinical research data.

Declaration of interest

L. Kendall and N. Epstein are current and S.S. Han and A. Lavrov former employees of GlaxoSmithKline and held shares in the company. The other authors report no conflict of interest.

Additional information

Funding

This study was funded by the Netherlands ALS Foundation.

References