Invited Symposium

St. George's Respiratory Questionnaire: MCID

, F.R.C.P. , Ph.D.
Pages 75-79 | Published online: 24 Aug 2009

Abstract

The SGRQ is a disease-specific measure of health status for use in COPD. A number of methods have been used for estimating its minimum clinically important difference (MCID). These include both expert and patient preference-based estimates. Anchor-based methods have also been used. The calculated MCID from those studies was consistently around 4 units, regardless of assessment method. By contrast, the MCID calculated using distribution-based methods varied across studies and permitted no consistent estimate. All measurements of clinical significance contain sample and measurement error. They also require value judgements, if not about the calculation of the MCID itself then about the anchors used to estimate it. Under these circumstances, greater weight should be placed upon the overall body of evidence for an MCID, rather than one single method. For that reason, estimates of MCID should be used as indicative values. Methods of analysing clinical trial results should reflect this, and use appropriate statistical tests for comparison with the MCID. Treatments for COPD that produced an improvement in SGRQ of the order of 4 units in clinical trials have subsequently found wide acceptance once in clinical practice, so it seems reasonable to expect any new treatment proposed for COPD to produce an advantage over placebo that is not significantly inferior to a 4-unit difference.

Introduction

The St. George's Respiratory Questionnaire (SGRQ) is a well-established disease-specific measure of health status developed for asthma and COPD. There is now a wide body of published data in COPD patients, including numerous large clinical trials of pharmacological Citation[1-6] and non-pharmacological interventions Citation[7&8]. There is an extensive literature that points to its validity, both from cross-sectional studies between patients and longitudinally within patients (reviewed in Ref. Citation[[9]]).

Health status questionnaires are used to discriminate differences between patients and evaluate changes within patients. These two properties should not be seen as alternatives, since it is useful for an instrument to have both. An estimate of the minimum clinically important difference (MCID) is required for both applications, which raises the question of whether the same MCID can be used. In theory this should be the case. In a progressive condition such as COPD, the evolution of the disease will follow a similar path in different patients, albeit with different time courses. It follows that factors determining changes in the health of an individual patient over time will be the same as those that determine differences in severity between patients. This should hold true even in a disease with multiple components such as COPD, provided the measurements are made in a population of patients large enough to be representative of the patterns and spectrum of severity seen in the condition. In contrast to most health status questionnaires, there is evidence that the SGRQ does behave in a similar manner when used to make comparisons between patients or detect changes within patients. This is illustrated in Figure 1, which is drawn from data obtained during the initial validation of the questionnaire Citation[[10]]. The charts show the proportion of variance in SGRQ score attributable to differences between patients and changes within patients measured using a range of COPD-related variables. In each case, the variance in SGRQ score has been normalised to 100% to illustrate the relative contribution of each variable to the score under the two measurement conditions. The Activity and Impacts component scores of the questionnaire were chosen for this figure, because they reveal some of the internal detail of the questionnaire and its associations.
It can be seen that the component scores have different patterns of association with specific aspects of COPD but, within an SGRQ domain, the pattern is similar whether analysed as differences between patients or changes within patients over time. This observation is very important since it suggests that the MCID estimated between patients should be similar to that measured within patients.

Figure 1. Proportion of variance in SGRQ Activity and Impacts scores (normalised to 100%) derived from multiple regressions against depression score (from the Hospital Anxiety and Depression Scale), presence of daily wheeze (MRC Respiratory Questionnaire), 6-minute walking distance, and MRC Dyspnea score. “Between” indicates comparisons across patients; “Within” indicates longitudinal comparisons within patients.

The MCID for the SGRQ

Issues around the development of MCIDs for health questionnaires in asthma and COPD have been discussed at length elsewhere, including a description of the processes used for the SGRQ Citation[[11]]. Four methods have been used: expert preference-based, patient preference-based, anchor-based, and distribution-based.

Expert Preference-Based Estimate

This was the initial approach used for the SGRQ, early in its development. Clinicians experienced in the care of COPD patients were asked to make judgments about the size of difference in a number of COPD-related variables. These were chosen from the factors that previous analysis had shown were related to the SGRQ: exercise capacity, dyspnea, frequency of wheeze and cough, and level of depression Citation[[10]]. The clinicians were asked to state what size of difference in each of those variables would constitute, in their view, a clinically significant difference between two groups of patients Citation[[12]]. These judgments were then used to estimate the MCID for the SGRQ, assuming that differences in these factors between patient groups would not occur in isolation from each other. To calculate the MCID, regression equations similar to those used for Figure 1 were used to estimate what the difference in SGRQ score would be if there were simultaneous clinically meaningful differences in the reference variables. The estimate for the Total SGRQ score was 3.9 units, which was rounded to 4 units. This approach was also applied to the Impacts score of the questionnaire since the validation process had shown that associations between this component and the reference COPD variables were very similar to those seen with the Total score Citation[[10]]. The MCID estimate for the Impacts score was also very similar to that for the Total score Citation[[11]].
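The calculation described above amounts to summing, over the reference variables, each regression slope multiplied by the difference that clinicians judged to be meaningful. A minimal sketch of that arithmetic follows. The slopes, the differences, and all names are purely hypothetical placeholders chosen for illustration; they are not values from the validation study and are not intended to reproduce the 3.9-unit estimate.

```python
# Hypothetical illustration of combining regression slopes with
# clinician-judged meaningful differences. All numbers below are invented
# placeholders, NOT values from the SGRQ validation study.

# Assumed regression slopes: SGRQ units per unit of each reference variable.
HYPOTHETICAL_SLOPES = {
    "six_minute_walk_m": -0.05,  # SGRQ units per metre walked
    "mrc_dyspnea_grade": 2.0,    # SGRQ units per dyspnea grade
    "depression_score": 0.8,     # SGRQ units per depression-scale point
}

# Assumed clinically meaningful differences (worse group minus better group).
HYPOTHETICAL_DIFFERENCES = {
    "six_minute_walk_m": -30.0,
    "mrc_dyspnea_grade": 0.5,
    "depression_score": 1.5,
}


def combined_sgrq_difference(slopes, differences):
    """Sum slope * difference over all reference variables."""
    return sum(slopes[name] * differences[name] for name in slopes)


estimate = combined_sgrq_difference(HYPOTHETICAL_SLOPES, HYPOTHETICAL_DIFFERENCES)
print(f"Combined SGRQ difference: {estimate:.1f} units")
```

Summing the contributions reflects the stated assumption that meaningful differences in these variables occur together rather than in isolation.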

The decision to round the MCID to the nearest whole number was taken for two reasons. First, this was a new instrument measuring a concept novel to chronic lung disease so the principal objective of the exercise was to establish what order of magnitude of difference or change along its 0–100 scale would be clinically relevant. Second, the MCID estimate depended on multiple anchors that were themselves based on clinicians' unvalidated judgements about what constituted a clinically significant difference between patients. This is an important point. It will become apparent throughout this paper that all estimates of an MCID, regardless of method, are anchored in one or more value judgments, whether made by patient, physician, research worker, statistician, or policymaker.

The approach just described was a cross-sectional analysis. More recently another expert-based approach has been used, in which clinicians were asked to make a global assessment of COPD severity taking into account factors such as: need for concomitant therapy, number and severity of exacerbations, severity of cough, ability to exercise, and amount of wheezing. They were asked to rate overall severity on an 8-point scale with the descriptors: poor, fair, good, and excellent distributed evenly along its length. The assessments were made at baseline and 1 year later. A 1-point improvement in clinician global assessment score over 1 year (the lowest that could be registered) was associated with an improvement in SGRQ score of 4.2 units, whereas a 4.0 unit worsening in SGRQ was associated with a 2-point change in the global assessment score (PW Jones and T Witek, in preparation). Whilst there appeared to be a difference in the global detection of improvement and deterioration in terms of their association with change in SGRQ, this study showed that the lowest detectable changes in overall physician rating made 1 year apart were associated with changes in the SGRQ that corresponded to its established MCID. As with all global measurements of severity, it is not known which factors were taken into account by the clinicians when making this assessment.

Patient Preference-Based Estimates

In two studies, one in asthma Citation[[11]] the other in COPD Citation[[1]], patients' judgments of treatment efficacy at the end of the study were related to the change in SGRQ score over the study period. The COPD study was 16 weeks long, and the asthma study was 1 year. The smallest positive treatment effect recordable with the scaling categories used (“slightly effective” in the asthma study and “effective” in the COPD study) was associated with changes in SGRQ score within a few decimal points of 4.0 units. Both studies used a single retrospective estimate of treatment efficacy, unlike the clinician expert-based preference rating already described, which was the difference in two global estimates of severity made 1 year apart. These are two very different psychometric exercises, so it is remarkable that they produce such consistent results. There must be some underlying common mechanism, but the processes by which clinicians and patients make global judgments are still very poorly understood.

Anchor-Based Estimates

In the terminology of classical test theory, anchor-based estimates are related to criterion validity. Criteria may take the form of health events such as hospital admissions or death. The latter forms a special case since health cannot be measured after the event has occurred, but death as an outcome can be used to test the predictive validity of a health status questionnaire. A study of patients at the time of discharge from hospital with a diagnosis of COPD found that SGRQ scores were predictive of a composite endpoint of death or readmission within 1 year Citation[[13]]. The difference in SGRQ scores at discharge between patients who died or were readmitted and those who survived without admission was 4.8 units (95% Confidence Interval 1.8, 9.4 units). In another study in COPD, SGRQ scores were related to the Medical Research Council (MRC) dyspnea grade Citation[[14]]. In 32 patients who were housebound (MRC Grade 5), the SGRQ scores were 3.9 units (95% CI 1.8; 9.4 units) worse than in 32 patients who had major impairment of daily activity due to dyspnea but were not housebound (MRC Grade 4).

Two recent prospective observational studies carried out largely in male COPD patients, one in Spain Citation[[15]] the other in Japan Citation[[16]], have related SGRQ scores to subsequent mortality. They produced very similar statistically significant estimates of the increased risk of mortality at 1 year associated with a 4-unit difference in SGRQ score between patients at baseline: Spain 4.0% and Japan 3.3%. These observations are important because they demonstrate consistency in this important relationship between different countries. In terms of validating an MCID, mortality rate seems a very attractive anchor, but raises the questions: What is a clinically important increase in mortality risk in COPD? Who makes that judgment? A discussion of these issues is beyond the scope of this paper, but this example illustrates the point that attempts to establish an MCID usually involve a value judgment at some point in the process.

The anchor-based estimates just described were obtained from cross-sectional studies. Recently, data from a large study of pulmonary rehabilitation were used to estimate the change in SGRQ score that was associated with a clinically significant change in another disease-specific questionnaire—the Chronic Respiratory Questionnaire (CRQ) Citation[[17]]. The estimated MCID for the change in SGRQ Total score using the MCID for the CRQ Dyspnea component was 3.05 units with 95% confidence intervals of 0.39 to 5.71 units. This anchor-based estimate for the MCID of the SGRQ was not significantly different from its indicative MCID of 4 units and the authors of this study state that their estimate was “not far from the value of 4 postulated by Jones et al. as the MID of the SGRQ and the confidence intervals of our estimate include 4.”
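The statement that 3.05 units is not significantly different from 4 units can be checked with simple arithmetic on the reported interval. The sketch below assumes the 95% confidence interval is symmetric and based on a normal approximation (the reported limits are equidistant from the estimate, which is consistent with this), and backs out a standard error from it. The function name is illustrative and the calculation is not taken from the cited study.

```python
from statistics import NormalDist


def compare_with_reference(estimate, ci_low, ci_high, reference, level=0.95):
    """Back out a standard error from a symmetric normal-theory confidence
    interval and test the estimate against a reference value.

    Returns (standard_error, z, two_sided_p).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for 95%
    standard_error = (ci_high - ci_low) / (2 * z_crit)
    z = (estimate - reference) / standard_error
    two_sided_p = 2 * (1 - NormalDist().cdf(abs(z)))
    return standard_error, z, two_sided_p


# Values reported in the text: estimate 3.05, 95% CI 0.39 to 5.71, MCID 4.
se, z, p = compare_with_reference(3.05, 0.39, 5.71, reference=4.0)
print(f"SE = {se:.2f}, z = {z:.2f}, two-sided p = {p:.2f}")
```

Under these assumptions the standard error is roughly 1.36 units, so the 0.95-unit gap between the estimate and the indicative MCID is well under one standard error.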

Distribution-Based Estimates

These are ostensibly judgment-free methods to determine MCID. They calculate a distribution statistic from the between-patient differences in a patient population then use this parameter, or a derivative, as the threshold for clinical significance. Such approaches are not actually judgment-free since a choice is made in selecting one parameter over another. There are no psychometric grounds or theoretical models of symptom or health perception underlying this approach, which is a weakness. The standard error of the estimate (SEE) of a health status score, and half of the standard deviation (SD), have both been proposed as measures of the instrument's MCID Citation[[18]]. To test this suggestion for the SGRQ, data from 11 published studies in COPD (each n > 100) have been extracted. Using the SEE, the mean estimate for the MCID was 1.3 units, and using 0.5 SD it was 8.4 units. The range of values for both parameters was very wide (Figure 2). There was no correlation between the mean SGRQ score and the magnitude of either distribution statistic. It appears that these distributions for the SGRQ are highly inconsistent across studies; i.e., they have no reliability. This observation, together with the lack of a theoretical underpinning for the methodology, makes it hard to justify this approach for a health status measure such as the SGRQ.
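The dependence of a distribution-based threshold on the sample it is calculated from can be illustrated with the half-SD rule. The sketch below uses two invented sets of baseline scores; they are hypothetical and are not data from any of the 11 studies. The SEE is left out because the text does not state the formula used to calculate it.

```python
from statistics import stdev


def half_sd_threshold(baseline_scores):
    """Return half of the sample standard deviation of baseline scores."""
    return 0.5 * stdev(baseline_scores)


# Invented baseline SGRQ Total scores for two hypothetical studies.
hypothetical_study_a = [38.0, 42.0, 45.0, 47.0, 50.0, 53.0, 55.0]
hypothetical_study_b = [20.0, 31.0, 44.0, 52.0, 63.0, 71.0, 80.0]

for label, scores in (("A", hypothetical_study_a), ("B", hypothetical_study_b)):
    print(f"Study {label}: 0.5 SD = {half_sd_threshold(scores):.1f} units")
```

A study that recruits a wider spectrum of severity yields a larger threshold from the same questionnaire, which is the kind of inconsistency described above.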

Figure 2. Frequency distribution of Standard Error of Estimate and half of one Standard Deviation of baseline SGRQ scores from 11 published studies Citation[[1]], Citation[[3]], Citation[5-7], Citation[15&16], Citation[19&20].

Clinical Trial Results as Anchors

MCID thresholds are now being used to provide a method of interpreting the results of clinical trials and establish whether the treatment is effective, but this is a recent development. Historically, judgments about efficacy of a new treatment have been driven largely by precedent, using data about existing agents for the same disease as an informal benchmark against which to judge the efficacy of the newer treatment. A similar approach may be used for establishing or at least validating MCIDs. The rationale for this is that, whilst a new drug must demonstrate basic efficacy and safety to be granted a license for use, to gain broad acceptance in clinical practice it has to pass many more hurdles. Whilst clinical trial data may encourage physicians to use a new drug initially, its continued use will depend on there being sufficient numbers of patients reporting to their doctors that the drug has provided benefit. If clinical experience with the drug suggests that it is worthwhile, physicians will argue for its inclusion in formularies and guidelines. Formulary and guideline committees look at the overall body of evidence when making a judgement as to whether a drug makes a useful new contribution. By the time the drug appears in international guidelines, many different agencies, both formal and informal, will have made a judgment as to whether it is clinically useful. Thus the size of change in SGRQ scores observed in clinical trials of drugs whose value has been proven in practice should provide an index of the level of change to be expected of new treatments.

A meta-analysis of changes seen in all COPD trials using the SGRQ is beyond the scope of this article, but Figure 3 shows the results of three typical randomised placebo-controlled trials of long-acting bronchodilators that are in wide use, and have been incorporated into national and international guidelines for COPD. All three drugs produced a mean advantage over placebo that was very close to 4 units. Their confidence intervals all included 4 units. It is important to note that SGRQ scores were not a factor in obtaining registration of these drugs. This analysis does not necessarily mean that 4 units is the minimum clinically detectable change in SGRQ score, but it is the size of effect obtained with drugs that have subsequently gained widespread clinical acceptance. The latter requirement could in itself provide an anchor for estimating MCIDs.

Figure 3. Effect of treatment on SGRQ score in three placebo controlled trials of long-acting bronchodilators: salmeterol Citation[[1]], formoterol Citation[[19]], and tiotropium Citation[[2]]. Lower score indicates better health. Error bars are 95% Confidence Intervals.

Face Validity

In an attempt to explain what a 4-unit change might mean to a patient, a range of scenarios have been generated and presented elsewhere Citation[[11]]. One example is a patient who reports that, compared to their state before starting treatment, he/she can now wash and dress more quickly, walk up stairs without having to stop, and go out for shopping or entertainment. All three of these improvements would have to occur for the patient to achieve a 4-unit improvement in SGRQ score. At an individual patient level, this seems to be quite a high hurdle. One widely quoted definition of MCID has stood the test of time: “The smallest difference in score which patients perceive as beneficial and would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient's management” Citation[[20]]. Using this humanistic definition, any one of the three improvements described in the scenario above would be clinically significant if the patient thought that it was worthwhile.

Using SGRQ MCID Estimates

All MCID estimates are made in populations of patients and have both sampling and measurement error, whatever the variable being measured. For this reason, they should be used as indicative values rather than as absolute thresholds. For example, it would not be correct to conclude that a mean change in SGRQ score of 4.0 units was clinically significant, whereas a change of 3.95 units was not. This poses a serious problem, since decisions have to be made as to whether a treatment effect measured in a clinical trial is clinically significant. One solution is to ignore the mean result and carry out a responder analysis to identify the proportion of patients who improve by more than the MCID. Such analyses have been shown to be relatively independent of the precise threshold chosen with the Chronic Respiratory Questionnaire Citation[[21]] and the SGRQ Citation[[11]]. This approach is attractive, but it still requires a value judgment as to what constitutes a clinically significant proportion of patients who benefit. An alternative methodology has been proposed Citation[[11]]. It employs an approach analogous to tests for non-inferiority in equivalence studies and uses the mean change in score together with its confidence interval (Figure 4). Using this method, in a trial where the mean change is greater than 4 units, the treatment would be judged “equivalent to a clinically significant effect.” In a trial in which the treatment effect was statistically significant, and the 95% confidence interval included the MCID, the treatment would be judged “not significantly inferior to a clinically significant effect,” even though the mean effect lay below the MCID. This approach does not overcome the fact that the MCID is not an error-free estimate, but it does place the results on a more probabilistic footing than drawing conclusions about efficacy merely on the relative size of the mean treatment effect and the MCID.
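The two approaches just described can be sketched in a few lines. The sketch below is an illustration under stated assumptions rather than the published taxonomy: improvements are expressed as positive numbers (that is, as reductions in SGRQ score), a treatment effect is taken to be statistically significant when its 95% confidence interval excludes zero, a mean effect exactly equal to the MCID is treated as reaching it, and the labels for the two categories not quoted in the text are hypothetical. The function names and example numbers are also hypothetical.

```python
MCID = 4.0  # indicative SGRQ threshold discussed in the text


def classify_effect(mean_improvement, ci_low, ci_high, mcid=MCID):
    """Classify a treatment effect relative to the MCID.

    Improvements are positive numbers (reductions in SGRQ score).
    The first and last labels are assumptions; the middle two are quoted
    from the text.
    """
    if ci_low <= 0:
        # Assumption: significance is checked first, via the CI excluding 0.
        return "not statistically significant"
    if mean_improvement >= mcid:
        return "equivalent to a clinically significant effect"
    if ci_high >= mcid:
        return "not significantly inferior to a clinically significant effect"
    return "significantly inferior to a clinically significant effect"


def responder_proportion(improvements, mcid=MCID):
    """Proportion of patients whose improvement reaches the MCID.

    Assumption: the threshold is inclusive (improvement >= mcid).
    """
    responders = sum(1 for change in improvements if change >= mcid)
    return responders / len(improvements)


# Hypothetical trial summaries: (mean improvement, lower CI, upper CI).
for summary in [(4.5, 2.0, 7.0), (3.0, 0.5, 5.5), (2.0, 0.5, 3.5), (1.0, -1.0, 3.0)]:
    print(summary, "->", classify_effect(*summary))

# Hypothetical per-patient improvements.
print(responder_proportion([5.0, 1.0, 4.0, -2.0]))
```

Classifying on the confidence interval rather than on the mean alone avoids treating 3.95 and 4.0 units as categorically different results.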

Figure 4. A suggested taxonomy for changes in SGRQ score relative to the minimum clinically important difference (MCID). Lower score indicates better health. Error bars indicate 95% Confidence Intervals.

Summary

All estimates of clinical significance contain sample and measurement error and require value judgments, if not about the threshold itself, then about the anchor used to estimate the threshold. Even the apparently judgment-free distribution-based methods require choices on the part of the developer. Under these circumstances, greater weight should be placed upon the overall body of evidence for an MCID, rather than one single estimate. In terms of the SGRQ, the estimate appears to be consistently around 4 units regardless of assessment method, apart from those based upon distributions. Furthermore, it appears plausible when compared with other markers of COPD severity. Treatments that produce an improvement of the order of 4 units have found wide acceptance once in use, so it seems reasonable to expect any new treatment proposed for COPD to produce an advantage over placebo that is not significantly inferior to a 4-unit difference.
