Invited Symposium

Bridging the Gap: Using Triangulation Methodology to Estimate Minimal Clinically Important Differences (MCIDs)

, Ph.D. & , Ph.D.
Pages 157-165 | Published online: 24 Aug 2009

Abstract

This paper proposes the use of triangulation methodology to derive guidelines for interpreting change scores on health outcome measures. Triangulation integrates results from global ratings with clinical benchmarks of change, statistical estimates of magnitude, and qualitative data from patients and/or clinicians to derive guidelines that are not field-specific or method bound. A case study is presented to illustrate how this methodology can be applied. Secondary analyses were performed on blinded data from 2,971 patients enrolled in three phase IIIa clinical trials to develop guidelines for interpreting change scores on the Breathlessness Diary (BD), a relatively new approach for evaluating dyspnea outcomes in patients with chronic obstructive pulmonary disease. BD scores were examined by disease severity and rescue medication use. In addition, mean BD change scores by physician global ratings of efficacy were juxtaposed with changes in forced expiratory volume (FEV1) and St. George's Respiratory Questionnaire scores. Percent change, effect size, one-half standard deviation, and the standard error of measurement were used as statistical indicators of magnitude. Data from qualitative interviews provided insight into patient perspectives of change in dyspnea. Taking into consideration results across estimation methods, guidelines were developed for defining large, moderate, and small group-level mean changes on the BD. Areas of divergence and convergence across statistical indicators and clinical benchmarks in this case study highlight the importance of using triangulation methodology to derive guidelines that are both empirically sound and clinically relevant.

Introduction

Published reports of clinical trials generally provide sufficient information upon which clinicians or scientists can judge the statistical significance of the treatment effect. Interpreting the clinical relevance of this effect is generally less clear. Any given report may include a discussion of the magnitude of the effect within the context of outcomes observed in similar published studies, if these data are available. More often, however, readers are left to draw their own conclusions based on their empirical or clinical experiences. Clearly this intuitive approach is insufficient for precise decision making and impossible when new outcome measures are used.

The expanded use of patient-reported outcomes, including the growing number of new methods for evaluating symptoms, function, and health-related quality of life in a variety of patient populations, has highlighted the need for measure-specific interpretive guidelines. This, in turn, has led to a growing body of literature addressing methods for estimating and interpreting the magnitude of within-group change and between-group differences. To date, these methods have emphasized patient or physician global ratings of perceived change and statistical estimates of magnitude, with relatively little discussion of clinical benchmarks. For guidelines to be useful for clinical decision making, however, they must be linked to clinical data.

The purpose of this paper is to briefly review some of the known limitations of global and statistical methodologies for deriving interpretive guidelines, specifically estimating the minimal clinically important difference (MCID), and discuss an alternative approach, triangulation methodology, in which multiple estimation procedures are used to arrive at guidelines that are both empirically sound and clinically relevant. A case study, using the Breathlessness Diary, a new measure of dyspnea for clinical trials involving patients with chronic obstructive pulmonary disease (COPD), is used to illustrate how triangulation methodology can be applied.

Background

The term “minimal clinically important difference” or MCID originated in a 1989 Jaeschke et al. Citation[[1]] paper seeking the best estimate of “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient's management (p. 408).” The investigators in that study compared patients' score changes in a longitudinal treatment design with their global assessments of change over time to develop their MCID standards for the Chronic Respiratory Disease Questionnaire Citation[[2]] and the Chronic Heart Failure Questionnaire Citation[[3]].

Despite the huge contribution of the Jaeschke et al. report towards introducing a useful methodology for the development of relevant change thresholds for patient-reported outcome (PRO) measures, there is strong criticism of the validity of patient global assessments Citation[[4]]. Patients' memories of their prior health state are often inaccurate, and empirical evidence demonstrates that the global assessment most strongly correlates with the patient's current health state without completely incorporating the prior health state into the assessment Citation[[5]]. That is, if a patient is feeling poorly today (Time 2), he will most likely provide a global assessment reflecting deterioration over a given time period (Time 1 to Time 2), even if his health state at Time 1 was very poor or just as severe as his current state. Similarly, if a patient is in good health today, the global assessment would likely be that there was improvement compared to Time 1, even if there was the same state of good health at the prior assessment.

In addition to this cognitive limitation, the Jaeschke et al. approach lacks sufficient clinical data, such as physician appraisal of a change in the patient's clinical status and changes in biomarkers or other physiological disease parameters, that would support the use of the word clinically in the MCID term Citation[[1]]. Recent papers, coming primarily out of the statistical field, propose methods for deriving interpretive guidelines for health-related quality of life (HRQL) and related outcomes that rely on distribution-based approaches, such as the use of one-half a standard deviation (SD) Citation[[6]] and the standard error of measurement (SEM) Citation[[7]]. Although methodologically sound, these approaches also do not directly reference the clinical context, again sidestepping the “clinical” in the minimal clinically important difference.

Qualitative data can provide evidence for the development of interpretive guidelines from a clinical perspective, specifically clinician consensus panels and patient interviews. Several recently published examples of the RAND Appropriateness Method Citation[[8]] used consensus among clinical experts in the field to establish clinically important difference thresholds with results showing promise for the further use and advancement of this methodology Citation[9-11]. Patient interviews, including face-to-face discussions or focus group dialogues, can also provide researchers with a wealth of information on the importance of differing change thresholds from the patient's perspective that cannot be directly elicited through standardized questionnaires, such as the perceived impact of a specific treatment and the treatment's trade-offs. These data offer a deeper understanding of the cognitive processes patients use to determine their improvement, stability or decline over the course of treatment. Although they provide valuable interpretive insights, qualitative data lack the precision necessary to derive numeric guidelines for interpreting quantitative metrics of health.

In order to derive guidelines for interpretation of health outcomes that are clinically meaningful and therefore useful for clinical decision making in research, policy, and practice, there must be a clinical component to the guideline development process. In fact, by combining clinical data with statistically based metrics and insight from clinicians and patients, we should be able to arrive at guidelines that are empirically sound and clinically relevant.

In this paper, we discuss a method for estimating the MCID that extends the work of Jaeschke et al. by integrating global ratings with clinical benchmarks of change and statistical methods, including three distribution-based approaches, for estimating magnitude Citation[[12]]. These quantitative approaches can be complemented by qualitative data from clinical experts or patients, to provide additional insight into the factors that must be considered when recommending guidelines for interpretation. The term “triangulation” is used to refer to this methodology, not unlike its use in reference to the combined application of quantitative and qualitative methods in theory development and testing. Perhaps more pointed in this case, however, is its similarity to the term as it has long been used by surveyors, navigators and military strategists to describe the determination of a third point, given the known position of two other points, and additional information (angle, distance, etc.) about the third point's location in relation to the two known sites. The intent of triangulation in MCID research is to use diverse yet complementary methods to arrive at interpretive guidelines that are not field-specific or method bound, and are therefore a more accurate reflection of clinically meaningful group-level change.

The following case study demonstrates how triangulation can be used to understand and interpret the meaning of change scores. For this illustration, we selected a new measure of dyspnea for use in patients with chronic obstructive pulmonary disease (COPD), known as the Breathlessness Diary. The measure was derived from the Breathlessness, Cough, and Sputum Scale (BCSS), whose measurement properties and interpretive guidelines have been described previously Citation[12&13]. Briefly, the Breathlessness Diary asks the patient “How much difficulty did you have breathing today?” each day during the course of the observation period or clinical trial. The patient responds by rating his or her breathlessness on a 5-point Likert-type scale, ranging from 0 to 4, with higher scores indicating a more severe dyspnea. Weekly and period (baseline, treatment, and follow-up) scores can be computed by aggregating daily scores over time. In addition, day-to-day variability and change associated with an acute exacerbation can be described.

Case Example

Data Source

The results presented here come from blinded data from 3,643 patients undergoing treatment during the course of three Phase IIIa, multi-center, multi-national, randomized, double-blind, placebo-controlled clinical trials evaluating the safety and efficacy of sibenadet (Viozan), a novel dual dopamine D2 receptor/β2-adrenoreceptor agonist, developed to treat bronchoconstriction and to ameliorate respiratory symptoms in patients with COPD Citation[14&15]. Two of the trials involved a 12-week treatment period (n = 2,440); the third trial was 26 weeks in length (n = 1,203). Although the development of sibenadet was discontinued due to disappointing efficacy findings Citation[[16]], the data were made available for secondary analyses.

Sample

All patients were between 35 and 80 years of age with a history of COPD ≥ 2 years, a smoking history, and a percent predicted forced expiratory volume in one second (FEV1% predicted) between 20 and 70%. Exclusion criteria included a complicating co-morbid condition or need for domiciliary oxygen. Of the 3,643 patients enrolled in the three trials, data from 2,971 were used in the triangulation analyses, based on the existence of the necessary data. For context, a brief overview of the demographic and clinical characteristics of this sample is provided in Table 1. A more detailed description is reported elsewhere Citation[[12]].

Table 1.  Baseline Demographic and Clinical Characteristics of Study Samples by Data Source.

Measures and Methods

Breathlessness Diary

The Breathlessness Diary was administered each day during the course of the trials, as part of the BCSS. Daily scores for each patient were aggregated over the baseline period and again over the final 4 weeks of treatment, defined by the end of study or study discontinuation (Citation[[12]]Citation[14&15]). Change scores were expressed as the difference between baseline and end-of-study values.
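The scoring steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the trial's scoring program: the function names, the handling of missing days (skipped here), and the sign convention (baseline minus end of study, so that a positive value means less breathlessness) are all assumptions, since the text does not specify them.

```python
from statistics import mean


def period_mean(daily_scores):
    """Mean of the recorded daily ratings (each 0 to 4); None marks a missing day.

    Skipping missing days is an assumption made for this sketch.
    """
    observed = [s for s in daily_scores if s is not None]
    if not observed:
        raise ValueError("no diary entries in this period")
    return mean(observed)


def change_score(baseline_days, treatment_days, final_window_days=28):
    """Baseline mean minus the mean over the final window of treatment.

    The sign convention (positive = improvement) is assumed, not taken
    from the source.
    """
    baseline = period_mean(baseline_days)
    end_of_study = period_mean(treatment_days[-final_window_days:])
    return baseline - end_of_study


# Hypothetical patient: one week of baseline, eight weeks of treatment.
baseline_days = [3, 3, 2, 3, None, 3, 2]
treatment_days = [3] * 28 + [2] * 28
example_change = change_score(baseline_days, treatment_days)
```

Slicing with `[-final_window_days:]` also covers patients who discontinued early, because it simply takes the last 28 recorded days of whatever treatment period exists.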

Clinical Benchmarks

Four clinical variables were used in these triangulation analyses: physician rating of treatment efficacy, forced expiratory volume in one second (FEV1), rescue medication use, and change in mean St. George's Respiratory Questionnaire (SGRQ) score. Physician rating of treatment efficacy was appraised at the end of the trial, on a 5-point scale, from “highly effective” to “made condition worse.”

FEV1 is a standard clinical indicator in this patient population. Although there is no standard for interpreting change in FEV1, it is commonly believed that group-level improvements greater than 100 cc in patients with COPD can be considered clinically meaningful. FEV1 values are also used to classify patients into severity levels, with the ATS guidelines suggesting 4 levels Citation[[17]] and the ERS guidelines recommending 3 levels Citation[[18]]. Mean baseline Breathlessness Diary score was examined by level of disease severity in two of the three trials, referred to as Study 1 and Study 2, as an example of the value of replication exercises in triangulation analysis.

Rescue medication use was defined as the average number of puffs (actuations) per day. Baseline values for a subset of two of the study populations were used. Patients were selected based on their position in the sample distribution (lowest or highest quartile) and classified as “low” and “high” users, respectively. Mean baseline Breathlessness Diary score was examined by rescue medication use, again using data from Study 1 and Study 2 for replication purposes.
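A minimal sketch of the quartile-based grouping follows, assuming that "low" and "high" users are those at or below the first quartile and at or above the third quartile of baseline puffs per day. The function name, the inclusive cut-points, and the quantile interpolation method (the default of Python's `statistics.quantiles`) are assumptions; the source does not state how ties or boundaries were handled.

```python
from statistics import quantiles


def classify_rescue_use(puffs_per_day):
    """Label each patient 'low', 'high', or None (middle of the distribution)."""
    q1, _, q3 = quantiles(puffs_per_day, n=4)  # three quartile cut-points
    labels = []
    for value in puffs_per_day:
        if value <= q1:
            labels.append("low")
        elif value >= q3:
            labels.append("high")
        else:
            labels.append(None)  # excluded from the low/high comparison
    return labels


# Hypothetical baseline averages (puffs per day) for eight patients.
example_labels = classify_rescue_use([0, 1, 2, 3, 4, 5, 6, 7])
```

Mean baseline Breathlessness Diary scores would then be compared between the "low" and "high" groups only.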

The SGRQ is a widely used condition-specific measure of health status or health-related quality of life (HRQL) Citation[[19]]. Scores range from 0 to 100 with higher scores indicating poorer health status. Guidelines for interpretation suggest changes in the SGRQ total score of ± 4 points are clinically meaningful, ± 8 are considered moderate, while ± 12 are interpreted as a large change in health status Citation[[20]]. Change was defined from baseline to end of treatment.
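The SGRQ guideline values above can be written as a simple lookup. This is a sketch only: the function name and the choice to treat each guideline value as an inclusive lower bound are assumptions made for illustration, and the direction of change (improvement or deterioration) is ignored here.

```python
def interpret_sgrq_change(delta):
    """Label a change in SGRQ total score using the 4-, 8-, and 12-point guidelines."""
    magnitude = abs(delta)
    if magnitude >= 12:
        return "large"
    if magnitude >= 8:
        return "moderate"
    if magnitude >= 4:
        return "clinically meaningful"
    return "below the 4-point guideline"


# Hypothetical mean changes in SGRQ total score.
example_labels = [interpret_sgrq_change(d) for d in (-2.5, -4.0, -9.1, 13.0)]
```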

Statistical Methods

Four statistical approaches were used in these triangulation analyses: percent change, effect size, one-half standard deviation, and the standard error of measurement. Change in Breathlessness Diary score was expressed as a percent change from baseline, to provide a standardized metric on a scale of 0 to 100. This approach is commonly used to understand treatment effects involving symptom outcomes, and could therefore be considered a clinical benchmark as well. Effect size (ES) is a statistical metric in which the mean (M) change from baseline (B) to end of treatment (T) is divided by the standard deviation (σ) at baseline, $(M_T - M_B)/\sigma_B$, to yield a parameter estimate that describes the magnitude of the observed effect. ES is, in Cohen's terms, “an index of degree of departure from the null hypothesis (p. 10)” Citation[[21]]. Although this standardized metric is potentially useful, the question still remains: how “big” is a “big” ES and how “small” is a “small” ES? Using insight from the behavioral sciences, Cohen offered the following guidelines for interpretation: 0.80 may be considered large, 0.50 moderate, and 0.20 small, with the following caveat, which is also relevant to the MCID dialogue:

The terms ‘small,’ ‘medium’ and ‘large’ are relative, not only to each other, but to the area of behavioral science or even more particularly to the specific content and research method being employed in any given investigation. In the face of this relativity, there is a certain risk inherent in offering conventional operational definitions for these terms for use in power analysis in as diverse a field of inquiry as behavioral science. This risk is nevertheless accepted in the belief that more is to be gained than lost by supplying a common conventional frame of reference which is recommended for use only when no better basis for estimating the ES index is available. (p. 25).
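As a hedged illustration of the percent change and effect size calculations, the sketch below uses hypothetical group means together with the baseline standard deviation of 0.79 reported in the Results. The function names are invented for this example, and treating Cohen's 0.20, 0.50, and 0.80 as inclusive lower bounds is an assumption; Cohen presented them as conventions, not strict cut-offs.

```python
def effect_size(mean_baseline, mean_end, sd_baseline):
    """(M_T - M_B) / sigma_B: mean change divided by the baseline SD."""
    return (mean_end - mean_baseline) / sd_baseline


def percent_change(mean_baseline, mean_end):
    """Change from baseline expressed as a percentage of the baseline mean."""
    return 100.0 * (mean_end - mean_baseline) / mean_baseline


def cohen_label(es):
    """Conventional label for the absolute size of an effect."""
    size = abs(es)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"


# Hypothetical group means; 0.79 is the baseline SD reported in the Results.
es = effect_size(mean_baseline=2.0, mean_end=1.6, sd_baseline=0.79)
pct = percent_change(mean_baseline=2.0, mean_end=1.6)
label = cohen_label(es)
```

Because higher Breathlessness Diary scores indicate more severe dyspnea, an improvement yields a negative value under this formula; the label is therefore based on the absolute value.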

The same year Jaeschke et al. published their paper on the MCID, Kazis et al. Citation[[22]] published their observations on the use of effect size to translate change into a standard unit of measurement to better understand the health effects of treatment. They found Cohen's guidelines to be a useful interpretive tool for understanding the meaning associated with changes in the Arthritis Impact Measurement Scales (AIMS) in response to pharmacologic treatment. The interpretations offered for the ES estimates in our case example are presented with Cohen's caveat and Kazis et al.'s observations in mind. It is also noteworthy that the advantages of triangulation across methods become clearer as the limitations of each respective method are recognized.

The one-half standard deviation (1/2 SD) approach was described in a recent paper by Norman et al. Citation[[6]]. These authors examined the health-related quality of life (HRQL) literature to locate those studies with a reported minimal threshold or MCID that could be compared to the baseline standard deviation of their original samples. All studies involved patients with chronic disease. The ratio of the magnitude of minimal change level and the baseline standard deviation was averaged within studies with more than one domain of interest, and then across 38 studies to yield a mean effect size of 0.495. Norman et al. then compared this mean result with the enduring outcome of George Miller's 1956 report on “The Magic Number Seven Plus or Minus Two” Citation[[23]]. From the Miller paper, a one unit change in a uniform distribution across seven levels (on average, the limits of human discernment) equals 0.46 of the distribution's standard deviation (1/2.16), very close to the empirical results seen by Norman and colleagues' review of established minimal thresholds among patients with chronic diseases. Norman et al. argued that this represents a threshold of perception, rather than one of clinical relevance and concluded that a reasonable estimate for a minimal perceptible difference across HRQL measures may be ½ SD. The recommendation is undergoing additional testing. Although the extent to which the generalization might hold beyond HRQL measures is unknown at this time, we use the method here as one contributing method to the larger triangulation procedure.
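The 1/2.16 figure can be checked numerically. The short sketch below assumes the seven levels are coded 1 through 7 with one observation per level; under that assumption, the value of 2.16 corresponds to the sample (n − 1) form of the standard deviation, whereas the population form gives exactly 2.0 and a ratio of 0.50.

```python
from statistics import pstdev, stdev

levels = [1, 2, 3, 4, 5, 6, 7]  # seven equally likely response levels (assumed coding)

sample_sd = stdev(levels)        # n - 1 denominator
population_sd = pstdev(levels)   # n denominator

one_unit_in_sample_sd = 1 / sample_sd
one_unit_in_population_sd = 1 / population_sd
```

Either way, a one-unit change on a seven-level scale sits close to one-half of a standard deviation, which is the point of the comparison.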

One standard error of measurement (SEM), which has been shown to have direct correspondence with MID thresholds for chronic disease HRQL measures (Citation[[7]]Citation[24-26]), was used as the fourth statistical method. The SEM is computed by multiplying the standard deviation at baseline by the square root of 1 minus the reliability estimate of the measure. Because the Breathlessness Diary is one question gathered repeatedly over time, the reliability estimate is based on its reproducibility over time in stable patients Citation[[13]].

Qualitative Interviews

Eleven patients with COPD, 6 women and 5 men, who were neither part of the clinical trials nor from the clinical trial sites, participated in cognitive debriefing interviews. Patients were instructed to read and respond to the Breathlessness Diary item, among others, and were then asked what factors they considered in selecting their response and what change on this measure they would consider meaningful. The intent was not to gather quantitative data, but to understand the patient's perspective of the instrument as an indicator of breathlessness and a barometer of change over time. An overview of clinical and demographic characteristics of this sample is provided in Table 1.

Results

According to the ½ SD methodology discussed above and using a baseline standard deviation of 0.79 points, the MCID (or perhaps more accurately, the minimal perceptible difference) for the Breathlessness Diary would be expected to be approximately 0.40 points. The MCID based on the SEM is 0.38 points, obtained by multiplying the baseline standard deviation (0.79) by the square root of 1 minus the reliability coefficient (0.77). Using these values alone, one might conclude that the MCID for the Breathlessness Diary is approximately 0.40 points. However, no clinical information has informed this estimate. The number is based solely on distributional and, in the case of the SEM, reproducibility values.
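The two distribution-based estimates can be reproduced from the reported inputs (baseline SD of 0.79, reliability of 0.77). The sketch below uses the standard psychometric form of the SEM, the SD multiplied by the square root of one minus reliability; the function names are illustrative only. This form reproduces the reported 0.38, whereas dividing by the square root term would give roughly 1.65.

```python
from math import sqrt


def half_sd(sd_baseline):
    """One-half of the baseline standard deviation."""
    return 0.5 * sd_baseline


def standard_error_of_measurement(sd_baseline, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_baseline * sqrt(1.0 - reliability)


half_sd_estimate = half_sd(0.79)                          # about 0.40
sem_estimate = standard_error_of_measurement(0.79, 0.77)  # about 0.38
```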

Mean Breathlessness Diary scores by ATS criteria for disease severity for Study 1 and Study 2 are shown in Figure 1. Differences across groups were statistically significant with post hoc analyses indicating differentiation between the very severe group and all other groups in Study 1, and between the very severe and all other groups and between the moderate and severe groups in Study 2. The purpose of this exercise, however, was not to test the discriminatory power of the Breathlessness Diary per se, but to examine the mean score differences between these groups as an indication of magnitude and as one step in the process of deriving guidelines for interpretation. In both studies, mean score differences between adjacent groups (a ‘jump’ of one classification) were 0.1 to 0.2. Mean differences between moderate and severe and between moderately severe and very severe (a ‘jump’ of 2 classifications) were larger, ranging from 0.3 to 0.5. An analysis of Breathlessness Diary scores by ERS criteria showed mean differences of 0.0 and 0.1 between mild and moderate, for the 2 studies respectively, and 0.4 between moderate and severe in both samples.

Figure 1. Mean Breathlessness Diary score by disease severity and study (see note a).


The comparison of mean Breathlessness Diary scores by rescue medication use during the baseline period of Study 1 and Study 2 is shown in Figure 2. A difference of 0.4 was found in Study 1 while the mean difference in Study 2 was 0.8. Because these rescue medication groups were at the tail ends of the rescue medication distribution, and assuming there is a close link between dyspnea and rescue medication use (a reasonable, clinically based assumption), these mean differences in Breathlessness Diary score should be interpreted as large.

Figure 2. Mean Breathlessness Diary score by rescue medication use and study (see note b).


Table 2 shows mean change in Breathlessness Diary scores from baseline to end of study by physician rating of efficacy, juxtaposed with mean change in SGRQ, FEV1, and statistical indicators of magnitude for the entire analytical population. Treatment considered highly efficacious was associated with mean change in Breathlessness Diary score greater than 0.50, corresponding to nearly 30% improvement in dyspnea and an effect size nearing 0.8, a value Cohen described as large. Highly effective treatment was also associated with a change in SGRQ total score greater than the 8-point change considered a moderate change in health status, and a 140 ml (10%) improvement in FEV1. For patients in whom treatment was considered moderately efficacious, a mean change in Breathlessness Diary of approximately 0.35 was observed, representing an improvement of almost 20% in dyspnea with an effect size of 0.48. Mean change in SGRQ total score exceeded the 4-point guideline for meaningful improvement in health status, while FEV1 improved by 80 ml (6%). The mean value of 0.35 is not unlike those observed between moderate and very severe patients and between high and low rescue medication users. Taken together, these results indicate mean values that exceed 0.5 on the Breathlessness Diary should be interpreted as large and clinically impressive, while mean values of or around 0.35 are associated with noteworthy improvements in patients' overall health-related quality of life and should be interpreted as moderate and clinically significant.

Table 2.  Change from Baseline to End of Study in Mean Breathlessness Diary Scores by Physician Rating of Efficacy, Juxtaposed with Mean Change in SGRQ, FEV1, and Statistical Indicators of Magnitude (N = 2,971).

Moving down the efficacy rating, in search of a minimal clinically important difference, treatment that physicians considered “mildly effective” was associated with an improvement in Breathlessness Diary score of 0.21, a 10% improvement in dyspnea, with an associated effect size of 0.28. In this case, mean change in SGRQ score was less than the 4-point guideline for minimal change, with no change in FEV1. The mean score change in the mildly effective group was similar to cross-sectional differences observed between patients with moderate to moderately severe disease, or between patients with severe to very severe disease. Together, these data suggest mean value differences of approximately 0.20 on the Breathlessness Diary may be small and yet clinically meaningful. Additional analyses should be performed to test this value further and determine whether this difference value can be interpreted as minimally clinically important.

The qualitative interviews indicated patients understood the differences in breathlessness associated with each of the response options, with several subjects describing differences in activity associated with each. One patient, for example, stated, “To have selected a 3 (‘Marked: noticeable when washing or dressing’) instead of 2 (‘Moderate: noticeable during light activity’), I would have had to be purposely concentrating on pursed lip breathing.” Others said, “To select a 3 instead of a 2, I would have needed to rest between carrying trash barrels this morning,” and “To select a 4 (‘Severe: almost constant, present even when resting’) instead of 3, I'd have to be on oxygen.” Treatment-related improvement was also suggested by the following statement: “To select a 2 instead of a 3 I'd be taking less medication.” When asked specifically what change they would consider meaningful, all felt that any improvement in symptoms would be beneficial. All patients also reported that they would be very pleased with a treatment that would allow them to improve 1 point, although some questioned whether any treatment would allow them to progress to “mild” (in several cases this would be a change of 1 point). Clearly, for individual patients, a 1-point score change would be considered dramatic.

Comments concerning deterioration were different from those associated with improvement. All of the 11 patients interviewed indicated they would wait until reaching level 3 (“marked” dyspnea) before changing treatment at home; 4 patients would reach a 4 before changing treatment at home. Nine patients would wait until reaching level 4 (“severe” dyspnea) before notifying a clinician. These qualitative data suggest guidelines for interpreting deterioration in scores may be different from those for interpreting score improvement with treatment, and point out an important area for further research.

Discussion

This paper describes a process of triangulation to derive guidelines for the clinical interpretation of group-level change scores for health indices. A case example has been presented, using a new measure of dyspnea, the Breathlessness Diary, and data from several large clinical trials. Statistical, clinical, and qualitative data were used to show how information can converge on data points indicative of large, moderate, and small group-level mean changes on this instrument. These analyses, for example, indicate a group-level mean improvement greater than 0.50 on the Breathlessness Diary can be interpreted as a large, clinically impressive improvement in dyspnea. Concomitant clinical changes in pulmonary function and health status together with the large percentage improvement and statistical effect size undergird this interpretation. Mean improvements in Breathlessness Diary scores of approximately 0.35 are moderate and clinically significant, based on the accompanying improvements in patient health status together with the large percentage change and statistical effect sizes. These moderate values approximate statistical estimates of the MCID using the ½ SD and SEM approaches and point out an important limitation to using statistical methods in isolation to arrive at a minimal “clinically” important difference. The clinical data not only clearly showed that mean change values of approximately 0.35 are clinically moderate rather than minimal effects, but also suggest that the MCID is below these mean differences.

These triangulation analyses suggested that group-level mean values of approximately 0.20 on the Breathlessness Diary may be a small, but clinically meaningful group change. Although associated mean improvements in overall HRQL were smaller than guidelines for clinical significance with no FEV1 improvement, the percent change and effect sizes were not inconsistent with changes that a clinician and statistician, respectively, might find meaningful. This value was also consistent with between-group differences in dyspnea between patients with moderately severe and severe disease, and between severe and very severe disease using the ATS criteria, although the fact that these are cross-sectional differences rather than change over time should be kept in mind. Whether or not a value of approximately 0.20 is the “minimal” value that can be considered significant requires further research. A logical next step would be to run additional analyses using different categorical cut-points, such as patient perception of efficacy, the convergence of physician and patient perception of efficacy, SGRQ 4- and 8-point changes, and FEV1 changes of 100 mls or more. These results would be compared with results presented above to evaluate consistency. With the additional analyses, the final, proposed guidelines should present values and confidence intervals for interpreting the Breathlessness Diary in terms of large, moderate, and small clinically meaningful changes under clinical trial situations in moderate to severe patients with COPD. This information would be useful for interpreting results using this new measure and help inform various types of decision making.

It is important to note that the purpose of estimating a minimal clinically important difference for any health outcome should never be to arrive at a single threshold or cut-off value for mean differences between treatment groups upon which to base “go–no go” decisions at any level, whether clinical, corporate, or regulatory Citation[[27]]. Decisions should always be based on a careful weighing of all data involved, including the risks of treatment and the number of patients who stand to benefit. In addition, assuming there is one set of guidelines, or worse yet, one threshold, for an outcome that can be applied equally across all sub-populations and in all study circumstances misrepresents the intent of guideline development. The use of ranges rather than set points will serve the purpose of adding clinical meaning to trial data without unnecessarily negating treatment effects because a mean difference did not meet a target threshold.

Conclusion

The derivation of guidelines for the clinical interpretation of group-level change data on health metrics requires both clinically based analyses and statistical methods. Either approach alone is likely to yield an inaccurate representation of the meaning associated with score changes. Triangulation methodology involves the synthesis of clinical, statistical, and qualitative data to arrive at clinically relevant and statistically sound guidelines for interpretation. This paper offers an example of how this methodology can be applied, using the Breathlessness Diary, a new instrument designed to quantify dyspnea in patients with COPD. The discrepancies found between statistical and clinical indicators of magnitude highlight the importance of using multiple methods upon which to develop interpretive guidelines.

Notes

a. American Thoracic Society guidelines based on FEV1, % predicted; circled numbers highlight the difference in Breathlessness Diary score between groups.

b. Circled numbers highlight the difference in Breathlessness Diary score between groups.

REFERENCES

  • Jaeschke R, Singer J, Guyatt G. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials 1989; 10:407–415.
  • Guyatt G, Berman L, Townsend M, Pugsley S, Chambers L. A measure of quality of life for clinical trials in chronic lung disease. Thorax 1987; 42:773–778.
  • Guyatt G, Nogradi S, Halcrow S, Singer J, Sullivan M, Fallen E. Development and testing of a new measure of health status for clinical trials in heart failure. J Gen Intern Med 1989; 4:101–107.
  • Norman G, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lessons of Cronbach. J Clin Epidemiol 1997; 50(8):869–879.
  • Guyatt G, Norman G, Juniper E, Griffith L. A critical look at transition ratings. J Clin Epidemiol 2002; 55:900–908.
  • Norman G, Sloan J, Wyrwich K. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care 2003; 41:582–592.
  • Wyrwich K, Tierney W, Wolinsky F. Further evidence supporting a SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol 1999; 52(9):861–873.
  • Brook R, Chassin M, Fink A, Solomon D, Kosecoff J, Park R. A method for the detailed assessment of the appropriateness of medical technologies. Int J Technol Assess Health Care 1986; 2:53–63.
  • Wyrwich K, Fihn S, Tierney W, Kroenke K, Babu A, Wolinsky F. Clinically important differences in health-related quality of life for patients with chronic obstructive pulmonary disease: an expert panel report. J Gen Intern Med 2003; 18(3):196–202.
  • Wyrwich K, Nelson H, Tierney W, Kroenke K, Babu A, Wolinsky F. Clinically important differences in health-related quality of life for patients with asthma: an expert panel report. J Asthma, Allergy Immunol 2003; 91(2):148–153.
  • Wyrwich K, Spertus J, Kroenke K, Tierney W, Babu A, Wolinsky F. Clinically important differences in health status for patients with heart disease: an expert consensus panel report. Am Heart J 2004; 147(4):615–622.
  • Leidy N, Rennard S, Schmier J, Jones M, Goldman M. The Breathlessness, Cough, and Sputum Scale: the development of empirically based guidelines for interpretation. Chest 2003; 124(6):2182–2191.
  • Leidy N, Schmier J, Jones M, Lloyd J, Rocchiccioli K. Evaluating symptoms in chronic obstructive pulmonary disease: validation of the Breathlessness, Cough and Sputum Scale. Respir Med 2003; 97(suppl A):S59–S70.
  • Celli B, Halpin D, Hepburn R, Byrne N, Keating E T, Goldman M. Symptoms are an important outcome in chronic obstructive pulmonary disease clinical trials: results of a 3-month comparative study using the Breathlessness, Cough and Sputum Scale (BCSS). Respir Med 2003; 97(suppl A):S35–S43.
  • Laursen L C, Lindqvist A, Hepburn T, Lloyd J, Perrett J, Sanders N, Rocchiccioli K. The role of the novel D2/β2-agonist, Viozan (sibenadet HCl), in the treatment of symptoms of chronic obstructive pulmonary disease: results of a large-scale clinical investigation. Respir Med 2003; 97(suppl A):S23–S33.
  • Rennard S. Introduction: the symposium that never occurred: pre-clinical and clinical development of sibenadet. Respir Med 2003; 97(suppl A):S1–S2.
  • American Thoracic Society. Lung function testing: selection of reference values and interpretive strategies. Am Rev Respir Dis 1991; 144:1202–1218.
  • Siafakas N M, Vermeire P, Pride N B, Paoletti P, Gibson J, Howard P, Yernault J C, Decramer M, Higenbottam T, Postma D S, on behalf of the Task Force. Optimal assessment and management of chronic obstructive pulmonary disease (COPD). The European Respiratory Society Task Force. Eur Respir J 1995; 8(8):1398–1420.
  • Jones P W, Quirk F H, Baveystock C M, Littlejohns P. A self-complete measure of health status for chronic airflow limitation. The St. George's Respiratory Questionnaire. Am Rev Respir Dis 1992; 145(6):1321–1327.
  • Jones P W. Interpreting thresholds for a clinically significant change in health status in asthma and COPD. Eur Respir J 2002; 19(3):398–404.
  • Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.
  • Kazis L E, Anderson J J, Meenan R F. Effect sizes for interpreting changes in health status. Med Care 1989; 27(3 suppl):S178–S189.
  • Miller G. The magic number seven plus or minus two: some limits on our capacity for processing information. Psychol Rev 1956; 63:81–97.
  • Wyrwich K, Nienaber N, Tierney W, Wolinsky F. Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care 1999; 37(4):469–478.
  • Wyrwich K, Tierney W, Wolinsky F. Using the standard error of measurement to identify important intra-individual change on the Asthma Quality of Life Questionnaire. Qual Life Res 2002; 11(1):1–7.
  • Cella D, Eton D T, Fairclough D L, Bonomi P, Heyes A E, Silberman C, Wolf M K, Johnson D H. What is clinically meaningful change (CMC) on the Functional Assessment of Cancer Therapy-Lung (FACT-L) questionnaire. An analysis of data from ECOG 5592. J Clin Epidemiol 2002; 55:286–295.
  • Leidy N. Interpreting health-related quality of life outcomes. Appl Clin Trials 2000; 9(9):26.
