Invited Symposium

The Minimally Clinically Important Difference in Generic Utility-Based Measures

, Ph.D.
Pages 91-97 | Published online: 24 Aug 2009

Abstract

Purpose: To evaluate the use of utility-based generic quality of life measures for establishing the minimally clinically important difference (MCID). Background: Utility-based quality of life measures place levels of wellness on a continuum anchored by death (0.00) and optimum function (1.00). Preference measurement studies are used to define the meaning of points along the continuum. Health states that differ by less than 0.03 units cannot be discriminated by panels of judges as different from one another. Thus, 0.03 is a reasonable MCID for these measures. Method: Three published studies of patients with Chronic Obstructive Pulmonary Disease (COPD) reported data on the Quality of Well-being Scale (QWB) before and after pulmonary rehabilitation. One of the studies also randomly assigned patients to lung volume reduction surgery or to maximal medical therapy. These patients were followed for an average of 29 months. Results: All three evaluations of pulmonary rehabilitation showed changes on the QWB in excess of the proposed 0.03 MCID. QWB changes for patients assigned to lung volume reduction surgery were close to the MCID threshold at one year but grew stronger in subsequent years. Using Norman's 0.50 standard deviation method, all three estimates of rehabilitation effectiveness and the outcomes one year following surgery fall below the MCID. Conclusion: Different methods for estimating MCID lead to different conclusions about the meaning of quality of life changes following pulmonary rehabilitation and lung volume reduction surgery. The preference scaling system in generic utility-based quality of life measures provides a metric that is directly interpretable and avoids many of the criticisms of MCID measures. The method is sensitive enough to suggest clinically meaningful benefits of rehabilitation and surgery. Further, quality adjusted life years offer a valuable metric for policy analysis. 
Utility-based measures of health related quality of life should gain greater use in COPD outcomes research.

Introduction

Clinical medicine has access to a remarkable array of measures. Patients with chronic obstructive pulmonary disease (COPD) are likely to be monitored using measures of arterial blood gases, diffusing capacities, and pulmonary function tests. Although these measures are crucial in charting the clinical course of disease, they are often poorly correlated with outcomes of importance to the patient Citation[[1]]. Increasingly common are measures of exercise capacity, endurance, and quality of life. Although these measures are directly relevant to the daily lives of patients, the scores yielded by the measures are often difficult to interpret. Estimating the clinical importance of differences on functional and quality of life measurements has been elusive Citation[[2]].

The minimally clinically important difference (MCID) has emerged as an important guide to the interpretation of clinical measures. MCID is defined as the smallest change or difference in an outcome measure that is perceived as beneficial Citation[[3]]. Scaling MCID has proven to be challenging and confusing Citation[4-6]. Discussions of the methodological challenges of finding MCIDs, however, overlook the potential of utility-based generic measures of health related quality of life (HRQOL). These measures are constructed and scaled using units of patient preference. The interpretation of the measures is straightforward and the application of these approaches avoids many of the methodological complications and controversies.

The purpose of this paper is to propose the use of utility-based measures for outcome studies in COPD and to illustrate why the measures produce outcomes that are easily interpreted in units that are meaningful to patients, providers, and policy makers. We begin by introducing the concept of a quality adjusted life year (QALY). Then, the pitfalls in other approaches to the MCID will be discussed. Several lines of evidence will be used to illustrate the benefits of QALYs as outcome measures for studies in COPD.

The Quality-Adjusted Life Year

Traditional measures of health outcome are very general. They include life expectancy, infant mortality, and disability days. The difficulty with these indicators is that they do not reflect most of the benefits of health care. Life expectancy and infant mortality do have strengths as measures: they allow for comparisons between programs with different specific objectives, and a year of life has clear intuitive meaning. There is little debate about the value of a program that extends life expectancy by 10 years or a program that significantly reduces infant mortality. The difficulty is that neither life expectancy nor infant mortality is sensitive to minor variations in health status. Treatment of most common illnesses may have relatively little effect on life expectancy. Infant mortality, although sensitive to socioeconomic variations, does not register the effect of health services delivered to people who are older than one year.

Survival analysis is an attractive generic measure of health status. Survival analysis gives a unit of credit for each year of survival. Suppose, for example, that a person has a life expectancy of eighty years and dies prematurely at age fifty. In survival analysis, that person is scored as 1.0 for each of the first fifty years and zero for each year thereafter. The problem is that years with disability are scored the same as years in perfect health. For example, a person with severe COPD who is alive is scored exactly the same as someone in perfect health. To address this problem, we have proposed quality adjusted survival analysis. Using this method, we can summarize outcomes in terms of QALYs. In quality adjusted survival analysis, years of wellness are scored on a continuum ranging from 0 for death to 1.0 for optimum function Citation[[7]].

QALYs are measures of life expectancy with adjustments for quality of life Citation[7-9]. QALYs integrate mortality and morbidity to express health status in terms of equivalents of well-years of life. If a woman dies of COPD at age 50 and one would have expected her to live to age 75, the disease was associated with 25 lost life years. If 100 women died at age 50 (each of whom also had a life expectancy of 75 years), 2,500 (100 women × 25 years) life years would be lost.

Death is not the only outcome of concern in COPD. Many adults suffer from the disease, which leaves them somewhat disabled over long periods of time. Although they are still alive, the quality of their lives has diminished. QALYs take into consideration the quality of life consequences of these illnesses. For example, a disease that reduces quality of life by one half will take away 0.5 QALYs over the course of one year. If it affects two people, it will take away the equivalent of 1 year (2 × 0.5) over a one-year period. A pharmaceutical treatment that improves quality of life by 0.2 for each of five individuals will result in the equivalent of one QALY if the benefit is maintained over a one-year period. The basic assumption is that two years scored as 0.5 add up to the equivalent of one year of complete wellness. Similarly, four years scored as 0.25 are equivalent to one completely well year of life. A treatment that boosts a patient's health from 0.5 to 0.75 for one year produces the equivalent of 0.25 QALYs. If applied to four individuals and the duration of the treatment effect is one year, the effect of the treatment would be equivalent to one completely well year of life. This system has the advantage of considering both benefits and side-effects of programs in terms of the common QALY units.
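The arithmetic in the examples above can be reproduced with a minimal sketch. The helper name `qalys` and its parameters are illustrative assumptions, not part of any published QALY software; the sketch assumes a constant quality weight over the period and no discounting.

```python
def qalys(quality_weight: float, years: float, people: int = 1) -> float:
    """Quality adjusted life years under a constant weight and no discounting.

    quality_weight: change (or level) on the 0 (death) to 1.0 (optimum) scale
    years:          duration over which the weight applies
    people:         number of individuals affected
    """
    return quality_weight * years * people


# Life years lost: 100 women dying at 50 with a life expectancy of 75.
life_years_lost = (75 - 50) * 100

# A disease that halves quality of life, affecting two people for one year.
loss_two_people = qalys(0.5, years=1, people=2)

# A drug that improves quality of life by 0.2 for five people for one year.
gain_drug = qalys(0.2, years=1, people=5)

# A treatment that moves four patients from 0.5 to 0.75 for one year.
gain_treatment = qalys(0.75 - 0.5, years=1, people=4)

print(life_years_lost, loss_two_people, gain_drug, gain_treatment)
```

Each of the three QALY examples comes to the equivalent of one completely well year of life, which is the point of expressing benefits and side-effects in a common unit.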

An important aspect of these methodologies is the placement of health states on the continuum between death and wellness. This is illustrated with reference to the Quality of Well-being Scale (QWB-SA 1.04), a self-administered version of a previously validated and widely used measure of preference-based general health status Citation[[10]]. The QWB is a comprehensive measure of health-related quality of life that includes several components. First, it obtains observable levels of functioning from three separate scales: Mobility, Physical Activity, and Social Activity. Second, each patient identifies symptoms or problems that may have affected him or her over the past three days from a list of 58 items. Third, the observed level of function and the subjective symptomatic complaint are weighted by preference, or the utility for the state, on a scale ranging from 0 (for dead) to 1.0 (for optimum function). The weights have been obtained from independent samples of judges who rated the desirability of observable health states. Several studies have shown that the weights do not vary significantly as a function of demographic variables, including race, income, and gender Citation[[11]]. Although the evidence is mixed, most of it indicates that the weights do not vary dramatically as a function of prior experience with the rated health state Citation[[11]]. Using this system, it is possible to place the general health status of any individual on the continuum between death and optimal functioning for any specified time. Thus, the QWB yields a summary score, a single number, which ranges from 0 (death) to 1.0 (optimal health).

Current Knowledge Relevant to COPD

Guyatt Citation[[12]] and Deyo et al. Citation[[13]] argue that measures must be evaluated in terms of their responsiveness. Responsiveness is defined as the ability of a measure to detect small but important clinical changes. The term sensitivity is also used as a synonym for responsiveness. However, we prefer the term responsiveness because its meaning is distinct from other meanings in the epidemiological literature. The MCID, which may be larger than a statistically detectable change between populations, has not been established or even evaluated for most HRQOL measures.

Guyatt et al. Citation[[12]] suggested that the responsiveness of generic measures is related to treatment effect size. Responsiveness can be evaluated using three different methods: effect size (ES), standardized response mean (SRM), and the responsiveness statistic (RS) Citation[[14]]. All three indices characterize the strength of change relative to random error. In each case, the numerator is the mean change. The methods differ in the denominator. The ES method uses the standard deviation at baseline, the SRM method uses the standard deviation of change from baseline, and the RS uses the standard deviation of change only for those people deemed not to show clinical response. The magnitude of change in response to the clinical intervention can be evaluated using guidelines originally proposed by Cohen Citation[[15]]. An effect size of 0.20 is considered a small clinical effect, 0.50 is a medium clinical effect, and 0.80 or larger is considered a large clinical effect. Studies typically show that the effect size is smaller for generic measures than for disease-targeted measures.
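The three indices described above share a numerator and differ only in the denominator, which a short sketch makes concrete. The function names, the `stable_mask` argument, and the patient scores are hypothetical and chosen for illustration only; the sketch assumes sample standard deviations and applies Cohen's cut-offs as given in the text.

```python
from statistics import mean, stdev


def responsiveness_indices(baseline, follow_up, stable_mask):
    """Return ES, SRM and RS for paired scores.

    baseline, follow_up: scores for the same patients before and after
    stable_mask:         True for patients judged NOT to show clinical response
    """
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    mean_change = mean(changes)
    stable_changes = [c for c, stable in zip(changes, stable_mask) if stable]
    return {
        "ES": mean_change / stdev(baseline),         # SD at baseline
        "SRM": mean_change / stdev(changes),         # SD of change
        "RS": mean_change / stdev(stable_changes),   # SD of change in stable patients
    }


def cohen_label(value):
    """Label a magnitude using Cohen's 0.20 / 0.50 / 0.80 guidelines."""
    size = abs(value)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"


# Hypothetical scores on a 0 to 1.0 utility scale (not data from any study).
pre = [0.60, 0.70, 0.50, 0.65, 0.55, 0.62]
post = [0.64, 0.71, 0.56, 0.66, 0.57, 0.66]
stable = [False, True, False, True, True, False]

for name, value in responsiveness_indices(pre, post, stable).items():
    print(name, round(value, 2), cohen_label(value))
```

Because the denominators differ, the same mean change can be labeled differently by the three indices, which is one reason the choice of index matters when comparing generic and disease-targeted measures.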

In contrast to statistical approaches, such as the standard error of measurement or the half standard deviation Citation[[16]], several authors propose that clinical changes be scaled for importance using human judgments. Wells et al. argued that the MCID should be scaled using the clinical judgments of patients Citation[[6]]. Wyrwich used a nine-person expert panel to suggest meaningful differences on the Chronic Respiratory Questionnaire (CRQ) and the Medical Outcomes Study 36-Item Short Form (SF-36) Citation[[17]]. Guyatt et al. made the distinction between population-focused and individual-focused measures Citation[[4]]. Population-focused approaches define the response of an individual in relation to an entire population. For example, a patient with a decline in FEV1.0 from one liter to 70 ml might have a significant probability of QOL loss. Individual-focused approaches require judgments about the smallest changes that patients perceive to be important. These approaches might also consider the proportion of patients who have achieved this MCID.

The reliability of clinical judgments has been challenged in a variety of ways Citation[[18]]. There are obvious biases depending on perspective, there are measurement problems, and the conclusions are often based on small samples of judges. Further, the meanings of changes in a particular disease domain are often difficult to interpret in the context of overall health. Policy makers, for example, struggle with the comparison of an MCID for patients with COPD in relation to an MCID for patients with arthritis, heart failure, or other significant chronic illness.

The following section proposes that the QALY be used to define the MCID. This has several advantages. First, QALYs scale clinical changes on a well-defined continuum ranging from 0 for death to 1.0 for full functioning without symptoms. Second, QALYs are constructed using community-based preference studies. Thus, problems of small samples, biased participants and non-representative judges are avoided. A third advantage of QALYs is that they can be readily used for public policy analysis.

Development of MCID for QWB

The original studies that validated the QWB submitted hundreds of case descriptions to judges in order to find values for health states along the 0.0 to 1.0 continuum Citation[[19]]. These studies demonstrated that cases with values closer together than 0.03 units on the 0.0 to 1.0 continuum could not be reliably rated as “different” by the judges. Thus, we believe a difference must be at least 0.03 units on this scale to be clinically meaningful and that the MCID for the QWB is about 0.03 units Citation[[20]]. The remainder of this paper offers evidence that the 0.03 difference on the QWB is a meaningful MCID for clinical and policy studies. We illustrate the point using data from three published studies of COPD patients.

Method

Measure

The self-administered version of the Quality of Well-Being Scale (QWB-SA) is a comprehensive measure of health-related quality of life that includes five sections: acute symptoms, chronic symptoms, self care, mobility, and social activity Citation[[1]]Citation[[10]]Citation[[19]]Citation[[21]]. The observed level of function and the subjective symptomatic complaints are weighted by preference, or the utility for the state, on a scale ranging from 0 (for dead) to 1.0 (for optimum function). The QWB-SA has been used in a wide variety of clinical and population studies Citation[21-24] to evaluate therapeutic interventions in a range of medical and surgical conditions.

Subjects

We report pre- and post-rehabilitation QWB scores from three published clinical trials. The first trial, reported by Ries and colleagues Citation[[25]], included 119 COPD patients participating in a randomized clinical trial of rehabilitation in COPD. The rehabilitation program consisted of 12 four-hour sessions spread over eight weeks. The program emphasized education, physical and respiratory care instruction, psychosocial support, and supervised exercise.

The second trial, also from the Ries group, evaluated 164 female moderate to severe COPD patients participating in a randomized clinical trial of maintenance of rehabilitation benefits Citation[[26]]. The rehabilitation program was essentially the same as in the first Ries et al. study. However, all patients participated in rehabilitation, and the randomization occurred after pulmonary rehabilitation. We used QWB scores taken before and after the rehabilitation phase.

The third trial was the National Emphysema Treatment Trial (NETT) Citation[[27]]. The NETT trial included 1,218 participants. For the analyses reported here, we deleted three cases with missing QOL data, leaving 1,215 cases for analysis. Participants came from all 17 NETT sites. A more detailed description of the NETT methodology is available. Following screening all participants completed comprehensive pulmonary rehabilitation. A second assessment was completed no more than 21 days prior to randomization. The NETT rehabilitation program was similar to that described by Ries and colleagues Citation[[26]]. In this paper we focus on previously reported pre-post rehabilitation change.

Surgery Effect (NETT)

The major intervention evaluated in the NETT was Lung Volume Reduction Surgery (LVRS). Patients were randomly assigned to either maximal medical therapy or to LVRS. Those assigned to LVRS received the procedure using one of two approaches: video assisted thoracoscopy (VATS) and median sternotomy. Details of the study and its outcomes have been published Citation[[28]].

Results

Table 1 summarizes the changes on the QOL measures following the rehabilitation phase for the three studies. The table gives the pre-rehabilitation score, the post-rehabilitation score, change scores (pre minus post), standard deviation for change, and one-half standard deviation values. The NETT study used the QWB-SA while the two Ries studies used the original QWB. QWB and QWB-SA are highly correlated but mean QWB-SA scores are typically 0.10 lower than QWB scores Citation[[29]]. The change in QWB across the three studies was remarkably similar (range −0.031 to −0.034). This suggests that, on average, participants improved about 0.03 units on the 0 to 1.0 scales. The final column in Table 1 gives one-half standard deviation change, which is the value for the MCID recommended by Norman Citation[[16]]. For the QWB, Norman's proposed MCID ranged from 0.058 to 0.079. Using Norman's criterion, the effects of rehabilitation, although statistically significant, fall below the MCID for each of the three studies.

Table 1.  QWB Before and After Rehabilitation in Three Studies.

The QWB methodology uses ratings of health states by independent judges to estimate levels of wellness or severity of disease impact. In a series of studies we have observed the median standard deviation for ratings for case descriptions to be 0.201 Citation[[11]]Citation[[30]]. Assuming this constant standard deviation, we estimated the MCID using three methods: 0.5 standard deviation for change, standard error of the mean, and 95% confidence interval. For the 0.5 standard deviation method, we used the standard deviation of the pre-post rehabilitation change in the NETT trial, and we did not vary the estimate by sample size. We use the NETT in this example because it is the largest and best controlled among the three studies. As a result, we used a constant estimate of the standard deviation of 0.117, and the MCID of 0.058 repeats across sample sizes. The MCIDs estimated by the standard error of the mean and confidence interval methods are also shown in Table 2. Using the confidence interval method, the MCID for the utility-based QWB would be estimated to be less than 0.01 if the sample size were greater than 2,000. Although studies vary in sample size, we picked a base case with a sample size of 200. This yields a confidence interval based MCID of 0.0278. For these discussions, we rounded that number up to 0.03.

Table 2.  MCID for Preferences By Method for Various Sample Sizes.
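The three estimates discussed above can be sketched as follows. This is an illustrative reading rather than the author's computation: it assumes the 0.5 standard deviation method uses the NETT change standard deviation (0.117), that the standard error and confidence interval methods use the median rating standard deviation (0.201), and that the confidence interval MCID is the half-width 1.96 × SE under a normal approximation. The function names and the sample sizes in the loop are hypothetical and are not taken from Table 2.

```python
from math import sqrt

RATING_SD = 0.201   # median SD of judges' ratings, as reported in the text
CHANGE_SD = 0.117   # SD of pre-post change in NETT, as reported in the text
Z_95 = 1.96         # assumption: normal approximation for a 95% interval


def half_sd_mcid(change_sd=CHANGE_SD):
    """One-half standard deviation of change; does not depend on sample size."""
    return 0.5 * change_sd


def sem_mcid(n, sd=RATING_SD):
    """Standard error of the mean for a sample of size n."""
    return sd / sqrt(n)


def ci95_mcid(n, sd=RATING_SD):
    """Half-width of the 95% confidence interval for a sample of size n."""
    return Z_95 * sem_mcid(n, sd)


# Illustrative sample sizes only.
for n in (50, 100, 200, 500, 2000):
    print(n, round(half_sd_mcid(), 4), round(sem_mcid(n), 4), round(ci95_mcid(n), 4))
```

Under these assumptions the confidence interval value at a sample size of 200 comes out close to the 0.0278 reported in the text, and it falls below 0.01 once the sample size exceeds 2,000, while the half standard deviation estimate stays fixed near 0.058.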

The NETT trial also combined morbidity and mortality outcomes using the QWB measure. In May of 2001, enrollment in the NETT was halted for a subgroup that the Data Safety and Monitoring Board (DSMB) determined to be at high risk Citation[[31]]. Patients with FEV1.0 levels equal to or less than 20 percent of predicted and either a homogeneous emphysema distribution or less than or equal to 20 percent of the predicted carbon monoxide diffusing capacity were excluded from enrollment. Table 3 summarizes characteristics of the remaining 1,078 lung reduction surgery patients at baseline. Of particular importance is that the QWB scores were almost identical. Table 4 summarizes the cumulative mean QALYs per person in the surgery and the medical arms of the trial at one, two, and three years of follow-up. At year one, the mean difference is right on the borderline of the 0.03 criterion for an MCID with the QWB. It is noteworthy that the benefit of surgery is less than the observed benefit of rehabilitation. However, rehabilitation was not evaluated in the RCT component of the study. As time from randomization accumulates, differences between the LVRS and medical arms grow larger. Some of this gain is due to differences in mortality while the other component is associated with improvements in quality of life.

Table 3.  Baseline Characteristics of Non-High-Risk Patients After Rehabilitation and Prior to Randomization [Data from Supplementary Appendix 3 Ref. Citation[[28]]].

Table 4.  Cumulative Mean QALYs Per Person (Based on Ramsey et al., 2003).

Discussion

Interpretation and Recommendations

Defining a meaningful clinical difference for patients with COPD has been difficult and elusive. A variety of authors have proposed statistical approaches, such as the standard error of measurement or one-half standard deviation of a change score Citation[[16]]. Others have argued that judgment about the meaning of clinical differences must be assessed from patients or from clinicians Citation[[4]]Citation[[6]]. All of these approaches have encountered significant methodological challenges.

One approach that has not been widely advocated requires the use of utility-based measures of health-related quality of life. These measures are widely advocated for policy analysis and are in common use in clinical research settings Citation[[8]]. An important feature of these methods is that they scale health outcomes on a continuum ranging from death (0.0) through full functioning without symptoms (1.0). Further, the preferences are typically obtained from representative samples of the general population. A component of scale construction considers the smallest difference in health states that can be judged as distinct by human observers. For the Quality of Well-being Scale, this difference is about 0.03 units. Thus, 0.03 units along the continuum between death and optimum function might be considered the MCID. There have been only a few other attempts to estimate MCID for utility-based measures. Walters and Brazier Citation[[32]] estimated MCID anchored by SF-36 global rating scores for a variety of medical conditions. For COPD patients, they estimated that the MCID for the EQ-5D was 0.011 while the MCID for the SF-6D was 0.37. Although many different conditions were considered, the MCID for the SF-6D ranged from 0.011 to 0.097, with a mean of 0.043. Thus, the median MCID is similar to our estimated MCID of 0.03 for the QWB. The analyses summarized in this paper suggest that changes following pulmonary rehabilitation exceed the MCID in three separate evaluations when using the proposed threshold. However, using other MCID methods, each study would fail to meet the MCID criterion. We believe the preference-based MCID is both clinically justifiable and sensitive.

The generic nature of the QWB and related scales offers many advantages for clinical research. First, these measures offer a meaningful metric that is anchored at one end by death and at the other end by wellness. The difference of 0.03 units is a small but significant difference along this continuum. A second advantage is that the measures scale outcomes in relation to death. In fact, mortality is combined with morbidity in the calculation of the index. This offers a distinct advantage in clinical trials that have mixtures of mortality and non-mortality outcomes. The third advantage is that utility-based outcome measures can be used to estimate the benefits of interventions in terms of equivalents of life years. A patient would need to achieve an MCID of 0.03 units for about 33 years for the treatment to produce a benefit equal to a full year of healthy life. Thirty-three patients who achieved the MCID benefit over the course of one year would collectively experience the benefit of one full year of life.

In addition to these features, QALYs offer significant advantages for public policy analysis. Since outcomes are expressed in generic meaningful units, comparisons can be made between very different investments in health care. Typically, these analyses consider costs in addition to QALY benefits. Table 5 summarizes the cost per QALY for a variety of interventions. Coronary artery bypass graft surgery in comparison to medication, for example, produces a QALY at a relatively low cost. LVRS, on the other hand, produces QALYs but does so at a cost higher than other interventions.

Table 5.  Comparison of Cost/QALY for Different Programs in COPD (2002 Dollars).

In criticizing disease-specific approaches, many investigators believe that all diseases and disabilities affect overall quality of life by affecting functioning and self-perceptions of health status. In fact, the purpose of quality of life measurement is not to identify clinical information limited to the disease. Instead, it seeks to determine the impact of the disease on general function. For example, a low FEV1.0 may be associated with shortness of breath, weakness, and increased risk of mortality. Medications used to control COPD might cause headaches, irritability, and general confusion Citation[[33]]. By focusing too specifically on clinical correlates of disease, it is argued, the general impact is overlooked. Even though some disease-targeted measures cover a fairly wide range of domains and include items that are important to patients, the measures might still fail to capture unanticipated effects of the disease or its treatment. Conversely, general quality of life measures adequately capture a wide variety of dysfunctions associated with pulmonary diseases. This dysfunction might be in many different systems and be reflected in symptoms such as confusion, tiredness, sexual impotence, and depression, or others. These outcomes may not be specific to the disease or condition. Thus, the question of whether disease specific questionnaires are required for each condition remains open Citation[34&35].

The method for estimating the MCID proposed here is fundamentally different from methods proposed by other authors. As Norman (post conference correspondence) argues, placing two health states side by side and asking judges to estimate whether they are different from one another is a fundamentally different task than asking patients or clinicians how much clinical change has occurred. Indeed, it is likely that the side-by-side comparisons will yield smaller MCIDs. That was the case with the data we presented here. Since there is no standard methodology for estimating MCID, the future standard must emerge from continuing research and debate.

In summary, QALYs may offer a neglected opportunity to estimate the clinical value of interventions. The MCID for utility-based measures can be derived directly from the methodology used to construct them. We encourage greater use of these methods for outcomes research involving COPD patients.

REFERENCES
