Original Research

Estimation of utility values from visual analog scale measures of health in patients undergoing cardiac surgery

Pages 21-27 | Published online: 10 Jan 2014

Abstract

Introduction

In health economic evaluations, mapping can be used to estimate utility values from other health outcomes in order to calculate quality adjusted life-years. Currently, no methods exist to map visual analog scale (VAS) scores to utility values. This study aimed to develop and propose a statistical algorithm for mapping five dimensions of health, measured on VASs, to utility scores in patients suffering from cardiovascular disease.

Methods

Patients undergoing coronary artery bypass grafting at Aalborg University Hospital in Denmark were asked to score their health using the five VAS items (mobility, self-care, ability to perform usual activities, pain, and presence of anxiety or depression) and the EuroQol 5 Dimensions questionnaire. Regression analysis was used to estimate four mapping models from patients’ age, sex, and the self-reported VAS scores. Prediction errors were compared between mapping models and on subsets of the observed utility scores. Agreement between predicted and observed values was assessed using Bland–Altman plots.

Results

Random effects generalized least squares (GLS) regression yielded the best results when quadratic terms of VAS scores were included. Mapping models fitted using the Tobit model and censored least absolute deviation regression did not appear superior to GLS regression. The mapping models were able to explain approximately 63%–65% of the variation in the observed utility scores. The mean absolute error of predictions increased as the observed utility values decreased.

Conclusion

We concluded that it was possible to predict utility scores from VAS scores of the five dimensions of health used in the EuroQol questionnaires. However, the use of the mapping model may be inappropriate in more severe conditions.

Introduction

In health economic evaluations, the recommended measure of health effects is quality adjusted life-years, which enables the comparison of interventions across disease areas [1,2]. However, clinical trials are frequently initiated without including questionnaires measuring preference-based health-related quality of life. Instead, nonpreference-based measures of health are often utilized, which makes it difficult to estimate health state utility values. One solution that is gaining popularity is to predict utility values from the nonpreference-based measures of health. This is frequently called mapping, and the technique requires an algorithm based on the statistical association between the tools [3]. Mapping techniques have been applied in more than a quarter of technology appraisals submitted to the National Institute for Health and Clinical Excellence (NICE) [4]. Most mapping techniques predict the utility values from disease-specific questionnaires [3]. However, sometimes health effects are merely measured using a visual analog scale (VAS) instead of disease-specific questionnaires. The VAS is one of the most frequently used methods for assessing pain intensity [5,6] and has also been applied in the assessment of depression, anxiety, and mobility [7,8]. Currently, no method has been developed for predicting utility scores from such VAS scores.

This study aimed to develop a mapping model to predict a single utility score from five VAS scores rating patients’ mobility, self-care, ability to perform usual activities, pain, and anxiety and depression. Such a model may be applied to map the partial effect of reducing patients’ pain measured on a VAS to utility scores, under the ceteris paribus assumption, ie, holding all other factors fixed.

We chose to estimate our mapping model by administering our questionnaire to patients undergoing coronary artery bypass grafting (CABG), for two reasons: 1) clinical trials investigating surgical interventions frequently use the VAS when assessing outcomes; and 2) patients undergoing CABG vary widely in the severity of their health conditions pre- and postoperatively. This heterogeneity in their responses makes them ideal respondents.

Methods

Patients

Data were prospectively collected between August 25, 2011 and May 25, 2013, from patients recruited from the cardiothoracic surgical ward at Aalborg University Hospital, Denmark. Eligible patients were more than 18 years of age, were able to read Danish, and had coronary artery disease requiring elective CABG. Approximately 250 CABG procedures are performed at Aalborg University Hospital every year. We divided the dataset in two, by random sampling, such that 60% of the patients were included in the estimation sample and the remaining 40% of the patients were included in the validation sample.
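The random division into estimation and validation samples can be sketched as follows. This is only an illustration of a patient-level 60/40 split, not the procedure actually run in Stata; the function name `split_patients`, the seed, and the patient identifiers are hypothetical.

```python
import random


def split_patients(patient_ids, estimation_share=0.6, seed=2014):
    """Randomly assign whole patients to an estimation or a validation sample.

    The seed and the identifiers are illustrative assumptions; the paper only
    states that 60% of patients went to the estimation sample.
    """
    ids = sorted(set(patient_ids))   # one entry per patient, in a stable order
    rng = random.Random(seed)        # seeded so the split is reproducible
    rng.shuffle(ids)
    n_estimation = round(estimation_share * len(ids))
    return set(ids[:n_estimation]), set(ids[n_estimation:])


# Hypothetical identifiers standing in for the 190 analyzed patients.
estimation, validation = split_patients(range(1, 191))
print(len(estimation), len(validation))  # 114 76
```

Splitting on patients rather than on questionnaires keeps all measurements from one patient in the same sample.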

Questionnaires

For the purpose of developing the mapping model, all patients were asked to complete the three-level version of the EuroQol 5 Dimensions questionnaire (EQ-5D) and to rate their perceived health today on five VAS items (mobility, self-care, ability to perform usual activities, pain, and presence of anxiety or depression). Each VAS item was given an introductory title, a short statement representing “no problems” at 0 mm, and a short statement representing “worst imaginable problems” at 100 mm.

Patients were asked to fill out the EQ-5D as many as three times. The first time was prior to their admission, the second time was 5 days postsurgery, and the third time was upon the arrival for their follow-up visit at the outpatient clinic. Only patients seen at the outpatient clinic at Aalborg University Hospital were asked to fill out the third questionnaire. In the analyses, each questionnaire was treated as an independent measurement in order to obtain diversity in the severity of health states.

Analyses

For all mapping models, the dependent variable was the utility score calculated using the Danish time trade-off values [9]. These values for the EQ-5D range from 1 to −0.624, where 1 indicates perfect health, 0 indicates death, and a value below zero indicates a health state perceived to be worse than death. We fitted two mapping models using the 60% estimation sample. The first mapping model was fitted using age, sex, and the five VAS scores as explanatory variables. In the second mapping model, the squared terms of the five VAS scores were also included, because the relationship between explanatory dimensions and utility scores may not be linear [10,11]. Although dimensions may not be additive [11,12], interaction terms were not considered, as they may restrict the use of the mapping models to situations where all five VAS scores are measured. Excluding the interaction terms allows the mapping models to be used in situations where only one or two of the VAS scores are measured. Both mapping models were initially fitted using random intercepts generalized least squares (GLS) models. Least squares estimation was chosen because of its straightforward interpretation and frequent use in mapping models [3]. The random effects part was chosen to handle the fact that some patients had multiple observations.
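One way to write the second model is shown below. The notation is an assumption introduced here for illustration and does not appear in the original analysis: $U_{it}$ is the utility score of patient $i$ at questionnaire $t$, $\text{VAS}_{k,it}$ is the score on the $k$-th VAS item, $u_i$ is the patient-level random intercept, and $\varepsilon_{it}$ is the residual.

$$
U_{it} = \beta_0 + \beta_{\text{age}}\,\text{age}_i + \beta_{\text{sex}}\,\text{sex}_i + \sum_{k=1}^{5}\left(\beta_k\,\text{VAS}_{k,it} + \gamma_k\,\text{VAS}_{k,it}^{2}\right) + u_i + \varepsilon_{it}
$$

Under this notation, the first model is the special case $\gamma_k = 0$ for all five items. Because no interaction terms appear, the contribution of each VAS item can be evaluated separately from the others.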
However, if least squares estimation is used in the presence of large proportions of subjects scoring utility values of 1, the bounded nature of the utility value may result in implausible predictions outside the existing range of the scale [3,13]. In such situations, researchers have proposed using the Tobit model or censored least absolute deviation (CLAD) regression methods [3,14]. If the proportion of patients at the ceiling is small, the marginal coefficients from the random effects GLS should suffice [15]. Nevertheless, to accommodate the possibility that the observed ceiling effect might reduce the performance of our random effects GLS models, random effects Tobit regression and CLAD regressions were also fitted. The random effects Tobit regression handles the bounded nature of the utility scores and may therefore be an appropriate alternative to least squares estimation [11,14,16]. However, a random effects Tobit model will produce biased results when faced with heteroscedasticity or nonnormality [11,16]. CLAD regression will produce consistent results even if faced with heteroscedasticity or nonnormality [17]. As such, the CLAD regression may seem the optimal choice. However, the downside to a CLAD mapping model is that it is a median model [13]. Typically, economic evaluations use health valuations based on mean models.

To assess the performance of the mapping models, we calculated the root mean squared error (RMSE) and mean absolute error (MAE) [3,13]. Errors were subsequently reported for the following subsets of observed utility values: utility <0; 0≤ utility <0.25; 0.25≤ utility <0.5; 0.5≤ utility <0.75; 0.75≤ utility ≤1.
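The error measures can be sketched as below. This is a minimal illustration rather than the Stata code used in the study; the function names are hypothetical and the observed and predicted values are made-up numbers, not study data.

```python
import math

# Subsets of observed utility used for reporting errors. The last band has an
# open upper bound so that an observed utility of exactly 1 is included.
BANDS = [
    ("utility < 0", -math.inf, 0.0),
    ("0 <= utility < 0.25", 0.0, 0.25),
    ("0.25 <= utility < 0.5", 0.25, 0.5),
    ("0.5 <= utility < 0.75", 0.5, 0.75),
    ("0.75 <= utility <= 1", 0.75, math.inf),
]


def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error of the predictions."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def mae_by_band(observed, predicted):
    """MAE within each band of observed utility; empty bands are omitted."""
    result = {}
    for label, low, high in BANDS:
        pairs = [(o, p) for o, p in zip(observed, predicted) if low <= o < high]
        if pairs:
            obs, pred = zip(*pairs)
            result[label] = mae(obs, pred)
    return result


# Made-up values for illustration only.
observed = [-0.1, 0.2, 0.4, 0.7, 1.0]
predicted = [0.1, 0.3, 0.5, 0.7, 0.9]
print(rmse(observed, predicted), mae(observed, predicted))
print(mae_by_band(observed, predicted))
```

RMSE penalizes large errors more heavily than MAE, so a model can rank better on one measure and worse on the other.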

We predicted the utility values in the validation sample using the second (full) random effects GLS mapping model to assess whether the estimates were reliable. For both the estimation sample and the validation sample, Bland–Altman plots were used to assess agreement between observed and predicted values. All statistical analyses were performed in Stata version 12.1 (StataCorp LP, College Station, TX, USA). The CLAD regression was performed using user-written programs for Stata [18]. The questionnaires were entered once in EpiData version 3.1 (freeware product by EpiData Association, Odense, Denmark).

Results

A total of 238 patients were invited to enter the study. Of these, 24 patients declined the invitation, 16 did not meet the inclusion criteria, and an additional 8 were excluded due to nonresponse (Figure 1). Therefore, 382 questionnaires from 190 patients were analyzed in the study. Because only patients seen at the outpatient clinic at Aalborg University Hospital filled out the third questionnaire, only 55 patients answered all three questionnaires. The baseline and intraoperative characteristics of the patients are shown in Table 1. No differences in patient characteristics were observed between the estimation dataset and validation dataset.

Figure 1 Flowchart describing the inclusion of patients.

Abbreviations: n, number; CABG, coronary artery bypass grafting; EQ-5D, EuroQol 5 Dimensions questionnaire; VAS, visual analog scale.

Table 1 Patients’ baseline and intraoperative characteristics

Figure 2 shows the five VAS items and the phrases used at both ends of the scale. The mean utility and VAS scores are reported in Table 2. The number of patients rating themselves in full health was somewhat limited, with observed utility scores ranging from −0.495 to 1 (Figure 3).

Figure 2 Questionnaire used to assess health on visual analog scales.

Figure 3 Observed EQ-5D utility scores in the complete dataset.

Abbreviation: EQ-5D, EuroQol 5 Dimensions questionnaire.

Table 2 Mean utility and VAS scores for all observations

The first mapping model included age, sex, and the five VAS scores measuring health (GLS 1 in Table 3). Although the functional form of the explanatory variables was restricted to the additive level, the model explained approximately 63% of the variation in the EQ-5D utility score. In the second random effects GLS mapping model (GLS 2), the quadratic terms of the five VAS scores measuring health were added. Comparing the MAE and the RMSE between the two random effects GLS models, we found that the GLS 2 mapping model yielded the best fit to the estimation sample. Approximately 65% of the variation in EQ-5D utility scores was explained by the GLS 2 mapping model. The variables used in the GLS 2 model were also used to fit mapping models using random effects Tobit and CLAD regressions. The random effects Tobit mapping model was inferior to the GLS 2 mapping model, as it yielded higher MAE and RMSE values. The CLAD mapping model had a slightly lower MAE and a higher RMSE than the GLS 2 mapping model. In general, the mapping models performed better at higher observed EQ-5D utility scores (Table 4). In Figure 4, it can be seen that the reduction in performance in more severe health states is due to overprediction of the EQ-5D utility score.

Figure 4 Mean observed and predicted EQ-5D utility scores in the validation sample.

Notes: The graph shows the agreement between the observed EQ-5D utility score and the mean of the predicted score. The observed health states are ordered on the x-axis according to their severity as valued by the Danish time trade-off tool.
Abbreviation: EQ-5D, EuroQol 5 Dimensions questionnaire.

Table 3 Results of the estimated mapping-models

Table 4 Mean absolute error of mapping-models at subsets of observed EQ-5D utility scores

Bland–Altman plots of agreement between observed and predicted values of EQ-5D utility scores for the GLS 2 model are shown in Figure 5. Figure 5A shows agreement in the estimation sample and Figure 5B shows agreement in the validation sample. In both samples, the mapping model overestimated the EQ-5D utility scores for patients with low observed EQ-5D utility scores. The Bland–Altman 95% limits of agreement from the validation sample (−0.212, 0.240) were similar to those from the estimation sample (−0.233, 0.232). A slight bias of 0.014 was observed in the validation sample.

Figure 5 Bland–Altman plots of agreement between observed and predicted EQ-5D utility scores.

Notes: (A) Agreement in the estimation sample. (B) Agreement in the validation sample. The x-axis depicts the mean of the observed value and the predicted value, and the y-axis shows the difference (observed minus predicted). The lines show the mean difference, ie, the estimated bias, and the 95% limits of agreement (mean difference ± 1.96 SD of the differences).
Abbreviations: EQ-5D, EuroQol 5 Dimensions questionnaire; SD, standard deviation.
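The quantities drawn in a Bland–Altman plot can be computed as in the sketch below. It assumes the usual definition (bias as the mean of observed minus predicted, limits as bias ± 1.96 times the sample SD of those differences); the function names and the example values are hypothetical and are not study data.

```python
import statistics


def bland_altman(observed, predicted):
    """Return (bias, lower limit, upper limit) of agreement.

    Differences are observed minus predicted, matching the y-axis of the plot.
    """
    differences = [o - p for o, p in zip(observed, predicted)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def plot_points(observed, predicted):
    """(x, y) pairs: mean of the two values, and observed minus predicted."""
    return [((o + p) / 2, o - p) for o, p in zip(observed, predicted)]


# Made-up values for illustration only.
obs = [0.8, 0.6, 0.9, 0.3]
pred = [0.7, 0.7, 0.8, 0.5]
bias, lower, upper = bland_altman(obs, pred)
print(bias, lower, upper)
```

A negative difference at low mean values corresponds to the overprediction described for patients with low observed utility.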

Discussion

The GLS 2 mapping model showed promising ability to predict mean utility scores. Our findings indicate that VAS scores for mobility, self-care, usual activity, pain, and mood could be used for obtaining estimates of utility among patients with cardiovascular disease. As much as 65% of the variability in utility scores could be explained, which is quite high compared with existing mapping models [3]. Goldsmith et al predicted EQ-5D utility scores in a group of patients with coronary artery disease using demographic and clinical outcome variables; they explained 48%–49% of the variability in utility scores and found RMSE and MAE to be 0.170 and 0.122, respectively [19]. Longworth et al developed a model to map EQ-5D utility scores from clinical indicators for patients with stable angina; they were able to explain 37% of the variation and showed an RMSE of 0.4764 (RMSE = √mean squared error = √0.227) [20]. Our mapping model may therefore be viewed as a reasonable method for mapping VAS scores to EQ-5D utility scores.

To illustrate the use of the GLS 2 mapping model, imagine a comparison of a new, less-invasive surgical method with the conventional open surgical method in a health economic evaluation. The new surgical method reduces pain measured on a VAS by 10 mm, from 40 mm to 30 mm. For simplicity, assume that all other health outcomes are unchanged. The GLS 2 mapping model is then applied to map the partial effect of reducing patients’ pain, under the ceteris paribus assumption, ie, holding all other factors fixed:

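$$
\begin{aligned}
\text{Utility gain} &= \left(\beta_{\text{pain}} \cdot \text{VAS}_{\text{new}} + \beta_{\text{pain squared}} \cdot \text{VAS}_{\text{new}}^{2}\right) - \left(\beta_{\text{pain}} \cdot \text{VAS}_{\text{conventional}} + \beta_{\text{pain squared}} \cdot \text{VAS}_{\text{conventional}}^{2}\right) \\
&= \left(-0.00120 \cdot 30 - 0.0000069 \cdot 30^{2}\right) - \left(-0.00120 \cdot 40 - 0.0000069 \cdot 40^{2}\right) \\
&= (-0.0422) - (-0.0590) = 0.0168
\end{aligned}
$$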

In this example, it was estimated that a 10 mm reduction in pain measured on a VAS, from an average of 40 mm to an average of 30 mm, would increase patients’ utility by 0.0168. Because the GLS 2 mapping model contains squared terms, the utility gain from a 10 mm reduction on a VAS of pain depends on the severity of the pain, ie, the utility gain from a reduction in pain from 40 mm to 30 mm is not the same as from 60 mm to 50 mm. If the GLS 1 mapping model had been used, a 10 mm reduction on a VAS of pain would yield the same utility gain regardless of the initial level of pain. The uncertainty in the estimated utility gain could be modeled using the standard errors of the coefficients from the mapping model in a probabilistic sensitivity analysis.
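The calculation can be reproduced with a few lines of code. The sketch below takes the two pain coefficients from the worked example, with negative signs chosen so that the intermediate values and the 0.0168 gain are reproduced; the function names are hypothetical.

```python
# Pain coefficients from the worked example (GLS 2); the negative signs are
# the ones that reproduce the reported gain of 0.0168.
BETA_PAIN = -0.00120
BETA_PAIN_SQUARED = -0.0000069


def pain_contribution(vas_mm):
    """Contribution of a pain VAS score (in mm) to the predicted utility."""
    return BETA_PAIN * vas_mm + BETA_PAIN_SQUARED * vas_mm ** 2


def utility_gain(vas_new_mm, vas_conventional_mm):
    """Partial effect of moving pain from the conventional to the new level,
    holding all other factors fixed."""
    return pain_contribution(vas_new_mm) - pain_contribution(vas_conventional_mm)


print(round(utility_gain(30, 40), 4))  # 0.0168
print(round(utility_gain(50, 60), 4))  # 0.0196
```

The second call shows the effect of the squared term: the same 10 mm reduction yields a larger gain when the starting pain level is higher.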

Strengths and limitations

Because we treated all measurements as independent observations, the observed utility scores covered a wide range, especially because patients temporarily felt worse 5 days after CABG. Naturally, this approach increases the sample size somewhat artificially, and the standard errors presented in Table 3 should therefore only be used to estimate approximate confidence intervals. However, by treating each measurement as independent, we ensured that the mapping model is valid for all stages of the illness. The wide range of VAS scores and utility scores also enabled us to assess the GLS 2 mapping model’s predictive ability at subsets of observed utility scores. This analysis showed that the reliability of our predictions declined as observed values decreased. This is a frequent limitation of mapping models [3,11,19]; however, it implies that the mapping models presented in this study may have a limited ability to predict utility scores for more severe conditions. Users of the mapping model should therefore be cautious when applying it in populations with large numbers of patients in poorer health. In such situations, alternative methods should be considered. The poor performance of the mapping model in patients with a more severe health condition is likely caused by a combination of the following: 1) the low number of patients with a low EQ-5D utility score in our dataset; 2) our exclusion of interaction terms and other covariates to ensure the mapping model could be used if researchers only had one of the five VAS measures; or 3) a larger intersubject variation in the use of the VASs for more severe conditions.
The latter is supported by the fact that severe pain measured on an 11-point numeric rating scale could be from seven and upwards for some patients, while severe pain measured on a 100 mm VAS could be from 35 mm and upwards for others [6]. In another study, the overprediction of EQ-5D utility scores was shown to be worsened by the N3 term, which is added if severe problems were reported in at least one dimension [11]. However, the Danish time trade-off values for EQ-5D do not contain such a jump, and therefore the N3 term cannot be contributing to the overprediction among the severely ill in our analyses [9].

The mapping models presented in this study were fitted using patients undergoing cardiac surgery. Cost-effectiveness analyses within this disease area have previously mapped health outcomes measured on VASs to utility scores [21,22]. Such attempts have also been made in other disease areas [23]. Therefore, future work might include validating the mapping model for different patient groups and assessing its performance in an independent sample.

Conclusion

We conclude that it is possible to predict utility scores from VAS scores of the five dimensions of health used in the EuroQol questionnaires. However, the predictive power decreased as observed utility scores declined. The use of the mapping model may therefore be inappropriate in more severe conditions.

Author contributions

All authors made substantial contributions to data generation and analysis, drafting or critical revision of the manuscript, and approval for the final version to be published.

Acknowledgments

The authors wish to thank Janni Stougaard Larsen, Hanne Lise Jespersen, Hanne Hoejland, Mette Stenstrop, and Birgitte Annette Wested for their assistance with the data collection.

Disclosure

The authors report no conflicts of interest in this work.

References

  1. National Institute for Health and Care Excellence (NICE). Guide to the Methods of Technology Appraisal 2013. London, UK: NICE; 2013. Available from: http://www.nice.org.uk/media/D45/1E/GuideToMethodsTechnologyAppraisal2013.pdf. Accessed October 12, 2013.
  2. Drummond MF, Sculpher MJ, Torrance GW, O’Brien BJ, Stoddart GL. Methods for the Economic Evaluation of Health Care Programmes. 3rd ed. New York, NY: Oxford University Press; 2005.
  3. Brazier JE, Yang Y, Tsuchiya A, Rowen DL. A review of studies mapping (or cross walking) non-preference based measures of health to generic preference-based measures. Eur J Health Econ. 2010;11(2):215–225. PMID: 19585162.
  4. Tosh JC, Longworth LJ, George E. Utility values in National Institute for Health and Clinical Excellence (NICE) Technology Appraisals. Value Health. 2011;14(1):102–109. PMID: 21211492.
  5. Hjermstad MJ, Fayers PM, Haugen DF; European Palliative Care Research Collaborative (EPCRC). Studies comparing Numerical Rating Scales, Verbal Rating Scales, and Visual Analogue Scales for assessment of pain intensity in adults: a systematic literature review. J Pain Symptom Manage. 2011;41(6):1073–1093. PMID: 21621130.
  6. Williamson A, Hoggart B. Pain: a review of three commonly used pain rating scales. J Clin Nurs. 2005;14(7):798–804. PMID: 16000093.
  7. McDowell I. Measuring Health: A Guide to Rating Scales and Questionnaires. 3rd ed. New York, NY: Oxford University Press; 2006.
  8. Kiaii B, Moon BC, Massel D. A prospective randomized trial of endoscopic versus conventional harvesting of the saphenous vein in coronary artery bypass surgery. J Thorac Cardiovasc Surg. 2002;123(2):204–212. PMID: 11828277.
  9. Wittrup-Jensen KU, Lauridsen J, Gudex C, Pedersen KM. Generation of a Danish TTO value set for EQ-5D health states. Scand J Public Health. 2009;37(5):459–466. PMID: 19411320.
  10. Brazier J, Usherwood T, Harper R, Thomas K. Deriving a preference-based single index from the UK SF-36 Health Survey. J Clin Epidemiol. 1998;51(11):1115–1128. PMID: 9817129.
  11. Rowen D, Brazier J, Roberts J. Mapping SF-36 onto the EQ-5D index: how reliable is the relationship? Health Qual Life Outcomes. 2009;7:27. PMID: 19335878.
  12. Feeny D, Furlong W, Torrance GW. Multiattribute and single-attribute utility functions for the health utilities index mark 3 system. Med Care. 2002;40(2):113–128. PMID: 11802084.
  13. Longworth L, Rowen D. Mapping to obtain EQ-5D utility values for use in NICE health technology assessments. Value Health. 2013;16(1):202–210. PMID: 23337232.
  14. Sullivan PW. Are utilities bounded at 1.0? Implications for statistical analysis and scale development. Med Decis Making. 2011;31(6):787–789. PMID: 22067428.
  15. Pullenayegum EM, Tarride JE, Xie F, O’Reilly D. Calculating utility decrements associated with an adverse event: marginal Tobit and CLAD coefficients should be used with caution. Med Decis Making. 2011;31(6):790–799. PMID: 22067429.
  16. Sullivan PW, Ghushchyan V. Mapping the EQ-5D index from the SF-12: US general population preferences in a nationally representative sample. Med Decis Making. 2006;26(4):401–409. PMID: 16855128.
  17. Powell JL. Least absolute deviations estimation for the censored regression model. J Econom. 1984;25(3):303–325.
  18. Jolliffe D, Krushelnytskyy B, Semykina A. Censored least absolute deviations estimator: CLAD. Stata J. 2000;10(58):13–16.
  19. Goldsmith KA, Dyer MT, Buxton MJ, Sharples LD. Mapping of the EQ-5D index from clinical outcome measures and demographic variables in patients with coronary heart disease. Health Qual Life Outcomes. 2010;8:54. PMID: 20525323.
  20. Longworth L, Buxton MJ, Sculpher M, Smith DH. Estimating utility data from clinical indicators for patients with stable angina. Eur J Health Econ. 2005;6(4):347–353. PMID: 16193322.
  21. Rao C, Aziz O, Deeba S. Is minimally invasive harvesting of the great saphenous vein for coronary artery bypass surgery a cost-effective technique? J Thorac Cardiovasc Surg. 2008;135(4):809–815. PMID: 18374760.
  22. Oddershede L, Andreasen JJ, Brocki BC, Ehlers L. Economic evaluation of endoscopic versus open vein harvest for coronary artery bypass grafting. Ann Thorac Surg. 2012;93(4):1174–1180. PMID: 22450070.
  23. Oddershede L, Petersen SS, Kristensen AK, Pedersen JF, Rees SE, Ehlers L. The cost-effectiveness of venous-converted acid-base and blood gas status in pulmonary medical departments. Clinicoecon Outcomes Res. 2011;3:1–7. PMID: 21935326.