ORIGINAL ARTICLE

When do new biomarkers make economic sense?

Pages 90-95 | Published online: 01 Jun 2010

Abstract

Cost-effectiveness and cost-utility studies are commonly used to make payment decisions for new drugs and expensive interventions, but such studies remain relatively rare for clinical laboratory tests. As medical costs continue to increase while resources shrink, new biomarkers are likely to be examined for their economic benefits in addition to their clinical utility. This will represent an additional hurdle for routine use of new biomarkers. Before reaching this final economic hurdle, a new biomarker must still demonstrate clinical usefulness; a biomarker that is not clinically useful will never make economic sense. Once diagnostic accuracy and potential clinical usefulness are established, there are several types of economic studies that new biomarkers may undergo. The most common of these are cost-utility studies, which estimate the ratio between the cost of an intervention or test and the benefit it produces, expressed as the number of years gained in full health. The quantity used most often to describe this is the amount of money per quality-adjusted life year (QALY) gained. The threshold for being considered cost-effective is generally USD 50,000 per QALY gained. Examples of biomarkers that have been subjected to economic analyses are provided.

Introduction

If a biomarker is not useful for patient care and clinical decisions it will likely not make economic sense to pursue it. Therefore, the hurdles to determine if a biomarker is clinically useful will be discussed briefly before considering economic analysis of biomarkers.

Is a biomarker clinically useful?

The process of determining the usefulness of any laboratory test, including new biomarkers, involves a hierarchy of four levels of studies/evidence that must be examined before the test is adopted for clinical practice. Demonstration of adequate performance at each level in this hierarchy is necessary but not sufficient to clear the next hurdle [Citation1]. First, are the technical/analytic characteristics such as imprecision, bias, analytical measurement interval, recovery and linearity fit for purpose? In other words, does the test give reproducible and accurate results? Second, are the biologic factors that can affect interpretation of test results acceptable? If within- and between-individual variabilities are too great, or if blood concentrations of the marker are strongly dependent on factors other than disease (e.g., diet, exercise, smoking), then a biomarker may not be useful [Citation2]. If the first two levels cannot be cleared, there is clearly no economic sense in pursuing a test that is not analytically acceptable.

The next level in assessing a new biomarker is diagnostic accuracy. This is the step that is generally most familiar and comfortable for laboratory medicine practitioners. These investigations address diagnostic sensitivity/specificity, positive and negative predictive values and likelihood ratios. While the concepts and calculations for these test properties are easy to understand, many issues of study design and patient selection can bias the results of such studies and lead to overoptimistic estimates of diagnostic performance [Citation3,Citation4]. The STARD (Standards for Reporting of Diagnostic Accuracy) guidelines can assist with proper study design [Citation5]. When multiple studies reach different conclusions about the diagnostic accuracy of a marker, tools such as the summary receiver operating characteristic curve can help determine the “true” diagnostic accuracy of the marker [Citation6].

The fourth hurdle for a new biomarker is clinical usefulness/utility. Questions to consider are what changes in patient management will occur from the results of the test and what clinical outcomes can be expected as a result of implementation. The endpoints for “outcomes” can be as simple and easily measurable as length-of-stay [Citation7] or blood product usage [Citation8], or as complex and long-term as prediction of future risk, morbidity-free survival or length of survival. Outcome studies are best addressed by randomized controlled trials, but these are relatively rare in laboratory medicine for several reasons [Citation9]. First, randomized controlled trials are expensive. Second, there can be multiple steps between the result of a biomarker test and the outcome being measured. Third, the length of time between the biomarker test and the outcome may be years or even decades, making conclusions about utility difficult and very expensive. Fourth, neither interpretation of test results nor implementation of appropriate actions is always consistent. Alternatives to randomized controlled trials have been used to support the utility of some tests, particularly those with screening and prognostic claims [Citation9]. Case-control, observational and patient cohort studies are often used to determine the clinical value of a biomarker when randomized controlled trials are not feasible or are too expensive. To help assimilate the available evidence and make recommendations about the clinical utility of a biomarker, government agencies and professional societies have developed means to describe the quality and significance of evidence. While these hierarchies of evidence are by no means restricted to clinical laboratory tests, they are relevant for making recommendations about the use of biomarkers.
One example is the hierarchy of evidence established by the United States Preventive Services Task Force (USPSTF), an advisory group to the Agency for Healthcare Research and Quality (AHRQ), the component of the United States Department of Health and Human Services that supports research designed to improve the outcomes, cost-effectiveness and quality of health care [Citation10]. The USPSTF describes three levels of research design (good, fair, poor) when making recommendations about drugs, interventions and biomarkers. Good indicates that the evidence includes consistent results from well-designed, well-conducted studies in representative populations that directly assess effects on health outcomes, whereas poor indicates that the evidence is insufficient to assess the effects on health outcomes because of the limited number or power of studies, important flaws in their design or conduct, gaps in the chain of evidence, or lack of information on important health outcomes [Citation10]. Using these levels of evidence, the USPSTF issues recommendation grades ranging from A (high certainty of substantial benefit) to D (zero or negative benefit). If a biomarker does not achieve a grade of A or B, it is unlikely to make economic sense, as it is unlikely to be adopted or recommended by decision-making agencies. Similar levels of evidence and recommendation grades are used in other countries by agencies that decide whether a test will be paid for. One example is the National Institute for Health and Clinical Excellence (NICE), an agency of the National Health Service in the United Kingdom [Citation11].

Recommendations for the utility of a test are not static, reflecting the fact that some outcomes can take many years to ascertain after initial implementation of a screening test. Two very recent, and controversial, examples of this are the changes in recommendations from the USPSTF for mammography screening for breast cancer [Citation12] and long-term outcome studies of PSA screening for prostate cancer [Citation13]. While both screening procedures increase the rate of cancer detection, long-term studies now suggest that there is no improvement in mortality and that the harm caused by false positive results outweighs the gains [Citation12–14]. These are examples of screening tests that may have made economic sense at one time but whose value is now clouded by recent long-term outcome studies. Indeed, in the UK, NICE does not recommend routine PSA screening. What these studies raise is the need for biomarkers that indicate the aggressiveness and/or prognosis of these cancers [Citation14]. Such new biomarkers may make economic sense.

Before discussing cost analysis approaches it is important to consider additional factors once diagnostic accuracy is determined to be acceptable. If a biomarker is used as a surrogate endpoint for interventions (e.g., PTH in chronic kidney disease patients, LDL in statin therapy) or to monitor treatment (e.g., HbA1c in diabetes), cost studies are not usually considered, as the biomarker is part of an overall treatment plan and represents a small cost compared to the total costs of the intervention. Similarly, if the biomarker indicates whether a particular therapy will be effective (Her2-neu status and response to Herceptin in breast cancer), the cost of the biomarker is usually trivial compared to that of the therapeutic, and cost studies of the test may not be necessary independent of studies of the actual intervention. Another biomarker indicating the effectiveness of an expensive chemotherapy is the methylation status of the methylguanine-DNA methyltransferase (MGMT) gene in glioma patients. NICE has determined that treatment of glioma with the drug temozolomide is only cost-effective in patients whose MGMT methylation status predicts a positive response [Citation15]. Examples like these may not be the case for all “personalized” biomarkers linked to therapy, even when the test has excellent analytic properties. The CMS has recently decided not to reimburse for genetic testing of CYP2C9 and VKORC1 status in patients receiving warfarin [Citation16]. AHRQ has determined that there is insufficient evidence to perform Factor V Leiden mutation testing to assess the risk of thrombosis in patients with previous thrombotic events, or even in relatives of patients known to possess the Factor V Leiden mutation [Citation17]. In both cases, the recommendations against testing were not cost related but due to insufficient evidence of improved outcomes.

If a biomarker is proposed to screen for a disease, it is important to consider disease prevalence and the consequences of false positive results even when the diagnostic accuracy of the test is excellent. For instance, the sensitivity and specificity of CA 125 for ovarian cancer are close to 100% and 99%, respectively. Nevertheless, CA 125 is not recommended for screening the general population [Citation18]. The prevalence of ovarian cancer is 40 in 100,000 women, resulting in a positive predictive value of only about 4% when specificity is 99%. Thus, of every 100 positive tests, only about 4 would be true positives and 96 would be false positives. Indeed, in an international randomized trial of CA 125 screening and transvaginal ultrasound, a total of 566 surgeries were performed in a cohort of 39,000 women during the first year of screening but only 31 cancers were identified [Citation18]. When considering the economic sense of a screening biomarker, disease prevalence must be taken into account if there is potential harm from false positive results.
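The CA 125 arithmetic above follows directly from Bayes' rule. A minimal sketch (the function name is illustrative, not from the article):

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: P(disease | positive test)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# CA 125 figures from the text: prevalence 40 per 100,000 women,
# sensitivity ~100%, specificity 99%.
value = ppv(prevalence=40 / 100_000, sensitivity=1.0, specificity=0.99)
print(f"PPV = {value:.1%}")  # ~3.8%, i.e. roughly 4 true positives per 100 positive tests
```

Even near-perfect diagnostic accuracy yields a poor positive predictive value when the disease is rare, which is why prevalence dominates the economics of screening tests.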

Economic analysis of biomarkers

An additional evidentiary hurdle increasingly faced by many drugs and interventions is whether there is an economic benefit to their use. Cost-effectiveness has been called the fourth hurdle in healthcare, following safety, efficacy and quality [Citation15]. Studies examining the economic benefit (or lack thereof) are common for expensive drugs or interventions and are being used directly to make payment decision recommendations by NICE in the UK and by agencies in other countries with national health programs. In the United States, the Center for Medicare and Medicaid Services (CMS) is currently not permitted to use cost as a basis for reimbursement decisions (except for prostate and colorectal cancer screening) [Citation19]. However, private insurers are not prevented from using cost-effectiveness to make payment decisions. As healthcare costs continue to spiral upwards while resources remain limited, such information will be relied upon more and more to make decisions about payment, in addition to clinical utility. This will be particularly true for expensive innovative treatments but can be expected for new biomarkers as well.

Studies that examine the economic benefit of laboratory tests are not nearly as common as those for more expensive treatments and interventions. The reasons for this are many, but they include some of the same reasons discussed above for the small number of randomized controlled trials of biomarkers. In addition, only about 2% of the healthcare budget is spent on clinical laboratory testing, making the laboratory a minor target for cost-saving efforts. Laboratory budgets are often independent of other healthcare costs, resulting in decisions about laboratory costs being made in a vacuum [Citation20]. Nevertheless, it can be expected that more and more laboratory tests will be subjected to cost studies in the future. A large registry of medical cost-effectiveness studies can be found at the Tufts-New England Medical Center Cost-Effectiveness Registry, which shows an exponential growth of cost-effectiveness studies but few examining biomarkers [Citation21].

Types of economic analyses [Citation20]

Cost minimization. This is the simplest category of economic evaluation: a determination of which of two or more tests (or interventions) that will produce the same outcome is the least expensive. An example might be determining which of two cardiac troponin I tests from different manufacturers is the least expensive for the laboratory. Another example might be determining the cost per test of performing tests at the point of care versus the central laboratory. When implementation of a new test or procedure results in cost minimization, a full cost-effectiveness analysis need not be done. Another term for this is cost-savings, and it is important to note that few new tests, drugs or procedures actually result in cost savings [Citation22]. Indeed, of all studies of preventive measures and treatments in the Tufts-New England Medical Center Cost-Effectiveness Registry, less than 20% report cost savings [Citation21,Citation22].

Cost-effectiveness analysis. These studies look at the most efficient way to use a fixed amount of resources to obtain the largest effect. The effect is usually a natural unit such as a life year but can also be numbers of clinical events such as myocardial infarctions prevented, strokes prevented or thrombotic events [Citation20].

Cost-benefit analysis. Here the monetary value of the benefit is compared to the costs of the test and/or intervention. Assigning a monetary value to the benefit is difficult and requires equating a year of life to some monetary amount.

Cost-utility analysis. This type of analysis estimates the ratio between the cost of an intervention or test and the benefit it produces in the number of years gained in full health. Cost is expressed in monetary units (dollars, Euros, etc.) and benefits are usually expressed in a manner that permits health years to be quantified even when they are less than perfect health. The quantity used most often is the quality adjusted life year (QALY). The QALY is calculated by examining the number and quality of life years gained by an intervention. Cost-utility and cost-effectiveness are similar and often used interchangeably.

QALY. An example of how the QALY is calculated can be found on the NICE website [Citation23]. In this example a patient with a severe, life-threatening illness will live for one year with a quality of life of 0.4 if he receives standard treatment. If he receives a new drug he will live for 1.25 years with a quality of life of 0.6. Thus, standard treatment provides 0.4 QALY and the new treatment provides 0.75 QALY (1.25 × 0.6). The new treatment provides an additional 0.35 QALY. If the new treatment costs USD 10,000 and standard treatment costs USD 3,000, then the cost of the new treatment is USD 20,000 per QALY (USD 7,000 ÷ 0.35 = USD 20,000). The latter value is called the incremental cost-effectiveness ratio (ICER): the ratio of the difference in costs to the difference in benefits of two different interventions. In the case of biomarkers it could compare the costs and benefits of two markers (e.g., one old and one new marker) or of no test (e.g., not screening for a disease) versus the use of a biomarker to screen for that disease.
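The NICE worked example above can be expressed as a short calculation (function names are illustrative):

```python
def qaly(years, quality_weight):
    """Quality-adjusted life years: life years scaled by a 0-1 quality weight."""
    return years * quality_weight

def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

standard = qaly(1.0, 0.4)    # 0.40 QALY with standard treatment
new_drug = qaly(1.25, 0.6)   # 0.75 QALY with the new drug
ratio = icer(10_000, 3_000, new_drug, standard)
print(ratio)  # ≈ 20,000 USD per QALY gained
```

The same two functions apply to biomarkers by substituting the costs and QALYs of testing versus not testing.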

While common, the use of the QALY is controversial due to its subjective nature: perfect health is hard to define, and assigning a “weight” to life between death (0) and perfect health (1) is difficult [Citation24,Citation25]. Several methods are used to assign this weight, including the visual analogue scale (VAS), time trade-off (TTO) and standard gamble (SG). In the VAS, respondents are asked to rate a state of health on a scale of 0 – 100, with 100 representing perfect health. In the TTO approach, respondents are asked about their preference for remaining alive in ill health for a period of time versus choosing an intervention that provides perfect health but a shorter life span. The SG approach asks participants whether they prefer remaining alive in ill health for a period of time or an intervention that has a chance of either restoring them to perfect health or killing them. Regardless of the approach, assigning “weights” for quality of life is very subjective and will depend on the population being surveyed. Nevertheless, the QALY is the most common “currency” for determining cost-effectiveness. On its website NICE states that, in general, an ICER of GBP 20,000 – 30,000 per QALY gained is the threshold for payment decision recommendations [Citation23], while in the U.S. the number most frequently mentioned is USD 50,000 per QALY gained [Citation26].
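To make the time trade-off concrete, the TTO weight is the ratio, at the respondent's point of indifference, of the shorter time in full health to the longer time in the ill-health state. The numbers below are hypothetical, not from the article:

```python
def tto_weight(years_full_health, years_ill_health):
    """Time trade-off quality weight at the point of indifference."""
    return years_full_health / years_ill_health

# A respondent indifferent between 10 years in an ill-health state
# and 7 years in perfect health implies a quality weight of 0.7.
print(tto_weight(7, 10))  # 0.7
```

The same weight then multiplies life years to produce QALYs, so a small shift in a respondent's stated preference propagates directly into the ICER.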

Modelling approaches

Many cost-utility studies use decision tree analysis to examine the prognosis or transitions between clinical states following implementation of drugs or interventions and following test results. Possible outcomes following the branch points in such a tree include clinical events (e.g., thrombosis), death, length of survival and quality of life. Simple decision trees do not do well at estimating a risk that is ongoing over time or when events or transitions between clinical states may occur multiple times. For this, an approach called Markov modelling is increasingly used in cost-effectiveness studies [Citation27]. These studies use data from the literature and software to model the timing and frequency of transitions between clinical states. The ability to estimate when and how often clinical states occur more accurately reflects what happens in practice than a simple decision tree. Most Markov models also incorporate discounting in estimating cost-effectiveness, such that later costs and benefits have less impact than earlier ones. For instance, an adverse event that occurs immediately has a greater impact on the QALY estimate than one that occurs 10 years later [Citation28]. NICE applies discounting of 3.5% per year equally to costs and effects [Citation28]. However, it is important to recognize that how discounting is applied can affect the estimated ICER and thus affect payment decisions. For instance, discounting costs at a rate greater than benefits will result in a lower ICER, whereas discounting costs at a lesser rate will have the opposite effect. Some argue that NICE's decision to apply the discount equivalently will result in fewer interventions being paid for [Citation28].
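A minimal sketch of a three-state Markov cohort model with the equal 3.5% discounting of costs and effects described above. All transition probabilities, costs and utility weights here are hypothetical placeholders, not values from any study:

```python
DISCOUNT = 0.035  # NICE-style annual discount rate, applied equally to costs and QALYs

def run_markov(p_well_to_ill, p_ill_to_dead, cycles):
    """Track a cohort through well -> ill -> dead over yearly cycles;
    return total discounted cost and total discounted QALYs per person."""
    cost_per_cycle = {"well": 100, "ill": 2_000, "dead": 0}   # assumed annual costs
    utility = {"well": 1.0, "ill": 0.6, "dead": 0.0}          # assumed quality weights
    cohort = {"well": 1.0, "ill": 0.0, "dead": 0.0}
    total_cost = total_qaly = 0.0
    for year in range(cycles):
        factor = 1 / (1 + DISCOUNT) ** year  # same discount factor for costs and effects
        total_cost += factor * sum(cohort[s] * cost_per_cycle[s] for s in cohort)
        total_qaly += factor * sum(cohort[s] * utility[s] for s in cohort)
        moved_ill = cohort["well"] * p_well_to_ill
        moved_dead = cohort["ill"] * p_ill_to_dead
        cohort["well"] -= moved_ill
        cohort["ill"] += moved_ill - moved_dead
        cohort["dead"] += moved_dead
    return total_cost, total_qaly
```

In an actual cost-effectiveness study, the model would be run once for a "screened" strategy and once for an "unscreened" strategy, and the ICER computed from the differences in discounted costs and QALYs.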

In economic models it is also important to provide a sensitivity analysis. This is a process of changing the variables in the model to determine which have the most impact on the ICER. Examples include cost of the test or procedure, frequency of progression between clinical states, sex or age of the patients, and time between screening tests. Sensitivity analyses can help identify the patient subpopulation(s), frequency of testing and desired cost of a biomarker test that will result in the most favourable ICER.
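A one-way sensitivity analysis of the kind described above can be sketched as a simple sweep over a single variable. Every input here is an illustrative assumption, not a value from the article:

```python
THRESHOLD = 50_000      # USD per QALY, the commonly cited US threshold
QALYS_GAINED = 0.02     # incremental QALYs per person screened (assumed)
OTHER_COSTS = 600       # downstream costs per person screened (assumed)

def icer_for_test_cost(test_cost):
    """ICER as a function of the per-test cost, all else held fixed."""
    return (test_cost + OTHER_COSTS) / QALYS_GAINED

# Sweep test cost from USD 0 to 1,000 in USD 25 steps and keep the
# values at which the strategy stays under the threshold.
affordable = [c for c in range(0, 1001, 25)
              if icer_for_test_cost(c) <= THRESHOLD]
print(f"Maximum acceptable test cost: USD {max(affordable)}")  # USD 400
```

Sweeps like this over each model input (test cost, transition probabilities, utility weights, screening interval) identify which variables drive the ICER and what a new test would need to cost to remain cost-effective.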

Example cost-effectiveness studies of biomarkers

TSH Screening for Mild Thyroid Failure. Undiagnosed subclinical thyroid disease has a relatively high prevalence in women over the age of 35 (4 – 17%) [Citation29]. Early diagnosis and treatment is hypothesized to decrease secondary effects such as myxedema and hypercholesterolemia (and thus cardiovascular disease) and to improve quality of life [Citation29]. Using a computerized Markov model decision tree, Danese et al. examined costs of TSH testing, frequency of testing, various probabilities of transitioning between clinical states, costs of therapy, and age and sex of patients in their sensitivity analysis [Citation30]. The authors found that screening every five years for mild thyroid failure beginning at the age of 35, using a TSH assay costing USD 25, resulted in an ICER of USD 9,223 per QALY in women and USD 22,595 in men compared to no screening. The two largest variables in the sensitivity analysis were the cost of the TSH assay and the “weight” assigned to the health state of mild thyroid failure. A cost of USD 50 per test would negate a desirable ICER, as would changing the “weight” of life quality for undiagnosed hypothyroidism from 0.90 to 1.0. These examples point out the value of sensitivity analysis in determining whether a new biomarker will make economic sense and what the target cost per test will likely need to be.

B-type natriuretic peptide for left ventricular dysfunction. Heart failure is one of the most common diagnoses in hospitalized patients over the age of 65 and has a very poor prognosis. Early treatment with ACE inhibitors is cost-effective at around USD 5,600 per QALY [Citation30]. Screening all individuals over the age of 55 for asymptomatic left ventricular dysfunction by echocardiography is not cost-effective, but screening with a B-type natriuretic peptide (BNP) test and performing echocardiography only if the BNP is abnormal resulted in an ICER of USD 22,300 and USD 77,700 per QALY for men and women, respectively [Citation31]. AHRQ has stated that screening with BNP and echocardiography is cost-effective in all populations over 55 at USD 19,000 per QALY compared to no screening [Citation31]. In its recommendations a baseline prevalence of 2.7% and a cost of USD 30 per BNP test were used.

Serological markers for celiac disease. Celiac disease is a chronic inflammation of the small intestine resulting from an immune response to gluten, leading to gastrointestinal symptoms (diarrhoea, abdominal pain, bloating, constipation and indigestion). It is an under-recognized disease, with only about 10% of cases estimated to be properly diagnosed [Citation32]. The lack of diagnosis and treatment can lead to complications or to unnecessary, complicated procedures. The prevalence of celiac disease in adults and children is estimated at 0.07 – 1.6% (NICE), with individuals having type 1 diabetes, autoimmune thyroid disease or Down syndrome having a higher prevalence. Diagnosis is made via biopsy obtained by endoscopy, but serological testing for IgA autoantibodies to tissue transglutaminase (tTG) and endomysium (EMA) has good diagnostic accuracy and can obviate the need for biopsy [Citation32]. In one of the few NICE guidelines that directly addresses the use of biomarkers for screening, NICE Clinical Guideline 86 recommends screening patients with GI symptoms, and asymptomatic individuals with the above predisposing conditions, using serological markers. NICE CG 86 examined seven serological testing algorithms plus endoscopy. Compared with no screening, the ICER for all seven serological algorithms followed by endoscopy for positive results was GBP 4,000 – 5,000 per QALY, well within NICE thresholds for cost-effectiveness. This compared favourably to screening with endoscopy alone, which had an ICER of GBP 12,500 per QALY [Citation32].

Theoretical marker for esophageal adenocarcinoma. The annual incidence of esophageal adenocarcinoma (EAC) in the US is about 6,500 cases and is rising faster than that of other cancers [Citation33]. Patients with symptomatic EAC have a poor prognosis. Gastroesophageal reflux disease (GERD) increases the risk of EAC, and the presence of Barrett's esophagus increases risk another 40-fold [Citation33]. The current screening approach is to perform endoscopy in patients with GERD and, if Barrett's esophagus is present, to repeat it every three years thereafter, and more frequently if dysplasia is found. No biomarkers currently exist for EAC. In this study the cost-effectiveness of a theoretical biomarker for EAC was examined with a Markov model, using data from the literature on the incidence of EAC and the transitions between the clinical states of GERD, Barrett's esophagus, dysplasia, cancer and death [Citation33]. Fixed costs for endoscopy, esophagectomy, postsurgical care, cancer care and clinic visits were incorporated into the model. The authors concluded that a theoretical biomarker with 80% sensitivity and 95% specificity for EAC would be cost-effective, with an ICER of less than USD 50,000 per QALY. Sensitivity analysis examined biomarker sensitivity, specificity and cost. At a biomarker cost of USD 1,000, the sensitivity and specificity of the biomarker would need to be 90% and 97%, respectively, to meet the USD 50,000 per QALY threshold.

Summary

To assess whether a new biomarker will make economic sense, it must first clear the traditional hurdles that determine whether the test is clinically useful. Because of the long timeframes needed to examine some outcomes, the true economic value of some screening tests may not be known for a very long time, even when they are diagnostically accurate. Cost-effectiveness modelling has become common for drugs and interventions, and payment decisions are being made by national health systems and private insurers based on these studies. The most common “currency” in cost-utility/effectiveness outcomes is the QALY, and the most often cited threshold for payment decisions is ∼ GBP 30,000 or USD 50,000 per QALY gained. Cost-effectiveness studies and formal guidelines from payers are not yet common for clinical laboratory tests, but examples exist, and in today's environment of trying to control costs throughout the healthcare system it is likely that new biomarkers will be subjected to such analyses. When considering a new biomarker, it may be important for stakeholders to perform proactive cost-effectiveness analysis to help determine whether it will make economic sense from the payer's and society's perspective.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

References
