ORIGINAL ARTICLE

Assessing the impact of biomarkers on patient outcome: An obligatory step

Pages 85-89 | Published online: 01 Jun 2010

Abstract

Payers for healthcare increasingly require evidence about health outcomes of medical interventions. Outcomes research uses various study designs to provide such evidence, with the highest level of evidence provided by randomized controlled trials (RCTs). Among published studies of biomarkers, however, relatively few determine the relationship of biomarker testing to outcomes; only a small fraction of those studies are RCTs, and fewer still follow the CONSORT standards for reporting of trials. Outcomes studies of biomarkers are difficult to carry out. During an outcomes study, clinicians may be expected to use the results of the test (e.g., troponin) along with other information (e.g., symptoms of an acute coronary syndrome) to decide about use of another intervention (such as cardiac catheterization) that is hoped to improve an outcome (e.g., mortality rate) at some time in the future. Studies of diagnostic tests frequently lack evidence that test results were acted upon at all, much less according to a defined protocol. The potential for a biomarker to improve outcomes depends upon a wide range of variables. These variables include the diagnostic accuracy of the test and the effectiveness of the therapeutic intervention, both of which will, predictably, vary with the patient population studied. Thus outcomes studies performed in one patient population leave unanswered questions regarding outcomes in other populations. The questions are infinite, but resources are finite. Simulation modelling studies are attractive as an adjunct to patient studies to address multiple patient variables and multiple treatment approaches without the expense of multiple clinical studies.

“…with few exceptions, there is very little evidence that use [of diagnostic tests] results in improved health.”

Michael S. Lauer, MD, Director of the National Heart, Lung, and Blood Institute’s Divisions of Prevention and Population Sciences and of Cardiovascular Diseases, quoted in [1]

“There are a lot of technologies, services and treatments that have not been unequivocally shown to improve health outcomes in a definitive manner.”

Dr. Barry Straube, Chief Medical Officer, Medicare, quoted in [2]

The requirement to demonstrate value of medical interventions has never been greater. In the United States for example, the on-going battle over healthcare reform has reignited debate about the benefits and costs of medical interventions. The new battle cry is for more “comparative effectiveness research”, that is, research on what works and what doesn't. Payers (government, insurers, patients, employers) want to know which interventions lead to better outcomes such as decreased morbidity (and days lost from work), decreased mortality, and lower cost [1–3]. The evidence that use of laboratory tests (or imaging tests [4]) improves outcomes is at best meagre, and far less than the evidence (which itself is inadequate) on the effect of drugs on outcomes. For biomarkers, the subject of this paper, there is no question that the focus on outcomes will affect efforts to introduce and use new biomarker tests. In this paper, we will (1) review the concept of outcomes studies, (2) examine some ways in which they are performed and some associated difficulties, and (3) explore the possible utility of simulation modelling as an adjunct to direct studies of patient outcomes.

What are outcomes and how are they studied?

Evaluations of laboratory tests may encompass several dimensions of test performance (Table I). Outcomes studies are arguably the most important. Outcomes have been defined as results of medical interventions in terms of health or cost [5]. The term “patient outcomes” is sometimes used as a synonym, but may be better reserved for results that are perceptible to the patient [5]. Outcomes studies deal with the relationships between interventions and outcomes.

Table I. Common test characteristics that are amenable to study.

Outcomes studies must be distinguished from studies of prognostic accuracy of tests. Studies of prognostic accuracy ask the question: Does the result of a test predict an outcome of interest? By contrast, outcomes studies ask the question: Is use of an intervention (e.g., measurement of a biomarker) associated with improved outcomes? Examples of test features and related outcomes are presented in Table II.

Table II. Examples of test attributes and outcomes that are amenable to study in outcomes studies.

Studies of outcomes of the use of a biomarker can use several designs. These include randomized controlled trials (RCTs, discussed further below), studies using historical controls and observational studies. Not infrequently, results of studies using different designs have led to different conclusions about the utility of a given intervention. Some of the most heated debates about diagnostic tests, such as those about the utility of mammography and prostate-specific antigen, have arisen from studies of differing designs that reached varying conclusions. Often observational studies suggest that an intervention has value, but the suggestion cannot be confirmed by subsequent RCTs.

Challenges in performing and reporting outcomes studies of biomarker testing

While sounding simple at first glance, outcomes studies of laboratory tests are in fact difficult [6]. Examples of difficulties are listed in Table III. Key among them is the remoteness of outcomes from the performance of the test: the outcome that is to be measured may be several steps, often many steps, removed from the test itself, and, in addition, clinicians may use results of tests inconsistently. Thus, the ability of biomarker measurements to improve outcomes depends upon factors that are not under the control of the analyst. Moreover, no biomarker will lead to improved outcomes if there is no therapy that is effective in preventing or treating the condition for which the biomarker is intended to be used. These difficulties have been noted by many observers, as reviewed by one of us over a decade ago [6].

Table III. Problems in the performance of outcomes studies of diagnostic tests.

Table IV. Guidance on performance of test evaluations.

Success of outcomes studies typically requires prespecified and agreed reactions to results of the biomarker testing. Thus, for example, when examining the use of a test (e.g., colonoscopy) for a precancerous condition (adenomatous polyps), a positive result must consistently be followed by an intervention (removal of the polyp) that is designed to decrease the risk of the adverse outcome being studied (colon cancer).

The design of outcomes studies of therapies has long been studied. The approaches to those studies are better established and simpler than for studies of biomarkers. Outcomes studies have been a key activity in the area of pharmaceuticals, where for many years the usual expected study has been a randomized controlled trial (RCT). Patients are randomized to receive the new drug or another intervention (typically a placebo or a competing therapy) and prespecified outcomes (including adverse effects) are measured and evaluated statistically. The process of approval of therapeutic interventions by regulatory agencies largely depends on data from RCTs, since RCT results represent the highest form of evidence that can be derived from a single study. Thus it is relevant to consider RCTs in discussing outcomes studies for biomarkers.

The critical importance and power of RCTs of drugs led to the development of the CONSORT Statement [7] to improve reporting of RCTs. To avoid problems in reporting evaluations of drugs, CONSORT focuses on the reporting of both the design and the results of trials. Thus key items include the inclusion and exclusion criteria for participants, a statement of outcome measures, methods of randomization and concealment of allocation, techniques for masking (“blinding”) if done (sometimes difficult for biomarker studies), participant flow (a flow diagram is strongly recommended), and effects of the intervention (with confidence intervals). Recent extensions of CONSORT cover special types of clinical trials, including non-inferiority and equivalence trials, but they are not focused on outcomes of biomarker testing. Nonetheless, much if not all of CONSORT is directly relevant to RCTs of biomarkers and can serve as a reminder of information to include when reporting them.

Clinical trials of therapeutic interventions have proven to have other problems related to incomplete reporting of results and suppression of results that are unfavourable to sponsors. One reaction to this, spearheaded by editors of key medical journals, has been to require that the design of clinical trials be registered on a publicly accessible database. Registration makes the primary endpoints clear and identifies the data that are to be collected and thus should be available at the time of publication of the results. Editors have agreed not to publish results of trials that were not registered. To avoid selective reporting and other threats to communication of results, this model should be followed for outcomes studies of biomarkers. Editors of journals in the field of clinical chemistry have a role to play in this endeavour.

Two examples of outcomes studies in laboratory medicine

Although outcomes studies of biomarkers are difficult to do, they can be done, and they have a long history. A 1976 study published in Clinical Chemistry [8] investigated the effects of multiphasic biochemical screening of patients being admitted to hospital. Screening had no effect on length of stay, but did increase cost. The topic of routine testing before medical interventions received renewed interest in 2000, when Schein and colleagues reported in the New England Journal of Medicine an RCT of 18,819 patients scheduled for cataract surgery [9]. No benefit of the testing could be identified. As with the 1976 study, however, there were no specified actions to be taken in response to the test results. Thus it is not clear that any actions (which might have been useful) were taken. If no actions were taken, no benefits could occur. (We do not wish to imply that routine screening is valuable. The point is that this very large and well-conducted study does not answer the question of whether screening would have benefit if specific interventions were taken in response to test results.) Compounding the problem of unspecified or absent therapeutic responses to test results, the designs of randomised comparisons of medical tests are sometimes invalid and not always efficient [10].

Questions are infinite, resources are finite

Diagnostic and prognostic tests, including biomarkers, are used in multiple patient populations and often for a variety of purposes. Studies of diagnostic accuracy, which are far more numerous than outcomes studies of diagnostic testing, have demonstrated that indices of diagnostic accuracy (such as likelihood ratios) vary with the patient population that is studied. Thus, tests appear to perform better diagnostically in patients with advanced disease and less well when the non-disease group has symptoms that resemble those of the affected group. Study design also markedly affects results. In outcomes testing, if a test has poor diagnostic accuracy in the studied group, it is predictable that the test has limited ability to improve outcomes. Furthermore, in outcomes studies we must add to this problem the fact that the effectiveness of the treatment initiated by performance of a given test is likely to vary with the population. For example, a drug may be less effective in men than in women, more effective in Asians than in others, or totally ineffective or even dangerous in children but quite acceptable in adults between the ages of 40 and 50.

Assessment of the ability of a biomarker test to improve outcomes becomes an overwhelming task if one considers all of the possible combinations of patient groups, primary and secondary outcomes of interest, the adverse effects of treatments, etc., not to mention the techniques used to measure the biomarker, sample handling and a host of variables of which laboratorians are all-too-well aware. Addressing all these issues for even one biomarker would require an enormous number of studies with vast numbers of participants. In sum, the questions are infinite and resources are finite.

Simulation modelling as a prelude and adjunct to RCTs

Simulation modelling has features that make it attractive as an adjunct or even an alternative to RCTs. The effect of testing can be simulated when (1) the diagnostic accuracy of a test has been carefully studied (for at least one well-defined patient group) and (2) the effect of correct treatment of diagnosed patients has been similarly well studied in that patient group. Simulation is earning a place at the FDA in evaluation of devices, and results of simulation studies of patient outcomes are being considered in deliberations of guidelines groups and committees of national and international standards groups.

One rather early area of simulation studies was an attempt to answer the question of how analytically accurate glucose meters must be to allow improved patient outcomes. No clinical studies addressed this question in patients: clinical studies appeared too expensive to do, and there was little incentive for market leaders (who had the cash) to fund such studies. Moreover, physicians and patients seemed not even to consider the question: the meter gave a number and that was that. Not surprisingly, recommended quality specifications (analytical goals), other than those based on biological variability, were not evidence-based; recommended total error limits ranged from 5 to 20%, an enormous difference. What was of interest was to find evidence at the highest level of the hierarchy of quality specifications: How could the analytical quality requirement be related to medical use of the test?

In the 1990s, we considered the fact that a major medical use of meters was for patients to adjust their insulin doses. The insulin dose was determined by the measured glucose concentration. We reasoned that meter errors could lead to selection of a dose of insulin that was inappropriate for the true glucose concentration in the patient. Moreover, we realized that the rate of inappropriate insulin doses (and the magnitude of the error in the insulin dose) could be determined by simulation modelling as a function of a meter's bias and imprecision. The simulation we chose [11] consisted of generating 10,000–20,000 numbers representing true glucose concentrations. These concentrations were then “measured” by a virtual meter with a defined imprecision and bias. Then the insulin doses appropriate for each true glucose concentration and its corresponding “measured” glucose were compared and the discrepancies tabulated.
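The logic of that simulation can be captured in a few lines of code. The sketch below is only an illustration of the approach described above, not the program used in [11]: the sliding-scale dose table, the distribution of “true” glucose values, and the error model (a proportional bias plus a proportional imprecision) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def insulin_dose(glucose_mg_dl):
    """Illustrative sliding-scale dose table (units of insulin).
    The thresholds and doses are assumed, not those used in [11]."""
    thresholds = [150, 200, 250, 300, 350, 400]   # mg/dL cut-offs (assumed)
    doses = [0, 2, 4, 6, 8, 10, 12]               # units per band (assumed)
    return doses[int(np.searchsorted(thresholds, glucose_mg_dl))]

def dose_error_rate(bias_pct, cv_pct, n=20000):
    """Fraction of measurements that select a different insulin dose
    than the true glucose concentration would have."""
    # "True" glucose concentrations for the virtual patients (assumed distribution)
    true_glucose = rng.normal(180, 60, size=n).clip(40, 500)
    # Virtual meter: proportional bias plus proportional imprecision (CV)
    measured = true_glucose * (1 + bias_pct / 100) + rng.normal(0, true_glucose * cv_pct / 100)
    true_dose = np.array([insulin_dose(g) for g in true_glucose])
    measured_dose = np.array([insulin_dose(g) for g in measured])
    return float(np.mean(true_dose != measured_dose))

for bias, cv in [(0, 1), (0, 5), (5, 5), (10, 10)]:
    print(f"bias {bias}%, CV {cv}%: dose-error rate {dose_error_rate(bias, cv):.1%}")
```

Tabulating the dose-error rate over a grid of bias and imprecision values is what yields statements of the form “to keep dose errors below a given rate, bias and imprecision must both be below some limit”.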

The simulation studies [11] predicted that, with the then-current meters, insulin doses were in error about 25% of the time. To achieve a rate of insulin dose errors below 5% required that both bias and imprecision be less than 1.0–1.5%. No existing meters approached this specification. Although interesting (and perhaps disturbing), these studies do not tell us about outcomes. Perhaps errors in insulin dose are not important to outcomes.

More recent simulations have taken us a step closer to studying outcomes of meter errors [12]. In these studies we simulated the use of measurements of glucose concentration to adjust insulin infusion rates in protocols for tight glucose control in ICUs. Here we have another clinical scenario that is ideal for modelling: the result of the test (glucose concentration) dictates the action (the infusion rate for insulin). Moreover, the effect of the insulin infusion rate on the plasma glucose can be predicted from physiological models that incorporate the patient's insulin sensitivity, rate of gluconeogenesis, etc. Sensitivity analyses can be done to investigate the effect of changes in the insulin sensitivity, rate of gluconeogenesis, etc., and also of changes in the protocol for selection of the insulin infusion rate.

For these new studies, we used statistical Monte Carlo methods like those used in our studies of home meters: “measure” the glucose concentration with defined bias and imprecision, choose the insulin infusion rate based on the result, calculate the change in glucose concentration based on a model of glucose control, again measure the plasma glucose concentration, and so on.

By repeating these steps hourly for 100 hours for several thousand modelled patients, and then repeating the process for meters with multiple combinations of bias and imprecision, the model predicted the effect of meter error on variables in the modelled patients such as their mean glucose concentration over 100 hours, the frequency of hypoglycemia (definitely an outcome that patients do not like), and the variability of the true glucose concentration (which has been reported to affect patient outcomes) [12]. The obvious next step is to incorporate in the model the published effects of poor glucose control on patient outcomes such as mortality and infection rates.
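A minimal sketch of this hourly measure-dose-update loop follows. It is not the physiological model of [12]: the dosing protocol, the simple glucose dynamics (a constant endogenous rise, a linear insulin effect and random hour-to-hour variation) and all parameter values are simplifying assumptions, chosen only to show how meter bias and imprecision propagate to mean glucose, frequency of hypoglycemia and glucose variability.

```python
import numpy as np

rng = np.random.default_rng(1)

def infusion_rate(measured_glucose):
    """Illustrative tight-glucose-control protocol (units/h); the actual
    ICU algorithms modelled in [12] are more elaborate."""
    if measured_glucose < 80:
        return 0.0
    if measured_glucose < 110:
        return 1.0
    if measured_glucose < 150:
        return 2.0
    if measured_glucose < 200:
        return 4.0
    return 6.0

def simulate_patient(bias_pct, cv_pct, hours=100):
    """Hourly loop: measure glucose, choose an infusion rate, update the true glucose."""
    glucose = max(rng.normal(180, 40), 60)      # starting true glucose, mg/dL (assumed)
    sensitivity = rng.uniform(15, 30)           # mg/dL fall per unit/h of insulin (assumed)
    production = rng.uniform(5, 15)             # endogenous rise per hour, mg/dL (assumed)
    trace = []
    for _ in range(hours):
        measured = glucose * (1 + bias_pct / 100) + rng.normal(0, glucose * cv_pct / 100)
        rate = infusion_rate(measured)
        glucose = max(glucose + production - sensitivity * rate + rng.normal(0, 5), 20)
        trace.append(glucose)
    trace = np.array(trace)
    return trace.mean(), (trace < 60).mean(), trace.std()

def summarize(bias_pct, cv_pct, n_patients=1000):
    results = np.array([simulate_patient(bias_pct, cv_pct) for _ in range(n_patients)])
    mean_glucose, hypo_fraction, variability = results.mean(axis=0)
    print(f"bias {bias_pct}%, CV {cv_pct}%: mean glucose {mean_glucose:.0f} mg/dL, "
          f"hypoglycemic hours {hypo_fraction:.1%}, within-patient SD {variability:.0f} mg/dL")

for bias, cv in [(0, 2), (0, 10), (-10, 10)]:
    summarize(bias, cv)
```

Running the same loop over a grid of bias and imprecision values, and over different dosing protocols, is what allows such a model to show which protocols are more or less sensitive to meter error.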

This modelling exercise had several attractive features, including the following. After the program was written, numerous scenarios could be studied rapidly and at low cost on a personal computer. The model predicted non-intuitive effects that warrant study in the ICU and should be considered in design of an RCT (if one is ever to be funded). The modelling showed how different protocols were more or less sensitive to meter errors. It also suggested limits of total analytical error that allowed reasonable control of plasma glucose concentration. These experiences lead us to believe that use of simulation for study of biomarkers is limited only by our imaginations and ability to devise ways to relate biomarker testing to outcomes in a computer model.

In a recent study, Karon and co-workers used a modelling approach similar to that in our 2001 study to examine what meter accuracy was needed for tight glucose control [13]. This study tabulated the full distribution of glucose meter results obtained over a 6-month period in two intensive care units by clinical application of the Mayo Clinic algorithm for tight glucose control, and then simulated various combinations of meter bias and imprecision for “true” glucose results derived from this distribution. Such a simulation approach not only estimates the overall effects of bias and imprecision on insulin dosing accuracy, but does so while taking into account how commonly the simulated glucose results will be encountered in practice.
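The design difference can be shown with a small variation on the earlier sketch: instead of drawing “true” glucose values from an assumed parametric distribution, they are resampled from a set of observed results, so each concentration is weighted by how often it occurs in practice. The observed values and dose bands below are hypothetical stand-ins, not the Mayo Clinic data or algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for an observed distribution of ICU glucose results;
# Karon et al. [13] used roughly six months of actual meter values from two ICUs.
observed_glucose = rng.normal(140, 35, size=5000).clip(40, 400)

def dose_band(glucose):
    """Illustrative dose bands (vectorized); a change of band is treated
    as a change of insulin dose."""
    return np.digitize(glucose, [110, 150, 200, 300])

def dose_change_rate(bias_pct, cv_pct, n=20000):
    """Resample 'true' values from the observed distribution, apply the
    simulated meter error, and count how often the dose band changes."""
    true = rng.choice(observed_glucose, size=n, replace=True)
    measured = true * (1 + bias_pct / 100) + rng.normal(0, true * cv_pct / 100)
    return float(np.mean(dose_band(true) != dose_band(measured)))

for bias, cv in [(0, 5), (5, 5), (10, 10)]:
    print(f"bias {bias}%, CV {cv}%: dose-band change rate {dose_change_rate(bias, cv):.1%}")
```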

Conclusions

In summary, outcomes studies are becoming more common in laboratory medicine. They can assess the potential utility of tests, such as biomarkers, and the analytical quality requirements for the tests. The studies require careful attention to principles of study design and reporting, and CONSORT provides guidance. Most importantly, outcomes studies are required to determine the value of tests in terms that are important to patients, society or both. Ideally, only tests that improve outcomes will be paid for. We conjecture that the day is coming when this will be the case. Finally, we believe that simulation modelling can be an important adjunct to trials in patients. Modelling can inform design of RCTs, explore the likely effect of variation in patient groups and in test performance, and model millions of patients at a modest cost.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

References

1. Mitka M. Task force: nontraditional markers add little to heart risk assessment. JAMA 2009;302:2192–3.
2. Abelson R. New York Times. March 13, 2008. www.nytimes.com/2008/03/13/business/13scan.html?pagewanted=print (Accessed April 19, 2010).
3. Neumann PJ, Tunis SR. Medicare and medical technology—the growing demand for relevant outcomes. N Engl J Med 2010;362:377–9.
4. Mitka M. Research offers only a limited view of imaging's effect on patient outcomes. JAMA 2010;303:599–600.
5. Bissell MG. Laboratory-related measures of patient outcomes: an introduction. Washington, DC: AACC Press, 2000.
6. Bruns DE. Laboratory-related outcomes in health care. Clin Chem 2001;47:1547–52.
7. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 1996;276:637–9.
8. Durbridge TC, Edwards F, Edwards RG, Atkinson M. Evaluation of benefits of screening tests done immediately on admission to hospital. Clin Chem 1976;22:968–71.
9. Schein OD, Katz J, Bass EB, Tielsch JM, Lubomski LH, Feldman MA, et al. The value of routine preoperative medical testing before cataract surgery. N Engl J Med 2000;342:168–75.
10. Bossuyt PM, Lijmer JG, Mol BW. Randomised comparisons of medical tests: sometimes invalid, not always efficient. Lancet 2000;356:1844–7.
11. Boyd JC, Bruns DE. Quality specifications for glucose meters: assessment by simulation modeling of errors in insulin dose. Clin Chem 2001;47:209–14.
12. Boyd JC, Bruns DE. Monte Carlo simulation in establishing analytical quality requirements for clinical laboratory tests: meeting clinical needs. Methods Enzymol 2009;467:411–33.
13. Karon BS, Boyd JC, Klee GG. Glucose meter performance criteria for tight glycemic control estimated by simulation modeling. Clin Chem 2010;56 (in press).
