ORIGINAL ARTICLE

Benchmarking Controlled Trial—a novel concept covering all observational effectiveness studies

Pages 332-340 | Received 18 Dec 2014, Accepted 04 Mar 2015, Published online: 12 May 2015

Abstract

The Benchmarking Controlled Trial (BCT) is a novel concept which covers all observational studies aiming to assess effectiveness. BCTs provide evidence of the comparative effectiveness between health service providers, and of effectiveness due to particular features of the health and social care systems. BCTs complement randomized controlled trials (RCTs) as the sources of evidence on effectiveness. This paper presents a definition of the BCT; compares the position of BCTs in assessing effectiveness with that of RCTs; presents a checklist for assessing methodological validity of a BCT; and pilot-tests the checklist with BCTs published recently in the leading medical journals.

Key messages
  • The Benchmarking Controlled Trial (BCT) is a novel concept which covers all observational studies aiming to assess effectiveness.

  • BCTs assess differences in effectiveness between single interventions or sets of interventions, between clinical pathways, or between interventions targeting health care system factors with the aim of increasing effectiveness.

  • Published BCTs currently have several methodological limitations, some of which could be avoided and others of which should be acknowledged.

  • BCTs support both clinical and policy decisions, and should be given a high priority in research and in improvement activities.

Introduction

Experimental studies, i.e. randomized controlled trials (RCTs), provide the least biased information on the efficacy of medical interventions and form the basis for systematic reviews of the effectiveness of interventions (Citation1). However, RCTs mostly assess the effectiveness of interventions in ideal settings, and they focus on specific interventions rather than on how effective the whole clinical pathway is (from the first treatment through all interventions during, e.g., a 1-year follow-up), which is crucial for overall effectiveness. Thus there is a need for valid observational data on actual performance in routine settings, particularly as all educational, research, and leadership activities in medicine are intended to advance the health of the general population and the care of ordinary patients (Citation2,Citation3).

The first aim of this paper is to assess the need for the new concept of Benchmarking Controlled Trials (BCTs), to provide a definition of the BCT, and to present its two main categories (clinical, and health and social care system-related) and their respective subcategories. The second aim is to present a checklist for assessing the methodological validity of a BCT and to point out methodological differences between RCTs and BCTs. The third aim is to pilot-test the checklist with BCTs published recently in the leading medical journals.

Methods

Previous international recommendations on how to report observational studies and systematic reviews of them (Citation4,Citation5) provide guidance for studies investigating associations between exposures and health outcomes, addressing three types of observational studies: cohort, case-control, and cross-sectional studies. The author's idea was that a framework is needed which starts from the study question of effectiveness in observational settings. When the aim is to assess the effectiveness of interventions, there are two options: an experimental design (randomized controlled trial) or an observational design. This paper concentrates on observational designs and presents a comprehensive framework for them within the novel concept of Benchmarking Controlled Trials (BCTs).

When assessing effectiveness in an observational (real-world) setting, the index and comparator groups must a priori contain as similar patients as possible, so that the remaining baseline incomparability can be adjusted for. Therefore, the comparisons have to be made between peers treating similar patients, and thus there is always an element of benchmarking involved. This is the rationale for the term Benchmarking Controlled Trial. In addition, a term such as observational controlled trial would probably carry connotations that do not coincide with the present new idea (Citation6).

Differentiating the two main BCT categories (clinical and health care system determinants of effectiveness) was based on the author's view that the requirement of baseline comparability in clinical comparisons is equally necessary when studying interventions that aim to change the health care system (and through these changes increase the effectiveness of interventions).

The pertinent clinical subcategories were consequently: 1) effectiveness of a particular single intervention or set of interventions within a limited time frame (such as surgery, or a 3-month rehabilitation period), and 2) effectiveness of the whole clinical pathway from the start (e.g. acute myocardial infarction) through all health (and social) care interventions (diagnostic, treatment, rehabilitation; primary, secondary, tertiary care) that occur during, e.g., a 1-year follow-up. The health care system intervention subcategories were defined further according to recent literature (Figure 1) (Citation7). For health care system interventions no universally established categories exist, but, regardless of what they are, any change in the health care system aiming to increase effectiveness falls into the category of a BCT.

Figure 1. Categories and subcategories of Benchmarking Controlled Trials (BCTs). Randomized Controlled Trials (RCTs) constitute the category of experimental effectiveness studies (shown in the figure only to illustrate that all effectiveness studies are either BCTs or RCTs).

The checklist for methodological validity issues of BCTs, as well as the appraisal of methodological issues inherent to BCTs, was based on the author's previous work with randomized controlled trials and observational studies (Citation8–13), and with methodological issues in RCTs and observational effectiveness studies, including work within the Cochrane Collaboration Back Review Group (Citation1,Citation9,Citation12,Citation14–16). Previous checklists for observational studies and systematic reviews of them were also utilized (STROBE (Citation4), MOOSE (Citation5)), as well as scientific literature on particular characteristics of observational studies relevant to assessing the effectiveness of interventions (Citation17).

For piloting the checklist, the 10 most recent BCTs published in the leading medical journals (New England Journal of Medicine, Lancet, Journal of the American Medical Association, British Medical Journal, and Annals of Internal Medicine) were identified through a PubMed search and by the author searching the journals directly. The search terms were: benchmarking, registries, effectiveness, and the name of the journal. All included articles had to have an observational design and to aim to assess the effectiveness of an intervention directed at patients or at the health care system. Five articles assessing clinical features and five assessing health care system-related features as determinants of effectiveness, published from January 2010 to October 2014, were included. Data extraction was rechecked, and errors were corrected by the author to reach the final appraisal.

Results

Definition and categories of the Benchmarking Controlled Trial

There is a clear need for the new concept of the Benchmarking Controlled Trial (BCT), as there is no previous systematic guidance on methodological issues in planning and reporting an observational effectiveness study (Citation4,Citation5). Furthermore, the author's idea that, in addition to clinical interventions, any intervention directed at the health care system must be studied in a BCT is a new one. The term benchmarking is accurate because all comparisons have to be made between peers and thus include an element of benchmarking. Furthermore, the results of BCTs should be exploited in the effort to increase effectiveness using the comparative data between peers, which is benchmarking (Citation2).

A BCT is defined as an observational study aiming to provide non-biased estimates of comparative differences in outcomes and costs in real-world circumstances due to a single intervention or set of interventions, or throughout the clinical pathway, between two or more health service providers for a well-defined group of patients; or as an observational study aiming to provide evidence of the comparative effectiveness of the health care system, or parts of it, among a well-defined group of patients. Data on disadvantaged patient groups should always be included when feasible, because their prognosis often differs from that of non-disadvantaged groups. Therefore, an inability to control for the differences between disadvantaged and non-disadvantaged populations may lead to biased estimates. Furthermore, prevailing inequality will go unnoticed.

The study question in BCTs should ideally be defined according to the PICO principle (patient, intervention, comparison intervention, and outcome) taking into consideration interventions during the whole clinical pathway. The health care service providers can be individuals, health care units, hospitals, health care districts, or countries.

Features of BCTs in the two main categories (clinical effectiveness and factors related to the health care system) and in their subcategories are presented in Table I. Figure 1 illustrates the categories and subcategories of BCTs, covering all observational study designs on effectiveness. In order to illustrate the whole field of effectiveness studies, RCTs are also shown in the figure, together with their subcategories: explanatory (ideal circumstances) and pragmatic (ordinary health care circumstances). It must be emphasized that although pragmatic RCTs provide evidence on effectiveness in routine settings, they seldom cover the whole clinical pathway, and their generalizability to other settings is limited.

Table I. Categories, subcategories, and characteristics of Benchmarking Controlled Trials (BCTs).

Characteristics of the checklist and methodological issues in BCTs

The main categories of methodological issues in BCTs, and their subcategories, are presented in Table II. The pilot-testing of the checklist also shows the main contents of the 10 studies.

Table II. Methodological characteristics of Benchmarking Controlled Trials (BCTs) in 10 studies published between January 2010 and October 2014 in leading medical journals. Assessment is based solely on each particular paper; if information is not reported, the issue is assessed as unclear. Each characteristic is recorded as yes, partial, unclear, or no; yes indicates that the criterion has been met.

It is noteworthy that there is an overall methodological difference between experimental trials and benchmarking trials. In experimental trials (RCTs) the data collection in each treatment arm is determined in a uniform way, and the researchers are obliged to ensure that the conduct of the RCT adheres to the protocol. In observational settings, comparing different service providers, the accrual of the data may not be determined beforehand as strictly as in an RCT, or the quality assurance during data gathering may not be as rigorous. Therefore, validity assessment in BCTs must usually be undertaken separately for each of the health care service providers. Even if there have been uniform instructions on how to collect the data, the success in doing so may differ between the providers.

Another notable methodological issue is that appropriate baseline adjustment is a major challenge when assessing the comparative effectiveness of a particular intervention or of the whole clinical pathway in BCTs. Obtaining proper information on the interventions during the clinical pathway is also most important, for two reasons: firstly, to obtain further evidence supporting the plausibility of differences in effectiveness estimates, and secondly, to have information that can be used to improve the treatment processes.

When assessing the effectiveness of interventions targeting the health care system, there are four major challenges. Firstly, sufficient data are needed to indicate whether the health care system factors (e.g. related to an economic incentive) may have led to selection of patients and thus to differences in baseline characteristics. The second challenge is to obtain data on the patients' clinical pathways in order to know to what degree the intervention targeting the system may have changed the way patients are treated. The third challenge is to adjust for differences in baseline characteristics between the comparators, and to analyze differences in treatment processes as mediators of the effects posed by the health care system factors. The fourth challenge is to try to document all the effects the intervention has on the health care system, including unintended unfavorable effects. However, this major challenge of observing a complex system goes beyond the present treatise.
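To make the third challenge concrete, the following is a minimal sketch of how baseline adjustment and a treatment-process mediator could be examined with standard regression models. The data file, variable names, and the use of Python with statsmodels are illustrative assumptions only, not part of the BCT checklist.

```python
# Illustrative sketch only: hypothetical registry data, one row per patient.
# Assumed columns: died (0/1 outcome), system_intervention (0/1 exposure,
# e.g. hospital under pay-for-performance), age and severity (baseline
# covariates), guideline_care (0/1 treatment-process variable, a candidate mediator).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("bct_registry.csv")  # hypothetical data set

# Model 1: system-level intervention adjusted for baseline characteristics.
m1 = smf.logit("died ~ system_intervention + age + severity", data=df).fit()

# Model 2: add the treatment-process variable; attenuation of the
# system_intervention coefficient suggests mediation through changed processes.
m2 = smf.logit("died ~ system_intervention + guideline_care + age + severity",
               data=df).fit()

print(m1.summary())
print(m2.summary())
```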

A big difference between benchmarking controlled trials (BCTs) and randomized controlled trials (RCTs) is the selection of patients. In the former, patients entering the study in each treatment arm may differ due to selection, while in the latter random allocation to treatment arms (regardless of selection) often leads to comparable treatment groups. To decrease the potential for selection bias in BCTs, a two-step procedure is suggested: 1) eligibility criteria should be chosen so that they lead to a homogeneous patient population (e.g. only patients having their first-time acute ischemic stroke are included) (Citation13,Citation18), and 2) the residual baseline differences have to be adjusted for statistically. Instrumental variables may in some cases be feasible to compensate partially for the lack of randomization (Citation19), and the propensity score method may enhance baseline comparability in BCTs (Citation8). Exploitation of a natural experiment may provide an excellent opportunity to increase baseline comparability in BCTs; e.g. in a previous study the health effects of becoming unemployed were studied in a situation where, due to a nationwide recession, half of construction workers suddenly became unemployed, and allocation to unemployment occurred mainly by chance (Citation20).
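As an illustration of the second step, a propensity score could be estimated from baseline covariates and used, for example, as inverse-probability-of-treatment weights. The sketch below is a hedged example with assumed variable names and a hypothetical data file, not a prescribed analysis for BCTs.

```python
# Illustrative sketch: inverse-probability-of-treatment weighting (IPTW)
# to improve baseline comparability between two providers in a BCT.
# Assumed columns: provider_a (1 = index provider, 0 = comparator),
# age, severity, comorbidity (baseline covariates), died (0/1 outcome).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("bct_registry.csv")  # hypothetical data set

# 1) Estimate the propensity of being treated by the index provider.
ps_model = smf.logit("provider_a ~ age + severity + comorbidity", data=df).fit()
ps = ps_model.predict(df)

# 2) Build stabilized inverse-probability weights.
p_index = df["provider_a"].mean()
df["w"] = np.where(df["provider_a"] == 1, p_index / ps, (1 - p_index) / (1 - ps))

# 3) Compare weighted outcomes between providers.
for group, label in ((1, "index provider"), (0, "comparator")):
    sub = df[df["provider_a"] == group]
    rate = np.average(sub["died"], weights=sub["w"])
    print(f"{label}: weighted mortality {rate:.3f}")
```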

Concerns about the sufficiency of clinical information and the validity of the data are usually greater in BCTs than in RCTs, particularly if the data for a BCT have been gathered retrospectively and thus no a priori protocol has been used. A high number of dropouts is a validity concern for both RCTs and BCTs, as is the use of valid outcome measures. Selective outcome reporting by researchers within an RCT may lead to biased conclusions, but in BCTs selective reporting may also occur during the data collection, which is often undertaken by the health care providers themselves. There are a number of statistical analysis issues that are characteristic of BCTs (Table II).

Pilot-testing of the checklist

All 10 articles were from the New England Journal of Medicine and Lancet, as eligible studies were not found in the other journals (Table II) (Citation21–30).

In the five studies assessing clinical effectiveness, the clinical topics included treatments for selected cancers, non-cardiac surgery, bariatric surgery, rupture of an aortic aneurysm, and acute myocardial infarction. The main outcomes were mortality in four studies and complication rates in one study. In the five studies assessing effectiveness in relation to health care system-related factors, the indications were more varied than in the clinical effectiveness studies and included a set of surgical indications (two BCTs), a set of indications treated conservatively, intensive care patients, and ambulatory care patients. The determinants of the outcome were the size of the centers providing the service, a quality improvement program, the presence of a night-time intensivist in the hospital, pay-for-performance, and the workload and qualifications of nurses. The main outcomes were mortality in four studies, and health care spending and quality of care in one study.

Concerning methodological issues in the 10 studies, several limitations were observed. No study provided a description of the patients' clinical path prior to eligibility for the study. No study exploited an opportunity provided by a natural experiment. Valid diagnostic information at baseline was presented in four of the studies with a clinical research question and in two of the studies with a health care system-related objective. There were deficiencies in other clinical baseline factors, and factors indicating lifestyle or environment were lacking in all the studies. Information on diagnostics and treatment procedures was lacking altogether in one clinical study and in three studies focusing on the health care system. No study assessed outcomes among disadvantaged patient groups. No study utilized instrumental variables, and only two studies provided power calculations for determining the size of the study sample.

Discussion

This paper presents a novel concept, the Benchmarking Controlled Trial (BCT). Several new ideas are involved, particularly 1) that an element of benchmarking is always involved when making observational comparisons in real-world circumstances, and 2) that assessment of effectiveness due to any health care system intervention faces the same methodological challenges as clinical comparisons. Because a single term risks carrying more than one connotation, a new term such as observational controlled trial did not seem appropriate (Citation6).

In those BCTs which pursue evidence on clinical effectiveness, information on baseline patient characteristics, on diagnostic procedures and treatments, and on the outcomes is needed for the comparisons between providers. If baseline imbalances between patients treated by different providers can be satisfactorily adjusted for, comparisons based on treatment outcomes may also be justified (Citation31). If feasible, all clinically important patient-relevant outcomes should be documented. However, it is most important to obtain data also on the treatment processes, i.e. how well these concord with current scientific evidence (Citation32). Benchmarking controlled trials should aim to assess the quality (appropriate interventions), effectiveness, and costs of services, as well as issues related to potential inequality in obtaining services shown to be effective (Citation3).

In BCTs which pursue evidence on effectiveness due to health care system-related factors, there must be a homogeneous target population, and if there are several diagnoses, they should preferably be differentiated and the evidence presented separately for each diagnosis. If there are insufficient data on the diagnoses and related baseline characteristics, the evidence on effectiveness may remain very uncertain.

Previous checklists for improving the reporting of observational studies give guidance for studies aiming to assess a causal relationship between an exposure and an outcome. The checklist developed for and described in this paper is intended to support the planning, conducting, reporting, and peer reviewing of manuscripts of observational studies assessing the effectiveness of interventions, the BCTs.

The pilot-testing of the checklist using recent articles published in leading medical journals showed a wide variety of methodological strengths and limitations in the original studies. No study provided a description of the patients' clinical path before entering the study. Description of baseline characteristics was deficient or even lacking, causing uncertainty about between-group comparability. Information on diagnostics and treatment procedures was scarce. Instrumental variables were not utilized, and power calculations were rare.

Conclusions

The new concept of the BCT provides guidance for studies assessing comparative effectiveness between single interventions or sets of interventions, between clinical pathways, or between health care systems or factors related to the system. Benchmarking controlled trials cover the whole area of observational effectiveness research.

A checklist for assessing the methodological validity of BCTs has here been subjected to preliminary pilot-testing, but should be properly validated. However, the checklist can readily be used in planning, conducting, reporting, and appraising BCTs.

Current BCTs seem to have several methodological limitations, some of which could be avoided in the planning and conducting phases of the studies, while others should be acknowledged in the discussion.

Benchmarking controlled trials, supporting both clinical and policy decisions, should be given a high priority in research, and their results should be used in improvement activities provided they have sufficient methodological rigor and generalizability. The proposed methodology is also suggested for non-scientific quality improvement and benchmarking undertakings.

Funding: No outside funding.

Declaration of interest: The author declares no support from any organization for the submitted work; no financial relationships with any organization that might have an interest in the submitted work; and no other relationships or activities that could appear to have influenced the submitted work.

References

  • Furlan AD, Pennick V, Bombardier C, van Tulder M; Editorial Board, Cochrane Back Review Group. 2009 updated method guidelines for systematic reviews in the Cochrane Back Review Group. Spine (Phila Pa 1976). 2009;34:1929–41.
  • Malmivaara A. Real-effectiveness medicine – pursuing the best effectiveness in the ordinary care of patients. Ann Med. 2013;45(2):103–6.
  • Malmivaara A. On decreasing inequality in health care in a cost- effective way. BMC Health Serv Res. 2014;14:79.
  • von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Int J Surg. 2014;12:1495–9.
  • Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000;283:2008–12.
  • Larsen KR, Voronovich ZA, Cook PF, Pedro LW. Addicted to constructs: science in reverse? Addiction. 2013;108:1532–3.
  • Klazinga N, Li L. Comparing health services outcomes. In: Papanicolas I, Smith P, editors. Health system performance comparison. An agenda for policy, information and research. 1st ed. Maidenhead, England: Open University Press, McGraw-Hill Education; 2013. p. 157–82.
  • Häkkinen U, Malmivaara A. The PERFECT project: measuring performance of health care episodes. Ann Med. 2011;43(Suppl 1):S1–3.
  • Sihvonen R, Paavola M, Malmivaara A, Itala A, Joukainen A, Nurmi H, et al. Arthroscopic partial meniscectomy versus sham surgery for a degenerative meniscal tear. N Engl J Med. 2014;370:1260–1.
  • Viljanen M, Malmivaara A, Uitti J, Rinne M, Palmroos P, Laippala P. Effectiveness of dynamic muscle training, relaxation training, or ordinary activity for chronic neck pain: randomised controlled trial. BMJ. 2003;327:475.
  • Torkki M, Malmivaara A, Seitsalo S, Hoikka V, Laippala P, Paavolainen P. Surgery vs orthosis vs watchful waiting for hallux valgus: a randomized controlled trial. JAMA. 2001;285:2474–80.
  • Malmivaara A, Hakkinen U, Aro T, Heinrichs ML, Koskenniemi L, Kuosma E, et al. The treatment of acute low back pain—bed rest, exercises, or ordinary activity? N Engl J Med. 1995;332:351–5.
  • Malmivaara A, Meretoja A, Peltola M, Numerato D, Heijink R, Engelfriet P, et al. Comparing ischaemic stroke in six European countries. The EuroHOPE register study. Eur J Neurol. 2015;22:284–91.
  • Croft P, Malmivaara A, van Tulder M. The pros and cons of evidence-based medicine. Spine. 2011;36:E1121–5.
  • Sihvonen R, Paavola M, Malmivaara A, Jarvinen TL. Finnish Degenerative Meniscal Lesion Study (FIDELITY): a protocol for a randomised, placebo surgery controlled trial on the efficacy of arthroscopic partial meniscectomy for patients with degenerative meniscus injury with a novel ‘RCT within-a-cohort’ study design. BMJ Open. 2013;33.
  • Malmivaara A, Koes BW, Bouter LM, van Tulder MW. Applicability and clinical relevance of results in randomized controlled trials: the Cochrane review on exercise therapy for low back pain as an example. Spine (Phila Pa 1976). 2006;31:1405–9.
  • Vandenbroucke J. When are observational studies as credible as randomised trials? Lancet. 2004;363:1728–31.
  • Peltola M, Juntunen M, Häkkinen U, Rosenqvist G, Seppälä TT, Sund R. A methodological approach for register-based evaluation of cost and outcomes in health care. Ann Med. 2011;43:S4–13.
  • Vandenbroucke JP. Observational research, randomised trials, and two views of medical science. PLoS Med. 2008;5(3):e67.
  • Leino-Arjas P, Liira J, Mutanen P, Malmivaara A, Matikainen E. Predictors and consequences of unemployment among construction workers: prospective cohort study. BMJ. 1999;319:600–5.
  • Coleman MP, Forman D, Bryant H, Butler J, Rachet B, Maringe C, et al. Cancer survival in Australia, Canada, Denmark, Norway, Sweden, and the UK, 1995-2007 (the International Cancer Benchmarking Partnership): an analysis of population-based cancer registry data. Lancet. 2011;377:127–38.
  • Pearse R, Moreno RP, Bauer P, Pelosi P, Metnitz P, Spies C, et al. Mortality after surgery in Europe: a 7 day cohort study. Lancet. 2012;380:1059–65.
  • Birkmeyer JD, Finks JF, O’Reilly A, Oerline M, Carlin AM, Nunn AR, et al. Surgical skill and complication rates after bariatric surgery. N Engl J Med. 2013;369:1434–42.
  • Karthikesalingam A, Holt PJ, Vidal-Diez A, Ozdemir BA, Poloniecki JD, Hinchliffe RJ, et al. Mortality from ruptured abdominal aortic aneurysms: clinical lessons from a comparison of outcomes in England and the USA. Lancet. 2014;383:963–9.
  • Chung SC, Gedeborg R, Nicholas O, James S, Jeppsson A, Wolfe C, et al. Acute myocardial infarction: a comparison of short-term survival in national outcome registries in Sweden and the UK. Lancet. 2014;383:1305–12.
  • Finks JF, Osborne NH, Birkmeyer JD. Trends in hospital volume and operative mortality for high-risk surgery. N Engl J Med. 2011;364:2128–37.
  • Song Z, Safran DG, Landon BE, He Y, Ellis RP, Mechanic RE, et al. Health care spending and quality in year 1 of the alternative quality contract. N Engl J Med. 2011;365:909–18.
  • Wallace DJ, Angus DC, Barnato AE, Kramer AA, Kahn JM. Nighttime intensivist staffing and mortality among critically ill patients. N Engl J Med. 2012;366:2093–101.
  • Sutton M, Nikolova S, Boaden R, Lester H, McDonald R, Roland M. Reduced mortality with hospital pay for performance in England. N Engl J Med. 2012;367:1821–8.
  • Aiken LH, Sloane DM, Bruyneel L, Van den Heede K, Griffiths P, Busse R, et al. Nurse staffing and education and hospital mortality in nine European countries: a retrospective observational study. Lancet. 2014;383:1824–30.
  • Hakkinen U, Iversen T, Peltola M, Seppala TT, Malmivaara A, Belicza E, et al. Health care performance comparison using a disease-based approach: the EuroHOPE project. Health Policy. 2013;112(1–2):100–9.
  • Hermans MP, Elisaf M, Michel G, Muls E, Nobels F, Vandenberghe H, et al. Benchmarking is associated with improved quality of care in type 2 diabetes: the OPTIMISE randomized, controlled trial. Diabetes Care. 2013;36:3388–95.