Editorial

There is no single gold standard study design (RCTs are not the gold standard)

Pages 267-270 | Received 10 Jan 2023, Accepted 12 Apr 2023, Published online: 19 Apr 2023

1. Introduction

Randomized controlled trials (RCTs) are often referred to as the gold standard of biomedical research [Citation1]. However, there is no single gold standard of study design, especially when it comes to studies of drug safety. The gold standard design is whatever method provides the information you need in the most reliable way in view of what is currently known or can be obtained [Citation1].

2. Body of text

2.1. Oversimplified hierarchy

As an epidemiologic expert witness who addresses drug safety information in federal court cases, I have on multiple occasions been confronted with a misleading narrative that promotes a hierarchy of evidence based on study type. This hierarchy places RCTs at its top, followed in sequence by cohort studies, case-control studies, and case reports [Citation2]. The reasoning behind this scheme is that the ‘better’ designs have lower risks of bias than the ‘lesser’ designs and therefore provide more reliable evidence. However, this rationalization boils down the complexities of drug safety research into a simplistic hierarchy based on only four study design elements while ignoring many other important study features. The remainder of this essay addresses why use of this or any other rigid evidence hierarchy is ill-advised.

2.2. Randomization is no panacea

Randomization is an important tool for mitigating baseline confounding in RCTs. However, it does not free a study of all other sources of error, such as those that might arise from post-randomization confounding and selection bias [Citation3]. Other forms of error that may adversely affect trials include inaccurate case ascertainment, inadequate blinding, differential loss to follow-up, noncompliance, inadequate follow-up, failure to adequately consider time-dependencies, failure to consider all contributing factors, and, of course, random error. In the words of pioneering biostatistician Jerome Cornfield, ‘randomization by itself is insufficient. We must indicate the specific variables we wish to control and must devise the specific experimental procedures to control them’ [Citation4].

If randomized studies were truly perfect, we would expect consistent results from RCTs addressing the same question. However, contradictory results from RCTs are not uncommon. Ioannidis found that 14 (36%) of 39 highly cited RCTs were either contradicted or shown to have a weaker effect in RCTs that attempted to replicate their results [Citation5]. This is not to suggest that the problem of irreproducibility is unique to experimental studies; nonexperimental studies fail this type of replication test at an even greater frequency [Citation5]. The point here is that contradictory results from even highly cited RCTs are not unusual.

I recently published a review of an RCT that was initially designed to determine whether a particular anticoagulant (rivaroxaban) prevented major cardiovascular events in high-risk patients, and whether a particular proton pump inhibitor (PPI) decreased the risk of major bleeding in the anticoagulant users [Citation6]. The trial was sound for its original purposes. However, its data were repurposed to assess whether the PPI randomly assigned in the study was associated with previously unanticipated adverse effects, including whether it caused chronic kidney disease [Citation7]. With respect to chronic kidney disease, the study failed on several counts: it lacked a case definition for chronic kidney disease, kidney function was not monitored during the trial, prevalent cases were not excluded before the trial began, prior PPI use (before the trial) and over-the-counter PPI use (during the trial) were not considered, discontinuation of the study drug during follow-up was common, and duration of follow-up was limited (median 3 years) [Citation8]. These limitations reduced the utility of the data for testing whether PPIs cause chronic kidney disease, even though PPI use had been randomized at the start of the trial.

As noted earlier, even high impact RCTs are not immune to error. An insightful article with the provocative title Why All Randomised Controlled Trials Produce Biased Results identified multiple unrecognized biases in the 10 most cited RCTs in the medical literature [Citation9]. Among these limitations were inadequate randomization, initial sample selection bias, failure to collect data on all relevant variables, neglect of contributing factors during follow-up, incomplete blinding, unauthorized use of the study drug in the control group, too small a sample, limitations in quantitative measurements, and failure to verify all study assumptions.

In addition, RCTs regularly include design features that bolster internal validity while limiting generalizability. One such limitation arises with intention-to-treat (ITT) analysis. While ITT guards against bias from selective follow-up, it also dilutes observed treatment contrasts when compliance is imperfect, biasing results toward the null. Moreover, RCTs are conducted under highly controlled conditions in specialty clinics within narrowly defined patient populations with relatively uncomplicated disease. That is, trials usually have admissibility criteria that exclude patients with specific comorbidities that may accompany the condition in real clinical situations. While RCT admissibility criteria may increase the rigor and internal validity of a study, they can also limit its external validity [Citation10].
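The attenuating effect of noncompliance on an ITT estimate can be seen with a little arithmetic. The sketch below uses entirely hypothetical event risks and a hypothetical noncompliance rate; it illustrates the general point and is not a reanalysis of any trial discussed here.

```python
# Hypothetical illustration of how noncompliance pulls an intention-to-treat
# (ITT) effect estimate toward the null. All numbers are invented.

risk_untreated = 0.20   # assumed event risk without the drug
risk_treated = 0.10     # assumed event risk with the drug
noncompliance = 0.30    # assumed share of the treatment arm that never takes the drug

# ITT compares groups as randomized, so the treatment arm's observed risk
# is a mixture of compliers and noncompliers.
itt_treated_risk = (1 - noncompliance) * risk_treated + noncompliance * risk_untreated

true_effect = risk_untreated - risk_treated      # risk difference if all complied
itt_effect = risk_untreated - itt_treated_risk   # attenuated ITT estimate

print(f"true risk difference: {true_effect:.2f}")  # 0.10
print(f"ITT risk difference:  {itt_effect:.2f}")   # 0.07, closer to the null
```

With 30% noncompliance, the ITT risk difference shrinks from 0.10 to 0.07; the direction of the bias is always toward the null under this simple mixing.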

Of course, we must also consider that for many questions, RCTs are not feasible, practical, or timely. Such is the case when the adverse event in question is serious, rare, and occurs only after an extended induction period. And, of course, there are many situations for which it would be unethical to randomly expose participants to potential harms. RCTs can only be ethically pursued when there is balanced doubt (‘equipoise’) as to risks and benefits. Thus, many questions of drug safety are simply beyond the reach of RCTs. Under these circumstances, retrospective cohort studies and case-control studies may provide viable alternatives to RCTs.

2.3. Studies of intended effects (efficacy studies) vs. studies of unintended effects (safety studies)

The extent to which randomization is required in a study depends on many factors. One such factor is whether the study addresses intended effects (efficacy) or unintended effects (safety). This issue was clarified in a 1983 landmark paper by Miettinen in which he coined the terms ‘confounding by indication’ and ‘confounding by contraindication’ [Citation11]. Confounding by indication/contraindication occurs when therapeutic choices are closely linked to prognostic factors for the outcome. When adverse effects are not widely anticipated, this link does not exist and confounding by indication/contraindication is unlikely [Citation12].

Even when there are suspected links between the study drug and an adverse effect, the effects of confounding by indication/contraindication can be weakened if not entirely precluded through observational study methods such as the careful use of active controls, prudent use of study base exclusion criteria, closely matching subjects based on propensity scores, and analytic regression techniques. While RCTs are uniquely superior in evaluating the anticipated benefits of a drug, these advantages are not as extreme when studying adverse reactions. In fact, observational studies may provide more accurate estimates for the incidence of adverse events in actual practice [Citation12].
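As a toy illustration of how such adjustments work, the following sketch shows confounding by indication in invented data and its removal by classical Mantel-Haenszel stratification, a simpler relative of the regression and propensity-score methods mentioned above. All counts and person-time figures are hypothetical.

```python
# Hypothetical two-stratum example: the drug has no true adverse effect
# (stratum-specific rate ratio = 1.0), but sicker patients are both more
# likely to receive it and more likely to have the event, so the crude
# rate ratio is inflated (confounding by indication). All data invented.

# (exposed_cases, exposed_person_time, unexposed_cases, unexposed_person_time)
strata = {
    "low severity":  (2, 1000.0, 18, 9000.0),    # event rate 0.002 in both groups
    "high severity": (90, 9000.0, 10, 1000.0),   # event rate 0.010 in both groups
}

# Crude analysis: pool everything, ignoring severity.
a = sum(s[0] for s in strata.values()); t1 = sum(s[1] for s in strata.values())
b = sum(s[2] for s in strata.values()); t0 = sum(s[3] for s in strata.values())
crude_rr = (a / t1) / (b / t0)

# Mantel-Haenszel rate ratio: combines stratum-specific ratios with
# person-time weights, removing the confounding.
num = sum(ae * pt0 / (pt1 + pt0) for ae, pt1, au, pt0 in strata.values())
den = sum(au * pt1 / (pt1 + pt0) for ae, pt1, au, pt0 in strata.values())
mh_rr = num / den

print(f"crude rate ratio: {crude_rr:.2f}")  # inflated by confounding (~3.3)
print(f"M-H rate ratio:   {mh_rr:.2f}")     # 1.00, the true (null) effect
```

The crude comparison suggests a tripled rate; stratifying on the indication recovers the true null. Propensity scores and regression pursue the same goal when many confounders must be handled at once.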

2.4. RCTs vs. observational studies

Observational studies (cohort and case-control studies) have contributed greatly to our understanding of adverse drug effects. Properly conducted observational studies can even produce quantifications of efficacy that are, on average, similar to those of RCTs [Citation13]. Contrary to popular opinion, well-designed cohort and case-control studies do not systematically overestimate the magnitude of associations reported by RCTs [Citation13,Citation14].

But what about situations in which results from observational studies differ from those of RCTs? Can we assume that the RCTs were correct while the observational studies were biased? Evidence suggests otherwise. Consider, for example, results from the randomized Women’s Health Initiative trial on postmenopausal combined hormone use and the observational Nurses’ Health Study on the same topic [Citation15,Citation16]. The former (the trial) found a similar or slightly higher incidence of coronary disease in the treatment group than in the control group. In contrast, the latter (the observational study) found a slightly reduced risk associated with treatment. While it is facile to conclude this difference was due to confounding in the observational study, Hernan and colleagues demonstrated that the discrepancies were largely explained by (a) differences in the distributions of times since menopause among participants in the two studies, (b) differences in lengths of follow-up, and (c) the use of ITT in the trial but not in the observational study [Citation17]. Thus, the discrepant results were due to differences not directly related to randomization.

2.5. Cohort vs. case-control studies

The standard evidence hierarchy also assumes that cohort studies are superior to case-control studies [Citation2]. This, too, is an oversimplification. Some of this oversimplification can be traced to the non-standardized use of ‘retrospective/prospective’ terminology. I believe this misunderstanding is also due to what Miettinen referred to as the ‘trohoc fallacy’ [Citation18].

A contemporary usage of the term ‘prospective’ refers to studies in which the exposure measurement comes before the disease ascertainment, so that knowledge of disease status cannot influence classification of the exposure [Citation19]. Accordingly, a study can be prospective even when events occurred in the chronologic past, using historical data for instance. Thus, case-control studies can be either prospective or retrospective.

The ‘trohoc fallacy’ assumes that case-control studies are inherently ‘backward looking’ by comparing prior exposures in cases and controls [Citation18]. This is false. A more meaningful conceptualization of the function of the control series in case-control studies is as a stochastic estimate of exposed-to-nonexposed person-time in the underlying study population (incidence density sampling). This permits a meaningful statistical estimate of the rate ratio in the source population – no rare disease assumption required [Citation20]. It also provides a scientific basis for selecting controls in case-control studies. In effect, properly conducted case-control studies provide the same estimates of risk as properly conducted cohort studies.
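A brief simulation can make the incidence density sampling argument concrete. In the hypothetical source population below (all rates and person-time figures invented), controls are drawn with probability proportional to person-time, and the resulting exposure odds ratio recovers the true rate ratio with no rare-disease assumption invoked.

```python
import random

random.seed(1)

# Hypothetical source population; all figures are invented purely to
# illustrate the sampling argument, not drawn from any real study.
PT_EXPOSED, PT_UNEXPOSED = 40_000.0, 60_000.0   # person-years at risk
RATE_UNEXPOSED = 0.002                          # events per person-year
TRUE_RATE_RATIO = 3.0
RATE_EXPOSED = TRUE_RATE_RATIO * RATE_UNEXPOSED

# Cases arise in proportion to rate * person-time (expected counts used
# deterministically here to keep the sketch short).
cases_exposed = RATE_EXPOSED * PT_EXPOSED        # 240 expected cases
cases_unexposed = RATE_UNEXPOSED * PT_UNEXPOSED  # 120 expected cases

# Incidence density sampling: each control is drawn with probability
# proportional to person-time, so the control series estimates the
# exposed:unexposed person-time ratio in the source population.
n_controls = 100_000
p_exposed = PT_EXPOSED / (PT_EXPOSED + PT_UNEXPOSED)
controls_exposed = sum(random.random() < p_exposed for _ in range(n_controls))
controls_unexposed = n_controls - controls_exposed

# The exposure odds ratio then estimates the rate ratio directly.
odds_ratio = (cases_exposed / cases_unexposed) / (controls_exposed / controls_unexposed)
print(f"estimated rate ratio: {odds_ratio:.2f}")  # close to the true value of 3.0
```

The estimate lands near 3.0 because the control series stands in for person-time, not for ‘non-diseased people’; this is the statistical sense in which a well-conducted case-control study mirrors its underlying cohort.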

2.6. Case reports

What of the lowly case report? Because premarketing RCTs are conducted in a limited number of volunteers who are followed for relatively short periods of time, it is not possible to know about all potential adverse effects of a drug before it is marketed. Therefore, many serious adverse drug reactions are discovered only after a drug is released for general consumption. An essential element of discovering previously unrecognized adverse reactions postmarketing is the ‘lowly’ case report. In fact, case reports have provided the primary evidence for approximately two-thirds (66.1%) of market withdrawals of medicinal products launched since 1950 [Citation21]. The remainder of withdrawals have relied on systematic reviews (2.1%), randomized studies (8.7%), non-randomized studies (10.5%), and mechanism-based reasoning (12.6%) [Citation21]. These historical percentages may no longer fully reflect current practices because they include information from the 1950s. Note that the milestone RCT on streptomycin for the treatment of pulmonary tuberculosis was not completed until 1948 [Citation22]. Nonetheless, case reports remain an important and necessary element of adverse drug reaction discovery and explanation.

3. Conclusion

Rescher defines oversimplification as ‘simplifying matters beyond the warrant of the functional requirements of the particular situation at hand’ [Citation23]. Throughout this essay, I have tried to make the point that the ‘RCT gold standard’ and its associated hierarchy are oversimplifications that do not serve the functional requirements of drug safety research. This is not to say that there aren’t times when oversimplifications might be appropriate and even necessary. For example, oversimplification may be required in a scientific discipline when it has not yet developed the theories or sophistication to better reflect the particular situation [Citation23]. However, in my opinion, this is not the case in today’s epidemiology, where more sophisticated means of assessing bias exist [Citation24].

An additional example in which oversimplification may be necessary is in teaching situations where it may function as ‘a heuristic way station to more adequate subsequent treatment’ [Citation23]. However, it is my experience that the RCT gold standard paradigm is used beyond the entry-level classroom in venues such as medical journals, the formulation of medical protocols, and in courtrooms. In these practice venues, it is best to ditch the oversimplifying cliché of evidence hierarchies and instead judge each study on its own merits. To do otherwise does a disservice to discourses about drug safety and, ultimately, to patients.

Abbreviations

ITT = intention-to-treat; RCT = randomized controlled trial; PPI = proton pump inhibitor

Declaration of interests

BB Gerstman serves as an expert witness on behalf of the plaintiffs in Proton-Pump Inhibitor Multi-District Litigation 2789, United States District Court, District of New Jersey. He has received no fees for writing this commentary and has not shared this manuscript before its publication. The author has no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

Reviewer disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

Additional information

Funding

This paper was not funded.

References

  • Cartwright N. Are RCTs the gold standard? BioSocieties. 2007 Mar 01;2(1):11–20. DOI:10.1017/S1745855207005029
  • Ho PM, Peterson PN, Masoudi FA. Evaluating the evidence: is there a rigid hierarchy? Circulation. 2008 Oct 14;118(16):1675–1684. PubMed PMID: 18852378; eng. DOI:10.1161/circulationaha.107.721357
  • Hernán MA, Hernández-Díaz S, Robins JM. Randomized trials analyzed as observational studies. Ann Intern Med. 2013 Oct 15;159(8):560–562. PubMed PMID: 24018844; PubMed Central PMCID: PMCPMC3860874. eng. DOI:10.7326/0003-4819-159-8-201310150-00709
  • Cornfield J. Statistical relationships and proof in medicine (Edited by Ernest Rubin). Am Stat. 1954 Dec 5;8:19–21.
  • Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA. 2005;294(2):218–228.
  • Eikelboom JW, Connolly SJ, Bosch J, et al. Rivaroxaban with or without aspirin in stable cardiovascular disease. N Engl J Med. 2017 Oct 5;377(14):1319–1330. PubMed PMID: 28844192. DOI:10.1056/NEJMoa1709118
  • Moayyedi P, Eikelboom JW, Bosch J, et al. Safety of proton pump inhibitors based on a large, multi-year, randomized trial of patients receiving rivaroxaban or aspirin. Gastroenterology. 2019;157(3):682–691. PubMed PMID: 31152740; eng. DOI:10.1053/j.gastro.2019.05.056
  • Gerstman BB. Proton pump inhibitors and chronic kidney disease: reevaluating the evidence from a randomized controlled trial. Pharmacoepidemiol Drug Saf. 2021 Sep 9;30(1):4–8. PubMed PMID: 32909330; eng. DOI:10.1002/pds.5101
  • Krauss A. Why all randomised controlled trials produce biased results. Ann Med. 2018 Jun;50(4):312–322. PubMed PMID: 29616838; eng. DOI:10.1080/07853890.2018.1453233
  • Walach H, Loef M. Using a matrix-analytical approach to synthesizing evidence solved incompatibility problem in the hierarchy of evidence. J Clin Epidemiol. 2015 Nov;68(11):1251–1260. PubMed PMID: 26148834; eng. DOI:10.1016/j.jclinepi.2015.03.027
  • Miettinen OS. The need for randomization in the study of intended effects. Stat Med. 1983 Apr;2(2):267–271. PubMed PMID: 6648141. DOI:10.1002/sim.4780020222
  • Vandenbroucke JP. When are observational studies as credible as randomised trials? Lancet. 2004 May 22;363(9422):1728–1731.
  • Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000 Jun 22;342(25):1887–1892. PubMed PMID: 10861325; PubMed Central PMCID: PMCPMC1557642. eng. DOI:10.1056/nejm200006223422507
  • Papanikolaou PN, Christidi GD, Ioannidis JP. Comparison of evidence on harms of medical interventions in randomized and nonrandomized studies. CMAJ. 2006 Feb 28;174(5):635–641. PubMed PMID: 16505459; PubMed Central PMCID: PMCPMC1389826. eng. DOI:10.1503/cmaj.050873
  • Writing Group for the Women’s Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. JAMA. 2002 Jul 17;288(3):321–333. DOI:10.1001/jama.288.3.321
  • Stampfer M, Colditz G, Willett W, et al. Postmenopausal estrogen therapy and cardiovascular disease. Ten-year follow-up from the nurses’ health study. N Engl J Med. 1991 Sep 12;325(11):756–762.
  • Hernán MA, Alonso A, Logan R, et al. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology. 2008;19(6):766–779. PubMed PMID: PMC3731075. DOI:10.1097/EDE.0b013e3181875e61
  • Miettinen OS. The “case-control” study: valid selection of subjects. J Chron Dis. 1985;38(7):543–548. PubMed PMID: 4008595. DOI:10.1016/0021-9681(85)90039-6
  • Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. Third ed. Baltimore: LWW; 2008.
  • Miettinen O. Estimability and estimation in case-referent studies. Am J Epidemiol. 1976;103(2):226–235. PubMed PMID: 1251836. DOI:10.1093/oxfordjournals.aje.a112220
  • Onakpoya IJ, Heneghan CJ, Aronson JK. Post-marketing withdrawal of 462 medicinal products because of adverse drug reactions: a systematic review of the world literature. BMC Med. 2016 Feb 04;14(1):10. PubMed PMID: PMC4740994. DOI:10.1186/s12916-016-0553-2
  • Medical Research Council. Streptomycin treatment of pulmonary tuberculosis. Br Med J. 1948;2(4582):769–782. PubMed PMID: 18890300; eng. DOI:10.1136/bmj.2.4582.769
  • Rescher N. Oversimplification. Belgrade Philosophical Annual. 2014;27(27):85–91.
  • Savitz DA, Wellenius GA, Trikalinos TA. The problem with mechanistic risk of bias assessments in evidence synthesis of observational studies and a practical alternative: assessing the impact of specific sources of potential bias. Am J Epidemiol. 2019;188(9):1581–1585.
