Editorials

Response to “Why all randomized controlled trials produce biased results”

Pages 545-548 | Received 19 Jun 2018, Accepted 16 Aug 2018, Published online: 12 Sep 2018

In April 2018, the Annals of Medicine published a provocative piece titled “Why all randomized controlled trials produce biased results” [Citation1]. While the strengths and limitations of RCTs remain a fertile discussion topic, the piece in question leaves much to be desired. In many places, it ignores prior literature that explains or refutes several of its key points. Absent a formal response, the piece may sow confusion about RCTs in the research community.

The piece begins by stating that trials are perceived to be exempt from strong theoretical assumptions, methodological biases, and the influence of researchers. Indeed, many researchers and consumers have limited understanding of the complexities in trial design and analyses. However, considering the thousands of peer-reviewed manuscripts, dozens of textbooks, semester-long courses, and even professional societies dedicated to various aspects of trial design, conduct, and analysis, this statement trivializes an entire subfield of methodological research and innovation in clinical trials. Furthermore, trials published long enough ago to have accumulated large citation counts predate modern innovations in trial design and reporting, making the “10 most cited trials of all time” a poor sample for comprehensively assessing the quality of trial design, conduct, and reporting, or the degree of bias present in trial results.

The piece states that the simple-treatment-at-the-individual-level limitation is a constraint of RCTs not yet thoroughly discussed and notes that randomization is infeasible for many scientific questions. This, however, is not relevant to the claim that all RCTs produce biased results; it merely suggests that we should not use randomized controlled trials for questions where they are not applicable. Furthermore, the piece states that randomized trials cannot generally be conducted in cases with multiple and complex treatments or outcomes simultaneously that often reflect the reality of medical situations. This statement ignores a great deal of innovation in trial designs, including some very agile and adaptable designs capable of evaluating multiple complex treatments and/or outcomes across variable populations.

Some of these innovations include Bayesian statistical approaches which allow continuous monitoring, adaptive-enrichment designs that enable trial populations to evolve based on early feedback, pragmatic trials, platform trials, and designs which permit analysis of multiple dynamic or complex endpoints [Citation2–7]. Consider the I-SPY 2 trial [Citation8], which allows treatment combinations to be added, dropped, and combined with others while being tested simultaneously across different target populations (defined by biomarkers), graduating to the next stage when the Bayesian predictive probability of success exceeds the prespecified threshold. The STAMPEDE trial [Citation9–14] is another example: a multiarm, multistage trial evaluating multifactorial treatment combinations in prostate cancer. For an even more ambitious proposal that would fuse several of these principles into a single trial, consider the JAMA article by Angus [Citation15] proposing the randomized, embedded, multifactorial adaptive platform (REMAP) trial. These topics have been discussed in such widely read journals as JAMA, Lancet and the New England Journal of Medicine [Citation16] as well as the more topical Trials, Clinical Trials, and Contemporary Clinical Trials.
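
As a concrete (and deliberately simplified) illustration of one of these ideas, the sketch below estimates a Bayesian predictive probability of success at an interim look, in the spirit of the interim-monitoring approach discussed by Saville et al. [Citation4]. All numbers here (the flat prior, response threshold, interim data, and graduation cutoff) are hypothetical, chosen only to show the mechanics.

```python
import random

def post_prob_gt(a, b, p0, draws=1000):
    """Monte Carlo estimate of Pr(p > p0) under a Beta(a, b) posterior."""
    return sum(random.betavariate(a, b) > p0 for _ in range(draws)) / draws

def predictive_prob_success(s, n, n_max, p0=0.3, win=0.95, sims=400):
    """Predictive probability that, if the trial runs to n_max patients,
    the final analysis declares success (posterior Pr(p > p0) > win),
    starting from a flat Beta(1, 1) prior and s responses in n patients."""
    a, b = 1 + s, 1 + (n - s)
    wins = 0
    for _ in range(sims):
        p = random.betavariate(a, b)          # plausible true response rate
        future_s = sum(random.random() < p for _ in range(n_max - n))
        if post_prob_gt(a + future_s, b + (n_max - n - future_s), p0) > win:
            wins += 1
    return wins / sims

random.seed(1)
# Hypothetical interim look: 14 responses in 30 patients, 60 planned.
pp = predictive_prob_success(s=14, n=30, n_max=60)
print(round(pp, 2))  # an arm might "graduate" when this exceeds, say, 0.90
```

In a design like I-SPY 2, a quantity of this kind is recomputed as data accumulate, so treatment arms can graduate or be dropped without waiting for full enrollment.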

Despite the assertion that the achieving-good-randomization assumption is a foundational and strong assumption of RCTs, there is no requirement for baseline balance in all covariates to draw valid statistical inference from a trial. This myth has been addressed repeatedly over several decades by accomplished statisticians such as Altman [Citation17], Senn [Citation18–21], and Zhao and Berger [Citation22]. Allowing this statement to stand unchallenged merely perpetuates the myth of balance as a necessary condition for valid inference in RCTs.

Paraphrasing the Zhao and Berger paper, under proper random treatment assignment, distributions of all baseline covariates among treatment groups are random. Therefore, random baseline covariate imbalances and random treatment assignment must be accepted or rejected together. It is important to remember that the fundamental goal of randomization in clinical trials is preventing selection bias; this should not be undermined by excessively forced balancing of baseline covariates among treatment groups. As Senn has discussed previously, the standard probability calculations applied to clinical trial results already make an allowance for the fact that the treatment groups will almost certainly be imbalanced.
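
This point can be made with a small simulation of our own (an illustration, not drawn from the cited papers): under proper 1:1 randomization of a covariate that is in truth identically distributed in both arms, a test for "significant imbalance" will flag roughly the nominal 5% of trials, which is exactly the allowance the standard probability calculations already make.

```python
import math
import random

def norm_sf(z):
    """Upper-tail probability of the standard normal, via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def imbalance_pvalue(n_per_arm=100):
    """Randomize 2*n patients 1:1 and test a binary baseline covariate
    (true prevalence 0.5 in both arms) for apparent imbalance."""
    x1 = sum(random.random() < 0.5 for _ in range(n_per_arm))
    x2 = sum(random.random() < 0.5 for _ in range(n_per_arm))
    p_pool = (x1 + x2) / (2 * n_per_arm)
    se = math.sqrt(p_pool * (1 - p_pool) * 2 / n_per_arm)
    if se == 0:
        return 1.0
    z = abs(x1 - x2) / n_per_arm / se
    return 2 * norm_sf(z)

random.seed(7)
trials = 4000
flagged = sum(imbalance_pvalue() < 0.05 for _ in range(trials))
rate = flagged / trials
print(round(rate, 3))  # close to the nominal 0.05 by construction
```

The chance imbalances flagged here are not failures of randomization; they are randomization working as intended.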

The piece pursues the balance question further with an untenable suggestion that researchers should create multiple randomization schedules and, if the first randomization does not yield a balanced allocation, simply randomize again to achieve better balance. Even if we suspend the debate over balance as a required condition for valid inference in RCTs, this proposal is not practical for the majority of RCTs in medicine, in which treatment decisions must be made at a specific point in the disease course (e.g. patients admitted to the hospital with a stroke or heart attack, patients being treated for acute dehydration, patients with active cancer who must begin therapy to retard disease progression, pregnant women). Several trials mentioned in the original piece illustrate this clearly. The trial of acute ischemic stroke patients [Citation23] began enrolling in January 1991 and completed enrollment over three years later; sadly, acute stroke is not a condition that can wait three years for randomization assignments to be handed out. The trial of intensive insulin therapy in critically ill patients [Citation24] began enrolling in February 2000 and completed enrollment in January 2001 after an interim analysis suggested early stopping – for efficacy, no less – another subject the piece fails to acknowledge and for which the “get the full patient cohort and randomize them all before doing anything” approach has no solution. The trial in colorectal cancer [Citation25] enrolled from September 2000 to May 2002 and was halted at an interim analysis that stopped enrollment in one of the treatment arms for safety reasons. In none of these settings would it have been possible to enroll the entire cohort, check for balance, and then re-randomize the patients.

Finally, the piece failed to acknowledge much prior discussion of strategies to achieve balance, including several proposals that would be feasible in the clinical situations discussed above. Techniques such as stratified randomization [Citation26], minimization [Citation27,Citation28] and covariate-adaptive randomization [Citation29–39] have been published and discussed for many years as options to enforce some degree of balance on selected covariates if/when that is considered essential. In particular, consideration of the literature on covariate-adaptive randomization is essential for any discussion of covariate balance and its implication in RCTs.
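
To make minimization concrete, here is a simplified sketch in the spirit of the Pocock-Simon rule [Citation27,Citation28]; the two factors, the biased-coin probability, and the simulated patient stream are all hypothetical. Each new patient is steered toward the arm that would minimize total marginal imbalance across the stratification factors.

```python
import random

def minimization_assign(patient, allocated, factors, p_best=0.8):
    """Simplified Pocock-Simon minimization: assign the new patient to the
    arm minimizing the summed marginal imbalance across factors, using a
    biased coin (probability p_best) to retain some unpredictability."""
    arms = ("A", "B")
    scores = []
    for arm in arms:
        score = 0
        for f in factors:
            level = patient[f]
            counts = {a: sum(1 for q, a2 in allocated
                             if q[f] == level and a2 == a) for a in arms}
            counts[arm] += 1  # hypothetically place this patient here
            score += max(counts.values()) - min(counts.values())
        scores.append(score)
    if scores[0] == scores[1]:
        return random.choice(arms)
    best = arms[0] if scores[0] < scores[1] else arms[1]
    other = arms[1] if best == arms[0] else arms[0]
    return best if random.random() < p_best else other

random.seed(3)
factors = ("sex", "site")
allocated = []
for _ in range(60):  # a stream of 60 sequentially arriving patients
    patient = {"sex": random.choice("MF"), "site": random.choice("123")}
    allocated.append((patient, minimization_assign(patient, allocated, factors)))

n_a = sum(1 for _, arm in allocated if arm == "A")
print(n_a, 60 - n_a)  # arm sizes stay close to balanced
```

Crucially, an algorithm of this kind assigns patients one at a time as they arrive, so it is compatible with exactly the acute clinical settings where the re-randomization proposal fails.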

The piece appears to suggest that the solution to selection bias in RCTs is simply ensuring that everyone involved with the trial is sufficiently blinded. As previously noted in Senn’s 2013 article, full blinding is achieved with the help of randomization; it is not clear how blinding everyone involved with the trial is even possible without randomly assigning the treatments. The piece incorrectly says that small trials are more likely to be biased than large trials. The first confusing element is the focus on total enrollment; as any trialist knows, total enrollment is not the only design factor that determines whether a trial is sufficiently large for precise estimates of treatment effect. The distribution of the primary outcome and the effect size are also critical (and arguably more so) in determining whether a trial is large enough to answer its primary question. Some trials may be adequately sized with 200 patients, whereas others may be inadequately sized even with 2,000 patients.

Moreover, the problem is not actually a problem of bias but rather a problem of precision. Large trials can provide a more precise estimate of the main effect(s) being tested; smaller trials run the risk of being underpowered and thus unable to make a precise statement about whether the main effect size(s) of interest are statistically or clinically significant. Here, a statement about trial results merits clarification: “An example is that the stroke trial with 624 participants in total reports that at three months after the stroke, 54 treated patients died compared to 64 placebo patients – with the main outcome thus being just a difference of 10 deaths.” While technically true – 54 is indeed 10 fewer than 64 – this is a poor representation of the purpose of the statistical analyses performed on RCT results. The very reason for performing probability calculations with RCT results is to determine how likely the observed difference was under the null hypothesis of no difference between the treatments. The difference in the absolute number of events in each arm is only relevant insofar as it informs this calculation. It is entirely possible for a small absolute difference in the number of events to be relatively strong evidence for the superiority of one treatment. The probability is the relevant statistic, not the absolute number of events.
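
To illustrate with hypothetical numbers (and, for the trial-sized example, arm sizes assumed equal purely for illustration), the same absolute difference of 10 events can carry very different evidential weight depending on the denominators and event rates:

```python
import math

def two_prop_p(e1, n1, e2, n2):
    """Two-sided p-value for a two-proportion z-test (normal approximation)."""
    p1, p2 = e1 / n1, e2 / n2
    p_pool = (e1 + e2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))  # = 2 * upper-tail normal probability

# Same absolute difference of 10 events, very different evidence:
print(round(two_prop_p(2, 100, 12, 100), 4))   # rare events: small p-value
print(round(two_prop_p(54, 312, 64, 312), 4))  # common events: much weaker
```

It is the probability calculation, not the raw count of 10, that tells us how surprising the observed split would be under the null hypothesis.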

The piece describes a “unique time period assessment” bias, claiming that most researchers collect only baseline and endline data points and thus assess one average outcome instead of another. This is, in many cases, incorrect, since there are many different types of trial outcomes: one may analyze yes/no outcomes such as survival to hospital discharge, time-to-event outcomes for longer-term survival or freedom from some event, and continuous variables that are often collected as repeated measures, such as changes in blood pressure, cholesterol, or functional capacity. One obvious example is Phase 2 trials of lipid-lowering medications, which typically collect far more than baseline and endline data points, instead obtaining laboratory measurements repeatedly over the study period to capture the change in lipid profile over time [Citation40].

The piece criticizes trials for having unequal follow-up time across participants, but there are three important considerations that were not acknowledged: (a) in medical trials this is often a necessary condition to carry out the trial – otherwise, there would be no way to assess time-to-event or survival outcomes; (b) there are statistical approaches designed to account for this (survival analyses and time offsets being the most obvious examples); and (c) there is no reason to believe this impedes valid inference as long as an appropriate statistical approach is used. For example, in survival analyses, neither Kaplan-Meier curves nor Cox models are disturbed by unequal follow-up time across treatment arms.
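
As a sketch of point (b), the minimal Kaplan-Meier estimator below (with invented follow-up data for a single hypothetical arm) shows how censoring lets patients with unequal follow-up contribute risk time without biasing the survival estimate.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate. `times` are follow-up durations and
    `events` flags (1 = event observed, 0 = censored, e.g. alive at last
    contact). Censored patients simply leave the risk set at their last
    follow-up - this is how unequal follow-up is handled."""
    data = sorted(zip(times, events))
    s, curve = 1.0, []
    n_at_risk = len(data)
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(1 for tt, ev in data if tt == t and ev == 1)  # events at t
        n = sum(1 for tt, ev in data if tt == t)              # leaving at t
        if d:
            s *= 1 - d / n_at_risk
            curve.append((t, round(s, 3)))
        n_at_risk -= n
        i += n
    return curve

# Hypothetical arm with staggered entry, hence unequal follow-up:
times  = [2, 3, 3, 5, 6, 8, 10, 12, 12, 14]
events = [1, 1, 0, 1, 0, 1, 0,  1,  0,  0]
print(kaplan_meier(times, events))
# → [(2, 0.9), (3, 0.8), (5, 0.686), (8, 0.549), (12, 0.366)]
```

The survival curve steps down only at observed events, with each step sized by the number currently at risk, so patients followed for different lengths of time are handled correctly by construction.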

In the discussion of the background-traits-remain-constant assumption, the piece ignores the extensive literature and historical discussion of mediation analyses [Citation41–47] in RCTs; statistical modeling may be applied to estimate the direct effect of treatment versus other indirect effects acting through intermediate variables measured post-randomization. Whether this is advisable remains a subject of debate among statisticians.

The average treatment effects limitation is another much-discussed issue. Concerns about this are typically overblown, as estimates of relative risk, odds, or hazard are generally transportable across subgroups (within the same basic treatment population, that is; we are not referring to nonsense scenarios like performing a heart transplant to improve survival in a patient who needs a hip replacement). Furthermore, as discussed earlier in this article, modern innovations in trial design (such as Bayesian adaptive-enrichment designs) allow the trialist to use ongoing data to determine which patients appear to benefit the most and/or those in whom therapy is futile or even harmful, if there is significant evidence of a heterogeneous treatment effect.
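
A toy calculation (with hypothetical risks) shows what transportability of a relative effect means in practice: the relative risk stays constant across subgroups while the absolute benefit scales with each subgroup's baseline risk.

```python
# Hypothetical subgroups with different baseline (control-arm) risks but a
# shared relative risk of 0.75 under treatment.
subgroups = {"low-risk": 0.08, "high-risk": 0.40}
rr = 0.75

for name, base in subgroups.items():
    treated = base * rr
    print(name,
          "RR =", round(treated / base, 2),
          "| absolute risk reduction =", round(base - treated, 3))
```

The relative effect transports unchanged, while the absolute reduction is five times larger in the high-risk subgroup; this is precisely why a transportable relative measure is the quantity worth estimating.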

We do not intend to suggest that RCTs are unimpeachable. Quite the opposite: RCTs must be planned with careful consideration of the requisite assumptions, monitored with rigor, and analyzed properly to ensure valid statistical inference from the results. We also acknowledge that when RCTs are impractical or unavailable, we must utilize non-RCT evidence to support decisions and draw conclusions about the world around us. We simply prefer that discussion of RCTs be adequately informed and concentrate on legitimate questions and solutions. We hope that this leads to better-informed discussion of the true strengths and weaknesses of RCTs in practice.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Krauss A. Why all randomised controlled trials produce biased results. Ann Med. 2018;50:312–322.
  • Greenhouse JB, Wasserman L. Robust Bayesian methods for monitoring clinical trials. Stat Med. 1995;14:1379–1391.
  • Zhou X, Liu S, Kim ES. Bayesian adaptive design for targeted therapy development in lung cancer—a step toward personalized medicine. Clin Trials. 2008;5:181–193.
  • Simon N, Simon R. Adaptive enrichment designs for clinical trials. Biostatistics. 2013;14:613–625.
  • Saville BR, Connor JT, Ayers GD, et al. The utility of Bayesian predictive probabilities for interim monitoring of clinical trials. Clin Trials. 2014;11:485–493.
  • Berry SM, Connor JT, Lewis RJ. The platform trial: an efficient strategy for evaluating multiple treatments. JAMA. 2015;313:1619–1620.
  • Saville BR, Berry SM. Efficiencies of platform clinical trials: a vision of the future. Clin Trials. 2016;13:358–366.
  • Harrington D, Parmigiani G. I-SPY 2-A glimpse of the future of phase 2 drug development? N Engl J Med. 2016;375:7–9.
  • Sydes MR, Parmar MK, James ND, et al. Issues in applying multi-arm multi-stage methodology to a clinical trial in prostate cancer: the MRC STAMPEDE trial. Trials. 2009;10:39.
  • Sydes MR, Parmar MK, Mason MD, et al. Flexible trial design in practice - stopping arms for lack-of-benefit and adding research arms mid-trial in STAMPEDE: a multi-arm multi-stage randomized controlled trial. Trials. 2012;13:168.
  • Attard G, Sydes MR, Mason MD, et al. Combining enzalutamide with abiraterone, prednisone, and androgen deprivation therapy in the STAMPEDE trial. Eur Urol. 2014;66:799–802.
  • James ND, Sydes MR, Clarke NW, et al. Addition of docetaxel, zoledronic acid, or both to first-line long-term hormone therapy in prostate cancer (STAMPEDE): survival results from an adaptive, multiarm, multistage, platform randomised controlled trial. Lancet. 2016;387:1163–1177.
  • Parmar MK, Sydes MR, Cafferty FH, et al. Testing many treatments within a single protocol over 10 years at MRC Clinical Trials Unit at UCL: multi-arm, multi-stage platform, umbrella and basket protocols. Clin Trials. 2017;14:451–461.
  • Sydes MR, Spears MR, Mason MD, et al. Adding abiraterone or docetaxel to long-term hormone therapy for prostate cancer: directly randomised data from the STAMPEDE multi-arm, multi-stage platform protocol. Ann Oncol. 2018;29:1235–1248.
  • Angus DC. Fusing randomized trials with big data: the key to self-learning health care systems? JAMA. 2015;314:767–768.
  • Bhatt DL, Mehta C. Adaptive designs for clinical trials. N Engl J Med. 2016;375:65–74.
  • Altman DG. Comparability of randomised groups. The Statistician. 1985;34:125–136.
  • Senn SJ. Covariate imbalance and random allocation in clinical trials. Stat Med. 1989;8:467–475.
  • Senn SJ. Testing for baseline balance in clinical trials. Stat Med. 1994;13:1715–1726.
  • Senn S. Controversies concerning randomization and additivity in clinical trials. Stat Med. 2004;23:3729–3753.
  • Senn SJ. Seven myths of randomisation in clinical trials. Stat Med. 2013;32:1439–1450.
  • Zhao W, Berger V. Imbalance control in clinical trial subject randomisation – from philosophy to strategy. J Clin Epidemiol. 2018;101:116–118.
  • Marler J. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med. 1995;333:1581–1588.
  • Van Den Berghe G, Wouters P, Weekers F, et al. Intensive insulin therapy in critically ill patients. N Engl J Med. 2001;345:1359–1367.
  • Hurwitz H, Fehrenbacher L, Novotny W, et al. Bevacizumab plus irinotecan, fluorouracil, and leucovorin for metastatic colorectal cancer. N Engl J Med. 2004;350:2335–2342.
  • Zhao W, Weng Y. Block urn design – a new randomisation algorithm for sequential trials with two or more treatments and balanced or unbalanced allocation. Contemp Clin Trials. 2011;32:953–961.
  • Taves DR. Minimization: a new method of assigning patients to treatment and control groups. Clin Pharmacol Ther. 1974;15:443–453.
  • Pocock SJ, Simon R. Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics. 1975;31:103–115.
  • Ciolino J, Zhao W, Martin R, et al. Quantifying the cost in power of ignoring continuous covariate imbalances in clinical trial randomisation. Contemp Clin Trials. 2011;32:250–259.
  • Hoehler FK. Balancing allocation of subjects in biomedical research: a minimization strategy based on ranks. Comput Biomed Res. 1987;20:209–213.
  • Stigsby B, Taves DR. Rank-minimization for balanced assignment of subjects in clinical trials. Contemp Clin Trials. 2010;31:147–150.
  • Frane JW. A method of biased coin randomisation, its implementation, and its validation. Drug Inf J. 1998;32:423–432.
  • Endo A, Nagatani F, Hamada C, et al. Minimization method for balancing continuous prognostic variables between treatment and control groups using Kullback-Leibler divergence. Contemp Clin Trials. 2006;27:420–431.
  • Su Z. Balancing multiple baseline characteristics in randomised clinical trials. Contemp Clin Trials. 2011;32:547–550.
  • Lin Y, Su Z. Balancing continuous and categorical baseline covariates in sequential clinical trials using the area between empirical cumulative distribution functions. Statist Med. 2012;31:1961–1971.
  • Ma Z, Hu F. Balancing continuous covariates based on kernel densities. Contemp Clin Trials. 2013;34:262–269.
  • Soares JF, Wu CFJ. Some restricted randomization rules in sequential designs. Commun Stat Theor Methods. 1983;12:2017–2034.
  • Chen YP. Biased coin design with imbalance tolerance. Communicat Stat Stoch Models. 1999;15:953–975.
  • Berger VW, Ivanova A, Deloria-Knoll M. Minimizing predictability while retaining balance through the use of less restrictive randomization procedures. Statist Med. 2003;22:3017–3028.
  • Baruch A, Mosesova S, Davis JD, et al. Effects of RG7652, a monoclonal antibody against PCSK9, on LDL-C, LDL-C subfractions, and inflammatory biomarkers in patients at high risk of or with established coronary heart disease (from the Phase 2 EQUATOR Study). Am J Cardiol. 2017;119:1576–1583.
  • Robins J. Correcting for non-compliance in randomised trials using structural nested mean models. Commun Statist Theor Methods. 1994;23:2379–2412.
  • Ten Have T, Joffe M, Cary M. Causal logistic models for non-compliance under randomised treatment with univariate binary response. Statist Med. 2003;22:1255–1284.
  • Robins J, Rotnitzky A. Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models. Biometrika. 2004;91:763–783.
  • Ten Have T, Joffe M, Lynch K, et al. Causal mediation analyses with rank preserving models. Biometrics. 2007;63:926–934.
  • Lynch KG, Cary M, Gallop R, et al. Causal mediation analyses for randomized trials. Health Serv Outcomes Res Methodol. 2008;8:57–76.
  • Kraemer HC, Wilson GT, Fairburn CG, et al. Mediators and moderators of treatment effects in randomized clinical trials. Arch Gen Psychiatry. 2002;59:877–883.
  • Emsley R, Dunn G, White IR. Mediation and moderation of treatment effects in randomised controlled trials of complex interventions. Stat Methods Med Res. 2010;19:237–270.
