ABSTRACT
This paper seeks to raise awareness among educational researchers and practitioners of significant weaknesses and internal contradictions of randomised controlled trials (RCTs). Although education scholars have long pointed to the detrimental effects of this experimental approach on education practice and values, RCTs are still considered the gold standard for assessing the impact of education policies and interventions. Drawing on the approach of immanent critique, we elucidate substantial argumentative gaps between the assumptions and applications – that is, between the theory and reality – of RCTs in empirical research. This analytic exercise complements existing critiques, grounded in moral and epistemic principles, that come from outside the experimental discourse. The present paper, in contrast, contributes to the literature by highlighting internal limitations and contradictions that emerge from probing the logic espoused by proponents of RCTs. In fleshing out our argument, we seek to encourage more informed and critical engagement by educators, policymakers, researchers, and other stakeholders when they are confronted with proposals for education programmes and reforms supported by findings from RCTs.
Acknowledgement
We thank Steve Klees, Hikaru Komatsu and anonymous reviewers for their valuable feedback on earlier versions of our manuscript. Any remaining errors are solely ours.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Notes
1. The question of how large is large enough is subject to debate in the literature. RCT practitioners usually attempt to settle it through power calculations, a process that, given other statistical assumptions, computes a sample size with a ‘specified probability of declaring as significant a particular difference or effect’ (Johnson, 1999, p. 767).
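To illustrate the mechanics behind such power calculations, the following is a minimal sketch of the standard textbook normal-approximation formula for a two-arm trial comparing means with a two-sided test. It is not drawn from the paper itself; the effect size of 0.5 standard deviations is a hypothetical input chosen for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size per arm for a two-arm RCT
    comparing means with a two-sided test (a common textbook formula,
    not the procedure of any specific study)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # value securing the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Detecting a hypothetical standardised effect of 0.5 SD at the
# conventional alpha = 0.05 and power = 0.8:
n = sample_size_per_group(0.5)  # 63 participants per arm
```

Note how the computed sample size is conditional on the assumed effect size, significance level, and power: changing any of these 'other statistical assumptions' changes the answer, which is precisely why the question of how large is large enough remains debated.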
2. As Banerjee and Duflo (2009), Nobel laureates in Economic Sciences for their contributions to experimental methods in development economics, put it, ‘[the] fact that the basic experimental results (…) do not depend on the theory for their identification means that a “clean” test of theory (i.e. a test that does not rely on other theories too) may be possible’ (p. 172, emphasis added).
3. It is worth mentioning that RCT users often resort to supplementary techniques, alongside matching exercises, to minimise the selection bias created by attrition. These include, for example, intention-to-treat analysis, last-observation-carried-forward analysis, multiple imputation and worst-case-scenario analyses (Negida, 2017). However, these all rely on assumptions derived from observed variables. In response, Deaton (2010) asserts that ‘[t]here is nothing wrong with such fixes in principle (…) but their application takes us out of the world of ideal RCTs and back into the world of everyday econometrics and statistics’ (p. 447).
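As a concrete illustration of one of these fixes, last observation carried forward (LOCF) simply fills each missing wave of a participant's outcome series with the most recent observed value. The sketch below uses entirely hypothetical panel data; it is meant only to show why the technique rests on an assumption about unobserved values (namely, that scores would have stayed flat after drop-out).

```python
# Hypothetical test-score series for one participant across five waves;
# None marks waves missed due to attrition.
scores = [55.0, None, 61.0, None, None]

# Last observation carried forward: replace each gap with the most
# recent observed value.
filled = []
last = None
for s in scores:
    last = s if s is not None else last
    filled.append(last)

# filled is [55.0, 55.0, 61.0, 61.0, 61.0] -- the final two waves are
# assumed, not observed, which is the point Deaton's remark targets.
```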
4. From a statistical perspective, we refer to the similarity in the distributions of residuals. Residuals represent the disparity between anticipated outcomes based on the treatment (e.g. the expected impact of an intervention on school performance) and the actual results observed in the data. In simpler terms, residuals can be considered as the difference between econometric predictions and the actual outcomes, meaning that they contain information from all the non-observed factors (omitted from the statistical model) that might also affect the observed result.
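In code, this definition of a residual reduces to a subtraction. The sketch below uses hypothetical predicted and observed school-performance scores; each residual bundles together every unobserved influence the model omits.

```python
# Hypothetical model predictions of school performance vs observed scores.
predicted = [62.0, 70.5, 58.3]
observed = [65.0, 68.0, 60.0]

# Residual = observed outcome minus econometric prediction; it absorbs
# the effect of all factors omitted from the statistical model.
residuals = [obs - pred for obs, pred in zip(observed, predicted)]
# residuals are approximately [3.0, -2.5, 1.7]
```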
5. We recognise that non-parametric tests, instead of t-tests, might help solve some of the problems described below. The difficulty with this solution is that using non-parametric tests still entails assuming non-normality in the distribution of errors in econometric models, which contradicts the assumption underlying randomisation.
6. Tikly’s (2015) reflection on the underlying foundations of the teaching excellence discourse also illustrates this point: ‘[E]mpiricist accounts [as represented by RCT studies] are often implicitly informed by a range of normative assumptions despite claims to the contrary. For example, although the evidence relating to the use of incentives and performance-related pay for teachers is mixed, research into the use of [payment-based] incentives to improve teachers’ performance is encouraged (…) despite a range of evidence that improved teacher motivation is more likely the outcome of multiple causes arising from a number of structural conditions within education systems’ (p. 243).
Additional information
Notes on contributors
Juan David Parra
Juan David Parra is Assistant Professor at the Institute of Education Studies, Universidad del Norte (Colombia). He has participated in several evaluation studies and is one of the pioneers of realist evaluation in education in Latin America. In 2022 he joined an international consortium led by the University of Notre Dame as a technical advisor to USAID’s Supporting Holistic and Actionable Research in Education (SHARE) initiative, which advances education learning priorities in low- and middle-income countries.
D. Brent Edwards
D. Brent Edwards Jr. is Professor and Graduate Chair in the Department of Educational Foundations at the University of Hawaii. He has published widely on the global governance of education, education policy and political economy, focusing on middle- and low-income countries. He is the principal investigator for a three-year, $913,000 project funded by the Dubai Cares Foundation entitled “Crisis Management for Disaster Risk Reduction in Education Systems: Learning from the Elaboration and Integration of Technology-Focused Strategies in El Salvador, Honduras, and Colombia.”