
What is meant by “rigour” in evidence-based educational policy and what’s so good about it?

Pages 63-80 | Published online: 02 Jun 2019
 

ABSTRACT

Across the evidence-based policy and practice (EBPP) community, including education, randomised controlled trials (RCTs) rank as the most “rigorous” evidence for causal conclusions. This paper argues that this ranking is misleading. Only narrow conclusions about study populations can be warranted with the kind of “rigour” that RCTs excel at. Educators need a great deal more information to predict whether a programme will work for their pupils. It is unlikely that that information can be obtained with EBPP-style rigour. So educators should not be overly optimistic about success with programmes that have been “rigorously” tested. I close with a plea to the EBPP community to take on the job of identifying and vetting the information educators need in practice.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

1. When it comes to evaluating particular studies, some sites will lower the rating given to studies that employed designs from higher up the hierarchy (like RCTs) but conducted them badly.

2. Clearly this is not the case in all contexts. After all, the fourth meaning in the Compact Oxford English Dictionary (2003) for “rigorous” is “harsh or severe”, as in “rigorous imprisonment” or “a rigorous system of fasts”. In the context of education, the use of the term in the US Common Core State Standards has been criticised for just this connotation (e.g., Yatvin, Citation2012).

3. Where of course what is “right” depends on the kind of study and the kind of result, and nothing is ever established with certainty.

4. There are also further sources of confounding that are routinely taken into account, like differential dropout. What is not, and cannot be, taken account of are the causal factors that are unknown. Also, there are often so many possible sources of confounding that we have trouble controlling even for those that are known, at least known in the abstract. For an analogy, consider the advice in a physics experiment to control for all sources of force other than the one under study. But just what, in this experimental setting, are those?

5. Or differ in whatever systematic way the inference entails.

6. Sites do suggest caveats, for instance, assuming the claim is restricted to certain domains (often only implicitly specified) or holds “ceteris paribus” (as is typical with all but the most fundamental claims).

7. See, for example, the text Meta-Analysis: A Comparison of Approaches by Ralph Schulze (Citation2004), which explains “[the rise in the interest in meta-analysis] was associated with calls for more procedural and statistical rigor in the preparation of literature reviews. It is this rigor that still most prototypically marks the difference between traditional reviews and meta-analysis” (p. 4). Also, meta-analyses and systematic reviews generally appear at the very top of ranking schemes for “quality of evidence”; Cochrane Collaboration meta-analyses and systematic reviews are widely credited with adhering to high standards of methodological rigour; and so forth.

8. Or, more carefully, that the treatment is orthogonal to both the net effect of all the factors that interact with it in producing the outcome plus the net effect of all those causes that act separately from it.
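
As an illustration of this orthogonality condition, here is a minimal sketch in a simple additive model; the notation is mine, not the paper's. Write the outcome for unit $i$ as

$$Y_i = a_i + b_i T_i,$$

where $T_i \in \{0, 1\}$ is treatment status, $b_i$ is the net effect of the factors that interact with the treatment in producing the outcome, and $a_i$ is the net effect of the causes that act separately from it. If randomisation succeeds in making $T$ orthogonal to both $a$ and $b$, then

$$\mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0] = \mathbb{E}[b],$$

that is, the observed mean difference identifies the average treatment effect in the study population, and nothing stronger.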

9. An anonymous referee wishes to add “the rationales and assumptions underpinning the approach(es) to be used”. One can indeed sensibly decide to count these as part of what characterises a method. But they will play no role in the discussion here, so I will not lay them out for RCTs. The referee also suggests adding random sampling from a target population to the procedure. This would class a narrower set of experiments as RCTs. My own view is to take these to be different but nested methods. I would not want to restrict “RCT” to the narrower set, since then most education experiments would not count, nor would most in other areas of EBPP.

10. In education, for instance, there is the WWC (Citation2005) Key Items To Get Right When Conducting a Randomized Controlled Trial in Education, and in the UK, the National Foundation for Educational Research’s A Guide to Running Randomised Controlled Trials for Educational Researchers (Hutchison & Styles, Citation2010) or David Torgerson and Carole Torgerson’s Designing Randomised Trials in Health, Education and the Social Sciences: An introduction (Citation2008).

11. For more on this, see Deaton and Cartwright (Citation2018).

12. Of course, the procedures we use for assignment may not genuinely “randomise”. Also, recall, we are here talking about orthogonality at baseline, not balance.
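
To make the orthogonality-versus-balance distinction concrete, here is a hypothetical simulation (the covariate, sample size, and thresholds are invented for illustration): random assignment gives zero imbalance in expectation, yet any single allocation can still be noticeably imbalanced at baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical baseline covariate (say, prior attainment) for 50 pupils.
x = rng.normal(size=50)

# Repeat the random 25/25 assignment many times and record the
# treatment-minus-control difference in the covariate's mean.
diffs = []
for _ in range(10_000):
    t = rng.permutation(np.r_[np.ones(25), np.zeros(25)]).astype(bool)
    diffs.append(x[t].mean() - x[~t].mean())
diffs = np.array(diffs)

# Orthogonality at baseline: the imbalance averages out to ~0 ...
print(f"mean imbalance across allocations: {diffs.mean():+.4f}")
# ... but balance can fail in any one draw.
print(f"allocations off by more than 0.25 SD: {(np.abs(diffs) > 0.25).mean():.2%}")
```

The fixed 25/25 permutation draw mirrors a simple two-arm trial; with Bernoulli assignment, the group sizes, and hence the imbalance, would vary even more.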

13. Made up to avoid the almost inevitable controversies that surround real cases I might cite as examples.

14. As the BMJ Best Practice EBM Toolkit describes, “GRADE (Grading of Recommendations, Assessment, Development and Evaluations) … the most widely adopted tool for grading the quality of evidence and for making recommendations with over 100 organisations worldwide officially endorsing GRADE” (https://bestpractice.bmj.com/info/toolkit/learn-ebm/what-is-grade/).

15. Note, though, that the discussion there focuses on “methodological limitations of a study, imprecision, inconsistency and indirectness”, with examples of widely acknowledged types, like failures of “allocation concealment and blinding”, not the other kinds of on-the-ground differences that we might worry about in real educational trials.

16. To put this a bit more carefully: no probabilistic dependence between the treatment and the net effect of the other causal factors.

17. For instance, in education, see Harris and Jones (Citation2018) or Keddie (Citation2019). For a book-length discussion in another domain (international HIV/AIDS policy), see Seckinelgin (Citation2017). For an account of what must hold for the same ATE to obtain in different populations, see Cartwright and Hardie (Citation2012), and for a more general account of the transportability of causal results see Bareinboim and Pearl (Citation2012, Citation2013).

18. A discussion of this can be found in the article “Bayesian Epistemology” in The Stanford Encyclopedia of Philosophy (Talbott, Citation2008).

19. See the Stanford Encyclopedia of Philosophy article “Confirmation theory” (Crupi, Citation2015) for discussion.

20. For discussion, see the Stanford Encyclopedia of Philosophy article “Karl Popper” (Thornton, Citation2018).

21. Or an agglomerated average across a collection of study populations.
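
As a worked toy example of such agglomeration (the numbers are invented): with study-population weights $w_k$,

$$\mathrm{ATE}_{\mathrm{agg}} = \sum_k w_k \, \mathrm{ATE}_k, \qquad \text{e.g.}\quad 0.5 \times 0.4 + 0.5 \times (-0.2) = 0.1,$$

so a positive agglomerated average is compatible with the programme having harmed pupils in one of the contributing populations.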

22. See Mill (1836/Citation1967, pp. 309–340) and Mill (1843/Citation1973).

