ABSTRACT
Effect size is the basis of much evidence-based education policymaking. In particular, it is assumed to measure the educational effectiveness of interventions. Policy is being driven by the influential work of John Hattie, the Education Endowment Foundation, and others, which is grounded in this assumption. This article demonstrates that the assumption is false and notes that, when criticized, proponents either attempt to inoculate themselves by listing (without checking) assumptions or use the specious reasoning that, however flawed their argument, no one has disproved their conclusions.
Disclosure statement
No potential conflict of interest was reported by the author.
ORCID
Adrian Simpson http://orcid.org/0000-0002-3796-5506
Notes
1. Throughout this paper, “effect size” will stand for “standardised effect size” (as it does in the literature cited). While not exempt from all critique, raw effect size (where, e.g., differences in scores are not scaled by a measure of variance) is not subject to some of the most serious problems noted here.
2. The argument from some meta-analysts is that all of the different design influences will “wash out” when a large number of studies is combined. But this relies on the obviously absurd assumption that studies are drawn independently, at random, and with the same distribution from a population of those design decisions: control activities, intervention-testing intervals, dosages, and so forth. As if, for example, the tests used in studies were selected at random from a population of possible tests so that the number of open-answer versus multiple-choice tests “balances out” in some way (see Berk, 2007, 2011).
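The point can be illustrated with a minimal simulation (not from the article; the effect sizes, bias magnitude, and sampling proportions are invented for illustration). If a design choice, such as using a researcher-designed test, systematically inflates effect sizes and studies are not sampled uniformly over design choices, then combining more studies does not "wash out" the bias; the meta-analytic mean simply converges on the biased value.

```python
import random
import statistics

random.seed(1)

# Assumed, illustrative parameters (not estimates from any real data):
TRUE_EFFECT = 0.2   # "true" standardised effect of the intervention
FORMAT_BIAS = 0.2   # inflation added by, say, a researcher-designed test
P_BIASED = 0.8      # share of studies using the inflating design choice

def simulate_meta_analysis(n_studies: int) -> float:
    """Mean effect size across studies whose design choices are NOT
    drawn i.i.d. and uniformly from the space of possible designs."""
    effects = []
    for _ in range(n_studies):
        bias = FORMAT_BIAS if random.random() < P_BIASED else 0.0
        noise = random.gauss(0, 0.05)  # within-study sampling error
        effects.append(TRUE_EFFECT + bias + noise)
    return statistics.mean(effects)

# A large meta-analysis converges on TRUE_EFFECT + P_BIASED * FORMAT_BIAS
# (0.36 here), not on the true effect of 0.2: the bias never washes out.
print(simulate_meta_analysis(10))
print(simulate_meta_analysis(100_000))
```

Only if the design choices were themselves drawn independently and at random with a fixed distribution (here, if `P_BIASED` represented a genuinely balanced sampling of designs centred on zero bias) would the averaging argument go through, which is precisely the assumption the note calls absurd.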
3. Cheung and Slavin (2016) noted that, across their sample of studies, researcher-designed measures resulted in effect sizes on average twice the size of those from studies with independently designed measures. The reason for this follows directly from the thought experiment here: Researchers can (and do) reduce noise by designing tests which target only the impact of the intervention; standardised tests will not do so. Though, of course, researchers can (and do) still select standardised tests to reduce noise and amplify the signal.
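The mechanism is arithmetic: Cohen's d divides the mean difference by a standard deviation, so holding the raw gain fixed while removing variance unrelated to the intervention inflates d. A minimal sketch, with invented numbers (the raw gain and noise levels are assumptions, not figures from Cheung and Slavin):

```python
import random
import statistics

random.seed(2)

RAW_GAIN = 5.0  # assumed raw score advantage of the treatment group

def cohens_d(treat, control):
    """Standardised effect size: mean difference over pooled SD."""
    pooled_sd = statistics.pstdev(treat + control)
    return (statistics.mean(treat) - statistics.mean(control)) / pooled_sd

def scores(n, noise_sd, gain=0.0):
    """Test scores: a fixed intervention gain plus variance from all the
    other content the test happens to measure."""
    return [gain + random.gauss(50, noise_sd) for _ in range(n)]

# Same raw gain, different tests: the broad test measures much content
# untouched by the intervention (high noise_sd); the targeted test does not.
for noise_sd, label in [(20, "broad standardised test"),
                        (8, "narrow researcher-designed test")]:
    d = cohens_d(scores(5000, noise_sd, RAW_GAIN), scores(5000, noise_sd))
    print(f"{label}: d = {d:.2f}")
```

With these numbers the narrow test yields a standardised effect size roughly 2.5 times that of the broad test, even though the intervention's raw impact is identical in both cases.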