Methodological Studies

A Recipe for Disappointment: Policy, Effect Size, and the Winner’s Curse

Pages 643-662 | Received 25 Mar 2021, Accepted 31 Mar 2022, Published online: 07 Jun 2022

