
On the Accuracy of Replication Failure Rates


References

  • Anderson, S. F., & Maxwell, S. E. (2016). There's more than one way to conduct a replication study: Beyond statistical significance. Psychological Methods, 21(1), 1–12. https://doi.org/10.1037/met0000051
  • Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452–454. https://doi.org/10.1038/533452a
  • Bakker, M., Hartgerink, C. H. J., Wicherts, J. M., & van der Maas, H. L. J. (2016). Researchers' Intuitions about power in psychological research. Psychological Science, 27(8), 1069–1077. https://doi.org/10.1177/0956797616647519
  • Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10. https://doi.org/10.1038/s41562-017-0189-z
  • Bonett, D. G. (2021). Design and analysis of replication studies. Organizational Research Methods, 24(3), 513–529. https://doi.org/10.1177/1094428120911088
  • Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., Grange, J. A., Perugini, M., Spies, J. R., & van't Veer, A. (2014). The replication recipe: What makes for a convincing replication?. Journal of Experimental Social Psychology, 50, 217–224. https://doi.org/10.1016/j.jesp.2013.10.005
  • Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews. Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475
  • Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., & Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science (New York, N.Y.), 351(6280), 1433–1436. https://doi.org/10.1126/science.aaf0918
  • Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z
  • Cheng, Y., Gao, D., & Tong, T. (2015). Bias and variance reduction in estimating the proportion of true-null hypotheses. Biostatistics (Oxford, England), 16(1), 189–204. https://doi.org/10.1093/biostatistics/kxu029
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
  • Collins, H. (1985). Changing order: Replication and induction in scientific practice. Sage Publications.
  • Connor, S. (2015). Study reveals that a lot of psychology research really is just ‘psycho-babble’. The Independent.
  • Cooper, H. M., Hedges, L. V., & Valentine, J. C. (2019). The handbook of research synthesis and meta-analysis (3rd ed.). The Russell Sage Foundation.
  • Cooper, M. L. (2016). Editorial. Journal of Personality and Social Psychology, 110(3), 431–434. https://doi.org/10.1037/pspp0000033
  • Dennis, M. L., Lennox, R. D., & Foss, M. A. (1997). Practical power analysis for substance abuse health services research. In K. J. Bryant (Ed.), The science of prevention: Methodological advances from alcohol and substance abuse research. American Psychological Association.
  • Dickersin, K. (1997). How important is publication bias? A synthesis of available data. AIDS Education and Prevention, 9(1 Suppl), 15–21.
  • Edwards, A. W. P. (1960). The meaning of binomial distribution. Nature, 186, 1074. https://doi.org/10.1038/1861074a0
  • Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the Reproducibility Project: Psychology. PloS One, 11(2), e0149794. https://doi.org/10.1371/journal.pone.0149794
  • Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on "Estimating the reproducibility of psychological science". Science (New York, N.Y.), 351(6277), 1037. https://doi.org/10.1126/science.aad7243
  • Hartgerink, C. H. J., Wicherts, J. M., & van Assen, M. A. L. M. (2017). Too good to be false: Nonsignificant results revisited. Collabra: Psychology, 3(1), 9. https://doi.org/10.1525/collabra.71
  • Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Academic Press.
  • Hedges, L. V., & Schauer, J. M. (2019a). More than one replication study is needed for unambiguous tests of replication. Journal of Educational and Behavioral Statistics, 44(5), 543–570. https://doi.org/10.3102/1076998619852953
  • Hedges, L. V., & Schauer, J. M. (2019b). Statistical analyses for studying replication: Meta-analytic perspectives. Psychological Methods, 24(5), 557–570. https://doi.org/10.1037/met0000189
  • Hedges, L. V., & Schauer, J. M. (2021). The design of replication studies. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(3), 868–886. https://doi.org/10.1111/rssa.12688
  • Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4), 486–504. https://doi.org/10.1037/1082-989X.3.4.486
  • Hsueh, H-m., Chen, J. J., & Kodell, R. L. (2003). Comparison of methods for estimating the number of true null hypotheses in multiplicity testing. Journal of Biopharmaceutical Statistics, 13(4), 675–689. https://doi.org/10.1081/BIP-120024202
  • Hung, K., & Fithian, W. (2020). Statistical methods for replicability assessment. Annals of Applied Statistics, 14(3), 1063–1087. https://doi.org/10.1214/20-aoas1336
  • Jiang, H., & Doerge, R. W. (2008). Estimating the proportion of true null hypotheses for multiple comparisons. Cancer Informatics, 6(25), 25–32.
  • Klein, R. A., Cook, C. L., Ebersole, C. R., Vitiello, C. A., Nosek, B. A., Chartier, C. R., … Ratliff, K. A. (2019). Many Labs 4: Failure to replicate mortality salience effect with and without original author involvement. PsyArXiv. https://psyarxiv.com/vef2c
  • Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W. E., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E. M., … Nosek, B. A. (2014). Investigating variation in replicability: A “many labs” replication project. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178
  • Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Alper, S., Aveyard, M., Axt, J. R., Babalola, M. T., Bahník, Š., Batra, R., Berkics, M., Bernstein, M. J., Berry, D. R., Bialobrzeska, O., Binan, E. D., Bocian, K., Brandt, M. J., Busching, R., … Nosek, B. A. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490. https://doi.org/10.1177/2515245918810225
  • Laird, N. M., & Mosteller, F. (1990). Some statistical methods for combining experimental results. International Journal of Technology Assessment in Health Care, 6(1), 5–30. https://doi.org/10.1017/s0266462300008916
  • Langaas, M., Lindqvist, B. H., & Ferkingstad, E. (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(4), 555–572. https://doi.org/10.1111/j.1467-9868.2005.00515.x
  • Le Cam, L. (1960). An approximation theorem for the Poisson binomial distribution. Pacific Journal of Mathematics, 10(4), 1181–1197. https://doi.org/10.2140/pjm.1960.10.1181
  • Mathur, M. B., & VanderWeele, T. J. (2019). Challenges and suggestions for defining replication “success” when effects may be heterogeneous: Comment on Hedges and Schauer (2019). Psychological Methods, 24(5), 571–575. https://doi.org/10.1037/met0000223
  • Mathur, M. B., & VanderWeele, T. J. (2020). New statistical metrics for multisite replication projects. Journal of the Royal Statistical Society: Series A (Statistics in Society), 183(3), 1145–1166. https://doi.org/10.1111/rssa.12572
  • Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? The American Psychologist, 70(6), 487–498. https://doi.org/10.1037/a0039400
  • McNutt, M. (2014). Reproducibility. Science (New York, N.Y.), 343(6168), 229. https://doi.org/10.1126/science.1250475
  • McShane, B. B., & Böckenholt, U. (2014). You cannot step into the same river twice: When power analyses are optimistic. Perspectives on Psychological Science, 9(6), 612–625. https://doi.org/10.1177/1745691614548513
  • McShane, B. B., Böckenholt, U., & Hansen, K. T. (2016). Adjusting for publication bias in meta-analysis: An evaluation of selection methods and some cautionary notes. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 11(5), 730–749. https://doi.org/10.1177/1745691616662243
  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
  • Oyeniran, O., & Chen, H. (2016). Estimating the proportion of true null hypotheses in multiple testing problems. Journal of Probability and Statistics, 1, 1–7. https://doi.org/10.1155/2016/3937056
  • Pashler, H., & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 7(6), 531–536. https://doi.org/10.1177/1745691612463401
  • Patil, P., Peng, R. D., & Leek, J. T. (2016). What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 11(4), 539–544. https://doi.org/10.1177/1745691616646366
  • Rossi, J. S. (2013). Statistical power analysis. In J.A. Schinka & W.F. Velicer (Eds.), (I.B. Weiner, Editor-in-Chief), Handbook of Psychology.: Research Methods in Psychology (2nd ed., Vol. 2, pp. 71–108). John Wiley & Sons.
  • Rothstein, H., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Wiley.
  • Schauer, J. M. (2018). Statistical methods for assessing replication: A meta-analytic framework (Doctoral dissertation). Northwestern University, Evanston, IL.
  • Schauer, J. M., & Hedges, L. V. (2021). Reconsidering statistical methods for assessing replication. Psychological Methods, 26(1), 127–139. https://doi.org/10.1037/met0000302
  • Schauer, J. M., Fitzgerald, K. G., Peko-Spicer, S., Whalen, M. C. R., Zejnullahi, R., & Hedges, L. V. (2021). An evaluation of statistical methods for aggregate patterns of replication failure. Annals of Applied Statistics, 15(1), 208–229. https://doi.org/10.1214/20-AOAS1387
  • Schweinsberg, M., Madan, N., Vianello, M., Sommer, S. A., Jordan, J., Tierney, W., Awtrey, E., Zhu, L. L., Diermeier, D., Heinze, J. E., Srinivasan, M., Tannenbaum, D., Bivolaru, E., Dana, J., Davis-Stober, C. P., du Plessis, C., Gronau, Q. F., Hafenbrack, A. C., Liao, E. Y., … Uhlmann, E. L. (2016). The pipeline project: Pre-publication independent replications of a single laboratory’s research pipeline. Journal of Experimental Social Psychology, 66, 55–67. https://doi.org/10.1016/j.jesp.2015.10.001
  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
  • Shepp, L., & Olkin, I. (1981). Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution. In J. Gani, & V. K. Rohatgi (Eds.), Contributions to probability: A collection of papers dedicated to Eugene Lukacs (pp. 201–206). Academic Press.
  • Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559–569. https://doi.org/10.1177/0956797614567341
  • Steiner, P. M., Wong, V. C., & Anglin, K. (2019). A causal replication framework for designing and assessing replication efforts. Zeitschrift Für Psychologie, 227(4), 280–292. https://doi.org/10.1027/2151-2604/a000385
  • Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 489–498. https://doi.org/10.1111/1467-9868.00346
  • Student (1931). The Lanarkshire milk experiment. Biometrika, 23(3/4), 398–406. https://doi.org/10.2307/2332424
  • Tamhane, A., & Shi, J. (2009). Parametric mixture models for estimating the proportion of true null hypotheses and adaptive control of FDR. Lecture Notes-Monograph Series, 57, 304–325.
  • Tryon, W. W. (2016). Replication is about effect size: Comment on Maxwell, Lau, and Howard (2015). The American Psychologist, 71(3), 236–237. https://doi.org/10.1037/a0040191
  • Valentine, J. C., Biglan, A., Boruch, R. F., Castro, F. G., Collins, L. M., Flay, B. R., Kellam, S., Mościcki, E. K., & Schinke, S. P. (2011). Replication in prevention science. Prevention Science: The Official Journal of the Society for Prevention Research, 12(2), 103–117. https://doi.org/10.1007/s11121-011-0217-6
  • van Aert, R. C., & Van Assen, M. A. (2017). Bayesian evaluation of effect size after replicating an original study. PloS One, 12(4), e0175302. https://doi.org/10.1371/journal.pone.0175302
  • Vankov, I., Bowers, J., & Munafò, M. R. (2014). On the persistence of low power in psychological science. Quarterly Journal of Experimental Psychology (2006), 67(5), 1037–1040. https://doi.org/10.1080/17470218.2014.885986
  • Verhagen, J., & Wagenmakers, E.-J. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology. General, 143(4), 1457–1475. https://doi.org/10.1037/a0036731
  • Wang, Y. H. (1993). On the number of successes in independent trials. Statistica Sinica, 3(2), 295–312.
  • West, S. G., & Thoemmes, F. (2010). Campbell's and Rubin's perspectives on causal inference. Psychological Methods, 15(1), 18–37. https://doi.org/10.1037/a0015917
  • Wood, P., & Randall, D. (2018). How bad is the government’s science? Wall Street Journal. Retrieved from https://www.wsj.com/articles/how-bad-is-the-governments-science-1523915765
  • Yong, E. (2016). The inevitable evolution of bad science. The Atlantic. Retrieved from https://www.theatlantic.com/science/archive/2016/09/the-inevitable-evolution-of-bad-science/500609/.
