
On the Accuracy of Replication Failure Rates


References

  • Anderson, S. F., & Maxwell, S. E. (2016). There's more than one way to conduct a replication study: Beyond statistical significance. Psychological Methods, 21(1), 1–12. https://doi.org/10.1037/met0000051
  • Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452–454. https://doi.org/10.1038/533452a
  • Bakker, M., Hartgerink, C. H. J., Wicherts, J. M., & van der Maas, H. L. J. (2016). Researchers' Intuitions about power in psychological research. Psychological Science, 27(8), 1069–1077. https://doi.org/10.1177/0956797616647519
  • Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10. https://doi.org/10.1038/s41562-017-0189-z
  • Bonett, D. G. (2021). Design and analysis of replication studies. Organizational Research Methods, 24(3), 513–529. https://doi.org/10.1177/1094428120911088
  • Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., Grange, J. A., Perugini, M., Spies, J. R., & van't Veer, A. (2014). The replication recipe: What makes for a convincing replication?. Journal of Experimental Social Psychology, 50, 217–224. https://doi.org/10.1016/j.jesp.2013.10.005
  • Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews. Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475
  • Camerer, C. F., Dreber, A., Forsell, E., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Almenberg, J., Altmejd, A., Chan, T., Heikensten, E., Holzmeister, F., Imai, T., Isaksson, S., Nave, G., Pfeiffer, T., Razen, M., & Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science (New York, N.Y.), 351(6280), 1433–1436. https://doi.org/10.1126/science.aaf0918
  • Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z
  • Cheng, Y., Gao, D., & Tong, T. (2015). Bias and variance reduction in estimating the proportion of true-null hypotheses. Biostatistics (Oxford, England), 16(1), 189–204. https://doi.org/10.1093/biostatistics/kxu029
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
  • Collins, H. (1985). Changing order: Replication and induction in scientific practice. Sage Publications.
  • Connor, S. (2015). Study reveals that a lot of psychology research really is just ‘psycho-babble’. The Independent.
  • Cooper, H. M., Hedges, L. V., & Valentine, J. C. (2019). The handbook of research synthesis and meta-analysis (3rd ed.). The Russell Sage Foundation.
  • Cooper, M. L. (2016). Editorial. Journal of Personality and Social Psychology, 110(3), 431–434. https://doi.org/10.1037/pspp0000033
  • Dennis, M. L., Lennox, R. D., & Foss, M. A. (1997). Practical power analysis for substance abuse health services research. In K. J. Bryant (Ed.), The science of prevention: Methodological advances from alcohol and substance abuse research. American Psychological Association.
  • Dickersin, K. (1997). How important is publication bias? A synthesis of available data. AIDS Education and Prevention, 9(1 Suppl), 15–21.
  • Edwards, A. W. P. (1960). The meaning of binomial distribution. Nature, 186, 1074. https://doi.org/10.1038/1861074a0
  • Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the Reproducibility Project: Psychology. PloS One, 11(2), e0149794. https://doi.org/10.1371/journal.pone.0149794
  • Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on "Estimating the reproducibility of psychological science". Science (New York, N.Y.), 351(6277), 1037. https://doi.org/10.1126/science.aad7243
  • Hartgerink, C. H. J., Wicherts, J. M., & van Assen, M. A. L. M. (2017). Too good to be false: Nonsignificant results revisited. Collabra: Psychology, 3(1), 9. https://doi.org/10.1525/collabra.71
  • Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Academic Press.
  • Hedges, L. V., & Schauer, J. M. (2019a). More than one replication study is needed for unambiguous tests of replication. Journal of Educational and Behavioral Statistics, 44(5), 543–570. https://doi.org/10.3102/1076998619852953
  • Hedges, L. V., & Schauer, J. M. (2019b). Statistical analyses for studying replication: Meta-analytic perspectives. Psychological Methods, 24(5), 557–570. https://doi.org/10.1037/met0000189
  • Hedges, L. V., & Schauer, J. M. (2021). The design of replication studies. Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(3), 868–886. https://doi.org/10.1111/rssa.12688
  • Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3(4), 486–504. https://doi.org/10.1037/1082-989X.3.4.486
  • Hsueh, H-m., Chen, J. J., & Kodell, R. L. (2003). Comparison of methods for estimating the number of true null hypotheses in multiplicity testing. Journal of Biopharmaceutical Statistics, 13(4), 675–689. https://doi.org/10.1081/BIP-120024202
  • Hung, K., & Fithian, W. (2020). Statistical methods for replicability assessment. Annals of Applied Statistics, 14(3), 1063–1087. https://doi.org/10.1214/20-aoas1336
  • Jiang, H., & Doerge, R. W. (2008). Estimating the proportion of true null hypotheses for multiple comparisons. Cancer Informatics, 6(25), 25–32.
  • Klein, R. A., Cook, C. L., Ebersole, C. R., Vitiello, C. A., Nosek, B. A., Chartier, C. R., … Ratliff, K. A. (2019). Many Labs 4: Failure to replicate mortality salience effect with and without original author involvement. PsyArXiv. https://psyarxiv.com/vef2c
  • Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W. E., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E. M., … Nosek, B. A. (2014). Investigating variation in replicability: A “many labs” replication project. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178
  • Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Alper, S., Aveyard, M., Axt, J. R., Babalola, M. T., Bahník, Š., Batra, R., Berkics, M., Bernstein, M. J., Berry, D. R., Bialobrzeska, O., Binan, E. D., Bocian, K., Brandt, M. J., Busching, R., … Nosek, B. A. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490. https://doi.org/10.1177/2515245918810225
  • Laird, N. M., & Mosteller, F. (1990). Some statistical methods for combining experimental results. International Journal of Technology Assessment in Health Care, 6(1), 5–30. https://doi.org/10.1017/s0266462300008916
  • Langaas, M., Lindqvist, B. H., & Ferkingstad, E. (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(4), 555–572. https://doi.org/10.1111/j.1467-9868.2005.00515.x
  • Le Cam, L. (1960). An approximation theorem for the Poisson binomial distribution. Pacific Journal of Mathematics, 10(4), 1181–1197. https://doi.org/10.2140/pjm.1960.10.1181
  • Mathur, M. B., & VanderWeele, T. J. (2019). Challenges and suggestions for defining replication “success” when effects may be heterogeneous: Comment on Hedges and Schauer (2019). Psychological Methods, 24(5), 571–575. https://doi.org/10.1037/met0000223
  • Mathur, M. B., & VanderWeele, T. J. (2020). New statistical metrics for multisite replication projects. Journal of the Royal Statistical Society: Series A (Statistics in Society), 183(3), 1145–1166. https://doi.org/10.1111/rssa.12572
  • Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? The American Psychologist, 70(6), 487–498. https://doi.org/10.1037/a0039400
  • McNutt, M. (2014). Reproducibility. Science (New York, N.Y.), 343(6168), 229. https://doi.org/10.1126/science.1250475
  • McShane, B. B., & Böckenholt, U. (2014). You cannot step into the same river twice: When power analyses are optimistic. Perspectives on Psychological Science, 9(6), 612–625. https://doi.org/10.1177/1745691614548513
  • McShane, B. B., Böckenholt, U., & Hansen, K. T. (2016). Adjusting for publication bias in meta-analysis: An evaluation of selection methods and some cautionary notes. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 11(5), 730–749. https://doi.org/10.1177/1745691616662243
  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
  • Oyeniran, O., & Chen, H. (2016). Estimating the proportion of true null hypotheses in multiple testing problems. Journal of Probability and Statistics, 1, 1–7. https://doi.org/10.1155/2016/3937056
  • Pashler, H., & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 7(6), 531–536. https://doi.org/10.1177/1745691612463401
  • Patil, P., Peng, R. D., & Leek, J. T. (2016). What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 11(4), 539–544. https://doi.org/10.1177/1745691616646366
  • Rossi, J. S. (2013). Statistical power analysis. In J.A. Schinka & W.F. Velicer (Eds.), (I.B. Weiner, Editor-in-Chief), Handbook of Psychology.: Research Methods in Psychology (2nd ed., Vol. 2, pp. 71–108). John Wiley & Sons.
  • Rothstein, H., Sutton, A. J., & Borenstein, M. (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Wiley.
  • Schauer, J. M. (2018). Statistical methods for assessing replication: A meta-analytic framework (Doctoral dissertation). Northwestern University, Evanston, IL.
  • Schauer, J. M., & Hedges, L. V. (2021). Reconsidering statistical methods for assessing replication. Psychological Methods, 26(1), 127–139. https://doi.org/10.1037/met0000302
  • Schauer, J. M., Fitzgerald, K. G., Peko-Spicer, S., Whalen, M. C. R., Zejnullahi, R., & Hedges, L. V. (2021). An evaluation of statistical methods for aggregate patterns of replication failure. Annals of Applied Statistics, 15(1), 208–229. https://doi.org/10.1214/20-AOAS1387
  • Schweinsberg, M., Madan, N., Vianello, M., Sommer, S. A., Jordan, J., Tierney, W., Awtrey, E., Zhu, L. L., Diermeier, D., Heinze, J. E., Srinivasan, M., Tannenbaum, D., Bivolaru, E., Dana, J., Davis-Stober, C. P., du Plessis, C., Gronau, Q. F., Hafenbrack, A. C., Liao, E. Y., … Uhlmann, E. L. (2016). The pipeline project: Pre-publication independent replications of a single laboratory’s research pipeline. Journal of Experimental Social Psychology, 66, 55–67. https://doi.org/10.1016/j.jesp.2015.10.001
  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
  • Shepp, L., & Olkin, I. (1981). Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution. In J. Gani, & V. K. Rohatgi (Eds.), Contributions to probability: A collection of papers dedicated to Eugene Lukacs (pp. 201–206). Academic Press.
  • Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559–569. https://doi.org/10.1177/0956797614567341
  • Steiner, P. M., Wong, V. C., & Anglin, K. (2019). A causal replication framework for designing and assessing replication efforts. Zeitschrift Für Psychologie, 227(4), 280–292. https://doi.org/10.1027/2151-2604/a000385
  • Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 489–498. https://doi.org/10.1111/1467-9868.00346
  • Student (1931). The Lanarkshire milk experiment. Biometrika, 23(3/4), 398–406. https://doi.org/10.2307/2332424
  • Tamhane, A., & Shi, J. (2009). Parametric mixture models for estimating the proportion of true null hypotheses and adaptive control of FDR. Lecture Notes-Monograph Series, 57, 304–325.
  • Tryon, W. W. (2016). Replication is about effect size: Comment on Maxwell, Lau, and Howard (2015). The American Psychologist, 71(3), 236–237. https://doi.org/10.1037/a0040191
  • Valentine, J. C., Biglan, A., Boruch, R. F., Castro, F. G., Collins, L. M., Flay, B. R., Kellam, S., Mościcki, E. K., & Schinke, S. P. (2011). Replication in prevention science. Prevention Science: The Official Journal of the Society for Prevention Research, 12(2), 103–117. https://doi.org/10.1007/s11121-011-0217-6
  • van Aert, R. C., & Van Assen, M. A. (2017). Bayesian evaluation of effect size after replicating an original study. PloS One, 12(4), e0175302. https://doi.org/10.1371/journal.pone.0175302
  • Vankov, I., Bowers, J., & Munafò, M. R. (2014). On the persistence of low power in psychological science. Quarterly Journal of Experimental Psychology (2006), 67(5), 1037–1040. https://doi.org/10.1080/17470218.2014.885986
  • Verhagen, J., & Wagenmakers, E.-J. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology. General, 143(4), 1457–1475. https://doi.org/10.1037/a0036731
  • Wang, Y. H. (1993). On the number of successes in independent trials. Statistica Sinica, 3(2), 295–312.
  • West, S. G., & Thoemmes, F. (2010). Campbell's and Rubin's perspectives on causal inference. Psychological Methods, 15(1), 18–37. https://doi.org/10.1037/a0015917
  • Wood, P., & Randall, D. (2018). How bad is the government’s science? Wall Street Journal. Retrieved from https://www.wsj.com/articles/how-bad-is-the-governments-science-1523915765
  • Yong, E. (2016). The inevitable evolution of bad science. The Atlantic. Retrieved from https://www.theatlantic.com/science/archive/2016/09/the-inevitable-evolution-of-bad-science/500609/.
