Abstract
Over the last decade, large-scale replication projects across the biomedical and social sciences have reported relatively low replication rates. In these large-scale replication projects, replication has typically been evaluated based on a single replication study of some original study and dichotomously as successful or failed. However, evaluations of replicability that are based on a single study and are dichotomous are inadequate, and evaluations of replicability should instead be based on multiple studies, be continuous, and be multi-faceted. Further, such evaluations are in fact possible due to two characteristics shared by many large-scale replication projects. In this article, we provide such an evaluation for two prominent large-scale replication projects, one which replicated a phenomenon from cognitive psychology and another which replicated 13 phenomena from social psychology and behavioral economics. Our results indicate a very high degree of replicability in the former and a medium to low degree of replicability in the latter. They also suggest an unidentified covariate in each, namely ocular dominance in the former and political ideology in the latter, that is theoretically pertinent. We conclude by discussing evaluations of replicability at large, recommendations for future large-scale replication projects, and design-based model generalization. Supplementary materials for this article are available online.
Supplementary Materials
The Supplementary Materials contain data and code to reproduce all results in the article.
Notes
1 All estimates discussed in the text are posterior median estimates, rounded to the nearest integer for those given in ms and ms² units and to two decimal places for those given in all other units.
2 The MLP reused the dependent measure associated with the Sex Differences in Implicit Math Attitudes phenomenon in an analysis of a thirteenth phenomenon, Relations Between Implicit and Explicit Math Attitudes, that had no experimental condition associated with it; we do not consider that phenomenon here.
3 There was one exception: the experimental condition for the dependent measure associated with the Sex Differences in Implicit Math Attitudes phenomenon was sex, which was of course not randomly assigned. We also note that the order in which the phenomena were presented was randomized, with the exception that the Sex Differences in Implicit Math Attitudes phenomenon was always presented last, and that the four dependent measures associated with the Anchoring phenomenon were always presented in the following order: distance from San Francisco to New York City, population of Chicago, height of Mount Everest, and number of babies born per day in the United States. Subjects were randomly assigned to the high anchor or low anchor condition separately for each of the four dependent measures.
4 In large-scale replication projects that are multilevel in nature, such as RRRs and MLPs, the estimates from the large-scale replication project used for comparison to date have typically been single meta-analytic average estimates, and therefore the replication could be considered to be evaluated based on a single study.