References
- Amrhein, V., Korner-Nievergelt, F., and Roth, T. (2017), “The Earth is Flat (p > 0.05): Significance Thresholds and the Crisis of Unreplicable Research,” PeerJ, 5, e3544. DOI: https://doi.org/10.7717/peerj.3544.
- Amrhein, V., Trafimow, D., and Greenland, S. (2019), “Inferential Statistics as Descriptive Statistics: There is No Replication Crisis If We Don’t Expect Replication,” The American Statistician, 73, 262–270. DOI: https://doi.org/10.1080/00031305.2018.1543137.
- Anderson, C. J., Bahník, Š., Barnett-Cowan, M., Bosco, F. A., Chandler, J., Chartier, C. R., Cheung, F., Christopherson, C. D., Cordes, A., Cremata, E. J., Della Penna, N., Estel, V., Fedor, A., Fitneva, S. A., Frank, M. C., Grange, J. A., Hartshorne, J. K., Hasselman, F., Henninger, F., van der Hulst, M., Jonas, K. J., Lai, C. K., Levitan, C. A., Miller, J. K., Moore, K. S., Meixner, J. M., Munafò, M. R., Neijenhuijs, K. I., Nilsonne, G., Nosek, B. A., Plessow, F., Prenoveau, J. M., Ricker, A. A., Schmidt, K., Spies, J. R., Stieger, S., Strohminger, N., Sullivan, G. B., van Aert, R. C. M., van Assen, M. A. L. M., Vanpaemel, W., Vianello, M., Voracek, M., and Zuni, K. (2016), “Response to Comment on ‘Estimating the Reproducibility of Psychological Science’,” Science, 351, 1037.
- Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., Fehr, E., Fidler, F., Field, A. P., Forster, M., George, E. I., Gonzalez, R., Goodman, S., Green, E., Green, D. P., Greenwald, A. G., Hadfield, J. D., Hedges, L. V., Held, L., Ho, T.-H., Hoijtink, H., Hruschka, D. J., Imai, K., Imbens, G., Ioannidis, J. P. A., Jeon, M., Jones, J. H., Kirchler, M., Laibson, D., List, J., Little, R., Lupia, A., Machery, E., Maxwell, S. E., McCarthy, M., Moore, D. A., Morgan, S. L., Munafò, M., Nakagawa, S., Nyhan, B., Parker, T. H., Pericchi, L., Perugini, M., Rouder, J., Rousseau, J., Savalei, V., Schönbrodt, F. D., Sellke, T., Sinclair, B., Tingley, D., Van Zandt, T., Vazire, S., Watts, D. J., Winship, C., Wolpert, R. L., Xie, Y., Young, C., Zinman, J., and Johnson, V. E. (2017), “Redefine Statistical Significance,” Nature Human Behaviour, 2, 6–10. DOI: https://doi.org/10.1038/s41562-017-0189-z.
- Benjamini, Y. (2020), “Selective Inference: The Silent Killer of Replicability,” Harvard Data Science Review, 2, available at https://hdsr.mitpress.mit.edu/pub/l39rpgyc. DOI: https://doi.org/10.1162/99608f92.fc62b261.
- Benjamini, Y., De Veaux, R. D., Efron, B., Evans, S., Glickman, M., Graubard, B. I., He, X., Meng, X.-L., Reid, N., and Stigler, S. M. (2021), “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” The Annals of Applied Statistics, 15, 1084–1085. DOI: https://doi.org/10.1214/21-AOAS1501.
- Benjamini, Y., and Hochberg, Y. (1995), “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical Society, Series B, 57, 289–300. DOI: https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
- Berger, J. O. (1993), Statistical Decision Theory and Bayesian Analysis, New York: Springer Science & Business Media.
- Berger, J. O., Boukai, B., and Wang, Y. (1997), “Unified Frequentist and Bayesian Testing of a Precise Hypothesis,” Statistical Science, 12, 133–148. DOI: https://doi.org/10.1214/ss/1030037904.
- Berger, J. O., and Delampady, M. (1987), “Testing Precise Hypotheses” (with discussion), Statistical Science, 2, 317–352.
- Betensky, R. A. (2019), “The p-Value Requires Context, Not a Threshold,” The American Statistician, 73, 115–117. DOI: https://doi.org/10.1080/00031305.2018.1529624.
- Bickel, D. R. (2020), “Null Hypothesis Significance Testing Interpreted and Calibrated by Estimating Probabilities of Sign Errors: A Bayes-Frequentist Continuum,” The American Statistician, 75, 104–112. DOI: https://doi.org/10.1080/00031305.2020.1816214.
- Bickel, D. R. (2021), “Null Hypothesis Significance Testing Defended and Calibrated by Bayesian Model Checking,” The American Statistician, 75, 249–255.
- Cai, T. T., and Sun, W. (2009), “Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks,” Journal of the American Statistical Association, 104, 1467–1481. DOI: https://doi.org/10.1198/jasa.2009.tm08415.
- Cai, T. T., Sun, W., and Wang, W. (2019), “Covariate-Assisted Ranking and Screening for Large-Scale Two-Sample Inference,” Journal of the Royal Statistical Society, Series B, 81, 187–234. DOI: https://doi.org/10.1111/rssb.12304.
- Campbell, H., and Gustafson, P. (2019), “The World of Research Has Gone Berserk: Modeling the Consequences of Requiring ‘Greater Statistical Stringency’ for Scientific Publication,” The American Statistician, 73, 358–373. DOI: https://doi.org/10.1080/00031305.2018.1555101.
- Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences (2nd ed.), Hillsdale, NJ: Erlbaum.
- Cohen, J. (1994), “The Earth is Round (p < .05),” American Psychologist, 49, 997–1003.
- Colquhoun, D. (2019), “The False Positive Risk: A Proposal Concerning What to Do About p-Values,” The American Statistician, 73, 192–201. DOI: https://doi.org/10.1080/00031305.2018.1529622.
- Dumas-Mallet, E., Button, K. S., Boraud, T., Gonon, F., and Munafò, M. R. (2017), “Low Statistical Power in Biomedical Science: A Review of Three Human Research Domains,” Royal Society Open Science, 4, 160254. DOI: https://doi.org/10.1098/rsos.160254.
- Efron, B. (2008), “Microarrays, Empirical Bayes and the Two-Group Model,” Statistical Science, 23, 1–22.
- Efron, B. (2010), Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Institute of Mathematical Statistics (IMS) Monographs, Vol. 1, Cambridge: Cambridge University Press.
- Efron, B., Tibshirani, R., Storey, J. D., and Tusher, V. (2001), “Empirical Bayes Analysis of a Microarray Experiment,” Journal of the American Statistical Association, 96, 1151–1160. DOI: https://doi.org/10.1198/016214501753382129.
- Fisher, R. A. (1915), “Frequency Distribution of the Values of the Correlation Coefficient in Samples From an Indefinitely Large Population,” Biometrika, 10, 507–521. DOI: https://doi.org/10.2307/2331838.
- Fisher, R. A. (1925), Statistical Methods for Research Workers, Edinburgh: Oliver and Boyd.
- Gelman, A. (2019), “Don’t Calculate Post-Hoc Power Using Observed Estimate of Effect Size,” Annals of Surgery, 269(1), 9–10. DOI: https://doi.org/10.1097/SLA.0000000000002908.
- Genovese, C., and Wasserman, L. (2002), “Operating Characteristics and Extensions of the False Discovery Rate Procedure,” Journal of the Royal Statistical Society, Series B, 64, 499–517. DOI: https://doi.org/10.1111/1467-9868.00347.
- Gigerenzer, G. (2004), “Mindless Statistics,” The Journal of Socio-Economics, 33, 587–606. DOI: https://doi.org/10.1016/j.socec.2004.09.033.
- Gilbert, D. T., King, G., Pettigrew, S., and Wilson, T. D. (2016), “Comment on ‘Estimating the Reproducibility of Psychological Science’,” Science, 351, 1037.
- Goodman, S. N. (1999), “Toward Evidence-Based Medical Statistics. 2: The Bayes Factor,” Annals of Internal Medicine, 130, 1005–1013. DOI: https://doi.org/10.7326/0003-4819-130-12-199906150-00019.
- Goodman, W. M., Spruill, S. E., and Komaroff, E. (2019), “A Proposed Hybrid Effect Size Plus p-Value Criterion: Empirical Evidence Supporting Its Use,” The American Statistician, 73, 168–185. DOI: https://doi.org/10.1080/00031305.2018.1564697.
- Grimes, D. R., Bauch, C. T., and Ioannidis, J. P. A. (2018), “Modelling Science Trustworthiness Under Publish or Perish Pressure,” Royal Society Open Science, 5, 171511. DOI: https://doi.org/10.1098/rsos.171511.
- Habiger, J. D. (2017), “Adaptive False Discovery Rate Control for Heterogeneous Data,” Statistica Sinica, 27, 1731–1756.
- Haller, H., and Krauss, S. (2002), “Misinterpretations of Significance: A Problem Students Share With Their Teachers,” Methods of Psychological Research, 7, 1–20.
- Higginson, A. D., and Munafò, M. R. (2016), “Current Incentives for Scientists Lead to Underpowered Studies With Erroneous Conclusions,” PLoS Biology, 14, e2000995. DOI: https://doi.org/10.1371/journal.pbio.2000995.
- Hubbard, R. (2004), “Alphabet Soup: Blurring the Distinctions Between p’s and α’s in Psychological Research,” Theory & Psychology, 14, 295–327.
- Hubbard, R. (2019), “Will the ASA’s Efforts to Improve Statistical Practice be Successful? Some Evidence to the Contrary,” The American Statistician, 73, 31–35.
- Hurlbert, S. H., Levine, R. A., and Utts, J. (2019), “Coup de Grâce for a Tough Old Bull: ‘Statistically Significant’ Expires,” The American Statistician, 73, 352–357. DOI: https://doi.org/10.1080/00031305.2018.1543616.
- Hurlbert, S. H., and Lombardi, C. M. (2009), “Final Collapse of the Neyman–Pearson Decision Theoretic Framework and Rise of the neoFisherian,” Annales Zoologici Fennici, 46, 311–349. DOI: https://doi.org/10.5735/086.046.0501.
- Ioannidis, J. P. (2005), “Why Most Published Research Findings Are False,” PLoS Medicine, 2, e124. DOI: https://doi.org/10.1371/journal.pmed.0020124.
- Ioannidis, J. P., Hozo, I., and Djulbegovic, B. (2013), “Optimal Type I and Type II Error Pairs When the Available Sample Size is Fixed,” Journal of Clinical Epidemiology, 66, 903–910. DOI: https://doi.org/10.1016/j.jclinepi.2013.03.002.
- Ioannidis, J. P. A. (2013), “Discussion: Why ‘An Estimate of the Science-Wise False Discovery Rate and Application to the Top Medical Literature’ is False,” Biostatistics, 15, 28–36. DOI: https://doi.org/10.1093/biostatistics/kxt036.
- Ioannidis, J. P. A. (2019), “What Have We (Not) Learnt From Millions of Scientific Papers With p Values?” The American Statistician, 73, 20–25. DOI: https://doi.org/10.1080/00031305.2018.1447512.
- Jager, L. R., and Leek, J. T. (2013), “An Estimate of the Science-Wise False Discovery Rate and Application to the Top Medical Literature,” Biostatistics, 15, 1–12. DOI: https://doi.org/10.1093/biostatistics/kxt007.
- Johnson, D. H. (1999), “The Insignificance of Statistical Significance Testing,” The Journal of Wildlife Management, 63, 763–772. DOI: https://doi.org/10.2307/3802789.
- Johnson, V. E. (2013), “Revised Standards for Statistical Evidence,” Proceedings of the National Academy of Sciences, 110, 19313–19317. DOI: https://doi.org/10.1073/pnas.1313476110.
- Johnson, V. E., Payne, R. D., Wang, T., Asher, A., and Mandal, S. (2017), “On the Reproducibility of Psychological Science,” Journal of the American Statistical Association, 112 (517), 1–10. DOI: https://doi.org/10.1080/01621459.2016.1240079.
- Kennedy-Shaffer, L. (2019), “Before p < 0.05 to Beyond p < 0.05: Using History to Contextualize p-Values and Significance Testing,” The American Statistician, 73, 82–90.
- Krantz, D. H. (1999), “The Null Hypothesis Testing Controversy in Psychology,” Journal of the American Statistical Association, 94, 1372–1381. DOI: https://doi.org/10.1080/01621459.1999.10473888.
- Lakens, D., Adolfi, F. G., Albers, C., Anvari, F., Apps, M., Argamon, S., Baguley, T., Becker, R., Benning, S. D., Bradford, D., Buchanan, E. M., Caldwell, A. R., Van Calster, B., Carlsson, R., Chen, S.-C., Chung, B., Colling, L. J., Collins, G., Crook, Z., Cross, E. S., Daniels, S., Danielsson, H., DeBruine, L., Dunleavy, D. J., Earp, B., Feist, M. I., Ferrell, J. D., Field, J. G., Fox, N. W., Friesen, A., Gomes, C., Gonzalez-Marquez, M., Grange, J., Grieve, A., Guggenberger, R., Grist, J., van Harmelen, A.-L., Hasselman, F., Hochard, K. D., Hoffarth, M., Holmes, N., Ingre, M., Isager, P., Isotalus, H., Johansson, C., Juszczyk, K., Kenny, D., Khalil, A., Konat, B., Lao, J., Larsen, E. G., Lodder, G., Lukavský, J., Madan, C., Manheim, D., Martin, S. R., Martin, A. E., Mayo, D., McCarthy, R. J., McConway, K., McFarland, C., Nio, A., Nilsonne, G., Oliveira, C. L., Orban de Xivry, J.-J., Parsons, S., Pfuhl, G., Quinn, K., Sakon, J. J., Saribay, S. A., Schneider, I., Selvaraju, M., Sjoerds, Z., Smith, S. G., Smits, T., Spies, J. R., Sreekumar, V., Steltenpohl, C. N., Stenhouse, N., Świątkowski, W., Vadillo, M. A., van Assen, M. A. L. M., Williams, M., Williams, S. E., Williams, D. R., Yarkoni, T., Ziano, I., and Zwaan, R. A. (2018), “Justify Your Alpha,” Nature Human Behaviour, 2, 168–171. DOI: https://doi.org/10.1038/s41562-018-0311-x.
- Matthews, R. (2021), “The p-Value Statement, Five Years On,” Significance, 18 (2), 16–19. DOI: https://doi.org/10.1111/1740-9713.01505.
- Matthews, R. A. (2001), “Why Should Clinicians Care About Bayesian Methods?” Journal of Statistical Planning and Inference, 94, 43–58.
- McCann, M. H., and Habiger, J. D. (2020), “The Detection of Nonnegligible Directional Effects With Associated Measures of Statistical Significance,” The American Statistician, 74, 213–217. DOI: https://doi.org/10.1080/00031305.2018.1497538.
- McLachlan, G. J., and Peel, D. (2000), Finite Mixture Models, Wiley Series in Probability and Statistics, New York: Wiley.
- McShane, B. B., Böckenholt, U., and Hansen, K. T. (2020), “Average Power: A Cautionary Note,” Advances in Methods and Practices in Psychological Science, 3, 185–199. DOI: https://doi.org/10.1177/2515245920902370.
- McShane, B. B., and Gal, D. (2016), “Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence,” Management Science, 62, 1707–1718. DOI: https://doi.org/10.1287/mnsc.2015.2212.
- McShane, B. B., and Gal, D. (2017), “Statistical Significance and the Dichotomization of Evidence,” Journal of the American Statistical Association, 112, 885–895.
- McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. (2019), “Abandon Statistical Significance,” The American Statistician, 73, 235–245. DOI: https://doi.org/10.1080/00031305.2018.1527253.
- McShane, B. B., Tackett, J. L., Böckenholt, U., and Gelman, A. (2019), “Large-Scale Replication Projects in Contemporary Psychological Research,” The American Statistician, 73, 99–105. DOI: https://doi.org/10.1080/00031305.2018.1505655.
- Morton, N. E. (1955), “Sequential Tests for the Detection of Linkage,” American Journal of Human Genetics, 7, 277–318.
- Moss, J., and De Bin, R. (2021+), “Modelling Publication Bias and p-Hacking,” Biometrics, available at https://onlinelibrary.wiley.com/doi/abs/10.1111/biom.13560.
- OSC (Open Science Collaboration) (2015), “Estimating the Reproducibility of Psychological Science,” Science, 349, aac4716. DOI: https://doi.org/10.1126/science.aac4716.
- Robbins, H. (1951), “Asymptotically Subminimax Solutions of Compound Statistical Decision Problems,” in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, pp. 131–148. Berkeley: University of California Press.
- Sellke, T., Bayarri, M. J., and Berger, J. O. (2001), “Calibration of p-Values for Testing Precise Null Hypotheses,” The American Statistician, 55, 62–71. DOI: https://doi.org/10.1198/000313001300339950.
- Shi, H., and Yin, G. (2021), “Reconnecting p-Value and Posterior Probability Under One- and Two-Sided Tests,” The American Statistician, 75, 265–275. DOI: https://doi.org/10.1080/00031305.2020.1717621.
- Storey, J. D. (2003), “The Positive False Discovery Rate: A Bayesian Interpretation and the q-Value,” The Annals of Statistics, 31, 2013–2035. DOI: https://doi.org/10.1214/aos/1074290335.
- Sun, W., and Cai, T. T. (2007), “Oracle and Adaptive Compound Decision Rules for False Discovery Rate Control,” Journal of the American Statistical Association, 102, 901–912. DOI: https://doi.org/10.1198/016214507000000545.
- Szucs, D., and Ioannidis, J. (2017), “When Null Hypothesis Significance Testing is Unsuitable for Research: A Reassessment,” Frontiers in Human Neuroscience, 11, 390. DOI: https://doi.org/10.3389/fnhum.2017.00390.
- Thomas, L. (1997), “Retrospective Power Analysis,” Conservation Biology, 11, 276–280. DOI: https://doi.org/10.1046/j.1523-1739.1997.96102.x.
- Wasserstein, R. L., and Lazar, N. A. (2016), “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, 70, 129–133. DOI: https://doi.org/10.1080/00031305.2016.1154108.
- Wasserstein, R. L., Schirm, A. L., and Lazar, N. A. (2019), “Moving to a World Beyond ‘p < 0.05’,” The American Statistician, 73, 1–19.
- Yuan, K.-H., and Maxwell, S. (2005), “On the Post Hoc Power in Testing Mean Differences,” Journal of Educational and Behavioral Statistics, 30, 141–167. DOI: https://doi.org/10.3102/10769986030002141.