References
- Amrhein, V. (2018), “Inferential Statistics is Not Inferential,” sci five, University of Basel, available at http://bit.ly/notinfer.
- Amrhein, V., and Greenland, S. (2018), “Remove, Rather Than Redefine, Statistical Significance,” Nature Human Behaviour, 2, 4. DOI: 10.1038/s41562-017-0224-0.
- Amrhein, V., Korner-Nievergelt, F., and Roth, T. (2017), “The Earth is Flat (p > 0.05): Significance Thresholds and the Crisis of Unreplicable Research,” PeerJ, 5, e3544. DOI: 10.7717/peerj.3544.
- Amrhein, V., Trafimow, D., and Greenland, S. (2018), “Abandon Statistical Inference,” PeerJ Preprints, 6, e26857v1.
- Baker, M. (2016), “Is There a Reproducibility Crisis?” Nature, 533, 452–454. DOI: 10.1038/533452a.
- Barnard, G. A. (1996), “Fragments of a Statistical Autobiography,” Student, 1, 257–268.
- Bayarri, M. J., and Berger, J. O. (2000), “P-values for Composite Null Models,” Journal of the American Statistical Association, 95, 1127–1142. DOI: 10.2307/2669749.
- Boring, E. G. (1919), “Mathematical vs. Scientific Significance,” Psychological Bulletin, 16, 335–338. DOI: 10.1037/h0074554.
- Box, G. E. P. (1980), “Sampling and Bayes’ Inference in Scientific Modeling and Robustness,” Journal of the Royal Statistical Society, Series A, 143, 383–430. DOI: 10.2307/2982063.
- Brown, H. K., Ray, J. G., Wilton, A. S., Lunsky, Y., Gomes, T., and Vigod, S. N. (2017), “Association Between Serotonergic Antidepressant Use During Pregnancy and Autism Spectrum Disorder in Children,” JAMA: Journal of the American Medical Association, 317, 1544–1552. DOI: 10.1001/jama.2017.3415.
- Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T. Z., Chen, Y. L., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., Isaksson, S., Manfredi, D., Rose, J., Wagenmakers, E. J., and Wu, H. (2018), “Evaluating the Replicability of Social Science Experiments in Nature and Science Between 2010 and 2015,” Nature Human Behaviour, 2, 637–644. DOI: 10.1038/s41562-018-0399-z.
- Cohen, J. (1994), “The Earth Is Round (p < .05),” American Psychologist, 49, 997–1003.
- Cox, D. R. (1978), “Foundations of Statistical Inference: The Case for Eclecticism,” Australian Journal of Statistics, 20, 43–59. DOI: 10.1111/j.1467-842X.1978.tb01094.x.
- Crane, H. (2017), “Why ‘Redefining Statistical Significance’ Will Not Improve Reproducibility and Could Make the Replication Crisis Worse,” available at https://arxiv.org/abs/1711.07801.
- Cumming, G. (2014), “The New Statistics: Why and How,” Psychological Science, 25, 7–29. DOI: 10.1177/0956797613504966.
- Edgeworth, F. Y. (1885), “Methods of Statistics,” Journal of the Statistical Society of London, Jubilee Volume, 181–217.
- Efron, B., and Hastie, T. (2016), Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, New York: Cambridge University Press.
- Fisher, R. A. (1937), The Design of Experiments (2nd ed.), Edinburgh: Oliver and Boyd.
- Gelman, A. (2016), “The Problems with P-values are not Just with P-values,” The American Statistician, supplemental material to the ASA Statement on P-values and Statistical Significance.
- Gelman, A., and Hennig, C. (2017), “Beyond Subjective and Objective in Statistics,” Journal of the Royal Statistical Society, Series A, 180, 967–1033. DOI: 10.1111/rssa.12276.
- Gelman, A., and Stern, H. (2006), “The Difference Between ‘Significant’ and ‘Not Significant’ is Not Itself Statistically Significant,” The American Statistician, 60, 328–331. DOI: 10.1198/000313006X152649.
- Gigerenzer, G. (1993), “The Superego, the Ego, and the Id in Statistical Reasoning,” in A Handbook for Data Analysis in the Behavioral Sciences, eds. G. Keren and C. Lewis, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 311–339.
- Good, I. J. (1957), “Some Logic and History of Hypothesis Testing,” in Philosophical Foundations of Economics, ed. J. C. Pitt, Dordrecht, Holland: D. Reidel, pp. 149–174. (Reprinted as Ch. 14 in Good, I. J. (1983), Good Thinking, 129–148, Minneapolis, MN: University of Minnesota Press).
- Goodman, S. N. (1992), “A Comment on Replication, P-values and Evidence,” Statistics in Medicine, 11, 875–879. DOI: 10.1002/sim.4780110705.
- Greenland, S. (2011), “Null Misinterpretation in Statistical Testing and Its Impact on Health Risk Assessment,” Preventive Medicine, 53, 225–228. DOI: 10.1016/j.ypmed.2011.08.010.
- Greenland, S. (2017), “Invited Commentary: The Need for Cognitive Science in Methodology,” American Journal of Epidemiology, 186, 639–645. DOI: 10.1093/aje/kwx259.
- Greenland, S. (2019a), “Valid P-values Behave Exactly as They Should: Some Misleading Criticisms of P-values and Their Resolution with S-values,” The American Statistician, this issue.
- Greenland, S. (2019b), “The Unconditional Information in P-values, and Its Refutational Interpretation via S-values,” submitted.
- Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. C., Poole, C., Goodman, S. N., and Altman, D. G. (2016), “Statistical Tests, Confidence Intervals, and Power: A Guide to Misinterpretations,” The American Statistician, 70, online supplement 1 at http://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108/suppl_file/utas_a_1154108_sm5368.pdf; reprinted in European Journal of Epidemiology, 31, 337–350. DOI: 10.1007/s10654-016-0149-3.
- Halsey, L. G., Curran-Everett, D., Vowler, S. L., and Drummond, G. B. (2015), “The Fickle P-value Generates Irreproducible Results,” Nature Methods, 12, 179–185. DOI: 10.1038/nmeth.3288.
- Hurlbert, S. H., and Lombardi, C. M. (2009), “Final Collapse of the Neyman-Pearson Decision Theoretic Framework and Rise of the neoFisherian,” Annales Zoologici Fennici, 46, 311–349. DOI: 10.5735/086.046.0501.
- John, L. K., Loewenstein, G., and Prelec, D. (2012), “Measuring the Prevalence of Questionable Research Practices with Incentives for Truth Telling,” Psychological Science, 23, 524–532. DOI: 10.1177/0956797611430953.
- Lakens, D., Scheel, A. M., and Isager, P. M. (2018), “Equivalence Testing for Psychological Research: A Tutorial,” Advances in Methods and Practices in Psychological Science, 1, 259–269. DOI: 10.1177/2515245918770963.
- Lehmann, E. L. (1986), Testing Statistical Hypotheses (2nd ed.), New York: Springer.
- Little, R. J. (2006), “Calibrated Bayes: A Bayes/Frequentist Roadmap,” The American Statistician, 60, 213–223. DOI: 10.1198/000313006X117837.
- Locascio, J. (2017), “Results Blind Science Publishing,” Basic and Applied Social Psychology, 39, 239–246. DOI: 10.1080/01973533.2017.1336093.
- Martinson, B. C., Anderson, M. S., and de Vries, R. (2005), “Scientists Behaving Badly,” Nature, 435, 737–738. DOI: 10.1038/435737a.
- McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. (2019), “Abandon Statistical Significance,” The American Statistician.
- Meehl, P. E. (1990), “Why Summaries of Research on Psychological Theories Are Often Uninterpretable,” Psychological Reports, 66, 195–244. DOI: 10.2466/pr0.1990.66.1.195.
- Neyman, J., and Pearson, E. S. (1933), “The Testing of Statistical Hypotheses in Relation to Probabilities a priori,” Mathematical Proceedings of the Cambridge Philosophical Society, 29, 492–510. DOI: 10.1017/S030500410001152X.
- Open Science Collaboration (2015), “Estimating the Reproducibility of Psychological Science,” Science, 349, aac4716.
- Poole, C. (1987a), “Beyond the Confidence Interval,” American Journal of Public Health, 77, 195–199.
- Poole, C. (1987b), “Confidence Intervals Exclude Nothing,” American Journal of Public Health, 77, 492–493. DOI: 10.2105/AJPH.77.4.492.
- Popper, K. R. (1968), The Logic of Scientific Discovery (2nd English ed.), London: Routledge.
- Robins, J. M., van der Vaart, A., and Ventura, V. (2000), “Asymptotic Distribution of P-values in Composite Null Models,” Journal of the American Statistical Association, 95, 1143–1156. DOI: 10.2307/2669750.
- Rothman, K., Greenland, S., and Lash, T. L. (2008), Modern Epidemiology (3rd ed., Ch. 10), Philadelphia, PA: Lippincott Williams & Wilkins.
- Senn, S. J. (2001), “Two Cheers for P-values?” Journal of Epidemiology and Biostatistics, 6, 193–204.
- Senn, S. J. (2002), “‘Letter to the Editor’ Re: Goodman 1992,” Statistics in Medicine, 21, 2437–2444. DOI: 10.1002/sim.1072.
- Senn, S. J. (2011), “You May Believe You Are a Bayesian But You Are Probably Wrong,” Rational Markets and Morals, 2, 48–66.
- Stark, P. B., and Saltelli, A. (2018), “Cargo-Cult Statistics and Scientific Crisis,” Significance, 15, 40–43. DOI: 10.1111/j.1740-9713.2018.01174.x.
- Trafimow, D., Amrhein, V., Areshenkoff, C. N., Barrera-Causil, C., Beh, E. J., Bilgiç, Y., Bono, R., Bradley, M. T., Briggs, W. M., Cepeda-Freyre, H. A., Chaigneau, S. E., Ciocca, D. R., Carlos Correa, J., Cousineau, D., de Boer, M. R., Dhar, S. S., Dolgov, I., Gómez-Benito, J., Grendar, M., Grice, J., Guerrero-Gimenez, M. E., Gutiérrez, A., Huedo-Medina, T. B., Jaffe, K., Janyan, A., Karimnezhad, A., Korner-Nievergelt, F., Kosugi, K., Lachmair, M., Ledesma, R., Limongi, R., Liuzza, M. T., Lombardo, R., Marks, M., Meinlschmidt, G., Nalborczyk, L., Nguyen, H. T., Ospina, R., Perezgonzalez, J. D., Pfister, R., Rahona, J. J., Rodríguez-Medina, D. A., Romão, X., Ruiz-Fernández, S., Suarez, I., Tegethoff, M., Tejo, M., van de Schoot, R., Vankov, I., Velasco-Forero, S., Wang, T., Yamada, Y., Zoppino, F. C., and Marmolejo-Ramos, F. (2018), “Manipulating the Alpha Level Cannot Cure Significance Testing,” Frontiers in Psychology, 9, 699.
- Trafimow, D., and Marks, M. (2015), “Editorial,” Basic and Applied Social Psychology, 37, 1–2. DOI: 10.1080/01973533.2015.1012991.
- Wellek, S. (2010), Testing Statistical Hypotheses of Equivalence and Noninferiority (2nd ed.), New York: Chapman & Hall.
- Ziliak, S. T., and McCloskey, D. N. (2008), The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives, Ann Arbor, MI: University of Michigan Press.