Adopting More Holistic Approaches

Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don’t Expect Replication

Pages 262-270 | Received 19 Mar 2018, Accepted 25 Oct 2018, Published online: 20 Mar 2019

References

  • Amrhein, V. (2018), “Inferential Statistics is Not Inferential,” sci five, University of Basel, available at http://bit.ly/notinfer.
  • Amrhein, V., and Greenland, S. (2018), “Remove, Rather Than Redefine, Statistical Significance,” Nature Human Behaviour, 2, 4. DOI: 10.1038/s41562-017-0224-0.
  • Amrhein, V., Korner-Nievergelt, F., and Roth, T. (2017), “The Earth is Flat (p > 0.05): Significance Thresholds and the Crisis of Unreplicable Research,” PeerJ, 5, e3544. DOI: 10.7717/peerj.3544.
  • Amrhein, V., Trafimow, D., and Greenland, S. (2018), “Abandon Statistical Inference,” PeerJ Preprints, 6, e26857v1.
  • Baker, M. (2016), “Is There a Reproducibility Crisis?” Nature, 533, 452–454. DOI: 10.1038/533452a.
  • Barnard, G. A. (1996), “Fragments of a Statistical Autobiography,” Student, 1, 257–268.
  • Bayarri, M. J., and Berger, J. O. (2000), “P-values for Composite Null Models,” Journal of the American Statistical Association, 95, 1127–1142. DOI: 10.2307/2669749.
  • Boring, E. G. (1919), “Mathematical vs. Scientific Significance,” Psychological Bulletin, 16, 335–338. DOI: 10.1037/h0074554.
  • Box, G. E. P. (1980), “Sampling and Bayes’ Inference in Scientific Modeling and Robustness,” Journal of the Royal Statistical Society, Series A, 143, 383–430. DOI: 10.2307/2982063.
  • Brown, H. K., Ray, J. G., Wilton, A. S., Lunsky, Y., Gomes, T., and Vigod, S. N. (2017), “Association Between Serotonergic Antidepressant Use During Pregnancy and Autism Spectrum Disorder in Children,” JAMA: Journal of the American Medical Association, 317, 1544–1552. DOI: 10.1001/jama.2017.3415.
  • Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T. Z., Chen, Y. L., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., Isaksson, S., Manfredi, D., Rose, J., Wagenmakers, E. J., and Wu, H. (2018), “Evaluating the Replicability of Social Science Experiments in Nature and Science Between 2010 and 2015,” Nature Human Behaviour, 2, 637–644. DOI: 10.1038/s41562-018-0399-z.
  • Cohen, J. (1994), “The Earth Is Round (p < .05),” American Psychologist, 49, 997–1003.
  • Cox, D. R. (1978), “Foundations of Statistical Inference: The Case for Eclecticism,” Australian Journal of Statistics, 20, 43–59. DOI: 10.1111/j.1467-842X.1978.tb01094.x.
  • Crane, H. (2017), “Why ‘Redefining Statistical Significance’ Will Not Improve Reproducibility and Could Make the Replication Crisis Worse,” available at https://arxiv.org/abs/1711.07801.
  • Cumming, G. (2014), “The New Statistics: Why and How,” Psychological Science, 25, 7–29. DOI: 10.1177/0956797613504966.
  • Edgeworth, F. Y. (1885), “Methods of Statistics,” Journal of the Statistical Society of London, Jubilee Volume, 181–217.
  • Efron, B., and Hastie, T. (2016), Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, New York: Cambridge University Press.
  • Fisher, R. A. (1937), The Design of Experiments (2nd ed.), Edinburgh: Oliver and Boyd.
  • Gelman, A. (2016), “The Problems with P-values are not Just with P-values,” The American Statistician, supplemental material to the ASA Statement on P-values and Statistical Significance.
  • Gelman, A., and Hennig, C. (2017), “Beyond Subjective and Objective in Statistics,” Journal of the Royal Statistical Society, Series A, 180, 967–1033. DOI: 10.1111/rssa.12276.
  • Gelman, A., and Stern, H. (2006), “The Difference Between ‘Significant’ and ‘Not Significant’ is Not Itself Statistically Significant,” The American Statistician, 60, 328–331. DOI: 10.1198/000313006X152649.
  • Gigerenzer, G. (1993), “The Superego, the Ego, and the Id in Statistical Reasoning,” in A Handbook for Data Analysis in the Behavioral Sciences, eds. G. Keren and C. Lewis, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 311–339.
  • Good, I. J. (1957), “Some Logic and History of Hypothesis Testing,” in Philosophical Foundations of Economics, ed. J. C. Pitt, Dordrecht, Holland: D. Reidel, pp. 149–174. (Reprinted as Ch. 14 in Good, I. J. (1983), Good Thinking, 129–148, Minneapolis, MN: University of Minnesota Press).
  • Goodman, S. N. (1992), “A Comment on Replication, P-values and Evidence,” Statistics in Medicine, 11, 875–879. DOI: 10.1002/sim.4780110705.
  • Greenland, S. (2011), “Null Misinterpretation in Statistical Testing and Its Impact on Health Risk Assessment,” Preventive Medicine, 53, 225–228. DOI: 10.1016/j.ypmed.2011.08.010.
  • Greenland, S. (2017), “Invited Commentary: The Need for Cognitive Science in Methodology,” American Journal of Epidemiology, 186, 639–645. DOI: 10.1093/aje/kwx259.
  • Greenland, S. (2019a), “Valid P-values Behave Exactly as They Should: Some Misleading Criticisms of P-values and Their Resolution with S-values,” The American Statistician, this issue.
  • Greenland, S. (2019b), “The Unconditional Information in P-values, and Its Refutational Interpretation via S-values,” submitted.
  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., and Altman, D. G. (2016), “Statistical Tests, Confidence Intervals, and Power: A Guide to Misinterpretations,” The American Statistician, 70, online supplement 1 at http://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108/suppl_file/utas_a_1154108_sm5368.pdf; reprinted in European Journal of Epidemiology, 31, 337–350. DOI: 10.1007/s10654-016-0149-3.
  • Halsey, L. G., Curran-Everett, D., Vowler, S. L., and Drummond, G. B. (2015), “The Fickle P-value Generates Irreproducible Results,” Nature Methods, 12, 179–185. DOI: 10.1038/nmeth.3288.
  • Hurlbert, S. H., and Lombardi, C. M. (2009), “Final Collapse of the Neyman-Pearson Decision Theoretic Framework and Rise of the neoFisherian,” Annales Zoologici Fennici, 46, 311–349. DOI: 10.5735/086.046.0501.
  • John, L. K., Loewenstein, G., and Prelec, D. (2012), “Measuring the Prevalence of Questionable Research Practices with Incentives for Truth Telling,” Psychological Science, 23, 524–532. DOI: 10.1177/0956797611430953.
  • Lakens, D., Scheel, A. M., and Isager, P. M. (2018), “Equivalence Testing for Psychological Research: A Tutorial,” Advances in Methods and Practices in Psychological Science, 1, 259–269. DOI: 10.1177/2515245918770963.
  • Lehmann, E. L. (1986), Testing Statistical Hypotheses (2nd ed.), New York: Springer.
  • Little, R. J. (2006), “Calibrated Bayes: A Bayes/Frequentist Roadmap,” The American Statistician, 60, 213–223. DOI: 10.1198/000313006X117837.
  • Locascio, J. (2017), “Results Blind Science Publishing,” Basic and Applied Social Psychology, 39, 239–246. DOI: 10.1080/01973533.2017.1336093.
  • Martinson, B. C., Anderson, M. S., and de Vries, R. (2005), “Scientists Behaving Badly,” Nature, 435, 737–738. DOI: 10.1038/435737a.
  • McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. (2019), “Abandon Statistical Significance,” The American Statistician.
  • Meehl, P. E. (1990), “Why Summaries of Research on Psychological Theories Are Often Uninterpretable,” Psychological Reports, 66, 195–244. DOI: 10.2466/pr0.1990.66.1.195.
  • Neyman, J., and Pearson, E. S. (1933), “The Testing of Statistical Hypotheses in Relation to Probabilities a priori,” Mathematical Proceedings of the Cambridge Philosophical Society, 29, 492–510. DOI: 10.1017/S030500410001152X.
  • Open Science Collaboration. (2015), “Estimating the Reproducibility of Psychological Science,” Science, 349, aac4716.
  • Poole, C. (1987a), “Beyond the Confidence Interval,” American Journal of Public Health, 77, 195–199.
  • Poole, C. (1987b), “Confidence Intervals Exclude Nothing,” American Journal of Public Health, 77, 492–493. DOI: 10.2105/AJPH.77.4.492.
  • Popper, K. R. (1968), The Logic of Scientific Discovery (2nd English ed.), London: Routledge.
  • Robins, J. M., van der Vaart, A., and Ventura, V. (2000), “Asymptotic Distribution of P-values in Composite Null Models,” Journal of the American Statistical Association, 95, 1143–1156. DOI: 10.2307/2669750.
  • Rothman, K., Greenland, S., and Lash, T. L. (2008), Modern Epidemiology (3rd ed., Ch. 10), Philadelphia, PA: Lippincott Williams & Wilkins.
  • Senn, S. J. (2001), “Two Cheers for P-values?” Journal of Epidemiology and Biostatistics, 6, 193–204.
  • Senn, S. J. (2002), “‘Letter to the Editor’ Re: Goodman 1992,” Statistics in Medicine, 21, 2437–2444. DOI: 10.1002/sim.1072.
  • Senn, S. J. (2011), “You May Believe You Are a Bayesian But You Are Probably Wrong,” Rational Markets and Morals, 2, 48–66.
  • Stark, P. B., and Saltelli, A. (2018), “Cargo-Cult Statistics and Scientific Crisis,” Significance, 15, 40–43. DOI: 10.1111/j.1740-9713.2018.01174.x.
  • Trafimow, D., Amrhein, V., Areshenkoff, C. N., Barrera-Causil, C., Beh, E. J., Bilgiç, Y., Bono, R., Bradley, M. T., Briggs, W. M., Cepeda-Freyre, H. A., Chaigneau, S. E., Ciocca, D. R., Carlos Correa, J., Cousineau, D., de Boer, M. R., Dhar, S. S., Dolgov, I., Gómez-Benito, J., Grendar, M., Grice, J., Guerrero-Gimenez, M. E., Gutiérrez, A., Huedo-Medina, T. B., Jaffe, K., Janyan, A., Karimnezhad, A., Korner-Nievergelt, F., Kosugi, K., Lachmair, M., Ledesma, R., Limongi, R., Liuzza, M. T., Lombardo, R., Marks, M., Meinlschmidt, G., Nalborczyk, L., Nguyen, H. T., Ospina, R., Perezgonzalez, J. D., Pfister, R., Rahona, J. J., Rodríguez-Medina, D. A., Romão, X., Ruiz-Fernández, S., Suarez, I., Tegethoff, M., Tejo, M., van de Schoot, R., Vankov, I., Velasco-Forero, S., Wang, T., Yamada, Y., Zoppino, F. C., and Marmolejo-Ramos, F. (2018), “Manipulating the Alpha Level Cannot Cure Significance Testing,” Frontiers in Psychology, 9, 699.
  • Trafimow, D., and Marks, M. (2015), “Editorial,” Basic and Applied Social Psychology, 37, 1–2. DOI: 10.1080/01973533.2015.1012991.
  • Wellek, S. (2010), Testing Statistical Hypotheses of Equivalence and Noninferiority (2nd ed.), New York: Chapman & Hall.
  • Ziliak, S. T., and McCloskey, D. N. (2008), The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives, Ann Arbor, MI: University of Michigan Press.