The American Statistician Volume 73, 2019 - Issue sup1: Statistical Inference in the 21st Century: A World Beyond p < 0.05

Open access

Abandon Statistical Significance

Blakeley B. McShane (corresponding author), Department of Marketing, Kellogg School of Management, Northwestern University, Evanston, IL

David Gal, Department of Managerial Studies, College of Business Administration, University of Illinois at Chicago, Chicago, IL

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University, New York, NY

Christian Robert, Centre de Recherche en Mathématiques de la Décision (CEREMADE), Université Paris-Dauphine, Paris, France

Jennifer L. Tackett, Department of Psychology, Northwestern University, Evanston, IL

Pages 235-245 | Received 30 Oct 2017, Accepted 06 Sep 2018, Published online: 20 Mar 2019

https://doi.org/10.1080/00031305.2018.1527253

References

Amrhein, V., and Greenland, S. (2018), “Remove, Rather Than Redefine, Statistical Significance,” Nature Human Behaviour, 2, 4. DOI:10.1038/s41562-017-0224-0.
Amrhein, V., Korner-Nievergelt, F., and Roth, T. (2017), “The Earth is Flat (p > 0.05): Significance Thresholds and the Crisis of Unreplicable Research,” PeerJ, 5, e3544. DOI:10.7717/peerj.3544.
Anderson, D. R., Burnham, K. P., and Thompson, W. L. (2000), “Null Hypothesis Testing: Problems, Prevalence, and an Alternative,” Journal of Wildlife Management, 64, 912–923. DOI:10.2307/3803199.
Bakan, D. (1966), “The Test of Significance in Psychological Research,” Psychological Bulletin, 66(6), 423–437. DOI:10.1037/h0020412.
Bem, D. J. (2011), “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect,” Journal of Personality and Social Psychology, 100, 407–425. DOI:10.1037/a0021524.
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., and Cesarini, D. (2018), “Redefine Statistical Significance,” Nature Human Behaviour, 2, 6–10. DOI:10.1038/s41562-017-0189-z.
Berger, J. O., and Sellke, T. (1987), “Testing a Point Null Hypothesis: The Irreconcilability of p Values and Evidence,” Journal of the American Statistical Association, 82, 112–122. DOI:10.2307/2289131.
Berkson, J. (1938), “Some Difficulties of Interpretation Encountered in the Application of the Chi-Square Test,” Journal of the American Statistical Association, 33, 526–536. DOI:10.1080/01621459.1938.10502329.
Boring, E. G. (1919), “Mathematical vs. Scientific Significance,” Psychological Bulletin, 16, 335–338. DOI:10.1037/h0074554.
Briggs, W. M. (2016), Uncertainty: The Soul of Modeling, Probability and Statistics, New York: Springer.
Carlin, J. B. (2016), “Is Reform Possible Without a Paradigm Shift?” The American Statistician, 70, 10 (supplemental material to the ASA statement on p-values and statistical significance).
Carney, D. R., Cuddy, A. J., and Yap, A. J. (2010), “Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance,” Psychological Science, 21, 1363–1368. DOI:10.1177/0956797610383437.
Cochran, W. G. (1976), “Early Development of Techniques in Comparative Experimentation,” in On the History of Statistics and Probability, New York: Dekker.
Cohen, J. (1994), “The Earth is Round (p < .05),” American Psychologist, 49, 997–1003.
Cowles, M., and Davis, C. (1982), “On the Origins of the .05 Level of Significance,” American Psychologist, 37, 553–558.
Cox, D. R. (1977), “The Role of Significance Tests,” Scandinavian Journal of Statistics, 4, 49–70.
Cox, D. R. (1982), “Statistical Significance Tests,” British Journal of Clinical Pharmacology, 14, 325–331. DOI:10.1111/j.1365-2125.1982.tb01987.x.
Cramer, H. (1955), The Elements of Probability Theory, New York: Wiley.
Drummond, G. (2015), “Most of the Time, P Is an Unreliable Marker, So We Need No Exact Cut-Off,” British Journal of Anaesthesia, 116, 894–894. DOI:10.1093/bja/aew146.
Edwards, W., Lindman, H., and Savage, L. J. (1963), “Bayesian Statistical Inference for Psychological Research,” Psychological Review, 70, 193. DOI:10.1037/h0044139.
Eysenck, H. J. (1960), “The Concept of Statistical Significance and the Controversy About One-Tailed Tests,” Psychological Review, 67, 269. DOI:10.1037/h0048412.
Fisher, R. A. (1926), “The Arrangement of Field Experiments,” Journal of the Ministry of Agriculture, 33, 503–513.
Fisher, R. A. (1956), Statistical Methods and Scientific Inference, New York: Hafner Publishing Co.
Freeman, P. R. (1993), “The Role of p-Values in Analysing Trial Results,” Statistics in Medicine, 12, 1443–1452.
Gelman, A. (2015), “The Connection Between Varying Treatment Effects and the Crisis of Unreplicable Research: A Bayesian Perspective,” Journal of Management, 41, 632–643. DOI:10.1177/0149206314525208.
Gelman, A. (2016), “The Problems With p-Values Are Not Just With p-Values,” The American Statistician, 70, 10 (supplemental material to the ASA statement on p-values and statistical significance).
Gelman, A. (2017), “The Failure of Null Hypothesis Significance Testing When Studying Incremental Changes, and What to do About It,” Personality and Social Psychology Bulletin, 44, 16–23. DOI:10.1177/0146167217729162.
Gelman, A., and Auerbach, J. (2016a), “Age-Aggregation Bias in Mortality Trends,” Proceedings of the National Academy of Sciences of the United States of America, 113, E816–E817. DOI:10.1073/pnas.1523465113.
Gelman, A., and Auerbach, J. (2016b), “Mortality Trends by Race/Ethnicity, Sex, Age and State,” Technical Report, Columbia University.
Gelman, A., and Carlin, J. (2014), “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors,” Perspectives on Psychological Science, 9, 641–651. DOI:10.1177/1745691614551642.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014), Bayesian Data Analysis (3rd ed.), Boca Raton, FL: Chapman and Hall/CRC.
Gelman, A., and Loken, E. (2014), “The Statistical Crisis in Science,” American Scientist, 102, 460–465. DOI:10.1511/2014.111.460.
Gelman, A., and Robert, C. P. (2014), “Revised Evidence for Statistical Standards,” Proceedings of the National Academy of Sciences of the United States of America, 111, E1933–E1933. DOI:10.1073/pnas.1322995111.
Gelman, A., and Stern, H. (2006), “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant,” The American Statistician, 60, 328–331. DOI:10.1198/000313006X152649.
Gigerenzer, G. (1987), The Probabilistic Revolution, Vol. II: Ideas in the Sciences, Cambridge, MA: MIT Press.
Gigerenzer, G. (2004), “Mindless Statistics,” Journal of Socio-Economics, 33, 587–606. DOI:10.1016/j.socec.2004.09.033.
Gigerenzer, G. (2018), “Statistical Rituals: The Replication Delusion and How We Got There,” Advances in Methods and Practices in Psychological Science, 1, 198–218. DOI:10.1177/2515245918771329.
Gigerenzer, G., Krauss, S., and Vitouch, O. (2004), “The Null Ritual: What You Always Wanted to Know About Null Hypothesis Testing But Were Afraid to Ask,” in Handbook on Quantitative Methods in the Social Sciences, Thousand Oaks, CA: Sage Publications, Inc., pp. 389–406.
Gill, J. (1999), “The Insignificance of Null Hypothesis Significance Testing,” Political Research Quarterly, 52, 647–674. DOI:10.1177/106591299905200309.
Greenland, S. (2017), “Invited Commentary: The Need for Cognitive Science in Methodology,” American Journal of Epidemiology, 186, 639–646. DOI:10.1093/aje/kwx259.
Greenland, S., and Poole, C. (2013), “Living With Statistics in Observational Research,” Epidemiology, 24, 73–78. DOI:10.1097/EDE.0b013e3182785a49.
Haller, H., and Krauss, S. (2002), “Misinterpretations of Significance: A Problem Students Share With Their Teachers?” Methods of Psychological Research, 7, 1–20, http://www.mpr-online.de.
Holman, C. J., Arnold-Reed, D. E., de Klerk, N., McComb, C., and English, D. R. (2001), “A Psychometric Experiment in Causal Inference to Estimate Evidential Weights Used by Epidemiologists,” Epidemiology, 12, 246–255. DOI:10.1097/00001648-200103000-00019.
Hubbard, R. (2004), “Alphabet Soup: Blurring the Distinctions Between p’s and α’s in Psychological Research,” Theory and Psychology, 14, 295–327. DOI:10.1177/0959354304043638.
Hubbard, R., and Lindsay, R. M. (2008), “Why p Values Are Not a Useful Measure of Evidence in Statistical Significance Testing,” Theory and Psychology, 18, 69–88. DOI:10.1177/0959354307086923.
Hunter, J. E. (1997), “Needed: A Ban on the Significance Test,” Psychological Science, 8, 3–7. DOI:10.1111/j.1467-9280.1997.tb00534.x.
Hurlbert, S. H., and Lombardi, C. M. (2009), “Final Collapse of the Neyman–Pearson Decision Theoretic Framework and Rise of the Neofisherian,” Annales Zoologici Fennici, 46, 311–349. DOI:10.5735/086.046.0501.
Ioannidis, J. P. A. (2005), “Why Most Published Research Findings Are False,” PLoS Medicine, 2, e124. DOI:10.1371/journal.pmed.0020124.
Johnson, V. E. (2013a), “Revised Standards for Statistical Evidence,” Proceedings of the National Academy of Sciences of the United States of America, 110, 19313–19317. DOI:10.1073/pnas.1313476110.
Johnson, V. E. (2013b), “Uniformly Most Powerful Bayesian Tests,” Annals of Statistics, 41, 1716–1741. DOI:10.1214/13-AOS1123.
Kamary, K., Mengersen, K., Robert, C., and Rousseau, J. (2014), “Testing Hypotheses as a Mixture Estimation Model,” Technical Report, https://arxiv.org/pdf/1214.4436.pdf.
Lehmann, E. L. (1993), Testing Statistical Hypotheses, New York: Chapman and Hall.
Lemoine, N. P., Hoffman, A., Felton, A. J., Baur, L., Chaves, F., Gray, J., Yu, Q., and Smith, M. D. (2016), “Underappreciated Problems of Low Replication in Ecological Field Studies,” Ecology, 97, 2554–2561. DOI:10.1002/ecy.1506.
McCloskey, D. N., and Ziliak, S. (1996), “The Standard Error of Regression,” Journal of Economic Literature, 34, 97–114.
McShane, B. B., and Böckenholt, U. (2014), “You Cannot Step Into the Same River Twice: When Power Analyses Are Optimistic,” Perspectives on Psychological Science, 9, 612–625. DOI:10.1177/1745691614548513.
McShane, B. B., and Böckenholt, U. (2017), “Single Paper Meta-Analysis: Benefits for Study Summary, Theory-Testing, and Replicability,” Journal of Consumer Research, 43, 1048–1063.
McShane, B. B., and Böckenholt, U. (2018), “Multilevel Multivariate Meta-Analysis With Application to Choice Overload,” Psychometrika, 83, 255–271. DOI:10.1007/s11336-017-9571-z.
McShane, B. B., and Gal, D. (2016), “Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence,” Management Science, 62, 1707–1718. DOI:10.1287/mnsc.2015.2212.
McShane, B. B., and Gal, D. (2017), “Statistical Significance and the Dichotomization of Evidence,” Journal of the American Statistical Association, 112, 885–895. DOI:10.1080/01621459.2017.1289846.
Meehl, P. E. (1978), “Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology,” Journal of Consulting and Clinical Psychology, 46, 806–834. DOI:10.1037/0022-006X.46.4.806.
Meehl, P. E. (1990), “Why Summaries of Research on Psychological Theories Are Often Uninterpretable,” Psychological Reports, 66, 195–244. DOI:10.2466/pr0.1990.66.1.195.
Mitchell, S., Gelman, A., Ross, R., Chen, J., Bari, S., Huynh, U. K., Harris, M. W., Sachs, S. E., Stuart, E. A., Feller, A., and Makela, S. (2018), “The Millennium Villages Project: A Retrospective, Observational, Endline Evaluation,” The Lancet Global Health, 6, e500–e513. DOI:10.1016/S2214-109X(18)30065-2.
Morrison, D. E., and Henkel, R. E. (1970), The Significance Test Controversy, Chicago: Aldine.
Oakes, M. (1986), Statistical Inference: A Commentary for the Social and Behavioral Sciences, New York: Wiley.
Pericchi, L., Pereira, C. A., and Pérez, M.-E. (2014), “Adaptive Revised Standards for Statistical Evidence,” Proceedings of the National Academy of Sciences of the United States of America, 111, E1935–E1935. DOI:10.1073/pnas.1322191111.
Resnick, B. (2017), “What a Nerdy Debate About p-Values Shows About Science—And How to Fix It,” Technical Report.
Rosnow, R. L., and Rosenthal, R. (1989), “Statistical Procedures and the Justification of Knowledge in Psychological Science,” American Psychologist, 44, 1276–1284. DOI:10.1037/0003-066X.44.10.1276.
Rozeboom, W. W. (1960), “The Fallacy of the Null Hypothesis Significance Test,” Psychological Bulletin, 57, 416–428.
Sawyer, A. G., and Peter, J. P. (1983), “The Significance of Statistical Significance Tests in Marketing Research,” Journal of Marketing Research, 20, 122–133. DOI:10.1177/002224378302000203.
Schmidt, F. L. (1996), “Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for the Training of Researchers,” Psychological Methods, 1, 115–129. DOI:10.1037/1082-989X.1.2.115.
Senn, S. S. (2001), “Two Cheers for p-Values?,” Journal of Epidemiology and Biostatistics, 6, 193–204.
Serlin, R. C., and Lapsley, D. K. (1993), “Rational Appraisal of Psychological Research and the Good Enough Principle,” in A Handbook for Data Analysis in the Behavioral Sciences: Methodological Issues, Hillsdale, NJ: Lawrence Erlbaum Associates.
Simmons, J. P., Nelson, L. D., and Simonsohn, U. (2011), “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” Psychological Science, 22, 1359–1366. DOI:10.1177/0956797611417632.
Skipper, J. K., Guenther, A. L., and Nass, G. (1967), “The Sacredness of .05: A Note Concerning the Uses of Statistical Levels of Significance in Social Science,” The American Sociologist, 5, 16–18.
Smaldino, P. E., and McElreath, R. (2016), “The Natural Selection of Bad Science,” Technical Report, https://arxiv.org/pdf/1605.09511v1.pdf.
Tackett, J. L., Kushner, S. C., Herzhoff, K., Smack, A. J., and Reardon, K. W. (2014), “Viewing Relational Aggression Through Multiple Lenses: Temperament, Personality, and Personality Pathology,” Development and Psychopathology, 26, 863–877. DOI:10.1017/S0954579414000443.
Trangucci, R., Ali, I., Gelman, A., and Rivers, D. (2018), “Voting Patterns in 2016: Exploration Using Multilevel Regression and Poststratification (MRP) on Pre-Election Polls,” arXiv preprint arXiv:1802.00842.
Tukey, J. W. (1991), “The Philosophy of Multiple Comparisons,” Statistical Science, 6, 100–116. DOI:10.1214/ss/1177011945.
Wasserstein, R. L., and Lazar, N. A. (2016), “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, 70, 129–133. DOI:10.1080/00031305.2016.1154108.
Yule, G. U., and Kendall, M. G. (1950), An Introduction to the Theory of Statistics (14th ed.), London: Griffin.
