Abandon Statistical Significance

Pages 235-245 | Received 30 Oct 2017, Accepted 06 Sep 2018, Published online: 20 Mar 2019

References

  • Amrhein, V., and Greenland, S. (2018), “Remove, Rather Than Redefine, Statistical Significance,” Nature Human Behaviour, 2, 4. DOI:10.1038/s41562-017-0224-0.
  • Amrhein, V., Korner-Nievergelt, F., and Roth, T. (2017), “The Earth is Flat (p > 0.05): Significance Thresholds and the Crisis of Unreplicable Research,” PeerJ, 5, e3544. DOI:10.7717/peerj.3544.
  • Anderson, D. R., Burnham, K. P., and Thompson, W. L. (2000), “Null Hypothesis Testing: Problems, Prevalence, and an Alternative,” Journal of Wildlife Management, 64, 912–923. DOI:10.2307/3803199.
  • Bakan, D. (1966), “The Test of Significance in Psychological Research,” Psychological Bulletin, 66(6), 423–437. DOI:10.1037/h0020412.
  • Bem, D. J. (2011), “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect,” Journal of Personality and Social Psychology, 100, 407–425. DOI:10.1037/a0021524.
  • Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., and Cesarini, D. (2018), “Redefine Statistical Significance,” Nature Human Behaviour, 2, 6–10. DOI:10.1038/s41562-017-0189-z.
  • Berger, J. O., and Sellke, T. (1987), “Testing a Point Null Hypothesis: The Irreconcilability of p Values and Evidence,” Journal of the American Statistical Association, 82, 112–122. DOI:10.2307/2289131.
  • Berkson, J. (1938), “Some Difficulties of Interpretation Encountered in the Application of the Chi-Square Test,” Journal of the American Statistical Association, 33, 526–536. DOI:10.1080/01621459.1938.10502329.
  • Boring, E. G. (1919), “Mathematical vs. Scientific Significance,” Psychological Bulletin, 16, 335–338. DOI:10.1037/h0074554.
  • Briggs, W. M. (2016), Uncertainty: The Soul of Modeling, Probability and Statistics, New York: Springer.
  • Carlin, J. B. (2016), “Is Reform Possible Without a Paradigm Shift?” The American Statistician, 70, 10 (supplemental material to the ASA statement on p-values and statistical significance).
  • Carney, D. R., Cuddy, A. J., and Yap, A. J. (2010), “Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance,” Psychological Science, 21, 1363–1368. DOI:10.1177/0956797610383437.
  • Cochran, W. G. (1976), “Early Development of Techniques in Comparative Experimentation,” in On the History of Statistics and Probability, New York: Dekker.
  • Cohen, J. (1994), “The Earth is Round (p < .05),” American Psychologist, 49, 997–1003.
  • Cowles, M., and Davis, C. (1982), “On the Origins of the .05 Level of Significance,” American Psychologist, 37, 553–558.
  • Cox, D. R. (1977), “The Role of Significance Tests,” Scandinavian Journal of Statistics, 4, 49–70.
  • Cox, D. R. (1982), “Statistical Significance Tests,” British Journal of Clinical Pharmacology, 14, 325–331. DOI:10.1111/j.1365-2125.1982.tb01987.x.
  • Cramer, H. (1955), The Elements of Probability Theory, New York: Wiley.
  • Drummond, G. (2015), “Most of the Time, P Is an Unreliable Marker, So We Need No Exact Cut-Off,” British Journal of Anaesthesia, 116, 894–894. DOI:10.1093/bja/aew146.
  • Edwards, W., Lindman, H., and Savage, L. J. (1963), “Bayesian Statistical Inference for Psychological Research,” Psychological Review, 70, 193. DOI:10.1037/h0044139.
  • Eysenck, H. J. (1960), “The Concept of Statistical Significance and the Controversy About One-Tailed Tests,” Psychological Review, 67, 269. DOI:10.1037/h0048412.
  • Fisher, R. A. (1926), “The Arrangement of Field Experiments,” Journal of the Ministry of Agriculture, 33, 503–513.
  • Fisher, R. A. (1956), Statistical Methods and Scientific Inference, New York: Hafner Publishing Co.
  • Freeman, P. R. (1993), “The Role of p-Values in Analysing Trial Results,” Statistics in Medicine, 12, 1443–1452.
  • Gelman, A. (2015), “The Connection Between Varying Treatment Effects and the Crisis of Unreplicable Research: A Bayesian Perspective,” Journal of Management, 41, 632–643. DOI:10.1177/0149206314525208.
  • Gelman, A. (2016), “The Problems With p-Values Are Not Just With p-Values,” The American Statistician, 70, 10 (supplemental material to the ASA statement on p-values and statistical significance).
  • Gelman, A. (2017), “The Failure of Null Hypothesis Significance Testing When Studying Incremental Changes, and What to do About It,” Personality and Social Psychology Bulletin, 44, 16–23. DOI:10.1177/0146167217729162.
  • Gelman, A., and Auerbach, J. (2016a), “Age-Aggregation Bias in Mortality Trends,” Proceedings of the National Academy of Sciences of the United States of America, 113, E816–E817. DOI:10.1073/pnas.1523465113.
  • Gelman, A., and Auerbach, J. (2016b), “Mortality Trends by Race/Ethnicity, Sex, Age and State,” Technical Report, Columbia University.
  • Gelman, A., and Carlin, J. (2014), “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors,” Perspectives on Psychological Science, 9, 641–651. DOI:10.1177/1745691614551642.
  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014), Bayesian Data Analysis (3rd ed.), Boca Raton, FL: Chapman and Hall/CRC.
  • Gelman, A., and Loken, E. (2014), “The Statistical Crisis in Science,” American Scientist, 102, 460–465. DOI:10.1511/2014.111.460.
  • Gelman, A., and Robert, C. P. (2014), “Revised Evidence for Statistical Standards,” Proceedings of the National Academy of Sciences of the United States of America, 111, E1933–E1933. DOI:10.1073/pnas.1322995111.
  • Gelman, A., and Stern, H. (2006), “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant,” The American Statistician, 60, 328–331. DOI:10.1198/000313006X152649.
  • Gigerenzer, G. (1987), The Probabilistic Revolution, Vol. II: Ideas in the Sciences, Cambridge, MA: MIT Press.
  • Gigerenzer, G. (2004), “Mindless Statistics,” Journal of Socio-Economics, 33, 587–606. DOI:10.1016/j.socec.2004.09.033.
  • Gigerenzer, G. (2018), “Statistical Rituals: The Replication Delusion and How We Got There,” Advances in Methods and Practices in Psychological Science, 1, 198–218. DOI:10.1177/2515245918771329.
  • Gigerenzer, G., Krauss, S., and Vitouch, O. (2004), “The Null Ritual: What You Always Wanted to Know About Null Hypothesis Testing But Were Afraid to Ask,” in Handbook on Quantitative Methods in the Social Sciences, Thousand Oaks, CA: Sage Publications, Inc., pp. 389–406.
  • Gill, J. (1999), “The Insignificance of Null Hypothesis Significance Testing,” Political Research Quarterly, 52, 647–674. DOI:10.1177/106591299905200309.
  • Greenland, S. (2017), “Invited Commentary: The Need for Cognitive Science in Methodology,” American Journal of Epidemiology, 186, 639–646. DOI:10.1093/aje/kwx259.
  • Greenland, S., and Poole, C. (2013), “Living With Statistics in Observational Research,” Epidemiology, 24, 73–78. DOI:10.1097/EDE.0b013e3182785a49.
  • Haller, H., and Krauss, S. (2002), “Misinterpretations of Significance: A Problem Students Share With Their Teachers?,” Methods of Psychological Research, 7, 1–20, http://www.mpr-online.de.
  • Holman, C. J., Arnold-Reed, D. E., de Klerk, N., McComb, C., and English, D. R. (2001), “A Psychometric Experiment in Causal Inference to Estimate Evidential Weights Used by Epidemiologists,” Epidemiology, 12, 246–255. DOI:10.1097/00001648-200103000-00019.
  • Hubbard, R. (2004), “Alphabet Soup: Blurring the Distinctions Between p’s and α’s in Psychological Research,” Theory and Psychology, 14, 295–327. DOI:10.1177/0959354304043638.
  • Hubbard, R., and Lindsay, R. M. (2008), “Why p Values Are Not a Useful Measure of Evidence in Statistical Significance Testing,” Theory and Psychology, 18, 69–88. DOI:10.1177/0959354307086923.
  • Hunter, J. E. (1997), “Needed: A Ban on the Significance Test,” Psychological Science, 8, 3–7. DOI:10.1111/j.1467-9280.1997.tb00534.x.
  • Hurlbert, S. H., and Lombardi, C. M. (2009), “Final Collapse of the Neyman–Pearson Decision Theoretic Framework and Rise of the Neofisherian,” Annales Zoologici Fennici, 46, 311–349. DOI:10.5735/086.046.0501.
  • Ioannidis, J. P. A. (2005), “Why Most Published Research Findings Are False,” PLoS Medicine, 2, e124. DOI:10.1371/journal.pmed.0020124.
  • Johnson, V. E. (2013a), “Revised Standards for Statistical Evidence,” Proceedings of the National Academy of Sciences of the United States of America, 110, 19313–19317. DOI:10.1073/pnas.1313476110.
  • Johnson, V. E. (2013b), “Uniformly Most Powerful Bayesian Tests,” Annals of Statistics, 41, 1716–1741. DOI:10.1214/13-AOS1123.
  • Kamary, K., Mengersen, K., Robert, C., and Rousseau, J. (2014), “Testing Hypotheses as a Mixture Estimation Model,” Technical Report, https://arxiv.org/pdf/1214.4436.pdf.
  • Lehmann, E. L. (1993), Testing Statistical Hypotheses, New York: Chapman and Hall.
  • Lemoine, N. P., Hoffman, A., Felton, A. J., Baur, L., Chaves, F., Gray, J., Yu, Q., and Smith, M. D. (2016), “Underappreciated Problems of Low Replication in Ecological Field Studies,” Ecology, 97, 2554–2561. DOI:10.1002/ecy.1506.
  • McCloskey, D. N., and Ziliak, S. (1996), “The Standard Error of Regressions,” Journal of Economic Literature, 34, 97–114.
  • McShane, B. B., and Böckenholt, U. (2014), “You Cannot Step Into the Same River Twice: When Power Analyses Are Optimistic,” Perspectives on Psychological Science, 9, 612–625. DOI:10.1177/1745691614548513.
  • McShane, B. B., and Böckenholt, U. (2017), “Single Paper Meta-Analysis: Benefits for Study Summary, Theory-Testing, and Replicability,” Journal of Consumer Research, 43, 1048–1063.
  • McShane, B. B., and Böckenholt, U. (2018), “Multilevel Multivariate Meta-Analysis With Application to Choice Overload,” Psychometrika, 83, 255–271. DOI:10.1007/s11336-017-9571-z.
  • McShane, B. B., and Gal, D. (2016), “Blinding Us to the Obvious? The Effect of Statistical Training on the Evaluation of Evidence,” Management Science, 62, 1707–1718. DOI:10.1287/mnsc.2015.2212.
  • McShane, B. B., and Gal, D. (2017), “Statistical Significance and the Dichotomization of Evidence,” Journal of the American Statistical Association, 112, 885–895. DOI:10.1080/01621459.2017.1289846.
  • Meehl, P. E. (1978), “Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology,” Journal of Consulting and Clinical Psychology, 46, 806–834. DOI:10.1037/0022-006X.46.4.806.
  • Meehl, P. E. (1990), “Why Summaries of Research on Psychological Theories Are Often Uninterpretable,” Psychological Reports, 66, 195–244. DOI:10.2466/pr0.1990.66.1.195.
  • Mitchell, S., Gelman, A., Ross, R., Chen, J., Bari, S., Huynh, U. K., Harris, M. W., Sachs, S. E., Stuart, E. A., Feller, A., and Makela, S. (2018), “The Millennium Villages Project: A Retrospective, Observational, Endline Evaluation,” The Lancet Global Health, 6, e500–e513. DOI:10.1016/S2214-109X(18)30065-2.
  • Morrison, D. E., and Henkel, R. E. (1970), The Significance Test Controversy, Chicago: Aldine.
  • Oakes, M. (1986), Statistical Inference: A Commentary for the Social and Behavioral Sciences, New York: Wiley.
  • Pericchi, L., Pereira, C. A., and Pérez, M.-E. (2014), “Adaptive Revised Standards for Statistical Evidence,” Proceedings of the National Academy of Sciences of the United States of America, 111, E1935–E1935. DOI:10.1073/pnas.1322191111.
  • Resnick, B. (2017), “What a Nerdy Debate About p-Values Shows About Science—And How to Fix It,” Technical Report.
  • Rosnow, R. L., and Rosenthal, R. (1989), “Statistical Procedures and the Justification of Knowledge in Psychological Science,” American Psychologist, 44, 1276–1284. DOI:10.1037/0003-066X.44.10.1276.
  • Rozeboom, W. W. (1960), “The Fallacy of the Null Hypothesis Significance Test,” Psychological Bulletin, 57, 416–428.
  • Sawyer, A. G., and Peter, J. P. (1983), “The Significance of Statistical Significance Tests in Marketing Research,” Journal of Marketing Research, 20, 122–133. DOI:10.1177/002224378302000203.
  • Schmidt, F. L. (1996), “Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for the Training of Researchers,” Psychological Methods, 1, 115–129. DOI:10.1037/1082-989X.1.2.115.
  • Senn, S. S. (2001), “Two Cheers for p-Values?,” Journal of Epidemiology and Biostatistics, 6, 193–204.
  • Serlin, R. C., and Lapsley, D. K. (1993), “Rational Appraisal of Psychological Research and the Good Enough Principle,” in A Handbook for Data Analysis in the Behavioral Sciences: Methodological Issues, Hillsdale, NJ: Lawrence Erlbaum Associates.
  • Simmons, J. P., Nelson, L. D., and Simonsohn, U. (2011), “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” Psychological Science, 22, 1359–1366. DOI:10.1177/0956797611417632.
  • Skipper, J. K., Guenther, A. L., and Nass, G. (1967), “The Sacredness of .05: A Note Concerning the Uses of Statistical Levels of Significance in Social Science,” The American Sociologist, 5, 16–18.
  • Smaldino, P. E., and McElreath, R. (2016), “The Natural Selection of Bad Science,” Technical Report, https://arxiv.org/pdf/1605.09511v1.pdf.
  • Tackett, J. L., Kushner, S. C., Herzhoff, K., Smack, A. J., and Reardon, K. W. (2014), “Viewing Relational Aggression Through Multiple Lenses: Temperament, Personality, and Personality Pathology,” Development and Psychopathology, 26, 863–877. DOI:10.1017/S0954579414000443.
  • Trangucci, R., Ali, I., Gelman, A., and Rivers, D. (2018), “Voting Patterns in 2016: Exploration Using Multilevel Regression and Poststratification (MRP) on Pre-Election Polls,” arXiv preprint arXiv:1802.00842.
  • Tukey, J. W. (1991), “The Philosophy of Multiple Comparisons,” Statistical Science, 6, 100–116. DOI:10.1214/ss/1177011945.
  • Wasserstein, R. L., and Lazar, N. A. (2016), “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, 70, 129–133. DOI:10.1080/00031305.2016.1154108.
  • Yule, G. U., and Kendall, M. G. (1950), An Introduction to the Theory of Statistics (14th ed.), London: Griffin.