The American Statistician Volume 73, 2019 - Issue sup1: Statistical Inference in the 21st Century: A World Beyond p < 0.05

Open access

Interpreting and Using p

Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values

Sander Greenland, Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA

Pages 106-114 | Received 19 Mar 2018, Accepted 24 Sep 2018, Published online: 20 Mar 2019

https://doi.org/10.1080/00031305.2018.1529625

References

Amrhein, V., Korner-Nievergelt, F., and Roth, T. (2017), “The Earth is Flat (p > 0.05): Significance Thresholds and the Crisis of Unreplicable Research,” PeerJ, 5, e3544. DOI: 10.7717/peerj.3544.
Amrhein, V., Trafimow, D., and Greenland, S. (2018), “Inferential Statistics are Descriptive Statistics,” The American Statistician, this issue.
Bayarri, M. J., and Berger, J. O. (1999), “Quantifying Surprise in the Data and Model Verification,” in Bayesian Statistics 6, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford, UK: Oxford University Press, pp. 53–82.
Bayarri, M. J., and Berger, J. O. (2000), “P Values for Composite Null Models,” Journal of the American Statistical Association, 95, 1127–1142. DOI: 10.2307/2669749.
Bayarri, M. J., and Berger, J. O. (2004), “The Interplay of Bayesian and Frequentist Analysis,” Statistical Science, 19, 58–80. DOI: 10.1214/088342304000000116.
Benjamini, Y. (2016), “It’s Not the P-values’ Fault,” The American Statistician, 70, online supplement 1 to the ASA Statement on P-values, available at http://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108/suppl_file/xxxx.
Berger, J. O., and Sellke, T. M. (1987), “Testing a Point Null Hypothesis: The Irreconcilability of P-values and Evidence” (with discussion), Journal of the American Statistical Association, 82, 112–139. DOI: 10.2307/2289131.
Berger, J. O., and Wolpert, R. L. (1988), “The Likelihood Principle” (with discussion) (2nd ed.), IMS Lecture Notes-Monograph Series, 6, 1–199.
Berger, R. L., and Boos, D. D. (1994), “P Values Maximized Over a Confidence Set for the Nuisance Parameter,” Journal of the American Statistical Association, 89, 1012–1016. DOI: 10.2307/2290928.
Berger, R. L., and Hsu, J. C. (1996), “Bioequivalence Trials, Intersection-Union Tests, and Equivalence Confidence Sets,” Statistical Science, 11, 283–319. DOI: 10.1214/ss/1032280304.
Boos, D. D., and Stefanski, L. A. (2011), “P-Value Precision and Reproducibility,” The American Statistician, 65, 213–221. DOI: 10.1198/tas.2011.10129.
Box, G. E. P. (1980), “Sampling and Bayes Inference in Scientific Modeling and Robustness,” Journal of the Royal Statistical Society, Series A, 143, 383–430. DOI: 10.2307/2982063.
Casella, G., and Berger, R. L. (1987), “Reconciling Bayesian and Frequentist Evidence in the 1-sided Testing Problem” (with discussion), Journal of the American Statistical Association, 82, 106–135. DOI: 10.1080/01621459.1987.10478396.
Casella, G., and Berger, R. L. (1987), “Comment,” Statistical Science, 2, 344–417. DOI: 10.1214/ss/1177013243.
Cohen, J. (1994), “The Earth is Round (p < 0.05),” American Psychologist, 49, 997–1003.
Cox, D. R., and Donnelly, C. A. (2011), Principles of Applied Statistics, Cambridge, UK: Cambridge University Press.
Cox, D. R., and Hinkley, D. V. (1974), Theoretical Statistics, New York: Chapman and Hall.
Edwards, A. W. F. (1992), Likelihood (2nd ed.), Baltimore, MD: Johns Hopkins University Press.
Fisher, R. A. (1925), Statistical Methods for Research Workers, Edinburgh, UK: Oliver and Boyd.
Fraundorf, P. (2017), “Examples of Surprisal,” available at http://www.umsl.edu/~fraundorfp/egsurpri.html.
Gelman, A. (2013), “P Values and Statistical Practice,” Epidemiology, 24, 69–72. DOI: 10.1097/EDE.0b013e31827886f7.
Gelman, A., and Stern, H. (2006), “The Difference Between ‘Significant’ and ‘Not Significant’ is not Itself Statistically Significant,” The American Statistician, 60, 328–331. DOI: 10.1198/000313006X152649.
Gigerenzer, G. (2004), “Mindless Statistics,” Journal of Socio-Economics, 33, 587–606. DOI: 10.1016/j.socec.2004.09.033.
Good, I. J. (1956), “The Surprise Index for the Multivariate Normal Distribution,” The Annals of Mathematical Statistics, 27, 1130–1135. DOI: 10.1214/aoms/1177728079.
Good, I. J. (1983), “Some Logic and History of Hypothesis Testing,” in Philosophical Foundations of Economics, ed. J. C. Pitt, Dordrecht: D. Reidel, pp. 149–174. Reprinted as Ch. 14 in Good, I. J. (1983), Good Thinking, Minneapolis, MN: University of Minnesota Press, pp. 129–148.
Goodman, S. N. (1992), “A Comment on Replication, p-values and Evidence,” Statistics in Medicine, 11, 875–879. DOI: 10.1002/sim.4780110705.
Goodman, S. N. (1999), “Towards Evidence-Based Medical Statistics, I: The P-value Fallacy,” Annals of Internal Medicine, 130, 995–1004. DOI: 10.7326/0003-4819-130-12-199906150-00008.
Greenland, S. (2004), “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” Wake Forest Law Review, 39, 291–310.
Greenland, S. (2017), “The Need for Cognitive Science in Methodology,” American Journal of Epidemiology, 186, 639–645. DOI: 10.1093/aje/kwx259.
Greenland, S. (2018), “The Unconditional Information in P-values, and Its Refutational Interpretation via S-values,” manuscript.
Greenland, S., and Poole, C. (2013), “Living with Statistics in Observational Research,” Epidemiology (Cambridge, Mass.), 24, 73–78. DOI: 10.1097/EDE.0b013e3182785a49.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., and Altman, D. G. (2016), “Statistical Tests, Confidence Intervals, and Power: A Guide to Misinterpretations,” The American Statistician, 70, online supplement 1, available at http://amstat.tandfonline.com/doi/suppl/10.1080/00031305.2016.1154108/suppl_file/utas_a_1154108_sm5368.pdf; reprinted in the European Journal of Epidemiology, 31, 337–350. DOI: 10.1007/s10654-016-0149-3.
Hoekstra, R., Finch, S., Kiers, H. A. L., and Johnson, A. (2006), “Probability as Certainty: Dichotomous Thinking and the Misuse of p-values,” Psychonomic Bulletin & Review, 13, 1033–1037. DOI: 10.3758/BF03213921.
Hubbard, R., and Bayarri, M. J. (2003), “Confusion Over Measures of Evidence (p’s) Versus Errors (α’s) in Classical Statistical Testing,” The American Statistician, 57, 171–177. DOI: 10.1198/0003130031856.
Hubbard, R., and Lindsay, R. M. (2008), “Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing,” Theory & Psychology, 18, 69–88. DOI: 10.1177/0959354307086923.
Hurlbert, S. H., and Lombardi, C. M. (2009), “Final Collapse of the Neyman–Pearson Decision Theoretic Framework and Rise of the neoFisherian,” Annales Zoologici Fennici, 46, 311–349. DOI: 10.5735/086.046.0501.
Kuffner, T. A., and Walker, S. G. (2017), “Why Are p-values Controversial?” The American Statistician, in press. DOI: 10.1080/00031305.2016.1277161.
Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D. E., Buchanan, E. M., Caldwell, A. R., Van Calster, B., Carlsson, R., Chen, S.-C., Chung, B., Colling, L. J., Collins, G. S., Crook, Z., Cross, E. S., Daniels, S., Danielsson, H., DeBruine, L., Dunleavy, D. J., Earp, B. D., Feist, M. I., Ferrell, J. D., Field, J. G., Fox, N. W., Friesen, A., Gomes, C., Gonzalez-Marquez, M., Grange, J. A., Grieve, A. P., Guggenberger, R., Grist, J., van Harmelen, A.-L., Hasselman, F., Hochard, K. D., Hoffarth, M. R., Holmes, N. P., Ingre, M., Isager, P. M., Isotalus, H. K., Johansson, C., Juszczyk, K., Kenny, D. A., Khalil, A. A., Konat, B., Lao, J., Larsen, E. G., Lodder, G. M. A., Lukavský, J., Madan, C. R., Manheim, D., Martin, S. R., Martin, A. E., Mayo, D. G., McCarthy, R. J., McConway, K., McFarland, C., Nio, A. Q. X., Nilsonne, G., de Oliveira, C. L., de Xivry, J.-J. O., Parsons, S., Pfuhl, G., Quinn, K. A., Sakon, J. J., Saribay, S. A., Schneider, I. K., Selvaraju, M., Sjoerds, Z., Smith, S. G., Smits, T., Spies, J. R., Sreekumar, V., Steltenpohl, C. N., Stenhouse, N., Świątkowski, W., Vadillo, M. A., Van Assen, M. A. L. M., Williams, M. N., Williams, S. E., Williams, D. R., Yarkoni, T., Ziano, I., and Zwaan, R. A. (2018), “Justify Your Alpha: A Response to ‘Redefine Statistical Significance,’” Nature Human Behaviour, 2, 168–171.
Lane, D. (1988), “Discussion of Berger and Wolpert,” IMS Lecture Notes-Monograph Series, 6, 175–181.
Lang, J. M., Rothman, K. J., and Cann, C. I. (1998), “That Confounded P-value,” Epidemiology (Cambridge, Mass.), 9, 7–8.
LeCam, L. (1988), “Discussion of Berger and Wolpert,” IMS Lecture Notes-Monograph Series, 6, 182–185.
Lehmann, E. L. (1986), Testing Statistical Hypotheses, New York: Wiley.
Lindeman, M., and Stark, P. B. (2012), “A Gentle Introduction to Risk-limiting Audits,” IEEE Security & Privacy, 10, 42–49. DOI: 10.1109/MSP.2012.56.
MacKay, D. J. C. (2003), Information Theory, Inference, and Learning Algorithms, Cambridge, UK: Cambridge University Press, sec. 2.4, available at http://www.inference.org.uk/mackay/itila/book.html.
McShane, B. B., and Gal, D. (2017), “Statistical Significance and the Dichotomization of Evidence” (with discussion), Journal of the American Statistical Association, 112, 885–908. DOI: 10.1080/01621459.2017.1289846.
McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. (2018), “Abandon Statistical Significance,” The American Statistician, this issue.
Merriam-Webster Dictionary (2017), “Null,” available at https://www.merriam-webster.com/dictionary/null.
Murdoch, D. J., Tsai, Y.-L., and Adcock, J. (2008), “P-Values are Random Variables,” The American Statistician, 62, 242–245. DOI: 10.1198/000313008X332421.
Neyman, J. (1977), “Frequentist Probability and Frequentist Statistics,” Synthese, 36, 97–131. DOI: 10.1007/BF00485695.
Oxford Living Dictionary (2017), “Null,” available at https://en.oxforddictionaries.com/definition/null.
Perezgonzalez, J. D. (2015), “P-values as Percentiles. Commentary on: ‘Null Hypothesis Significance Tests. A Mix-up of two Different Theories: the Basis for Widespread Confusion and Numerous Misinterpretations’,” Frontiers in Psychology, 6, 341.
Poole, C. (1987a), “Beyond the Confidence Interval,” American Journal of Public Health, 77, 195–199.
Poole, C. (1987b), “Confidence Intervals Exclude Nothing,” American Journal of Public Health, 77, 492–493.
Ritov, Y., Bickel, P. J., Gamst, A. C., and Kleijn, B. J. K. (2014), “The Bayesian Analysis of Complex, High-Dimensional Models: Can It Be CODA?” Statistical Science, 29, 619–639. DOI: 10.1214/14-STS483.
Robins, J. M., and Wasserman, L. (2000), “Conditioning, Likelihood, and Coherence: A Review of Some Foundational Concepts,” Journal of the American Statistical Association, 95, 1340–1346. DOI: 10.1080/01621459.2000.10474344.
Royall, R. M. (1986), “The Effect of Sample Size on the Meaning of Significance Tests,” The American Statistician, 40, 313–315. DOI: 10.2307/2684616.
Royall, R. M. (1997), Statistical Inference: A Likelihood Paradigm, New York: Chapman and Hall.
Schervish, M. J. (1996), “P-values: What They Are and What They Are Not,” The American Statistician, 50, 203–206. DOI: 10.2307/2684655.
Sellke, T. M., Bayarri, M. J., and Berger, J. O. (2001), “Calibration of p Values for Testing Precise Null Hypotheses,” The American Statistician, 55, 62–71. DOI: 10.1198/000313001300339950.
Senn, S. J. (2001), “Two Cheers for P-Values,” Journal of Epidemiology and Biostatistics, 6, 193–204.
Senn, S. J. (2002), “Letter to the Editor re: Goodman 1992,” Statistics in Medicine, 21, 2437–2444.
Senn, S. J. (2008), Statistical Issues in Drug Development (2nd ed.), New York: Wiley.
Shannon, C. E. (1948), “A Mathematical Theory of Communication,” Bell System Technical Journal, 27, 379–423, 623–656. DOI: 10.1002/j.1538-7305.1948.tb00917.x.
Spanos, A. (2013), “Who Should Be Afraid of the Jeffreys–Lindley Paradox?” Philosophy of Science, 80, 73–93. DOI: 10.1086/668875.
Walsh, P., Rothenberg, S. J., and Bang, H. (2018), “Safety of Ibuprofen in Infants Younger than Six Months: A Retrospective Cohort Study,” PLoS One, 13, e0199493. DOI: 10.1371/journal.pone.0199493.
Wasserstein, R. L., and Lazar, N. A. (2016), “The ASA’s Statement on p-values: Context, Process and Purpose,” The American Statistician, 70, 129–133. DOI: 10.1080/00031305.2016.1154108.
Wellek, S. (2010), Testing Statistical Hypotheses of Equivalence and Noninferiority (2nd ed.), New York: Chapman & Hall.
Ziliak, S. T., and McCloskey, D. N. (2008), The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives, Ann Arbor, MI: University of Michigan Press.