Interpreting and Using p

Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values

Pages 106-114 | Received 19 Mar 2018, Accepted 24 Sep 2018, Published online: 20 Mar 2019
