Short Technical Note

Null Hypothesis Significance Testing Interpreted and Calibrated by Estimating Probabilities of Sign Errors: A Bayes-Frequentist Continuum

Pages 104-112 | Received 21 Dec 2019, Accepted 12 Jul 2020, Published online: 19 Oct 2020

References

  • Bayarri, M., Benjamin, D. J., Berger, J. O., and Sellke, T. M. (2016), “Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses,” Journal of Mathematical Psychology, 72, 90–103. DOI: 10.1016/j.jmp.2015.12.007.
  • Begley, C. G., and Ioannidis, J. P. (2015), “Reproducibility in Science,” Circulation Research, 116, 116–126. DOI: 10.1161/CIRCRESAHA.114.303819.
  • Benjamin, D. J., and Berger, J. O. (2019), “Three Recommendations for Improving the Use of p-Values,” The American Statistician, 73, 186–191.
  • Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., Fehr, E., Fidler, F., Field, A. P., Forster, M., George, E. I., Gonzalez, R., Goodman, S., Green, E., Green, D. P., Greenwald, A. G., Hadfield, J. D., Hedges, L. V., Held, L., Hua Ho, T., Hoijtink, H., Hruschka, D. J., Imai, K., Imbens, G., Ioannidis, J. P. A., Jeon, M., Jones, J. H., Kirchler, M., Laibson, D., List, J., Little, R., Lupia, A., Machery, E., Maxwell, S. E., McCarthy, M., Moore, D. A., Morgan, S. L., Munafò, M., Nakagawa, S., Nyhan, B., Parker, T. H., Pericchi, L., Perugini, M., Rouder, J., Rousseau, J., Savalei, V., Schönbrodt, F. D., Sellke, T., Sinclair, B., Tingley, D., Van Zandt, T., Vazire, S., Watts, D. J., Winship, C., Wolpert, R. L., Xie, Y., Young, C., Zinman, J., and Johnson, V. E. (2018), “Redefine Statistical Significance,” Nature Human Behaviour, 2, 6–10. DOI: 10.1038/s41562-017-0189-z.
  • Bernardo, J. M. (2011), “Integrated Objective Bayesian Estimation and Hypothesis Testing,” Bayesian Statistics, 9, 1–68.
  • Bickel, D. R. (2011), “Estimating the Null Distribution to Adjust Observed Confidence Levels for Genome-Scale Screening,” Biometrics, 67, 363–370. DOI: 10.1111/j.1541-0420.2010.01491.x.
  • Bickel, D. R. (2012a), “Coherent Frequentism: A Decision Theory Based on Confidence Sets,” Communications in Statistics—Theory and Methods, 41, 1478–1496.
  • Bickel, D. R. (2012b), “Empirical Bayes Interval Estimates That Are Conditionally Equal to Unadjusted Confidence Intervals or to Default Prior Credibility Intervals,” Statistical Applications in Genetics and Molecular Biology, 11, 7.
  • Bickel, D. R. (2013), “Simple Estimators of False Discovery Rates Given as Few as One or Two p-Values Without Strong Parametric Assumptions,” Statistical Applications in Genetics and Molecular Biology, 12, 529–543.
  • Bickel, D. R. (2019a), Genomics Data Analysis: False Discovery Rates and Empirical Bayes Methods, New York: Chapman and Hall/CRC.
  • Bickel, D. R. (2019b), “Maximum Entropy Derived and Generalized Under Idempotent Probability to Address Bayes-Frequentist Uncertainty and Model Revision Uncertainty,” Working Paper, DOI: 10.5281/zenodo.2645555.
  • Bickel, D. R. (2019c), “Null Hypothesis Significance Testing Defended and Calibrated by Bayesian Model Checking,” The American Statistician, DOI: 10.1080/00031305.2019.1699443.
  • Bickel, D. R. (2019d), “Sharpen Statistical Significance: Evidence Thresholds and Bayes Factors Sharpened into Occam’s Razor,” Stat, 8, e215.
  • Bickel, D. R. (2020a), “Confidence Distributions and Empirical Bayes Posterior Distributions Unified as Distributions of Evidential Support,” Communications in Statistics—Theory and Methods, DOI: 10.1080/03610926.2020.1790004 (to appear).
  • Bickel, D. R. (2020b), “Interval Estimation, Point Estimation, and Null Hypothesis Significance Testing Calibrated by an Estimated Posterior Probability of the Null Hypothesis,” Working Paper, DOI: 10.5281/zenodo.3694136.
  • Bickel, D. R., and Rahal, A. (2019), “Correcting False Discovery Rates for Their Bias Toward False Positives,” Communications in Statistics—Simulation and Computation, DOI: 10.1080/03610918.2019.1630432.
  • Butler, J. S., and Jones, P. (2018), “Theoretical and Empirical Distributions of the p Value,” METRON, 76, 1–30.
  • Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., and Munafò, M. R. (2013), “Power Failure: Why Small Sample Size Undermines the Reliability of Neuroscience,” Nature Reviews Neuroscience, 14, 365. DOI: 10.1038/nrn3475.
  • Carlin, B. P., and Louis, T. A. (2009), Bayesian Methods for Data Analysis (3rd ed.), New York: Chapman & Hall/CRC.
  • Casella, G., and Berger, R. L. (1987), “Reconciling Bayesian and Frequentist Evidence in the One-Sided Testing Problem,” Journal of the American Statistical Association, 82, 106–111.
  • Colquhoun, D. (2017), “The Reproducibility of Research and the Misinterpretation of p-Values,” Royal Society Open Science, 4, 171085.
  • Colquhoun, D. (2019), “The False Positive Risk: A Proposal Concerning What to Do About p-Values,” The American Statistician, 73, 192–201.
  • Cox, D. R. (1977), “The Role of Significance Tests,” Scandinavian Journal of Statistics, 4, 49–70.
  • de Ruiter, J. (2019), “Redefine or Justify? Comments on the Alpha Debate,” Psychonomic Bulletin & Review, 26, 430–433.
  • Dreber, A., Pfeiffer, T., Almenberg, J., Isaksson, S., Wilson, B., Chen, Y., Nosek, B. A., and Johannesson, M. (2015), “Using Prediction Markets to Estimate the Reproducibility of Scientific Research,” Proceedings of the National Academy of Sciences of the United States of America, 112, 15343–15347. DOI: 10.1073/pnas.1516179112.
  • Efron, B. (2010), Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Cambridge: Cambridge University Press.
  • Efron, B., and Tibshirani, R. (2002), “Empirical Bayes Methods and False Discovery Rates for Microarrays,” Genetic Epidemiology, 23, 70–86. DOI: 10.1002/gepi.1124.
  • Efron, B., Tibshirani, R., Storey, J. D., and Tusher, V. (2001), “Empirical Bayes Analysis of a Microarray Experiment,” Journal of the American Statistical Association, 96, 1151–1160.
  • Evans, M. (2015), Measuring Statistical Evidence Using Relative Belief, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, New York: CRC Press.
  • Goodman, S. N. (1999), “Toward Evidence-Based Medical Statistics. 2: The Bayes Factor,” Annals of Internal Medicine, 130, 1005–1013. DOI: 10.7326/0003-4819-130-12-199906150-00019.
  • Grandhi, A., Guo, W., and Romano, J. (2019), “Control of Directional Errors in Fixed Sequence Multiple Testing,” Statistica Sinica, 29, 1047–1064.
  • Greenland, S., and Poole, C. (2013), “Living With p Values: Resurrecting a Bayesian Perspective on Frequentist Statistics,” Epidemiology, 24, 62–68. DOI: 10.1097/EDE.0b013e3182785741.
  • Grundy, P. M. (1956), “Fiducial Distributions and Prior Distributions: An Example in Which the Former Cannot Be Associated With the Latter,” Journal of the Royal Statistical Society, Series B, 18, 217–221.
  • Hannig, J., Iyer, H., Lai, R. C., and Lee, T. C. (2016), “Generalized Fiducial Inference: A Review and New Results,” Journal of the American Statistical Association, 111, 1346–1361.
  • Held, L., and Ott, M. (2016), “How the Maximal Evidence of p-Values Against Point Null Hypotheses Depends on Sample Size,” The American Statistician, 70, 335–341.
  • Held, L., and Ott, M. (2018), “On p-Values and Bayes Factors,” Annual Review of Statistics and Its Application, 5, 393–419.
  • Huang, D. W., Sherman, B. T., and Lempicki, R. A. (2009), “Bioinformatics Enrichment Tools: Paths Toward the Comprehensive Functional Analysis of Large Gene Lists,” Nucleic Acids Research, 37, 1–13. DOI: 10.1093/nar/gkn923.
  • Hughes, B. (2018), Psychology in Crisis, London: Palgrave.
  • Hurlbert, S., and Lombardi, C. (2009), “Final Collapse of the Neyman-Pearson Decision Theoretic Framework and Rise of the neoFisherian,” Annales Zoologici Fennici, 46, 311–349.
  • Ioannidis, J. P. (2005), “Why Most Published Research Findings Are False,” PLoS Medicine, 2, e124. DOI: 10.1371/journal.pmed.0020124.
  • Johnson, V., Payne, R., Wang, T., Asher, A., and Mandal, S. (2017), “On the Reproducibility of Psychological Science,” Journal of the American Statistical Association, 112, 1–10. DOI: 10.1080/01621459.2016.1240079.
  • Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A., Argamon, S. E., Baguley, T., Becker, R. B., Benning, S. D., Bradford, D. E., and Buchanan, E. M. (2018), “Justify Your Alpha,” Nature Human Behaviour, 2, 168.
  • Lindley, D. V. (1958), “Fiducial Distributions and Bayes’ Theorem,” Journal of the Royal Statistical Society, Series B, 20, 102–107.
  • Marsman, M., and Wagenmakers, E. J. (2017), “Three Insights From a Bayesian Interpretation of the One-Sided p Value,” Educational and Psychological Measurement, 77, 529–539. DOI: 10.1177/0013164416669201.
  • Martin, R., and Liu, C. (2014), “A Note on p-Values Interpreted as Plausibilities,” Statistica Sinica, 24, 1703–1716.
  • Mayo, D. G. (2019), “The ASA’s p-Value Project: Why It’s Doing More Harm Than Good (Cont From 11/4/19),” available at http://bit.ly/2LgXMKY.
  • McShane, B. B., Gal, D., Gelman, A., Robert, C., and Tackett, J. L. (2019), “Abandon Statistical Significance,” The American Statistician, 73, 235–245.
  • Montazeri, Z., Yanofsky, C. M., and Bickel, D. R. (2010), “Shrinkage Estimation of Effect Sizes as an Alternative to Hypothesis Testing Followed by Estimation in High-Dimensional Biology: Applications to Differential Gene Expression,” Statistical Applications in Genetics and Molecular Biology, 9, 23.
  • Nadarajah, S., Bityukov, S., and Krasnikov, N. (2015), “Confidence Distributions: A Review,” Statistical Methodology, 22, 23–46.
  • Nieuwenhuis, S., Forstmann, B. U., and Wagenmakers, E. J. (2011), “Erroneous Analyses of Interactions in Neuroscience: A Problem of Significance,” Nature Neuroscience, 14, 1105–1107.
  • Open Science Collaboration (2015), “Estimating the Reproducibility of Psychological Science,” Science, 349, aac4716.
  • Pace, L., and Salvan, A. (1997), Principles of Statistical Inference: From a Neo-Fisherian Perspective, Advanced Series on Statistical Science & Applied Probability, Singapore: World Scientific.
  • Polansky, A. M. (2007), Observed Confidence Levels: Theory and Application, New York: Chapman and Hall.
  • Pratt, J. W. (1965), “Bayesian Interpretation of Standard Inference Statements,” Journal of the Royal Statistical Society, Series B, 27, 169–203.
  • Schachtman, N. A. (2019), “Palavering About p-Values,” available at http://schachtmanlaw.com/palavering-about-p-values/.
  • Sellke, T., Bayarri, M. J., and Berger, J. O. (2001), “Calibration of p Values for Testing Precise Null Hypotheses,” The American Statistician, 55, 62–71.
  • Shen, J., Liu, R. Y., and Xie, M. G. (2018), “Prediction With Confidence—A General Framework for Predictive Inference,” Journal of Statistical Planning and Inference, 195, 126–140.
  • Shi, H., and Yin, G. (2020), “Reconnecting p-Value and Posterior Probability Under One- and Two-Sided Tests,” The American Statistician, DOI: 10.1080/00031305.2020.1717621.
  • Singh, K., Xie, M., and Strawderman, W. E. (2007), “Confidence Distribution (CD)—Distribution Estimator of a Parameter,” IMS Lecture Notes Monograph Series 2007, 54, 132–150.
  • Stephens, M. (2016), “False Discovery Rates: A New Deal,” Biostatistics, 18, 275–294.
  • van den Bergh, D., Haaf, J. M., Ly, A., Rouder, J. N., and Wagenmakers, E. J. (2019), “A Cautionary Note on Estimating Effect Size,” PsyArXiv, DOI: 10.31234/osf.io/h6pr8.
  • Vovk, V. G. (1993), “A Logic of Probability, With Application to the Foundations of Statistics,” Journal of the Royal Statistical Society, Series B, 55, 317–341.
  • Wacholder, S., Chanock, S., Garcia-Closas, M., Ghormli, L. E., and Rothman, N. (2004), “Assessing the Probability That a Positive Report Is False: An Approach for Molecular Epidemiology Studies,” Journal of the National Cancer Institute, 96, 434–442. DOI: 10.1093/jnci/djh075.
  • Wasserstein, R. L., and Lazar, N. A. (2016), “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, 70, 129–133.
  • Wasserstein, R. L., Schirm, A. L., and Lazar, N. A. (2019), “Moving to a World Beyond ‘p < 0.05’,” The American Statistician, 73, 1–19.
  • Wilkinson, G. N. (1977), “On Resolving the Controversy in Statistical Inference” (with discussion), Journal of the Royal Statistical Society, Series B, 39, 119–171.
  • Wilson, B. M., and Wixted, J. T. (2018), “The Prior Odds of Testing a True Effect in Cognitive and Social Psychology,” Advances in Methods and Practices in Psychological Science, 1, 186–197.
  • Xie, M. G., and Singh, K. (2013), “Confidence Distribution, the Frequentist Distribution Estimator of a Parameter: A Review,” International Statistical Review, 81, 3–39.
  • Yang, Z., Li, Z., and Bickel, D. R. (2013), “Empirical Bayes Estimation of Posterior Probabilities of Enrichment: A Comparative Study of Five Estimators of the Local False Discovery Rate,” BMC Bioinformatics, 14, 87. DOI: 10.1186/1471-2105-14-87.
  • Yanofsky, C. M., and Bickel, D. R. (2010), “Validation of Differential Gene Expression Algorithms: Application Comparing Fold-Change Estimation to Hypothesis Testing,” BMC Bioinformatics, 11, 63. DOI: 10.1186/1471-2105-11-63.
