
Best uses of p-values and complementary measures in medical research: Recent developments in the frequentist and Bayesian frameworks

Pages 121–142 | Received 02 Oct 2018, Accepted 14 May 2019, Published online: 02 Jul 2019

References

  • Allison, D. B., G. L. Gadbury, M. Heo, J. R. Fernandez, C. K. Lee, T. A. Prolla, and R. Weindruch. 2002. A mixture model approach for the analysis of microarray gene expression data. Computational Statistics & Data Analysis 39:1–20. doi:10.1016/S0167-9473(01)00046-9.
  • Altman, D. G. 2013. Statistics with confidence: Confidence intervals and statistical guidelines. New York: John Wiley & Sons.
  • Bayarri, M. J., D. J. Benjamin, J. O. Berger, and T. Sellke. 2016. Rejection odds and rejection ratios: A proposal for statistical practice in testing hypotheses. Journal of Mathematical Psychology 72:90–103. doi:10.1016/j.jmp.2015.12.007.
  • Bayarri, M. J., and J. O. Berger. 1998. Quantifying surprise in the data and model verification. In Bayesian statistics, ed. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, 53–82. Oxford: Oxford University Press.
  • Benjamin, D. J., J. O. Berger, M. Johannesson, B. A. Nosek, E. J. Wagenmakers, R. Berk, K. A. Bollen, B. Brembs, L. Brown, C. Camerer, et al. 2018. Redefine statistical significance. Nature Human Behaviour 2:6–10.
  • Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57 (1):289–300. doi:10.1111/rssb.1995.57.issue-1.
  • Berger, J. O., and J. Mortera. 1999. Default Bayes factors for nonnested hypothesis testing. Journal of the American Statistical Association 94 (446):542–554. doi:10.1080/01621459.1999.10474149.
  • Berger, J. O., and L. R. Pericchi. 1996. The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association 91:109–121. doi:10.1080/01621459.1996.10476668.
  • Butler, J. S., and P. Jones. 2018. Theoretical and empirical distributions of the P-Value. Metron 76 (1):1–30. doi:10.1007/s40300-017-0130-2.
  • Carter, R. E., P. M. McKie, and C. B. Storlie. 2017. The fragility index: A p-value in sheep’s clothing? European Heart Journal 38:346–348.
  • Casella, G., and R. Berger. 1987. Reconciling Bayesian and frequentist evidence in the one-sided testing problem. Journal of the American Statistical Association 82:106–111. doi:10.1080/01621459.1987.10478396.
  • Chavalarias, D., J. D. Wallach, A. H. Li, and J. P. Ioannidis. 2016. Evolution of reporting P-values in the biomedical literature, 1990–2015. Journal of the American Medical Association 315:1141–1148. doi:10.1001/jama.2016.1952.
  • Cohen, J. 1988. Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
  • Colquhoun, D. 2014. An investigation of the false discovery rate and the misinterpretation of P-values. Royal Society Open Science 1:140216. doi:10.1098/rsos.140216.
  • Cumming, G. 2014. The new statistics: Why and how. Psychological Science 25:7–29. doi:10.1177/0956797613518350.
  • De Santis, F. 2017. Contribution to the discussion of “A critical evaluation of the current ‘p-value controversy.’” Biometrical Journal 59 (5):877–879. doi:10.1002/bimj.201700064.
  • Demidenko, E. 2016. The P-value you can’t buy. The American Statistician 70 (1):33–38. doi:10.1080/00031305.2015.1069760.
  • Dienes, Z. 2016. How Bayes factors change scientific practice. Journal of Mathematical Psychology 72:78–89. doi:10.1016/j.jmp.2015.10.003.
  • Docherty, K. F., R. T. Campbell, P. S. Jhund, M. C. Petrie, and J. J. McMurray. 2016. How robust are clinical trials in heart failure? European Heart Journal 38:338–345.
  • Editors. 2001. The value of P. Epidemiology 12 (3):286. doi:10.1097/00001648-200105000-00002.
  • Efron, B. 2010. Large-scale inference: Empirical Bayes methods for estimation, testing, and prediction. Cambridge: Cambridge University Press.
  • Efron, B., R. Tibshirani, J. D. Storey, and V. Tusher. 2001. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 96 (456):1151–1160. doi:10.1198/016214501753382129.
  • Ellenberg, J. 2015. How not to be wrong: The power of mathematical thinking. New York: Penguin Books.
  • Evans, M., and M. Swartz. 1996. Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems. Statistical Science 10:254–272. doi:10.1214/ss/1177009938.
  • Fisher, R. 1925. Statistical methods for research workers. Edinburgh: Oliver & Boyd.
  • Fisher, R. 1956. Statistical methods and scientific inference. Oxford: Hafner Publishing.
  • García-Pérez, M. A. 2017. Thou shalt not bear false witness against null hypothesis significance testing. Educational and Psychological Measurement 77 (4):631–662. doi:10.1177/0013164416668232.
  • Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2013. Bayesian data analysis. New York: Chapman and Hall/CRC.
  • Gigerenzer, G. 1998. We need statistical thinking, not statistical rituals. Behavioral and Brain Sciences 21 (2):199–200. doi:10.1017/S0140525X98281167.
  • Gigerenzer, G. 2004. Mindless statistics. The Journal of Socio-Economics 33 (5):587–606. doi:10.1016/j.socec.2004.09.033.
  • Gigerenzer, G. 2018. Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science 1 (2):198–218. doi:10.1177/2515245918771329.
  • Goodman, S. N. 1999. Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine 130:995–1004.
  • Goodman, S. N. 2008. A dirty dozen: Twelve p-value misconceptions. Seminars in Hematology 45 (3):135–140.
  • Greenland, S., S. J. Senn, K. J. Rothman, J. B. Carlin, C. Poole, S. N. Goodman, and D. G. Altman. 2016. Statistical tests, p values, confidence intervals and power: A guide to misinterpretations. European Journal of Epidemiology 31:337–350. doi:10.1007/s10654-016-0149-3.
  • Haller, H., and S. Krauss. 2002. Misinterpretations of significance: A problem students share with their teachers. Methods of Psychological Research 7:1–20.
  • Head, M. L., L. Holman, R. Lanfear, A. T. Kahn, and M. D. Jennions. 2015. The extent and consequences of P-hacking in science. PLoS Biology 13:e1002106. doi:10.1371/journal.pbio.1002106.
  • Held, L., and M. Ott. 2018. On P-values and Bayes factors. Annual Review of Statistics and Its Application 5:393–419. doi:10.1146/annurev-statistics-031017-100307.
  • Ioannidis, J. P. 2005. Why most published research findings are false. PLoS Medicine 2:e124. doi:10.1371/journal.pmed.0020124.
  • Ioannidis, J. P. 2016. Fit-for-purpose inferential methods: Abandoning/changing p-values versus abandoning/changing research. Supplemental material to the ASA statement on p-Values and statistical significance. The American Statistician, 70.
  • Jeffreys, H. 1935. Some tests of significance, treated by the theory of probability. Proceedings of the Cambridge Philosophical Society 31:203–222.
  • Jeffreys, H. 1961. Theory of probability. 3rd ed. Oxford, UK: Oxford University Press.
  • Johnson, N. L., S. Kotz, and N. Balakrishnan. 1995. Continuous univariate distributions, Volume II. 2nd ed. New York: Wiley.
  • Johnson, V. E., and D. Rossell. 2010. On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 72:143–170. doi:10.1111/j.1467-9868.2009.00730.x.
  • Kass, R. E., and A. E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90:773–795. doi:10.1080/01621459.1995.10476572.
  • Kruschke, J. K., and T. M. Liddell. 2018. The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin and Review 25 (1):178–206. doi:10.3758/s13423-016-1221-4.
  • Lakens, D., F. G. Adolfi, C. J. Albers, F. Anvari, M. A. J. Apps, and S. E. Argamon. 2018. Justify your alpha. Nature Human Behaviour 2 (3):168–171. doi:10.1038/s41562-018-0311-x.
  • Lee, M., F. C. Kuo, G. A. Whitmore, and J. Sklar. 2000. Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proceedings of the National Academy of Sciences USA 97:9834–9839.
  • Leek, J. T., and L. R. Jager. 2017. Is most published research really false? Annual Review of Statistics and Its Application 4:211–214.
  • Lehmann, E. L. 1959. Testing statistical hypotheses. New York: John Wiley.
  • Liao, J. G., Y. Lin, Z. E. Selvanayagam, and W. J. Shih. 2004. A mixture model for estimating local false discovery rate in DNA microarray analysis. Bioinformatics 20 (16):2694–2701. doi:10.1093/bioinformatics/bth310.
  • Matics, T. J., N. Khan, P. Jani, and J. M. Kane. 2017. Fragility index in a cohort of pediatric randomized controlled trials. Journal of Clinical Medicine 6:79. doi:10.3390/jcm6080079.
  • Miller, A. M. 2016. ASA statement on P-values: Some implications for education. Online discussion of the ASA Statement on statistical significance and p-values. The American Statistician 70.
  • Morey, R. D., R. Hoekstra, J. N. Rouder, M. D. Lee, and E. J. Wagenmakers. 2015. The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin and Review 23:103–123. doi:10.3758/s13423-015-0947-8.
  • Morris, C. 1987. Comment on “Testing a point null hypothesis: The irreconcilability of p-values and evidence.” Journal of the American Statistical Association 82:112–139. doi:10.2307/2289137.
  • Motulsky, H. J. 2015. Common misconceptions about data analysis and statistics. British Journal of Pharmacology 172 (8):2126–2132. doi:10.1111/bph.12884.
  • Murtaugh, P. A. 2014. In defense of P-values. Ecology 95 (3):611–617. doi:10.1890/13-0590.1.
  • Newton, M. A., C. M. Kendziorski, C. S. Richmond, F. R. Blattner, and K. W. Tsui. 2001. On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. Journal of Computational Biology 8:37–52. doi:10.1089/106652701300099074.
  • Northfelt, D. W., B. J. Dezube, J. A. Thommes, B. J. Miller, M. A. Fischl, A. Friedman-Kien, L. D. Kaplan, C. Du Mond, R. D. Mamelok, and D. H. Henry. 1998. Pegylated-liposomal doxorubicin versus doxorubicin, bleomycin, and vincristine in the treatment of AIDS-related Kaposi’s sarcoma: Results of a randomized phase III clinical trial. Journal of Clinical Oncology 16 (7):2445–2451. doi:10.1200/JCO.1998.16.7.2445.
  • O’Hagan, A. 1995. Fractional Bayes factors for model comparisons. Journal of the Royal Statistical Society. Series B (Methodological) 56:99–118. doi:10.1111/j.2517-6161.1995.tb02017.x.
  • Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349:aac4716.
  • Pan, W., J. Lin, and C. T. Le. 2003. A mixture model approach to detecting differentially expressed genes with microarray data. Functional & Integrative Genomics 3:117–124. doi:10.1007/s10142-003-0085-7.
  • Peel, D., and G. J. McLachlan. 2000. Robust mixture modelling using the t distribution. Statistics and Computing 10:339–348. doi:10.1023/A:1008981510081.
  • Poole, C. 2001. Low P-values or narrow confidence intervals: Which are more durable? Epidemiology 12:291–294. doi:10.1097/00001648-200105000-00005.
  • Pounds, S., and S. W. Morris. 2003. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of P-values. Bioinformatics 19 (10):1236–1242. doi:10.1093/bioinformatics/btg148.
  • Ridgeon, E. E., P. J. Young, R. Bellomo, M. Mucchetti, R. Lembo, and G. Landoni. 2016. The fragility index in multicenter randomized controlled critical care trials. Critical Care Medicine 44:1278–1284. doi:10.1097/CCM.0000000000001670.
  • Rothman, K. J. 2016. Disengaging from statistical significance. Online discussion of the ASA statement on statistical significance and p-values. The American Statistician 70.
  • Sellke, T. 2012. On the interpretation of p-values. Technical Report. West Lafayette: Department of Statistics, Purdue University.
  • Sellke, T., M. J. Bayarri, and J. O. Berger. 2001. Calibration of P-values for testing precise null hypotheses. The American Statistician 55:62–71. doi:10.1198/000313001300339950.
  • Senn, S. 2016. Are P-values the problem? Online discussion of the ASA statement on statistical significance and p-values. The American Statistician 70.
  • Stang, A., M. Deckert, C. Poole, and K. J. Rothman. 2017. Statistical inference in abstracts of major medical and epidemiology journals 1975–2014: A systematic review. European Journal of Epidemiology 32:21–29. doi:10.1007/s10654-016-0211-1.
  • Szucs, D., and J. Ioannidis. 2017. When null hypothesis significance testing is unsuitable for research: A reassessment. Frontiers in Human Neuroscience 11:390. doi:10.3389/fnhum.2017.00390.
  • Trafimow, D., V. Amrhein, C. N. Areshenkoff, C. Barrera-Causil, E. J. Beh, Y. K. Bilgiç, R. Bono, M. T. Bradley, et al. 2017. Manipulating the alpha level cannot cure significance testing. PeerJ 9:699.
  • Trafimow, D., and M. Marks. 2015. Editorial. Basic and Applied Social Psychology 37:1–2. doi:10.1080/01973533.2015.1012991.
  • van Dyk, D. A. 2014. The role of statistics in the discovery of a Higgs boson. Annual Review of Statistics and Its Application 1:41–59. doi:10.1146/annurev-statistics-062713-085841.
  • Wagenmakers, E. J. 2007. A practical solution to the pervasive problems of P-values. Psychonomic Bulletin and Review 14 (5):779–804. doi:10.3758/BF03194105.
  • Walsh, M., S. K. Srinathan, D. F. McAuley, M. Mrkobrada, O. Levine, C. Ribic, A. O. Molnar, N. D. Dattani, A. Burke, G. Guyatt, et al. 2014. The statistical significance of randomized controlled trial results is frequently fragile: A case for a fragility index. Journal of Clinical Epidemiology 67:622–628. doi:10.1016/j.jclinepi.2013.09.012.
  • Wasserstein, R. L., and N. A. Lazar. 2016. The ASA’s statement on p-values: Context, process, and purpose. The American Statistician 70:129–133. doi:10.1080/00031305.2016.1154108.
  • Wasserstein, R. L., A. L. Schirm, and N. A. Lazar. 2019. Moving to a world beyond “p < 0.05.” The American Statistician 73 (sup1):1–19.
  • Wellek, S. 2017. A critical evaluation of the current ‘p-value controversy’. Biometrical Journal 59:854–872. doi:10.1002/bimj.201700001.
  • Wetzels, R., D. Matzke, M. D. Lee, J. N. Rouder, G. J. Iverson, and E. J. Wagenmakers. 2011. Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science 6 (3):291–298. doi:10.1177/1745691611406923.
  • Winkler, R. L. 2001. Why Bayesian analysis hasn’t caught on in healthcare decision making. International Journal of Technology Assessment in Health Care 17:56–66. doi:10.1017/S026646230110406X.
