Journal of Biopharmaceutical Statistics Volume 30, 2020 - Issue 1

Reviews

Best uses of p-values and complementary measures in medical research: Recent developments in the frequentist and Bayesian frameworks

Piero Quatto, Department of Economics, Management and Statistics, University of Milan-Bicocca, Milano, Italy

Enrico Ripamonti, Department of Economics, Management and Statistics, University of Milan-Bicocca, Milano, Italy. Correspondence: [email protected]

Donata Marasini, Department of Economics, Management and Statistics, University of Milan-Bicocca, Milano, Italy

Pages 121-142 | Received 02 Oct 2018, Accepted 14 May 2019, Published online: 02 Jul 2019

https://doi.org/10.1080/10543406.2019.1632874

References

Allison, D. B., G. L. Gadbury, M. Heo, J. R. Fernandez, C. K. Lee, T. A. Prolla, and R. Weindruch. 2002. A mixture model approach for the analysis of microarray gene expression data. Computational Statistics & Data Analysis 39:1–20. doi:10.1016/S0167-9473(01)00046-9.
Altman, D. G. 2013. Statistics with confidence: Confidence intervals and statistical guidelines. New York: John Wiley & Sons.
Bayarri, M. J., D. J. Benjamin, J. O. Berger, and T. Sellke. 2016. Rejection odds and rejection ratios: A proposal for statistical practice in testing hypotheses. Journal of Mathematical Psychology 72:90–103. doi:10.1016/j.jmp.2015.12.007.
Bayarri, M. J., and J. O. Berger. 1998. Quantifying surprise in the data and model verification. In Bayesian statistics, ed. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, 53–82. Oxford: Oxford University Press.
Benjamin, D. J., J. O. Berger, M. Johannesson, B. A. Nosek, E. J. Wagenmakers, R. Berk, K. A. Bollen, B. Brembs, L. Brown, C. Camerer, et al. 2018. Redefine statistical significance. Nature Human Behaviour 2:6–10.
Benjamini, Y., and Y. Hochberg. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57 (1):289–300. doi:10.1111/rssb.1995.57.issue-1.
Berger, J. O., and J. Mortera. 1999. Default Bayes factors for nonnested hypothesis testing. Journal of the American Statistical Association 94 (446):542–554. doi:10.1080/01621459.1999.10474149.
Berger, J. O., and L. R. Pericchi. 1996. The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association 91:109–121. doi:10.1080/01621459.1996.10476668.
Butler, J. S., and P. Jones. 2018. Theoretical and empirical distributions of the P-Value. Metron 76 (1):1–30. doi:10.1007/s40300-017-0130-2.
Carter, R. E., P. M. McKie, and C. B. Storlie. 2017. The fragility index: A P-value in sheep’s clothing? European Heart Journal 38:346–348.
Casella, G., and R. Berger. 1987. Reconciling Bayesian and frequentist evidence in the one-sided testing problem. Journal of the American Statistical Association 82:106–111. doi:10.1080/01621459.1987.10478396.
Chavalarias, D., J. D. Wallach, A. H. Li, and J. P. Ioannidis. 2016. Evolution of reporting P-values in the biomedical literature, 1990-2015. Journal of the American Medical Association 315:1141–1148. doi:10.1001/jama.2016.1952.
Cohen, J. 1988. Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Colquhoun, D. 2014. An investigation of the false discovery rate and the misinterpretation of P-values. Royal Society Open Science 1:140216. doi:10.1098/rsos.140216.
Cumming, G. 2014. The new statistics: Why and how. Psychological Science 25:7–29. doi:10.1177/0956797613518350.
De Santis, F. 2017. Contribution to the discussion of ‘A critical evaluation of the current “p-value controversy”’. Biometrical Journal 59 (5):877–879. doi:10.1002/bimj.201700064.
Demidenko, E. 2016. The P-value you can’t buy. The American Statistician 70 (1):33–38. doi:10.1080/00031305.2015.1069760.
Dienes, Z. 2016. How Bayes factors change scientific practice. Journal of Mathematical Psychology 72:78–89. doi:10.1016/j.jmp.2015.10.003.
Docherty, K. F., R. T. Campbell, P. S. Jhund, M. C. Petrie, and J. J. McMurray. 2016. How robust are clinical trials in heart failure? European Heart Journal 38:338–345.
Editors. 2001. The value of P. Epidemiology 12(3):286. doi:10.1097/00001648-200105000-00002.
Efron, B. 2010. Large-scale inference: Empirical Bayes methods for estimation, testing, and prediction. Cambridge: Cambridge University Press.
Efron, B., R. Tibshirani, J. D. Storey, and V. Tusher. 2001. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 96 (456):1151–1160. doi:10.1198/016214501753382129.
Ellenberg, J. 2015. How not to be wrong: The power of mathematical thinking. New York: Penguin Books.
Evans, M., and M. Swartz. 1996. Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems. Statistical Science 10:254–272. doi:10.1214/ss/1177009938.
Fisher, R. 1925. Statistical methods for research workers. Edinburgh: Oliver & Boyd.
Fisher, R. 1956. Statistical methods and scientific inference. Oxford: Hafner Publishing.
García-Pérez, M. A. 2017. Thou shalt not bear false witness against null hypothesis significance testing. Educational and Psychological Measurement 77 (4):631–662. doi:10.1177/0013164416668232.
Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2013. Bayesian data analysis. New York: Chapman and Hall/CRC.
Gigerenzer, G. 1998. We need statistical thinking, not statistical rituals. Behavioral and Brain Sciences 21 (2):199–200. doi:10.1017/S0140525X98281167.
Gigerenzer, G. 2004. Mindless statistics. The Journal of Socio-Economics 33 (5):587–606. doi:10.1016/j.socec.2004.09.033.
Gigerenzer, G. 2018. Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science 1 (2):198–218. doi:10.1177/2515245918771329.
Goodman, S. N. 1999. Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine 130:995–1004.
Goodman, S. N. 2008. A dirty dozen: Twelve p-value misconceptions. Seminars in Hematology 45 (3):135–140.
Greenland, S., S. J. Senn, K. J. Rothman, J. B. Carlin, C. Poole, S. N. Goodman, and D. G. Altman. 2016. Statistical tests, p values, confidence intervals and power: A guide to misinterpretations. European Journal of Epidemiology 31:337–350. doi:10.1007/s10654-016-0149-3.
Haller, H., and S. Krauss. 2002. Misinterpretations of significance: A problem students share with their teachers. Methods of Psychological Research 7:1–20.
Head, M. L., L. Holman, R. Lanfear, A. T. Kahn, and M. D. Jennions. 2015. The extent and consequences of P-hacking in science. PLoS Biology 13:e1002106. doi:10.1371/journal.pbio.1002106.
Held, L., and M. Ott. 2018. On P-values and Bayes factors. Annual Review of Statistics and Its Application 5:393–419. doi:10.1146/annurev-statistics-031017-100307.
Ioannidis, J. P. 2005. Why most published research findings are false. PLoS Medicine 2:e124. doi:10.1371/journal.pmed.0020124.
Ioannidis, J. P. 2016. Fit-for-purpose inferential methods: Abandoning/changing p-values versus abandoning/changing research. Supplemental material to the ASA statement on p-Values and statistical significance. The American Statistician, 70.
Jeffreys, H. 1935. Some tests of significance, treated by the theory of probability. Proceedings of the Cambridge Philosophical Society 31:203–222.
Jeffreys, H. 1961. Theory of probability. 3rd ed. Oxford, UK: Oxford University Press.
Johnson, N. L., S. Kotz, and N. Balakrishnan. 1995. Continuous univariate distributions, Volume II. 2nd ed. New York: Wiley.
Johnson, V. E., and D. Rossell. 2010. On the use of non-local prior densities in Bayesian hypothesis tests. Journal of the Royal Statistical Society. Series B (Methodological) 72:143–170. doi:10.1111/j.1467-9868.2009.00730.x.
Kass, R. E., and A. E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association 90:773–795. doi:10.1080/01621459.1995.10476572.
Kruschke, J. K., and T. M. Liddell. 2018. The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin and Review 25 (1):178–206. doi:10.3758/s13423-016-1221-4.
Lakens, D., F. G. Adolfi, C. J. Albers, F. Anvari, M. A. J. Apps, and S. E. Argamon. 2018. Justify your alpha. Nature Human Behaviour 2 (3):168–171. doi:10.1038/s41562-018-0311-x.
Lee, M., F. C. Kuo, G. A. Whitmore, and J. Sklar. 2000. Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proceedings of the National Academy of Sciences USA 97:9834–9839.
Leek, J. T., and J. R. Jager. 2017. Is most published research really false? Annual Review of Statistics and Its Application 4:211–214.
Lehmann, E. L. 1959. Testing statistical hypotheses. New York: John Wiley.
Liao, J. G., Y. Lin, Z. E. Selvanayagam, and W. J. Shih. 2004. A mixture model for estimating local false discovery rate in DNA microarray analysis. Bioinformatics 20 (16):2694–2701. doi:10.1093/bioinformatics/bth310.
Matics, T. J., N. Khan, P. Jani, and J. M. Kane. 2017. Fragility index in a cohort of pediatric randomized controlled trials. Journal of Clinical Medicine 6:79. doi:10.3390/jcm6080079.
Miller, A. M. 2016. ASA statement on P-values: Some implications for education. Online discussion of the ASA Statement on statistical significance and p-values. The American Statistician 70.
Morey, R. D., R. Hoekstra, N. J. Rouder, M. D. Lee, and E. J. Wagenmakers. 2015. The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin and Review 23:103–123. doi:10.3758/s13423-015-0947-8.
Morris, C. 1987. Comment on ‘Testing a point null hypothesis: The irreconcilability of P values and evidence’. Journal of the American Statistical Association 82:112–139. doi:10.2307/2289137.
Motulsky, H. J. 2015. Common misconceptions about data analysis and statistics. British Journal of Pharmacology 172 (8):2126–2132. doi:10.1111/bph.12884.
Murtaugh, P. A. 2014. In defense of P-values. Ecology 95 (3):611–617. doi:10.1890/13-0590.1.
Newton, M. A., C. M. Kendziorski, C. S. Richmond, F. R. Blattner, and K. W. Tsui. 2001. On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. Journal of Computational Biology 8:37–52. doi:10.1089/106652701300099074.
Northfelt, D. W., B. J. Dezube, J. A. Thommes, B. J. Miller, M. A. Fischl, A. Friedman-Kien, L. D. Kaplan, C. Du Mond, R. D. Mamelok, and D. H. Henry. 1998. Pegylated-liposomal doxorubicin versus doxorubicin, bleomycin, and vincristine in the treatment of AIDS-related Kaposi’s sarcoma: Results of a randomized phase III clinical trial. Journal of Clinical Oncology 16 (7):2445–2451. doi:10.1200/JCO.1998.16.7.2445.
O’Hagan, A. 1995. Fractional Bayes factors for model comparisons. Journal of the Royal Statistical Society. Series B (Methodological) 56:99–118. doi:10.1111/j.2517-6161.1995.tb02017.x.
Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349:aac4716.
Pan, W., J. Lin, and C. T. Le. 2003. A mixture model approach to detecting differentially expressed genes with microarray data. Functional & Integrative Genomics 3:117–124. doi:10.1007/s10142-003-0085-7.
Peel, D., and G. J. McLachlan. 2000. Robust mixture modelling using the t distribution. Statistics and Computing 10:339–348. doi:10.1023/A:1008981510081.
Poole, C. 2001. Low P-values or narrow confidence intervals: Which are more durable? Epidemiology 12:291–294. doi:10.1097/00001648-200105000-00005.
Pounds, S., and S. W. Morris. 2003. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of P-values. Bioinformatics 19 (10):1236–1242. doi:10.1093/bioinformatics/btg148.
Ridgeon, E. E., P. J. Young, R. Bellomo, M. Mucchetti, R. Lembo, and G. Landoni. 2016. The fragility index in multicenter randomized controlled critical care trials. Critical Care Medicine 44:1278–1284. doi:10.1097/CCM.0000000000001670.
Rothman, K. J. 2016. Disengaging from statistical significance. Online discussion of the ASA statement on statistical significance and p-values. The American Statistician 70.
Sellke, T. 2012. On the interpretation of p-values. Technical Report. West Lafayette: Department of Statistics, Purdue University.
Sellke, T., M. J. Bayarri, and J. O. Berger. 2001. Calibration of P-values for testing precise null hypotheses. The American Statistician 55:62–71. doi:10.1198/000313001300339950.
Senn, S. 2016. Are P-values the problem? Online discussion of the ASA statement on statistical significance and p-values. The American Statistician 70.
Stang, A., M. Deckert, C. Poole, and K. J. Rothman. 2017. Statistical inference in abstracts of major medical and epidemiology journals 1975-2014: A systematic review. European Journal of Epidemiology 32:21–29. doi:10.1007/s10654-016-0211-1.
Szucs, D., and J. Ioannidis. 2017. When null hypothesis significance testing is unsuitable for research: A reassessment. Frontiers in Human Neuroscience 11:390. doi:10.3389/fnhum.2017.00390.
Trafimow, D., V. Amrhein, C. N. Areshenkoff, C. Barrera-Causil, E. J. Beh, Y. K. Bilgiç, R. Bono, and M. T. Bradley. 2017. Manipulating the alpha level cannot cure significance testing. PeerJ 9:699.
Trafimow, D., and M. Marks. 2015. Editorial. Basic and Applied Social Psychology 37:1–2. doi:10.1080/01973533.2015.1012991.
van Dyk, D. A. 2014. The role of statistics in the discovery of a Higgs boson. Annual Review of Statistics and Its Application 1:41–59. doi:10.1146/annurev-statistics-062713-085841.
Wagenmakers, E. J. 2007. A practical solution to the pervasive problems of P-values. Psychonomic Bulletin and Review 14 (5):779–804. doi:10.3758/BF03194105.
Walsh, M., S. K. Srinathan, D. F. McAuley, M. Mrkobrada, O. Levine, C. Ribic, A. O. Molnar, N. D. Dattani, A. Burke, G. Guyatt, et al. 2014. The statistical significance of randomized controlled trial results is frequently fragile: A case for a fragility index. Journal of Clinical Epidemiology 67:622–628. doi:10.1016/j.jclinepi.2013.09.012.
Wasserstein, R. L., and N. A. Lazar. 2016. The ASA’s statement on p-values: Context, process, and purpose. The American Statistician 70:129–133. doi:10.1080/00031305.2016.1154108.
Wasserstein, R. L., A. L. Schirm, and N. A. Lazar. 2019. Moving to a world beyond “p < 0.05”. The American Statistician 73 (sup1):1–19.
Wellek, S. 2017. A critical evaluation of the current ‘p-value controversy’. Biometrical Journal 59:854–872. doi:10.1002/bimj.201700001.
Wetzels, R., D. Matzke, M. D. Lee, J. N. Rouder, G. J. Iverson, and E. J. Wagenmakers. 2011. Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Perspectives on Psychological Science 6 (3):291–298. doi:10.1177/1745691611406923.
Winkler, R. L. 2001. Why Bayesian analysis hasn’t caught on in healthcare decision making. International Journal of Technology Assessment in Health Care 17:56–66. doi:10.1017/S026646230110406X.
