Abstract
Despite much writing about the uses and misuses of p-values, one point is frequently misunderstood: p-values do not measure evidence. Even so, the statistics literature often says that p-values do measure evidence. The purpose of the present article is to argue through several examples that p-values do not measure evidence. This article is not about other aspects of p-values. Specifically, it is not about sharp cutoffs, effect sizes, decision making, or other potential measures of evidence. We felt compelled to write about whether p-values measure evidence because it is our opinion that misunderstanding evidence and how to measure it contributes to misunderstanding science and to the so-called replication crisis.
Notes
1 The term critical region usually refers to the region of the sample space that entails rejecting a null hypothesis in a test of pre-specified size. Here we use the term differently, to mean the region of the sample space that favors the alternative hypothesis over the null hypothesis at least as much as the observed x does. That is, the critical region is the region whose probability is measured by the p-value.
2 We specify non-decreasing densities so that the critical region is always If one specifies that the critical region is always
then there is no need to restrict attention to non-decreasing densities.
3 One might also consider a fifth friend E who can spare the time to watch Randi for only two minutes and who also observes (0, 1).
4 Birnbaum (1962) and Berger and Wolpert (1988) use CP and the Sufficiency Principle (SP) to prove the Likelihood Principle (LP). We do not need LP here, so we refer the reader to those sources for the proof.
5 Schervish (1996) usually uses the term support, but seems to treat support and evidence as synonyms.