Editorial

In our editorial last year, we banned the null hypothesis significance testing procedure (Trafimow & Marks, 2015). Since then, we have received numerous communications from a wide variety of sources. Our review of these communications suggests that more researchers now understand the basic problem of inverse inference associated with the null hypothesis significance testing procedure than prior to our 2015 editorial: The probability of the finding (or one more extreme) given the null hypothesis (p) is not the same as the probability of the null hypothesis given the finding, nor does p provide a strong basis for drawing conclusions about the probability of the null hypothesis given the finding. Without a strong basis for drawing a conclusion about the probability of the null hypothesis given the finding, there is little justification for rejecting the null hypothesis simply because p < .05. This rejection—the heart of the null hypothesis significance testing procedure—commits the inverse inference fallacy.
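
To see the inverse inference problem numerically, consider a minimal sketch (ours, for illustration only). Assume, hypothetically, that half of the null hypotheses a field tests are true and that tests have power of .5 whenever the null is false; both numbers are assumptions chosen solely to make the asymmetry visible. Bayes' theorem then gives the inverse probability directly:

```python
# Minimal sketch of the inverse inference fallacy (illustrative only).
# It contrasts p's conditional probability, P(reject | H0 true), with
# the inverse probability, P(H0 true | reject). The prior and power
# below are hypothetical values assumed for the example.

alpha = 0.05      # significance threshold: P(reject | H0 true)
prior_h0 = 0.5    # assumed prior probability that H0 is true
power = 0.5       # assumed P(reject | H0 false)

# Bayes' theorem: P(H0 | reject) = P(reject | H0) * P(H0) / P(reject)
p_reject = alpha * prior_h0 + power * (1 - prior_h0)
posterior_h0 = alpha * prior_h0 / p_reject

print(f"P(reject | H0 true) = {alpha:.3f}")           # 0.050
print(f"P(H0 true | reject) = {posterior_h0:.3f}")    # 0.091
```

Under these assumed numbers the two probabilities already differ by nearly a factor of two, and with a prior of .8 and power of .2 the inverse probability climbs to .50. The particular values are hypothetical; the point is that p alone cannot determine them.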

We are heartened that researchers seem to understand this fallacy to a greater extent than previously. However, many researchers continue to believe that p remains useful for various other reasons. We consider these reasons next.

  • The replication claim: p informs researchers of the probability that the finding would replicate if the experiment were performed again. This claim is wrong, unless one makes untenable assumptions about the effect size, uses invalid logic, or both. For example, Greenwald, Gonzalez, Harris, and Guthrie (1996) purported to show how p is related to the probability of replication (see their Figure 1), but they had to assume that the sample effect size obtained with the first sample equals the population effect size. If this were a tenable assumption, there would be no need for replication; researchers would know the population effect size without the replication. More recently, Killeen (2005) suggested a different formula ostensibly providing the probability of replication as a function of p (the p_rep statistic). But Trafimow, MacDonald, Rice, and Clason (2010) demonstrated that p_rep is rarely close to the true probability of replication. In summary, p tells us very little about the probability of replication; the sketch following this list shows how strongly the replication probability depends on the unknown population effect size. The replication claim fails to stand up to close scrutiny in any of the forms published thus far.

  • The chance claim: p informs researchers of the probability that the obtained finding was due to chance. This claim, too, is wrong, because p is a conditional probability. One cannot compute the probability that the finding was due to chance unless one knows the population effect size. And if one knows the population effect size, there is no need to do the research.

  • The elimination claim: p provides researchers with a logically valid way to eliminate chance as an alternative explanation for the finding. Because the chance claim is wrong, this claim also is implausible, although there is a complication that sophisticated defenders of p sometimes invoke. Specifically, the argument is that if chance is the only operating factor, then under the assumption of the null hypothesis, the finding is unlikely to occur. Therefore, it is not plausible that only chance is operating under the assumption of the null hypothesis. Unfortunately, this argument commits the inverse inference fallacy in another guise. Notwithstanding all the explanations, the argument boils down to inferring the probability of A given B from the probability of B given A, which is invalid. Specifically, the probability of the finding given the combination of the null hypothesis being true and only chance operating is not the same thing as the probability of this combination given the finding. But the goal of the “sophisticated” argument is to reject the combination, and this does not work. Because p provides an insufficient basis for rejecting the null hypothesis, it also provides an insufficient basis for rejecting the conjunction of the null hypothesis and another assumption (viz., that only chance is operating).

  • The proper-use claim: Some researchers argue that the many problems critics have associated with the use of p are due to misuse of p. If it were not misused, these researchers claim, p would be valuable. The problem with this argument is that no one who has made it has explained what that value is. What conclusions can validly be drawn from p, other than the conclusion that p provides the probability of the finding (or one more extreme) given the null hypothesis, which is true by definition and hence trivial? We are not told. As we have explained in the foregoing bullet points, any use of p to draw conclusions about hypotheses, replication, the role of chance, and so on, constitutes misuse. Of course p is not misused if no conclusions are drawn from it. But if the only way to avoid misusing p is to draw no conclusions from it, then wherein lies its value? We are not told.

  • The p-is-just-good-to-know claim: Some researchers use the ultimate fallback that p is just good to know. This amounts to an argument by fiat. As a general principle, scientists should distrust argumentation by fiat. Why is p good to know? What conclusions can researchers derive from p where the derivations are logically valid and involve sound premises? Despite the many communications we have received, nobody has answered these questions. Therefore, the p-is-just-good-to-know claim is insufficient.
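
To make the replication claim's failure concrete, here is a minimal sketch (ours, not Greenwald et al.'s or Killeen's method; the test type, the observed result, and the candidate population effects are all assumptions chosen for the example). It fixes a single first-study result, z = 2.0 with two-sided p ≈ .046, and computes the probability that an exact replication would reach p < .05 under several hypothetical population effect sizes:

```python
# Minimal sketch (illustrative only): for a two-sided z test, compute
# the probability that an exact replication attains p < .05, holding
# the first study's result fixed while varying the unknown population
# effect. The observed z and candidate effects are assumed values.

from scipy.stats import norm

z_crit = norm.ppf(0.975)               # 1.96 for two-sided alpha = .05
z_observed = 2.0                       # assumed first-study result
p_first = 2 * norm.sf(z_observed)      # two-sided p ~= .046

print(f"first study: z = {z_observed:.1f}, p = {p_first:.3f}")

# Candidate population effects, expressed as the expected z in a
# same-sized replication; in practice this quantity is unknown.
for true_z in [0.5, 1.0, 2.0, 3.0]:
    # replication significant if |Z| > z_crit, with Z ~ Normal(true_z, 1)
    p_rep = norm.sf(z_crit - true_z) + norm.cdf(-z_crit - true_z)
    print(f"expected z = {true_z:.1f}: P(replication p < .05) = {p_rep:.2f}")
```

For the same observed p, the replication probability ranges from under .10 to over .85 depending on the unknown population effect. Only the assumption that the sample effect equals the population effect (the expected z = 2.0 row, about .52) yields a unique answer, and that is precisely the untenable assumption noted above.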

In conclusion, although we are pleased that so many more researchers now seem to understand that p is insufficient for a logically valid inference about the probability of the null hypothesis, we wish more researchers understood that alternative justifications for p also fail. The replication claim, the chance claim, the elimination claim, the proper-use claim, and the p-is-just-good-to-know claim all collapse when subjected to close scrutiny. Hence, we reiterate the message from our 2015 editorial. The ban on p values will continue. But please stay tuned for comments pertaining to effect sizes in the 2017 editorial.

References

  • Greenwald, A. G., Gonzalez, R., Harris, R. J., & Guthrie, D. (1996). Effect sizes and p values: What should be reported and what should be replicated? Psychophysiology, 33, 175–183. doi:10.1111/j.1469-8986.1996.tb02121.x
  • Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345–353. doi:10.1111/j.0956-7976.2005.01538.x
  • Trafimow, D., MacDonald, J., Rice, S., & Clason, D. L. (2010). How often is p_rep close to the true replication probability? Psychological Methods, 15, 300–307. doi:10.1037/a0018533
  • Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37, 1–2. doi:10.1080/01973533.2015.1012991
