Teacher's Corner

The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians

Pages 278-286 | Received 01 Jan 2015, Published online: 15 Mar 2018
 

ABSTRACT

To illustrate and document the tenuous connection between the Wilcoxon–Mann–Whitney (WMW) procedure and medians, its relationship to mean ranks is first contrasted with the relationship of a t-test to means. The quantity actually tested, $\widehat{\Pr}(X_1 < X_2) + \widehat{\Pr}(X_1 = X_2)/2$, is then described and recommended as the basis for an alternative summary statistic that can be employed instead of medians. To represent an estimate of the quantity $\Pr(X_1 < X_2) + \Pr(X_1 = X_2)/2$ graphically, the use of a bubble plot, an ROC curve, and a dominance diagram is illustrated. Several counterexamples (real and constructed) are presented, all demonstrating that the WMW procedure fails to be a test of medians. The discussion also addresses another, less common and perhaps less clear-cut, but potentially even more important misconception: that the WMW procedure requires continuous data in order to be valid. Other issues surrounding the WMW procedure and medians are discussed, along with the authors' experience teaching the topic. SAS code used for the examples is included as supplementary material.
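The abstract names an ROC curve as one way to display the estimate. As a minimal sketch (Python rather than the article's SAS; the function names and the data are hypothetical), treating group 2 as the "positive" group and calling a value group 2 when it is at or above a cutoff, the trapezoidal area under the empirical ROC curve reproduces the pairwise estimate, with each tied pair counted as one half.

```python
from fractions import Fraction


def p_double_prime(x1, x2):
    """Pairwise estimate of Pr(X1 < X2) + Pr(X1 = X2)/2."""
    less = sum(1 for a in x1 for b in x2 if a < b)
    ties = sum(1 for a in x1 for b in x2 if a == b)
    return Fraction(2 * less + ties, 2 * len(x1) * len(x2))


def empirical_roc(x1, x2):
    """Empirical ROC points (FPR, TPR), treating group 2 as 'positive'.

    At each cutoff t, an observation is called group 2 when its value is >= t.
    Cutoffs run over the distinct pooled values from largest to smallest.
    """
    n1, n2 = len(x1), len(x2)
    points = [(Fraction(0), Fraction(0))]
    for t in sorted(set(x1) | set(x2), reverse=True):
        fpr = Fraction(sum(v >= t for v in x1), n1)
        tpr = Fraction(sum(v >= t for v in x2), n2)
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the points (trapezoidal rule)."""
    area = Fraction(0)
    for (f0, t0), (f1, t1) in zip(points, points[1:]):
        area += (f1 - f0) * (t0 + t1) / 2
    return area


# Hypothetical data with ties, chosen only for illustration.
x1 = [2, 4, 4, 7]
x2 = [3, 4, 6, 6, 9]
auc = trapezoid_auc(empirical_roc(x1, x2))
print(auc, p_double_prime(x1, x2))  # 13/20 13/20
```

Exact fractions are used so the two quantities can be compared without rounding error; a tie between groups at a cutoff produces a diagonal ROC segment, which is where the half credit comes from.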

Supplementary Materials

The online supplementary materials contain the counterexamples presented in the article, and the SAS programs.

Acknowledgments

The authors are grateful to Elizabeth Stewart (Henry Ford Hospital) for help with formatting, to Elizabeth Furest (Henry Ford Hospital) for editorial assistance, and to the referees and editors for their careful review and comments.

Notes

1 $\widehat{\Pr}(X_1 < X_2) + \widehat{\Pr}(X_1 = X_2)/2$ represents the sample estimate (for instance, $U/(n_1 n_2)$ in the Mann–Whitney formulation). For the underlying population quantity, $p'' = \Pr(X_1 < X_2) + \Pr(X_1 = X_2)/2$, the evaluation is over all possible values of $X_1$ and $X_2$. (For continuous distributions, $\Pr(X_1 = X_2)/2$ is equal to 0.)
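The equivalence between the pairwise estimate and $U/(n_1 n_2)$ can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the article's supplementary SAS code: the helper names are hypothetical, $U$ is taken to count pairs in which the sample-2 value exceeds the sample-1 value (ties counted as one half), and the two small samples are made-up data constructed so that their medians are equal. The estimate nevertheless comes out at 3/5 rather than 1/2. The last lines also check the identity $\bar{R}_2 - \bar{R}_1 = N(p'' - 1/2)$, which follows from the rank-sum form of $U$ and links the estimate to mean ranks.

```python
from fractions import Fraction
from statistics import median


def p_double_prime(x1, x2):
    """Pairwise estimate of p'' = Pr(X1 < X2) + Pr(X1 = X2)/2."""
    less = sum(1 for a in x1 for b in x2 if a < b)
    ties = sum(1 for a in x1 for b in x2 if a == b)
    return Fraction(2 * less + ties, 2 * len(x1) * len(x2))


def midranks(values):
    """Ranks 1..N of the values, with tied values sharing the average of their ranks."""
    first, count = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)
        count[v] = count.get(v, 0) + 1
    # Average of positions first[v], ..., first[v] + count[v] - 1.
    return [Fraction(2 * first[v] + count[v] - 1, 2) for v in values]


def mann_whitney_u(x1, x2):
    """U counting pairs with X2 > X1 (ties as 1/2): rank sum of sample 2 minus n2(n2 + 1)/2."""
    ranks = midranks(list(x1) + list(x2))
    n2 = len(x2)
    return sum(ranks[len(x1):]) - Fraction(n2 * (n2 + 1), 2)


# Hypothetical samples with equal medians (both 3).
x1 = [1, 2, 3, 4, 5]
x2 = [0, 3, 3, 8, 9]
n1, n2 = len(x1), len(x2)
print(median(x1), median(x2))              # 3 3
print(p_double_prime(x1, x2))              # 3/5
print(mann_whitney_u(x1, x2) / (n1 * n2))  # 3/5

# Difference in mean ranks equals N * (p'' - 1/2), with N = n1 + n2.
ranks = midranks(x1 + x2)
mean_rank_1 = sum(ranks[:n1]) / n1
mean_rank_2 = sum(ranks[n1:]) / n2
print(mean_rank_2 - mean_rank_1)           # 1
```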

2 The computing formula is different, but to a close approximation, if $D$ is the number of different levels observed and $P_c$ is the proportion of observations tied at the $c$th level, the ties-adjusted variance is equal to $\left(1 - \sum_{c=1}^{D} P_c^3\right)$ times the variance without ties.
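Both the exact ratio and the approximation are easy to compute. This Python sketch assumes the standard tie correction for the WMW variance, under which the ratio of ties-adjusted to untied variance is $1 - \sum_c (t_c^3 - t_c)/(N^3 - N)$ with $t_c = N P_c$ observations at level $c$; dividing numerator and denominator by $N^3$ shows why $1 - \sum_c P_c^3$ is close when $N$ is large. The function names and the ordinal data are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def tie_factor_exact(pooled):
    """Ties-adjusted / untied variance of the WMW statistic: 1 - sum(t^3 - t) / (N^3 - N)."""
    n = len(pooled)
    return 1 - Fraction(sum(t**3 - t for t in Counter(pooled).values()), n**3 - n)


def tie_factor_approx(pooled):
    """Approximation: 1 - sum over observed levels of P_c^3, with P_c = t_c / N."""
    n = len(pooled)
    return 1 - sum(Fraction(t, n) ** 3 for t in Counter(pooled).values())


# Hypothetical 5-point ordinal responses, N = 100, with counts 10, 20, 40, 20, 10.
pooled = [1] * 10 + [2] * 20 + [3] * 40 + [4] * 20 + [5] * 10
print(float(tie_factor_exact(pooled)))   # about 0.91809
print(float(tie_factor_approx(pooled)))  # 0.918
```

Even with only five levels, the variance shrinks by roughly 8%, and the approximation agrees with the exact factor to about four decimal places here.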

3 More formally, Lehmann states the condition as “$\max(d_i/N)$ is bounded away from 1 as $N$ tends to infinity”; that is, there exists a positive number $\epsilon < 1$ such that for all $i$, $d_i/N \le 1 - \epsilon$, where the $d_i$ are the numbers of observations tied at each possible value. (The sum $d_1 + \cdots + d_e = N$.)
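One consequence of this condition, which connects it to the tie adjustment in note 2, is that the approximate ties-adjusted variance factor cannot shrink to zero. Writing $P_i = d_i/N$ (so that $\sum_i P_i = 1$ and every $P_i \le 1 - \epsilon$), a short derivation gives:

```latex
\sum_{i=1}^{e} P_i^3 \;\le\; \Bigl(\max_i P_i\Bigr) \sum_{i=1}^{e} P_i^2
  \;\le\; \Bigl(\max_i P_i\Bigr) \sum_{i=1}^{e} P_i
  \;=\; \max_i P_i \;\le\; 1 - \epsilon,
\qquad\text{so}\qquad
1 - \sum_{i=1}^{e} P_i^3 \;\ge\; \epsilon .
```

The middle step uses $P_i^2 \le P_i$ because each $P_i \le 1$. This is offered only as an illustration of why the bound matters for the variance; it is not a restatement of Lehmann's own argument.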

4 Reiczigel et al. (2005) called such a shift “simply nonsense” for their example of parasite infection counts.

5 This edition includes a somewhat qualifying footnote, which reads: “*Adjustments for ties are available with the Wilcoxon rank sum test. Consult the references at the end of this chapter.” (Presumably the newest, 2017, edition of the other McClave and Sincich text has the same qualifier.)

6 The basic form of the fallacy (denying the antecedent) is: given “If A then B” and not A, erroneously conclude not B. In this case, A would be “the shift alternative holds” and B would be “the WMW test is valid.”
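A four-row truth table makes the gap explicit. In this minimal Python sketch, A and B are plain booleans standing in for the two statements in the note, and the helper name is hypothetical. Both premises hold in a row where B is true, so "not B" does not follow.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'If A then B' is false only when A is true and B is false."""
    return (not a) or b


# Keep every truth assignment in which both premises hold: 'If A then B' and 'not A'.
consistent = [
    (a, b)
    for a, b in product([True, False], repeat=2)
    if implies(a, b) and not a
]
print(consistent)  # [(False, True), (False, False)]
```

In the note's terms, the shift alternative can fail while the WMW test remains valid.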

7 In the rare situation (for instance, due to one or more extreme outliers) in which the differences between the raw means and the mean ranks go in different directions, $\widehat{\Pr}(X < Y)$ will be equal to $1 - c$ instead of $c$.

8 More discussion about the WMW and Wilcoxon signed rank tests can be found in Divine et al. (2013).