ABSTRACT
The parametric Welch t-test and the non-parametric Wilcoxon–Mann–Whitney, empirical likelihood and exponential empirical likelihood tests are commonly used for testing the equality of two population means. To circumvent the inflated type I error of the non-parametric likelihood testing procedures, a simple calibration using the t distribution and bootstrapping is proposed. These testing procedures are then compared via extensive Monte Carlo simulations in terms of type I error and power. Evidence is provided that (a) the t calibration and bootstrap improve the type I error of the non-parametric likelihoods, (b) the Welch t-test attains the nominal type I error and produces high levels of power, and (c) the Wilcoxon–Mann–Whitney test produces inflated type I error, while computation of its exact p-value is not feasible in the presence of ties. An application to real gene expression data illustrates the computational superiority of the Welch t-test.
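As a rough illustration of the kind of Monte Carlo comparison described above, the following sketch estimates the type I error of the Welch t-test when both population means are equal. It is not the simulation design of the paper: the function name `estimate_type_one_error`, the normal populations, the sample sizes, the standard deviations and the nominal level 0.05 are all assumptions chosen for illustration only.

```python
import numpy as np
from scipy import stats


def estimate_type_one_error(n1=20, n2=30, sd1=1.0, sd2=3.0,
                            alpha=0.05, n_sim=1000, seed=0):
    """Estimate the type I error of the Welch t-test by simulation.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Both samples have mean 0, so the null hypothesis of equal means holds;
    # the variances differ, which is the setting the Welch test targets.
    x = rng.normal(0.0, sd1, size=(n_sim, n1))
    y = rng.normal(0.0, sd2, size=(n_sim, n2))
    # equal_var=False selects the Welch (unequal variances) t-test;
    # axis=1 runs one test per simulated pair of samples.
    _, p_values = stats.ttest_ind(x, y, axis=1, equal_var=False)
    # The estimated type I error is the proportion of false rejections.
    return float(np.mean(p_values < alpha))


if __name__ == "__main__":
    print(estimate_type_one_error())
```

An estimate close to the nominal level suggests that the test attains its size under the assumed setting; the same loop could be repeated with any other two-sample test in place of `stats.ttest_ind`.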
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Michail Tsagris http://orcid.org/0000-0002-2049-3063
Abdulaziz Alenazi http://orcid.org/0000-0002-4758-8871
Nikolaos Pandis http://orcid.org/0000-0003-0258-468X
Notes
1 For a list of small sample improvements (in terms of the estimated probability of type I error) in the case of one population mean see [15].
2 The 3rd and 4th cumulants of the test statistic are and
respectively.
3 A great advantage of bootstrap is the absence of any parametric assumptions about the data or the test statistic.
4 This is the asymptotically normal confidence interval for the true probability of type I error based on 1000 simulations.
5 The term exact stems from computing the p-value based on all possible permutations.
6 When the data come from humans or mice.
7 From a biological point of view, the data have been uniformly pre-processed, curated and automatically annotated.
8 The generalization to more samples is straightforward.
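The interval mentioned in note 4 can be written out as a short worked calculation. The nominal level \alpha = 0.05 and the 95% confidence level (z = 1.96) are assumptions made here for illustration; the note itself states only the number of simulations, B = 1000.

```latex
\alpha \pm z_{0.975}\sqrt{\frac{\alpha(1-\alpha)}{B}}
  = 0.05 \pm 1.96\sqrt{\frac{0.05 \times 0.95}{1000}}
  \approx 0.05 \pm 0.0135
  = (0.0365,\ 0.0635)
```

Under these assumptions, an estimated type I error falling outside this range would indicate that a test does not attain the nominal level.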