Abstract
One of the most basic topics in many introductory statistical methods texts is inference for a population mean, μ. The primary tool for confidence intervals and tests is the Student t sampling distribution. Although its derivation requires independent, identically distributed normal random variables with constant variance, σ², most authors reassure readers that the procedure has some robustness to the normality and constant variance assumptions. Some point out that anyone concerned about these assumptions may test them statistically before relying on the Student t. Most software packages provide optional test results for both (a) the Gaussian assumption and (b) homogeneity of variance. Many textbooks advise only informal graphical assessments, such as certain scatterplots for independence, others for constant variance, and normal quantile–quantile plots for the adequacy of the Gaussian model. We concur with this recommendation. As convincing evidence against formal tests of (a), such as the Shapiro–Wilk test, we offer a simulation study of the tails of the resulting conditional sampling distributions of the Studentized mean. We analyze the results of systematically screening all samples from normal, uniform, exponential, and Cauchy populations. This pretest does not correct the erroneous significance levels and makes matters worse for the exponential population. We conclude that, in practice, graphical diagnostics are better than a formal pretest. Furthermore, rank or permutation methods are recommended for exact validity in the symmetric case.
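The screening idea described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design. The sample size (n = 10), the nominal 5% level, the replication counts, and every function name are assumptions made for illustration. To keep the example dependency-free, a normal probability-plot correlation screen stands in for the Shapiro–Wilk test, with its cutoff calibrated by simulation. For the Cauchy population, which has no mean, the null value is taken to be the center of symmetry.

```python
import math
import random
from statistics import NormalDist

N = 10          # sample size (illustrative assumption)
T_CRIT = 2.262  # two-sided 5% critical value of Student t with N - 1 = 9 df

# Approximate expected normal order statistics (Blom plotting positions).
SCORES = [NormalDist().inv_cdf((i - 0.375) / (N + 0.25)) for i in range(1, N + 1)]

# Each population: (sampler, null value for the location being tested).
POPULATIONS = {
    "normal": (lambda rng: rng.gauss(0.0, 1.0), 0.0),
    "uniform": (lambda rng: rng.random(), 0.5),
    "exponential": (lambda rng: rng.expovariate(1.0), 1.0),
    "cauchy": (lambda rng: math.tan(math.pi * (rng.random() - 0.5)), 0.0),
}


def ppcc(sample):
    """Correlation between the ordered sample and the normal scores.

    Stand-in normality statistic (NOT Shapiro-Wilk); small values
    indicate departure from normality.
    """
    xs = sorted(sample)
    mx = sum(xs) / len(xs)
    ms = sum(SCORES) / len(SCORES)
    sxy = sum((x - mx) * (s - ms) for x, s in zip(xs, SCORES))
    sxx = sum((x - mx) ** 2 for x in xs)
    sss = sum((s - ms) ** 2 for s in SCORES)
    return sxy / math.sqrt(sxx * sss)


def t_stat(sample, mu0):
    """Studentized mean: (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu0) / math.sqrt(s2 / n)


def calibrate_cutoff(rng, reps=2000, level=0.05):
    """Estimate the lower `level` quantile of the screen under normality."""
    vals = sorted(ppcc([rng.gauss(0.0, 1.0) for _ in range(N)])
                  for _ in range(reps))
    return vals[int(level * reps)]


def simulate(name, rng, cutoff, reps=4000):
    """Rejection rates of the t test, overall and among samples passing the screen."""
    draw, mu0 = POPULATIONS[name]
    passed = rej_all = rej_passed = 0
    for _ in range(reps):
        x = [draw(rng) for _ in range(N)]
        reject = abs(t_stat(x, mu0)) > T_CRIT
        rej_all += reject
        if ppcc(x) >= cutoff:  # sample is "accepted as normal" by the pretest
            passed += 1
            rej_passed += reject
    return {
        "pass_rate": passed / reps,
        "alpha_all": rej_all / reps,
        "alpha_given_pass": rej_passed / max(passed, 1),
    }


if __name__ == "__main__":
    rng = random.Random(2024)
    cutoff = calibrate_cutoff(rng)
    for name in POPULATIONS:
        print(name, simulate(name, rng, cutoff))
```

Comparing `alpha_all` with `alpha_given_pass` for each population shows whether conditioning on a passed pretest moves the type-I error rate toward the nominal 5%. Results from this sketch depend on the assumed sample size and on the stand-in screen, so they should not be read as reproducing the article's figures.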
Acknowledgment
We would like to thank Ian Harris and Lynne Stokes for their comments on a preliminary version of the article. We greatly appreciate the issues raised by Pat Carmack, Mike Ernst, and Rob Easterling.
Notes
** Exact type-I error rate = 4.88%.