Abstract
Boxplots are among the most widely used exploratory data analysis (EDA) tools in statistical practice. Typical applications of boxplots include eliciting information about the underlying distribution (shape, location, etc.) as well as identifying possible outliers. This article focuses on a modification using a type of lower and upper fences similar in concept to those used in a traditional boxplot; however, instead of constructing the upper and lower fences using the upper and lower quartiles, respectively, and a multiple of the interquartile range (IQR), multiples of the upper and the lower semi-interquartile ranges (SIQR), respectively, measured from the sample median, are used. Any observation beyond the proposed fences is labeled a potential outlier. An exact expression for the probability that at least one sample observation is wrongly classified as an outlier, the so-called “some-outside rate per sample” (Hoaglin et al. (1986)), is derived for the family of location-scale distributions and is used in the determination of the fence constants. Tables for the fence constants are provided for a number of well-known location-scale distributions along with some illustrations with data; the performance of the outlier detection rule is explored in a simulation study.
Acknowledgments
The authors thank the associate editor and an anonymous referee for their helpful comments which helped improve the manuscript. This research was supported in part by the SARChI Chair at the University of Pretoria, South Africa.