Abstract
Finding the “K most likely outliers”, i.e., those K observations whose removal from the data most reduces the sum of squared residuals from a linear model, can require lengthy computation. The identification and use of equivalence classes among all possible outlier cell configurations in a two-way table result in a relatively efficient algorithm, which can itself be shortened at the cost of some uncertainty in the results. For general linear models, it is shown that under certain ideal circumstances, the K most likely outliers procedure and two simpler procedures can be expected to identify outlier cells correctly. In practice, the K most likely outliers method is recommended as being safer. Finding the K most likely outliers is shown to be equivalent to minimizing the joint probability of K residuals, after adjusting for differing covariance matrices.
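To make the defining criterion concrete, the sketch below gives a brute-force illustration of what the abstract describes: among all subsets of K observations, choose the one whose deletion most reduces the sum of squared residuals from an ordinary least-squares fit. This is only the definition stated as code, not the paper's equivalence-class algorithm, and the function names and use of NumPy are illustrative assumptions.

```python
# Brute-force illustration of the "K most likely outliers" criterion:
# enumerate all size-K subsets of rows and keep the subset whose removal
# most reduces the residual sum of squares of an OLS fit.
# (Hypothetical helper names; the paper's algorithm avoids this exhaustive search.)
from itertools import combinations
import numpy as np

def residual_ss(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def k_most_likely_outliers(X, y, K):
    """Indices of the K rows whose deletion minimizes the residual SS."""
    n = len(y)
    best_subset, best_ss = None, np.inf
    for subset in combinations(range(n), K):
        keep = np.setdiff1d(np.arange(n), subset)
        ss = residual_ss(X[keep], y[keep])
        if ss < best_ss:
            best_subset, best_ss = subset, ss
    return best_subset, best_ss
```

Because the number of size-K subsets grows combinatorially with the sample size, this exhaustive search quickly becomes infeasible, which is the "lengthy computation" the abstract refers to and the motivation for the more efficient equivalence-class approach.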