Abstract
The question of how to handle outliers in data sets has been the subject of heated debate for several centuries. In a regression setting there is the added difficulty that an observation can be outlying in terms of the response variable or the explanatory variables. Various approaches have been used to deal with this, from rejection of the outlying observation, through accommodation, to changing the model assumptions. Inference in regression generally assumes a classical linear model with normally distributed errors. Is it reasonable to draw inference in the traditional manner while using outlier rejection or accommodation techniques? Growing interest in this subject over the last few years has made many more methods for dealing with outliers available in current statistical software. Three high-breakdown methods that handle outliers and provide inference are examined by simulation to discover whether inference in this setting is reliable, under what conditions the outliers are handled appropriately, and which of the methods is most suitable for a particular situation. Outliers were generated using either a mean-shift model or a heavy-tailed distribution. The proportion of outliers was varied, as was their location in the design space.
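The two contamination schemes mentioned above (a mean-shift model and heavy-tailed errors) can be sketched as follows; the intercept, slope, shift size, and degrees of freedom here are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def simulate_mean_shift(n=100, p_out=0.1, shift=10.0, seed=0):
    """Simulate y = 1 + 2x + e with outliers injected under a mean-shift
    model: a fraction p_out of responses receives an additive shift.
    (Sketch only; parameter values are hypothetical.)"""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n)
    y = 1.0 + 2.0 * x + rng.normal(0, 1, n)
    idx = rng.choice(n, size=int(p_out * n), replace=False)
    y[idx] += shift  # mean-shift contamination of the response
    return x, y, idx

def simulate_heavy_tails(n=100, df=2, seed=0):
    """Alternative scheme: heavy-tailed (Student-t) errors instead of
    a separate contaminating mechanism."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n)
    y = 1.0 + 2.0 * x + rng.standard_t(df, n)
    return x, y
```

Varying `p_out`, `shift`, and where the shifted points fall in `x` corresponds to the design factors described in the abstract (proportion of outliers and their location in the design space).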