Statistical Inference

Valid Inference Corrected for Outlier Removal

Pages 323-334 | Received 23 Jul 2018, Accepted 02 Aug 2019, Published online: 01 Oct 2019
 

Abstract

Ordinary least squares (OLS) estimation of a linear regression model is well known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if they were the original data collected. This standard “detect-and-forget” approach has been shown to be problematic. In this article, we highlight the fact that it can lead to invalid inference and show how recently developed tools in selective inference can be used to properly account for outlier detection and removal. Our inferential procedures apply to a general class of outlier removal procedures that includes several of the most commonly used approaches. We conduct simulations to corroborate the theoretical results, and we apply our method to three real datasets to illustrate how our inferential results can differ from those of the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R. Supplementary materials for this article are available online.
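
The detect-and-forget concern can be checked empirically with a small simulation. The following is a minimal sketch, not the authors' procedure and not the outference interface: it assumes a single predictor, Gaussian noise with a true slope of zero, Cook's distance with a 4/n cutoff as the outlier rule, and a normal-approximation critical value. All function names (`fit_simple_ols`, `cooks_distances`, `naive_rejection_rate`) are hypothetical. If the naive post-removal test were valid, its rejection rate under the null would sit near the nominal 5%.

```python
import random
from statistics import NormalDist


def fit_simple_ols(x, y):
    """Fit y ~ 1 + x by least squares and return the quantities used below."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Leverage (hat-matrix diagonal) for simple linear regression.
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance estimate
    return {"intercept": intercept, "slope": slope, "resid": resid,
            "leverage": leverage, "s2": s2, "sxx": sxx}


def cooks_distances(fit, p=2):
    """Cook's distance for each observation; p = number of coefficients."""
    return [e * e / (p * fit["s2"]) * h / (1 - h) ** 2
            for e, h in zip(fit["resid"], fit["leverage"])]


def naive_rejection_rate(n=100, reps=1000, alpha=0.05, seed=0):
    """Share of null datasets where the naive post-removal slope test rejects."""
    rng = random.Random(seed)
    # Assumption: normal critical value instead of a t quantile.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # true slope is zero
        # Step 1 (detect): drop points with Cook's distance above 4/n.
        d = cooks_distances(fit_simple_ols(x, y))
        keep = [i for i in range(n) if d[i] <= 4 / n]
        # Step 2 (forget): refit and test as if no removal had happened.
        refit = fit_simple_ols([x[i] for i in keep], [y[i] for i in keep])
        se = (refit["s2"] / refit["sxx"]) ** 0.5
        if abs(refit["slope"] / se) > z_crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("naive rejection rate at nominal 5%:", naive_rejection_rate())
```

Comparing the printed rate with 0.05 gives a rough sense of how far the naive test departs from its nominal level in this particular setup; the normal approximation by itself contributes a small part of any gap.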

Supplementary Materials

Supplementary materials for this manuscript: For brevity, we collect proofs of most theoretical results, some additional simulation results, and the implementation details in the online supplementary materials. (.pdf file)

R package outference: R package containing code to perform the inferential methods described in this article. (available at https://github.com/shuxiaoc/outference)

R scripts: R scripts to reproduce all figures and simulation results in this article. (.zip file)

Acknowledgments

The authors thank an associate editor for pointing them to the green rating dataset.

Additional information

Funding

The authors gratefully acknowledge support from NSF CAREER grant DMS-1653017.
