Abstract
Ordinary least squares (OLS) estimation of a linear regression model is well known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. This standard “detect-and-forget” approach has been shown to be problematic: in this article we highlight the fact that it can lead to invalid inference, and we show how recently developed tools in selective inference can be used to properly account for outlier detection and removal. Our inferential procedures apply to a general class of outlier removal procedures that includes several of the most commonly used approaches. We conduct simulations to corroborate the theoretical results, and we apply our method to three real datasets to illustrate how our inferential results can differ from those of the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R. Supplementary materials for this article are available online.
Supplementary Materials
Supplementary materials for this manuscript: Proofs of most theoretical results, additional simulation results, and implementation details, collected online for brevity. (.pdf file)
R package outference: R package containing code to perform the inferential methods described in this article; a brief usage sketch follows this list. (available at https://github.com/shuxiaoc/outference)
R scripts: R scripts to reproduce all figures and simulation results in this article. (.zip file)
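To illustrate the intended workflow, the sketch below shows how the package might be called. The outference() function name, its method and cutoff arguments, and the Cook's distance cutoff convention are assumptions based on the description above and the package's stated lm-like interface; consult the package documentation for the actual API.

    # Minimal usage sketch. The outference() call and its 'method' and
    # 'cutoff' arguments are assumptions based on the package description
    # above, not a confirmed API.
    # remotes::install_github("shuxiaoc/outference")
    library(outference)

    # Fit a regression, detecting outliers via Cook's distance (assumed
    # option); observations with distance exceeding cutoff/n are flagged
    # (assumed convention).
    fit <- outference(mpg ~ wt + hp, data = mtcars,
                      method = "cook", cutoff = 4)

    # Inference corrected for the detect-and-remove step, mirroring lm():
    summary(fit)   # p-values accounting for outlier removal
    confint(fit)   # selective confidence intervals

If the sketch is faithful, the key design choice is that downstream inference reuses the familiar summary() and confint() generics, so existing lm-based analysis scripts need only swap the fitting call.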
Acknowledgments
The authors thank an associate editor for pointing them to the green rating dataset.