Abstract
Principal component analysis is a widely used technique that provides an optimal lower-dimensional approximation to multivariate or functional datasets. These approximations can be very useful in identifying potential outliers among high-dimensional or functional observations. In this article, we propose a new class of estimators for principal components based on robust scale estimators. For a fixed dimension q, we robustly estimate the q-dimensional linear space that provides the best prediction for the data, in the sense of minimizing the sum of robust scale estimators of the coordinates of the residuals. We also study an extension to the infinite-dimensional case. Our method is consistent for elliptical random vectors, and is Fisher consistent for elliptically distributed random elements on arbitrary Hilbert spaces. Numerical experiments show that our proposal is highly competitive when compared with other methods. We illustrate our approach on a real dataset, where the robust estimator discovers atypical observations that would have been missed otherwise. Supplementary materials for this article are available online.
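As a rough illustration of the criterion described in the abstract, the following minimal sketch evaluates, for a candidate q-dimensional subspace, the sum over coordinates of a robust scale of the residuals. It is not the authors' implementation. The function names (`mad_scale`, `robust_subspace_objective`), the choice of the normalized median absolute deviation as the robust scale, the squaring of each scale (which mirrors the sum of residual variances in classical principal component analysis), and the simulated data are all assumptions made here for illustration.

```python
import numpy as np

def mad_scale(v):
    """Normalized median absolute deviation (assumed robust scale)."""
    med = np.median(v)
    return 1.4826 * np.median(np.abs(v - med))

def robust_subspace_objective(X, mu, B):
    """Sum over coordinates of squared robust scales of the residuals.

    X  : (n, p) data matrix
    mu : (p,) location vector
    B  : (p, q) matrix with orthonormal columns spanning the candidate subspace
    """
    Z = X - mu                # centered observations
    R = Z - Z @ B @ B.T       # residuals after projecting onto span(B)
    return sum(mad_scale(R[:, j]) ** 2 for j in range(R.shape[1]))

# Hypothetical data: points close to a line in R^3, with 10% outliers.
rng = np.random.default_rng(0)
b_true = np.array([[1.0], [1.0], [0.0]]) / np.sqrt(2)
X = rng.normal(0, 3, size=(200, 1)) @ b_true.T + rng.normal(0, 0.1, size=(200, 3))
X[:20] = np.array([0.0, 0.0, 20.0]) + rng.normal(0, 0.1, size=(20, 3))

mu = np.median(X, axis=0)     # coordinatewise median as a simple robust center
b_wrong = np.array([[0.0], [0.0], [1.0]])

print("objective at generating direction:", robust_subspace_objective(X, mu, b_true))
print("objective at outlier direction:  ", robust_subspace_objective(X, mu, b_wrong))
```

In this sketch the subspace is only evaluated, not optimized. An estimator of the kind described would search over the location and the subspace for the minimizer of such an objective.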
Additional information
Notes on contributors
Graciela Boente
Graciela Boente is Full Professor, Departamento de Matemáticas, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 1, Buenos Aires 1428, Argentina (E-mail: [email protected]). She also has a researcher position at the CONICET.
Matías Salibian-Barrera
Matías Salibian-Barrera is Associate Professor, Department of Statistics, University of British Columbia, 3182 Earth Sciences Building, 22007 Main Mall, Vancouver, BC, V6T 1Z4, Canada (E-mail: [email protected]). This research was partially supported by Grants PIP 112-201101-00339 from CONICET, PICT 0397 from ANPCYT, and W276 from the Universidad de Buenos Aires at Buenos Aires, Argentina (G. Boente) and a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada (M. Salibián Barrera). The authors thank the associate editor and three anonymous referees for valuable comments that led to an improved version of the original article.