Evaluation of robust outlier detection methods for zero-inflated complex data

Pages 1144-1167 | Received 19 Nov 2018, Accepted 19 Sep 2019, Published online: 27 Sep 2019

References

  • C. Aggarwal, Outlier Analysis, Springer, New York, 2013.
  • J. Aitchison, The Statistical Analysis of Compositional Data, Chapman & Hall, London, 1986.
  • A. Alfons and M. Templ, Estimation of social exclusion indicators from complex surveys: the R package laeken, J. Stat. Softw. 54 (2013), pp. 1–25. doi: 10.18637/jss.v054.i15
  • A. Alfons, M. Templ and P. Filzmoser, Robust estimation of economic indicators from survey samples based on Pareto tail modelling, J. R. Stat. Soc. C-Appl. 62 (2013), pp. 271–286. doi: 10.1111/j.1467-9876.2012.01063.x
  • F. Bacon and J. Devey, Novum Organum, in Library of universal literature: Science, P. F. Collier (ed.), 1902.
  • V. Barnett and T. Lewis, Outliers in Statistical Data, Wiley Series in Probability & Statistics, Wiley, 1994.
  • C. Béguin and B. Hulliger, Multivariate outlier detection in incomplete survey data: the epidemic algorithm and transformed rank correlations, J. R. Stat. Soc. Ser. A 167 (2004), pp. 275–294. doi: 10.1046/j.1467-985X.2003.00753.x
  • C. Béguin and B. Hulliger, The BACON-EEM algorithm for multivariate outlier detection in incomplete survey data, Surv. Methodol. 34 (2008), pp. 91–103.
  • M. Bill and B. Hulliger, Treatment of multivariate outliers in incomplete business survey data, Aust. J. Stat. 45 (2016), pp. 3–23. doi: 10.17713/ajs.v45i1.86
  • N. Billor, A.S. Hadi and P.F. Velleman, BACON: blocked adaptive computationally-efficient outlier nominators, Comput. Stat. Data Anal. 34 (2000), pp. 279–298. doi: 10.1016/S0167-9473(99)00101-2
  • G.E.P. Box and D.R. Cox, An analysis of transformations, J. R. Stat. Soc. B Meth. 26 (1964), pp. 211–252.
  • G. Brys, M. Hubert and A. Struyf, A robust measure of skewness, J. Comput. Graph. Stat. 13 (2004), pp. 996–1017. doi: 10.1198/106186004X12632
  • R. Chambers, A. Hentges and X. Zhao, Robust automatic methods for outlier and error detection, J. R. Stat. Soc. A Stat. 167 (2004), pp. 323–339. doi: 10.1111/j.1467-985X.2004.00748.x
  • M. Danilov, V.J. Yohai and R.H. Zamar, Robust estimation of multivariate location and scatter in the presence of missing data, J. Am. Stat. Assoc. 107 (2012), pp. 1178–1186. doi: 10.1080/01621459.2012.699792
  • P.L. Davies, Asymptotic behavior of S-estimators of multivariate location parameters and dispersion matrices, Ann. Stat. 15 (1987), pp. 1269–1292. doi: 10.1214/aos/1176350505
  • T. De Waal, Statistical data editing, in: Handbook of Statistics 29A. Sample Surveys: Design, Methods and Applications, D. Pfeffermann and C. Rao, eds., Elsevier B. V., Amsterdam, The Netherlands, 2009, pp. 187–214.
  • S.J. Devlin, R. Gnanadesikan and J.R. Kettenring, Robust estimation of dispersion matrices and principal components, J. Am. Stat. Assoc. 76 (1981), pp. 354–362. doi: 10.1080/01621459.1981.10477654
  • D.L. Donoho, Breakdown properties of multivariate location estimators, Ph.D. thesis, Harvard University, Boston 1982.
  • O. Dupriez, Building a household consumption database for the calculation of poverty PPPs, Technical Note, Draft 1.0, World Bank 2007.
  • D. Dupuis and M.P. Victoria-Feser, A robust prediction error criterion for Pareto modelling of upper tails, Can. J. Stat. 34 (2006), pp. 639–658. doi: 10.1002/cjs.5550340406
  • F. Edgeworth, XXXIII. The choice of means, Philos. Magaz. Ser. 5 24 (1887), pp. 268–271. doi: 10.1080/14786448708628093
  • J. Egozcue, V. Pawlowsky-Glahn, G. Mateu-Figueras and C. Barceló-Vidal, Isometric logratio transformations for compositional data analysis, Math. Geol. 35 (2003), pp. 279–300. doi: 10.1023/A:1023818214614
  • P. Filzmoser, J. Gussenbauer and M. Templ, Detecting outliers in household consumption survey data, Tech. Rep., Vienna University of Technology, Vienna, Austria, Deliverable 4, Final Report, contract with the World Bank (1157976), 2016.
  • P. Filzmoser and K. Hron, Outlier detection for compositional data using robust methods, Math. Geosci. 40 (2008), pp. 233–248. doi: 10.1007/s11004-007-9141-5
  • P. Filzmoser, K. Hron and M. Templ, Applied Compositional Data Analysis, Springer Series in Statistics, Springer, Cham, 2018.
  • P. Filzmoser, A. Ruiz-Gazen and C. Thomas-Agnan, Identification of local multivariate outliers, Statistical Papers 55 (2014), pp. 29–47. doi: 10.1007/s00362-013-0524-z
  • R. Fried, Robust filtering of time series with trends, J. Nonparametr. Stat. 16 (2004), pp. 313–328. doi: 10.1080/10485250410001656444
  • C. Gini, Variabilità e mutabilità, Tipografia di Paolo Cuppini, Bologna, 1912, pp. 221–382.
  • R. Gnanadesikan and J.R. Kettenring, Robust estimates, residuals, and outlier detection with multiresponse data, Biometrics 28 (1972), pp. 81–124. doi: 10.2307/2528963
  • D. Hawkins, Identification of Outliers, Monographs on applied probability and statistics, Chapman and Hall, London, New York, 1980.
  • H. Huang, K. Mehrotra and C. Mohan, Rank-based outlier detection, J. Stat. Comput. Simul. 83 (2013), pp. 518–531. doi: 10.1080/00949655.2011.621124
  • M. Hubert and E. Vandervieren, An adjusted boxplot for skewed distributions, Comput. Stat. Data Anal. 52 (2008), pp. 5186–5201. doi: 10.1016/j.csda.2007.11.008
  • B. Hulliger, Johann Heinrich Lambert: an admirable applied statistician, Bullet. Swiss Stat. Soc. 14 (2013), pp. 4–10.
  • B. Hulliger, A. Alfons, P. Filzmoser, A. Meraner, T. Schoch and M. Templ, Robust methodology for Laeken indicators, Research Project Report WP4 – D4.2, FP7-SSH-2007-217322 AMELI, 2011.
  • C. Kleiber and S. Kotz, Statistical Size Distributions in Economics and Actuarial Sciences, John Wiley and Sons, Hoboken, NJ, 2003. ISBN 0-471-15064-9.
  • A. Kowarik and M. Templ, Imputation with the R package VIM, J. Stat. Softw. 74 (2016), pp. 1–16. doi: 10.18637/jss.v074.i07
  • J.H. Lambert, Lambert's Photometrie, translation into German by E. Anding, Wilhelm Engelmann, Leipzig, 1760/1892. Original work in Latin published 1760 by Klett.
  • H. Lee and Y. Van Hui, Outliers detection in time series, J. Stat. Comput. Simul. 45 (1993), pp. 77–95. doi: 10.1080/00949659308811473
  • A. Leung, V. Yohai and R. Zamar, Multivariate location and scatter matrix estimation under cellwise and casewise contamination, 2016. Available at arXiv:1609.00402
  • H.P. Lopuhaä, On the relation between S-estimators and M-estimators of multivariate location and covariance, Ann. Stat. 17 (1989), pp. 1662–1683. doi: 10.1214/aos/1176347386
  • M.O. Lorenz, Methods of measuring the concentration of wealth, Publ. Am. Stat. Assoc. 9 (1905), pp. 209–219.
  • A. Marazzi and V. Yohai, Robust Box-Cox transformations based on minimum residual autocorrelations, Comput. Stat. Data Anal. 50 (2006), pp. 2752–2768. doi: 10.1016/j.csda.2005.04.007
  • R.A. Maronna, Robust M-estimators of multivariate location and scatter, Ann. Stat. 4 (1976), pp. 51–67. doi: 10.1214/aos/1176343347
  • R.A. Maronna and R.H. Zamar, Robust estimation of location and dispersion for high-dimensional datasets, Technometrics 44 (2002), pp. 307–317. doi: 10.1198/004017002188618509
  • R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, version 3.5.3. 2019.
  • D.M. Rocke, Robustness properties of S-estimators of multivariate location and shape in high dimension, Ann. Stat. 24 (1996), pp. 1327–1345. doi: 10.1214/aos/1032526972
  • P.J. Rousseeuw, Multivariate estimation with high breakdown point, in: W. Grossmann, G. Pflug, I. Vincze, and W. Wertz, eds., Mathematical Statistics and Applications Vol. B, Reidel Publishing, Dordrecht, 1985, pp. 283–297.
  • P.J. Rousseeuw and A.M. Leroy, Robust Regression and Outlier Detection, John Wiley & Sons Inc., New York, NY, 1987.
  • P.J. Rousseeuw and K. Van Driessen, A fast algorithm for the minimum covariance determinant estimator, Technometrics 41 (1999), pp. 212–223. doi: 10.1080/00401706.1999.10485670
  • W.A. Stahel, Breakdown of covariance estimators, Research Report 31, ETH Zürich, Fachgruppe für Statistik 1981.
  • W.A. Stahel, Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen, Ph.D. thesis no. 6881, Swiss Federal Institute of Technology (ETH), Zürich 1981.
  • K.S. Tatsuoka and D.E. Tyler, The uniqueness of S and M-functionals under nonelliptical distributions, Ann. Stat. 28 (2000), pp. 1219–1243. doi: 10.1214/aos/1015956714
  • M. Templ, K. Hron and P. Filzmoser, robCompositions: An R-package for Robust Statistical Analysis of Compositional Data, John Wiley and Sons, Hoboken, NJ, 2011.
  • M. Templ, K. Hron and P. Filzmoser, Exploratory tools for outlier detection in compositional data with structural zeros, J. Appl. Stat. 44 (2017), pp. 734–752. doi: 10.1080/02664763.2016.1182135
  • V. Todorov and P. Filzmoser, An object oriented framework for robust multivariate analysis, J. Stat. Softw. 32 (2009), pp. 1–47. doi: 10.18637/jss.v032.i03
  • V. Todorov, M. Templ and P. Filzmoser, Detection of multivariate outliers in business survey data with incomplete information, Adv. Data Anal. Classif. 5 (2011), pp. 37–56. doi: 10.1007/s11634-010-0075-2
  • P. Van Kerm, Extreme incomes and the estimation of poverty and inequality indicators from EU-SILC, IRISS Working Paper Series, 2007-01, CEPS/INSTEAD 2007.
  • E. Vandervieren and M. Hubert, An adjusted boxplot for skewed distributions, Comput. Stat. Data Anal. 52 (2008), pp. 5186–5201. doi: 10.1016/j.csda.2007.11.008
  • B. Vandewalle, J. Beirlant, A. Christmann and M. Hubert, A robust estimator for the tail index of Pareto-type distributions, Comput. Stat. Data Anal. 51 (2007), pp. 6252–6268. doi: 10.1016/j.csda.2007.01.003
  • A. Zimek and P. Filzmoser, There and back again: Outlier detection between statistical reasoning and data mining algorithms, WIREs Data Min. Knowl. Discov. 8 (2018). doi: 10.1002/widm.1280