Figures & data
Table 1. Overview of univariate and multivariate outlier detection methods addressed.
Figure 1. Top: Estimates of the Gini coefficient (left) and variance of the Gini coefficient (right) for the Albanian data set after univariate outlier detection methods as well as outlier imputation have been applied. Bottom: Share of upper and lower outliers for each outlier detection scheme applied to the Albanian data set.
![Figure 1](/cms/asset/45156b4b-1c4b-49c5-901d-2a8fd01d4e8a/cjas_a_1671961_f0001_oc.jpg)
Figure 2. Top: Estimates of the Gini coefficient (left) and variance of the Gini coefficient (right) for the Albanian data set after multivariate outlier detection methods as well as outlier imputation have been applied. Bottom: Share of outliers detected by multivariate outlier detection methods for the Albanian household data.
![Figure 2](/cms/asset/ba1f6ab4-8e23-4f7a-9f69-f23c99564bb0/cjas_a_1671961_f0002_oc.jpg)
Figure 3. Estimated Gini coefficients for different levels of ε and different outlier detection methods. The dashed line indicates a baseline representing the median of the Gini coefficients of the uncontaminated data.
![Figure 3](/cms/asset/f5e02356-504a-4c4d-85b1-851d5f0f74d3/cjas_a_1671961_f0003_oc.jpg)
Figure 4. Boxplots of successfully detected artificial outliers, where the whole observation was contaminated, for different outlier detection methods and different levels of ε.
![Figure 4](/cms/asset/32174b06-2aff-440f-b03a-02c6f68d3b3f/cjas_a_1671961_f0004_oc.jpg)
Figure 5. Boxplots of successfully detected artificial outliers, where only single cells were contaminated, for different outlier detection methods and different levels of ε.
![Figure 5](/cms/asset/05575404-e7d9-4051-a57d-1e0ca79b36cc/cjas_a_1671961_f0005_oc.jpg)
Figure 6. Share of false-positive outliers relative to the number of clean data points for different outlier detection methods and different levels of ε.
![Figure 6](/cms/asset/779ba255-9b17-4d0d-acd0-750b7f415595/cjas_a_1671961_f0006_oc.jpg)
Table 2. Percentages of misclassified observations for different levels of ε for the univariate methods.
Figure 7. Boxplots of calculated Gini coefficients for different outlier detection methods and different levels of ε. The dashed line indicates a baseline representing the median of the Gini coefficients of the uncontaminated data.
![Figure 7](/cms/asset/69f7b63a-5bf2-44cb-8467-2c6865c08a3c/cjas_a_1671961_f0007_oc.jpg)
Figure 8. Boxplots of successfully detected artificial outliers, where the whole observation was contaminated, for different outlier detection methods and different levels of ε.
![Figure 8](/cms/asset/6e61b502-7fb4-46c0-b1b2-556c671934b4/cjas_a_1671961_f0008_oc.jpg)
Figure 9. Boxplots of successfully detected artificial outliers, where only single cells were contaminated, for different outlier detection methods and different levels of ε.
![Figure 9](/cms/asset/3f34d097-e6ce-4d52-89bf-77ae42cd6ea6/cjas_a_1671961_f0009_oc.jpg)
Figure 10. Boxplots of the share of false-positive outliers relative to the number of clean data points for different outlier detection methods and different levels of ε.
![Figure 10](/cms/asset/5971053c-d90d-4ef4-a416-c0be3b93559b/cjas_a_1671961_f0010_oc.jpg)