Abstract
This article introduces DOBIN, a new approach to select a set of basis vectors tailored for outlier detection. DOBIN has a simple mathematical foundation and can be used as a dimension reduction tool for outlier detection tasks. We demonstrate the effectiveness of DOBIN on an extensive data repository, by comparing the performance of outlier detection methods using DOBIN and other bases. We further illustrate the utility of DOBIN as an outlier visualization tool. The R package dobin implements this basis construction. Supplementary materials for this article are available online.
Supplementary Materials
R package dobin: This package contains the DOBIN basis construction.
Datasets: Datasets discussed in Section 5 are available at Kandanaarachchi et al. (2019). The character network from the novel Les Misérables was taken from the R package SOMbrero (Vialaneix et al. 2019). The book vectors dataset used in Section 4.5 is available at the GitHub repository (https://github.com/sevvandi/Outlier-Basis).
Scripts: The script Figures_For_Paper_1.R contains the R code used to conduct experiments and produce graphs in Section 3. The script Figures_For_Paper_2.R contains the R code used in Section 4. Finally, the script Figures_For_Paper_3.R contains the R code used to produce the graphs in Section 5. The original computation on the data repository detailed in Section 5 was conducted using the MonARCH HPC Cluster. The results of this computation are available at the GitHub repository (https://github.com/sevvandi/Outlier-Basis).
Other R packages: In addition to the R package dobin, we also used the R packages OutliersO3 (Unwin 2019b), HDoutliers (Fraley 2018), isolationForest (Liu 2009), DMwR (Torgo 2010), pROC (Robin et al. 2011), ggplot2 (Wickham 2016), mbgraphic (Grimm 2019), gridExtra (Auguie 2017), tsutils (Kourentzes 2019), dplyr (Wickham et al. 2019), and tidyr (Wickham and Henry 2020).
Fraley, C. (2018), “HDoutliers: Leland Wilkinson’s Algorithm for Detecting Multidimensional Outliers,” R Package Version 1.0, available at https://CRAN.R-project.org/package=HDoutliers.