
Dimension Reduction for Outlier Detection Using DOBIN

Pages 204-219 | Received 22 Aug 2019, Accepted 03 Aug 2020, Published online: 28 Sep 2020
 

Abstract

This article introduces DOBIN, a new approach for selecting a set of basis vectors tailored for outlier detection. DOBIN has a simple mathematical foundation and can be used as a dimension reduction tool for outlier detection tasks. We demonstrate the effectiveness of DOBIN on an extensive data repository by comparing the performance of outlier detection methods using DOBIN and other bases. We further illustrate the utility of DOBIN as an outlier visualization tool. The R package dobin implements this basis construction. Supplementary materials for this article are available online.

Supplementary Materials

R package dobin: This package contains the DOBIN basis construction.

Datasets: Datasets discussed in Section 5 are available at Kandanaarachchi et al. (2019). The character network from the novel Les Misérables was taken from the R package SOMbrero (Vialaneix et al. 2019). The book vectors dataset used in Section 4.5 is available at the GitHub repository (https://github.com/sevvandi/Outlier-Basis).

Scripts: The script Figures_For_Paper_1.R contains the R code used to conduct experiments and produce graphs in Section 3. The script Figures_For_Paper_2.R contains the R code used in Section 4. Finally, the script Figures_For_Paper_3.R contains the R code used to produce the graphs in Section 5. The original computation on the data repository detailed in Section 5 was conducted using the MonARCH HPC Cluster. The results of this computation are available at the GitHub repository (https://github.com/sevvandi/Outlier-Basis).

Other R packages: In addition to the R package dobin, we also used the R packages OutliersO3 (Unwin 2019b), HDoutliers (Fraley 2018), isolationForest (Liu 2009), DMwR (Torgo 2010), pROC (Robin et al. 2011), ggplot2 (Wickham 2016), mbgraphic (Grimm 2019), gridExtra (Auguie 2017), tsutils (Kourentzes 2019), dplyr (Wickham et al. 2019), and tidyr (Wickham and Henry 2020).

Fraley, C. (2018), “HDoutliers: Leland Wilkinson’s Algorithm for Detecting Multidimensional Outliers,” R Package Version 1.0, available at https://CRAN.R-project.org/package=HDoutliers.

Additional information

Funding

Funding was provided by the Australian Research Council through the Linkage Project LP160101885. This research was also supported by the Monash eResearch Centre and eSolutions-Research Support Services through the MonARCH HPC Cluster.
