Abstract
Cluster analysis is an important technique of explorative data mining. It refers to a collection of statistical methods for learning the structure of data by solely exploring pairwise distances or similarities. Often meaningful structures are not detectable in these high-dimensional feature spaces. Relevant features can be obfuscated by noise from irrelevant measurements. These observations led to the design of subspace clustering algorithms, which can identify clusters that originate from different subsets of features. Hunting for clusters in arbitrary subspaces is intractable due to the infinite search space spanned by all feature combinations. In this work, we present a subspace clustering algorithm that can be applied for exhaustively screening all feature combinations of small- or medium-sized datasets (approximately 30 features). Based on a robustness analysis via subsampling we are able to identify a set of stable candidate subspace cluster solutions.
Funding
This work was funded in part by grants to HAK from the German Science Foundation (DFG, SFB1074, Project Z1) and the German Federal Ministry of Education and Research (BMBF) within the framework GERONTOSYS (Forschungskern SyStaR, project ID 0315894A).