Abstract
Feature selection methods extract a relevant subset of features from a high-dimensional feature space so that data can be classified and clustered efficiently. Most real-world datasets have a multi-cluster structure, with correlations among the features. This paper proposes a new multi-cluster feature selection method called Efficient Multi-Cluster Feature Selection (EMCFS). It selects only those features that best preserve the multi-cluster structure of the data. It uses an anchor graph to build an adjacency matrix of much lower dimension than the feature space. The eigenvectors of the graph Laplacian model the underlying geometric structure of the data. Experimental results on the TDT2 and Reuters-21578 text datasets demonstrate the efficiency of the proposed method. Compared with the original Multi-Cluster Feature Selection (MCFS), EMCFS achieves higher accuracy and shorter execution time, making it a promising method for real-world high-dimensional datasets.
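
To make the anchor-graph idea in the abstract concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the anchors are already given (for example, k-means centers), that each point is linked to its s nearest anchors with Gaussian-kernel weights, and that the bandwidth is set by a simple mean-distance heuristic. The function name `anchor_graph_embedding` and all parameter names are hypothetical. The sketch shows how the eigenvectors of the implied n x n adjacency matrix can be recovered from a small m x m matrix, where m is the number of anchors.

```python
import numpy as np

def anchor_graph_embedding(X, anchors, s=2, n_components=2):
    """Spectral embedding from an anchor graph (hypothetical helper, illustrative only)."""
    n, m = X.shape[0], anchors.shape[0]
    # Squared Euclidean distance between every point and every anchor: shape (n, m).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Indices of the s nearest anchors for each point.
    nearest = np.argsort(d2, axis=1)[:, :s]
    rows = np.arange(n)[:, None]
    # Assumed bandwidth heuristic: mean squared distance to the kept anchors.
    sigma2 = d2[rows, nearest].mean() + 1e-12
    # Sparse point-to-anchor weights, normalized so each row sums to 1.
    Z = np.zeros((n, m))
    Z[rows, nearest] = np.exp(-d2[rows, nearest] / sigma2)
    Z /= Z.sum(axis=1, keepdims=True)
    # Anchor degrees. The implied adjacency is W = Z diag(1/lam) Z^T (never formed here).
    lam = Z.sum(axis=0)
    lam[lam == 0] = 1e-12  # guard against anchors that no point selected
    Zs = Z / np.sqrt(lam)
    # Small m x m matrix sharing its nonzero eigenvalues with W.
    M = Zs.T @ Zs
    evals, evecs = np.linalg.eigh(M)          # ascending order
    order = np.argsort(evals)[::-1]           # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    # Skip the top eigenpair (eigenvalue 1; the constant vector when the graph is connected).
    idx = slice(1, 1 + n_components)
    # Lift the small eigenvectors to unit-norm eigenvectors of W.
    Y = Zs @ evecs[:, idx] / np.sqrt(evals[idx])
    return Z, Y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 5))
    anchors = X[rng.choice(40, size=6, replace=False)]  # stand-in for k-means centers
    Z, Y = anchor_graph_embedding(X, anchors, s=3, n_components=2)
    print(Z.shape, Y.shape)
```

The eigendecomposition is performed on a matrix whose size depends only on the number of anchors, which is the source of the cost saving in this kind of construction. How EMCFS then scores individual features from the embedding is not specified in the abstract and is not shown here.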