Efficient multi-cluster feature selection on text data

Abstract

Feature selection methods extract a relevant subset of features from a high-dimensional feature space so that data can be classified and clustered efficiently. Most real-world datasets are multi-cluster in nature, with correlation among the features. This paper proposes a new multi-cluster feature selection method called Efficient Multi-Cluster Feature Selection (EMCFS). It selects only those features that best preserve the multi-cluster structure of the data. It employs an anchor graph to build an adjacency matrix of much lower dimension than the feature space. The eigenvectors of the graph Laplacian model the underlying geometric structure of the data. Experimental results on the TDT2 and Reuters-21578 text datasets demonstrate the efficiency of the proposed method. A comparison of EMCFS with the original Multi-Cluster Feature Selection (MCFS) method shows improved accuracy and reduced execution time, making EMCFS a promising method for real-world high-dimensional datasets.
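The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea it describes: build an anchor graph, take the leading eigenvectors of the resulting graph, and score features by how well they explain those eigenvectors. Everything specific here is an assumption for illustration rather than the authors' method: the function names, the random choice of anchors, the Gaussian kernel over the nearest anchors, and in particular the ridge regression, which stands in for the L1-regularized regression used by the original MCFS.

```python
import numpy as np


def anchor_graph_embedding(X, n_anchors=20, n_neighbors=3, n_vecs=3, seed=0):
    """Leading eigenvectors of an anchor-graph adjacency W = Z diag(lam)^-1 Z^T.

    Assumptions (not from the paper): anchors are sampled at random from X,
    and each point is linked to its nearest anchors with Gaussian weights.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    anchors = X[rng.choice(n, size=n_anchors, replace=False)]

    # Squared Euclidean distance from every point to every anchor (n x m).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    near = d2[rows, idx]

    # Row-normalized Gaussian weights to the nearest anchors.
    w = np.exp(-near / (near.mean() + 1e-12))
    Z = np.zeros((n, n_anchors))
    Z[rows, idx] = w / w.sum(axis=1, keepdims=True)

    # Drop anchors that no point is attached to, then scale by anchor degree.
    lam = Z.sum(axis=0)
    keep = lam > 0
    Zs = Z[:, keep] / np.sqrt(lam[keep])

    # Only a small (anchors x anchors) eigenproblem is solved; its
    # eigenvectors are lifted back to eigenvectors of the n x n matrix W.
    vals, vecs = np.linalg.eigh(Zs.T @ Zs)
    order = np.argsort(vals)[::-1][:n_vecs]
    return Zs @ vecs[:, order] / np.sqrt(np.maximum(vals[order], 1e-12))


def feature_scores(X, Y, alpha=1.0):
    """Score each feature by its largest absolute regression coefficient.

    Ridge regression is a stand-in (assumption) for the sparse regression
    of MCFS; features are standardized so coefficients are comparable.
    """
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    d = Xs.shape[1]
    coef = np.linalg.solve(Xs.T @ Xs + alpha * np.eye(d), Xs.T @ Y)
    return np.abs(coef).max(axis=1)


def select_features(X, n_selected, n_vecs, **graph_kwargs):
    """Return the indices of the n_selected highest-scoring features."""
    Y = anchor_graph_embedding(X, n_vecs=n_vecs, **graph_kwargs)
    return np.argsort(feature_scores(X, Y))[::-1][:n_selected]
```

In this sketch the constant eigenvector is not removed explicitly. Because the features are centered before regression, it contributes zero coefficients and does not affect the scores. A call such as `select_features(X, n_selected=100, n_vecs=k)` would take a dense samples-by-features array `X` and a hypothetical cluster count `k`; sparse text matrices would need a different distance computation.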
