Original Articles

Incremental Forward Feature Selection with Application to Microarray Gene Expression Data

Pages 827-840 | Received 17 Jul 2007, Accepted 05 Mar 2008, Published online: 10 Sep 2008
 

Abstract

In this study, the authors propose a new feature selection scheme, incremental forward feature selection, inspired by incremental reduced support vector machines. In this method, a new feature is added to the currently selected feature subset if it contributes the most additional information, measured as the distance between the new feature vector and the column space spanned by the current feature subset. The scheme can therefore exclude highly linearly correlated features, which provide redundant information and may degrade the efficiency of learning algorithms. The method is compared with the weight score approach and the 1-norm support vector machine on two well-known microarray gene expression data sets, the acute leukemia and colon cancer data sets, each of which has very few observations but a huge number of genes. The linear smooth support vector machine was applied to the feature subsets selected by each of the three schemes, and slightly better classification results were obtained with the subsets from the 1-norm support vector machine and incremental forward feature selection. Finally, the authors claim that the remaining genes still contain useful information: the previously selected features are iteratively removed from the data sets, and the feature selection and classification steps are repeated for four rounds. The results show that many distinct feature subsets can provide enough information for the classification tasks in these two microarray gene expression data sets.
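The selection criterion described in the abstract can be illustrated with a short NumPy sketch. Here each column of `X` (observations x genes) is one gene's feature vector, and the distance from a candidate column x to the span of the selected columns X_S is taken as the least-squares residual norm ||x - X_S b||, with b minimizing that norm. The function names, the fixed budget `k`, the tolerance `tol` below which a candidate is treated as linearly dependent, and the choice to start from the column with the largest norm (its distance to the empty subspace) are all illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from typing import List


def residual_distance(X_sel: np.ndarray, x: np.ndarray) -> float:
    """Distance from x to the column space of X_sel (least-squares residual norm)."""
    if X_sel.shape[1] == 0:
        # Empty subset: the column space is {0}, so the distance is ||x||.
        return float(np.linalg.norm(x))
    coef, *_ = np.linalg.lstsq(X_sel, x, rcond=None)
    return float(np.linalg.norm(x - X_sel @ coef))


def incremental_forward_selection(X: np.ndarray, k: int,
                                  tol: float = 1e-10) -> List[int]:
    """Greedily pick up to k columns of X.

    At each step the unselected column farthest from the span of the
    already selected columns is added. Selection stops early when every
    remaining column lies (numerically) in that span, i.e. it would add
    only redundant, linearly dependent information.
    """
    p = X.shape[1]
    selected: List[int] = []
    for _ in range(min(k, p)):
        X_sel = X[:, selected]
        best_j, best_d = None, tol
        for j in range(p):
            if j in selected:
                continue
            d = residual_distance(X_sel, X[:, j])
            if d > best_d:
                best_j, best_d = j, d
        if best_j is None:
            break
        selected.append(best_j)
    return selected
```

For example, with columns `a`, `2*a`, `b` and `a + b`, the routine keeps `2*a` plus one column carrying the `b` direction and then stops, because every other column lies in the span of those two. Whether genes are centered or scaled beforehand is not stated in the abstract, so the sketch uses raw columns; for thousands of genes, maintaining an orthonormal basis incrementally (for example by Gram-Schmidt) would avoid re-solving a least-squares problem for every candidate.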

Notes

Golub = Golub et al., 1999; Weston (2001) = Weston et al., 2001; Guyon = Guyon et al., 2002; Zhu = Zhu et al., 2004; N/A = not available.

Weston (2001) = Weston et al., 2001; Guyon = Guyon et al., 2002; Weston (2003) = Weston et al., 2003.

Round 1 = select genes from the original data set; Round 2 = select genes from the remaining genes of Round 1; Round 3 = select genes from the remaining genes of Round 2; Round 4 = select genes from the remaining genes of Round 3.
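The round structure defined in this note can be sketched as a loop that selects genes, removes them, and selects again from what remains. The sketch accepts any selector (a callable that returns column indices of the matrix it is given) and maps those indices back to the original gene indices, so the rounds are disjoint by construction. `top_variance_selector` is a hypothetical placeholder used only so the sketch runs; it is not the weight score, the 1-norm SVM, or incremental forward feature selection, and the per-round budget `k` is likewise an assumption.

```python
import numpy as np
from typing import Callable, List

# A selector receives an (observations x genes) matrix and a budget k and
# returns column indices into that matrix.
Selector = Callable[[np.ndarray, int], List[int]]


def select_in_rounds(X: np.ndarray, k: int, n_rounds: int,
                     selector: Selector) -> List[List[int]]:
    """Select genes, remove them, and repeat on the remaining genes.

    Returns one list of original column indices per completed round.
    """
    remaining = list(range(X.shape[1]))  # original indices still available
    rounds: List[List[int]] = []
    for _ in range(n_rounds):
        if not remaining:  # every gene has already been used
            break
        local = selector(X[:, remaining], k)    # indices into the submatrix
        chosen = [remaining[j] for j in local]  # map back to original indices
        rounds.append(chosen)
        chosen_set = set(chosen)
        remaining = [j for j in remaining if j not in chosen_set]
    return rounds


def top_variance_selector(X_sub: np.ndarray, k: int) -> List[int]:
    """Hypothetical placeholder selector: the k columns with largest variance."""
    order = np.argsort(X_sub.var(axis=0))[::-1]
    return [int(j) for j in order[:k]]
```

Swapping in a different selection scheme changes only the `selector` argument; the bookkeeping that removes each round's genes before the next round stays the same, and classification would then be run separately on each round's subset.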
