Methodology

Feature selection and survival modeling in The Cancer Genome Atlas

Pages 57-62 | Published online: 16 Sep 2013
 

Abstract

Purpose

Personalized medicine is predicated on identifying subgroups of a common disease so that each can be treated more effectively. Identifying biomarkers that predict disease subtypes has been a major focus of biomedical science. In the era of genome-wide profiling, the optimal number of genes to supply as input to a feature selection algorithm for survival modeling remains controversial.

Patients and methods

The expression profiles and outcomes of 544 patients were retrieved from The Cancer Genome Atlas. We compared four survival prediction methods: (1) a 1-nearest neighbor (1-NN) survival prediction method; (2) a random patient selection method; (3) a Cox-based regression method with nested cross-validation and least absolute shrinkage and selection operator (LASSO) optimization using whole-genome gene expression profiles; and (4) the same Cox-based method using gene expression profiles of cancer pathway genes only.
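
As an illustration of the two simplest methods, the following is a minimal sketch and not the authors' implementation. It assumes Euclidean distance between expression vectors, leave-one-out prediction, use of patients with observed (uncensored) survival times only, and mean absolute error as the comparison metric. All function names and the toy data are hypothetical.

```python
import math
import random

def one_nn_predict(profiles, times):
    """Leave-one-out 1-NN: predict each patient's survival time as the
    observed time of the most similar other patient (Euclidean distance)."""
    preds = []
    for i, p in enumerate(profiles):
        nearest = min(
            (j for j in range(len(profiles)) if j != i),
            key=lambda j: math.dist(p, profiles[j]),
        )
        preds.append(times[nearest])
    return preds

def random_patient_predict(times, seed=0):
    """Baseline: predict each patient's time from a randomly chosen other patient."""
    rng = random.Random(seed)
    n = len(times)
    return [times[rng.choice([j for j in range(n) if j != i])] for i in range(n)]

def mean_abs_error(preds, times):
    """Average absolute difference between predicted and observed times."""
    return sum(abs(a - b) for a, b in zip(preds, times)) / len(times)

# Toy data (hypothetical): 4 patients, 2 genes, survival time in months.
profiles = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
times = [10.0, 12.0, 40.0, 44.0]

nn_preds = one_nn_predict(profiles, times)
rand_preds = random_patient_predict(times)
print("1-NN error:", mean_abs_error(nn_preds, times))
print("random error:", mean_abs_error(rand_preds, times))
```

Neither function selects features: every gene contributes equally to the distance, which is one reason such a method may need many patients before it performs well on genome-wide data.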

Results

The 1-NN method predicted survival better than the random patient selection method, even though it does not include a feature selection step. The Cox-based regression method with LASSO optimization using whole-genome gene expression data demonstrated higher survival prediction power than the 1-NN method, but was outperformed by the same method when using gene expression profiles of cancer pathway genes alone.
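
The feature selection step in the Cox-based method comes from the L1 (LASSO) penalty, which can shrink many gene coefficients to exactly zero. The sketch below shows only the objective being minimized, under the assumption of no tied event times. It is not the authors' code and not the Coxnet implementation; scaling conventions vary between implementations, and the function name and toy values are hypothetical.

```python
import math

def lasso_cox_objective(beta, X, times, events, lam):
    """Negative Cox partial log-likelihood plus an L1 penalty on beta.

    beta   : list of gene coefficients
    X      : list of expression vectors, one per patient
    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    lam    : penalty strength (larger values zero out more genes)
    """
    # Linear predictor (risk score) for each patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients still under observation at time t_i.
        risk = [eta[j] for j in range(len(X)) if times[j] >= t_i]
        loglik += eta[i] - math.log(sum(math.exp(e) for e in risk))
    penalty = lam * sum(abs(b) for b in beta)
    return -loglik + penalty

# Toy data (hypothetical): 3 patients, 2 genes; the second patient is censored.
X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]

obj_at_zero = lasso_cox_objective([0.0, 0.0], X, times, events, lam=1.0)
print(obj_at_zero)
```

Restricting the input to cancer pathway genes shortens `beta` before any fitting happens, so the penalty has far fewer irrelevant coefficients to suppress.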

Conclusion

The 1-NN survival prediction method may require more patients for better performance, even when omitting censored data. Using preexisting biological knowledge for survival prediction is reasonable as a means to understand the biological system of a cancer, unless the analysis goal is to identify completely unknown genes relevant to cancer biology.

Acknowledgments

This study was funded by a faculty start-up grant in the Division of Informatics, Department of Pathology, University of Alabama at Birmingham (UAB) School of Medicine. We are indebted to Noah Simon at Stanford University for discussions about his R package, Coxnet.

Disclosure

The authors report no conflicts of interest in this work.