Abstract
Combining microarrays with classification methods is a promising approach to supporting clinical management decisions in oncology. This paper systematically benchmarks classification models, each of which combines one feature extraction method with one classification method. We consider four feature extraction methods and five classification methods, which yield 20 classification models. The feature extraction methods are the t-statistic, the non-parametric Wilcoxon statistic, an ad hoc signal-to-noise statistic, and principal component analysis (PCA). The classification methods are Fisher linear discriminant analysis (FLDA), the support vector machine (SVM), the k-nearest-neighbour classifier (kNN), diagonal linear discriminant analysis (DLDA), and diagonal quadratic discriminant analysis (DQDA). Each model is evaluated on 20 randomizations of each of three binary cancer classification problems derived from publicly available datasets. PCA plus FLDA is found to be the best-performing classification model.
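As an illustration of how the 20 classification models arise, the following minimal Python sketch enumerates every pairing of a feature extraction method with a classification method, and shows one possible form of t-statistic gene ranking. The names `MODELS`, `t_statistic`, and `rank_genes` are hypothetical, and the Welch (unequal-variance) form of the t-statistic is an assumption; the exact variant used in the benchmark is not specified here.

```python
from itertools import product
from math import sqrt
from statistics import mean, variance

FEATURE_EXTRACTORS = ["t-statistic", "Wilcoxon", "signal-to-noise", "PCA"]
CLASSIFIERS = ["FLDA", "SVM", "kNN", "DLDA", "DQDA"]

# Each classification model pairs one feature extraction method
# with one classification method: 4 x 5 = 20 models.
MODELS = [f"{fe} + {clf}" for fe, clf in product(FEATURE_EXTRACTORS, CLASSIFIERS)]


def t_statistic(a, b):
    """Welch two-sample t-statistic for one gene (assumed variant).

    `a` and `b` hold the gene's expression values in the two classes.
    Each class needs at least two samples and non-zero total variance.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expr_a, expr_b, top):
    """Return the indices of the `top` genes with the largest |t|.

    `expr_a[i]` and `expr_b[i]` are the expression values of gene i
    in class A and class B respectively.
    """
    scores = [abs(t_statistic(a, b)) for a, b in zip(expr_a, expr_b)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top]
```

Under these assumptions, `rank_genes` would be applied to the training samples only, and the selected genes would then be passed to one of the five classifiers.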
Acknowledgements
We thank all the members of the Medical Image Computing Group of the Institute of Automation, Chinese Academy of Sciences, for their help and support. We are also grateful to the reviewers for their advice. This work was supported by the National Key Basic Research and Development Program (973), Grant No. 2007CB512304.