Research Article

Gaining Confidence on Molecular Classification through Consensus Modeling and Validation

Pages 59-68 | Received 25 Jul 2005, Accepted 02 Nov 2005, Published online: 09 Oct 2008
 

Abstract

Current advances in genomics, proteomics, and metabonomics are expected to yield a constellation of benefits for human health. Classification that applies supervised learning methods to omics data, one of the molecular classification approaches, plays a growing role in clinical applications. However, the utility of a molecular classifier cannot be fully appreciated unless its quality is carefully validated. Clinical omics data are usually noisy, with far more independent variables than subjects and, possibly, a skewed subject distribution. Under these conditions, a consensus approach holds an advantage over a single classifier. The focus of this review is therefore on how to validate a molecular classifier using Decision Forest (DF), a robust consensus approach. We recommend that a molecular classifier be assessed with respect to overall prediction accuracy, prediction confidence, and chance correlation, all of which can be readily achieved in DF. The commonalities and differences between external validation and cross-validation are also discussed to inform the use of these methods in validating a DF classifier. In addition, the advantages of using consensus approaches to identify potential biomarkers are rationalized. Although specific DF examples are used in this review, the rationales and recommendations should be equally applicable to other consensus methods.

ABBREVIATIONS AND GLOSSARIES

DF = Decision Forest

DT = Decision Tree

HC = High confidence

LC = Low confidence

SELDI-TOF MS = Surface-Enhanced Laser Desorption/Ionization Time-Of-Flight Mass Spectrometry

SRBCTs = Small Round Blue Cell Tumors

Subjects and samples = These two terms are used interchangeably for the patients and healthy individuals

Training set = A group of subjects used to develop a classifier

Test set = A set of subjects, not included in the training set, used to challenge a classifier
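
The three assessments recommended in the abstract (overall prediction accuracy, prediction confidence, and chance correlation) can be illustrated with a minimal Python sketch. It is not the authors' DF implementation. Every name in it is hypothetical, and it rests on several assumptions: the consensus is taken as the plain average of per-tree probabilities, a probability of 0.5 or more is read as class 1, the HC/LC split uses an arbitrary margin of 0.2 around 0.5, and chance correlation is estimated by shuffling labels against fixed predictions rather than rebuilding the classifier for each shuffle.

```python
import random


def consensus_probability(tree_probs):
    """Combine per-tree class-1 probabilities by simple averaging (assumed rule)."""
    return sum(tree_probs) / len(tree_probs)


def is_correct(prob, label):
    """A consensus probability of 0.5 or more is read as class 1 (assumption)."""
    return (prob >= 0.5) == (label == 1)


def accuracy(probs, labels):
    """Fraction of subjects whose predicted class matches the label."""
    hits = [is_correct(p, y) for p, y in zip(probs, labels)]
    return sum(hits) / len(hits)


def assess(probs, labels, hc_margin=0.2):
    """Overall accuracy, plus accuracy within the HC and LC groups.

    hc_margin is a hypothetical cutoff: predictions at least this far from
    0.5 count as high confidence (HC), the rest as low confidence (LC).
    """
    hc, lc = [], []
    for p, y in zip(probs, labels):
        group = hc if abs(p - 0.5) >= hc_margin else lc
        group.append(is_correct(p, y))

    def fraction(hits):
        return sum(hits) / len(hits) if hits else None

    return {"overall": fraction(hc + lc), "HC": fraction(hc), "LC": fraction(lc)}


def chance_p_value(probs, labels, n_perm=2000, seed=0):
    """Estimate how often shuffled labels agree with the predictions as well
    as the real labels do.

    Simplification: labels are shuffled against fixed predictions. A fuller
    test would rebuild the classifier on every shuffled label set.
    """
    rng = random.Random(seed)
    observed = accuracy(probs, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if accuracy(probs, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_good + 1) / (n_perm + 1)


# Made-up consensus probabilities and true labels for six test-set subjects.
example_probs = [0.95, 0.05, 0.9, 0.6, 0.45, 0.1]
example_labels = [1, 0, 1, 0, 1, 0]
print(assess(example_probs, example_labels))
print(chance_p_value(example_probs, example_labels))
```

Reporting HC and LC accuracy separately shows whether errors concentrate among the predictions the consensus itself is unsure about, while the permutation estimate indicates whether the observed accuracy could plausibly arise from chance alone.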
