Research Article

Gaining Confidence on Molecular Classification through Consensus Modeling and Validation

Pages 59-68 | Received 25 Jul 2005, Accepted 02 Nov 2005, Published online: 09 Oct 2008
 

Abstract

Current advances in genomics, proteomics, and metabonomics are expected to yield a constellation of benefits for human health. Classification of omics data with supervised learning methods, one approach to molecular classification, is playing a growing role in clinical applications. However, the utility of a molecular classifier cannot be fully appreciated unless its quality is carefully validated. Clinical omics data are usually noisy, with far more independent variables than subjects and, possibly, a skewed subject distribution. Under these conditions, a consensus approach holds an advantage over a single classifier. This review therefore focuses mainly on how to validate a molecular classifier built with Decision Forest (DF), a robust consensus approach. We recommend that a molecular classifier be assessed with respect to overall prediction accuracy, prediction confidence, and chance correlation, all of which can be readily evaluated in DF. The commonalities and differences between external validation and cross-validation are also discussed to guide their use in validating a DF classifier. In addition, the advantages of consensus approaches for identifying potential biomarkers are explained. Although specific DF examples are used in this review, the rationales and recommendations provided should be equally applicable to other consensus methods.
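A minimal sketch of DF-style consensus modeling with HC/LC labeling, in Python using only the standard library. It is an illustration under stated assumptions, not the review's implementation. For brevity, each "tree" is a one-split decision stump. Each new tree may use only variables that no earlier tree has used, which is our reading of how DF keeps its member trees diverse. The consensus probability is the plain average of the tree probabilities. A call whose probability falls inside a 0.3 to 0.7 band is labeled LC. The band limits and the function names (`fit_stump`, `fit_forest`, `predict`) are hypothetical choices.

```python
from statistics import mean

def fit_stump(X, y, features):
    """Hypothetical stand-in for a decision tree: search only `features` for the
    single split with the fewest training errors. Returns
    (feature, threshold, p_left, p_right), where p_* is the fraction of
    class-1 subjects on each side, or None if no split exists."""
    best, best_err = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both sides are non-empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            err = (min(sum(left), len(left) - sum(left))
                   + min(sum(right), len(right) - sum(right)))
            if err < best_err:  # strict: ties keep the earlier feature
                best_err = err
                best = (f, t, sum(left) / len(left), sum(right) / len(right))
    return best

def fit_forest(X, y, n_trees):
    """Assumed DF-style constraint: a variable used by one tree is removed
    from the pool, so the trees are built on disjoint variable sets."""
    available = list(range(len(X[0])))
    trees = []
    for _ in range(n_trees):
        stump = fit_stump(X, y, available)
        if stump is None:  # no usable variables left
            break
        trees.append(stump)
        available.remove(stump[0])
    return trees

def predict(trees, x, lc_band=(0.3, 0.7)):
    """Average the tree probabilities (consensus), then label the call.
    The LC band limits are an illustrative assumption."""
    p = mean(p_l if x[f] <= t else p_r for f, t, p_l, p_r in trees)
    label = 1 if p > 0.5 else 0
    confidence = "LC" if lc_band[0] < p < lc_band[1] else "HC"
    return label, p, confidence
```

For example, on a toy training set where two variables each separate the classes perfectly, `fit_forest(X, y, n_trees=3)` places those two variables in different trees. A new subject whose averaged probability lands between 0.3 and 0.7 is then reported as LC rather than being forced into an HC call.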

ABBREVIATIONS AND GLOSSARIES
DF = Decision Forest
DT = Decision Tree
HC = High confidence
LC = Low confidence
SELDI-TOF MS = Surface-Enhanced Laser Desorption/Ionization Time-of-Flight Mass Spectrometry
SRBCTs = Small Round Blue Cell Tumors
Subjects and samples = These two terms are used interchangeably to refer to patients and healthy individuals
Training set = A group of subjects used to develop a classifier
Test set = A set of subjects, not included in the training set, used to challenge a classifier
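The abstract recommends checking a classifier for chance correlation and contrasts cross-validation with external validation. Both ideas can be illustrated with a label-permutation test wrapped around k-fold cross-validation, built on the training-set and test-set definitions in the glossary. This is a minimal sketch under several assumptions. A nearest-centroid classifier stands in for DF purely to keep the code short. The helper names are hypothetical. Chance correlation is estimated by re-running cross-validation on randomly shuffled class labels, a common check that may differ from the review's exact protocol. External validation would instead train once on the training set and score a separate test set.

```python
import random
from statistics import mean

def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT Decision Forest): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, yi in zip(X, y) if yi == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def cv_accuracy(X, y, k, rng):
    """k-fold cross-validation: each subject is predicted exactly once by a
    model trained without it."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)

def permutation_test(X, y, k=5, n_perm=200, seed=0):
    """Chance-correlation check: repeat cross-validation with shuffled labels and
    return (observed accuracy, empirical p-value). A small p-value suggests the
    observed accuracy is unlikely to arise from chance correlation alone."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, k, rng)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if cv_accuracy(X, y_perm, k, rng) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The `+ 1` in the p-value counts the observed labeling as one of the permutations, so the estimate is never exactly zero. With `n_perm=200`, the smallest reportable value is 1/201.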
