Original Articles

Optimal Strategies for Sequential Validation of Significant Features from High-Dimensional Genomic Data

Pages 447-460 | Published online: 11 Jun 2012
 

Abstract

High-dimensional genomic studies play a key role in identifying critical features that are significantly associated with a phenotypic outcome. The two most important examples are the detection of (1) differentially expressed genes from genome-wide gene expression studies and (2) single-nucleotide polymorphisms (SNPs) from genome-wide association studies. Such experiments are often associated with high noise levels, and the validity of statistical conclusions suffers from small sample sizes relative to the large number of features. The corresponding multiple testing problem calls for the identification of optimal strategies for controlling the numbers of false discoveries and false nondiscoveries. In addition, a frequent validation problem is that features identified as important in one study are often less so in another study. Adjusting for multiple testing in both studies separately increases the risk of missing the crucial features even further. These problems can be addressed by sequential validation strategies, in which only the significant features identified in one study enter as candidates in the next study. The quality of the different studies, for example, in terms of noise levels, may vary considerably. Simulation studies demonstrate that the optimal order for this stepwise procedure is to sort the experimental studies by quality in descending order. The impact of the method used for multiple testing adjustment (Bonferroni-Holm, FDR) was also analyzed. Finally, the sequential validation strategy was applied to three large breast cancer studies with gene expression measurements, confirming the crucial impact of the order of the validation steps in a real-world application.
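
The sequential validation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a p-value per feature has already been computed in each study, it uses the Benjamini-Hochberg procedure as one common way to control the FDR, and the function names (`holm_reject`, `bh_reject`, `sequential_validation`) and the p-values are hypothetical.

```python
from typing import Callable, List, Sequence


def holm_reject(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Bonferroni-Holm step-down procedure; True marks a rejected hypothesis."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank)
        # and stop at the first failure.
        if pvals[i] > alpha / (m - rank):
            break
        reject[i] = True
    return reject


def bh_reject(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure (assumed here as the FDR method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value lies below its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def sequential_validation(
    studies: Sequence[Sequence[float]],
    reject_fn: Callable[[Sequence[float], float], List[bool]],
    alpha: float = 0.05,
) -> List[int]:
    """Pass features through the studies in the given order.

    Each study is a sequence of per-feature p-values (same feature order
    in every study). Only features that are significant in one study are
    tested in the next, so later adjustments act on fewer hypotheses.
    """
    candidates = list(range(len(studies[0])))
    for pvals in studies:
        if not candidates:
            break
        subset = [pvals[i] for i in candidates]
        keep = reject_fn(subset, alpha)
        candidates = [i for i, r in zip(candidates, keep) if r]
    return candidates


# Hypothetical p-values for five features in two studies.
study_a = [0.001, 0.002, 0.2, 0.04, 0.5]
study_b = [0.01, 0.3, 0.001, 0.02, 0.9]
validated = sequential_validation([study_a, study_b], holm_reject)
```

The order of the list passed to `sequential_validation` is the order of the validation steps; the abstract reports that placing higher-quality studies first is optimal, which this sketch leaves to the caller.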

Acknowledgments

Miriam Lohr and Claudia Köllmann contributed equally to this work. The work of Miriam Lohr, Birte Hellwig, and Jörg Rahnenführer has been supported by the German Research Foundation (DFG, grant RA 870/5-1 and RA 870/4-1). Claudia Köllmann and Katja Ickstadt have been supported by the Research Training School on Statistical Modelling of the German Research Foundation. Jörg Rahnenführer and Katja Ickstadt have also been supported by the German Research Foundation within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis,” projects A3 and C4.
