Research Article

Data-driven analysis approach for biomarker discovery using molecular-profiling technologies

Pages 153-172 | Received 08 Nov 2004, Published online: 08 Oct 2008

Abstract

High-throughput molecular-profiling technologies provide rapid, efficient and systematic approaches to the search for biomarkers. Supervised learning algorithms are naturally suited to analysing the large amounts of data that these technologies generate in biomarker discovery efforts. Using two examples, this study demonstrates a data-driven approach to analysing large, complex datasets collected with high-throughput technologies in the context of biomarker discovery. The approach consists of two analytic steps: an initial unsupervised analysis to obtain accurate knowledge of sample clustering, followed by a supervised analysis to identify a small set of putative biomarkers for further experimental characterization. A comparison of the most widely applied clustering algorithms on a leukaemia DNA microarray dataset established that principal component analysis (PCA)-assisted projection of samples from a high-dimensional molecular feature space into a few low-dimensional subspaces provides a more effective and accurate way to visually explore and identify data structures that confirm the intended experimental effects, based on expected group membership. A supervised method, the shrunken centroid algorithm, was chosen to take the knowledge of sample clustering gained or confirmed in the first step and identify a small set of molecules as candidate biomarkers for further experimentation. The approach was applied to two molecular-profiling studies. In the first study, PCA-assisted analysis of DNA microarray data revealed discrete data structures in rat liver gene expression that correlated with blood clinical chemistry and liver pathological damage in response to the chemical toxicant diethylhexylphthalate, a peroxisome proliferator-activated receptor (PPAR) agonist. The shrunken centroid algorithm then identified sixteen genes as the best candidate biomarkers for liver damage.
Functional annotations of these genes revealed roles in the acute phase response and in lipid and fatty acid metabolism, functions that are relevant to the observed toxicities. In the second study, 26 urine ions identified from GC/MS spectra, two of which were glucose fragment ions included as positive controls, showed robust changes with the development of diabetes in Zucker diabetic fatty rats. Further experiments are needed to define their chemical identities and to establish their functional relevance to disease development.
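The two-step approach summarised above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (20 hypothetical samples in two groups, 50 genes, of which only the first 10 carry an artificial shift), PCA is computed from the singular value decomposition of the column-centred data, and the supervised step follows the standard nearest-shrunken-centroid formulation (as used in PAM), with the per-gene offset s0 taken as the median pooled standard deviation and the class factor m_k = sqrt(1/n_k - 1/n). The function names `pca_scores`, `fit_shrunken_centroids` and `predict`, and the threshold of 3.0, are hypothetical choices made for this example.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples (rows of X) onto the leading principal components."""
    Xc = X - X.mean(axis=0)                        # centre each gene (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]  # PC scores = U * S


def fit_shrunken_centroids(X, y, threshold):
    """Nearest-shrunken-centroid fit; returns shrunken centroids and kept genes."""
    classes = np.unique(y)
    n = X.shape[0]
    n_k = np.array([(y == k).sum() for k in classes])
    overall = X.mean(axis=0)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])  # K x p

    # Pooled within-class standard deviation of each gene.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - len(classes)))
    s0 = np.median(s)  # assumed offset that damps genes with very small variance

    # Standardised distance d_ik of each class centroid from the overall centroid.
    m = np.sqrt(1.0 / n_k - 1.0 / n)[:, None]      # K x 1
    scale = m * (s + s0)                           # K x p
    d = (centroids - overall) / scale

    # Soft-threshold d; genes shrunk to zero in every class drop out.
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)
    return {
        "classes": classes,
        "centroids": overall + scale * d_shrunk,
        "weights": (s + s0) ** 2,
        "log_priors": np.log(n_k / n),
        "selected": np.flatnonzero(np.any(d_shrunk != 0, axis=0)),
    }


def predict(model, X):
    """Assign each sample to the class with the smallest discriminant score."""
    diff = X[:, None, :] - model["centroids"][None, :, :]  # n x K x p
    scores = (diff ** 2 / model["weights"]).sum(axis=2) - 2 * model["log_priors"]
    return model["classes"][np.argmin(scores, axis=1)]


# Hypothetical synthetic data: 10 control and 10 treated samples, 50 genes;
# only genes 0-9 respond to "treatment" (a mean shift of 5 noise SDs).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 50))
X[y == 1, :10] += 5.0

# Step 1 (unsupervised): check that the groups separate on the first PCs.
scores = pca_scores(X)
print("PC1 group means:", scores[y == 0, 0].mean(), scores[y == 1, 0].mean())

# Step 2 (supervised): shrunken centroids keep a small candidate gene set.
model = fit_shrunken_centroids(X, y, threshold=3.0)
print("candidate genes:", model["selected"])
print("training accuracy:", (predict(model, X) == y).mean())
```

Raising `threshold` pulls more class centroids onto the overall centroid, so fewer genes survive as candidates; in practice the threshold would typically be chosen by cross-validation rather than fixed as it is here.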

Acknowledgements

The authors thank Dr Seppo J. Karrila for critical comments.
