Identifying High-Dimensional Biomarkers for Personalized Medicine via Variable Importance Ranking

Songjoon Baek Division of Personalized Nutrition and Medicine–Biometry Branch, National Center for Toxicological Research, FDA, Jefferson, Arkansas, USA

Hojin Moon Department of Mathematics and Statistics, California State University, Long Beach, California, USA (corresponding author)

Hongshik Ahn Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA

Ralph L. Kodell Department of Biostatistics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA

Chien-Ju Lin Division of Personalized Nutrition and Medicine–Biometry Branch, National Center for Toxicological Research, FDA, Jefferson, Arkansas, USA

James J. Chen Division of Personalized Nutrition and Medicine–Biometry Branch, National Center for Toxicological Research, FDA, Jefferson, Arkansas, USA

Abstract

We apply robust classification algorithms to high-dimensional genomic data and analyze variable importance to find biomarkers that enable better diagnosis of disease, earlier intervention, or more effective assignment of therapies. The goal is to use variable importance ranking to isolate a set of important genes that can classify life-threatening diseases with respect to prognosis or type, so as to maximize efficacy or minimize toxicity in personalized treatment of such diseases. We propose a ranking method, present several other methods for selecting a set of important genes to use as genomic biomarkers, and evaluate the performance of the selection procedures in patient classification by cross-validation. The various selection algorithms are applied to published high-dimensional genomic data sets using several well-known classification methods. For each data set, we report the set of genes, selected on the basis of variable importance, that performed best in classification. The classification algorithm with the proposed ranking method is shown to be competitive with other selection methods for discovering genomic biomarkers underlying both adverse and efficacious outcomes, with the aim of improving individualized treatment of patients with life-threatening diseases.
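As a rough illustration of ranking genes by variable importance inside cross-validation, the sketch below scores each gene with a two-sample t-statistic (the t-test is one of the selection methods named in the notes) and keeps the top k genes on each training split. It is not the authors' proposed ranking method. The function names, the Welch form of the statistic, the toy data, and the index-modulo fold assignment are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample Welch t-statistic for one gene's expression values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_genes(samples, labels):
    """Return gene indices ordered from most to least important by |t|."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        pos = [s[g] for s, y in zip(samples, labels) if y == 1]
        neg = [s[g] for s, y in zip(samples, labels) if y == 0]
        scores.append(abs(welch_t(pos, neg)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def top_k_per_fold(samples, labels, k, n_folds):
    """Select the top-k genes using only the training part of each fold."""
    selections = []
    for fold in range(n_folds):
        # Hypothetical fold assignment: sample i is held out when i % n_folds == fold.
        train = [i for i in range(len(samples)) if i % n_folds != fold]
        ranked = rank_genes([samples[i] for i in train],
                            [labels[i] for i in train])
        selections.append(ranked[:k])
    return selections


# Toy data (made up): 8 patients x 3 genes; only gene 0 separates the classes.
samples = [
    [5.1, 1.0, 2.0], [4.9, 2.0, 1.0], [5.2, 1.5, 2.5], [5.0, 2.5, 1.5],
    [1.0, 1.2, 2.1], [1.2, 2.1, 1.1], [0.9, 1.4, 2.4], [1.1, 2.4, 1.4],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(rank_genes(samples, labels))
print(top_k_per_fold(samples, labels, k=1, n_folds=4))
```

Ranking within each training split, rather than once on all patients, keeps the held-out samples from influencing which genes are selected, so the cross-validated classification accuracy is not optimistically biased.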

ACKNOWLEDGMENTS

Hojin Moon's research was partially supported by the Scholarly and Creative Activities Committee (SCAC) Award from CSULB. Hongshik Ahn's research was partially supported by the Faculty Research Participation Program at the NCTR administered by the Oak Ridge Institute for Science and Education through an interagency agreement between USDOE and USFDA.

Notes

¹ k∗ = 1 with CERPWFM, CERPMDI, RFMDA, RFMDI, SVMRFE, BW; k∗ = 3 with the t-test. Values in boldface indicate lymphoma data.

¹ k∗ = 1 with SVMRFE; k∗ = 3 with CERPWFM, CERPMDI, RFMDA, RFMDI; k∗ = 5 with BW, the t-test.

Values in boldface indicate pediatric AML data.

Note: Since the genes selected by the t-test and by the BW ratio are the same, only the t-test is reported. The DLDA classification algorithm is used for illustration. PPV and NPV stand for positive and negative predictive values, respectively.

T = set of genes selected by the t-test; C = set of genes selected by CERP; T ∩ C = common set of genes selected by both the t-test and CERP; T ∪ C = combined set of genes selected by the t-test or CERP; (T − C) ∪ (C − T) = combined mutually exclusive set of genes selected by only one of the t-test or CERP.
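The set notation above maps directly onto Python's built-in set operations. The short sketch below uses made-up gene identifiers (`G1` to `G4` are placeholders, not genes from the study) and a hypothetical helper name.

```python
def compare_selections(t_genes, c_genes):
    """Return the three derived gene sets defined in the note above."""
    return {
        "common": t_genes & c_genes,                             # T ∩ C
        "combined": t_genes | c_genes,                           # T ∪ C
        "exclusive": (t_genes - c_genes) | (c_genes - t_genes),  # (T − C) ∪ (C − T)
    }


# Placeholder gene identifiers for illustration only.
T = {"G1", "G2", "G3"}
C = {"G2", "G3", "G4"}

result = compare_selections(T, C)
for name in ("common", "combined", "exclusive"):
    print(name, sorted(result[name]))
```

The mutually exclusive set is the symmetric difference of T and C, which Python also exposes as `T ^ C`.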
