Abstract
To provide accurate, robust, and interpretable predictions of bioactivity, a modified uncorrelated linear discriminant analysis (M-ULDA) model was developed. In addition, recursive feature elimination (RFE), a feature selection method originally developed for support vector machines (SVM), was introduced and modified to fit the ULDA scheme. In evaluations on six pharmaceutical datasets, M-ULDA coupled with RFE achieved classification accuracy better than or comparable to that of other well-studied methods such as SVM and decision trees. Applying RFE to ULDA markedly reduces the number of variables in the final model. This increases computational speed and provides useful insights into the biochemical mechanisms related to pharmaceutical activity.
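The following is a minimal sketch of the general RFE idea transplanted from SVM to a linear discriminant. It is not the authors' M-ULDA or their modified RFE. It substitutes a plain two-class Fisher LDA direction, w = S_w^{-1}(mu1 - mu0), for the ULDA discriminant vectors. Features are ranked with the SVM-RFE criterion (smallest squared weight is removed first). The helper names `fisher_weights` and `rfe_lda`, the `ridge` term, and the synthetic data are hypothetical illustrations.

```python
import numpy as np

def fisher_weights(X, y, ridge=1e-6):
    """Two-class Fisher LDA direction (stand-in for ULDA, not the paper's model)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix S_w
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw += ridge * np.eye(X.shape[1])  # small ridge keeps S_w invertible
    return np.linalg.solve(Sw, mu1 - mu0)

def rfe_lda(X, y, n_keep):
    """Drop one feature per round, refitting the discriminant each time."""
    remaining = list(range(X.shape[1]))  # original column indices still in play
    eliminated = []
    while len(remaining) > n_keep:
        w = fisher_weights(X[:, remaining], y)
        worst = int(np.argmin(w ** 2))  # SVM-RFE style ranking criterion
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated

# Hypothetical synthetic data: columns 0 and 1 carry class signal, 2-5 are noise
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 6))
X[:, 0] += 2.0 * y
X[:, 1] -= 1.5 * y
# Standardize so that weight magnitudes are comparable across features
X = (X - X.mean(axis=0)) / X.std(axis=0)

kept, dropped = rfe_lda(X, y, n_keep=2)
print(sorted(kept), dropped)
```

Refitting after every single elimination mirrors the recursive character of SVM-RFE. Removing several features per round would be a cheaper variant when the number of descriptors is large.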
Acknowledgements
This work was financially supported by the National Nature Foundation Committee of the People's Republic of China (Grant No. 20875104) and the International Cooperation Project on Traditional Chinese Medicines of the Ministry of Science and Technology of China (Grant Nos. 2006DFA41090 and 2007DFA40680). The studies were approved by the university's review board. We are grateful to all staff of our institute for their encouragement and support of this research.