Original Articles

ENSEMBLING REGRESSION MODELS TO IMPROVE THEIR PREDICTIVITY: A CASE STUDY IN QSAR (QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIPS) WITH COMPUTATIONAL CHEMOMETRICS

Pages 261-281 | Published online: 12 Mar 2009
 

Abstract

The last several years have seen an increasing emphasis on mathematical models, based on both statistics and machine learning. Today Bayesian nets, neural nets, support vector machines (SVM), and induction trees are commonly used in the analysis of scientific data. Moreover, a recent emphasis in the modelling community is on improving the performance of classifiers by ensembling several diverse and accurate models in order to reduce the prediction error. Ensembling is, in fact, a way of taking advantage of good models that make errors in different parts of the data space. We outline the developments in model construction and evaluation through those techniques, justify their use, and propose some quantitative structure activity relationship (QSAR) models based on ensembling. The models presented here predict acute toxicity for the purposes of regulatory systems. The emphasis is on the better performance of ensembles, since the general goal of delivering usable QSAR models requires other considerations that are outside the scope of this article.

We gratefully acknowledge the EU projects ION, for providing financial support, and Demetra, for supporting Tushar Garg during his internship in Milan. Special thanks go to the group at Istituto Mario Negri, Milano, and to BCX (France) for preparing the data set.

Notes

The test set is composed of compounds with the following IDs: 32, 46, 48, 54, 75, 176, 194, 216, 277, 282, 347, 361, 385, 391, 423, 425, 431, 434, 439.

The subdivision of the data into training and test sets is based on toxicity values only, and was done as follows to obtain similarly distributed data sets:

1. Sort the toxicity values y = −log(Toxicity [mmol/kg]).

2. Select one out of every six compounds in the sorted toxicity list for the test set.
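
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the function name `split_by_sorted_toxicity` and the parameters `step` and `offset` are hypothetical, and the note does not say which position within each group of six was chosen or how ties were handled, so `offset=0` (the first compound of each group) is an arbitrary choice.

```python
from typing import List, Sequence, Tuple


def split_by_sorted_toxicity(
    y: Sequence[float], step: int = 6, offset: int = 0
) -> Tuple[List[int], List[int]]:
    """Split compound indices into (train, test) by rank of toxicity value.

    y      : toxicity values, e.g. y = -log(Toxicity [mmol/kg])
    step   : one compound out of every `step` goes to the test set
    offset : which rank inside each group of `step` is taken (assumption;
             the source does not specify this)
    """
    if not 0 <= offset < step:
        raise ValueError("offset must satisfy 0 <= offset < step")

    # Step 1: rank the compounds by toxicity value (ascending).
    order = sorted(range(len(y)), key=lambda i: y[i])

    # Step 2: take one of every `step` ranked compounds for the test set.
    train: List[int] = []
    test: List[int] = []
    for rank, idx in enumerate(order):
        (test if rank % step == offset else train).append(idx)
    return train, test


# Made-up toxicity values, for illustration only.
y_demo = [0.5, 2.0, 1.0, 3.5, 3.0, 1.5, 2.5, 4.0, 4.5, 5.0, 5.5, 6.0]
train_idx, test_idx = split_by_sorted_toxicity(y_demo)
print("test indices:", test_idx)
print("train indices:", train_idx)
```

Because the test compounds are taken at regular intervals along the sorted list, the test set spans the same range of toxicity values as the training set, which is the "similarly distributed" property the note aims for.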

www.demetra-tox.net

www.cs.waikato.ac.nz/ml/weka

www.vcclab.org
