Original Articles

Prediction of chemical toxicity with local support vector regression and activity-specific kernels

Pages 413-431 | Received 28 Apr 2008, Accepted 21 Jul 2008, Published online: 04 Dec 2010
 

Abstract

We propose a new kernel, based on 2-D structural chemical similarity, that integrates activity-specific information from the training data, and a new approach to applicability domain estimation that takes feature significances and activity distributions into account. The new kernel yields better results than the well-established Tanimoto kernel, and activity-sensitive feature selection improves prediction quality. Local support vector regression models based on this kernel have been validated with three publicly available datasets from the DSSTox project. One of them (Fathead Minnow Acute Toxicity) has already been modelled by other groups and serves as a benchmark dataset; to the best of the authors' knowledge, the other two (Maximum Recommended Therapeutic Dose, IRIS Lifetime Cancer Risk) are modelled here for the first time. For all three models, predictive accuracy increases with prediction confidence, which serves as an indicator of the applicability domain. Depending on the confidence cutoff for acceptable predictions, we achieved more than 90% of predictions within 1 log unit of the experimental values for all datasets.
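The contrast between the plain Tanimoto kernel and an activity-specific variant can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: compounds are represented as sets of 2-D fragment identifiers (strings here), each fragment carries a hypothetical significance weight in [0, 1] derived from the training activities, fragments below a significance threshold are dropped, and fragments without a weight count as 0, as note 4 states for unknown fragments. The weighted similarity is the generalized Tanimoto ratio: summed weights of shared fragments over summed weights of all fragments present in either compound. The names `tanimoto`, `select_significant` and `weighted_tanimoto` are hypothetical.

```python
from typing import Dict, Mapping, Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Plain Tanimoto similarity |A & B| / |A | B| over fragment sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_significant(weights: Mapping[str, float], threshold: float) -> Dict[str, float]:
    """Keep only fragments whose (hypothetical) significance reaches the threshold."""
    return {f: w for f, w in weights.items() if w >= threshold}


def weighted_tanimoto(a: Set[str], b: Set[str], weights: Mapping[str, float]) -> float:
    """Significance-weighted Tanimoto similarity (illustrative sketch).

    Shared fragments contribute their weight to the numerator; every fragment
    present in either compound contributes to the denominator. Fragments absent
    from `weights` (unknown or filtered out) are weighted with 0.
    """
    shared = sum(weights.get(f, 0.0) for f in a & b)
    total = sum(weights.get(f, 0.0) for f in a | b)
    return shared / total if total > 0 else 0.0


if __name__ == "__main__":
    query = {"c1ccccc1", "C=O", "N"}
    neighbour = {"c1ccccc1", "C=O", "Cl"}
    # Hypothetical significances; "Cl" has none, so it is treated as unknown.
    significance = {"c1ccccc1": 0.2, "C=O": 0.9, "N": 0.5}

    print(tanimoto(query, neighbour))                         # 2 / 4 = 0.5
    print(weighted_tanimoto(query, neighbour, significance))  # 1.1 / 1.6
    strong = select_significant(significance, threshold=0.5)  # drops "c1ccccc1"
    print(weighted_tanimoto(query, neighbour, strong))        # 0.9 / 1.4
```

Weighting lets a shared but uninformative fragment (here the benzene ring) count for little, while a highly significant shared fragment dominates the similarity; how the real weights are derived is not specified in this section.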

Acknowledgements

Financial support for this work was provided by Nestec Ltd. We would like to thank P. Mazzatorta (Nestec) for discussions about the FDAMDD dataset, A.M. Richard (U.S. EPA) for providing the DSSTox datasets, and Alexandros Karatzoglou (Vienna University of Technology) for advice and discussion on the kernlab package [13, 28]. We appreciate the work of the free software projects gathered in the Blue Obelisk group [26].

Notes

1. For qualitative activities (classification), the chi-square test can be used.
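For a classification endpoint, a chi-square test can compare how often a fragment occurs in active versus inactive training compounds. A minimal sketch, assuming a 2×2 contingency table of fragment presence against class and Pearson's statistic without continuity correction; the function name and table layout are illustrative, not taken from the paper. With one degree of freedom the upper-tail p-value equals erfc(√(χ²/2)), so the Python standard library suffices.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test for a 2x2 table (1 degree of freedom).

    Table layout (counts of training compounds):
                           active  inactive
        fragment present     a        b
        fragment absent      c        d

    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if n == 0 or 0 in rows or 0 in cols:
        # Degenerate table: an expected count would be zero, so no test is possible.
        return 0.0, 1.0
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Fragment present in 30 of 50 actives but only 10 of 50 inactives.
    stat, p = chi_square_2x2(30, 10, 20, 40)
    print(f"chi2 = {stat:.3f}, p = {p:.2e}")  # statistic is 50/3, about 16.667
```

A small p-value marks a fragment whose occurrence is associated with the class; how such p-values are turned into feature weights is not stated in this note.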

2. The sample size for the test was not restricted.

3. Activity is commonly measured in quantities needed to obtain a toxicological effect, so higher values mean lower activity.

4. Unknown fragments are also weighted with 0.

5. The median is used rather than the mean to reduce sensitivity to possible skew in the data.
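The reasoning behind this choice is easy to check numerically: a single extreme value shifts the mean considerably but barely moves the median. The activity values below are invented for illustration (log units), and this note does not say which set of activities the median is applied to.

```python
from statistics import mean, median
from typing import Sequence, Tuple


def central_values(activities: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, median) of a list of activity values."""
    return mean(activities), median(activities)


if __name__ == "__main__":
    typical = [1.2, 1.4, 1.5, 1.6, 1.7, 1.8]
    skewed = [1.2, 1.4, 1.5, 1.6, 1.7, 6.0]   # one extreme value replaces 1.8
    print(central_values(typical))  # mean about 1.53, median 1.55
    print(central_values(skewed))   # mean about 2.23, median still 1.55
```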

6. The rmse of 0.92 and the corresponding q² value at significance threshold 0.5 for the significance-weighted kernel result from a single severely mispredicted data point that does not occur at the other threshold values. Leaving this point out gives an rmse of 0.74.
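The evaluation quantities mentioned in the abstract and in this note can be sketched as follows. Assumptions: q² is taken here as 1 − Σ(obs − pred)² / Σ(obs − mean(obs))² over cross-validated predictions (other definitions exist), "within 1 log unit" means |pred − obs| ≤ 1 on a log-scale activity, and a confidence cutoff simply discards predictions below a chosen confidence. All numbers are made up; they only show how one badly mispredicted point inflates the rmse and how dropping it (here through a low hypothetical confidence) changes the picture. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def q2(pred: Sequence[float], obs: Sequence[float]) -> float:
    """1 - residual sum of squares / total sum of squares around mean(obs)."""
    m = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def fraction_within(pred: Sequence[float], obs: Sequence[float], tol: float = 1.0) -> float:
    """Share of predictions within `tol` log units of the observed value."""
    return sum(abs(p - o) <= tol for p, o in zip(pred, obs)) / len(obs)


def above_cutoff(pred: Sequence[float], obs: Sequence[float],
                 conf: Sequence[float], cutoff: float) -> Tuple[List[float], List[float]]:
    """Keep only predictions whose confidence reaches `cutoff`."""
    kept = [(p, o) for p, o, c in zip(pred, obs, conf) if c >= cutoff]
    return [p for p, _ in kept], [o for _, o in kept]


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    pred = [1.2, 1.8, 3.1, 3.9, 1.0]   # last compound is badly mispredicted
    conf = [0.9, 0.8, 0.9, 0.7, 0.2]   # hypothetical prediction confidences

    # rmse about 1.79, q2 about -0.61, fraction 0.8
    print(rmse(pred, obs), q2(pred, obs), fraction_within(pred, obs))
    p_ok, o_ok = above_cutoff(pred, obs, conf, cutoff=0.5)
    # rmse about 0.16, q2 about 0.98, fraction 1.0
    print(rmse(p_ok, o_ok), q2(p_ok, o_ok), fraction_within(p_ok, o_ok))
```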
