Original Articles

Prediction of chemical toxicity with local support vector regression and activity-specific kernels

Pages 413-431 | Received 28 Apr 2008, Accepted 21 Jul 2008, Published online: 04 Dec 2010
 

Abstract

We propose a new kernel, based on 2-D structural chemical similarity, that integrates activity-specific information from the training data, and a new approach to applicability domain estimation that takes feature significances and activity distributions into consideration. The new kernel gives better results than the well-established Tanimoto kernel, and activity-sensitive feature selection enhances prediction quality. Local support vector regression models based on this kernel have been validated with three publicly available datasets from the DSSTox project. One of them (Fathead Minnow Acute Toxicity) has already been modelled by other groups and serves as a benchmark dataset; the other two (Maximum Recommended Therapeutic Dose, IRIS Lifetime Cancer Risk) have, to the authors' knowledge, been modelled for the first time. For all three models, predictive accuracies increase with the prediction confidences that indicate the applicability domain. Depending on the confidence cutoff for acceptable predictions, we achieved more than 90% of predictions within 1 log unit of the experimental data for all datasets.
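To make the quantities named in the abstract concrete, the following minimal sketch shows the standard Tanimoto similarity on sets of structural fragments, one possible way of weighting fragments by a per-fragment significance, and the "within 1 log unit" accuracy measure. The weighted form is an illustrative assumption and not necessarily the kernel defined in the article; the function names, fragment strings, and weights are hypothetical.

```python
from typing import Dict, Sequence, Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Standard Tanimoto similarity: shared fragments / all fragments."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def weighted_tanimoto(a: Set[str], b: Set[str],
                      weights: Dict[str, float]) -> float:
    """Tanimoto-style similarity with per-fragment weights.

    Assumption for illustration: each fragment contributes its weight
    instead of 1, and fragments without a known weight contribute 0.
    """
    def total(frags: Set[str]) -> float:
        return sum(weights.get(f, 0.0) for f in frags)

    denom = total(a | b)
    if denom == 0.0:
        return 0.0
    return total(a & b) / denom


def fraction_within_log_unit(predicted: Sequence[float],
                             observed: Sequence[float],
                             tol: float = 1.0) -> float:
    """Share of predictions within `tol` of the measured value.

    Both sequences are assumed to be on a log10 scale already.
    """
    hits = sum(abs(p - o) <= tol for p, o in zip(predicted, observed))
    return hits / len(predicted)


# Hypothetical fragments and significance weights.
mol_a = {"C", "N", "O"}
mol_b = {"C", "O", "S"}
significance = {"C": 0.5, "N": 0.5, "O": 1.0}  # "S" is unknown, weight 0

plain = tanimoto(mol_a, mol_b)
weighted = weighted_tanimoto(mol_a, mol_b, significance)
accuracy = fraction_within_log_unit([1.0, 2.5, 4.0], [1.5, 4.0, 4.2])
```

In this toy case the unweighted similarity counts every fragment equally, whereas the weighted variant lets the highly weighted shared fragment dominate and ignores the fragment with no known weight.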

Acknowledgements

Financial support for this work was provided by Nestec Ltd. We would like to thank P. Mazzatorta (Nestec) for discussions about the FDAMDD dataset, A.M. Richard (U.S. EPA) for providing the DSSTox datasets, and Alexandros Karatzoglou (Vienna University of Technology) for advice and discussion on the kernlab package [13, 28]. We appreciate the work of the free software projects gathered in the Blue Obelisk group [26].

Notes

1. For qualitative activities (classification), the chi-square test can be used.

2. The sample size for the test was not restricted.

3. Activity is commonly measured in quantities needed to obtain a toxicological effect, so higher values mean lower activity.

4. Unknown fragments are also weighted with 0.

5. The median is used rather than the mean to reduce the sensitivity to possible skew in the data.

6. The rmse and q² value of 0.92 at significance threshold 0.5 for the significance-weighted kernel results from a single severely mispredicted data point that did not occur for the other values. Leaving it out gives a value of 0.74.
