Research Article

Optimal selection of learning data for highly accurate QSAR prediction of chemical biodegradability: a machine learning-based approach

Pages 729-743 | Received 13 Jun 2023, Accepted 19 Aug 2023, Published online: 07 Sep 2023

