
A novel approach to generate robust classification models to predict developmental toxicity from imbalanced datasets

Pages 711-727 | Received 06 Feb 2014, Accepted 29 Apr 2014, Published online: 07 Aug 2014

References

  • UN Economic Commission for Europe, Globally Harmonized System of Classification and Labelling of Chemicals (GHS), 3rd revised edition, United Nations, New York, 2009.
  • Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC, Off. J. Eur. Union L396 (2007), pp. 1–849.
  • US Environmental Protection Agency, Laws & Regulations, http://www2.epa.gov/laws-regulations.
  • ReProTect, http://www.reprotect.eu/.
  • Council Regulation (EEC) No 793/93 of 23 March 1993 on the evaluation and control of the risks of existing substances, Off. J. Eur. Commun. L84 (1993), pp. 1–15.
  • M. Panigel, Placental perfusion experiments, Am. J. Obstet. Gynecol. 84 (1962), pp. 1664–1683.
  • T.I. Ala-Kokko, P. Myllynen, and K. Vahakangas, Ex vivo perfusion of the human placental cotyledon: Implications for anesthetic pharmacology, Int. J. Obstet. Anesth. 9 (2000), pp. 26–38. doi:10.1054/ijoa.1999.0312
  • P. Pienimaki, A.L. Hartikainen, P. Arvela, T. Partanen, R. Herva, O. Pelkonen, and K. Vahakangas, Carbamazepine and its metabolites in human perfused placenta and in maternal and cord blood, Epilepsia 36 (1995), pp. 241–248.
  • H. Schneider, M. Panigel, and J. Dancis, Transfer across the perfused human placenta of antipyrine, sodium and leucine, Am. J. Obstet. Gynecol. 114 (1972), pp. 822–828.
  • AltTox.org, Reproductive & Developmental Toxicity: The Way Forward, http://www.alttox.org/ttrc/toxicity-tests/repro-dev-tox/way-forward/.
  • H. Spielmann, Predicting the risk of developmental toxicity from in vitro assays, Toxicol. Appl. Pharmacol. 207 (2005), pp. 375–380. doi:10.1016/j.taap.2005.01.049
  • A. Cassano, A. Manganaro, T. Martin, D. Young, N. Piclin, M. Pintore, D. Bigoni, and E. Benfenati, CAESAR models for developmental toxicity, Chem. Central J. 4 (Suppl. 1) (2010), S4, pp. 1–11.
  • J. Devillers, A. Chezeau, and E. Thybaud, PLS-QSAR of the adult and developmental toxicity of chemicals to Hydra attenuata, SAR QSAR Environ. Res. 13 (2002), pp. 705–712. doi:10.1080/1062936021000043445
  • V.C. Arena, N.B. Sussman, S. Mazumdar, S. Yu, and O.T. Macina, The utility of structure-activity relationship (SAR) models for prediction and covariate selection in developmental toxicity: Comparative analysis of logistic regression and decision tree models, SAR QSAR Environ. Res. 15 (2004), pp. 1–18. doi:10.1080/1062936032000169633
  • N.B. Sussman, V.C. Arena, S. Yu, S. Mazumdar, and B.P. Thampatty, Decision tree SAR models for developmental toxicity based on an FDA/TERIS database, SAR QSAR Environ. Res. 14 (2003), pp. 83–96. doi:10.1080/1062936031000073126
  • DEREK, http://www.lhasalimited.org/products/derek-nexus.htm.
  • HazardExpert, http://www.compudrug.com/hazardexpertpro.
  • OncoLogic, http://www.epa.gov/oppt/sf/pubs/oncologic.htm.
  • ToxMatch, http://ihcp.jrc.ec.europa.eu/our_labs/predictive_toxicology/qsar_tools/toxmatch.
  • M. Hewitt, C.M. Ellison, S.J. Enoch, J.C. Madden, and M.T.D. Cronin, Integrating (Q)SAR models, expert systems and read-across approaches for the prediction of developmental toxicity, Reprod. Toxicol. 30 (2010), pp. 147–160. doi:10.1016/j.reprotox.2009.12.003
  • E. Julien, C.C. Willhite, A.M. Richard, and J.M. DeSesso, Challenges in constructing statistically based structure-activity relationship models for developmental toxicity, Birth Defects Res. (Part A) 70 (2004), pp. 902–911.
  • R.J. Kavlock, Structure-activity approaches in the screening of environmental agents for developmental toxicity, Reprod. Toxicol. 7 (1993), pp. S113–S116. doi:10.1016/0890-6238(93)90076-J
  • J.M. DeSesso and S.B. Harris, Principles underlying developmental toxicology, in Toxicology and Risk Assessment, A.M. Fan and L.W. Chang, eds., Marcel Dekker, New York, 1996, pp. 37–56.
  • M. Novič and M. Vračko, QSAR models for reproductive toxicity and endocrine disruption activity, Molecules 15 (2010), pp. 1987–1999. doi:10.3390/molecules15031987
  • T.W. Schultz and D.A. Dawson, Structure-activity relationships for teratogenicity and developmental toxicity, in Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology, W. Karcher and J. Devillers, eds., Kluwer Academic Publishers, Dordrecht, Netherlands, 1990, pp. 389–409.
  • Developmental Toxicity, http://www.caesar-project.eu/index.php?page=results&section=endpoint&ne=5.
  • N. Japkowicz and S. Stephen, The class imbalance problem: A systematic study, Intell. Data Anal. 6 (2002), pp. 429–449.
  • N. Japkowicz, Class imbalances: Are we focusing on the right issue? In Workshop on Learning from Imbalanced Data Sets II, N. Chawla, N. Japkowicz, and A. Kolcz, eds., Washington DC, 2000, pp. 1–7.
  • N. Japkowicz, C. Mayers, and M. Gluck, A novelty detection approach to classification, in Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, 20–25 August 1995, Morgan Kaufmann, CA, USA, 1995, pp. 518–523.
  • N. Japkowicz, The class imbalance problem: Significance and strategies, in Proceedings of the International Conference on Artificial Intelligence (ICAI) 2000, pp. 111–117.
  • N.V. Chawla, K.W. Bowyer, L.O. Hall, and W.P. Kegelmeyer, SMOTE: Synthetic minority over-sampling technique, J. Artif. Intell. Res. 16 (2002), pp. 321–357.
  • N. Chawla, A. Lazarevic, L.O. Hall, and K.W. Bowyer, SMOTEBoost: Improving prediction of the minority class in boosting, in Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), N. Lavrac, D. Gamberger, H. Blockeel, and L. Todorovski, eds., Springer, Berlin, 2003, pp. 107–119.
  • N. Chawla, N. Japkowicz, and A. Kolcz, Editorial: Special issue on learning from imbalanced datasets, SIGKDD Explor. 6 (2004), pp. 1–6.
  • M. Kubat and S. Matwin, Addressing the curse of imbalanced training sets: One-sided selection, in Proceedings of the 14th International Conference on Machine Learning, D.H. Fisher, ed., Morgan Kaufmann, 1997, pp. 179–186.
  • G.E.A.P.A. Batista, R.C. Prati, and M.C. Monard, A study of the behavior of several methods for balancing machine learning training data, SIGKDD Explor. 6 (2004), pp. 20–29.
  • S. Visa and A. Ralescu, Issues in mining imbalanced datasets: A review paper, in Proceedings of the 16th Midwest Artificial Intelligence and Cognitive Science Conference, Dayton, USA, 2005, pp. 67–73.
  • Z.H. Zhou and X.Y. Liu, Training cost-sensitive neural networks with methods addressing the class imbalance problem, IEEE Trans. Knowl. Data Eng. 18 (2006), pp. 63–77. doi:10.1109/TKDE.2006.17
  • C.X. Ling and C. Li, Decision trees with minimal costs, in Proceedings of the 21st International Conference on Machine Learning, Alberta, Canada, ACM, New York, 2005, pp. 69–76.
  • J. Leskovec and J. Shawe-Taylor, Linear programming boosting for uneven datasets, in Proceedings of the 20th International Conference on Machine Learning, T. Fawcett and N. Mishra, eds., AAAI Press, CA, USA, 2003, pp. 456–463.
  • S. Wang and X. Yao, Diversity analysis on imbalanced data sets by using ensemble models, in Proceedings of IEEE Symposium on Computational Intelligence and Data Mining, IEEE Catalog No. CFD091DM, TN, USA, 2009, pp. 324–331.
  • A. Liu, J. Ghosh, and C. Martin, Generative over-sampling for mining imbalanced datasets, in Proceedings of the International Conference on Data Mining, R. Stahlbock, S.F. Crone, and S. Lessmann, eds., CSREA Press, USA, 2007, pp. 66–72.
  • Y. Tang, Y.-Q. Zhang, N.V. Chawla, and S. Krasser, SVMs modeling for highly imbalanced classification, J. LaTex Class Files 1 (2002), pp. 1–11.
  • H. Han, W.-Y. Wang, and B.-H. Mao, Borderline-SMOTE: A new over-sampling method in imbalanced datasets learning, in Advances in Intelligent Computing, D.S. Huang, X.-P. Zhang, and G.-B. Huang, eds., Springer, Berlin, 2005, pp. 878–887.
  • G.G. Briggs, R.K. Freeman, and S.J. Yaffe, Drugs in Pregnancy and Lactation, 6th edition, Lippincott Williams & Wilkins, Philadelphia, 2002.
  • The Nmitli-biosuite Team, BioSuite: A comprehensive bioinformatics software package (a unique industry–academia collaboration), Curr. Sci. 92 (2007), pp. 29–38.
  • J.H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, MA, 1975.
  • D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.
  • L. Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
  • T.M. Mitchell, Machine Learning, McGraw Hill, New York, 1997.
  • W. Banzhaf, P. Nordin, R.E. Keller, and F.D. Francone, Genetic Programming: An Introduction, Morgan Kaufmann Publishers, San Francisco, 1998. doi:10.1007/BFb0055923
  • V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995. doi:10.1007/978-1-4757-2440-0
  • Y. Xue, Z.R. Li, C.W. Yap, L.Z. Sun, X. Chen, and Y.Z. Chen, Effect of molecular descriptor feature selection in support vector machine classification of pharmacokinetic and toxicological properties of chemical agents, J. Chem. Inf. Comput. Sci. 44 (2004), pp. 1630–1638. doi:10.1021/ci049869h
  • C.X. Xue, R.S. Zhang, M.C. Liu, Z.D. Hu, and B.T. Fan, Study of the quantitative structure-mobility relationship of carboxylic acids in capillary electrophoresis based on support vector machines, J. Chem. Inf. Comput. Sci. 44 (2004), pp. 950–957. doi:10.1021/ci034280o
  • C.X. Xue, R.S. Zhang, H.X. Liu, M.C. Liu, Z.D. Hu, and B.T. Fan, Support vector machines-based quantitative structure-property relationship for the prediction of heat capacity, J. Chem. Inf. Comput. Sci. 44 (2004), pp. 1267–1274. doi:10.1021/ci049934n
  • J. Huang, G. Ma, I. Muhammad, and Y. Cheng, Identifying P-glycoprotein substrates using a support vector machine optimized by a particle swarm, J. Chem. Inf. Model. 47 (2007), pp. 1638–1647. doi:10.1021/ci700083n
  • L.-J. Tang, Y.-P. Zhou, J.-H. Jiang, H.-Y. Zou, H.-L. Wu, G.-L. Shen, and R.-Q. Yu, Radial basis function network-based transform for a nonlinear support vector machine as optimized by a particle swarm optimization algorithm with application to QSAR studies, J. Chem. Inf. Model. 47 (2007), pp. 1438–1445. doi:10.1021/ci700047x
  • M.K. Warmuth, J. Liao, G. Ratsch, M. Mathieson, S. Putta, and C. Lemmen, Active learning with support vector machines in the drug discovery process, J. Chem. Inf. Comput. Sci. 43 (2003), pp. 667–673. doi:10.1021/ci025620t
  • S. Bhavani, A. Nagargadde, A. Thawani, V. Sridhar, and N. Chandra, Substructure-based support vector machine classifiers for prediction of adverse effects in diverse classes of drugs, J. Chem. Inf. Model. 46 (2006), pp. 2478–2486. doi:10.1021/ci060128l
  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Disc. 2 (1998), pp. 127–167.
  • J. Weston, F. Perez-Cruz, O. Bousquet, O. Chapelle, A. Elisseeff, and B. Schölkopf, Feature selection and transduction for prediction of molecular bioactivity for drug design, Bioinformatics 19 (2003), pp. 764–771. doi:10.1093/bioinformatics/btg054
  • P. de Cerqueira Lima, A. Golbraikh, S. Oloff, Y. Xiao, and A. Tropsha, Combinatorial QSAR modeling of P-glycoprotein substrates, J. Chem. Inf. Model. 46 (2006), pp. 1245–1254. doi:10.1021/ci0504317
  • A. Kovatcheva, A. Golbraikh, S. Oloff, Y.D. Xiao, W. Zheng, P. Wolschann, G. Buchbauer, and A. Tropsha, Combinatorial QSAR of ambergris fragrance compounds, J. Chem. Inf. Comput. Sci. 44 (2004), pp. 582–595. doi:10.1021/ci034203t
  • A. Tropsha, P. Gramatica, and V.K. Gombar, The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSAR models, QSAR Combinat. Sci. 22 (2003), pp. 69–72.
  • P. Gramatica, P. Pilutti, and E. Papa, Validated QSAR prediction of OH tropospheric degradation of VOCs: Splitting into training–test sets and consensus modeling, J. Chem. Inf. Comput. Sci. 44 (2004), pp. 1794–1802. doi:10.1021/ci049923u
  • V. Consonni, D. Ballabio, and R. Todeschini, Comments on the definition of the Q2 parameter for QSAR validation, J. Chem. Inf. Model. 49 (2009), pp. 1669–1678. doi:10.1021/ci900115y
  • The Comprehensive R Archive Network, http://cran.r-project.org/.
  • X.-W. Chen, B. Gerlach, and D. Casasent, Pruning support vectors for imbalanced data classification, in Proceedings of the International Joint Conference on Neural Networks, IEEE Operations Center, Piscataway, NJ, 2005, pp. 1883–1888.
  • R. Akbani, S. Kwek, and N. Japkowicz, Applying support vector machines to imbalanced datasets, in Proceedings of the 15th European Conference on Machine Learning, J.-F. Boulicaut, F. Esposito, F. Giannotti, and D. Pedreschi, eds., Springer, New York, 2004, pp. 39–50.
  • Y. Tang, Y.-Q. Zhang, N.V. Chawla, and S. Krasser, SVMs modeling for highly imbalanced classification, J. LaTex Class Files 1 (2002), pp. 1–9.
  • J.J. Sutherland and D.F. Weaver, Development of quantitative structure-activity relationships and classification models for anticonvulsant activity of hydantoin analogues, J. Chem. Inf. Comput. Sci. 43 (2003), pp. 1028–1036. doi:10.1021/ci025639w