Review

Chemoinformatic approaches for navigating large chemical spaces

Pages 403-414 | Received 12 Dec 2023, Accepted 30 Jan 2024, Published online: 05 Feb 2024

