Commentary

Chemical and biological language models in molecular design: opportunities, risks and scientific reasoning

Article: FSO957 | Received 20 Dec 2023, Accepted 03 Jan 2024, Published online: 07 Feb 2024
