Report

In silico proof of principle of machine learning-based antibody design at unconstrained scale

Article: 2031482 | Received 17 Aug 2021, Accepted 17 Jan 2022, Published online: 04 Apr 2022
