Large-scale data mining of four billion human antibody variable regions reveals convergence between therapeutic and natural antibodies that constrains search space for biologics drug discovery

Article: 2361928 | Received 30 Jan 2024, Accepted 27 May 2024, Published online: 06 Jun 2024
