In silico proof of principle of machine learning-based antibody design at unconstrained scale

Article: 2031482 | Received 17 Aug 2021, Accepted 17 Jan 2022, Published online: 04 Apr 2022

ABSTRACT

Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAbs). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D structures and functions as an oracle for unrestricted prospective evaluation and benchmarking of the design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one-dimensional, 1D) data, can be used to design conformational (three-dimensional, 3D) epitope-specific antibodies that match or exceed the training dataset in affinity and in the diversity of developability parameter values. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes the a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.


Introduction

Monoclonal antibodies (mAbs) have proven incredibly successful as treatments for cancer and autoimmune disease (and, recently, viral infections), with an estimated market size of 300 billion USD by 2025.Citation1 Efforts to use mAbs for the neutralization of viral agents, such as human immunodeficiency virus (HIV), influenza and, more recently, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2),Citation2–5 are ongoing as well. Excluding anti-SARS-CoV-2 antibodies, lead times to mAb discovery and design are typically >3 years.Citation6–9 The reason for this is that current mAb development pipelines mostly rely on a combination of large screening libraries and experimental heuristics, with very little to no emphasis on rule-driven discovery.Citation10 Recently, it has been increasingly postulated that machine learning (ML) may be useful in accelerating antibody discovery, especially when applied to large-scale antibody sequencing data from antigen-specific display library screens.Citation11 However, formal proof is missing that ML can generate antibody sequences that are three-dimensionally (3D) antigen-specific (affinity, paratope, epitope) when provided only with one-dimensional (1D) sequence training data, the most abundant class of available antigen-specific antibody data.

Recent reports suggest that ML may be able to learn the rules of efficient antibody (protein) design.Citation7,Citation11–22 Specifically, Amimeur and colleaguesCitation23 trained generative adversarial networks (GANs)Citation24 on sequences obtained from the Observed Antibody Space (OAS) databaseCitation25 to demonstrate the capacity of deep generative networks to discover mAbs with certain developability parameters. Friedensohn and colleaguesCitation26 trained variational autoencoders (VAEs)Citation27 on mouse B-cell receptor data both to identify immunized-cohort-associated sequences and to generate novel antigen-binding sequences. Widrich et al. and Davidsen et al. used long short-term memory (LSTM) or VAE models to generate T-cell receptor (TCR) β sequences with the aim of generating realistic immune repertoires.Citation28,Citation29 Saka et al. used RNN-LSTMs to examine the capacity of deep generative models to improve the affinity of kynurenine-binding antibodies.Citation30 Finally, Eguchi et al. used VAEs to build class-specific backbones and generate 3D coordinates of mAbs.Citation31 However, while several generative deep learning methods have been explored for the in silico generation of immune receptor sequences, these strategies did not allow the exhaustive examination of whether the generated sequences follow the same antigen-specificity distribution as the input training data. This is due to the absence of large-scale antigen-annotated antibody sequence training data and the lack of high-throughput techniques for validating antigen binding of ML-generated antibody sequences.

Here, we investigated whether generative deep learning can learn 3D-affinity and epitope information from 1D antibody sequence data. This was done by using two oracles (external validator functions). The first oracle is an in silico framework that enables unrestricted validation (prospective evaluation) of the biological activity (paratope, epitope, affinity) of generated antibody sequences. Specifically, we used an in silico antibody-antigen binding simulation framework (which respects the biological complexity of antibody-antigen binding to the largest extent possible), called Absolut!.Citation32 The framework can annotate large collections of antibody sequences with synthetic binding affinities (specificity) to a synthetic 3D antigen, which allows the assembly of large-scale complete-knowledge training data.Citation33,Citation34 Due to its ability to annotate newly ML-generated antibody sequences with antigen-binding information, Absolut! resolves the current problems of large-scale validation of generated sequences.Citation35,Citation36 The second oracle is an experimentally validated deep learning classifier that was trained on binders and non-binders to human epidermal growth factor receptor 2 (HER2).Citation37 Our work provides a complete-knowledge simulation-based foundation for the ML-driven design of fit-for-purpose antibodies with respect to binding affinity, epitope, and developability (Figure 1).

Figure 1. In silico proof of principle of ML-based antibody design at unconstrained scale. We leveraged large synthetic ground-truth antibody sequence data with known paratope, epitope, and affinity to demonstrate in a proof-of-principle the (a,b) unconstrained deep generative learning-based generation of native-like antibody sequences. (c) An in silico oracle (Absolut!Citation32) enables the prospective evaluation of conformational (3D) affinity, paratope-epitope pairs, and developability of in silico generated antibody sequences. We also leveraged an experimentally validated oracleCitation37 to test antibody design conclusions gained based on the synthetic antibody sequence data. (d) Finally, we show that transfer learning increases generation quality of low-N-based ML models.

Results

Deep learning generates novel antigen-specific CDR-H3 sequences across a wide range of developability parameters

ML-based generation of new antibody sequences with desired biological properties requires large experimental datasets and a method to test the generated sequences for such properties. To address the absence of large experimental antigen-specific antibody-antigen datasets for training and testing deep antibody generative models, we leveraged Absolut!, a software suite that simulates the binding of antibody sequences to 3D antigens and recapitulates the biological properties and complexity of experimental antibody-antigen binding to a large extent.Citation32 We used our previously published dataset of seven million (7 × 10⁶) murine native antibody (CDR-H3) amino acid sequencesCitation38 (see Methods) and (via Absolut!) computed their binding to 10 protein antigens (Figure 2a, Table 1). Briefly, synthetic lattice-based antibody-antigen complexes were obtained by iterating over all possible binding positions between a sequence and an antigen to find the optimal binding position and by calculating the resulting binding affinity, paratope, epitope, and structural fold for each antibody CDR-H3 sequence. We note that the affinity, paratope, epitope, and structural fold were calculated according to Absolut!'s lattice representation (see Methods). Following affinity annotation, a set of six developability parameters (Table 2) was calculated for each CDR-H3 sequence (Figure 2a). CDR-H3 amino acid sequences equipped with affinity, paratope, epitope, and developability information are henceforth termed antigen-annotated CDR-H3 sequences.
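
As an illustration of the developability annotation step, the following is a minimal sketch (not the authors' exact pipeline) that computes four of the six Table 2 parameters with Biopython's ProtParam module; the two MHC-presentation parameters would require an external epitope-presentation predictor and are omitted here.

```python
# Minimal sketch of developability annotation for CDR-H3 sequences.
# Assumes Biopython is installed; MHCI/MHCII affinity (Table 2) is omitted
# because it requires an external epitope-presentation predictor.
from Bio.SeqUtils.ProtParam import ProteinAnalysis

def developability(cdrh3: str, pH: float = 7.0) -> dict:
    """Return four sequence-based developability parameters for one CDR-H3."""
    pa = ProteinAnalysis(cdrh3)
    return {
        "charge": pa.charge_at_pH(pH),           # net charge at the given pH
        "molecular_weight": pa.molecular_weight(),
        "gravy": pa.gravy(),                     # grand average of hydropathy
        "instability_index": pa.instability_index(),
    }

print(developability("CARDYYGSSYFDYW"))  # toy CDR-H3 for illustration
```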

Table 1. List of 3D-antigens used in the deep-learning-based antibody generation pipeline

Table 2. Antibody developability parameters and their computational implementation

Figure 2. Computational workflow for ML-based antibody design and evaluation thereof. (a) Generation of in silico training datasets with binding paratope, epitope, and affinity annotation. Briefly, PDB (Protein Data Bank) 3D antigen structures were obtained from the Antibody DatabaseCitation39 and native antibody sequences (CDR-H3) were obtained from Greiff and colleagues.Citation38 CDR-H3 sequences were annotated with their corresponding affinity and epitope to each antigen using the Absolut! software suite.Citation32 In addition, six widely used developability parameters were calculated for each CDR-H3 sequence (see Table 2). (b) Training a generative model on high-affinity CDR-H3 sequences to each antigen. Native linear 1D antigen-specific CDR-H3 sequences were used as input to train sequence-based RNN-LSTM generative models. Of note, the RNN-LSTM architecture did not receive any explicit 3D information on the paratope, epitope, affinity, or developability of a given sequence. (c) Large-scale in silico CDR-H3 sequence generation and binding validation. Following training, the deep models were used to generate new CDR-H3 sequences, which were then evaluated (prospectively tested) for their antigen specificity (affinity, paratope, epitope) using Absolut! (simulation) and annotated with developability-associated parameters. (d) Comparison of training and generated affinities. The affinity of training antigen-specific CDR-H3 sequences (nseq = 70,000, blue) to 10 different 3D antigens obtained from the PDB (see Table 1). The affinity of the 70,000 generated CDR-H3 sequences from the 10 RNN-LSTM models is shown in yellow. (e) Comparison of training and generated sequences for paratope-epitope recognition. Absolut! was used to compute the affinity and paratope fold/epitope of the training data (see Methods: Generation of lattice-based antibody-antigen binding structures using Absolut!). For readability, paratope and epitope statistics in the training (native) and generated datasets are visualized at larger proportions for the antigen 1OB1. (f) Pearson correlation (range: 0.864–0.907) of CDR-H3 sequence composition between training (“native”) and generated datasets, quantifying the preservation of long-range dependencies. CDR-H3 sequence composition was measured using gapped k-mers, where the size of the k-mer was 1 and the size of the maximum gap varied between 1 and 5. (g) CDR-H3 sequence similarity (Levenshtein distance, LD) distribution determined among training (native) and generated CDR-H3 sequence datasets (see Supplementary Fig. S4 for the LD distribution of CDR-H3 sequences between the native and generated sets). (h) CDR-H3 sequence novelty, quantified as the overlap |CDR-H3_antigen_x ∩ CDR-H3_antigen_y| / 70,000, where x and y are the 10 antigens listed in Table 1, of CDR-H3 sequences (median overlap <0.5% → novelty >99.5%) between both “native and generated” and “generated and generated” datasets across all antigen combinations. (i) Developability parameter distribution between training and generated CDR-H3 sequences overlaps substantially (see Table 2 for a description of the developability parameters used).

We examined the capacity of a deep (autoregressive) generative model (a recurrent neural network with long short-term memory, RNN-LSTM) to generate (design) novel antigen-specific sequences as follows. We first trained the RNN-LSTM model on antigen-specific CDR-H3 sequences (top 1% affinity-sorted sequences, nseq = 70,000, called “high-affinity” in the following) (Figure 2b). Importantly, we did not explicitly provide affinity or paratope/epitope information in the training process. Subsequently, we used the trained model to generate new CDR-H3 sequences (nseq = 70,000) (Figure 2c), which we then evaluated in terms of antigen specificity (using Absolut!, Figure 2c), sequence novelty, and developability (Figure 2d–i).
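
For concreteness, below is a minimal sketch of the kind of autoregressive character-level RNN-LSTM language model described above, written in a Keras style we assume for illustration; the vocabulary handling, hyperparameters, and toy sequences are ours, not the paper's exact configuration.

```python
# Minimal sketch: autoregressive RNN-LSTM over CDR-H3 amino acid sequences.
import numpy as np
import tensorflow as tf

AA = "ACDEFGHIKLMNPQRSTVWY"
PAD, START, END = 0, 1, 2
VOCAB = {a: i + 3 for i, a in enumerate(AA)}   # tokens 0-2 are special
INV = {v: k for k, v in VOCAB.items()}
MAXLEN = 22                                    # CDR-H3 plus start/end tokens

def encode(seqs):
    """Tokenize as [START, aa..., END] and right-pad to MAXLEN."""
    x = np.full((len(seqs), MAXLEN), PAD, dtype=np.int32)
    for i, s in enumerate(seqs):
        toks = [START] + [VOCAB[a] for a in s] + [END]
        x[i, :len(toks)] = toks
    return x

def build_model(vocab_size=23, emb=64, units=128):
    """Next-token prediction model: predicts token t+1 from tokens <= t."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, emb, mask_zero=True),
        tf.keras.layers.LSTM(units, return_sequences=True),
        tf.keras.layers.Dense(vocab_size),
    ])
    model.compile("adam",
                  tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    return model

def sample(model, n=10, temperature=1.0):
    """Autoregressively sample n new CDR-H3 sequences."""
    out = []
    for _ in range(n):
        toks = [START]
        while toks[-1] != END and len(toks) < MAXLEN:
            z = model.predict(np.array([toks]), verbose=0)[0, -1] / temperature
            z[[PAD, START]] = -1e9               # never emit special tokens
            p = np.exp(z - z.max()); p /= p.sum()
            toks.append(int(np.random.choice(len(p), p=p)))
        out.append("".join(INV.get(t, "") for t in toks[1:]))
    return out

x = encode(["CARDYYGSSYFDYW", "CARGGLYYFDVW"])   # toy training data
model = build_model()
model.fit(x[:, :-1], x[:, 1:], epochs=5, verbose=0)
print(sample(model, n=3))
```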

The binding affinity (Figure 2d), as well as the paratope folds and epitopes of generated CDR-H3 sequences (Figure 2e), mirrored very closely those of the native (training) CDR-H3 sequences. Novel paratope folds and epitopes were also discovered, as evidenced by the paratope fold and epitope diversityCitation41 of generated CDR-H3 sequences being higher than that of native (training set) sequences (Figure 2e). The within-dataset sequence similarity, as measured by the distribution of Levenshtein distances (LD) between CDR-H3 sequences within the native or generated CDR-H3 sequence datasets, was preserved (Figure 2g, Supplementary Fig. S4), as were long-range sequence dependencies (gapped k-mer decomposition, Pearson correlation 0.864–0.907, Figure 2f). To exclude the possibility that generated CDR-H3 sequences showed high affinity merely by virtue of their similarity to the training input, we validated that the generated CDR-H3 sequences were novel, both measured by exact sequence identity (<1% overlap between generated and native antigen-specific sequences, Figure 2h) and by sequence similarity (median Levenshtein distance between generated and native CDR-H3 sequences: ≈9–10 amino acids, Supplementary Fig. S4). Thus, deep generative learning explores non-trivial novel sequence spaces. We excluded the possibility that the chosen RNN-LSTM architecture is biased toward generating high-affinity CDR-H3 sequences by showing that training on the following two CDR-H3 sequence sets did not lead to the generation of high-affinity CDR-H3 sequences: 1) exclusively low-affinity CDR-H3 sequences (generating exclusively low-affinity CDR-H3 sequences, Supplementary Fig. S3A), and 2) CDR-H3 sequences spanning the entire affinity spectrum (generating CDR-H3 sequences spanning the entire affinity spectrum, Supplementary Fig. S3A). As an additional control, we analyzed CDR-H3 sequences that follow the positional amino acid distribution (position-specific weight matrix, PWM) of the high-affinity training data and showed that these sequences span the entire affinity spectrum (Supplementary Fig. S3B). Finally, the distribution of developability parameters of generated CDR-H3 sequences largely mirrored, but also expanded, the range of parameters of native antigen-specific sequences (Figure 2i).
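
As an illustration of the novelty and composition metrics above, the following is a minimal sketch (assuming simple, standard definitions; the paper's exact feature definitions may differ) of Levenshtein distance and a gapped k-mer (k = 1, gaps 1–5) composition profile compared by Pearson correlation:

```python
# Minimal sketch: Levenshtein distance and gapped 1-mer composition profiles.
from itertools import product
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def gapped_kmer_profile(seqs, max_gap=5):
    """Count residue pairs (x, y) separated by 1..max_gap positions."""
    idx = {p: i for i, p in enumerate(product(AA, AA, range(1, max_gap + 1)))}
    v = np.zeros(len(idx))
    for s in seqs:
        for gap in range(1, max_gap + 1):
            for i in range(len(s) - gap):
                v[idx[(s[i], s[i + gap], gap)]] += 1
    return v

native = ["CARDYYGSSYFDYW", "CARGGLYYFDVW"]      # toy data for illustration
generated = ["CARDAYGSSYFDYW", "CTRGGLYYFDVW"]
p, q = gapped_kmer_profile(native), gapped_kmer_profile(generated)
print("Pearson r:", np.corrcoef(p, q)[0, 1])
print("min LD to native:", min(levenshtein(g, n) for g in generated for n in native))
```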

On-demand generation of large amounts of CDR-H3 sequences with broad developability and affinity that match or exceed the training sequences

Following the observation that deep generative models were capable of generating novel CDR-H3 sequences that mirror very closely the binding and developability properties of native CDR-H3 sequences (Figure 2), we hypothesized that such models are useful for generating large quantities of CDR-H3 sequences with affinities similar to or better than those of the native ones. To assess this hypothesis, we first grouped the native antigen-specific CDR-H3 sequences (nseq,training = 70,000, top 1%) into four affinity categories based on their binding energy (low energy → high affinity): 1) ultimate binder (max native–⅓), 2) penultimate binder (⅓–⅔), 3) binder (⅔–min native), and 4) hyperbinder (affinity > native max, i.e., higher affinity than found in the training data CDR-H3 sequences; see schematic in Figure 3a). We then generated, for each antigen, 7 × 10⁵ unique antigen-specific CDR-H3 sequences (i.e., 10 times more than the training dataset) and evaluated the generated CDR-H3 sequences with respect to the four categories. Broadly, we found that the number of binders in all four categories increased as the number of generated sequences increased (Figure 3a). Specifically, when the number of generated CDR-H3 sequences equaled that of the training data (nseq,generated = 70,000), we found numbers of binders of the same order of magnitude in all categories (binder–ultimate binder) as in the native (training) dataset (blue lines), except for hyperbinders (as the native population has, per definitionem, no hyperbinders). At nseq,generated = 7 × 10⁵, the quantities of discovered binders eclipsed those of the native binders in all four categories by ~4-fold (Figure 3a), suggesting that generative learning may be used for a highly exhaustive discovery of novel binders. Importantly, the discovery of CDR-H3 sequences with predicted binding affinity superior to the native sequences (hyperbinders) further illustrates the importance of deep generative models in the design and discovery of high-affinity CDR-H3 sequences.Citation12,Citation23,Citation26,Citation42 Hyperbinders showed affinity improvements over native CDR-H3 sequences in the range of 0.4–4.4% (percentages were calculated against the maximum affinity [lowest energy] of each antigen's training dataset) and a median LD (against native binders) of 10 to 14. To summarize, our RNN-LSTM models were able to generate large quantities of non-redundant CDR-H3 sequences that match or exceed the affinity of the training sequences.
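
As a sketch of the binning step (assuming, as in Absolut!, that lower binding energy means higher affinity), the thresholds below split the native energy range into thirds, and generated sequences with energies below the best native energy are counted as hyperbinders:

```python
# Minimal sketch of the four-category affinity binning described above.
import numpy as np

def bin_affinity(native_energies, generated_energies):
    e = np.asarray(native_energies, dtype=float)
    lo, hi = e.min(), e.max()              # lo = best (max) native affinity
    t1 = lo + (hi - lo) / 3                # ultimate/penultimate boundary
    t2 = lo + 2 * (hi - lo) / 3            # penultimate/binder boundary
    cats = {"hyperbinder": 0, "ultimate": 0, "penultimate": 0, "binder": 0}
    for g in generated_energies:
        if g < lo:
            cats["hyperbinder"] += 1       # better than anything in training
        elif g <= t1:
            cats["ultimate"] += 1
        elif g <= t2:
            cats["penultimate"] += 1
        elif g <= hi:
            cats["binder"] += 1            # energies above hi are not counted
    return cats

print(bin_affinity([-95.0, -90.0, -85.0], [-96.2, -91.0, -86.0, -80.0]))
```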

Figure 3. Exhaustive generation reveals better antibodies than are present in the training dataset. (a) To examine the ability of the RNN-LSTM model to generate CDR-H3 sequences beyond the native realms (in terms of quantity and affinity), we first binned the native high-affinity antigen-specific training CDR-H3 sequences into four affinity classes: hyperbinder (affinity > max native), ultimate binder (max native–1/3), penultimate binder (1/3–2/3), and binder (2/3–min native). Following binning, we used deep generative models to generate 700 K new sequences, devised 10 cutoffs in increments of 70 K (70 K [native-sized], 140 K, …, 700 K [large]), subsampled 10 times (from the 700 K generated sequences), and counted the number of novel sequences in each cutoff. Native (training) and generated sequences are shown in blue and yellow; error bars are shown for subsampled sequences. We found that, for all affinity classes, the number of unique sequences in each class increases as a function of the total number of generated sequences. In addition, we found sequences that possess a higher affinity than the native training sequences (called hyperbinders), with affinity improvements over native CDR-H3 sequences ranging between 0.04–4.4% [depending on the antigen; percentages were calculated relative to the minimum affinity per antigen]. (b) To examine the diversity and preferences of developability combinations, we annotated each CDR-H3 sequence with a binary developability encoding. Briefly, we binned each developability parameter into two bins (low = min–median and high = median–max) and annotated each sequence with a composite binary encoding from all six developability parameters (i.e., 0_0_0_0_0_1 indicates that the sequence has a low charge, low molecular weight, low gravy index, low instability index, low affinity to MHCII, and high affinity to MHCI). We found that the generated CDR-H3 sequences yielded larger ranges of developability combinations in native-sized generation (nseq = 70,000) and large generation (nseq = 7 × 10⁵). Error bars indicate the standard deviation of the subsampling.

In the same vein, we hypothesized that deep generative models would prove useful for generating CDR-H3 sequences with developability profiles similar to or richer than those of native CDR-H3 sequences (a higher number of combinations of developability parameter values). To this end, we devised a binary developability encoding wherein each developability parameter (Table 2) is grouped into two categories: low (parameter values between the min and the median of the parameter's distribution) and high (parameter values between the median and the max of the parameter's distribution), and annotated each CDR-H3 sequence with a composite developability encoding combining all six developability parameters examined here (Figure 3b). For instance, the encoding 0_0_0_0_0_1 indicates that the annotated CDR-H3 sequence has a low charge (0), low molecular weight (0), low gravy index (0), low instability index (0), and low affinity to MHCII (0), but a high affinity for MHCI (1). Subsequently, we compared the total number of developability parameter combinations populated by the generated sequences (against native sequences) in two conditions: native-sized, wherein the number of generated sequences matches the number of sequences in the native (training) dataset (nseq,generated = 70,000), and large, wherein the number of generated sequences is an order of magnitude larger than the number of native training sequences (nseq,generated = 7 × 10⁵). We observed a larger number of developability parameter combinations in the generated populations (Figure 3b). Specifically, native-sized generation yielded 29–39 developability parameter combinations (45–61% of all possible combinations) and large generation yielded 33–44 (52–69% of all possible combinations), as compared to native sequences, which yielded 21–37 combinations (33–58% of all possible combinations). The Pearson correlation between the counts of developability parameter combinations in native and generated sequences was high (Pearson correlation: 0.74–0.99, Figure 3b). In other words, deep generative models can be leveraged to generate antibody sequences that are equally or more diverse than native (training) ones in terms of developability profile, even under the constraint of generating high-affinity CDR-H3 sequences.
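
A minimal sketch of this encoding, assuming a simple median split per parameter:

```python
# Minimal sketch of the composite binary developability encoding: each of
# the six parameters is median-split into low (0) and high (1), and the six
# bits are joined into strings such as "0_0_0_0_0_1".
import numpy as np

def developability_encoding(values):
    """values: (n_sequences, 6) array of developability parameter values."""
    v = np.asarray(values, dtype=float)
    medians = np.median(v, axis=0)
    bits = (v > medians).astype(int)       # low = min..median -> 0, else 1
    return ["_".join(map(str, row)) for row in bits]

codes = developability_encoding(np.random.rand(1000, 6))  # toy data
print(len(set(codes)), "of", 2 ** 6, "possible combinations populated")
```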

The quality of ML-based antibody sequence generation is a function of the size of the training data

The absence of large antigen-annotated antibody sequence and structural datasets remains a major challenge in developing robust machine learning methods for antibody-antigen binding prediction as well as for the antigen-specific generation of mAbs.Citation12,Citation43 Furthermore, the precise amount of antibody sequence data necessary to recover native-like antibody affinity, epitope, and developability is a subject of ongoing investigation.Citation11–13 Therefore, within the framework of our simulation suite Absolut!, we examined how the number of training CDR-H3 sequences affects the resulting binding affinity of the generated CDR-H3 sequences. To this end, from the top 1% antigen-specific CDR-H3 sequences (nseq = 70,000), we created smaller datasets of antigen-specific CDR-H3 sequences (nseq,subsample = 700; 7,000; 10,000; 20,000; 30,000; 40,000; 50,000; 60,000; nreplicates = 5); trained deep generative models on these subsets; and compared the resulting binding affinity against that of native CDR-H3 sequences and of CDR-H3 sequences generated by models trained on the top 1% (nseq = 70,000) antigen-specific CDR-H3 sequences (called the “data-rich” model). We found that the correspondence between the binding affinity and epitope recognition of native and generated CDR-H3 sequences increased as a function of the number of training CDR-H3 sequences (Figure 4a, Supplementary Fig. S5). Specifically, our models recovered very closely the native affinity (as measured by median energy) when we used 20,000 or more training CDR-H3 sequences (Supplementary Fig. S5). Similarly, the agreement of epitope occupancy between native and generated CDR-H3 sequences increased as a function of the size of the training set (Supplementary Fig. S8, Supplementary Fig. S11). Of note, we found that the agreement of epitope occupancy between native and generated CDR-H3 sequences was already reasonable for a small training dataset (ntrain = 700) for antigens with fewer epitopes (e.g., 3VRL, see Table 1) (Supplementary Fig. S8, Supplementary Fig. S11). In contrast, antigens with more epitopes (e.g., 1H0D, Table 1) required larger training datasets to reach high concordance with the epitope occupancy observed in the training dataset (Supplementary Fig. S8, Supplementary Fig. S11).
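
The ablation can be expressed as a short loop reusing the encode/build_model/sample helpers from the RNN-LSTM sketch above; `score_affinity` is a hypothetical stand-in for the Absolut! annotation step.

```python
# Minimal sketch of the training-set-size ablation. `score_affinity` is a
# hypothetical placeholder for Absolut! energy annotation; encode,
# build_model, and sample are from the earlier RNN-LSTM sketch.
import random
import statistics

SIZES = [700, 7_000, 10_000, 20_000, 30_000, 40_000, 50_000, 60_000]

def size_ablation(training_seqs, score_affinity, n_replicates=5, n_gen=1_000):
    medians = {}
    for size in SIZES:
        reps = []
        for _ in range(n_replicates):
            subset = random.sample(training_seqs, size)   # without replacement
            x = encode(subset)
            model = build_model()
            model.fit(x[:, :-1], x[:, 1:], epochs=5, verbose=0)
            generated = sample(model, n=n_gen)
            reps.append(statistics.median(score_affinity(s) for s in generated))
        medians[size] = reps                              # one median per replicate
    return medians
```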

Figure 4. Generation quality of antibody sequences depends on the size of the training dataset, and transfer learning enables the generation of higher-affinity CDR-H3 sequences from lower-sized training datasets. (a) To examine the impact of sample size on the resulting binding affinity and epitope (see Supplementary Fig. S8) of generated CDR-H3 sequences, we created smaller training datasets (nseq,subsample = 700; 7,000; 10,000; 20,000; 30,000; 40,000; 50,000; 60,000; nreplicates = 5) from the full antigen-specific CDR-H3 sequence dataset (nseq,training = 70,000), trained deep generative models on the subsets, and compared the binding affinity and epitope against those from models trained on the full data and against the native affinity and epitope (see Supplementary Fig. S8 for correlations of CDR-H3 epitope occupancy). We found that models trained on the larger dataset sizes (>2 × 10⁴), but not the smaller subsets (on the order of 10³ or 10²), sufficiently replicate the distribution of binding affinity and epitope of CDR-H3 sequences. (b) To investigate whether transfer learning may be used to improve the affinity and epitope (see also Supplementary Fig. S9–Supplementary Fig. S13) binding of CDR-H3 sequences generated by models trained on smaller-sized datasets, we constructed a transfer architecture wherein embedding and RNN-LSTM layers from a “data-rich” model (high N, nseq,training = 70,000) were stacked atop a fresh dense layer, and the resulting “transfer” model was trained on lower-sized datasets (data-poor; low N, nseq,training = 700/7,000). Two types of transfer experiments were performed: a within-antigen transfer experiment (e.g., between a data-rich model of an antigen V and data-poor models of the same antigen V) and a between-antigens (across-antigens) transfer experiment (e.g., between a data-rich model of an antigen V and a data-poor model of an antigen G). We used the Kolmogorov–Smirnov distance (KSD; 0 for identical distributions, increasing values for increasing dissimilarity between distributions) to quantify the similarity between affinity distributions of CDR-H3 sequences generated by models with transfer learning (+T) and without transfer learning (-T). Smaller KSD values indicate that the compared affinity distributions are similar, and larger values signify dissimilarity of affinity distributions. For within-antigen transfer experiments, we found marked reductions of KSD values (against the native population) for all antigens, signifying the transferability of general antibody-antigen binding features within antigens. For across-antigens transfer experiments, 7 out of 10 antigens showed reductions in KSD values in at least one transfer scenario (nseq,training = 700 or 7,000, Figure 4b), suggesting the transferability of antibody-antigen binding features across antigens.

In summary, 20,000 CDR-H3 sequences were sufficient to train models that reproduce native-like affinity. We note that our simulation framework Absolut! does not operate at atomistic resolution;Citation32 given the higher dimensionality of binding modes in experimental datasets, nseq,training on the order of 20,000 should therefore only be regarded as a lower bound on the number of training CDR-H3 sequences necessary to train a robust deep generative model.

Transfer learning enables the generation of high-affinity CDR-H3 sequences from lower-sized (low-N) training datasets

Based on the observation that lower-sized training datasets failed to produce CDR-H3 sequences with native-like binding affinity and epitope binding, we asked whether the generation quality of models trained on lower-sized datasets (data-poor, “low-N”,Citation44 nseq,training = 700 and 7,000) may be improved by transferring learned features from models trained on larger datasets, which were found to be sufficient for achieving native-like affinity (data-rich, nseq,training = 70,000; Figure 4a). We examined this question by constructing a transfer learning architecture wherein pre-trained embedding and RNN-LSTM layers from a data-rich model were stacked atop a new fully connected layer, with the resulting “transfer” model subsequently being trained on lower-sized datasets (Figure 4b). We performed two different transfer learning experiments, termed within-antigen and across-antigens transfer. Within-antigen transfer describes a transfer experiment involving the same antigen (this transfer setting serves as a positive control for the functioning of the transfer architecture). That is, pre-trained embedding and LSTM layers from a data-rich model based on CDR-H3 sequences specific for an antigen V were stacked atop a new dense layer; the resulting architecture was trained on lower-sized datasets (nseq,training = 700 and 7,000) of antigen V. In contrast, across-antigens transfer identifies a transfer experiment involving different antigens, e.g., a data-rich model of an antigen V and data-poor (lower-sized datasets, nseq,training = 700 and 7,000) models of an antigen G (see Methods and Figure 4b). Following training, for each antigen, we generated a total of 100,000 CDR-H3 sequences (10,000 sequences, 10 replicates) and measured the generation quality with respect to affinity and epitope. We used the Kolmogorov–Smirnov distance (KSD) to quantify the similarity between the generated binding affinity distributions and the native affinity distribution: a small KSD indicates that the compared affinity distributions are similar, and increasing KSD indicates increasing dissimilarity. We observed marked reductions of KSD values (against the affinity distribution of the native population) for the within-antigen transfer in all models (Figure 4b, upper panel, and Supplementary Fig. S7), which signifies the availability, learnability, and transferability of general antibody-antigen binding features within an antigen. For the across-antigens transfer experiments, 7 out of 10 antigens showed reductions in KSD values in at least one transfer scenario (nseq,training = 700 or 7,000; Figure 4b, lower panel), suggesting the transferability of antibody-antigen binding features across antigens and the multi-faceted nature of the signal learned per antigen (nota bene, the medians of binding affinities in the across-antigens transfer scenario were closer to the native and data-rich affinities in all 10 antigens, Supplementary Fig. S6). For epitope similarity, we used Pearson correlation (Supplementary Fig. S9, Supplementary Fig. S10) and overlap (Supplementary Fig. S12, Supplementary Fig. S13) to quantify the concordance between epitopes recognized by native and generated CDR-H3 sequences. Similar to affinity, we found better concordance for both the within- and across-antigens transfer (increasing Pearson correlation values, Supplementary Fig. S9 and Supplementary Fig. S10). Interestingly, the number of recognized epitopes jumped in across-antigens transfer (Supplementary Fig. S13), whereas in within-antigen transfer (Supplementary Fig. S12) the number of recognized epitopes dropped, hinting at the utility of across-antigens transfer in generating epitope diversity. In summary, our in silico experiments suggest that transfer learning may represent a suitable method for generating high-affinity CDR-H3 sequences from lower-sized training datasets.
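
A minimal sketch of the transfer setup, again in the assumed Keras style: the pre-trained embedding and LSTM layers of the data-rich model are reused (weights carried over), a fresh Dense output layer is attached, and the model is fine-tuned on the low-N data; the KSD is the statistic of scipy's two-sample Kolmogorov–Smirnov test.

```python
# Minimal sketch: transfer architecture plus the KSD evaluation metric.
import tensorflow as tf
from scipy.stats import ks_2samp

def make_transfer_model(data_rich_model, vocab_size=23):
    """Reuse pre-trained embedding + LSTM layers; add a fresh Dense layer."""
    emb, lstm = data_rich_model.layers[0], data_rich_model.layers[1]
    model = tf.keras.Sequential([emb, lstm, tf.keras.layers.Dense(vocab_size)])
    model.compile("adam",
                  tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    return model

def ks_distance(native_energies, generated_energies):
    """0 for identical distributions; grows with dissimilarity."""
    return ks_2samp(native_energies, generated_energies).statistic

# Usage (hypothetical data): fine-tune on a low-N dataset, then compare the
# generated affinity distribution to the native one via ks_distance.
# transfer = make_transfer_model(data_rich_model)
# transfer.fit(low_n[:, :-1], low_n[:, 1:], epochs=20, verbose=0)
```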

Antibody-design conclusions on required sequence diversity gained from simulated antibody-antigen binding data hold on experimental antibody-antigen data

Experimental validation at the scale of the number of antibody sequences that can potentially be ML-generated remains an unresolved technological problem. One potential solution to this challenge is the development of experimentally validated ML classifiers (also called oracles) that can screen the potential sequence space for binders. One such classifier for HER2 binders was previously developed by Mason et al.Citation37 Briefly, this convolutional neural network (CNN)-based classifier (oracle) discriminates CDR-H3 amino acid sequences by their potential to bind HER2; all CDR-H3 sequences annotated with a binding probability of p > 0.5 are considered binders (we also investigated a probability threshold of p > 0.7). Mason et al. validated the CNN-based classifier experimentally by expressing predicted HER2 binders and testing them for binding (experimentally validated computational oracle). Given that the experimental system of Mason et al. is similar to the one simulated in this work, i.e., testing the binding of CDR-H3 sequences, we concluded that the CNN classifier can be used to evaluate the experimental HER2-binding potential of the output of our RNN sequence generator (Figure 5a).

We used the CNN classifier to investigate whether the lower threshold of sequence diversity necessary for high-accuracy generative antibody ML determined using simulations (Figure 4a) also holds on experimental real-world data. To this end, we performed the following experiment: we trained separate RNN-LSTMs on the 11,300 experimentally verified HER2 binders (“RNN-LSTM binder model”) and the 27,539 HER2 non-binders (“RNN-LSTM non-binder model”) of the Mason et al. dataset and used the RNN-LSTM models to generate 7 × 10⁵ sequences in each case. To examine the impact of dataset size on the percentage of generated binders, we created smaller training datasets (700 and 7,000) by subsampling five times from the original binder and non-binder datasets (Figure 5a). Importantly, while the CNN-based classifier was trained using both HER2-binder and non-HER2-binder data, our RNN-LSTM models were trained on binder and non-binder data separately. That is, the RNN-LSTM trained on binders did not have access to non-binder data, and vice versa. The generated CDR-H3 sequences were assessed for their HER2-binding potential using the experimentally verified CNN classifier (Figure 5a). We found that 68% (HER2-binding probability cutoff [Pbind] > 0.5; 63% for Pbind > 0.7) of the CDR-H3 sequences generated by the RNN-LSTM binder model trained on all 11,300 HER2 binders were scored as binders, and we ascertained that the generated sequences follow the positional amino acid dependencies (position weight matrix, PWM) of the experimentally verified training data (Supplementary Fig. S15; PWM-generated sequences yielded a markedly lower percentage of binders at 43% [Pbind > 0.5], Supplementary Fig. S17B; MSE values are shown in Supplementary Fig. S17A). To verify that binders are enriched in the generated sequences, we computed as a baseline the percentage of CNN-predicted binders on an unrelated human IgM naive B-cell dataset (1,307,472 sequences, Supplementary Fig. S16; see Methods), finding a substantially reduced fraction of CDR-H3 sequences classified as binders (10.5% for the cutoff value >0.5; 7.6% for the cutoff value >0.7).
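
A minimal sketch of the oracle-scoring step; `cnn` is a hypothetical stand-in for the trained Mason et al. classifier and `encode_fn` for its input encoding:

```python
# Minimal sketch: score generated CDR-H3s with a binding-probability oracle.
# `cnn` (a trained Keras-style classifier) and `encode_fn` are hypothetical
# stand-ins for the Mason et al. CNN and its input encoding.
import numpy as np

def fraction_binders(cnn, encode_fn, generated_seqs, cutoff=0.5):
    """Fraction of generated sequences scored above the probability cutoff."""
    probs = np.asarray(cnn.predict(encode_fn(generated_seqs))).ravel()
    return float(np.mean(probs > cutoff))

# fraction_binders(cnn, encode_fn, generated, cutoff=0.5)  # binder fraction
# fraction_binders(cnn, encode_fn, generated, cutoff=0.7)  # stricter cutoff
```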

Figure 5. RNN-LSTM models trained on experimentally validated binders (not synthetic sequences) generated native-experimental-like binders. (a) To validate that our RNN-LSTM model can not only reproduce properties of native-like synthetic binder sequences but also of experimentally determined binders, we trained the model with varying numbers (700–Max; max for binders ~11 K and max for non-binders ~27 K) of binders and non-binders obtained from recently published experimental data against human epidermal growth factor receptor 2 (HER2),Citation37 generated 7 × 10⁴ sequences, and scored the sequences with the Mason et al. CNN classifier (the CNN classifier outputs a HER2-binding probability value between 0 and 1). Subsequently, we used the CNN as an experimentally validated oracle to create datasets of binders (Pbind > 0.7 or Pbind > 0.5) and non-binders (Pbind ≤ 0.3 or Pbind ≤ 0.5) from our 7 × 10⁶ mouse sequences and trained our model with the oracled datasets (700–Max; max for binders and non-binders is 7 × 10⁴). We subsampled five times for the lower-sized datasets (700 and 7,000). Finally, we compared the proportion of predicted HER2 binders and non-binders across models trained on experimental data and models trained on oracled data. (b) We found good correspondence between the experimental and oracled datasets in terms of the fraction of correctly predicted sequences (binders, non-binders). For binders, RNN-LSTM models trained on the smallest training datasets yielded the lowest fraction of correct predictions (Exp.: 0.25; Orac.: 0.16), and models trained on the largest training datasets yielded the maximum fraction of correct predictions (Exp.: 0.68; Orac.: 0.71). For non-binders, we found that already at the smallest dataset sizes the models were able to yield non-binders for both experimental and oracled datasets. Specifically, the fractions of correct predictions for the smallest non-binder datasets were 0.95, 0.88, and 0.91 for the experimental and oracled (Pbind ≤ 0.3 and Pbind ≤ 0.5) categories, respectively; for the largest non-binder datasets, the fractions of correct predictions were 0.84, 0.92, and 0.94 for the experimental and oracled categories, respectively. Distributions of amino acids per position are summarized in Supplementary Fig. S15, and distributions of predicted binding probability of the data shown here are in Supplementary Fig. S14. Baseline HER2-binding probability distributions of human and mouse CDR-H3 sequences are shown in Supplementary Fig. S16.

When CNN-scoring the CDR-H3 sequences from the RNN-LSTM non-binder model, only 16% of the generated CDR-H3 sequences were scored as binders, i.e., 84% of the non-binder CDR-H3 sequences were correctly classified as non-binders (the percentages are similar for the smaller training datasets: 94% and 84% for 700 and 7,000 sequences, respectively). This is in line with the detection rates of non-binders in the baseline data (the unrelated human IgM naive B-cell data mentioned above; 89.5% for the cutoff value ≤0.5, 86% for the cutoff value ≤0.3), indicating that the RNN-LSTM model does not learn to generate non-binders better than a random dataset.

To summarize, we showed using the experimentally validated CNN that the RNN-LSTM trained on experimentally determined HER2-binding sequences successfully generated sequences classified as HER2 binders, and that a training dataset on the order of 1–2 × 10⁴ sequences (as also observed with our synthetic data, Figure 4a) is sufficient to generate CDR-H3 sequences that bind the target antigen.

Subsequently, to reproduce the above results on a different set of experimental antibody sequences, we also used the CNN to create binder and non-binder datasets from our native sequences (7 million murine CDR-H3 sequences) and trained RNN-LSTMs separately on these “oracled” datasets as described above (Figure 5a). We found good correspondence between the experimental and oracled datasets with respect to the percentages of generated binders vs. non-binders as a function of dataset size (Figure 5b; for a baseline HER2-binding probability distribution, see Supplementary Fig. S16 and Supplementary Fig. S17B), suggesting that ML conclusions gained on Absolut!-selected data are transferable to datasets selected by experimentally validated oracles. Of note, these results also suggest that de novo antibody design is feasible using only binding sequences (positive data) for ML model training, even when binder and non-binder data are relatively similar.

Discussion

We have here provided the in silico proof of principle that deep learning can learn the non-linear rules of 3D antibody-antigen interaction from 1D antibody sequence data alone by showing (in a 3D lattice space) that novel antibody variants with high affinity and specific epitope binding can be generated based on sufficiently large training data (Figure 2). Among the generated antibodies, for all tested antigens (10 out of 10), we detected novel antibody sequences that exceeded in affinity those found in the training dataset (Figure 3a). ML-based sequence generation also allowed for the discovery of novel developability parameter combinations (Figure 3b). For the ML model used, we determined the number of training CDR-H3 sequences necessary (>2 × 10⁴) for generating high-affinity CDR-H3s and demonstrated that this number may be reduced by transfer learning (Figure 4). Finally, we validated, using an experimentally validated oracle, the antibody-design conclusions drawn from ML training on simulated antibody-antigen binding data (Figure 5). More broadly, while our primary objective was a proof-of-principle study of antibody generative learning, our secondary objective was to leverage large-scale synthetic antibody-antigen binding data that replicate many complexities of biological antibody-antigen binding and to develop a set of analytical approaches that may help in assessing the quality of generated antibody sequences in future studies with a similar aim.Citation45

In this work, we chose an RNN-LSTM-based language modeling approach because it represents a competitive baseline to the state-of-the-art transformer-based architecture.Citation46 Recently, VAEs, as well as other deep generative approaches, have also been used for generating T- and B-cell receptor sequences.Citation23,Citation26,Citation29,Citation47 However, both in the area of natural language processing as well as in the area of generative models for small molecules, GANs and VAEs remain less competitive.Citation35,Citation48 Although we decided to use an RNN-LSTM as a generative model, we hypothesize that any accurate language model, e.g., transformer architectures,Citation32 would lead to similar results and conclusions. Further benchmarking is needed in the area of generative protein design.

A common problem with deep-learning-generated sequence data is that methods may reproduce the training data with minimal changes, which has been termed the “copy problem” by Renz and colleagues.Citation49 The copy problem is especially prevalent when the capacity for high-throughput testing of molecular properties (in our case, antigen binding and developability) is unavailable. The absence of prospective testing capacity precludes the functional (e.g., antigen binding) evaluation of the generated dataset, which renders addressing the copy problem largely infeasible (for example, merely testing sequence diversity on sequences whose binding mode is unknown does not elucidate the extent of diversity for a given binding mode). In this work, we were able to address and exclude the copy problem by evaluating all generated sequences both for binding and for sequence diversity, thanks to the capacity for unrestricted prospective sequence evaluation afforded by the Absolut! platform (Figure 2).Citation32

The transfer learning experiments demonstrated the capacity of deep learning models trained on large collections of CDR-H3 sequences to augment weaker datasets (smaller datasets that fail to faithfully reproduce the affinity and epitope of native sequences) in both within- and across-antigens scenarios (Figure 4b). Although transfer learning improved (smaller KSD values against native) the generation quality of weaker models for all 10 antigens in the within-antigen transfer scenario, three antigens (3RAJ, 3VRL, and 5E94) did not show improvements (larger KSD values against native) in the across-antigens transfer scenario (although closer examination of the generated affinity distributions revealed that the median affinity values of across-antigens transfer learning were closer to the median affinity values of native CDR-H3 sequences, Supplementary Fig. S6). Furthermore, the number of recognized epitopes in any transfer learning setting was notably larger than the number of recognized epitopes in sequence generation without transfer learning and in native CDR-H3 sequences (Supplementary Fig. S13), independent of the KSD values against native CDR-H3 sequences. This illustrates the key challenges remaining in the prospective testing of many orthogonal variables, wherein several parameters must be captured and appropriately reflected in order to communicate faithfully the underlying trends in the data. Indeed, the success of cross-task transfer has been shown to be heavily influenced by the compatibility of the source and target task types.Citation50 Nevertheless, our across-antigens transfer learning experiments show that, at least in the case of our antibody sequence datasets, neural network models can extrapolate 3D non-linear dependencies to CDR-H3 sequences outside the training distribution.Citation50–52

One may argue that the Absolut! antibody-antigen binding simulation framework generates sequences that are binders within the lattice framework but would not be binders if tested in vitro/in vivo. That said, we ensured that the Absolut! framework is state of the art, surpassing all currently available large-scale antibody-antigen binding simulation frameworksCitation34 (e.g., through the inclusion of discretized Protein Data Bank (PDB)-stored antigens, 3D binding [albeit on a 90°-grid], and experimentally determined, physiologically relevant amino acid interaction potentialsCitation32). The inbuilt physiological relevance of our antibody-antigen simulation model affords a more precise understanding of how the accuracy of computational models increases with the number of available antibody sequences for training, which will help in planning experimental validation studies. We also avoided the possibility that the generative model learns to exploit the affinity model by refraining from a full reinforcement learning setting, in which the affinity model would be used as a reward function.Citation49 Specifically, the major challenge of predicting the antigen reactivity of an antibody sequence lies in recapitulating the residue interactions between the antibody and antigen structures in 3D space. Even our simplified computational model of antibody structure includes physical antibody-antigen interactions in 3D space, entailing non-linearities and positional dependencies reminiscent of the biological complexity.Citation32 Consequently, one may argue that our simulation framework and investigations are suitable for establishing an informative lower bound on the complexities encountered in machine- and deep-learning-based biological sequence design. Indeed, a recent study by Mason et al.Citation37 that leverages experimental deep mutational scanning data showed that a training dataset size on the order of 10⁴ (as also shown in this study for generative models; Figure 4a) was sufficient to train ML models that discriminate binders and non-binders. Furthermore, that study also highlights that a large proportion of dissimilar sequences (LD > 2) bind to the target antigen (as also shown in this study and in Robert et al.Citation32). These parallels with results from experimental data reiterate the utility and relevance of simulated, custom-designed synthetic datasets in advancing the development of computational approaches for antibody design and discovery.

For future investigations, we cannot emphasize enough the need for experimental validation to complement the in silico results presented herein. Recently, Saka and colleagues showed that RNN-generated antibody sequences bind the desired target, providing experimental proof of principle for our computational approach.Citation30 Here, we validated the conclusions gained from profiling the RNN-LSTM framework () by scoring the generated CDR-H3 sequences with an experimentally validated oracle (CNN classifier)Citation37 (). Furthermore, our RNN-LSTM models were trained separately on binders (positive data) and non-binders (negative data), suggesting that the design of native-like CDR-H3 sequences is possible without the need for negative examples, and that accuracy is likely to improve further with more training examples (the 68% HER2 generation rate of the RNN is fairly close to the CNN prediction accuracy of ≈80%,Citation37 and substantially different from the percentage of HER2 binders in an unrelated baseline dataset, Supplementary Fig. S16). This could reduce the cost of generating training datasets, given that the HER2 generation rate of the RNN was remarkably high despite training on positive data only. Indeed, earlier studies have shown that performance gains or losses from including more or less negative data vary across models and application domains.Citation53,Citation54 This highlights the potential applicability of our framework in real-life settings beyond the synthetic simulated setting described earlier. We strongly believe that the synergistic combination of simulation and experimental strategies is necessary for the time- and cost-efficient discovery of antibody therapeutics. Furthermore, we stress that it is currently infeasible to exhaustively generate experimental data for validating ML methods at a scale and breadth of conditions corresponding to the simulation-based analyses reported in this work, as antibodies would have had to be expressed for each simulation condition. If we had also benchmarked different ML architectures, the predictions of each would have had to be validated separately, leading to an endless number of conditions for which to express antibodies. This Gedankenexperiment underlines the immense importance of both simulation frameworks for benchmarking biology-focused ML applications,Citation45,Citation55 and the availability of experimentally validated in silico oracles for (multiparameter) scoring of ML-generated protein sequences.Citation56 Naturally, future refinements of the Absolut! simulation framework would further improve the transferability of conclusions to experimental settings. These refinements include, among others (see Robert et al.Citation32 for a more detailed discussion): 1) modeling of full VH-VL chains (so far, we can only model CDR-H3-antigen binding), 2) a finer angle grid in the lattice (our framework is limited to integer positions on a 3D grid with 90° angles), and 3) constraints at the CDR-H3 ends to reproduce the anchoring of the CDR loops to the framework/conserved domains of the antibody.

Once more experimental data become available, one may venture into merging simulation and experimental training data. For example, one could perform transfer learning based on antibody sequences with only partially determined experimental labels, thus increasing the biological faithfulness of deep-learning-designed antibody sequences.Citation57 Such a setup may be further augmented in the form of federated learning.Citation58 Furthermore, we performed deep learning on amino acid rather than nucleotide sequences, although nucleotide sequences are ultimately required for experimental antibody expression. Because codon usage is often species-specific,Citation59 we opted for the more general amino acid encoding. Nevertheless, our deep learning setup would work equally well on nucleotide sequences.

A key property of in silico generative frameworks such as ours is that, once trained, they pave the way for large-scale, on-demand generation of antigen-specific and developable immune receptor sequences. The fast production of antibodies has seen continued interest from the field.Citation7,Citation8 Although library-based discovery has the potential to generate a higher volume of antigen-specific data than crystallography or related approaches, it remains reliant on multiple rounds of selection and other experimental heuristics. We approached the discovery process by leveraging deep generative models, which implicitly aim to learn the rules of antibody-antigen binding. Once these rules are learned, the generation of vast (virtually limitless) quantities of antibody sequences becomes feasible, abrogating the need for follow-up screening. Rule-based generation also imparts the ability to design (not merely discover) antibody sequences by biasing the deep generative models toward a particular set of developability parameters via reinforcement learning or instance selection.Citation23,Citation60 The combination of near-limitless and fast sequence generation may enable the construction of an on-demand antibody generator, from which antigen-specific antibody sequences can be obtained at will.

In this work, we did not train on datasets selected for both binding and developability, and therefore did not optimize both antibody design objectives at once. This is partly due to the inherent sparsity of the data, although our datasets are the largest currently available. Conditional generation based on several orthogonal sequence and structural propertiesCitation61 within one training dataset is an interesting avenue for future research. Furthermore, we note that we did not optimize the deep generative architecture used in any way. Therefore, our framework allows for optimizing the generative output of deep learning approaches in future benchmarking studies.Citation11,Citation62,Citation63 In addition, further research is needed to understand the relationship between signal (pattern) complexity, encoding and embedding,Citation20 and the number of sequences needed to achieve satisfactory generation quality.

In closing, naturally occurring proteins represent only a small subset of the theoretically possible protein sequence space. Here, we demonstrate proof of principle that deep learning can explore a broader sequence and structural space than is present in the training data, thereby enabling the discovery and design of antibody sequences with enhanced or novel properties.Citation7,Citation64 Moreover, our ground-truth-based framework may be useful for establishing methods for model interpretability.Citation45,Citation64–67

Methods

Reference experimental immunoglobulin and 3D-crystal structure antigen data

Native B-cell receptor (CDR-H3) sequences (nseq = 7 × 10⁶, murine origin [we showed in separate work that murine and human CDR-H3 sequences have similar affinity distributions in the Absolut! antibody-antigen simulation framework]Citation32) were obtained from Greiff et al.Citation38 Ten antigen 3D-crystal structures were sourced from known antibody-antigen complexes in the Antibody Database (AbDb) ()Citation39 and converted into the lattice-based, discretized Absolut! format. To annotate each CDR-H3 sequence for antigen specificity, we determined the best bindingCitation32 position of an antibody sequence on an antigen and calculated the corresponding binding affinity via the software suite Absolut! (see below and Robert et al.Citation32). Antigen-specific CDR-H3 sequences were defined as the top 1% affinity-sorted CDR-H3 sequences for each antigen (nseq = 0.01 × 7 × 10⁶ = 70,000). We chose the top 1% because it selected a sufficiently high number of sequences while ensuring high antigen-specific affinity (see Supplementary Fig. S3 for a comparison of the affinity distribution of all 7 million CDR-H3 sequences [“native”] vs the top 1% by affinity [“native_top”]).
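
For illustration, this selection step can be sketched in a few lines of Python (the file name and column names are hypothetical; in Absolut!, lower, i.e., more negative, binding energies correspond to higher affinity):

```python
import pandas as pd

# Hypothetical table of CDR-H3 sequences annotated with Absolut! binding
# energies (one row per sequence; more negative energy = stronger binding).
df = pd.read_csv("cdrh3_absolut_annotations.tsv", sep="\t")  # columns: CDR_H3, Energy

# Antigen-specific set: the top 1% of sequences by binding energy.
n_top = int(0.01 * len(df))                 # 1% of 7e6 sequences = 70,000
native_top = df.nsmallest(n_top, "Energy")  # most negative energies bind best
native_top["CDR_H3"].to_csv("native_top.txt", index=False, header=False)
```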

Experimental datasets used for the experimental validation of antibody-design conclusions drawn from ML training on simulated antibody-antigen binding data

The two datasets described in this subsection relate to (as well as the related Supplementary Figures). From the HER2 binder/non-binder dataset published by Mason and colleagues,Citation37 we obtained 11,300 unique HER2-binding and 27,539 unique non-binding amino acid CDR-H3 sequences, each ten amino acids in length. From the publication of DeWitt and colleagues,Citation68 we obtained 1,307,472 unique CDR-H3 sequences of length ten amino acids stemming from naive human B cells.

Reference CNN model trained on experimental human epidermal growth factor 2 (HER2) CDR-H3 binder and non-binder sequences

CDR-H3 sequences that bind (binders) and do not bind (non-binders) to HER2 were obtained from Mason and colleagues as described previously.Citation37 The sequences were used to train a convolutional neural network (CNN) classifier that assigns an HER2 binding probability to a given input CDR-H3 sequence. The accuracy of this CNN classifier was experimentally validated. We used the CNN classifier to 1) evaluate the HER2 binding probability of CDR-H3 sequences generated by our RNN-LSTM model trained on the Mason et al. binder/non-binder dataset, 2) serve as an oracle for creating binder/non-binder datasets from our murine CDR-H3 sequences (the murine CDR-H3s were filtered for a length of ten amino acids to comply with the input size of the CNN), and 3) compute the baseline percentages of HER2 binders among murine and human CDR-H3s (see and Supplementary Fig. S16).
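
A hedged sketch of how such an oracle can be applied to score generated sequences is given below (the model path, input encoding, and decision threshold are illustrative assumptions; the published classifier may use a different input representation):

```python
import numpy as np
import tensorflow as tf

AA = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AA)}

def one_hot(seqs, length=10):
    """One-hot encode equal-length CDR-H3 amino acid sequences."""
    x = np.zeros((len(seqs), length, len(AA)), dtype=np.float32)
    for s, seq in enumerate(seqs):
        for p, aa in enumerate(seq):
            x[s, p, AA_INDEX[aa]] = 1.0
    return x

# Hypothetical path to the trained HER2 binder/non-binder CNN classifier.
oracle = tf.keras.models.load_model("her2_cnn_oracle.h5")

generated = ["WGGDGFYAMD", "WGRSGYGAMD"]           # illustrative length-10 CDR-H3s
p_bind = oracle.predict(one_hot(generated))[:, 0]  # HER2 binding probabilities
her2_rate = float(np.mean(p_bind > 0.5))           # fraction classified as binders
```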

Generation of lattice-based antibody-antigen binding structures using Absolut!

The Absolut! software was used to compute the binding energy and best binding structure (here termed paratope fold or binding fold) of antibody (CDR-H3) sequences around the antigens in a 3D lattice space (see Robert et al.Citation32 for a detailed explanation). Briefly, the antigens of interest, named by their PDB entry (), were transformed into a coarse-grained lattice antigen representation (a step called discretization, performed using the program LatFitCitation69), in which each residue occupies one position and consecutive amino acids are neighbors, creating a non-overlapping 3D chain with only 90° angles. In the lattice, a position is encoded as a single integer (for instance, [x = 31, y = 28, z = 15] is encoded as [x + L*y + L*L*z], where L is the lattice dimension: 64). Protein chains are then represented as a starting position and a list of moves; for instance, 63263-SUSDLLUR is a peptide of 9 amino acids starting at position (x = 31, y = 28, z = 15; 31 + 64*28 + 64*64*15 = 63263) and following the moves ‘Straight, Up, Straight, Down, Left, Left, Up, Right’, where each ‘turn’ is defined relative to the previous bond and is coordinate-independent. From each CDR-H3 sequence investigated, all peptides of 11 consecutive amino acids are taken (sliding window with a step size of 1; a window size of 11 was chosen as the best compromise between computational cost and CDR-H3 length/coverage, see Robert et al.Citation32) and assessed for binding to the antigen. By exhaustive enumeration of all possible structures of the peptide around the antigen, Absolut! returns the structure minimizing the energy of the complex (Supplementary Fig. S1). This exhaustive enumeration of all possible binding folds (binding structures) of a CDR-H3 sequence enables Absolut! to function as an oracle, since it can generate the binding fold as well as evaluate the binding energy of any sequence against the antigen of interest. The energy is computed from neighboring, non-covalently bound amino acids either between the CDR-H3 and the antigen (binding energy) or within the CDR-H3 (folding energy), using an empirical, experimentally estimated potential.Citation70 Among all 11-amino-acid peptides of a CDR-H3, the one with the best total (binding + folding) energy is kept, and its structure is called the ‘binding structure’ or the ‘paratope fold’ of the CDR-H3 (Supplementary Fig. S1). The paratope in that structure is the spatial conformation of interacting amino acids on the antibody side, and the epitope is the spatial conformation of interacting amino acids on the antigen side. In this way, each CDR-H3 sequence is annotated with a 3D binding structure (paratope and epitope) and a binding energy (see and Supplementary Fig. S1 for illustration). In summary, using Absolut!, we constructed a dataset of 70 million (7 million CDR-H3 sequences × 10 antigens []) antibody-antigen structures with annotated paratope, epitope, affinity, and antibody developability (see below). The advantages and caveats of the Absolut! simulation suite have been discussed previously by Robert and colleagues.Citation32
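
The integer position encoding follows directly from the rule above and can be illustrated with a short Python sketch (function names are ours, for illustration only):

```python
L = 64  # lattice dimension used by Absolut!

def encode(x, y, z, lattice=L):
    """Encode a lattice position (x, y, z) as a single integer: x + L*y + L*L*z."""
    return x + lattice * y + lattice * lattice * z

def decode(code, lattice=L):
    """Invert the encoding back to (x, y, z) coordinates."""
    x = code % lattice
    y = (code // lattice) % lattice
    z = code // (lattice * lattice)
    return x, y, z

# The example from the text: structure 63263-SUSDLLUR starts at (31, 28, 15).
assert encode(31, 28, 15) == 63263
assert decode(63263) == (31, 28, 15)
```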

Computation of developability parameters

Developability is defined as the “feasibility of molecules to successfully progress from discovery to development via evaluation of their physicochemical properties”.Citation71 Developability parameters (, inspired by the works described in referencesCitation23,Citation37,Citation72) were computed using the module Bio.SeqUtils.ProtParam in BiopythonCitation40 as well as NetMHCpan 4.1 and NetMHCIIpan 4.0.Citation73 For NetMHCpan and NetMHCIIpan, we used the percent rank (the percentile of the predicted binding affinity relative to the distribution of affinities calculated on a set of random natural peptides), where strong binders are typically defined at a percent rank below 2% and weak binders between 2% and 10%. CDR-H3 sequences were used for the calculation of the aforementioned developability design parameters. We note that for NetMHCpan and NetMHCIIpan, we set the window size to 11 and calculated the average percent rank for each CDR-H3 sequence.
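
For illustration, a minimal Biopython sketch computing a subset of such sequence-based parameters (the selection shown here is illustrative; the complete parameter list is given in the corresponding table):

```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis

def developability_profile(cdrh3):
    """A few sequence-based developability parameters for a CDR-H3."""
    pa = ProteinAnalysis(cdrh3)
    return {
        "molecular_weight": pa.molecular_weight(),
        "isoelectric_point": pa.isoelectric_point(),
        "gravy": pa.gravy(),                      # grand average of hydropathy
        "instability_index": pa.instability_index(),
        "aromaticity": pa.aromaticity(),
    }

print(developability_profile("CARDYYGSSYFDYW"))  # illustrative CDR-H3
```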

Deep generative learning using long short-term memory neural networks for generating antibody CDR-H3 sequences

The architecture of the deep generative model consists of three layers (see and Supplementary Fig. S2): 1) an embedding layer with 256 output-vector dimensions, 2) a recurrent neural network of the long short-term memory type (RNN-LSTM)Citation74 with 1024 units, and 3) a fully connected output layer with softmax activation and 21 output dimensions (20 amino acids and one whitespace character) (see ). Input-target pairs, i.e., sequences and their labels, were obtained by first merging the antibody sequences (CDR-H3s) into a text corpus in which sequences were separated by a single whitespace character. A window of size w (w = 42 amino acids) was used to fragment the corpus into input sequences x of length w. For each input sequence x, a target sequence y was created by sliding a window of size w−1 one step forward. The last character was removed from x, creating an input-target pair (x, y), each of size w−1. Thus, the LSTM model g(x, θ) is trained to predict the next character of a given sequence using the categorical cross-entropy loss L(y, g(x, θ)), where θ denotes the parameters (weights) of the LSTM model. We partitioned the input-target pairs into training (70%), validation (15%), and test (15%) sets. Training was carried out for 20 epochs with the Adam optimizer.Citation75 At the end of each epoch, training and validation losses were computed for evaluation. Generation was initiated with a seed string, and the temperature hyperparameter was set to 1. Our implementation is based on TensorFlow 2.0.Citation76
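
A minimal TensorFlow 2 sketch of this setup is given below (batch size and shuffle-buffer size are assumptions; the published implementation may differ in such details):

```python
import tensorflow as tf

VOCAB = list(" ACDEFGHIKLMNPQRSTVWY")  # whitespace separator + 20 amino acids
CHAR2ID = {c: i for i, c in enumerate(VOCAB)}
W = 42  # window size in amino acids

def make_dataset(cdrh3_sequences, batch_size=64):
    """Merge CDR-H3s into a whitespace-separated corpus and cut it into
    (input, target) pairs shifted by one character, each of length W-1."""
    corpus = " ".join(cdrh3_sequences)
    ids = tf.constant([CHAR2ID[c] for c in corpus])
    chunks = tf.data.Dataset.from_tensor_slices(ids).batch(W, drop_remainder=True)
    pairs = chunks.map(lambda s: (s[:-1], s[1:]))  # next-character prediction
    return pairs.shuffle(10_000).batch(batch_size, drop_remainder=True)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(VOCAB), 256),               # 1) embedding layer
    tf.keras.layers.LSTM(1024, return_sequences=True),        # 2) RNN-LSTM layer
    tf.keras.layers.Dense(len(VOCAB), activation="softmax"),  # 3) output layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(make_dataset(train_seqs), epochs=20, validation_data=val_ds)
```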

Implementation of transfer learning

We leveraged transfer learning to examine whether the generation quality of models trained on smaller datasets (data-poor, “low-N”Citation44) may be improved by transferring learned features from models trained on larger training datasets. For a visualization of the transfer learning setup, see . Prior to any transfer learning experiment, we randomly sampled 1% (nsample = 700) and 10% (nsample = 7,000) of the sequences from the set of antigen-specific CDR-H3 sequences (defined as the top 1% affinity-sorted CDR-H3 sequences for each antigen, nsample = 70,000). Sampling was performed 5 times per antigen. Models trained on 70,000 sequences were termed “data-rich” and models trained on 700–7,000 sequences “data-poor”. In a transfer experiment, a transfer learning architecture was constructed by stacking the pre-trained embedding and pre-trained RNN-LSTM layers from a data-rich model under a new fully connected layer (see for the network architecture); a sketch of this step is given below. Training was performed as described in the previous section (“Deep generative learning”). Transfer learning was performed in two ways, termed “within-antigen” and “across-antigens” (see ). “Within-antigen” experiments describe transfer learning within the same antigen (e.g., the embedding and RNN-LSTM layers of the data-rich model stem from the same antigen that is used to train the data-poor model). “Across-antigens” describes the transfer of layers between different antigens (e.g., the combination of a data-rich model of antigen V and a data-poor model of antigen G). The within-antigen experiments served as positive controls, in which stronger signals (from a data-rich model) were used to improve the performance of a weaker model.
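
A hedged sketch of the layer-stacking step (whether the transferred layers are frozen during fine-tuning is an implementation choice; they are assumed trainable here):

```python
import tensorflow as tf

def build_transfer_model(data_rich_model, vocab_size=21):
    """Stack the pre-trained embedding and RNN-LSTM layers of a data-rich
    model under a freshly initialized fully connected output layer."""
    transfer = tf.keras.Sequential([
        data_rich_model.layers[0],  # pre-trained embedding layer
        data_rich_model.layers[1],  # pre-trained RNN-LSTM layer
        tf.keras.layers.Dense(vocab_size, activation="softmax"),  # new head
    ])
    transfer.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return transfer

# "Across-antigens": pre-train on antigen V, fine-tune on the 700-7,000
# low-N sequences of antigen G.
# transfer = build_transfer_model(model_antigen_V)
# transfer.fit(make_dataset(low_n_seqs_antigen_G), epochs=20)
```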

Sequence similarity, composition, and long-range dependencies

Sequence similarity among generated CDR-H3 sequences was determined by Levenshtein distance and gapped k-mer analysis. Levenshtein distances were computed using the distance function in the package Python-Levenshtein.Citation77 Long-range dependencies were assessed by gapped k-mer analysis using the R package kebabsCitation78 as previously described.Citation79,Citation80
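
For the Levenshtein-distance part, a minimal example using the python-Levenshtein package cited above (sequences are illustrative):

```python
import itertools
import Levenshtein  # pip install python-Levenshtein

seqs = ["CARDYYGSSYFDYW", "CARDYYGSSFFDYW", "CARGGLRRGAMDYW"]

# Pairwise Levenshtein distances among generated CDR-H3 sequences.
for a, b in itertools.combinations(seqs, 2):
    print(a, b, Levenshtein.distance(a, b))
```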

Distance between distributions

The similarity between two CDR-H3 affinity distributions was quantified using the Kolmogorov–Smirnov distance (KSD), computed with the function ks.diss from the R package provenance.Citation81 The KSD measures the largest vertical distance between the two examined (cumulative) distributions: a value close to 0 indicates that the distributions are very similar, whereas a larger value (up to the maximum of 1) indicates larger differences between the distributions.
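
An equivalent computation in Python (with placeholder affinity values): the two-sample Kolmogorov-Smirnov statistic is exactly this largest vertical distance between the two empirical cumulative distribution functions.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
native_affinities = rng.normal(-95, 5, size=70_000)     # placeholder values
generated_affinities = rng.normal(-94, 5, size=70_000)  # placeholder values

# KSD = 0 means identical distributions; values near 1 mean maximally different.
ksd = ks_2samp(native_affinities, generated_affinities).statistic
print(f"KSD = {ksd:.3f}")
```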

Mean squared error of positional amino acid frequency matrix

As previously described,Citation82 the difference between two amino acid position-specific frequency matrices () was quantified by the mean squared error (MSE), $\mathrm{MSE}(A,B)=\frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(A_{i,j}-B_{i,j}\right)^{2}$, where A is the reference native amino acid frequency matrix, B is the generated amino acid frequency matrix, n is the size of the 20-amino-acid alphabet, m is the length of the CDR-H3, i is the row index, and j is the column index.
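
A direct numpy implementation of this error measure:

```python
import numpy as np

def pwm_mse(A, B):
    """Mean squared error between two position-specific frequency matrices
    of shape (n=20 amino acids, m=CDR-H3 length)."""
    A, B = np.asarray(A), np.asarray(B)
    assert A.shape == B.shape
    return float(np.mean((A - B) ** 2))
```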

Generation of sequences from position-specific weight matrix (PWM)

Lengthwise PWMs were obtained by partitioning the training sequences according to their length and calculating the resulting frequency matrix for each amino-acid position. The frequency matrix was used to sample amino acids for each position to create new PWM sequences (Supplementary Fig. S3).
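
A minimal numpy sketch of this sampling procedure (operating on one length class, i.e., a set of equal-length training sequences):

```python
import numpy as np

AA = list("ACDEFGHIKLMNPQRSTVWY")

def sample_from_pwm(train_seqs, n_samples, seed=0):
    """Estimate a positional frequency matrix from equal-length training
    sequences and sample new sequences column by column."""
    rng = np.random.default_rng(seed)
    m = len(train_seqs[0])
    counts = np.zeros((len(AA), m))
    for seq in train_seqs:
        for j, aa in enumerate(seq):
            counts[AA.index(aa), j] += 1
    pwm = counts / counts.sum(axis=0)  # per-position amino acid frequencies
    return ["".join(rng.choice(AA, p=pwm[:, j]) for j in range(m))
            for _ in range(n_samples)]

print(sample_from_pwm(["CARDY", "CARDF", "CGRDY"], n_samples=3))
```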

Graphics

Plots were generated using the R package ggplot2Citation83 and arranged using Adobe Illustrator 2020 (Adobe Creative Cloud 5.2.1.441).

Hardware

Computations were performed on the Norwegian e-infrastructure for Research & Education (NIRD/FRAM; https://www.sigma2.no) and a custom server.

List of abbreviations

1D: One dimensional
3D: Three dimensional
CDR-H3: Complementarity-determining region 3 of the heavy chain
CNN: Convolutional neural network
GANs: Generative adversarial networks
HER2: Human epidermal growth factor 2
HIV: Human immunodeficiency virus
KSD: Kolmogorov–Smirnov distance
LD: Levenshtein distance
Low-N: Lower-sized training dataset
LSTM: Long short-term memory
mAb: Monoclonal antibody
MHCI: Major histocompatibility complex I
MHCII: Major histocompatibility complex II
ML: Machine learning
OAS: Observed Antibody Space
PDB: Protein Data Bank
RNN: Recurrent neural network
SARS-CoV-2: Severe acute respiratory syndrome coronavirus 2
TCRβ: T cell receptor beta
VAE: Variational autoencoder

Data and code availability

Preprocessed datasets, code, and results figures are available at:

https://github.com/csi-greifflab/manuscript_insilico_antibody_generation and https://doi.org/10.5281/zenodo.5211239.


Acknowledgments

We acknowledge generous support by The Leona M. and Harry B. Helmsley Charitable Trust (#2019PG-T1D011, to VG), UiO World-Leading Research Community (to VG), UiO:LifeScience Convergence Environment Immunolingo (to VG, GKS, and IHH), EU Horizon 2020 iReceptorplus (#825821) (to VG), a Research Council of Norway FRIPRO project (#300740, to VG), a Research Council of Norway IKTPLUSS project (#311341, to VG and GKS), a Norwegian Cancer Society Grant (#215817, to VG), and Stiftelsen Kristian Gerhard Jebsen (K.G. Jebsen Coeliac Disease Research Centre) (to LMS and GKS). This work was carried out on Immunohub eInfrastructure funded by University of Oslo and jointly operated by GreiffLab and SandveLab (the authors) in close collaboration with the University Center for Information Technology, University of Oslo, IT-Department (USIT). Finally, we thank Derek Mason and Sai T. Reddy for constructive comments and discussion.

Disclosure statement

E.M. declares holding shares in aiNET GmbH. V.G. declares advisory board positions in aiNET GmbH and Enpicom B.V. V.G. is a consultant for Adaptyv Biosystems, Specifica Inc, and Roche/Genentech.

Supplementary material

Supplemental data for this article can be accessed on the publisher’s website.

Additional information

Funding

This work was funded by The Leona M. and Harry B. Helmsley Charitable Trust (#2019PG-T1D011, to VG), UiO World-Leading Research Community (to VG), UiO:LifeScience Convergence Environment Immunolingo (to VG, GKS, and IHH), EU Horizon 2020 iReceptorplus (#825821) (to VG), a Research Council of Norway FRIPRO project (#300740, to VG), a Research Council of Norway IKTPLUSS project (#311341, to VG and GKS), a Norwegian Cancer Society Grant (#215817, to VG), and Stiftelsen Kristian Gerhard Jebsen (K.G. Jebsen Coeliac Disease Research Centre) (to GKS).

References

  • Lu R-M, Hwang Y-C, Liu I-J, Lee -C-C, Tsai H-Z, Li H-J, Wu H-C. Development of therapeutic antibodies for the treatment of diseases. J Biomed Sci. 2020;27(1):1. doi:10.1186/s12929-019-0592-z.
  • Wang C, Li W, Drabek D, Okba NMA, van Haperen R, Osterhaus ADME, van Kuppeveld FJM, Haagmans BL, Grosveld F, Bosch B-J. A human monoclonal antibody blocking SARS-CoV-2 infection. Nat Commun. 2020;11(1):2251. doi:10.1038/s41467-020-16256-y.
  • Marasco WA, Sui J. The growth and potential of human antiviral monoclonal antibody therapeutics. Nat Biotechnol. 2007;25(12):1421–34. doi:10.1038/nbt1363.
  • Liu C, Zhou Q, Li Y, Garner LV, Watkins SP, Carter LJ, Smoot J, Gregg AC, Daniels AD, Jervey S, et al. Research and development on therapeutic agents and vaccines for COVID-19 and related human coronavirus diseases. ACS Cent Sci. 2020;6(3):315–31. doi:10.1021/acscentsci.0c00272.
  • Laustsen AH, Bohn M-F, Ljungars A. The challenges with developing therapeutic monoclonal antibodies for pandemic application. Expert Opin Drug Discov. 2022;17(1):5–8.
  • Torjesen I. Drug development: the journey of a medicine from lab to shelf. Pharm J [Internet] 2015; Available from: https://www.pharmaceutical-journal.com/publications/tomorrows-pharmacist/drug-development-the-journey-of-a-medicine-from-lab-to-shelf/20068196.article?firstPass=false
  • Narayanan H, Dingfelder F, Butté A, Lorenzen N, Sokolov M, Arosio P. Machine learning for biologics: opportunities for protein engineering, developability, and formulation. Trends Pharmacol Sci [Internet]. 2021;42(3):151–65. doi:10.1016/j.tips.2020.12.004.
  • Laustsen AH, Greiff V, Karatt-Vellatt A, Muyldermans S, Jenkins TP. Animal immunization, in vitro display technologies, and machine learning for antibody discovery. Trends Biotechnol [Internet]. 2021;39(12):1263–73. doi:10.1016/j.tibtech.2021.03.003.
  • Carter PJ, Lazar GA. Next generation antibody drugs: pursuit of the “high-hanging fruit”. Nat Rev Drug Discov. 2018;17(3):197–223. doi:10.1038/nrd.2017.227.
  • Fischman S, Ofran Y. Computational design of antibodies. Curr Opin Struct Biol. 2018;51:156–62. doi:10.1016/j.sbi.2018.04.007.
  • Greiff V, Yaari G, Cowell L. Mining adaptive immune receptor repertoires for biological and clinical information using machine learning. Current Opinion in Systems Biology [Internet] 2020; Available from: http://www.sciencedirect.com/science/article/pii/S2452310020300524
  • Brown AJ, Snapkov I, Akbar R, Pavlović M, Miho E, Sandve GK, Greiff V. Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires. Mol Syst Des Eng. 2019;4:701–36.
  • Norman RA, Ambrosetti F, Bonvin AMJJ, Colwell LJ, Kelm S, Kumar S, Krawczyk K. Computational approaches to therapeutic antibody design: established methods and emerging trends. Brief Bioinform [Internet]. 2019. doi:10.1093/bib/bbz095.
  • Graves J, Byerly J, Priego E, Makkapati N, Parish SV, Medellin B, Berrondo M. A review of deep learning methods for antibodies. Antibodies (Basel) [Internet]. 2020;9(2):12. doi:10.3390/antib9020012.
  • Csepregi L, Ehling RA, Wagner B, Reddy ST. Immune literacy: reading, writing, and editing adaptive immunity. iScience. 2020;23(9):101519. doi:10.1016/j.isci.2020.101519.
  • Pittala S, Bailey-Kellogg C, Elofsson A. Learning context-aware structural representations to predict antigen and antibody binding interfaces. Bioinformatics. 2020;36:3996–4003. doi:10.1093/bioinformatics/btaa263.
  • Pertseva M, Gao B, Neumeier D, Yermanos A, Reddy ST. Applications of machine and deep learning in adaptive immunity. Annu Rev Chem Biomol Eng. 2021 [cited 2021 Apr 26]; Available from: https://www.annualreviews.org/doi/abs/10.1146/annurev-chembioeng-101420-125021
  • Wu Z, Johnston KE, Arnold FH, Yang KK. Protein sequence design with deep generative models [Internet]. arXiv [q-bio.QM]2021; Available from: http://arxiv.org/abs/2104.04457
  • Horst A, Smakaj E, Natali EN, Tosoni D, Babrak LM, Meier P, Miho E. Machine learning detects anti-DENV signatures in antibody repertoire sequences. Front Artif Intell [Internet]. 2021;4. Available from. https://www.frontiersin.org/articles/10.3389/frai.2021.715462/full.
  • Leem J, Mitchell LS, Farmery JHR, Barton J, Galson JD. Deciphering the language of antibodies using self-supervised learning [Internet]. bioRxiv2021 [cited 2021 Nov 18]; 2021.11.10.468064. Available from: https://www.biorxiv.org/content/10.1101/2021.11.10.468064v1
  • Ruffolo JA, Gray JJ, Sulam J. Deciphering antibody affinity maturation with language models and weakly supervised learning [Internet]. arXiv [q-bio.BM]2021; Available from: http://arxiv.org/abs/2112.07782
  • Shuai RW, Ruffolo JA, Gray JJ. Generative language modeling for antibody design [Internet]. bioRxiv2021 [cited 2022 Jan 15]; 2021.12.13.472419. Available from: https://www.biorxiv.org/content/10.1101/2021.12.13.472419v1
  • Amimeur T, Shaver JM, Ketchem RR, Alex Taylor J, Clark RH, Smith J, Van Citters D, Siska CC, Smidt P, Sprague M, et al. Designing feature-controlled humanoid antibody discovery libraries using generative adversarial networks [Internet]. bioRxiv2020 [cited 2020 May 28]; 2020.04.12.024844. Available from: https://www.biorxiv.org/content/10.1101/2020.04.12.024844v1
  • Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative Adversarial Nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ editors. Advances in neural information processing systems 27. Massachusetts: Curran Associates, Inc.; 2014. p. 2672–80.
  • Kovaltsuk A, Leem J, Kelm S, Snowden J, Deane CM, Krawczyk K. Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. J Immunol. 2018;201(8):2502–09. doi:10.4049/jimmunol.1800708.
  • Friedensohn S, Neumeier D, Khan TA, Csepregi L, Parola C, de Vries ARG, Erlach L, Mason DM, Reddy ST. Convergent selection in antibody repertoires is revealed by deep learning [Internet]. bioRxiv2020 [cited 2020 May 29]; 2020.02.25.965673. Available from: https://www.biorxiv.org/content/10.1101/2020.02.25.965673v1
  • Kingma DP, Welling M. Auto-encoding variational bayes [Internet]. arXiv [stat.ML]2013; Available from: http://arxiv.org/abs/1312.6114v10
  • Widrich M, Schäfl B, Pavlović M, Ramsauer H, Gruber L, Holzleitner M, Brandstetter J, Sandve GK, Greiff V, Hochreiter S, et al. Modern hopfield networks and attention for immune repertoire classification. Adv Neural Inf Process Syst [Internet]. 2020;33. Available from: http://proceedings.neurips.cc/paper/2020/hash/da4902cb0bc38210839714ebdcf0efc3-Abstract.html.
  • Davidsen K, Olson BJ, DeWitt WS 3rd, Feng J, Harkins E, Bradley P, Matsen FA 4th. Deep generative models for T cell receptor protein sequences. Elife [Internet]. 2019;8. doi:10.7554/eLife.46935.
  • Saka K, Kakuzaki T, Metsugi S, Kashiwagi D, Yoshida K, Wada M, Tsunoda H, Teramoto R. Antibody design using LSTM based deep generative model from phage display library for affinity maturation. Sci Rep. 2021;11(1):5852. doi:10.1038/s41598-021-85274-7.
  • Eguchi RR, Anand N, Choe CA, Huang P-S. IG-VAE: generative modeling of immunoglobulin proteins by direct 3D coordinate generation [Internet]. 2020 [cited 2020 Aug 13]; 2020.08.07.242347. Available from: https://www.biorxiv.org/content/10.1101/2020.08.07.242347v1
  • Robert PA, Akbar R, Frank R, Pavlović M, Widrich M, Snapkov I, Chernigovskaya M, Scheffer L, Slabodkin A, Mehta BB, et al. One billion synthetic 3D-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction [Internet]. bioRxiv. 2021. doi:10.1101/2021.07.06.451258.
  • Robert PA, Meyer-Hermann M. Ymir: a 3D structural affinity model for multi-epitope vaccine simulations. iScience. 2021. doi:10.1016/j.isci.2021.102979.
  • Robert PA, Marschall AL, Meyer-Hermann M. Induction of broadly neutralizing antibodies in germinal centre simulations. Curr Opin Biotechnol. 2018;51:137–45. doi:10.1016/j.copbio.2018.01.006.
  • Preuer K, Renz P, Unterthiner T, Hochreiter S, Klambauer G. Fréchet chemnet distance: a metric for generative models for molecules in drug discovery. J Chem Inf Model. 2018;58(9):1736–41. doi:10.1021/acs.jcim.8b00234.
  • Mathai N, Chen Y, Kirchmair J. Validation strategies for target prediction methods. Brief Bioinform. 2020;21(3):791–802. doi:10.1093/bib/bbz026.
  • Mason DM, Friedensohn S, Weber CR, Jordi C, Wagner B, Meng SM, Ehling RA, Bonati L, Dahinden J, Gainza P, et al. Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat Biomed Eng. 2021;5(6):600–12. doi:10.1038/s41551-021-00699-9.
  • Greiff V, Menzel U, Miho E, Weber C, Riedel R, Cook S, Valai A, Lopes T, Radbruch A, Winkler TH, et al. Systems analysis reveals high genetic and antigen-driven predetermination of antibody repertoires throughout B cell development. Cell Rep. 2017;19(7):1467–78. doi:10.1016/j.celrep.2017.04.054.
  • Ferdous S, Martin ACR. AbDb: antibody structure database-a database of PDB-derived antibody structures. Database [Internet]. 2018. doi:10.1093/database/bay040.
  • Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, et al. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–23. doi:10.1093/bioinformatics/btp163.
  • Engelhart E, Lopez R, Emerson R, Lin C, Shikany C. Massively multiplexed affinity characterization of therapeutic antibodies against SARS-CoV-2 variants. bioRxiv [Internet] 2021; Available from: https://www.biorxiv.org/content/10.1101/2021.04.27.440939v1.abstract
  • Shin J-E, Riesselman AJ, Kollasch AW, McMahon C, Simon E, Sander C, Manglik A, Kruse AC, Marks DS. Protein design and variant prediction using autoregressive generative models. Nat Commun. 2021;12(1):2403. doi:10.1038/s41467-021-22732-w.
  • Akbar R, Robert PA, Pavlović M, Jeliazkov JR. A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding. Cell Reports [Internet] 2021; Available from: https://www.sciencedirect.com/science/article/pii/S2211124721001704
  • Biswas S, Khimulya G, Alley EC, Esvelt KM, Church GM. Low-N protein engineering with data-efficient deep learning. Nat Methods. 2021;18(4):389–96. doi:10.1038/s41592-021-01100-y.
  • AlQuraishi M, Sorger PK. Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms. Nat Methods. 2021;18(10):1169–80. doi:10.1038/s41592-021-01283-4.
  • Ethayarajh K, Jurafsky D. Utility is in the eye of the user: a critique of NLP leaderboards [Internet]. arXiv [cs.CL]2020; Available from: http://arxiv.org/abs/2009.13888
  • Isacchini G, Walczak AM, Mora T, Nourmohammad A. Deep generative selection models of T and B cell receptor repertoires with soNNia. Proc Natl Acad Sci U S A [Internet]. 2021;118. doi:10.1073/pnas.2023141118.
  • Semeniuta S, Severyn A, Gelly S. On accurate evaluation of GANs for language generation [Internet]. arXiv [cs.CL]2018; Available from: http://arxiv.org/abs/1806.04936
  • Renz P, Van Rompaey D, Wegner JK, Hochreiter S, Klambauer G. On failure modes of molecule generators and optimizers. 2020; Available from: https://chemrxiv.org/articles/On_Failure_Modes_of_Molecule_Generators_and_Optimizers/12213542
  • Mensink T, Uijlings J, Kuznetsova A, Gygli M, Ferrari V. Factors of influence for transfer learning across diverse appearance domains and task types [Internet]. arXiv [cs.CV]2021; Available from: http://arxiv.org/abs/2103.13318
  • Gelman S, Romero PA, Gitter A. Neural networks to learn protein sequence-function relationships from deep mutational scanning data. bioRxiv [Internet] 2020; Available from: https://www.biorxiv.org/content/10.1101/2020.10.25.353946v1.abstract
  • Rao R, Liu J, Verkuil R, Meier J, Canny JF, Abbeel P, Sercu T, Rives A. MSA transformer [Internet]. bioRxiv 2021 [cited 2021 Feb 18]; 2021.02.12.430858. Available from: https://www.biorxiv.org/content/10.1101/2021.02.12.430858v1
  • Kurczab R, Bojarski AJ. The influence of the negative-positive ratio and screening database size on the performance of machine learning-based virtual screening. PLoS One. 2017;12(4):e0175410. doi:10.1371/journal.pone.0175410.
  • Kim J, Kim J. The impact of imbalanced training data on machine learning for author name disambiguation. Scientometrics. 2018;117(1):511–26. doi:10.1007/s11192-018-2865-9.
  • Pavlović M, Scheffer L, Motwani K, Kanduri C, Kompova R, Vazov N, Waagan K, Bernal FLM, Costa AA, Corrie B, et al. The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires. Nat Mach Intell. 2021;3(11):936–44. doi:10.1038/s42256-021-00413-z.
  • Gane A, Belanger D, Dohan D, Angermueller C, Vora RDS, Chapelle O, Alipanahi B, Murphy K, Colwell L. A comparison of generative models for sequence design [Internet]. [cited 2021 Oct 24]; Available from: https://research.google/pubs/pub49141.pdf
  • Seib V, Lange B, Wirtz S. Mixing real and synthetic data to enhance neural network training – a review of current approaches [Internet]. arXiv [cs.CV]2020; Available from: http://arxiv.org/abs/2007.08781
  • Shen J, Nicolaou CA. Molecular property prediction: recent trends in the era of artificial intelligence. Drug Discov Today Technol [Internet] 2020; Available from: http://www.sciencedirect.com/science/article/pii/S1740674920300032
  • Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C. A new and updated resource for codon usage tables. BMC Bioinform. 2017;18(1):391. doi:10.1186/s12859-017-1793-7.
  • DeVries T, Drozdzal M, Taylor GW. Instance selection for GANs [Internet]. arXiv [cs.CV]2020; Available from: http://arxiv.org/abs/2007.15255
  • Jin W, Wohlwend J, Barzilay R, Jaakkola T. Iterative refinement graph neural network for antibody sequence-structure co-design [Internet]. arXiv [q-bio.BM]2021; Available from: http://arxiv.org/abs/2110.04624
  • Chen X, Dougherty T, Hong C, Schibler R, Zhao YC, Sadeghi R, Matasci N, Wu Y-C, Kerman I. Predicting antibody developability from sequence using machine learning [Internet]. 2020 [cited 2020 Oct 9]; 2020.06.18.159798. Available from: https://www.biorxiv.org/content/10.1101/2020.06.18.159798v1.abstract
  • Melnyk I, Das P, Chenthamarakshan V, Lozano A. Benchmarking deep generative models for diverse antibody sequence design [Internet]. arXiv [q-bio.BM]2021; Available from: http://arxiv.org/abs/2111.06801
  • Gao W, Mahajan SP, Sulam J, Gray JJ. Deep learning in protein structural modeling and design [Internet]. arXiv [q-bio.BM]2020; Available from: http://arxiv.org/abs/2007.08383
  • Jiménez-Luna J, Grisoni F, Schneider G. Drug discovery with explainable artificial intelligence [Internet]. arXiv [cs.AI]2020; Available from: http://arxiv.org/abs/2007.00523
  • Preuer K, Klambauer G, Rippmann F, Hochreiter S, Unterthiner T. Interpretable deep learning in drug discovery. In: Samek W, Montavon G, Vedaldi A, Hansen LK, Müller K-R, editors. Explainable AI: interpreting, explaining and visualizing deep learning. Cham: Springer International Publishing; 2019. p. 331–45.
  • Ruffolo JA, Sulam J, Gray JJ. Antibody structure prediction using interpretable deep learning [Internet]. bioRxiv2021 [cited 2021 Jul 2]; 2021.05.27.445982. Available from: https://www.biorxiv.org/content/10.1101/2021.05.27.445982v1
  • DeWitt WS, Lindau P, Snyder TM, Sherwood AM, Vignali M, Carlson CS, Greenberg PD, Duerkopp N, Emerson RO, Robins HS. A public database of memory and naive B-cell receptor sequences. PLoS One. 2016;11(8):e0160853. doi:10.1371/journal.pone.0160853.
  • Mann M, Saunders R, Smith C, Backofen R, Deane CM. Producing high-accuracy lattice models from protein atomic coordinates including side chains. Adv Bioinformatics. 2012;2012:148045. doi:10.1155/2012/148045.
  • Miyazawa S, Jernigan RL. Residue–residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J Mol Biol. 1996;256(3):623–44. doi:10.1006/jmbi.1996.0114.
  • Bailly M, Mieczkowski C, Juan V, Metwally E, Tomazela D, Baker J, Uchida M, Kofman E, Raoufi F, Motlagh S, et al. Predicting antibody developability profiles through early stage discovery screening. MAbs. 2020;12(1):1743053. doi:10.1080/19420862.2020.1743053.
  • Raybould MIJ, Marks C, Krawczyk K, Taddese B, Nowak J, Lewis AP, Bujotzek A, Shi J, Deane CM. Five computational developability guidelines for therapeutic antibody profiling. Proc Natl Acad Sci U S A. 2019;116:4025–30.
  • Reynisson B, Barra C, Kaabinejadian S, Hildebrand WH, Peters B, Nielsen M. Improved prediction of MHC II antigen presentation through integration and motif deconvolution of mass spectrometry MHC eluted ligand data. J Proteome Res [Internet]. 2020;19(6):2304–15. doi:10.1021/acs.jproteome.9b00874.
  • Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80. doi:10.1162/neco.1997.9.8.1735.
  • Kingma DP, Ba J. Adam: a method for stochastic optimization [Internet]. arXiv [cs.LG]2014; Available from: http://arxiv.org/abs/1412.6980
  • Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems [Internet]. arXiv [cs.DC]2016; Available from: http://arxiv.org/abs/1603.04467
  • Ohtamaa M. python-Levenshtein [Internet]. [accessed 2016 Mar 12]. Available from: https://github.com/miohtama/python-Levenshtein; https://pypi.org/project/python-Levenshtein/
  • Palme J, Hochreiter S, Bodenhofer U. KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics. 2015;btv176.
  • Weber CR, Akbar R, Yermanos A, Pavlović M, Snapkov I, Sandve GK, Reddy ST, Greiff V. immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking. Bioinformatics [Internet]. 2020;36(11):3594–96. doi:10.1093/bioinformatics/btaa158.
  • Greiff V, Weber CR, Palme J, Bodenhofer U, Miho E, Menzel U, Reddy ST. Learning the high-dimensional immunogenomic features that predict public and private antibody repertoires. J Immunol. 2017;199(8):2985–97. doi:10.4049/jimmunol.1700594.
  • Vermeesch P, Resentini A, Garzanti E. An R package for statistical provenance analysis. Sediment Geol. 2016;336:14–25. doi:10.1016/j.sedgeo.2016.01.009.
  • Mason DM, Weber CR, Parola C, Meng SM, Greiff V, Kelton WJ, Reddy ST. High-throughput antibody engineering in mammalian cells by CRISPR/Cas9-mediated homology-directed mutagenesis. Nucleic Acids Res. 2018;46(14):7436–49. doi:10.1093/nar/gky550.
  • Wickham H. ggplot2: elegant graphics for data analysis. New York: Springer-Verlag New York; 2009.