Zero-Shot AI-Guided Mutations For Antibody Fitness Optimization
Two Stanford papers demonstrate improvements in antibody binding affinity, enzyme activity, and antibiotic resistance using only AI models trained on general protein sequences.
Antibody discovery campaigns typically start by collecting a set of binders to an antigen using an experimental screen such as phage display. The task then becomes optimizing affinity to the antigen of interest while maintaining developability properties. A purely experimental approach would be directed evolution, i.e. iteratively testing variants produced through mutagenesis. This blog post investigates two papers that massively increase the throughput of evolving fitter proteins by using protein language models.
Reading Shanker et al.’s Editor’s summary, I was impressed by the comment “In experimental screens of virus-neutralizing antibodies, the authors observed substantial improvement in binding affinity and neutralization for their predicted sequences”, especially since their ML model hadn’t been trained on any data beyond publicly available, general protein sequences.
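Shanker et al.’s model is structure-informed: ESM-IF1, an inverse folding model, assigns a likelihood to a candidate sequence given a fixed backbone. Here is a minimal sketch of that scoring step, assuming the open-source fair-esm package (it is not the authors’ exact pipeline, and the PDB file name and chain ID are placeholders):

```python
# pip install fair-esm biotite (ESM-IF1 also needs torch; older versions need torch-geometric)
import esm
import esm.inverse_folding

# ESM-IF1: an inverse folding model trained to predict sequence from backbone
# structure, used here to score how well a sequence "fits" a fixed backbone.
model, alphabet = esm.pretrained.esm_if1_gvp4_t16_142M_UR50()
model.eval()

# Placeholder inputs: any backbone structure and the chain to evolve.
structure = esm.inverse_folding.util.load_structure("antibody.pdb", "H")
coords, wildtype_seq = esm.inverse_folding.util.extract_coords_from_structure(structure)

def score(seq: str) -> float:
    # Average log-likelihood of the sequence given the backbone coordinates.
    ll_fullseq, _ = esm.inverse_folding.util.score_sequence(model, alphabet, coords, seq)
    return ll_fullseq

# Variants scoring above the wild type are candidates for experimental testing.
print(score(wildtype_seq))
```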
The implication is that language model confidence (or a consensus across an ensemble of models) is sufficient not only to recommend biologically sound mutations, but also to improve a protein’s fitness on tasks the model was never trained for. And in only two rounds of evolution, no less.
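In the sequence-only setting, “confidence” means the model’s per-position likelihoods. Below is a minimal sketch of scoring every single substitution against the wild type with one ESM-1v model via the fair-esm package; it illustrates the general idea rather than either paper’s exact procedure.

```python
# pip install fair-esm torch
import torch
import esm

# One general protein language model; the papers use ensembles of such models.
model, alphabet = esm.pretrained.esm1v_t33_650M_UR90S_1()
model.eval()
batch_converter = alphabet.get_batch_converter()

def mutation_scores(seq: str) -> dict:
    """Log-likelihood ratio of each single substitution vs. wild type,
    scored with the wild-type sequence as context."""
    _, _, tokens = batch_converter([("wt", seq)])
    with torch.no_grad():
        logits = model(tokens)["logits"]          # shape: (1, L+2, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)[0]
    scores = {}
    for i, wt in enumerate(seq):
        pos = i + 1                               # offset for the BOS token
        for aa in "ACDEFGHIKLMNPQRSTVWY":
            if aa == wt:
                continue
            scores[(i, aa)] = (log_probs[pos, alphabet.get_idx(aa)]
                               - log_probs[pos, alphabet.get_idx(wt)]).item()
    return scores  # positive values = model prefers the mutant over wild type
```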
Another paper by a subset of the authors of Shanker et al., Hie et al., finds that language-model-guided substitutions “improved the binding affinities of four clinically relevant, highly mature antibodies up to sevenfold and three unmatured antibodies up to 160-fold”. This suggests that training language models on all available protein sequences can emulate the evolutionary pressures otherwise applied through mutagenesis screens.
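As I understand the paper, Hie et al. score substitutions with several such models and recommend the ones that a consensus of the ensemble prefers over the wild type. A sketch of that selection rule over per-model score dictionaries like the one above (the vote threshold here is illustrative, not the paper’s exact cutoff):

```python
from collections import Counter

def consensus_mutations(score_dicts, min_votes=2):
    """Return substitutions (position, amino acid) that at least
    `min_votes` models score above wild type, i.e. with a positive
    log-likelihood ratio."""
    votes = Counter()
    for scores in score_dicts:
        for mutation, llr in scores.items():
            if llr > 0:
                votes[mutation] += 1
    return sorted(mutation for mutation, v in votes.items() if v >= min_votes)

# Toy example: scores from three ensemble members for a two-residue protein.
ensemble = [
    {(0, "A"): 0.8, (1, "K"): -0.1},
    {(0, "A"): 0.3, (1, "K"): 0.2},
    {(0, "A"): -0.2, (1, "K"): 0.5},
]
print(consensus_mutations(ensemble))  # [(0, 'A'), (1, 'K')]
```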
And while the authors focus on antibody affinity maturation and neutralization tasks, the approach should in principle be applicable to any loosely defined “fitness” optimization task. Hie et al. give the example of enzyme kinetics improving from 3% to 20% for an alkaline phosphatase.
Interestingly, the majority of mutations produced through the language modeling approach fell in framework regions, whereas traditional approaches almost always focus on the complementarity-determining regions (CDRs). This is a promising result: AI-guided approaches may not necessarily replace experimental ones, but rather unlock alternatives that traditional methodologies can’t feasibly attempt.
Both approaches discussed are easily accessible through the Tamarind web platform for any type of protein input. Take a look below and get in touch at info@tamarind.bio to learn more about how to securely optimize your proprietary sequences!
Evolution starting from structure: https://www.tamarind.bio/structural-evolution
Evolution starting from sequence: https://www.tamarind.bio/antibody-evolution