ESMFold2 is the structure-prediction engine within the ESM3 architecture. It predicts atomic-level protein structures directly from sequence data at state-of-the-art speed and accuracy, without requiring the multiple sequence alignments that slow down traditional methods. This speed is what makes large-scale structural coverage practical.
ESM Atlas has been dramatically expanded. The original ESM Metagenomic Atlas from Meta FAIR covered roughly 600 million protein structures. Biohub's updated atlas now charts 6.8 billion proteins with 1.1 billion predicted structures: an order-of-magnitude increase in the number of proteins charted, and nearly double the number of predicted structures, giving structural coverage across far more of the protein universe.
Additionally, the release includes esm3-sm-open-v1, a generative model trained on 2.78 billion natural proteins and augmented with synthetic data to 3.15 billion sequences, 236 million structures, and 539 million function annotations, totaling 771 billion tokens. This model is released under a non-commercial license for academic and non-profit use.
The practical promise is speed and scale. Designing and validating therapeutic protein binders traditionally takes months or years of iterative wet-lab work. Biohub's tools compress this into weeks or days by enabling three capabilities:
A recurring criticism of AI-designed proteins is that they look good on a computer but fail in the lab. Biohub reports that this is not the case here: binders designed entirely in silico using these models have been validated in laboratory experiments, where the AI-designed proteins bound their intended targets.
Alex Rives, head of science at Biohub, stated that "these models have acquired such a precise representation of biological processes that it allows for the computational design of protein interfaces, which can then be tested in the lab with expected results." The implication is that the models have captured enough fundamental biology to produce functional designs without iterative wet-lab optimization.
On April 29, 2026, Biohub announced the Virtual Biology Initiative (VBI), a five-year, $500 million commitment to build the multi-modal datasets and AI models needed for predictive models of human cells. Of that funding, $100 million is allocated to coordinating global data-generation efforts, and $400 million is dedicated to generating data at scale and developing next-generation technologies for measuring, imaging, and engineering biology.
The protein biology release is the first major scientific output under VBI. The initiative's partners span many of the most prominent institutions in biology and technology: the Broad Institute, Allen Institute, Arc Institute, Wellcome Sanger Institute, Human Cell Atlas, Human Protein Atlas, NVIDIA, and Renaissance Philanthropy.
The ESM family did not start at Biohub. It was originally developed at Meta AI's FAIR lab, which published the first ESM-1 models and released the original ESMFold in Science in 2023, generating the first 600-million-plus protein structure predictions. That work produced the original ESM Metagenomic Atlas, which at the time was the largest database of high-resolution predicted structures, roughly three times larger than any existing protein structure database.
When EvolutionaryScale, the startup formed by the original FAIR ESM team, spun out of Meta, Biohub absorbed and continued the research. This fourth-generation release builds directly on that lineage, with Biohub now leading development as an open philanthropic science venture.
Researchers can experiment with and deploy these tools across multiple platforms:
Hugging Face: esm3-sm-open-v1 and ESMC 600M are hosted at huggingface.co/biohub/ under a non-commercial license.
biohub.org/ai-models: for exploring and downloading the models.