AI Agents Are Failing at Basic Biology: The Data Plumbing Crisis
A landmark study by Anthropic, NCBI, the Broad Institute, and the Chan Zuckerberg Initiative found that top AI models fail catastrophically at retrieving viral sequence data, with accuracy as low as 16.9%, because pub... The underlying problem is that biological data infrastructure lacks deterministic, reproducible...
The gap between AI and biology is not a failure of intelligence but of infrastructure, a lesson made clear by new research from Anthropic and leading scientific institutions.
A blockbuster collaboration between Anthropic, NCBI, the Broad Institute, and the Chan Zuckerberg Initiative has exposed a dirty secret of AI-driven science: today's most powerful AI agents are utterly unreliable for a task as simple as fetching viral sequences from a public database. The research, published in June 2026, found that models like Claude Sonnet 4 achieved as low as 16.9% accuracy on this routine job. But the culprit isn't the AI's intelligence; it's the plumbing. The infrastructure was designed for humans clicking through web forms, not autonomous agents. By building a deterministic retrieval layer called gget virus, the team raised accuracy to at least 90.0% for every tested model and as high as 99.7%, showing that fixing the data pipes is the fastest path to trustworthy AI biology.
Why AI agents crash on biological databases
Laura Luebbert and her colleagues framed the issue with a powerful analogy: using an AI agent to navigate biological data is like driving a modern car through a medieval city. The car is technically advanced, but the roads were never designed for it.
The collaboration tested several leading AI systems (Claude, GPT-based models, Biomni Open Source, and Edison Analysis) on the seemingly straightforward task of retrieving viral sequence data from NCBI Virus, a go-to resource for virologists tracking outbreaks and developing diagnostics. The results were alarming.
NCBI Virus and many other public biological databases were built for interactive, browser-based workflows. Scientists click through filters, manually inspect results, and rely on visual cues. This interface logic is incompatible with autonomous agents, which expect structured, programmatic commands.
Radically non-deterministic results
The most damning finding was inconsistency. When researchers asked Claude Sonnet 4 three times to retrieve Ebolavirus sequences against a verified ground truth of 266 sequences, it returned 106 on the first try, 15 on the second, and just 5 on the third. The prompt never changed; only the output did.
This isn't just about missing a few records. In one simulation, a faulty retrieval skewed a phylogenetic analysis so badly that it dated the origin of an Ebola outbreak to 1922 instead of the correct 2014. The AI hadn't hallucinated the science; it had been fed a broken dataset and dutifully built a false conclusion on top of it.
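To put the three Ebolavirus runs in perspective, the short sketch below computes what fraction of the 266 ground-truth sequences each run could have recovered. Treating "returned count divided by ground truth" as the measure is an assumption made here for illustration, and it further assumes every returned record was a correct one. The study's own accuracy and stability metrics may be defined differently.

```python
# Illustrative arithmetic only. The counts come from the article; the
# "fraction recovered" measure is an assumption, not the study's metric.
GROUND_TRUTH = 266
RUN_COUNTS = [106, 15, 5]  # sequences returned on runs 1, 2 and 3


def fraction_recovered(returned: int, ground_truth: int) -> float:
    """Upper bound on recall: assumes every returned record is correct."""
    return returned / ground_truth


fractions = [fraction_recovered(n, GROUND_TRUTH) for n in RUN_COUNTS]

# Ratio between the largest and smallest result for the same prompt.
spread = max(RUN_COUNTS) / min(RUN_COUNTS)

for run, (n, frac) in enumerate(zip(RUN_COUNTS, fractions), start=1):
    print(f"run {run}: {n}/{GROUND_TRUTH} = {frac:.1%}")
print(f"largest run returned {spread:.1f}x as many records as the smallest")
```

Under those assumptions, even the best run recovers only about 40% of the expected sequences, and identical prompts differ by a factor of roughly 21 in how many records come back.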
Brittle, fragmented infrastructure
Biological data is scattered across dozens of databases with incompatible identifiers, differing metadata standards, and no version-controlled APIs. Software engineers rely on package managers and versioned endpoints; computational biologists are often stuck scripting against inconsistent web interfaces that change without notice.
The deterministic fix: gget virus
Rather than training a better model, the team built a better retrieval layer. gget virus is a lightweight, deterministic framework that formalizes the filtering logic of NCBI Virus into a reproducible programmatic system.
It works by applying metadata constraints before downloading sequences, fetching only the structured GenBank records that match, and cutting data transfer by over 98% for high-volume queries while preserving exact-match semantics. The result is the same dataset every time, a property that AI agents desperately need but the old infrastructure couldn't deliver.
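The metadata-first pattern can be sketched in a few lines of Python. This is not the gget virus implementation or its API: the `Record` type, the in-memory `CATALOG`, and the `select_accessions` and `fetch_sequences` helpers are hypothetical stand-ins, and the accession strings are placeholders rather than real GenBank identifiers. The sketch only illustrates the idea of filtering on metadata first, with exact matching and a fixed result order, and downloading sequences second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    accession: str  # placeholder ID, not a real GenBank accession
    virus: str
    host: str
    length: int     # sequence length in nucleotides


# Hypothetical metadata catalog; a real one would come from the database.
CATALOG = [
    Record("ACC0003", "Ebolavirus", "Homo sapiens", 18959),
    Record("ACC0001", "Ebolavirus", "Homo sapiens", 18957),
    Record("ACC0002", "Ebolavirus", "Pan troglodytes", 18900),
    Record("ACC0004", "Marburgvirus", "Homo sapiens", 19100),
]


def select_accessions(catalog, virus, host=None, min_length=0):
    """Step 1: filter on metadata only, using exact-match comparisons."""
    hits = [
        r.accession
        for r in catalog
        if r.virus == virus
        and (host is None or r.host == host)
        and r.length >= min_length
    ]
    return sorted(hits)  # fixed order, so repeated calls are identical


def fetch_sequences(accessions):
    """Step 2: download only the selected records (stubbed out here)."""
    return {acc: f"<sequence for {acc}>" for acc in accessions}


selected = select_accessions(CATALOG, "Ebolavirus", host="Homo sapiens")
sequences = fetch_sequences(selected)
print(selected)
```

Because selection depends only on the catalog and the explicit filter arguments, running the same query twice yields the same accession list, and only the matching records are ever fetched.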
The impact was immediate and dramatic. When autonomous AI systems used gget virus as their retrieval backend:
Accuracy jumped to at least 90.0% for all tested models, with GPT-5.5 reaching 99.7%.
Stability metrics rose to 0.92–1.00 across the board.
Error magnitude, especially the catastrophic kind that shifts scientific conclusions, collapsed.
The takeaway is unambiguous: the binding constraint on AI-powered biology is not model reasoning but deterministic data access. Add the right retrieval layer, and today's agents can already do reliable work.
Rethinking biological data infrastructure for the agent era
The gget virus success story is a proof of concept for a much larger shift. The researchers argue this pattern is not limited to virology: NCBI alone hosts over 30 databases that would benefit from similar deterministic wrappers.
From human-oriented to agent-native design
Biological databases must evolve to expose well-documented, versioned APIs with standardized filtering and reproducible query semantics. This is the equivalent of what software developers get from package managers and version control systems: critical infrastructure that biological science currently lacks.
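One way to picture "reproducible query semantics" is a query that is pinned to a data release and serialized canonically, so that the same request always produces the same fingerprint. The sketch below is an illustration under stated assumptions: the `QuerySpec` fields, the release labels, and the `fingerprint` helper are hypothetical and do not describe any existing NCBI or CZI interface.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class QuerySpec:
    database: str
    release: str  # hypothetical pinned data release label
    filters: dict = field(default_factory=dict)


def fingerprint(spec: QuerySpec) -> str:
    """Hash a canonical JSON form so key order cannot change the result."""
    canonical = json.dumps(asdict(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = QuerySpec("virus", "2026-06", {"taxon": "Ebolavirus", "host": "Homo sapiens"})
b = QuerySpec("virus", "2026-06", {"host": "Homo sapiens", "taxon": "Ebolavirus"})
c = QuerySpec("virus", "2026-07", {"taxon": "Ebolavirus", "host": "Homo sapiens"})

print(fingerprint(a) == fingerprint(b))  # same query, different key order
print(fingerprint(a) == fingerprint(c))  # same filters, different release
```

A fingerprint like this could be stored alongside a result set, letting an agent or a reviewer confirm later that two analyses were run against the same query and the same data release.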
The push for federated, AI-scale data
In a parallel effort, the Chan Zuckerberg Initiative published a roadmap calling for interoperable, pooled biological datasets that can be queried via command-line interfaces and machine-readable standards. Their vision: a world where scientists can search, analyze, and download multi-modal data in a single federated query, enabling AI-scale discovery without the current retrieval chaos.
CZI is already acting on this, developing a CLI for federated data access and building the Billion Cells Project, a landmark single-cell dataset intended to train next-generation AI models. The goal is foundational infrastructure that makes biological data as accessible to machines as code repositories are to developers.
The lesson isn't limited to biology
The core insight, that legacy human-first interfaces break AI agents, generalizes across scientific computing. Deterministic, programmatic access layers aren't a luxury; they are a prerequisite for letting autonomous systems participate reliably in research. The fix isn't waiting for a smarter model. It's upgrading the roads.
Source: "A path towards AI-scale, interoperable biological data," arXiv (PDF).