Cornell Tech researchers found that deep-research AI agents are highly vulnerable to a simple attack called WARP. The attack succeeds because the agents retrieve the same user-generated content pages for up to 48% of related queries.

The next time you ask an AI research tool for the best dating app or how to cancel a subscription, the answer could be planted by a scammer using little more than a single sentence buried in a Reddit comment. A new preprint from Cornell Tech by Tingwei Zhang, Harold Triedman, and Vitaly Shmatikov demonstrates that deep-research AI agents are alarmingly easy to manipulate through an attack the researchers call WARP, or Web Agent Retrieval Poisoning.
Deep-research agents like STORM, Co-STORM, and OmniThink work by issuing many related search queries and synthesizing the retrieved information into a comprehensive report. The Cornell researchers discovered a critical weakness: these agents are overwhelmingly dependent on user-generated content (UGC). Between 54% and 71% of all URLs retrieved during a research session come from UGC platforms, with Reddit and Wikipedia the most frequently consulted sources.
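To make the 54% to 71% figure concrete, here is a minimal sketch of how the UGC share of a session's retrieved URLs could be measured. The domain list, the function names, and the sample URLs are assumptions made for illustration; the article does not describe the study's actual measurement code.

```python
from urllib.parse import urlparse

# Assumption: an illustrative UGC domain list, not the one used in the study.
UGC_DOMAINS = ("reddit.com", "wikipedia.org")


def is_ugc(url: str) -> bool:
    """Return True if the URL's host is a UGC domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in UGC_DOMAINS)


def ugc_share(urls: list[str]) -> float:
    """Fraction of retrieved URLs that point at UGC platforms."""
    if not urls:
        return 0.0
    return sum(is_ugc(u) for u in urls) / len(urls)


# Hypothetical URLs standing in for one research session's retrievals.
session_urls = [
    "https://www.reddit.com/r/example/comments/abc/",
    "https://en.wikipedia.org/wiki/Example",
    "https://example.com/review",
    "https://news.example.org/story",
]
print(ugc_share(session_urls))  # 0.5
```

Matching on the host suffix rather than a plain substring keeps look-alike domains such as `notreddit.com` from being counted as UGC.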
This concentration creates an exploitable attack surface. The attacker simply posts a crafted comment on an existing, popular Reddit thread, or discreetly edits a Wikipedia page, to promote a specific target entity such as a fake product or fraudulent service. Because the agents repeatedly retrieve the same high-ranking UGC pages across many different queries on a topic, a single poisoned page can infect the agent's entire research context.
The results are striking in their efficiency. The study found that poisoned text as short as 13 words was enough to achieve mention rates of 38% to 62%, meaning the attacker's target entity appeared directly in the agent's final output for that share of queries. According to the paper, the effect held across multiple query clusters and different underlying agent architectures, which suggests the vulnerability is structural rather than limited to a single system.
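As a rough illustration of the metric, a mention rate can be computed as the fraction of final reports that name the attacker's target. The sketch below assumes a simple case-insensitive substring match; the article does not say how the study decided that a report "mentions" the target, and the product name and report texts are invented.

```python
def mention_rate(reports: list[str], target: str) -> float:
    """Fraction of reports whose text contains the target name (case-insensitive)."""
    if not reports:
        return 0.0
    needle = target.casefold()
    hits = sum(needle in report.casefold() for report in reports)
    return hits / len(reports)


# Hypothetical reports; "AcmeMatch" is an invented target entity.
reports = [
    "Top picks include AcmeMatch and two established apps.",
    "Most reviewers prefer the established apps.",
    "Several users recommend acmematch for new daters.",
    "No single app stands out in the sources reviewed.",
]
print(mention_rate(reports, "AcmeMatch"))  # 0.5
```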
The attack also does not make the report read as nonsensical or low quality. The injected text blends plausibly with legitimate content, making the subtle promotion of a scam product difficult for both users and automated filters to spot.
The core of the problem is retrieval overlap. The researchers observed that the same Reddit pages appeared in search results for as many as 48% of related queries within a single topic cluster. This means that poisoning one well-trafficked Reddit thread can influence nearly half of the queries on a subject, whether the topic is "best roadside assistance," "how to cancel a subscription," or "top-rated dating apps." That concentration turns a single point of failure into a broad-spectrum vulnerability.
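Retrieval overlap of this kind can be sketched as a per-page coverage score: for each URL, the fraction of queries in a cluster whose results include it. The function name, the queries, and the URLs below are hypothetical, and the study's exact overlap definition may differ.

```python
from collections import Counter


def page_coverage(results_by_query: dict[str, list[str]]) -> dict[str, float]:
    """Map each URL to the fraction of queries whose results include it."""
    total = len(results_by_query)
    if total == 0:
        return {}
    counts: Counter[str] = Counter()
    for urls in results_by_query.values():
        counts.update(set(urls))  # count a URL at most once per query
    return {url: n / total for url, n in counts.items()}


# Hypothetical search results for one topic cluster.
cluster = {
    "best dating apps": ["https://www.reddit.com/r/dating/t1", "https://a.example/1"],
    "top-rated dating apps": ["https://www.reddit.com/r/dating/t1", "https://b.example/2"],
    "dating app safety": ["https://c.example/3"],
    "dating app pricing": ["https://d.example/4"],
}
coverage = page_coverage(cluster)
print(coverage["https://www.reddit.com/r/dating/t1"])  # 0.5
```

A page with high coverage is the kind of target the article describes: one edit to it reaches every query that retrieves it.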
The research team tested three straightforward defense strategies and found each one either ineffective or self-defeating.
Blocking UGC domains entirely stops the attack immediately by removing tainted Reddit and Wikipedia pages from the retrieval pool. However, this defense is a cure worse than the disease: UGC platforms provide the rich, detailed, experiential information that makes deep-research agents valuable in the first place. Removing them renders the agents unable to produce the thorough reports users expect.
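The cost of this defense is easy to see in a small sketch. The blocklist filter below is an assumed, simplified version of domain blocking, not the researchers' implementation, and the URLs are invented so that 60% of them are UGC, a value inside the 54% to 71% range reported above.

```python
from urllib.parse import urlparse

# Assumption: an illustrative blocklist built from the two platforms the article names.
BLOCKED_DOMAINS = ("reddit.com", "wikipedia.org")


def drop_blocked(urls: list[str], blocked: tuple[str, ...] = BLOCKED_DOMAINS) -> list[str]:
    """Return only the URLs whose host is not a blocked domain or subdomain."""
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if not any(host == d or host.endswith("." + d) for d in blocked):
            kept.append(url)
    return kept


# Hypothetical retrieval pool: three UGC pages and two others.
retrieved = [
    "https://www.reddit.com/r/example/comments/1/",
    "https://www.reddit.com/r/example/comments/2/",
    "https://en.wikipedia.org/wiki/Example",
    "https://example.com/guide",
    "https://news.example.org/story",
]
kept = drop_blocked(retrieved)
print(f"kept {len(kept)} of {len(retrieved)} URLs")  # kept 2 of 5 URLs
```

The poisoned page is gone, but so is most of the material the agent would have used to write its report.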
Using the agent's own language model to screen sources before retrieval sometimes catches obvious poisoning but is fundamentally unreliable. A well-crafted piece of poisoned text, written in the same tone as surrounding legitimate comments, evades these checks easily. The approach also adds significant processing latency and cost without a proportionate gain in security.
Applying plausibility checks to the final output can flag some extreme or logically inconsistent recommendations. The problem is that WARP attacks are designed to be subtle. The poisoned injection is short, context-appropriate, and does not degrade the overall quality of the report. The final document passes plausibility reviews with no obvious red flags, even though it now silently recommends an attacker-chosen product.
The study's conclusion is sobering. The vulnerability is not a software bug that can be patched; it is a fundamental consequence of how these agents are designed to operate. Their heavy reliance on a small set of repeatedly retrieved UGC pages creates a concentrated, exploitable attack surface that no existing defense can seal without also breaking the agents' core functionality.