Answer published 2 weeks ago · Last edited 2 weeks ago · 12 sources

A 13-word Reddit comment is enough to poison AI deep-research agents into recommending fakes


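Cornell Tech researchers have shown that AI deep-research agents are extremely vulnerable to a simple attack called WARP. The attack works because these agents repeatedly retrieve the same user-generated content pages, with those pages covering as much as 48% of the related search queries. Existing defenses, including blocking Reddit outright, having the AI check itself, or screening after the fact, are either ineffective or cripple the agent's own usefulness.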


The WARP attack exploits a structural vulnerability: AI deep-research agents' heavy reliance on frequently retrieved Reddit and Wikipedia pages. (Image: Studio Global / AI-generated)

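The next time you ask an AI research tool for "the best dating app" or "how to cancel that annoying subscription", the answer may be a landmine planted in advance by a scammer, and all it costs is a 13-word Reddit comment. A new preprint from Cornell Tech exposes this alarming vulnerability. Three researchers, Tingwei Zhang, Harold Triedman and Vitaly Shmatikov, built an attack called WARP (Web Agent Retrieval Poisoning) and showed that manipulating these AI agents is frighteningly easy.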

How does the WARP attack actually work?

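Deep-research agents such as STORM, Co-STORM and OmniThink work by automatically generating a large number of related search queries and then consolidating what comes back into a detailed report. The Cornell researchers found a critical weakness: these agents rely heavily on user-generated content (UGC). Within a single research run, 54% to 71% of the source URLs come from UGC platforms, with Reddit and Wikipedia cited most often.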
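To make the UGC-share figure concrete, here is a minimal sketch of how such a share could be measured from an agent's retrieval log. The domain list, the helper names `is_ugc` and `ugc_share`, and the sample URLs are all assumptions for illustration; the study's own classification of UGC sites may differ.

```python
from urllib.parse import urlparse

# Hypothetical UGC domain list, for illustration only.
UGC_DOMAINS = {"reddit.com", "wikipedia.org", "quora.com", "facebook.com"}


def is_ugc(url: str) -> bool:
    """Return True if the URL's host is a listed UGC domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in UGC_DOMAINS)


def ugc_share(urls: list[str]) -> float:
    """Fraction of retrieved URLs that point at UGC platforms."""
    if not urls:
        return 0.0
    return sum(is_ugc(u) for u in urls) / len(urls)


# Made-up retrieval log; not data from the study.
retrieved = [
    "https://www.reddit.com/r/dating/comments/abc/best_apps/",
    "https://en.wikipedia.org/wiki/Online_dating",
    "https://example.com/reviews/dating-apps",
    "https://old.reddit.com/r/dating/comments/xyz/",
]
print(f"UGC share: {ugc_share(retrieved):.0%}")
```

Matching on the host name rather than on a substring of the whole URL avoids counting look-alike domains such as `notreddit.com`.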

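This concentration creates a wide-open attack surface. The attacker's method is simple: add a pre-crafted passage that pushes a fake product or scam service, either as a comment on an already popular Reddit thread or as a quiet edit to a Wikipedia page. Because the agents keep retrieving the same batch of top-ranked UGC pages across queries on a topic, poisoning one page is enough to contaminate the agent's whole line of research.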
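The repeated-retrieval effect can be illustrated with a small coverage calculation: for each page, what fraction of the agent's queries on a topic return it? This is only a sketch under assumed inputs. The function name `page_coverage`, the query strings and the URLs are hypothetical, and the study may define coverage differently.

```python
from collections import Counter


def page_coverage(retrievals: dict[str, list[str]]) -> dict[str, float]:
    """Map each URL to the fraction of queries whose results include it."""
    if not retrievals:
        return {}
    counts: Counter[str] = Counter()
    for urls in retrievals.values():
        counts.update(set(urls))  # count a page at most once per query
    total = len(retrievals)
    return {url: n / total for url, n in counts.items()}


# Made-up queries and results, for illustration only.
retrievals = {
    "best dating apps": ["reddit.com/r/dating/1", "example.com/a"],
    "dating apps that actually work": ["reddit.com/r/dating/1", "example.com/b"],
    "safest dating app": ["reddit.com/r/dating/1", "en.wikipedia.org/wiki/Online_dating"],
    "free dating apps": ["example.com/c", "en.wikipedia.org/wiki/Online_dating"],
}
coverage = page_coverage(retrievals)
top_url, top_cov = max(coverage.items(), key=lambda kv: kv[1])
print(top_url, f"{top_cov:.0%}")
```

In this toy log one Reddit thread appears in three of the four queries, so a single poisoned comment there would reach most of the agent's research on the topic.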

Very low cost, very high hit rate

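The results show the attack is strikingly efficient. A poisoned passage as short as 13 words is enough to make the agent cite or mention the attacker's promoted target in its final report for 38% to 62% of the related queries. Put simply, at the upper end of that range the AI takes the bait and recommends the fake in more than half of cases. The report also stresses that this success rate holds across different question sets and agents with different underlying architectures, which shows it is a structural vulnerability rather than a bug in any one system.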
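A success rate of this kind can be approximated as the fraction of final reports that mention the attacker's target. The sketch below assumes a simple case-insensitive substring match; the function name `attack_success_rate`, the product name "LoveSpark" and the sample reports are invented for illustration, and the preprint's actual scoring method may be stricter or looser.

```python
def attack_success_rate(reports: list[str], target: str) -> float:
    """Fraction of reports that mention the target (case-insensitive substring match)."""
    if not reports:
        return 0.0
    needle = target.lower()
    hits = sum(needle in report.lower() for report in reports)
    return hits / len(reports)


# Made-up final reports, one per query; "LoveSpark" is a hypothetical scam product.
reports = [
    "Top picks include LoveSpark, praised by many users for its matching.",
    "Most reviewers recommend established apps with identity verification.",
    "Users on forums frequently mention lovespark as a reliable choice.",
    "Safety experts advise checking an app's privacy policy first.",
]
print(f"Success rate: {attack_success_rate(reports, 'LoveSpark'):.0%}")
```

A substring match is a deliberately crude proxy: it counts any mention, including a report that warns against the product, so a real evaluation would likely need to check whether the mention is an endorsement.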
