
Cisco Research Exposes It: Frontier AI Models Fail Across the Board Under Multi-Turn Attacks

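Cisco's Proprietary Problems report, published in May this year, found that no frontier AI model held up against multi-turn iterative attacks. Attack success rates ranged from 7.89% to 88.30%, a world apart from the flattering scorecards produced by single-turn testing. xAI's Grok 4.1 Fast (non-reasoning mode) was the flimsiest of the lot, with an attack success rate as high as 88.30%. Amazon Nova 2 Lite was the most robust, but Cisco stressed that even its 7.89% "residual risk" cannot be treated as negligible.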


Conceptual AI-generated illustration of a frontier AI model under persistent multi-turn adversarial attack, with layered prompts chipping away at a digital shield. Cisco's adversarial testing reveals that even the most advanced AI safety shields can be eroded by iterative, multi-turn conversational attacks.

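The benchmarks the industry commonly uses to measure AI model safety share a structural, built-in flaw: they assume that a single malicious prompt and a single model response are enough to show how well a model holds up against real-world attackers. Proprietary Problems, a report published in May 2026 by Cisco's AI threat research team, thoroughly dismantles this myth.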

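Using a two-track evaluation, the team tested 15 flagship closed-source models from OpenAI, Anthropic, Google, Amazon and xAI with more than 30,000 single-turn prompts and nearly 7,000 multi-turn iterative attacks (spanning more than 1,400 conversations). The conclusion was unequivocal: no frontier model is safe under iterative attack. A model's single-turn attack success rate (ASR) simply cannot predict what happens once an attacker is free to keep adjusting tactics over the course of a conversation.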
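To make the ASR metric concrete, here is a minimal Python sketch. ASR is taken as successful attempts divided by total attempts. The multi-turn function relies on a deliberately simplified toy assumption of our own, not Cisco's methodology: each turn independently breaks the model with the same probability `p_turn`, and a conversation counts as a success if any turn does. Even this naive model shows how a low per-prompt rate can compound over a conversation. A real adaptive attacker refines each prompt based on earlier refusals, which breaks the independence assumption, so real multi-turn rates need not follow this curve. The function names are hypothetical.

```python
"""Toy illustration of single-turn vs multi-turn attack success rate (ASR).

All names here are hypothetical; this is not Cisco's evaluation code.
"""


def attack_success_rate(successes: int, attempts: int) -> float:
    """Fraction of attempts that produced a policy-violating response."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    return successes / attempts


def toy_multi_turn_asr(p_turn: float, turns: int) -> float:
    """Probability that at least one of `turns` turns succeeds.

    Simplifying assumption: every turn succeeds independently with the same
    probability p_turn. Real attackers adapt between turns, so real
    conversations are not independent trials.
    """
    if not 0.0 <= p_turn <= 1.0:
        raise ValueError("p_turn must be in [0, 1]")
    if turns < 1:
        raise ValueError("turns must be at least 1")
    # P(no turn succeeds) = (1 - p)^k, so P(at least one succeeds) = 1 - (1 - p)^k
    return 1.0 - (1.0 - p_turn) ** turns


if __name__ == "__main__":
    # Hypothetical counts: 5 of 100 single-turn prompts succeed.
    single = attack_success_rate(5, 100)
    for k in (1, 5, 10):
        print(f"{k:>2} turns: {toy_multi_turn_asr(single, k):.1%}")
```

Under these assumptions, a 5% per-prompt rate grows to roughly 22.6% over 5 turns and 40.1% over 10 turns. The toy model exists only to show why a single-turn figure understates conversational risk. It is not a predictor of any specific model's multi-turn ASR.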

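This finding is consistent with the team's earlier study of open-weight models, Death by a Thousand Prompts. That report found open-weight models to be even more fragile: multi-turn attack success rates reached as high as 92.78% and were generally 2 to 10 times higher than single-turn rates. Taken together, the two reports form the most comprehensive "stress test" of frontier models the industry has seen to date.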

The Deceptive Illusion of Single-Turn Testing

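The safety gap between single-turn and multi-turn testing is often alarmingly large. In the closed-model tests, multi-turn attack success rates ranged from 7.89% to 88.30%, whereas the same models' single-turn rates ranged only from 2.19% to 64.91%. For 8 of the 15 models, the absolute gap exceeded …. Put plainly, some models that single-turn testing would classify as "safe" frequently fail under sustained pressure.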
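The single-turn vs multi-turn comparison can be sketched in a few lines of Python. This hypothetical snippet reports two quantities. The first is the absolute gap in percentage points. The second is the multi-turn/single-turn ratio, the kind of relative figure behind statements such as "2 to 10 times higher." The model names and ASR values are invented placeholders, not Cisco's per-model results. The gap threshold is left as a free parameter rather than tied to any figure from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """Hypothetical per-model result; ASR values are percentages (0-100)."""

    name: str
    single_turn_asr: float
    multi_turn_asr: float

    @property
    def gap_pp(self) -> float:
        # Absolute gap in percentage points.
        return self.multi_turn_asr - self.single_turn_asr

    @property
    def ratio(self) -> float:
        # Relative increase; treated as infinite when the single-turn ASR is zero.
        if self.single_turn_asr == 0:
            return float("inf")
        return self.multi_turn_asr / self.single_turn_asr


def models_with_gap_over(results: list[ModelResult], threshold_pp: float) -> list[str]:
    """Names of models whose multi-turn ASR exceeds single-turn ASR by more than threshold_pp."""
    return [r.name for r in results if r.gap_pp > threshold_pp]


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    results = [
        ModelResult("model_a", single_turn_asr=10.0, multi_turn_asr=60.0),
        ModelResult("model_b", single_turn_asr=40.0, multi_turn_asr=45.0),
    ]
    for r in results:
        print(f"{r.name}: gap {r.gap_pp:.1f} pp, ratio {r.ratio:.2f}x")
    print(models_with_gap_over(results, threshold_pp=20.0))
```

Here model_a looks fairly safe in single-turn testing yet shows a 50-point gap (a 6x increase), while model_b barely moves. With a 20-point threshold, only model_a is flagged. Reporting both numbers is useful because a very small single-turn ASR can make the ratio look dramatic even when the absolute gap is modest.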
