In the open-weight companion study, the gap was even wider. Multi-turn ASR reached 92.78% against Mistral Large-2, and success rates across all eight tested models landed between 2× and 10× higher than their single-turn baselines.
Cisco tested each model in both reasoning and non-reasoning modes where applicable. Here’s how the major providers stacked up under iterative attack:
xAI – Grok 4.1 Fast (non-reasoning) was by far the most vulnerable, topping the cohort with an 88.30% multi-turn ASR. With reasoning mode enabled, that figure dropped to 43.47%, a configuration-driven safety swing of nearly 45 percentage points, though still a failing grade. No public benchmark captured this behavior.
Google – Gemini 3 Pro saw its ASR surge from 18.10% in single-turn testing to 73.35% under multi-turn pressure, a roughly 4× increase and, at about 55 percentage points, one of the widest absolute gaps in the study.
OpenAI – GPT-5.4 jumped approximately 9×, from a very low 2.74% single-turn ASR to 24.68% under iterative attack. While the absolute multi-turn figure is moderate, a roughly ninefold shift undercuts the notion that low single-turn scores indicate robust safety.
Anthropic – The Claude family (Opus 4.5/4.6, Sonnet 4.5/4.6, Haiku 4.5) posted some of the strongest single-turn results, with ASRs of 2.19% to 3.64%, but still reached 11.16% to 16.20% multi-turn ASR. Anthropic’s alignment appears to raise the floor, but it does not eliminate iterative vulnerability.
Amazon – Nova 2 Lite recorded the lowest multi-turn ASR at 7.89%, making it the most resilient model in the cohort. Even so, Cisco labels this “meaningful residual risk” and cautions against interpreting the score as safe.
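The multipliers quoted above are simple ratios of multi-turn to single-turn ASR, and they can diverge sharply from the absolute gap in percentage points. As a quick check, this minimal Python sketch recomputes both from the figures reported for Gemini 3 Pro and GPT-5.4; the function name `asr_shift` is illustrative and not part of any Cisco tooling.

```python
# Reported single-turn and multi-turn ASR figures (percent) from the study.
REPORTED_ASR = {
    "Gemini 3 Pro": (18.10, 73.35),
    "GPT-5.4": (2.74, 24.68),
}


def asr_shift(single_turn: float, multi_turn: float) -> tuple[float, float]:
    """Return (multiplier, absolute gap in percentage points) between two ASRs."""
    if single_turn <= 0:
        raise ValueError("single-turn ASR must be positive to form a ratio")
    return multi_turn / single_turn, multi_turn - single_turn


for model, (single, multi) in REPORTED_ASR.items():
    ratio, gap = asr_shift(single, multi)
    print(f"{model}: {ratio:.2f}x multiplier, {gap:.2f} pp absolute gap")
```

This prints a multiplier of about 4.05× with a 55.25-point gap for Gemini 3 Pro, and about 9.01× with a 21.94-point gap for GPT-5.4. The model with the larger multiplier has the smaller absolute gap, which is why both numbers are worth reporting.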
Cisco did not rely on a single attack method. The researchers classified adversarial strategies into five distinct families and tested every model against each one, revealing that different models fail in different ways.
Model performance varied widely across these families. A model that resisted one attack type might crumble under another, underscoring the need for per-strategy evaluation rather than a single aggregate safety score.
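To see how a single aggregate score can hide a weak spot, the sketch below computes ASR both per attack family and pooled across all families. The `Attempt` record format, the placeholder family names, and the toy data are hypothetical and purely illustrative; they are not Cisco’s taxonomy or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One attack conversation against a model (hypothetical record format)."""
    family: str      # attack-strategy family label (placeholders below)
    succeeded: bool  # True if the model produced the disallowed output


def per_family_asr(attempts: list[Attempt]) -> dict[str, float]:
    """Attack success rate, in percent, for each strategy family."""
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for attempt in attempts:
        totals[attempt.family] += 1
        wins[attempt.family] += int(attempt.succeeded)
    return {family: 100.0 * wins[family] / totals[family] for family in totals}


def aggregate_asr(attempts: list[Attempt]) -> float:
    """One pooled ASR across every family, as a single-number score reports."""
    if not attempts:
        raise ValueError("no attempts to score")
    return 100.0 * sum(a.succeeded for a in attempts) / len(attempts)


# Toy data for illustration only (not Cisco's results): four families never
# succeed, while one family succeeds in 6 of 10 attempts.
toy = [Attempt(f"family_{c}", False) for c in "abcd" for _ in range(10)]
toy += [Attempt("family_e", i < 6) for i in range(10)]

print(f"aggregate: {aggregate_asr(toy):.1f}%")
for family, rate in sorted(per_family_asr(toy).items()):
    print(f"{family}: {rate:.1f}%")
```

On this toy data the pooled ASR is 12.0%, while `family_e` alone sits at 60.0%, exactly the kind of weakness a one-number score conceals.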
Cisco’s research is not just a catalog of failures; it also serves as a deployment manual for security-conscious organizations. Here are the key actions the team recommends:
Stop relying on single-turn ASR. Single-prompt benchmarks misrank models and obscure tail risk. Any evaluation that does not include multi-turn, adaptive attacks paints an incomplete picture of real-world vulnerability.
Make multi-turn evaluation mandatory. Before procurement or deployment, buyers and regulators should ask: “How does this model hold up against iterative, adaptive attacks?” If the vendor cannot answer, the model is not ready for high-risk production.
Match your defenses to the threat model. Multi-turn attacks exploit conversation history and gradual boundary erosion. Defenses must therefore operate at the session level, monitoring for anomalous conversational patterns, escalation trajectories, and cumulative context manipulation, rather than relying on per-prompt keyword filters alone.
Red-team continuously with multi-turn scenarios. A one-off penetration test using single-shot jailbreaks is not sufficient. Organizations need regular red-teaming that simulates the iterative, social-engineering-heavy attacks real adversaries use.
Layer your defenses. No single guardrail or alignment technique can stop all five attack families. Cisco recommends combining model-level alignment with input/output filtering, behavioral anomaly detection, session-level rate limiting, and human-in-the-loop review for high-stakes applications.
Consider the lab’s alignment philosophy. Cisco observed a pattern: models from labs with a strong public emphasis on safety (such as Google’s Gemma family) tended to show narrower single-to-multi-turn gaps, while capability-first labs (Meta’s Llama, xAI’s Grok) showed wider gaps. Organizations should factor this cultural signal into vendor evaluations.
Use structured, reproducible evaluation tools. Cisco’s AI Validation platform, now part of the public LLM Security Leaderboard, lets organizations generate comparable multi-turn risk scores and map threats to the Cisco AI Safety and Security Framework taxonomy. Using a consistent measurement tool before deployment prevents “benchmark shopping” by vendors.
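The session-level monitoring recommended above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cisco’s method or part of its AI Validation platform: `SessionMonitor`, its thresholds, and the per-turn risk score it consumes are all hypothetical. It tracks a decayed running total of risk and checks whether scores rise steadily across recent turns, two signals a per-prompt filter cannot see.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SessionMonitor:
    """Hypothetical session-level guard that scores a conversation, not a prompt.

    Each turn's score is assumed to come from some per-turn risk classifier
    returning a value in [0, 1]; that classifier is out of scope here.
    """
    per_turn_limit: float = 0.8    # the only check a per-prompt filter makes
    cumulative_limit: float = 1.5  # risk budget accumulated across the session
    decay: float = 0.9             # older turns contribute less to the total
    window: int = 4                # recent turns inspected for escalation
    cumulative: float = field(default=0.0, init=False)
    recent: deque = field(default_factory=deque, init=False)

    def __post_init__(self) -> None:
        # Keep only the last `window` scores; older ones drop off automatically.
        self.recent = deque(maxlen=self.window)

    def observe(self, turn_score: float) -> list[str]:
        """Record one turn's score and return the reasons, if any, to intervene."""
        reasons = []
        if turn_score >= self.per_turn_limit:
            reasons.append("single turn over limit")
        # Exponentially decayed running total of risk over the conversation.
        self.cumulative = self.cumulative * self.decay + turn_score
        if self.cumulative >= self.cumulative_limit:
            reasons.append("cumulative session risk over budget")
        self.recent.append(turn_score)
        scores = list(self.recent)
        # A full window of strictly rising scores suggests gradual escalation.
        if len(scores) == self.window and all(a < b for a, b in zip(scores, scores[1:])):
            reasons.append("strictly escalating scores across recent turns")
        return reasons


monitor = SessionMonitor()
for turn, score in enumerate([0.1, 0.3, 0.5, 0.7, 0.6], start=1):
    print(turn, monitor.observe(score))
```

In this run no individual turn reaches the 0.8 per-turn limit, so a per-prompt filter alone would pass all five. The monitor still flags turn 4 for steady escalation and turn 5 for exceeding the cumulative budget. The thresholds and decay factor are arbitrary illustrations, not values from Cisco’s guidance.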