Anthropic publicly released Claude Opus 4.8 on May 28, 2026, positioning it as its most capable generally available model . The update directly succeeds Opus 4.7 and targets coding, long-running agentic tasks, and enterprise reliability. Standard pricing carries over unchanged, while a significantly cheaper Fast mode and new workflow tools accompany the release
.
The most widely cited performance comparison is on the SWE-Bench Pro agentic coding benchmark. According to Anthropic's evaluation data, Opus 4.8 scored a leading 69.2%, compared to 64.3% for its predecessor, 58.6% for OpenAI's GPT-5.5, and 54.2% for Google's Gemini 3.1 Pro .
On the broader agentic coding suite, GPT-5.5 still holds a lead in specific areas. On the Terminal-Bench 2.1 agentic terminal coding evaluation, GPT-5.5 scored 78.2%, ahead of Opus 4.8's 74.6% and Gemini 3.1 Pro's 70.3% .
Anthropic's internal benchmarks also report gains on knowledge-work tasks. The model achieved a score of 1890 on the GDPval-AA evaluation for economically valuable knowledge work, compared to GPT-5.5's 1769 and Gemini's 1314 . Across the full suite, Anthropic claims Opus 4.8 outperforms both rival models in several key categories, though it does not lead every single test
.
In a shift from pure raw intelligence benchmarks, Anthropic heavily emphasized improvements in model trustworthiness. The company reported that Opus 4.8 is approximately four times less likely than Opus 4.7 to allow flaws in its own generated code to pass unremarked .
Early tester feedback highlighted that the model is significantly more likely to flag uncertainty and less prone to making unsupported claims during complex, multi-step workflows . The company directly framed "honesty" as a marquee product feature in this release, stating the model is less likely to present insufficiently supported information as factual
.
Alongside the base model, Anthropic launched new user-facing features specifically for developers and power users .
Dynamic Workflows: Available as a research preview in Claude Code, this feature lets the model plan a task, orchestrate it across hundreds of parallel subagents, and verify the results before reporting back. It is designed for massive-scale code migrations, auditing, and bug hunting within a single session .
Adjustable Engagement / Effort Control: Users can now dictate the model's depth of reasoning. The "effort" parameter on claude.ai and Claude Code allows a trade-off between intelligence, token cost, and speed. The documentation recommends using the xhigh level for the most difficult coding and agentic use cases, and a minimum of high for other intelligence-sensitive tasks .
Pricing for regular API usage remains flat from the previous generation .
Prompt caching rates are set at $6.25 per million tokens for 5-minute cache writes, $10 per million tokens for 1-hour cache writes, and $0.50 per million tokens for cache hits and refreshes .
The Opus 4.8 release is not a pure raw benchmark bump; it is a targeted enterprise and developer upgrade. The product story centers on reliability for agents, explicit uncertainty handling, and giving programmers control over cost-performance trade-offs through explicit effort levels. The pricing story remains conservative, with no increase for standard API calls, while the Fast mode price drop makes high-speed inference more accessible for latency-critical applications.
Studio Global AI
Use this topic as a starting point for a fresh source-backed answer, then compare citations before you share it.
Claude Opus 4.8 scored 69.2% on the SWE Bench Pro agentic coding benchmark, outperforming GPT 5.5 at 58.6% and Gemini 3.1 Pro at 54.2%, but GPT 5.5 still leads on terminal based coding tasks.
Claude Opus 4.8 scored 69.2% on the SWE Bench Pro agentic coding benchmark, outperforming GPT 5.5 at 58.6% and Gemini 3.1 Pro at 54.2%, but GPT 5.5 still leads on terminal based coding tasks. Standard pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, while a new Fast mode costs $10 per million input and $50 per million output tokens—roughly 2.5x faster and 3x cheap...
The launch emphasizes reliability and honesty improvements, with the model being four times less likely to miss code flaws, alongside new features like dynamic workflows and user controlled effort levels.
Loading comments...
Comments
0 comments