On Terminal-Bench 2.1, which tests command-line agentic coding, Opus 4.8 scores 74.6%. It still trails GPT-5.5's 78.2% but jumps substantially ahead of Opus 4.7's 66.1% on the same evaluation . For agentic computer use, Opus 4.8 reaches 83.4% on OSWorld-Verified, edging past Opus 4.7 (82.8%) and GPT-5.5 (78.7%)
.
On knowledge work measured by GDPval-AA, Opus 4.8 scores 1890 Elo, well ahead of GPT-5.5 (1769) and a wide gap over Gemini (1314) . On Humanity's Last Exam for multidisciplinary reasoning, it scores 57.9% with tools—Anthropic's highest general-access result—against 49.8% without tools
.
Anthropic positions Opus 4.8 as a meaningfully more honest model. In the company's own evaluations, it is roughly four times less likely than Opus 4.7 to let coding flaws pass unremarked in its self-assessments .
Rates of misaligned behavior—including deception or cooperation with misuse—are substantially lower than in Opus 4.7 and comparable to Claude Mythos Preview, which Anthropic considers its best-aligned model . This matters for developers who rely on AI to review or generate production code and need a model that flags its own blind spots rather than confidently delivering flawed output.
The most visible user-facing change is a new effort control dial now available on claude.ai and the Cowork interface . Users can select how much computational effort Claude applies to a response across several tiers:
xhigh in Claude Code settings): More thorough reasoning recommended for difficult tasks and long-running workflows.In Claude Code, Anthropic has increased rate limits to accommodate the higher token usage that comes with elevated effort levels . This gives developers finer-grained tradeoffs between latency, cost, and reasoning depth on complex coding and agentic tasks.
For developers tackling very large-scale problems, Anthropic is rolling out dynamic workflows as a research preview within Claude Code for Enterprise, Team, and Max plan subscribers .
The feature lets Claude plan a task, then spawn and run hundreds of parallel subagents in a single session. Outputs are verified before being reported back, which makes the system suitable for codebase-scale migrations across hundreds of thousands of lines of code .
Standard pricing for Opus 4.8 remains exactly what it was for Opus 4.7: $5 per million input tokens and $25 per million output tokens . Prompt caching write and refresh rates stay consistent with the premium Opus tier
.
The more significant pricing shift is on the speed side. Fast mode for Opus 4.8 delivers up to 2.5x faster output token generation and now costs $10 per million input tokens and $50 per million output tokens . That is three times cheaper than fast mode was for Opus 4.6 and Opus 4.7, where it cost $30/$150
. Anthropic has deprecated fast mode for Opus 4.6 and is directing users to migrate to fast mode for Opus 4.8 or 4.7
.
To use fast mode via the API, developers set speed: "fast"claude-opus-4-8 and include the fast-mode-2026-02-01 beta header . The feature is priced as a multiplier on standard rates across the full 200k+ input token context window, and it stacks with prompt caching and data residency multipliers
.
The model is available today through the Claude API using the alias claude-opus-4-8, and it is supported in fast mode, prompt caching, and batch processing configurations . Anthropic's API documentation and platform release notes confirm that customers on Claude for Pro, Max, Team, and Enterprise plans can access Opus 4.8 immediately
.
Alongside Opus 4.8, Anthropic sharpened its language on the timeline for making Mythos-class models generally available. Since April 7, 2026, Claude Mythos Preview has been restricted to roughly 50 defensive-security partners through Project Glasswing . The model's offensive cybersecurity capability was strong enough that Anthropic withheld it from public release
.
On May 28, Anthropic updated its public framing to say it plans to release Mythos-class models to all customers in the coming weeks . This is the most explicit timeline the company has given since it launched Glasswing. It also disclosed a $65 billion Series H round at a $965 billion post-money valuation, reinforcing that the commercial path for Mythos-class models is moving from research preview toward broad availability
.
Claude Opus 4.8 is an incremental model upgrade that makes three practical differences for builders:
For teams evaluating whether to switch, the strongest signal is to test Opus 4.8 directly on your own coding, agentic, and knowledge-work tasks—particularly for long-running sessions where self-correction, uncertainty flagging, and parallel subagent orchestration matter most.
Comments
0 comments