Answers · Public URL · 25 Apr 2026 · 18 sources

What are the benchmarks of Claude Opus 4.7?

According to the sources found, the publicly available benchmark figures for Claude Opus 4.7 are primarily 87.6% on SWE-bench Verified and 94.2% on GPQA. Anthropic also describes Opus 4.7 as its strongest generally available model; one further result cites 80.5% on SWE-bench Multilingual. [5][6][1][15]

Key findings

**SWE-bench Verified:** 87.6% [5][6]

Research answer

SWE-bench Verified: 87.6% ^[5]^[6]
GPQA: 94.2% ^[6]
SWE-bench Multilingual: 80.5% ^[15]

Important:

The SWE-bench Verified figure is reported consistently across several results. ^[5]^[6]
In the search results, I saw the 94.2% GPQA figure clearly stated only in a benchmark summary article, not directly in the snippet from the official Anthropic page. ^[6]
Beyond the search snippets, I do not have fully extracted primary data here for a complete benchmark table taken directly from the official release page. If you like, I can next put these results in context against GPT, Gemini, and other Claude models.
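
The corroboration check described in the notes above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the `corroboration` helper are hypothetical names introduced here, and the values and source numbers are copied from the cited list as given, not independently verified.

```python
# Benchmark values and the source numbers cited for each, copied from the list above.
# The structure and the helper name are assumptions made for this sketch.
cited_scores = {
    "SWE-bench Verified": (87.6, [5, 6]),
    "GPQA": (94.2, [6]),
    "SWE-bench Multilingual": (80.5, [15]),
}


def corroboration(scores):
    """Label each benchmark by how many cited sources back its value."""
    return {
        name: "multiple sources" if len(sources) > 1 else "single source"
        for name, (_value, sources) in scores.items()
    }


labels = corroboration(cited_scores)
for name, (value, sources) in cited_scores.items():
    print(f"{name}: {value}% ({labels[name]}, cited as {sources})")
```

Values labeled as single-source are the ones worth checking against the official release page first.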

Supporting visuals

Claude Opus 4.7 results: early benchmarks, real-world feedback
A line graph compares the agentic coding performance of Anthropic's Claude models, Opus 4.7 and Opus 4.6, across different effort levels, showing that Opus 4.7 outperforms Opus 4.6 at all effort levels with higher scores and markers labeled from "low" to "max" and "xhigh."

Anthropic Claude Opus 4.7: 7 Critical Facts About the New Flagship
A horizontal bar chart displays the AI models in the Claude lineup ranked by their Intelligence Index scores, with Claude Opus 4.6 and 4.7 achieving the highest scores around 53 and 51.7, respectively.

Introducing Claude Opus 4.7 \ Anthropic
A comparative table displaying performance metrics of different AI models, highlighting the advancements of Anthropic's Claude Opus 4.7 over previous versions and competitors, with specific emphasis on its high score of 93.9% in Agentic coding.

Anthropic Promised Claude Opus 4.7 Would Change Everything
The image displays a comparison table of benchmark scores for Anthropic's Claude Opus 4.7 and 4.6 models across various performance metrics and tests.

Claude Opus 4.7: Benchmarks, Breaking Changes, Migration Guide

Anthropic just released the most powerful AI model you can ...

Claude Opus 4.7 dropped this week and the benchmark numbers are ...
The image displays a comparison table highlighting the performance of Claude Opus 4.7 on various benchmarks and skills, with scores spanning from 40% to over 90%, alongside other models like GPT-5.4 and Gemini 3.1 Pro.

Opus 4.7 scores near frontier on ECI - Epoch AI
A bar chart compares the performance of Claude Opus 4.7, Opus 4.6, and GPT-5.4 on the GPQA benchmark, showing that Claude Opus 4.7 outperforms the others with a higher accuracy percentage.

Claude Opus 4.7 Benchmark Full Analysis: Empirical Data Leading
The image displays benchmark results for the Claude Opus 4.7 model released in April 2026, highlighting its leading performance in seven major rankings, including SWE-bench, SWE-bench Pro, GPQA Diamond reasoning, and Vision multimodal, with notable accuracy percentages and performance improvements.

Claude Opus 4.7: benchmarks, features, and migration guide (April ...
A comparison table displaying benchmark performance percentages for different AI models and tools, including Opus 4.7, Opus 4.6, GPT-5.4, Gemini 3.1 Pro, and Mythos Preview across various features such as coding, reasoning, search, and specialized tasks.

Claude Opus 4.7 just scored 87.6% on SWE-bench — the hardest ...

Research conversation

You · 25 Apr 2026

Research question

What are the benchmarks of Claude Opus 4.7?

Studio Global AI · 25 Apr 2026 · 18 sources

Sources

[1] Anthropic releases Claude Opus 4.7, narrowly retaking lead for most ... (venturebeat.com)
Anthropic is publicly releasing its most powerful large language model yet, Claude Opus 4.7, today — as it continues to keep an even more powerful successor, Mythos, restricted to a small number of external enterprise partners for cybersecurity testing and patching vulnerabilities in the software said enterprises use (which Mythos exposed rapid…
[2] Anthropic's Claude Opus 4.7 Beats GPT-5.4 in Coding Benchmark - iClarified (iclarified.com)
Anthropic has launched Claude Opus 4.7, its latest flagship model that brings a notable improvement in advanced software engineering and upgraded high-resolution vision. Coming two months after the release of Claude Sonnet 4.6, the new model builds on that foundation with stronger performance on complex, long-running tasks that previously required closer supervision. For developers on the Mac, the model is well-suited for use with Apple's Xcode 26.3 …
[3] Claude Opus 4.7 Benchmarks Explained - Vellum (vellum.ai)
Coding capabilities; SWE-bench Verified; SWE-bench Pro; Terminal-Bench 2.0; Agentic capabilities; MCP-Atlas (Scaled tool use) …
[4] Claude Opus 4.7: Benchmarks, Breaking Changes, Migration Guide | Rabinarayan Patra (rabinarayanpatra.com)
Claude Opus 4.7 ships 87.6% on SWE-bench Verified, a new tokenizer, xhigh effort, and four API breaking changes. create( model="claude-opus-4-7", max_tokens=64000, output_config={"effort": "xhigh"}, messages=[{"role": "user", "content": "Refactor this service layer."}], create( model="claude-opus-4-7", max_tokens=128000, output_config={ "effort": "high", …
[5] Claude Opus 4.7: Benchmarks, Pricing, Context & What's New (llm-stats.com)
Claude Opus 4.7 scores 87.6% on SWE-bench Verified, 94.2% on GPQA, 1M token context, 3.3x higher-resolution vision, new xhigh effort level. Claude Opus 4.7 is a direct upgrade to Opus 4.6 at the same price ($5/$25 per million tokens), with 87.6% on SWE-bench Verified (+6.8pp), a new xhigh effort level, 3.3x higher-resolution vision, and self-verification on long-running agentic tasks. It's a direct upgrade to Opus 4.6 at the same price ($5 / $25 per million input / output tokens), with meaningful gains on the hardest software e…
[6] Claude Opus 4.7: Pricing, Benchmarks & Context Window - ALM Corp (almcorp.com)
Claude Opus 4.7 is Anthropic’s latest generally available Opus model, and the release matters for a simple reason: it is not just another benchmark update. Opus 4.7 keeps the same list price as Opus 4.6, adds stronger performance on hard coding and agentic workflows, improves high-resolution vision, introduces a new xhigh effort level, and uses an updated tokenizer that can increase token counts for the same input. It is positioned as a premium model for advanced coding, long-running agentic tasks, document-heavy reasoning, high-resolution visual understanding, and professional workflows th…
[7] Introducing Claude Opus 4.7 - Anthropic (anthropic.com)
Developers can use claude-opus-4-7 via the Claude API.
[8] Anthropic launches Claude Opus 4.7 with improved benchmark ... (msn.com)
Claude Opus 4.7 delivers broad gains over Opus 4.6 across key benchmarks and remains competitive with rival AI models such as ChatGPT and
[9] Anthropic Launches Claude Opus 4.7 With Higher ... (binance.com)
Anthropic launched Claude Opus 4.7, with SWE-bench Multilingual rising to 80.5% from 77.8% for Opus 4.6. Anthropic said the updated
[10] Anthropic releases Claude Opus 4.7: How to try it, benchmarks, safety (mashable.com)
Anthropic has been shipping products and making news at a blistering pace in 2026, and on Thursday, the AI company announced the launch of Claude Opus 4.7. Notably, Anthropic said in a press release that Opus 4.7 is not as powerful as Claude Mythos, which Anthropic deemed too dangerous for public release. Until the announcement of Claude Mythos …
[11] Claude Opus 4.7 (medium.com)
Claude Opus 4.7 Just Dropped — The Benchmarks Are Real, But Three Breaking Changes Will Catch You Off Guard | by Tihomir Manushev | Apr, 2026 | Medium.
[12] Creator Inside (facebook.com)
Creator Inside - Official Benchmark of Anthropic's latest...
[13] Claude 3.7 Sonnet and Claude Code - Anthropic (anthropic.com)
We’ve developed Claude 3.7 Sonnet with a different philosophy from other reasoning models on the market. Third, in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs. Early testing demonstrated Claude’s leadership in coding capabilities across the board: Cursor noted Claude is once again best-in-class for real-world coding tasks, with significant improvements in areas ranging from handling complex codebases to advanced tool use…
[14] Introducing Claude 4 - Anthropic (anthropic.com)
Extended thinking with tool use (beta): Both models can use tools—like web search—during extended thinking, allowing Clau…
[15] Introducing Claude Sonnet 4.5 - Anthropic (anthropic.com)
This is the most aligned frontier model we’ve ever released, showing large improvements across several areas of alignment compared to previous Claude models.
[16] [PDF] Claude Opus 4.5 System Card - Anthropic (anthropic.com)
It then describes a wide range of safety evaluations: tests of model safeguards, honesty, and agentic safety; a comprehensive alignment assessment including investigations of sycophancy, sabotage capability, evaluation awareness, and many other factors; a model welfare report; and a set of evaluations mandated by our Responsible Scaling Policy. Our capabilities evaluations showed that Claude Opus 4.5 is state-of-the art among frontier models on software coding tasks and “agentic” tasks that require it to run autonomously on a user’s behalf. As outlined in our RSP framework, our standard capab…
[17] [PDF] Claude Opus 4.6 System Card - Anthropic (www-cdn.anthropic.com)
Claude Opus 4.6 is a frontier model with strong capabilities in software engineering, agentic tasks, and long context reasoning, as well as in knowledge work—including financial analysis, document creation, and multi-step research workflows. The model shows significant improvements in long-context reasoning, knowledge work, research, and analysis; it has also increased its capabilities in some areas of agentic coding and tool use (on a few evaluations it performs similarly to, or slightly less well than, its predecessor). The primary purpose of this survey was to inform the Responsible S…
[18] [PDF] Claude Sonnet 4.6 System Card - Anthropic (anthropic.com)
On some measures, Sonnet 4.6 showed the best degree of alignment we have yet seen in any Claude model. Informed by the testing described here—and similarly to Claude Sonnet 4.5—we have deployed Claude Sonnet 4.6 under the AI Safety Level 3 (ASL-3) Standard. Contents: Abstract; 1 Introduction; 1.1 Model training and characteristics; 1.1.1 Training data and process; 1.1.2 Thinking modes and the effort parameter; 1.1.3 Crowd workers; 1.2 Release decision process; 1.2.1 Overview; 1.2.2 Iterative model evaluations; 1.2.3 AI S…
