Investigate benchmarks for GPT-5.5 and Claude Opus 4.7. Which model is better at coding, design, and creative content?
Research answer
Within the scope of this research, Claude Opus 4.7 leads on coding. Public information also favors Claude Opus 4.7 for design and creative content, but rigorous head-to-head benchmarks against GPT-5.5 are lacking, so the conclusion is: Claude likely leads, but independent verification is insufficient.
Key findings
Coding: Claude Opus 4.7 leads
Anthropic reports that Claude Opus 4.7 lifted the resolution rate on its 93-task coding benchmark by 13% over Opus 4.6, including four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve. [3]
On SWE-Bench Pro, a fact-check article confirms that Claude Opus 4.7 outscores GPT-5.5, so Claude Opus 4.7 is considered ahead on real-world software-engineering benchmarks. [5]
Claude Opus 4.7 is positioned as especially strong for complex, long-running coding workflows. [3]
Design: Claude Opus 4.7 looks favored, but the evidence is limited
Anthropic claims that Claude Opus 4.7 is more "tasteful and creative" on professional work, producing higher-quality interfaces, slides, and documents. [2]
However, the evidence at hand contains no independent design benchmark comparing GPT-5.5 and Claude Opus 4.7 under identical conditions. Claude Opus 4.7 may well lead on design, but the comparison is not conclusive.
Creative content: leans toward Claude Opus 4.7, but no decisive evidence
Claude Opus 4.7 is described as producing higher-quality professional creative work, materials, and documents. [2]
On the other hand, no head-to-head creative evaluations against GPT-5.5 (for example, ad copy, long-form articles, fiction, or brand-voice reproduction) appear in the presented evidence. Verdict for creative content: Claude Opus 4.7 looks favored, but the evidence is insufficient.
General reasoning and knowledge: data show Claude Opus 4.7 ahead
On Humanity's Last Exam under the no-tools condition, Claude Opus 4.7 scores 46.9% versus 41.4% for GPT-5.5, putting Claude Opus 4.7 ahead. [4]
In short, Claude Opus 4.7 can reasonably be called the leader on coding, while for design and creative work the fair reading is: Claude Opus 4.7 is promising, but the evidence is insufficient.
Summary
- Coding: Claude Opus 4.7 leads.
- Design: Claude Opus 4.7 looks favored, but independent comparisons are lacking.
- Creative content: leans toward Claude Opus 4.7, but decisive evidence is lacking.
- Tool use and OS operation: roughly even; some data show GPT-5.5 slightly ahead, but within noise.
- Overall: if you must pick one for real work, prefer Claude Opus 4.7 for complex coding, UI and slide work, and professional creative production. For speed-focused automation and tool-heavy workflows, GPT-5.5 remains fully competitive.
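The overall recommendation can be sketched as a simple routing rule. This is a hypothetical illustration only: the task categories and model identifiers below are assumptions for the sketch, not an official API or product configuration.

```python
# Hypothetical model router following the summary above.
# Task categories and model names are illustrative assumptions.

OPUS_TASKS = {"complex_coding", "ui_design", "slides", "creative_writing"}
GPT_TASKS = {"automation", "tool_use", "quick_prototype"}

def pick_model(task_type: str) -> str:
    """Route a task to the model the summary favors for that workload."""
    if task_type in OPUS_TASKS:
        return "claude-opus-4.7"
    if task_type in GPT_TASKS:
        return "gpt-5.5"
    return "either"  # no clear evidence either way

print(pick_model("complex_coding"))  # claude-opus-4.7
print(pick_model("tool_use"))        # gpt-5.5
```

In practice, a router like this would sit in front of an API gateway; the point is only that the evidence supports a per-workload choice rather than a single overall winner.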
Supporting visuals
[Image thumbnails omitted. Relevant captions: benchmark comparison charts from "Claude Opus 4.7: Smarter Coding, Better Vision, Real Fixes", "GPT 5.5 Vs Claude Opus 4.7 For Real Work", "GPT-5.5 vs Claude Opus 4.7: Benchmarks & Pricing" (Digital Applied Team), and "Introducing GPT-5.5 | OpenAI".]
Source excerpts
Thinking content omitted by default: starting with Claude Opus 4.7, thinking content is omitted from the response by default. Thinking blocks still appear in the response stream, but their thinking field will be empty unless the caller explicitly opts in. This is a silent change (no error is raised), and response latency will be slightly improved. If reasoning outputs are needed, you can set display to "summarized" and opt back in with a one-line change.
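A minimal sketch of that opt-in, assuming a request payload with a `thinking.display` field; the excerpt only names the `display: "summarized"` setting, so the surrounding field names here are assumptions:

```python
# Hypothetical request payload. Only the "display": "summarized" opt-in
# comes from the excerpt above; the other field names are assumed.
request = {
    "model": "claude-opus-4.7",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Refactor this function."}],
    # The one-line opt-in: without it, thinking blocks stream with an
    # empty "thinking" field and no error is raised.
    "thinking": {"display": "summarized"},
}
print(request["thinking"]["display"])
```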
In particular, Anthropic says Claude Opus 4.7 is better at advanced coding tasks, visual intelligence, and document analysis. Anthropic also says Opus 4.7 is "more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs." (Mashable)
> On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction-following, it's particularly meaningful for complex, long-running coding workflows. It cuts the friction from those multi-step tasks so developers can stay in the flow and focus on building.
> Mario Rodriguez, Chief Product Officer
Computer Use and Tool Orchestration: computer use is the second axis where GPT-5.5 and Opus 4.7 compete most directly, and the benchmark margin is much tighter than agentic coding. On OSWorld-Verified, GPT-5.5 scores 78.7% versus 78.0% for Opus 4.7 — within noise range. On Tau2-bench Telecom (run without prompt tuning), GPT-5.5 hits 98.0%. Toolathlon goes to GPT-5.5 at 55.6% (Opus 4.7 not reported). MCP-Atlas, the tool-orchestration benchmark that tests handling complex tool sets via the Model Context Protocol, goes to Opus 4.7 at 79.1% vs 75.3%.
Sources:
- OpenAI — Introducing GPT-5.5
- OpenAI API Pricing
- Anthropic — Claude Models Documentation
- BuildFastWithAI — Claude Opus 4.7 Review & Benchmarks
- The Next Web — Claude Opus 4.7 Leads on SWE-bench

Content was rephrased for compliance with licensing restrictions. Benchmark data sourced from official OpenAI and Anthropic publications as of April 23, 2026. Pricing and benchmarks may change — always verify on the vendor's website.
This is where the comparison stops being close. On the same coding tasks — identical prompts, identical goals — GPT-5.5 produces roughly 72% fewer output tokens than Claude Opus 4.7. That’s not a rounding error. It’s a structural difference in how each model communicates. ### Why Opus 4.7 Uses So Many Tokens Opus 4.7 narrates its reasoning. When it writes code, it often explains what it’s about to do, writes the code, then summarizes what it did. In a chat interface, that’s sometimes helpful. In an agentic loop running dozens of steps, every narration token is a billable token. [...]
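As a rough worked example of what a 72% output-token reduction means for an agentic bill, under an assumed placeholder price (the real per-token rates are not quoted in the excerpt):

```python
# Assumed placeholder price; real per-token rates differ by model and vendor.
PRICE_PER_OUTPUT_TOKEN = 10 / 1_000_000   # $10 per million output tokens

opus_tokens = 1_000_000                   # output tokens in a long agentic run
gpt_tokens = opus_tokens * (1 - 0.72)     # "roughly 72% fewer output tokens"

opus_cost = opus_tokens * PRICE_PER_OUTPUT_TOKEN
gpt_cost = gpt_tokens * PRICE_PER_OUTPUT_TOKEN
print(f"Narrated output: ${opus_cost:.2f}, terse output: ${gpt_cost:.2f}")
```

Whatever the actual prices, the ratio is the point: in a loop running dozens of steps, narration tokens multiply the bill by roughly 3.5x at equal per-token rates.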
> Claude Opus 4.7 feels like a real step up in intelligence. Code quality is noticeably improved, it’s cutting out the meaningless wrapper functions and fallback scaffolding that used to pile up, and fixes its own code as it goes. It’s the cleanest jump we’ve seen since the move from Sonnet 3.7 to the Claude 4 series.
> Ben Lafferty, Senior Staff Engineer

> For the computer-use work that sits at the heart of XBOW’s autonomous penetration testing, the new Claude Opus 4.7 is a step change: 98.5% on our visual-acuity benchmark versus 54.5% for Opus 4.6. Our sin…
Pietro Schirano, CEO of MagicPath, saw a similar step change when GPT‑5.5 merged a branch with hundreds of frontend and refactor changes into a main branch that had also changed substantially, resolving the work in one shot in about 20 minutes. Senior engineers who tested the model said GPT‑5.5 was noticeably stronger than GPT‑5.4 and Claude Opus 4.7 at reasoning and autonomy, catching issues in advance and predicting testing and review needs without explicit prompting. In one case, an engineer asked it to re-architect a comment system in a collaborative markdown editor and returned to a 12-d…
# GPT-5.5 crushes Claude Opus 4.7 in agentic coding with 82.7% terminal-bench score
GPT-5.5 introduces smarter task handling, reduced token usage, and broader adoption across enterprise workflows. By Aamir Khollam (IE Media). OpenAI has introduced GPT-5.5, positi…
Headline benchmarks:
- Terminal-Bench 2.0: 82.7% (Opus 4.7: 69.4%, Gemini 3.1 Pro: 68.5%)
- SWE-Bench Pro: 58.6% (Opus 4.7 still leads at 64.3%)
- OpenAI’s internal Expert-SWE eval, where tasks have a 20-hour median human completion time: 73.1% (up from GPT-5.4’s 68.5%)
- GDPval wins-or-ties: 84.9% (Opus 4.7: 80.3%, Gemini 3.1 Pro: 67.3%)
- OSWorld-Verified: 78.7% (narrowly edges Opus 4.7’s 78.0%)
- FrontierMath Tier 4: 35.4% (Opus 4.7: 22.9%, Gemini 3.1 Pro: 16.7%)
- CyberGym: 81.8% (Opus 4.7: 73.1%, Anthropic’s Claude Mythos: 83.1%)
- Tau2-Bench Telecom: 98.0% without prompt tuning…
Meanwhile, other models, like Simon Willison’s favored 20.9GB Qwen 3.6 on his famous pelican-on-a-bicycle test, still beat Opus 4.7 on some one-shot creative coding tasks, so it’s not the uncontested champion there. What to do next: Claude Code ships with a `/claude-api migrate` skill that updates most of your code for you. I ran it on two small projects yesterday and it handled about 90% of the changes. The rest were custom prompt tricks that depended on low temperature, and those still need a human pass. If you’re on a Max plan, use auto mode: give Claude a real coding task in the terminal,…
"GPT-5.5 Just Beat Claude Opus 4.7 at Engineering" (Every, YouTube, Apr 23, 2026): OpenAI just dropped GPT-5.5—and after three weeks of hands-on testing at Every, the headline is its coding ability. On Every's Senior Engineer Benchmark, GPT-5.5 scored 62.5 out of 100. That’s about a 30-point leap over Claude Opus 4.7.
Humanity's Last Exam (no tools) Claude Opus 4.7 at 46.9% versus GPT-5.5's 41.4% on raw knowledge-recall academic reasoning without tool assistance. Gemini 3.1 Pro (44.4%) also outperforms GPT-5.5 here. On pure reasoning without scaffolding, there's still a gap. ## 5. What developers are saying The early sentiment from developers and AI engineers has been notably positive, with some specific caveats. Matt Shumer wrote on X that GPT-5.5 is "a MASSIVE leap forward" but added that "for 99% of users, it probably won't matter" and flagged "one BIG, incredibly frustrating regression." That track…
## Verdict: Supported
Evidence from official release notes and independent technical analysis confirms that Claude Opus 4.7 outperforms GPT-5.5 on the SWE-Bench Pro benchmark, a key metric for real-world software engineering. While OpenAI's announcement highlights a state-of-the-art (SOTA) score on a different benchmark (Terminal-Bench 2.0), its reported score for SWE-Bench Pro is lower than the score previously established by Anthropic's latest model. 7 reviewed sources behind this verdict.
01. openai.com: Introducing GPT-5.5 | OpenAI. Official OpenAI release notes for GPT-5.5, p…
Which is better, GPT-5.5 or Claude Opus 4.7? On the 10 benchmarks both providers report, Opus 4.7 leads on 6 (GPQA, HLE no tools, HLE with tools, SWE-Bench Pro, MCP Atlas, FinanceAgent v1.1) and GPT-5.5 leads on 4 (Terminal-Bench 2.0, BrowseComp, OSWorld-Verified, CyberGym). Opus 4.7's leads cluster on reasoning-heavy and review-grade tests; GPT-5.5's leads cluster on long-running tool-use and shell-driven tasks. The right one depends on the workload, not on a single overall ranking. [...] The way I'd translate this for a real product:…
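The head-to-head tally in that excerpt can be reproduced mechanically from the per-benchmark winners. A small sketch, with the benchmark names and winners copied as reported:

```python
# Winner per benchmark, as reported in the excerpt above.
winners = {
    "GPQA": "opus-4.7",
    "HLE (no tools)": "opus-4.7",
    "HLE (with tools)": "opus-4.7",
    "SWE-Bench Pro": "opus-4.7",
    "MCP Atlas": "opus-4.7",
    "FinanceAgent v1.1": "opus-4.7",
    "Terminal-Bench 2.0": "gpt-5.5",
    "BrowseComp": "gpt-5.5",
    "OSWorld-Verified": "gpt-5.5",
    "CyberGym": "gpt-5.5",
}

# Tally leads per model to recover the 6-to-4 split.
opus_leads = sum(1 for w in winners.values() if w == "opus-4.7")
gpt_leads = sum(1 for w in winners.values() if w == "gpt-5.5")
print(f"Opus 4.7 leads on {opus_leads} of {len(winners)}; GPT-5.5 on {gpt_leads}")
```

Note the count only says how many benchmarks each model wins, not by how much; margins like OSWorld's 78.7% vs 78.0% are within noise.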
GPT 5.5 feels like the model you use when you want to get a working version quickly. It can create files, move through steps, fix obvious errors, and keep the project moving without slowing everything down. That is useful for landing pages, prototypes, internal tools, scripts, and automation setups. Claude Opus 4.7 feels better when the work needs more polish. It can be stronger for code review, structure, refactoring, naming, consistency, and improving rough output. That makes Claude useful after the first version already exists. The best workflow is not complicated. Use GPT 5.5 to build the…
"I Tested GPT 5.5 vs Opus 4.7: What You Need to Know. OpenAI just dropped GPT 5.5. The benchmarks look strong against Opus 4.7. But benchmarks only tell part of the story. So I ran four head-to-head coding experiments. Same prompts. One shot each. Here's…" (Nate Herk, @nateherk, on X; 10:05 PM · Apr 23, 2026; 411.5K views)
I spent almost 10 hours running both side by side on real work. Coding projects, writing tasks, agent runs, the kind of messy stuff you’d do on a normal Friday. No cherry-picked prompts, no benchmark gymnastics, just real output I could actually ship. Here’s what I found. The first thing I noticed was the speed difference ## Quick refresher on what each model actually is Claude Opus 4.7 is Anthropic’s strongest publicly available model. It follows instructions more literally than 4.6, reads images at three times higher resolution, and it’s tuned for long autonomous work. If you’ve been using…
The inevitable question is whether this officially flips Claude. The market seems skeptical. I am too, depending on the use case. Opus 4.7 still feels like it has a slight edge in raw, abstract architectural reasoning. If I am asking a model to design a distributed systems architecture from scratch without writing the implementation code, I might still lean toward Opus 4.7. But GPT-5.5 is vastly superior at the execution phase. It is an unrelenting worker bee. The rumors of GPT-5.5-pro matching "Mythos" level capabilities seem slightly overblown for general knowledge tasks, but entirely accurate…
Studio Global AI · report published April 26, 2026 · 20 sources
GPT-5.5とClaude Opus 4.7のベンチマークを調査してください。コーディング、デザイン、クリエイティブコンテンツでは、どちらが優れているでしょうか?
調査できた範囲では、コーディングはClaude Opus 4.7が優勢です。デザインとクリエイティブコンテンツも、公開情報上はClaude Opus 4.7寄りですが、GPT 5.5との厳密な横並びベンチマークが不足しているため、結論は「Claude優勢の可能性が高いが、独立検証は不十分」です。 Key findings コーディング:Claude Opus 4.7が優勢 Anthropicは、Claude Opus 4.7が同社...
調査できた範囲では、コーディングはClaude Opus 4.7が優勢です。デザインとクリエイティブコンテンツも、公開情報上はClaude Opus 4.7寄りですが、GPT 5.5との厳密な横並びベンチマークが不足しているため、結論は「Claude優勢の可能性が高いが、独立検証は不十分」です。 Key findings コーディング:Claude Opus 4.7が優勢 Anthropicは、Claude Opus 4.7が同社の93タスク・コーディングベンチマークでOpus 4.6より解決率を13%改善し、Opus 4.6やSonnet 4.6でも解けなかった4タスクを解決したと説明しています。[3] SWE Bench Pro
重要なポイント
**コーディング:Claude Opus 4.7が優勢**
- Anthropicは、Claude Opus 4.7が同社の93タスク・コーディングベンチマークでOpus 4.6より解決率を13%改善し、Opus 4.6やSonnet 4.6でも解けなかった4タスクを解決したと説明しています。
- SWE-Bench Proについては、GPT-5.5よりClaude Opus 4.7が上回るという検証記事があり、実世界ソフトウェアエンジニアリング系ベンチマークではClaude Opus 4.7優位とされています。
- 複雑で長時間のコーディングワークフローでは、Claude Opus 4.7が特に強いという位置づけです。
調査できた範囲では、コーディングはClaude Opus 4.7が優勢です。デザインとクリエイティブコンテンツも、公開情報上はClaude Opus 4.7寄りですが、GPT-5.5との厳密な横並びベンチマークが不足しているため、結論は「Claude優勢の可能性が高いが、独立検証は不十分」です。
## Key findings
研究の答え
調査できた範囲では、コーディングはClaude Opus 4.7が優勢です。デザインとクリエイティブコンテンツも、公開情報上はClaude Opus 4.7寄りですが、GPT-5.5との厳密な横並びベンチマークが不足しているため、結論は「Claude優勢の可能性が高いが、独立検証は不十分」です。
Key findings
コーディング:Claude Opus 4.7が優勢
Anthropicは、Claude Opus 4.7が同社の93タスク・コーディングベンチマークでOpus 4.6より解決率を13%改善し、Opus 4.6やSonnet 4.6でも解けなかった4タスクを解決したと説明しています。[3]
SWE-Bench Proについては、GPT-5.5よりClaude Opus 4.7が上回るという検証記事があり、実世界ソフトウェアエンジニアリング系ベンチマークではClaude Opus 4.7優位とされています。[5]
複雑で長時間のコーディングワークフローでは、Claude Opus 4.7が特に強いという位置づけです。[3]
デザイン:Claude Opus 4.7が有利そうだが、証拠は限定的
AnthropicはClaude Opus 4.7について、プロフェッショナル作業でより「tasteful and creative」であり、インターフェース、スライド、ドキュメントの品質が高いと主張しています。[2]
ただし、提示された証拠内には、GPT-5.5とClaude Opus 4.7を同一条件で比較した独立系デザインベンチマークはありません。したがって、デザイン領域はClaude Opus 4.7優勢の可能性はあるものの、確定的な比較には不十分です。
クリエイティブコンテンツ:Claude Opus 4.7寄りだが、決定打は不足
Claude Opus 4.7は、プロ向けの創作・資料作成・文書作成でより高品質と説明されています。[2]
一方で、GPT-5.5との直接的な創作コンテンツ評価、たとえば広告コピー、長文記事、物語、ブランド文体再現などの横比較ベンチマークは提示証拠にありません。したがって、クリエイティブコンテンツは「Claude Opus 4.7が有利そうだが、Insufficient evidence.」です。
一般推論・知識系ではClaude Opus 4.7が上回るデータあり
Humanity’s Last Examのno-tools条件では、Claude Opus 4.7が46.9%、GPT-5.5が41.4%とされ、Claude Opus 4.7が上回っています。[4]
したがって、コーディングはClaude Opus 4.7優勢と見てよい一方、デザインとクリエイティブは「Claude Opus 4.7が有望だが、Insufficient evidence.」という扱いが妥当です。
Summary
コーディング: Claude Opus 4.7が優位。
デザイン: Claude Opus 4.7が有利そうだが、独立比較は不足。
クリエイティブコンテンツ: Claude Opus 4.7寄りだが、決定的証拠は不足。
ツール操作・OS操作: ほぼ同等で、GPT-5.5がわずかに上回るデータはあるがノイズ範囲。
総合判断: 実務で1つ選ぶなら、複雑なコーディング、UI/資料、プロ向けクリエイティブ制作ではClaude Opus 4.7を優先。スピード重視の自動化やツール操作中心の作業では、GPT-5.5も十分競争力があります。
サポートビジュアル
GPT 5.5 DESTROYS Claude Opus 4.7The image features bold text stating "THIS IS SCARY" alongside a fiery explosion resembling a blooming firework or starburst, with logos and labels referencing GPT 5.5, Claude Opus 4.7, and other advanced AI design elements.NASA is right to focus on 'hardware rich' approach following Artemis II heat shield gambleHumanoid robot takes over data collection, administrative job at UK’s construction site'Dr'Dr. of Bubbles' Lucy Rogers on Robot Wars, new book, and why failure fuels innovationUS’ heat-activated knot robots leap hundreds of times their height without electronicsAI WarAI War? White House accuses China of industrial-scale theft of US AI frontier modelsVideo: China’s SamuRoid humanoid robot offers smarter interactions in a compact formThe RV is becoming an energy platformThe RV is becoming an energy platform. These companies are building it that wayGermany tests shape-shifting wings that morph mid-flight to boost flight safetyNo racetrack glory for China’s AI robots against the data challengeHow spinning sails are cutting fuel use in modern shippingOpenAI logo illustrationClaude Opus 4.7: Smarter Coding, Better Vision, Real FixesThe image displays a benchmark comparison chart of different versions of Claude Opus and GPT models, highlighting improvements in coding capabilities, model autonomy, and correctness, with graphical bar comparisons and annotations.GPT 5.5 Vs Claude Opus 4.7 For Real WorkA comparison chart highlighting the differences between Claude Opus 4.7 and GPT 5.5, with visual representations of interconnected brain-like structures and lists of their key attributes, set on a dark background with a laptop, notebook, and coffee mug in the foreground.OpenAI's GPT-5.5 vs Claude Opus 4.7: Which is betterA stylized illustration combining elements of artificial intelligence, coding, and academic or official content, with abstract shapes and digital motifs divided into two contrasting backgrounds.GPT-5.5 vs Claude Opus 4.7: 
Benchmarks & PricingThe image features a headline discussing benchmarks and pricing comparisons between GPT-5.5 and Claude Opus 4.7, presented by the Digital Applied Team on their blog, with a focus on AI development and model evaluation.Introducing GPT-5.5 | OpenAIA 3D visualization of a mathematical model called the Weierstrass function, displayed on a black background with labels indicating it is related to the April 2026 OpenAI GPT-5.5 privacy filter model.Introducing GPT-5.5 | OpenAIA blurred background with pastel colors overlaid by text referencing ChatGPT for clinicians, with mentions of GPT-5.5, Terminal-Bench, SWE-Bench, GDPval, Claude Opus 4.7, and official documentation.
研究に関する会話
あなた
リサーチクエスチョン
GPT-5.5とClaude Opus 4.7のベンチマークを調査してください。コーディング、デザイン、クリエイティブコンテンツでは、どちらが優れているでしょうか?
Studio Global AI20 ソース
引用された回答
調査できた範囲では、コーディングはClaude Opus 4.7が優勢です。デザインとクリエイティブコンテンツも、公開情報上はClaude Opus 4.7寄りですが、GPT-5.5との厳密な横並びベンチマークが不足しているため、結論は「Claude優勢の可能性が高いが、独立検証は不十分」です。
Key findings
コーディング:Claude Opus 4.7が優勢
Anthropicは、Claude Opus 4.7が同社の93タスク・コーディングベンチマークでOpus 4.6より解決率を13%改善し、Opus 4.6やSonnet 4.6でも解けなかった4タスクを解決したと説明しています。[3]
SWE-Bench Proについては、GPT-5.5よりClaude Opus 4.7が上回るという検証記事があり、実世界ソフトウェアエンジニアリング系ベンチマークではClaude Opus 4.7優位とされています。[5]
複雑で長時間のコーディングワークフローでは、Claude Opus 4.7が特に強いという位置づけです。[3]
デザイン:Claude Opus 4.7が有利そうだが、証拠は限定的
AnthropicはClaude Opus 4.7について、プロフェッショナル作業でより「tasteful and creative」であり、インターフェース、スライド、ドキュメントの品質が高いと主張しています。[2]
ただし、提示された証拠内には、GPT-5.5とClaude Opus 4.7を同一条件で比較した独立系デザインベンチマークはありません。したがって、デザイン領域はClaude Opus 4.7優勢の可能性はあるものの、確定的な比較には不十分です。
クリエイティブコンテンツ:Claude Opus 4.7寄りだが、決定打は不足
Claude Opus 4.7は、プロ向けの創作・資料作成・文書作成でより高品質と説明されています。[2]
一方で、GPT-5.5との直接的な創作コンテンツ評価、たとえば広告コピー、長文記事、物語、ブランド文体再現などの横比較ベンチマークは提示証拠にありません。したがって、クリエイティブコンテンツは「Claude Opus 4.7が有利そうだが、Insufficient evidence.」です。
一般推論・知識系ではClaude Opus 4.7が上回るデータあり
Humanity’s Last Examのno-tools条件では、Claude Opus 4.7が46.9%、GPT-5.5が41.4%とされ、Claude Opus 4.7が上回っています。[4]
Thinking content omitted by default Starting with Claude Opus 4.7, thinking content is omitted from the response by default. Thinking blocks still appear in the response stream, but their thinking field will be empty unless the caller explicitly opts in. This is a silent change — no error is raised — and response latency will be slightly improved. If reasoning outputs are needed, you can set display to "summarized" and opt back in with a one-line change: Python
In particular, Anthropic says Claude Opus 4.7 is better at advanced coding tasks, visual intelligence, and document analysis. Anthropic also says Opus 4.7 is "more tasteful and creative when completing professional tasks, producing higher-quality interfaces, slides, and docs." Mashable Light Speed Want more out-of-this world tech, space and science stories? Sign up for Mashable's weekly Light Speed newsletter. By clicking Sign Me Up, you confirm you are 16+ and agree to our Terms of Use") and Privacy Policy"). Thanks for signing up! [...] Tim studied print journalism at the University of Sout…
Image 7: logo > On our 93-task coding benchmark, Claude Opus 4.7 lifted resolution by 13% over Opus 4.6, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction-following, it's particularly meaningful for complex, long-running coding workflows. It cuts the friction from those multi-step tasks so developers can stay in the flow and focus on building. > > Mario Rodriguez > > Chief Product Officer Image 8: logo [...] Image 22: logo > Claude Opus 4.7 feels like a real step up in intelligence. Code quality is noticeably improved,…
Computer Use and Tool Orchestration Computer use is the second axis where GPT-5.5 and Opus 4.7 compete most directly, and the benchmark margin is much tighter than agentic coding. On OSWorld-Verified, GPT-5.5 scores 78.7% versus 78.0% for Opus 4.7 — within noise range. On Tau2-bench Telecom (run without prompt tuning), GPT-5.5 hits 98.0%. Toolathlon goes to GPT-5.5 at 55.6% (Opus 4.7 not reported). MCP-Atlas, the tool-orchestration benchmark that tests handling complex tool sets via the Model Context Protocol, goes to Opus 4.7 at 79.1% vs 75.3%. Operate Software GPT-5.5 78.7% / Opus 4.7 78…
Sources OpenAI — Introducing GPT-5.5 OpenAI API Pricing Anthropic — Claude Models Documentation BuildFastWithAI — Claude Opus 4.7 Review & Benchmarks The Next Web — Claude Opus 4.7 Leads on SWE-bench Content was rephrased for compliance with licensing restrictions. Benchmark data sourced from official OpenAI and Anthropic publications as of April 23, 2026. Pricing and benchmarks may change — always verify on the vendor's website. ## Build With the Right AI Model Whether you need GPT-5.5 for agentic workflows, Claude Opus 4.7 for precision coding, or a multi-model architecture that uses bo…
This is where the comparison stops being close. On the same coding tasks — identical prompts, identical goals — GPT-5.5 produces roughly 72% fewer output tokens than Claude Opus 4.7. That’s not a rounding error. It’s a structural difference in how each model communicates. ### Why Opus 4.7 Uses So Many Tokens Opus 4.7 narrates its reasoning. When it writes code, it often explains what it’s about to do, writes the code, then summarizes what it did. In a chat interface, that’s sometimes helpful. In an agentic loop running dozens of steps, every narration token is a billable token. [...] ### SWE-…
Image 22: logo > Claude Opus 4.7 feels like a real step up in intelligence. Code quality is noticeably improved, it’s cutting out the meaningless wrapper functions and fallback scaffolding that used to pile up, and fixes its own code as it goes. It’s the cleanest jump we’ve seen since the move from Sonnet 3.7 to the Claude 4 series. > > Ben Lafferty > > Senior Staff Engineer Image 23: logo > For the computer-use work that sits at the heart of XBOW’s autonomous penetration testing, the new Claude Opus 4.7 is a step change: 98.5% on our visual-acuity benchmark versus 54.5% for Opus 4.6. Our sin…
Pietro Schirano, CEO of MagicPath, saw a similar step change when GPT‑5.5 merged a branch with hundreds of frontend and refactor changes into a main branch that had also changed substantially, resolving the work in one shot in about 20 minutes. Senior engineers who tested the model said GPT‑5.5 was noticeably stronger than GPT‑5.4 and Claude Opus 4.7 at reasoning and autonomy, catching issues in advance and predicting testing and review needs without explicit prompting. In one case, an engineer asked it to re-architect a comment system in a collaborative markdown editor and returned to a 12-d…
About UsAdvertise ContactFAQ #### Follow Us On LinkedInXInstagramFlipboardFacebookYouTubeTikTok All Rights Reserved, IE Media, Inc. AI and Robotics # GPT-5.5 crushes Claude Opus 4.7 in agentic coding with 82.7% terminal-bench score GPT-5.5 introduces smarter task handling, reduced token usage, and broader adoption across enterprise workflows. ByAamir Khollam AI and Robotics FacebookLinkedInXReddit Google News Preferred Source ByAamir Khollam FacebookLinkedInXReddit Google News Preferred Source OpenAI logo illustration OpenAI logo illustration.Getty Images OpenAI has introduced GPT-5.5, positi…
Headline benchmarks: Terminal-Bench 2.0 at 82.7% (Opus 4.7: 69.4%, Gemini 3.1 Pro: 68.5%). SWE-Bench Pro at 58.6% (Opus 4.7 still leads at 64.3%). OpenAI’s internal Expert-SWE eval, where tasks have a 20-hour median human completion time, at 73.1% (up from GPT-5.4’s 68.5%). GDPval wins-or-ties at 84.9% (Opus 4.7: 80.3%, Gemini 3.1 Pro: 67.3%). OSWorld-Verified at 78.7% (narrowly edges Opus 4.7’s 78.0%). FrontierMath Tier 4 at 35.4% (Opus 4.7: 22.9%, Gemini 3.1 Pro: 16.7%). CyberGym at 81.8% (Opus 4.7: 73.1%, Anthropic’s Claude Mythos: 83.1%). Tau2-Bench Telecom at 98.0% without prompt tuning.…
Meanwhile, other models, like Simon Willison’s favored 20.9GB Qwen 3.6 on his famous pelican‑on‑a‑bicycle test still beat Opus 4.7 on some one‑shot creative coding tasks, so it’s not the uncontested champion there. What to do next: Claude Code ships with a
/claude-api migrate
Skill that updates most of your code for you. I ran it on two small projects yesterday and it handled about 90% of the changes. The rest were custom prompt tricks that depended on low temperature, and those still need a human pass. If you’re on a Max plan, use auto mode: give Claude a real coding task in the terminal,…
GPT-5.5 Just Beat Claude Opus 4.7 at Engineering Image 7 Every Every 37.1K subscribers Subscribe Subscribed 528 Share Save Download Download 16,312 views 7 hours ago 16,312 views • Apr 23, 2026 OpenAI just dropped GPT-5.5—and after three weeks of hands-on testing at Every, the headline is its coding ability. On Every's Senior Engineer Benchmark, GPT-5.5 scored 62.5 out of 100. That’s about a 30-point leap over Claude Opus 4.7.…...more ...more How this was made Auto-dubbed Audio tracks for some languages were automatically generated. Learn more ## Chapters View all Image 8 #### It's Model Re…
Humanity's Last Exam (no tools): Claude Opus 4.7 at 46.9% versus GPT-5.5's 41.4% on raw knowledge-recall academic reasoning without tool assistance. Gemini 3.1 Pro (44.4%) also outperforms GPT-5.5 here. On pure reasoning without scaffolding, there's still a gap.

## 5. What developers are saying

The early sentiment from developers and AI engineers has been notably positive, with some specific caveats. Matt Shumer wrote on X that GPT-5.5 is "a MASSIVE leap forward" but added that "for 99% of users, it probably won't matter" and flagged "one BIG, incredibly frustrating regression." That track…
## Verdict: Supported

Evidence from official release notes and independent technical analysis confirms that Claude Opus 4.7 outperforms GPT-5.5 on the SWE-Bench Pro benchmark, a key metric for real-world software engineering. While OpenAI's announcement highlights a state-of-the-art (SOTA) score on a different benchmark (Terminal-Bench 2.0), its reported score for SWE-Bench Pro is lower than the score previously established by Anthropic's latest model. Seven reviewed sources stand behind this verdict, including openai.com: "Introducing GPT-5.5", OpenAI's official release notes for GPT-5.5, p…
Which is better, GPT-5.5 or Claude Opus 4.7? On the 10 benchmarks both providers report, Opus 4.7 leads on 6 (GPQA, HLE no tools, HLE with tools, SWE-Bench Pro, MCP Atlas, FinanceAgent v1.1) and GPT-5.5 leads on 4 (Terminal-Bench 2.0, BrowseComp, OSWorld-Verified, CyberGym). Opus 4.7's leads cluster on reasoning-heavy and review-grade tests; GPT-5.5's leads cluster on long-running tool-use and shell-driven tasks. The right one depends on the workload, not on a single overall ranking. [...] | Available in our proxy | API not yet live | Yes | [...] The way I'd translate this for a real product:…
Related coverage (Timothy Beck Werth): Anthropic says Claude Opus 4.7 has a 92% honesty rate and is less likely to hallucinate or engage in sycophancy than other models.
GPT 5.5 feels like the model you use when you want to get a working version quickly. It can create files, move through steps, fix obvious errors, and keep the project moving without slowing everything down. That is useful for landing pages, prototypes, internal tools, scripts, and automation setups. Claude Opus 4.7 feels better when the work needs more polish. It can be stronger for code review, structure, refactoring, naming, consistency, and improving rough output. That makes Claude useful after the first version already exists. The best workflow is not complicated. Use GPT 5.5 to build the…
I Tested GPT 5.5 vs Opus 4.7: What You Need to Know (Nate Herk, @nateherk, on X, Apr 23, 2026): OpenAI just dropped GPT 5.5. The benchmarks look strong against Opus 4.7. But benchmarks only tell part of the story. So I ran four head-to-head coding experiments. Same prompts. One shot each. Here's...
I spent almost 10 hours running both side by side on real work. Coding projects, writing tasks, agent runs, the kind of messy stuff you'd do on a normal Friday. No cherry-picked prompts, no benchmark gymnastics, just real output I could actually ship. Here's what I found. The first thing I noticed was the speed difference.

## Quick refresher on what each model actually is

Claude Opus 4.7 is Anthropic's strongest publicly available model. It follows instructions more literally than 4.6, reads images at three times higher resolution, and it's tuned for long autonomous work. If you've been using…
The inevitable question is whether this officially flips the lead away from Claude. The market seems skeptical. I am too, depending on the use case. Opus 4.7 still feels like it has a slight edge in raw, abstract architectural reasoning. If I am asking a model to design a distributed systems architecture from scratch without writing the implementation code, I might still lean toward Opus 4.7. But GPT-5.5 is vastly superior at the execution phase. It is an unrelenting worker bee. The rumors of GPT-5.5-pro matching "Mythos" level capabilities seem slightly overblown for general knowledge tasks, but entirely accurate…