Report published April 28, 2026 · Last edited May 6, 2026 · 14 sources

GPT-5.5 vs Claude Opus 4.7 vs DeepSeek V4 vs Kimi K2.6: a use-case-first benchmark comparison

[Figure: abstract AI dashboard comparing GPT-5.5, Claude Opus 4.7, DeepSeek V4, and Kimi K2.6 on overall performance, reasoning, speed, context length, and price.]

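If the only question is "which is strongest?", GPT-5.5, Claude Opus 4.7, DeepSeek V4 Pro, and Kimi K2.6 do not yield a single safe answer. A more useful question is: what do you need the model to do?

The reason is simple: public benchmarks mix different reasoning-effort tiers, different update dates, vendor self-reported numbers, and third-party compilations. Forcing these figures into one overall leaderboard looks tidy but can mislead.^[4]^[18]

For DeepSeek, this article focuses mainly on DeepSeek V4 Pro (Reasoning, Max Effort), for which more figures are available. Artificial Analysis's open-model table lists Intelligence, context length, a Price column, and output speed for both Kimi K2.6 and DeepSeek V4 Pro, which allows a like-for-like reference.^[23]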

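Bottom line first: pick a first candidate by use case

Use case	First candidate	Why
Overall performance, economically valuable tasks	GPT-5.5	GPT-5.5 (high) scores 59 on the Artificial Analysis Intelligence Index; GPT-5.5 (xhigh) is reported at Elo 1785 on GDPval-AA.^[26]^[27]
Deep reasoning, expert review, complex judgment	Claude Opus 4.7	Across the 10 shared benchmarks compiled by LLM Stats, Claude Opus 4.7 leads on 6 and GPT-5.5 leads on 4.^[4]
Terminal work, browser tasks, long-running tool use	GPT-5.5	LLM Stats finds GPT-5.5 stronger on long-running tool-use tasks such as Terminal-Bench 2.0, BrowseComp, OSWorld-Verified, and CyberGym.^[4]
Open-weight path with an emphasis on speed and value	Kimi K2.6	The Artificial Analysis open-model table lists Kimi K2.6 at Intelligence 54, 256k context, $1.7 in the Price column, and 112 tokens/s.^[23]
Long context, low API price, bulk text processing	DeepSeek V4 Pro / DeepSeek V4 family	Artificial Analysis lists DeepSeek V4 Pro with a 1M context window; Mashable reports that DeepSeek V4's API price is lower than that of GPT-5.5 and Claude Opus 4.7.^[3]^[23]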

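Main signals for the four models

Model	Benchmark strengths	Pricing and deployment characteristics
GPT-5.5	GPT-5.5 (high) scores 59 on the Artificial Analysis Intelligence Index. GPT-5.5 (xhigh) is reported at Elo 1785 on GDPval-AA, about 30 points ahead of Claude Opus 4.7 (max).^[26]^[27]	Mashable reports an API price of $5 per 1 million input tokens and $30 per 1 million output tokens.^[3]
Claude Opus 4.7	6 wins and 4 losses across the 10 shared benchmarks compiled by LLM Stats. Mashable's table shows Claude Opus 4.7 at 64.3% on SWE-Bench Pro, 94.2% on GPQA Diamond, and 54.7% on HLE with tools.^[4]^[9]	Mashable reports an API price of $5 per 1 million input tokens and $25 per 1 million output tokens.^[3]
Kimi K2.6	The Artificial Analysis open-model table lists its Intelligence at 54. The Decoder, citing figures published by Moonshot AI, reports Kimi K2.6 at 54.0 on HLE with Tools, 58.6 on SWE-Bench Pro, and 83.2 on BrowseComp.^[20]^[23]	The same Artificial Analysis table lists Kimi K2.6 at 256k context, $1.7 in the Price column, and 112 tokens/s.^[23]
DeepSeek V4 Pro	The Artificial Analysis open-model table lists its Intelligence at 52. DataCamp's assessment is that DeepSeek V4 does not surpass GPT-5.5 or Claude Opus 4.7 on raw capability.^[16]^[23]	The same Artificial Analysis table lists DeepSeek V4 Pro at 1M context, $2.2 in the Price column, and 36 tokens/s. Mashable reports a DeepSeek V4 API price of $1.74 per 1 million input tokens and $3.48 per 1 million output tokens.^[3]^[23]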
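
One way to read the 30-point GDPval-AA gap in the table above is to convert it into an expected head-to-head score. The sketch below assumes the conventional Elo logistic curve with a 400-point scale; whether GDPval-AA uses exactly this parameterization is an assumption, and the function name `elo_expected_score` is illustrative.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the standard Elo logistic curve.

    Assumption: the 400-point scale is the conventional Elo default, not something
    the cited GDPval-AA write-up is known to confirm.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


# Reported gap: GPT-5.5 (xhigh) leads Claude Opus 4.7 (max) by 30 points.
gap_vs_opus = 30
print(f"Expected score at a {gap_vs_opus}-point gap: {elo_expected_score(gap_vs_opus):.3f}")
```

Under that assumption, a 30-point lead corresponds to an expected score of roughly 0.54, which is an edge rather than a decisive margin.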

GPT-5.5 vs Claude Opus 4.7: among frontier models, the winner changes with the task

The GPT-5.5 vs Claude Opus 4.7 comparison is the one least suited to a "who crushes whom" summary. The leader changes from benchmark to benchmark.

On the main figures listed by Mashable, Claude Opus 4.7 leads on SWE-Bench Pro, GPQA Diamond, and Humanity's Last Exam with tools; GPT-5.5 leads on Terminal-Bench 2.0, Humanity's Last Exam (without tools), BrowseComp, and ARC-AGI-1 Verified.^[9]

Benchmark	GPT-5.5	Claude Opus 4.7	Leader in Mashable's table
SWE-Bench Pro	58.6%	64.3%	Claude Opus 4.7
Terminal-Bench 2.0	82.7%	69.4%	GPT-5.5
Humanity's Last Exam	40.6%	31.2%	GPT-5.5
Humanity's Last Exam with tools	52.2%	54.7%	Claude Opus 4.7
BrowseComp	84.4%	79.3%	GPT-5.5
GPQA Diamond	93.6%	94.2%	Claude Opus 4.7
ARC-AGI-1 Verified	94.5%	92.0%	GPT-5.5
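
The table above can be tallied mechanically, which makes it easy to see how a "wins" count depends on which benchmarks are included. This is a minimal sketch: the scores are copied from the Mashable table, while the names `MASHABLE_SCORES` and `tally_leaders` are illustrative.

```python
from collections import Counter

# (GPT-5.5, Claude Opus 4.7) scores in percent, copied from the Mashable table above.
MASHABLE_SCORES = {
    "SWE-Bench Pro": (58.6, 64.3),
    "Terminal-Bench 2.0": (82.7, 69.4),
    "Humanity's Last Exam": (40.6, 31.2),
    "Humanity's Last Exam with tools": (52.2, 54.7),
    "BrowseComp": (84.4, 79.3),
    "GPQA Diamond": (93.6, 94.2),
    "ARC-AGI-1 Verified": (94.5, 92.0),
}


def tally_leaders(scores: dict[str, tuple[float, float]]) -> Counter:
    """Count how many benchmarks each model leads; exact ties are counted separately."""
    leaders: Counter = Counter()
    for gpt, opus in scores.values():
        if gpt > opus:
            leaders["GPT-5.5"] += 1
        elif opus > gpt:
            leaders["Claude Opus 4.7"] += 1
        else:
            leaders["tie"] += 1
    return leaders


print(dict(tally_leaders(MASHABLE_SCORES)))
```

On these seven rows the count is 4 to 3 in favor of GPT-5.5, and it would shift if rows were added or removed.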

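LLM Stats' compilation differs slightly: across the 10 benchmarks both providers report, Claude Opus 4.7 leads on 6 and GPT-5.5 leads on 4. The site's summary is that Opus 4.7 leans toward reasoning and review tasks, while GPT-5.5 leans toward long-running tool-use tasks.^[4]

There is a key caveat: LLM Stats also notes that these scores are self-reported by each provider at high reasoning-effort tiers. They are "comparable in shape", but that does not mean the evaluation methods are identical.^[4] In addition, sources do not always agree on who leads a given benchmark. Mashable's table has GPT-5.5 ahead on Humanity's Last Exam without tools, while LLM Stats groups HLE among the tests where Opus 4.7 leads.^[4]^[9]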

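Kimi K2.6 vs DeepSeek V4 Pro: on the open-weight path, speed or context?

Comparing Kimi K2.6 and DeepSeek V4 Pro head-to-head against closed frontier models such as GPT-5.5 and Claude Opus 4.7 tends to lose focus. A more sensible framing: if you are considering an open-weight model, which do you need most, speed, price, or a very long context?

Metric	Kimi K2.6	DeepSeek V4 Pro
Artificial Analysis Intelligence	54	52
Context window	256k	1.00M
Price column	$1.7	$2.2
Output speed	112 tokens/s	36 tokens/s

On this table alone, Kimi K2.6 has the edge in Intelligence and output speed, while DeepSeek V4 Pro stands out for its 1M context window.^[23] The Decoder, citing figures published by Moonshot AI, reports Kimi K2.6 at 54.0 on HLE with Tools, 58.6 on SWE-Bench Pro, and 83.2 on BrowseComp.^[20]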
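
The speed-versus-context trade-off in the table can be made concrete with a little arithmetic. The sketch below uses the Artificial Analysis figures as listed (treating "256k" as 256,000 tokens and "1.00M" as 1,000,000, which is an assumption about rounding) and ignores time to first token and provider variance; the helper names are illustrative.

```python
# Figures from the Artificial Analysis open-model table; token counts are rounded assumptions.
MODELS = {
    "Kimi K2.6": {"context_tokens": 256_000, "tokens_per_s": 112},
    "DeepSeek V4 Pro": {"context_tokens": 1_000_000, "tokens_per_s": 36},
}


def generation_seconds(model: str, output_tokens: int) -> float:
    """Rough streaming time for the output only (no queueing or time to first token)."""
    return output_tokens / MODELS[model]["tokens_per_s"]


def fits_in_context(model: str, prompt_tokens: int, output_tokens: int) -> bool:
    """True if prompt plus expected output stays within the listed context window."""
    return prompt_tokens + output_tokens <= MODELS[model]["context_tokens"]


# Hypothetical workload: a 400,000-token prompt with a 2,000-token answer.
for name in MODELS:
    print(
        name,
        f"{generation_seconds(name, 2_000):.1f}s",
        "fits" if fits_in_context(name, 400_000, 2_000) else "does not fit",
    )
```

For this hypothetical workload, Kimi K2.6 would stream the answer in about 18 seconds versus about 56 seconds for DeepSeek V4 Pro, but only DeepSeek V4 Pro could hold the prompt in a single context window.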

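That said, Kimi K2.6's published evaluations were not run under exactly the same conditions as GPT-5.5 or Claude Opus 4.7. The Hugging Face model card states that Kimi K2.6 was evaluated with thinking mode enabled, temperature 1.0, top-p 1.0, and a context length of 262,144 tokens. Its main comparison targets are Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro, among others, not the two latest frontier models covered here.^[18]

DeepSeek V4 Pro looks more like an option that approaches frontier capability while competing on lower cost and longer context. DataCamp's assessment is that DeepSeek V4 does not surpass GPT-5.5 or Claude Opus 4.7 on raw capability, but delivers near-frontier performance at lower cost.^[16]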

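Price comparison: do not mix three kinds of numbers

When looking at model prices, separate at least three kinds of numbers.

The first is the API per-token price. Mashable reports $1.74 per 1 million input tokens and $3.48 per 1 million output tokens for DeepSeek V4, $5 and $30 for GPT-5.5, and $5 and $25 for Claude Opus 4.7.^[3]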
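
To see what these per-token prices mean for an actual request, the sketch below computes the cost of one call from its input and output token counts. The prices are the Mashable figures quoted above; the workload (200,000 input tokens, 20,000 output tokens) and the name `request_cost` are hypothetical, and real bills may differ because of caching, batching, or tiered pricing.

```python
# USD per 1 million tokens as (input, output), per the Mashable figures quoted above.
PRICES_PER_M = {
    "DeepSeek V4": (1.74, 3.48),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at list per-token prices."""
    in_price, out_price = PRICES_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
for name in PRICES_PER_M:
    print(f"{name}: ${request_cost(name, 200_000, 20_000):.2f}")
```

For this input-heavy example the three models come to about $0.42, $1.60, and $1.50 respectively. The gap between GPT-5.5 and Claude Opus 4.7 widens as the output share grows, since their input prices are the same.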

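The second is the Price column in Artificial Analysis's model table. It shows $1.7 for Kimi K2.6 and $2.2 for DeepSeek V4 Pro, but this figure should not be compared directly with the API per-token prices Mashable lists, as if they were the same metric.^[23]

The third is the cost of running the benchmark itself. According to an Artificial Analysis article, running its Intelligence Index cost $1,071 for DeepSeek V4 Pro, $948 for Kimi K2.6, and $4,811 for Claude Opus 4.7.^[2]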
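
The benchmark-run costs are easier to compare as ratios. A minimal sketch, using only the three figures quoted above (the names `INDEX_RUN_COST_USD` and `cost_ratio` are illustrative):

```python
# Cost in USD to run the Artificial Analysis Intelligence Index, as quoted above.
INDEX_RUN_COST_USD = {
    "DeepSeek V4 Pro": 1071,
    "Kimi K2.6": 948,
    "Claude Opus 4.7": 4811,
}


def cost_ratio(model_a: str, model_b: str) -> float:
    """How many times more expensive model_a's run was than model_b's."""
    return INDEX_RUN_COST_USD[model_a] / INDEX_RUN_COST_USD[model_b]


print(f"Opus 4.7 vs DeepSeek V4 Pro: {cost_ratio('Claude Opus 4.7', 'DeepSeek V4 Pro'):.2f}x")
print(f"Opus 4.7 vs Kimi K2.6:       {cost_ratio('Claude Opus 4.7', 'Kimi K2.6'):.2f}x")
```

These ratios (about 4.5x and 5.1x) reflect both price and token usage during the benchmark, so they need not match the ratio of list API prices.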

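So before accepting claims such as "DeepSeek is cheaper", "Kimi is better value", or "Claude is more expensive", ask which number is meant: the API per-token price or the benchmark-run cost? And is the workload short Q&A, or a real business workflow that generates a large volume of output tokens?^[2]^[3]^[23]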

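Safety and reliability: a separate axis

Claude Opus 4.7 also comes with signals that are not strictly capability benchmarks. Mashable reports that Anthropic claims a 92% honesty rate for Claude Opus 4.7 and less sycophancy, meaning the tendency to over-accommodate the user.^[15]

Anthropic's own release also states that Claude Opus 4.7 tied for the top overall score, 0.715 across six modules, on its internal research-agent benchmark, and that on the General Finance module it improved from 0.767 for Opus 4.6 to 0.813.^[17]

These figures are useful, but they should not be lumped together with capability benchmarks such as SWE-Bench Pro, GPQA Diamond, or BrowseComp. For real deployments, it is best to assess capability scores, cost, speed, hallucination risk, and auditability separately.^[15]^[17]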

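In production, routing by task beats fixing on one model

In production, sending every task to a single fixed model is often not the most robust approach. Routing by task is more realistic: use a cost-effective model for simple tasks and switch to a stronger or more reliable model for hard ones.

MindStudio's coding comparison reports that GPT-5.5 used 72% fewer output tokens than Claude Opus 4.7 on the same coding tasks. In large, complex codebases with a heavy reasoning load, however, Opus 4.7's thoroughness may be enough to offset its higher cost.^[28]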
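
The 72% figure interacts with output pricing in a way that is easy to work through. The sketch below assumes the Mashable output prices ($30 and $25 per 1 million tokens) and assumes the 72% reduction holds for the workload in question; it ignores input tokens, and the 1,000,000-token baseline is hypothetical.

```python
# USD per 1 million output tokens, per the Mashable figures cited earlier in the article.
OUTPUT_PRICE_PER_M = {"GPT-5.5": 30.0, "Claude Opus 4.7": 25.0}

# MindStudio's reported reduction in output tokens for GPT-5.5 on the same coding tasks.
TOKEN_REDUCTION = 0.72


def output_cost(model: str, output_tokens: float) -> float:
    """Output-only cost in USD at list prices."""
    return output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M[model]


# Hypothetical baseline: Claude Opus 4.7 emits 1,000,000 output tokens on a batch of tasks.
opus_tokens = 1_000_000
gpt_tokens = opus_tokens * (1 - TOKEN_REDUCTION)

print(f"Claude Opus 4.7: ${output_cost('Claude Opus 4.7', opus_tokens):.2f}")
print(f"GPT-5.5:         ${output_cost('GPT-5.5', gpt_tokens):.2f}")
```

Under these assumptions the output bill is $25.00 for Claude Opus 4.7 against about $8.40 for GPT-5.5, so the higher per-token price of GPT-5.5 is more than offset by the lower token count.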

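A pragmatic starting point: try GPT-5.5 first for standard generation, editing, and terminal tasks; Claude Opus 4.7 first for deep review and expert judgment; Kimi K2.6 first for low-cost open-weight experiments; and DeepSeek V4 Pro first for long-context and high-volume processing.^[3]^[4]^[23]^[28]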
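
That starting point can be written down as a small routing table. This is only a sketch of the idea: the task labels, the `FIRST_CANDIDATE` mapping, and the default choice are all hypothetical, and the returned strings are display names rather than real API model identifiers.

```python
# Hypothetical task labels mapped to the first candidate suggested in the text.
FIRST_CANDIDATE = {
    "generation": "GPT-5.5",
    "editing": "GPT-5.5",
    "terminal": "GPT-5.5",
    "deep_review": "Claude Opus 4.7",
    "expert_judgment": "Claude Opus 4.7",
    "open_weight_experiment": "Kimi K2.6",
    "long_context": "DeepSeek V4 Pro",
    "bulk_processing": "DeepSeek V4 Pro",
}

# Assumption: unknown task types fall back to the general-purpose candidate.
DEFAULT_MODEL = "GPT-5.5"


def route(task_type: str) -> str:
    """Return the first model to try for a task type; unknown types use the default."""
    return FIRST_CANDIDATE.get(task_type, DEFAULT_MODEL)


for task in ("terminal", "deep_review", "long_context", "something_else"):
    print(f"{task} -> {route(task)}")
```

A production router would typically add escalation on failure and budget limits; those are left out here to keep the mapping itself visible.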

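Final verdict

Based on currently public information, it is better to choose among GPT-5.5, Claude Opus 4.7, DeepSeek V4 Pro, and Kimi K2.6 by use case than to force a single champion.

In short: GPT-5.5 suits overall capability and economically valuable tasks; Claude Opus 4.7 suits reasoning, review, and expert judgment; Kimi K2.6 suits open-weight scenarios that prioritize speed and price-performance; DeepSeek V4 Pro suits long-context tasks and workloads sensitive to API price.^[3]^[4]^[23]^[26]^[27]

Note also that Artificial Analysis itself shows different pictures depending on the page, update date, and reasoning setting. One GPT-5.5 (high) model page gives Intelligence 59, while a separate model list shows Claude Opus 4.7 (Adaptive Reasoning, Max Effort) leading with Intelligence 57.^[27]^[30]

Benchmark scores should therefore be a starting point for shortlisting, not the sole basis for procurement or launch. The safer approach is to run a small side-by-side evaluation on your own task set, budget, latency requirements, and failure tolerance, and only then decide on a routing strategy.^[4]^[18]^[28]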
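
A small side-by-side evaluation of this kind can start from a harness as simple as the sketch below. It assumes each model is wrapped as a plain `prompt -> text` callable and each test case carries its own pass/fail check; the stub models stand in for real API clients, and every name here is hypothetical.

```python
import time
from typing import Callable

ModelFn = Callable[[str], str]
Case = tuple[str, Callable[[str], bool]]  # (prompt, check on the model's output)


def evaluate(models: dict[str, ModelFn], cases: list[Case]) -> dict[str, dict[str, float]]:
    """Run every case against every model; report pass rate and mean wall-clock latency."""
    if not cases:
        raise ValueError("at least one test case is required")
    report: dict[str, dict[str, float]] = {}
    for name, call in models.items():
        passed = 0
        start = time.perf_counter()
        for prompt, check in cases:
            try:
                if check(call(prompt)):
                    passed += 1
            except Exception:
                pass  # an exception from the model or the check counts as a failure
        elapsed = time.perf_counter() - start
        report[name] = {
            "pass_rate": passed / len(cases),
            "mean_latency_s": elapsed / len(cases),
        }
    return report


# Stub models standing in for real API clients (assumption: replace with your own wrappers).
def stub_upper(prompt: str) -> str:
    return prompt.upper()


def stub_empty(prompt: str) -> str:
    return ""


cases: list[Case] = [
    ("hello", lambda out: out == "HELLO"),
    ("abc", lambda out: "ABC" in out),
]
report = evaluate({"stub-a": stub_upper, "stub-b": stub_empty}, cases)
print(report)
```

Adding per-request token counts to the report would let the same run feed the cost comparisons discussed earlier.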

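Key takeaways

Do not force the four models into a single overall ranking. Public benchmarks differ in reasoning settings, release dates, and whether results are self-reported or third-party, so choosing by task is the better fit.^[4]^[18]
For overall performance and economically valuable tasks, look at GPT-5.5 first; for deep reasoning, code review, and expert judgment, Claude Opus 4.7; for open-weight experiments and speed, Kimi K2.6; for long context and low API price, DeepSeek V4 Pro.^[3]^[4]^[23]^[26]^[27]
In the Artificial Analysis open-model table, Kimi K2.6 is listed at Intelligence 54 and 112 tokens/s; DeepSeek V4 Pro at Intelligence 52, 1M context, and 36 tokens/s. The former leans toward speed, the latter toward long context.^[23]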

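Research conversation

You · April 25, 2026

Research question

Investigate and compare the benchmarks of GPT-5.5, Claude Opus 4.7, DeepSeek V4, and Kimi K2.6.

Studio Global AI · April 25, 2026 · 38 sources

Cited answer

The evidence provided is not sufficient for a rigorous side-by-side comparison of GPT-5.5, Claude Opus 4.7, DeepSeek V4, and Kimi K2.6 on the same benchmarks. As far as can be confirmed, GPT-5.5 is described as the "new leading model", Claude Opus 4.7 is also suggested to be at or near the top in an Artificial Analysis context, and for Kimi K2.6 only some open-model comparison metrics have confirmed values.^[3]^[7]^[34]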

Key findings

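Model	Benchmark and performance information confirmed	Interpretation for comparison
GPT-5.5	A snippet from an Artificial Analysis article describes it, as of April 24, 2026, as the "new leading AI model".^[3] A LinkedIn snippet likewise says "OpenAI's GPT-5.5 is the new leading AI model".^[34]	It may be very strong overall, but the evidence provided contains no specific scores, evaluation items, or gaps versus competitors, so a quantitative ranking cannot be established.^[3]^[34]
Claude Opus 4.7	A Mashable snippet says Anthropic claims a "92% honesty rate" and "less sycophancy" for Claude Opus 4.7.^[4] A SourceForge snippet describes Claude Opus 4.7 as Anthropic's latest model, aimed at improvements in advanced software engineering and complex problem solving.^[6]	There is a specific figure for safety and honesty, but no numerical comparison with GPT-5.5 or Kimi K2.6 on the same overall benchmark is possible.^[4]^[6]
DeepSeek V4	An Artificial Analysis open-model comparison snippet mentions "DeepSeek V4 Pro", but shows no figures for score, speed, or price.^[7]	Insufficient evidence. The evidence provided does not allow a judgment on DeepSeek V4's benchmark ranking, strengths, or weaknesses.^[7]
Kimi K2.6	An Artificial Analysis open-model comparison snippet lists Kimi K2.6 at Intelligence 54, 256k context length, a price of $1.7, and an output speed of 112 tokens/s.^[7]	It has the most specific public comparison figures of the four models, but there is no evidence allowing a direct comparison with GPT-5.5 or Claude Opus 4.7 on the same metrics.^[7]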

Evidence notes

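Stanford HAI's 2026 AI Index reports that, as of February 2026, top models on SWE-bench Verified were clustered in the low-to-mid 70% range, with Claude 4.5 Opus (high reasoning) leading at about 76.8% and Kimi K2.5, GPT-5.2, and Gemini 3 Flash (high reasoning) falling between 70% and 76%.^[2]
However, this Stanford HAI material does not cover Claude Opus 4.7, GPT-5.5, Kimi K2.6, or DeepSeek V4 themselves. It is background on the previous generation or on other models.^[2]
The SciCode Bench snippet provided mentions Claude Sonnet 4 high, Gemini 2.5 Pro, GPT-5-1-codex, Grok 4, and others, but no direct scores for the models under comparison (GPT-5.5, Claude Opus 4.7, DeepSeek V4, Kimi K2.6) can be confirmed.^[1]
While GPT-5.5 is described as the "new leader", another snippet can be read as placing Claude Opus 4.7 at the top in an Artificial Analysis Intelligence context, so there may be an inconsistency caused by differences in date, evaluation set, or snippet context.^[3]^[34]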

Limitations / uncertainty

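Insufficient evidence. The evidence provided contains no table or score comparing all four models on the same benchmark, on the same date, with the same reasoning settings.^[1]^[3]^[5]^[7]^[34]
Information on DeepSeek V4 is especially thin: the evidence provided confirms nothing quantitative beyond the appearance of the model name "DeepSeek V4 Pro".^[7]
Claude Opus 4.7's "92% honesty rate" is a safety and honesty metric and cannot be treated on par with coding, math, agentic performance, or overall intelligence scores.^[4]
Kimi K2.6's Intelligence 54, 256k context, $1.7, and 112 tokens/s are useful reference points, but because the same columns are not provided for GPT-5.5 or Claude Opus 4.7, no relative ranking can be derived.^[7]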

Summary

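On the evidence currently provided, GPT-5.5 has the strongest overall performance signal, but no numerical score.^[3]^[34]
Claude Opus 4.7 has a specific figure of 92% for safety and honesty, and improvements in software engineering and complex problem solving are claimed, but no overall score comparison is possible.^[4]^[6]
Kimi K2.6 has specific public comparison values (Intelligence 54, 256k context, $1.7, 112 tokens/s), but same-metric comparisons with the other three models are lacking.^[7]
DeepSeek V4 cannot be evaluated on the evidence provided alone.^[7]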

Sources

[2] DeepSeek is back among the leading open weights models with V4 Pro ... (artificialanalysis.ai)
Lower cost than frontier models, but high token usage keeps costs above most open weights peers: DeepSeek V4 Pro costs $1,071 to run the Artificial Analysis Intelligence Index, more than 4x cheaper than Claude Opus 4.7 ($4,811) but above several open weight...
[3] DeepSeek V4 is here: How it compares to ChatGPT, Claude, Gemini (mashable.com)
Here's how the API pricing compares: DeepSeek V4 costs $1.74 per 1 million input tokens and $3.48 per 1 million output tokens (1 million context window) GPT-5.5 costs at $5 per 1 million input tokens and $30 per 1 million output tokens (1 million context wi...
[4] GPT-5.5 vs Claude Opus 4.7: Pricing, Speed, Benchmarks (llm-stats.com)
The Verdict On the 10 benchmarks both providers report, Opus 4.7 leads on 6 and GPT-5.5 leads on 4. The leads cluster by category, not by overall quality: Opus 4.7 is ahead on the reasoning-heavy and review-grade tests (GPQA Diamond, HLE with and without to...
[9] OpenAI's GPT-5.5 vs Claude Opus 4.7: Which is better? | Mashable (mashable.com)
SWE-Bench Pro: GPT-5.5 scored 58.6; Opus 4.7 scored 64.3 percent Terminal-Bench 2.0: GPT-5.5 scored 82.7 percent; Opus 4.7 scored 69.4 percent Humanity's Last Exam: GPT-5.5 scored 40.6 percent; Opus 4.7 scored 31.2 percent Humanity's...
[15] Anthropic releases Claude Opus 4.7: How to try it, benchmarks, safety (mashable.com)
Anthropic says Claude Opus 4...
[16] DeepSeek V4: Features, Benchmarks, and Comparisons - DataCamp (datacamp.com)
How large are the DeepSeek V4 models? DeepSeek uses a Mixture of Experts (MoE) architecture. The Pro model contains 1.6 trillion total parameters (49 billion active) and requires an 865GB download. The Flash model contains 284 billion parameters (13 billion...
[17] Introducing Claude Opus 4.7 (anthropic.com)
Based on our internal research-agent benchmark, Claude Opus 4.7 has the strongest efficiency baseline we’ve seen for multi-step work. It tied for the top overall score across our six modules at 0.715 and delivered the most consistent long-cont...
[18] moonshotai/Kimi-K2.6 - Hugging Face (huggingface.co)
Footnotes 1. General Testing Details We report results for Kimi K2.6 and Kimi K2.5 with thinking mode enabled, Claude Opus 4.6 with max effort, GPT-5.4 with xhigh reasoning effort, and Gemini 3.1 Pro with a high thinking level. Unless otherwise specified, a...
[20] Open-weight Kimi K2.6 takes on GPT-5.4 and Claude Opus 4.6 with agent swarms (the-decoder.com)
Moonshot AI has released Kimi K2.6 as an open-weight model. It's built to match GPT-5.4 and Claude Opus 4.6 on coding benchmarks, and...
[23] Comparison of Open Source AI Models across Intelligence, Performance, Price, Context Window, and more | Artificial Analysis (artificialanalysis.ai)
Model Name Intelligence Parameters Context Window Price Output Speed (t/s) Weights Providers Provider Benchmarks Kimi K2.6 Kimi 54 1.0KB (32B active at inference time) 256k $1.7 112 Novita Kimi SiliconFlow +6 more View DeepSeek...
[26] OpenAI's GPT-5.5 is the new leading AI model - Artificial Analysis (artificialanalysis.ai)
Number one in GDPval-AA with an Elo of 1785: GPT-5.5 (xhigh) leads Claude Opus 4.7 (max) by 30 pts and Gemini 3.1 Pro Preview by 470 pts. GDPval-AA is Artificial Analysis' benchmark that leverages OpenAI's GDPval dataset to evaluate models on real-world e...
[27] GPT-5.5 (high) - Intelligence, Performance & Price Analysis (artificialanalysis.ai)
No, GPT-5.5 (high) is proprietary. The model weights are not publicly available. GPT-5.5 (high) is a proprietary model and OpenAI has not disclosed the model size or parameter count. GPT-5.5 (high) achieves a score of 59 on the Artificial Analysis Intellige...
[28] GPT-5.5 vs Claude Opus 4.7: Real-World Coding Performance Compared (mindstudio.ai)
GPT-5.5 uses 72% fewer output tokens than Claude Opus 4.7 on the same coding tasks, a structural difference, not a minor gap. On raw benchmark quality, both models are competitive. Neither dominates on every task type. For high-volume agentic coding pipeli...
[30] Comparison of AI Models across Intelligence, Performance, and Price (artificialanalysis.ai)
Which is the most intelligent AI model? Claude Opus 4.7 (Adaptive Reasoning, Max Effort) currently leads the Artificial Analysis Intelligence Index with a score of 57, out of 347 models evaluated. What are the top AI models? The top AI models by Intelligenc...
