Report · Public URL · April 26, 2026 · 26 sources

Research the benchmarks for Kimi K2.6 and DeepSeek V4. Which is better for coding, design, and creative content?


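Research answer

Judging solely from the evidence provided so far, DeepSeek V4 has the edge in coding. For design and creative content, however, the evidence contains no comparable dedicated benchmark, so the conclusion is "Insufficient evidence".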

Key findings

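Coding: DeepSeek V4 has the edge. In the evidence provided, DeepSeek V4 is reported as the #1 open-weight model on the "Vibe Code Benchmark" with Kimi K2.6 at #2, and the post claims that DeepSeek V4 beats Kimi K2.6 by a wide margin^[11]. However, this rests on a snippet from a Reddit post, not on the primary benchmark page or detailed scores, so confidence is limited^[11].
Design: Insufficient evidence. The evidence provided contains no benchmark that directly compares the two models on UI/UX design, design review involving image understanding, front-end implementation quality, or visual generation assistance.
Creative content: Insufficient evidence. The evidence provided contains no evaluation that directly compares Kimi K2.6 and DeepSeek V4 on the quality of creative writing, ad copy, long-form structure, or brand-tone reproduction.
Long-document processing and large context: DeepSeek V4 may have an advantage for work involving long specifications, codebases, and documents. The Artificial Analysis comparison snippets list a context window of 1000k tokens for DeepSeek V4 Pro/Flash and 256k tokens for Kimi K2.6^[2]^[14]. The AINews snippet also states that the DeepSeek V4 series has a 1M-token context^[18].
Overall "open model" standing: For Kimi K2.6, AINews uses the phrase "world's leading Open Model" and reports that Moonshot has held the lead among Chinese open-model labs for all of 2026 to date^[1]. For DeepSeek V4, AINews describes it as the first major architecture refresh since V3, with a 1M-token context, hybrid reasoning/non-reasoning modes, an MIT license, and a detailed technical report^[18].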
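
The context-window figures above can be turned into a quick feasibility check. The following is a minimal sketch, not a measurement tool: the window sizes are the ones reported in the Artificial Analysis snippets, while the 4-characters-per-token ratio, the 8,000-token output reserve, and the function names are assumptions made for illustration. Real tokenizers and provider-side limits will differ.

```python
# Context window sizes as reported in the Artificial Analysis comparison snippets.
CONTEXT_WINDOW_TOKENS = {
    "DeepSeek V4 Pro/Flash": 1_000_000,
    "Kimi K2.6": 256_000,
}

# Assumption: a crude average of 4 characters per token; real tokenizers differ.
ASSUMED_CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate using ceiling division; this is not a real tokenizer."""
    return -(-len(text) // ASSUMED_CHARS_PER_TOKEN)


def models_that_fit(text: str, reserved_output_tokens: int = 8_000) -> list[str]:
    """Return the models whose reported window holds the input plus an output reserve."""
    needed = estimate_tokens(text) + reserved_output_tokens
    return [
        name
        for name, window in CONTEXT_WINDOW_TOKENS.items()
        if needed <= window
    ]
```

Under these assumptions, a 2,000,000-character codebase dump (about 500k estimated tokens) would fit only the 1M-token window, while a 400,000-character specification would fit both.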

Evidence notes

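The most direct evidence for DeepSeek V4's coding advantage is the ranking claim on the "Vibe Code Benchmark", but within the evidence provided it is limited to a Reddit snippet^[11]. Coding is therefore judged as "DeepSeek V4 ahead", with moderate confidence.
On context length, the Artificial Analysis comparison snippets list 1000k tokens for DeepSeek V4 Pro/Flash and 256k tokens for Kimi K2.6^[2]^[14]. However, the OpenRouter comparison snippet also shows "Max Tokens 256K", so there may be differences arising from platform-specific serving conditions or model configuration^[3].
Kimi K2.6 is rated highly overall, but within the evidence provided no dedicated scores for coding, design, or creative work could be confirmed^[1].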

Limitations / uncertainty

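For design and creative content, the evidence provided is not enough to decide which model is better. Insufficient evidence.
The coding conclusion depends heavily on a single benchmark claim^[11]. Because the primary source's detailed scores, evaluation prompts, scoring method, and reproducibility are not in the evidence provided, a firm conclusion should be avoided.
DeepSeek V4 comes in several configurations, including Pro, Flash, Reasoning, and High/Max Effort, so the outcome of a comparison with Kimi K2.6 may change with the model configuration and the serving platform^[2]^[14]^[3].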
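
Because results can shift with configuration and platform, it may help to enumerate the comparison cells before running anything. This is a rough sketch under stated assumptions: the tier and effort labels follow the titles of the cited comparison pages, but treating every tier/effort combination as available is an assumption, and the platform names and the `comparison_grid` helper are hypothetical placeholders.

```python
from itertools import product

# Variant and effort labels as they appear in the cited comparison titles.
DEEPSEEK_TIERS = ["Pro", "Flash"]
REASONING_EFFORTS = ["High", "Max"]
# Assumption: placeholder names standing in for whichever platforms serve the models.
PLATFORMS = ["platform-a", "platform-b"]


def comparison_grid(baseline="Kimi K2.6"):
    """List every (DeepSeek configuration, platform) cell to test against the baseline."""
    grid = []
    for tier, effort, platform in product(DEEPSEEK_TIERS, REASONING_EFFORTS, PLATFORMS):
        grid.append(
            {
                "candidate": f"DeepSeek V4 {tier} (Reasoning, {effort} Effort)",
                "baseline": baseline,
                "platform": platform,
            }
        )
    return grid
```

Reporting a result per cell, instead of one headline number, makes it visible when a ranking holds only for a particular configuration.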

Summary

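Coding: DeepSeek V4 ahead, though the supporting evidence is limited^[11].
Design: Insufficient evidence.
Creative content: Insufficient evidence.
Long-document and large-context processing: DeepSeek V4 is likely to have the advantage. The evidence provided lists 1M tokens for DeepSeek V4 and 256k tokens for Kimi K2.6^[2]^[14].
Practical selection: For code generation and large-codebase analysis, make DeepSeek V4 the first candidate; for prose and creative work, A/B test on real tasks with Kimi K2.6 included.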
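
The A/B test suggested in the summary could be tallied along these lines. This is a minimal sketch, assuming blind pairwise judgments recorded as "A", "B", or "tie"; the labels, the function names, and the idea of shuffling presentation order are illustrative choices, not a method taken from the cited sources.

```python
import random
from collections import Counter


def blind_pair(output_a, output_b, rng):
    """Shuffle two outputs so the judge cannot infer the model from the position."""
    pair = [("A", output_a), ("B", output_b)]
    rng.shuffle(pair)
    return pair


def summarize_votes(votes):
    """Turn a list of 'A' / 'B' / 'tie' judgments into win and tie shares."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    return {label: counts[label] / len(votes) for label in ("A", "B", "tie")}
```

Here "A" and "B" stand for the two models under test. With only a handful of real tasks the shares are noisy, so they are better read as a tie-breaker for a specific workload than as a general ranking.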

Supporting visuals

A digital illustration features a central block labeled "Kimi K2" connected to a network of lines and nodes, with people working on laptops and exchanging ideas around it, emphasizing open source, community development, and neural network concepts like DeepSeek V3.

A comparison of AI benchmark scores for Kimi-K2-0905, Kimi-K2-0711, and Claude Sonnet 4 across multiple evaluation metrics and tools, showing performance differences in areas like multilingual, verification, and coding tasks.

The image displays a comparison chart of various benchmark scores and metrics for the Kimi K2.5 MLX Apple Silicon processor, highlighting its performance in different coding, tool use, and STEM-related tests.

Multi-objective performance optimization plot from a 13-hour Kimi K2.6 session: throughput lifts from 0.43 MT/s baseline to 1.24 MT/s after CPU-aware tuning and empty-set short-circuit optimizations

The image displays a comparison table of various AI models' performance benchmarks across different coding and reasoning tasks, with metrics indicating their accuracy or scores.

A comparative chart displays benchmark performance results for the Kimi K2 model against competitors, showing scores across SWE-bench, LiveCodeBench v6, and MATH-500 tests.

Kimi K2.6 Qwen3.5-0.8B inference optimization case showing a jump from ~15 to 193 tokens/sec, 20% faster than LM Studio

Kimi Design Bench comparing K2.6 Agent against Gemini 3.1 Pro: Kimi wins 47.5%, tie 21.1%, Google wins 31.4%

Kimi Code Bench chart showing K2.5 at 57.4% and K2.6 at 68.2% on Moonshot's internal coding evaluation

Kimi Claw Bench chart showing K2.5 at 59.6 and K2.6 at 65.5 on heterogeneous agent collaboration

Kimi K2.6 launch visual showing a moon with iridescent aurora over a dark background

From "Moonshot AI Releases Kimi K2.6, Beats Top US Models On Some Benchmarks": The image displays a comparative chart of various AI models, including Kimi K2.6, GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, across different benchmarking tasks related to coding, design, creative content, and government or university documentation recognition, with performance percentages and visual agent evaluatio…

From "What is Kimi K2.6": A female scientist is presenting a diagram titled "Mixture-of-Experts Model," with interconnected categories like finance, health, law, code, science, and arts, while a classroom of students interacts with a dog avatar discussing Kimi K2.6's parameters, emphasizing that only the right experts activate.

From "Kimi K2.6 API Integration Guide (2026 New Edition): 256K context": The slide highlights the Kimi K2.6 API, emphasizing its development by Moonshot AI, which surpasses GPT-5.4 in benchmarks, with details on its architecture, context, and benchmark results, and includes information about API pricing.

From "Kimi K2.6 API Integration Guide (2026 New Edition): 256K context": A comparative benchmark chart displays the performance scores of Kimi K2.6, GPT-5.4, and Claude Opus 4.6 across four different tests, with Kimi K2.6 leading in the AIME 2026 evaluation.

From "Kimi K2.6 vs DeepSeek V4 vs GPT-5.5 vs Claude Opus 4.7: Which…": The image compares the performance of Kimi K2.6 and DeepSeek V4, alongside GPT-5.5 and Claude Opus 4.7, highlighting their strengths, best use cases, access routes, and key features in a detailed infographic.

From "Kimi K2.6 Has Arrived: An Open-Weight Powerhouse for Agentic Work": The image presents a comparative chart of advanced AI models, including Kimi K2.6, GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, showcasing their performance across various benchmarks in categories like humanity's last exam, browsing, deep search, tooling, OS verification, terminal benchmarking, scientific computing, m…

From "DeepSeek-V4-Pro is now available on APIYI: LiveCodeBench 93.5": The image presents a diagram showcasing four major innovations in the deepseek-v4-pro architecture, centering around 1.6 trillion parameters and 49 billion MoE, with detailed features of Hybrid Attention, mHC, Engram Memory, and Muon Optimizer linked to the central parameters.


Answered by Studio Global AI · April 26, 2026 · 26 sources



Sources

[1] [AINews] Moonshot Kimi K2.6: the world's leading Open Model ... (latent.space)
DeepSeek V4 rumors are back, and we learned our lesson not to get too excited, but in their deafening silence since v3.2, Moonshot has owned the crown of leading Chinese open model lab for all of 2026 to date, and K2.6 refreshes the lead that K2.5 established in January, with (presumably) more continued pre/posttraining (this time, details of how much more training were not disclosed). Comparing the numbers from the two launches 3 months apart demonstrates the staggering amount of progress: Image 4 [...] # [AINews] Moonshot Kimi K2.6: the world's leading Open Model refreshes to catch up to Op…
[2] DeepSeek V4 Flash (Reasoning, Max Effort) vs Kimi K2.6: Model Comparison (artificialanalysis.ai)
| Metric | DeepSeek logoDeepSeek V4 Flash (Reasoning, Max Effort) | Kimi logoKimi K2.6 | Analysis | --- --- | | Creator | DeepSeek | Kimi | | | Context Window | 1000k tokens (~1500 A4 pages of size 12 Arial font) | 256k tokens (~384 A4 pages of size 12 Arial font) | DeepSeek V4 Flash (Reasoning, Max Effort) is larger than Kimi K2.6 | | Release Date | April, 2026 | April, 2026 | DeepSeek V4 Flash (Reasoning, Max Effort) has a more recent release date than Kimi K2.6 | | Parameters | 284B, 13B active at inference time | 1000B, 32B active at inference time | DeepSeek V4 Flash (Reasoning, Max Effo…
[3] DeepSeek V4 Pro vs Kimi K2.6 - AI Model Comparison | OpenRouter (openrouter.ai)
Ready Output will appear here... ### Pricing Input$0.7448 / M tokens Output$4.655 / M tokens Images– – ### Features Input Modalities text, image Output Modalities text Quantization int4 Max Tokens (input + output)256K Max Output Tokens 66K Stream cancellation Supports Tools No Prompt Training Caching ### Artificial Analysis Kimi K2.6 Intelligence 53.9 98% Coding 47.1 95% Agentic 66.0 96% ### Design Arena Kimi K2.6 3D 1369 ELO 98% Code Categories 1333 ELO 93% Data Visualization 1322 ELO 94% Game Development 1343 ELO 94% SVG 1242 ELO 78% UI Component 1320 ELO 90% Website 1300 ELO 90% Go to mode…
[4] How to Use Kimi K2.6: Complete Guide to Moonshot AI's New 1T ... (tosea.ai)
On April 20, 2026, Moonshot AI released Kimi K2.6 — a 1-trillion-parameter open-source Mixture-of-Experts model positioned directly at the agentic-coding segment that Claude Opus 4.7 and GPT-5.4 have dominated through early 2026. The numbers on paper are striking: SWE-Bench Pro at 58.6% (ahead of both Opus 4.6 and GPT-5.4), Humanity's Last Exam with tools at 54.0% (ahead of both), and a 185% throughput lift over K2.5 in a real 13-hour optimization run against the exchange-core benchmark. For a weights-available Chinese model to lead US frontier labs on commercially relevant agentic benchmar…
[5] Kimi K2.6 Tech Blog: Advancing Open-Source Coding (kimi.com)
| Benchmark | Kimi K2.6 | GPT-5.4 (xhigh) | Claude Opus 4.6 (max effort) | Gemini 3.1 Pro (thinking high) | Kimi K2.5 | --- --- --- | | Agentic | | | HLE-Full w/ tools | 54.0 | 52.1 | 53.0 | 51.4 | 50.2 | | BrowseComp | 83.2 | 82.7 | 83.7 | 85.9 | 74.9 | | BrowseComp (agent swarm) | 86.3 | — | — | — | 78.4 | | DeepSearchQA (f1-score) | 92.5 | 78.6 | 91.3 | 81.9 | 89.0 | | DeepSearchQA (accuracy) | 83.0 | 63.7 | 80.6 | 60.2 | 77.1 | | WideSearch (item-f1) | 80.8 | — | — | — | 72.7 | | Toolathlon | 50.0 | 54.6 | 47.2 | 48.8 | 27.8 | | MCPMark | 55.9 | 62.5 | 56.7 | 55.9 | 29.5 | | Claw Eval (pa…
[6] Kimi K2.6 vs DeepSeek V4 – Side-by-Side Comparison | LLM Reference (llmreference.com)
LLM Reference # Kimi K2.6 vs DeepSeek V4 Side-by-side comparison of specifications, capabilities, and pricing. | | Kimi K2.6 Moonshot AI | DeepSeek V4 DeepSeek | --- | Released | 2026-04-13 | 2026-02-15 | | Context window | 262K | — | | Parameters | 1T (MoE, 32B active) | — | | Architecture | Mixture of Experts (MoE) | decoder only | | License | Proprietary | Open Source | | Knowledge cutoff | — | — | | Capabilities | | | | Vision | | | | Multimodal | | | | Reasoning | | | | Function calling | | | | Tool use | | | | Structured Outputs | | | | Code execution | | | | Availability | | | | Provid…
[7] Kimi K2.6: Pricing, Benchmarks & Performance - LLM Stats (llm-stats.com)
Latency 137.00 s Throughput 27 c/s Parameters 1.0T Benchmarks Examples Playground API ## Benchmarks ### Arena Performance #65 Websites #33 3D #50 Games #42 Animations #17 SVG #46 Data Viz #14 Audio ### Leaderboard Rankings #3 Reasoning #3 Search #4 Coding #5 Vision #6 Math #7 Tool Calling ### Quality Tracker 0.00σ— 54 votes 7d+2.21σ 30d+2.21σ Image 3: LLM Stats Logo Websites+0.00σ(13)Games+0.00σ(11)Animations+0.00σ(9)SVG+0.00σ(7)playground-chat+0.00σ(6)Audio+0.00σ(4)Data Viz+0.00σ(3)3D+0.00σ(1) ### Kimi K2.6 Performance Across Datasets Scores sourced from the model's scorecard, paper, or offi…
[8] LLM Coding Benchmark (April 2026): GPT 5.5, DeepSeek v4, Kimi ... (akitaonrails.com)
| Rank | Model | Score | Tier | RubyLLM OK | Time | Cost | --- --- --- | 1 | Claude Opus 4.7 | 97 | A | ✅ | 18m | ~$1.10 | | 1 | GPT 5.4 xHigh (Codex) | 97 | A | ✅ | 22m | ~$16 | | 3 | GPT 5.5 xHigh (Codex) | 96 | A | ✅ | 18m | ~$10 | | 4 | Kimi K2.6 | 87 | A | ✅ | 20m | ~$0.30 | | 5 | Claude Opus 4.6 | 83 | A | ✅ | 16m | ~$1.10 | | 6 | Gemini 3.1 Pro | 82 | A | ✅ | 14m | ~$0.40 | | 7 | Claude Sonnet 4.6 | 78 | B | ✅ | 16m | ~$0.63 | | 7 | DeepSeek V4 Flash | 78 | B | ✅ | 3m | ~$0.01 | | 9 | Qwen 3.6 Plus | 71 | B | ✅ | 17m | ~$0.15 | | 10 | DeepSeek V4 Pro | 69 | B | ✅ | 22m (DNF) | ~$0.50 |…
[9] moonshotai/Kimi-K2.6 - Hugging Face (huggingface.co)
| OSWorld-Verified | 73.1 | 75.0 | 72.7 63.3 | | Coding | | Terminal-Bench 2.0 (Terminus-2) | 66.7 | 65.4 | 65.4 | 68.5 | 50.8 | | SWE-Bench Pro | 58.6 | 57.7 | 53.4 | 54.2 | 50.7 | | SWE-Bench Multilingual | 76.7 77.8 | 76.9 | 73.0 | | SWE-Bench Verified | 80.2 80.8 | 80.6 | 76.8 | | SciCode | 52.2 | 56.6 | 51.9 | 58.9 | 48.7 | | OJBench (python) | 60.6 60.3 | 70.7 | 54.7 | | LiveCodeBench (v6) | 89.6 88.8 | 91.7 | 85.0 | | Reasoning & Knowledge | | HLE-Full | 34.7 | 39.8 | 40.0 | 44.4 | 30.1 | | AIME 2026 | 96.4 | 99.2 | 96.7 | 98.3 | 95.8 | | HMMT 2026 (Feb) | 92.7 | 97.7 | 96.2 | 94.7 | 8…
[10] I reviewed how DeepSeek V4-Pro, Kimi 2.6, Opus 4.6, and Opus 4.7 ... (news.ycombinator.com)
| | | | ozgune 1 day ago | parent | context | favorite | on: DeepSeek v4 I reviewed how DeepSeek V4-Pro, Kimi 2.6, Opus 4.6, and Opus 4.7 across the same AI benchmarks. All results are for Max editions, except for Kimi. Summary: Opus 4.6 forms the baseline all three are trying to beat. DeepSeek V4-Pro roughly matches it across the board, Kimi K2.6 edges it on agentic/coding benchmarks, and Opus 4.7 surpasses it on nearly everything except web search. DeepSeek V4-Pro Max shines in competitive coding benchmarks. However, it trails both Opus models on software engineering. Kimi K2.6 is remarkabl…
[11] "DeepSeek v4 is now the #1 open-weight model on our Vibe Code Benchmark, and it’s not close. It leaves the #2 (Kimi K2.6) in the dust, and even beats out frontier closed source models like Gemini 3.1 Pro." : r/accelerate (reddit.com)
[12] Deepseek v4 models are out and here are benchmarks! (4 versions) (reddit.com)
[13] DeepSeek V4 and Kimi K2.6 - Models - NVIDIA Developer Forums (forums.developer.nvidia.com)
[14] DeepSeek V4 Pro (Reasoning, High Effort) vs Kimi K2.6 (artificialanalysis.ai)
| Metric | DeepSeek logoDeepSeek V4 Pro (Reasoning, High Effort) | Kimi logoKimi K2.6 | Analysis | --- --- | | Creator | DeepSeek | Kimi | | | Context Window | 1000k tokens (~1500 A4 pages of size 12 Arial font) | 256k tokens (~384 A4 pages of size 12 Arial font) | DeepSeek V4 Pro (Reasoning, High Effort) is larger than Kimi K2.6 | | Release Date | April, 2026 | April, 2026 | DeepSeek V4 Pro (Reasoning, High Effort) has a more recent release date than Kimi K2.6 | | Parameters | 1600B, 49B active at inference time | 1000B, 32B active at inference time | DeepSeek V4 Pro (Reasoning, High Effort)…
[15] DeepSeek_V4.pdf (huggingface.co)
Overall, DeepSeek-V4 series retain the Transformer (Vaswani et al., 2017) architecture and Multi-Token Prediction (MTP) modules (DeepSeek-AI, 2024; Gloeckle et al., 2024), while introducing several key upgrades over DeepSeek-V3: (1) firstly, we introduce the Manifold-Constrained Hyper-Connections ( mHC) (Xie et al., 2026) to strengthen conventional residual connections; 6(2) secondly, we design a hybrid attention architecture, which greatly improves long-context efficiency through Compressed Sparse Attention and Heavily Compressed Attention. (3) thirdly, we employ Muon (Jordan et al., 2024; L…
[16] DeepSeek V4 Preview Release (api-docs.deepseek.com)
[17] Deepseek v4: Best Opensource Model Ever? (Fully Tested) - YouTube (youtube.com)
to tool breakdowns? 📌 Tags (comma separated): deepseek v4, deepseek v4 pro, deepseek v4 flash, deepseek ai, open source ai, llm, large language models, ai coding, agentic ai, ai agents, kimi k2.6, glm 5.1, ai comparison, ai benchmarks, coding ai, ai tools 2026, deepseek review, ai model test, open weight models, ai development Hashtags: #DeepSeek#AI#LLM#OpenSourceAI#AICoding#Tech#ArtificialIntelligence#MachineLearning#AIModels#AgenticAI…...more [...] # Deepseek v4: Best Opensource Model Ever? (Fully Tested) Image 7 WorldofAI WorldofAI 214K subscribers Join Subscribe Subscribed 455 Share Save…
[18] [AINews] DeepSeek V4 Pro (1.6T-A49B) and Flash (284B-A13B ... (latent.space)
DeepSeek released DeepSeek-V4 Pro and DeepSeek-V4 Flash, its first major architecture refresh since V3 and first clear two-tier lineup, with 1M-token context, hybrid reasoning/non-reasoning modes, an MIT license, and a technical report detailed enough that multiple researchers called it one of the most important or best-written model papers of the year. Across the reactions, the factual consensus is that V4 materially advances open-weight long-context and agentic coding performance while remaining somewhat behind the top closed frontier models overall. Independent benchmarkers place V4 Pro ar…
[19] DeepSeek V4 Pro (Reasoning, Max Effort) vs Kimi K2 0905: Model Comparison (artificialanalysis.ai)
| Metric | DeepSeek logoDeepSeek V4 Pro (Reasoning, Max Effort) | Kimi logoKimi K2 0905 | Analysis | --- --- | | Creator | DeepSeek | Kimi | | | Context Window | 1000k tokens (~1500 A4 pages of size 12 Arial font) | 256k tokens (~384 A4 pages of size 12 Arial font) | DeepSeek V4 Pro (Reasoning, Max Effort) is larger than Kimi K2 0905 | | Release Date | April, 2026 | September, 2025 | DeepSeek V4 Pro (Reasoning, Max Effort) has a more recent release date than Kimi K2 0905 | | Parameters | 1600B, 49B active at inference time | 1000B, 32B active at inference time | DeepSeek V4 Pro (Reasoning, Ma…
[20] deepseek-ai/DeepSeek-V4-Pro - Hugging Face (huggingface.co)
| Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max | :---: :---: :---: | Knowledge & Reasoning | | | | | | | | MMLU-Pro (EM) | 89.1 | 87.5 | 91.0 | 87.1 | 86.0 | 87.5 | | SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | 75.6 | 36.9 | 38.1 | 57.9 | | Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | 85.9 | 75.9 | 75.0 | 84.4 | | GPQA Diamond (Pass@1) | 91.3 | 93.0 | 94.3 | 90.5 | 86.2 | 90.1 | | HLE (Pass@1) | 40.0 | 39.8 | 44.4 | 36.4 | 34.7 | 37.7 | | LiveCodeBench (Pass@1) | 88.8 91.7 | 89.6 93.5 | | Codeforces (Rating) 3168 | 3052 - | 3206 | | HMM…
[21] DeepSeek-V4-Pro-Max: Pricing, Benchmarks & Performance (llm-stats.com)
Benchmarks GPQA MMLU MMLU-Pro AIME 2025 MATH HumanEval MMMU LiveCodeBench IFEval GSM8K SWE-Bench Verified ### Models Gemini 3 Pro Grok-4 Heavy GPT-5.1 Grok-4 Qwen3-235B-A22B-Thinking DeepSeek-R1-0528 GLM-4.6 GPT OSS 120B ### Resources Playground Blog News Community API Infrastructure © 2026 llm-stats About usPrivacy policyTerms of service []( Feedback Exam [...] Show 16 more Self-reported by the model provider. Score may not be independently verified. ### Similar Models How DeepSeek-V4-Pro-Max compares to models with the closest performance across key benchmarks. DeepSeek-V4-Pro-MaxQwen3.…
[22] Kimi K2.6: The new leading open weights model - Artificial Analysis (artificialanalysis.ai)
➤ Multimodality: Kimi K2.6 supports Image and Video input and text output natively. The model’s max context length remains 256k. Kimi K2.6 has significantly higher token usage than Kimi K2.5. Kimi K2.5 scores 6 on the AA-Omniscience Index, primarily driven by low hallucination rate. Here’s the full suite of Kimi K2.6 evaluation results: See Artificial Analysis for further details and benchmarks of Kimi K2.6: Want to dive deeper? Discuss this model with our Discord community: ## Read the latest ### Opus 4.7: Everything you need to know Benchmarks and Analysis of Opus 4.7 April 17, 2026 ### Sub…
[23] Underwhelming or underrated? DeepSeek V4 shows “impressive ... (scmp.com)
The company’s most advanced system, V4 Pro, ranked second among the world’s leading open-source models, behind Beijing-based Moonshot AI’s Kimi K2.6, benchmark firm Artificial Analysis said in a report on Friday. While V4 Pro marked a clear improvement on its predecessor, V3.2, it still lagged top competitors. The results highlight the challenges facing DeepSeek as China races to narrow the AI gap with the US, amid intensifying competition at home and abroad as well as ongoing constraints on computing power. Still, analysts noted that V4 delivered meaningful technical progress. [...] # Underw…
[24] LiveBench (livebench.ai)
| GPT-5.3 Codex High | OpenAI | 72.76 | 80.15 | 78.18 | 55.00 | 87.84 | 62.69 | 80.09 | 65.38 | | Gemini 3 Flash Preview High | Google | 72.40 | 74.55 | 73.90 | 40.00 | 84.17 | 74.77 | 84.56 | 74.86 | | Kimi K2.6 Thinking | Moonshot AI | 72.17 | 79.38 | 78.57 | 58.33 | 84.28 | 65.13 | 75.14 | 64.36 | | GPT-5.1 High | OpenAI | 72.04 | 78.79 | 72.49 | 53.33 | 86.90 | 69.61 | 79.26 | 63.90 | | Qwen 3.6 Plus | Alibaba | 70.85 | 75.83 | 78.18 | 55.00 | 83.72 | 69.91 | 74.99 | 58.34 | | GPT-5 Pro | OpenAI | 70.48 | 81.69 | 72.11 | 51.67 | 86.17 | 57.04 | 80.69 | 63.96 | | GPT-5.4 Nano xHigh | OpenA…
[25] The Definitive LLM Selection & Benchmarks Guide (iternal.ai)
| Kimi K2.5 | Moonshot AI | 1T MoE | Open-weight | Coding, agentic (Agent Swarm up to 100 agents), vision | SWE-bench 76.8%; HumanEval 99.0%; GPQA 87.6%; HLE 51.8% (tools) | | MiniMax M2.7 | MiniMax | ~230B MoE | Proprietary | Self-evolving agent, office productivity, coding | SWE-bench 78%; GDPval-AA 1495 Elo; released March 18, 2026 | | Step-3.5-Flash | StepFun | 196B (11B active MoE) | Open-weight | Ultra-fast reasoning, competitive coding | AIME 99.8%; SWE-bench 74.4%; 100-350 tok/s; 256K context | | DeepSeek R1 | DeepSeek | ~670B MoE | MIT | Deep reasoning, math, chain-of-thought | MATH-…
[26] DeepSeek V4 finally drops—and gets beaten by a smartphone company ... (x.com)
Kyle Chan on X: "DeepSeek V4 finally drops—and gets beaten by a smartphone company, whose AI team is led by one of DeepSeek’s former top researchers." / X Don’t miss what’s happening People on X are the first to know. Log in Sign up # []( ## Post See new posts # Conversation Image 1 Kyle Chan @kyleichan DeepSeek V4 finally drops—and gets beaten by a smartphone company, whose AI team is led by one of DeepSeek’s former top researchers. Quote Image 2 Artificial Analysis @ArtificialAnlys · Apr 24 Xiaomi’s MiMo V2.5 Pro has landed at 54 in the Artificial Analysis Intelligence Index, tied with Mo…

レポート公開URL2026年4月26日26 ソース

Kimi K2.6とDeepSeek V4のベンチマークを調査してください。コーディング、デザイン、クリエイティブコンテンツでは、どちらが優れているでしょうか？

重要なポイント

**コーディング:** DeepSeek V4が有利です。提示証拠では、DeepSeek V4が「Vibe Code Benchmark」でオープンウェイトモデル1位、Kimi K2.6が2位とされ、さらにDeepSeek V4がKimi K2.6を大きく上回るという主張があります。ただし、この根拠はReddit投稿のスニペットであり、一次ベンチマークページや詳細スコアではないため、信頼度は限定的です。
**デザイン:** Insufficient evidence. 提示証拠には、UI/UXデザイン、画像理解を伴うデザインレビュー、フロントエンド実装品質、ビジュアル生成支援などを直接比較するベンチマークがありません。
**クリエイティブコンテンツ:** Insufficient evidence. 提示証拠には、文章創作、広告コピー、長編構成、ブランドトーン再現などの品質をKimi K2.6とDeepSeek V4で直接比較する評価がありません。
**長文処理・大規模コンテキスト:** DeepSeek V4は長い仕様書、コードベース、ドキュメントを扱う用途で有利な可能性があります。Artificial Analysisの比較スニペットではDeepSeek V4 Pro/Flashが1000kトークン、Kimi K2.6が256kトークンのコンテキストウィンドウとされています。AINewsのスニペットでもDeepSeek V4系は1Mトークンコンテキストを備えるとされています。
現時点の提示証拠だけで判断すると、**コーディングはDeepSeek V4が優勢**です。ただし、**デザイン**と**クリエイティブコンテンツ**については、比較可能な専用ベンチマークが提示証拠内にないため、結論は「Insufficient evidence」です。
## Key findings

研究の答え

Key findings

コーディング: DeepSeek V4が有利です。提示証拠では、DeepSeek V4が「Vibe Code Benchmark」でオープンウェイトモデル1位、Kimi K2.6が2位とされ、さらにDeepSeek V4がKimi K2.6を大きく上回るという主張があります^[23]。ただし、この根拠はReddit投稿のスニペットであり、一次ベンチマークページや詳細スコアではないため、信頼度は限定的です^[23]。
デザイン: Insufficient evidence. 提示証拠には、UI/UXデザイン、画像理解を伴うデザインレビュー、フロントエンド実装品質、ビジュアル生成支援などを直接比較するベンチマークがありません。
クリエイティブコンテンツ: Insufficient evidence. 提示証拠には、文章創作、広告コピー、長編構成、ブランドトーン再現などの品質をKimi K2.6とDeepSeek V4で直接比較する評価がありません。
長文処理・大規模コンテキスト: DeepSeek V4は長い仕様書、コードベース、ドキュメントを扱う用途で有利な可能性があります。Artificial Analysisの比較スニペットではDeepSeek V4 Pro/Flashが1000kトークン、Kimi K2.6が256kトークンのコンテキストウィンドウとされています^[4]^[5]。AINewsのスニペットでもDeepSeek V4系は1Mトークンコンテキストを備えるとされています^[1]。
総合的な「オープンモデル」評価: Kimi K2.6については、AINewsが「world's leading Open Model」と表現し、Moonshotが2026年時点で中国のオープンモデルラボとしてリードしているという評価を掲載しています^[2]。一方で、DeepSeek V4についても、AINewsはV3以来の大きなアーキテクチャ刷新であり、1Mトークンコンテキスト、推論/非推論のハイブリッドモード、MITライセンス、詳細な技術レポートを持つとしています^[1]。

Evidence notes

DeepSeek V4のコーディング優位を示す最も直接的な証拠は「Vibe Code Benchmark」の順位主張ですが、提示証拠ではRedditスニペットに限られます^[23]。そのため、コーディングについては「DeepSeek V4が優勢」と見ますが、確度は中程度です。
コンテキスト長については、Artificial Analysisの比較スニペットがDeepSeek V4 Pro/Flashを1000kトークン、Kimi K2.6を256kトークンとしています^[4]^[5]。ただし、OpenRouterの比較スニペットでは「Max Tokens 256K」とも記載されており、プラットフォーム上の提供条件やモデル設定による差がある可能性があります^[7]。
Kimi K2.6は全体評価として高く扱われていますが、提示証拠内ではコーディング、デザイン、クリエイティブそれぞれの専用スコアが確認できません^[2]。

Limitations / uncertainty

デザインとクリエイティブコンテンツについては、提示証拠だけでは優劣を判定できません。Insufficient evidence.
コーディングの結論は、単一のベンチマーク主張に強く依存しています^[23]。一次ソースの詳細スコア、評価プロンプト、採点方法、再現性が提示証拠内にないため、断定は避けるべきです。
DeepSeek V4にはPro、Flash、Reasoning、High/Max Effortなど複数設定があり、Kimi K2.6との比較結果はモデル設定や提供プラットフォームによって変わる可能性があります^[4]^[5]^[7]。

Summary

コーディング: DeepSeek V4が優勢。ただし根拠は限定的です^[23]。
デザイン: Insufficient evidence.
クリエイティブコンテンツ: Insufficient evidence.
長文・大規模文脈処理: DeepSeek V4が有利な可能性が高いです。提示証拠ではDeepSeek V4が1Mトークン、Kimi K2.6が256kトークンとされています^[4]^[5]。
実務上の選び方: コード生成・大規模コードベース解析ならDeepSeek V4を第一候補にし、文章表現や創作系はKimi K2.6も含めて実タスクでABテストするのが妥当です。

サポートビジュアル

研究に関する会話

あなた2026年4月26日

リサーチクエスチョン

Kimi K2.6とDeepSeek V4のベンチマークを調査してください。コーディング、デザイン、クリエイティブコンテンツでは、どちらが優れているでしょうか？

Studio Global AI2026年4月26日26 ソース

Sources

[1] [AINews] Moonshot Kimi K2.6: the world's leading Open Model ... (latent.space)
DeepSeek V4 rumors are back, and we learned our lesson not to get too excited, but in their deafening silence since v3.2, Moonshot has owned the crown of leading Chinese open model lab for all of 2026 to date, and K2.6 refreshes the lead that K2.5 established in January, with (presumably) more continued pre/posttraining (this time, details of how much more training were not disclosed). Comparing the numbers from the two launches 3 months apart demonstrates the staggering amount of progress: [...] [AINews] Moonshot Kimi K2.6: the world's leading Open Model refreshes to catch up to Op…
[2] DeepSeek V4 Flash (Reasoning, Max Effort) vs Kimi K2.6: Model Comparison (artificialanalysis.ai)

| Metric | DeepSeek V4 Flash (Reasoning, Max Effort) | Kimi K2.6 | Analysis |
| --- | --- | --- | --- |
| Creator | DeepSeek | Kimi | |
| Context Window | 1000k tokens (~1500 A4 pages of size 12 Arial font) | 256k tokens (~384 A4 pages of size 12 Arial font) | DeepSeek V4 Flash (Reasoning, Max Effort) is larger than Kimi K2.6 |
| Release Date | April, 2026 | April, 2026 | DeepSeek V4 Flash (Reasoning, Max Effort) has a more recent release date than Kimi K2.6 |
| Parameters | 284B, 13B active at inference time | 1000B, 32B active at inference time | DeepSeek V4 Flash (Reasoning, Max Effo… |

[3] DeepSeek V4 Pro vs Kimi K2.6 - AI Model Comparison | OpenRouter (openrouter.ai)
Pricing: Input $0.7448 / M tokens; Output $4.655 / M tokens; Images – –.
Features: Input Modalities text, image; Output Modalities text; Quantization int4; Max Tokens (input + output) 256K; Max Output Tokens 66K; Stream cancellation Supports Tools No Prompt Training Caching.
Artificial Analysis, Kimi K2.6: Intelligence 53.9 (98%); Coding 47.1 (95%); Agentic 66.0 (96%).
Design Arena, Kimi K2.6: 3D 1369 ELO (98%); Code Categories 1333 ELO (93%); Data Visualization 1322 ELO (94%); Game Development 1343 ELO (94%); SVG 1242 ELO (78%); UI Component 1320 ELO (90%); Website 1300 ELO (90%).
[4] How to Use Kimi K2.6: Complete Guide to Moonshot AI's New 1T ... (tosea.ai)
On April 20, 2026, Moonshot AI released Kimi K2.6 — a 1-trillion-parameter open-source Mixture-of-Experts model positioned directly at the agentic-coding segment that Claude Opus 4.7 and GPT-5.4 have dominated through early 2026. The numbers on paper are striking: SWE-Bench Pro at 58.6% (ahead of both Opus 4.6 and GPT-5.4), Humanity's Last Exam with tools at 54.0% (ahead of both), and a 185% throughput lift over K2.5 in a real 13-hour optimization run against the exchange-core benchmark. For a weights-available Chinese model to lead US frontier labs on commercially relevant agentic benchmar…
[5] Kimi K2.6 Tech Blog: Advancing Open-Source Coding (kimi.com)

| Benchmark | Kimi K2.6 | GPT-5.4 (xhigh) | Claude Opus 4.6 (max effort) | Gemini 3.1 Pro (thinking high) | Kimi K2.5 |
| --- | --- | --- | --- | --- | --- |
| Agentic | | | | | |
| HLE-Full w/ tools | 54.0 | 52.1 | 53.0 | 51.4 | 50.2 |
| BrowseComp | 83.2 | 82.7 | 83.7 | 85.9 | 74.9 |
| BrowseComp (agent swarm) | 86.3 | — | — | — | 78.4 |
| DeepSearchQA (f1-score) | 92.5 | 78.6 | 91.3 | 81.9 | 89.0 |
| DeepSearchQA (accuracy) | 83.0 | 63.7 | 80.6 | 60.2 | 77.1 |
| WideSearch (item-f1) | 80.8 | — | — | — | 72.7 |
| Toolathlon | 50.0 | 54.6 | 47.2 | 48.8 | 27.8 |
| MCPMark | 55.9 | 62.5 | 56.7 | 55.9 | 29.5 |
| Claw Eval (pa…

[6] Kimi K2.6 vs DeepSeek V4 – Side-by-Side Comparison | LLM Reference (llmreference.com)
Side-by-side comparison of specifications, capabilities, and pricing.

| | Kimi K2.6 (Moonshot AI) | DeepSeek V4 (DeepSeek) |
| --- | --- | --- |
| Released | 2026-04-13 | 2026-02-15 |
| Context window | 262K | — |
| Parameters | 1T (MoE, 32B active) | — |
| Architecture | Mixture of Experts (MoE) | decoder only |
| License | Proprietary | Open Source |
| Knowledge cutoff | — | — |
| Capabilities | | |
| Vision | | |
| Multimodal | | |
| Reasoning | | |
| Function calling | | |
| Tool use | | |
| Structured Outputs | | |
| Code execution | | |
| Availability | | |
| Provid…

[7] Kimi K2.6: Pricing, Benchmarks & Performance - LLM Stats (llm-stats.com)
Latency 137.00 s; Throughput 27 c/s; Parameters 1.0T.
Arena Performance: #65 Websites, #33 3D, #50 Games, #42 Animations, #17 SVG, #46 Data Viz, #14 Audio.
Leaderboard Rankings: #3 Reasoning, #3 Search, #4 Coding, #5 Vision, #6 Math, #7 Tool Calling.
Quality Tracker: 0.00σ, 54 votes; 7d +2.21σ; 30d +2.21σ. Websites +0.00σ (13), Games +0.00σ (11), Animations +0.00σ (9), SVG +0.00σ (7), playground-chat +0.00σ (6), Audio +0.00σ (4), Data Viz +0.00σ (3), 3D +0.00σ (1).
Kimi K2.6 Performance Across Datasets: Scores sourced from the model's scorecard, paper, or offi…
[8] LLM Coding Benchmark (April 2026): GPT 5.5, DeepSeek v4, Kimi ... (akitaonrails.com)

| Rank | Model | Score | Tier | RubyLLM OK | Time | Cost |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus 4.7 | 97 | A | ✅ | 18m | ~$1.10 |
| 1 | GPT 5.4 xHigh (Codex) | 97 | A | ✅ | 22m | ~$16 |
| 3 | GPT 5.5 xHigh (Codex) | 96 | A | ✅ | 18m | ~$10 |
| 4 | Kimi K2.6 | 87 | A | ✅ | 20m | ~$0.30 |
| 5 | Claude Opus 4.6 | 83 | A | ✅ | 16m | ~$1.10 |
| 6 | Gemini 3.1 Pro | 82 | A | ✅ | 14m | ~$0.40 |
| 7 | Claude Sonnet 4.6 | 78 | B | ✅ | 16m | ~$0.63 |
| 7 | DeepSeek V4 Flash | 78 | B | ✅ | 3m | ~$0.01 |
| 9 | Qwen 3.6 Plus | 71 | B | ✅ | 17m | ~$0.15 |
| 10 | DeepSeek V4 Pro | 69 | B | ✅ | 22m (DNF) | ~$0.50 |
| …

[9] moonshotai/Kimi-K2.6 - Hugging Face (huggingface.co)
| OSWorld-Verified | 73.1 | 75.0 | 72.7 63.3 | | Coding | | Terminal-Bench 2.0 (Terminus-2) | 66.7 | 65.4 | 65.4 | 68.5 | 50.8 | | SWE-Bench Pro | 58.6 | 57.7 | 53.4 | 54.2 | 50.7 | | SWE-Bench Multilingual | 76.7 77.8 | 76.9 | 73.0 | | SWE-Bench Verified | 80.2 80.8 | 80.6 | 76.8 | | SciCode | 52.2 | 56.6 | 51.9 | 58.9 | 48.7 | | OJBench (python) | 60.6 60.3 | 70.7 | 54.7 | | LiveCodeBench (v6) | 89.6 88.8 | 91.7 | 85.0 | | Reasoning & Knowledge | | HLE-Full | 34.7 | 39.8 | 40.0 | 44.4 | 30.1 | | AIME 2026 | 96.4 | 99.2 | 96.7 | 98.3 | 95.8 | | HMMT 2026 (Feb) | 92.7 | 97.7 | 96.2 | 94.7 | 8…
[10] I reviewed how DeepSeek V4-Pro, Kimi 2.6, Opus 4.6, and Opus 4.7 ... (news.ycombinator.com)
ozgune, 1 day ago, on: DeepSeek v4. I reviewed how DeepSeek V4-Pro, Kimi 2.6, Opus 4.6, and Opus 4.7 across the same AI benchmarks. All results are for Max editions, except for Kimi. Summary: Opus 4.6 forms the baseline all three are trying to beat. DeepSeek V4-Pro roughly matches it across the board, Kimi K2.6 edges it on agentic/coding benchmarks, and Opus 4.7 surpasses it on nearly everything except web search. DeepSeek V4-Pro Max shines in competitive coding benchmarks. However, it trails both Opus models on software engineering. Kimi K2.6 is remarkabl…
[11] "DeepSeek v4 is now the #1 open-weight model on our Vibe Code Benchmark, and it’s not close. It leaves the #2 (Kimi K2.6) in the dust, and even beats out frontier closed source models like Gemini 3.1 Pro." : r/accelerate (reddit.com)
[12] Deepseek v4 models are out and here are benchmarks !( 4 versions) (reddit.com)
[13] DeepSeek V4 and Kimi K2.6 - Models - NVIDIA Developer Forums (forums.developer.nvidia.com)
[14] DeepSeek V4 Pro (Reasoning, High Effort) vs Kimi K2.6 (artificialanalysis.ai)

| Metric | DeepSeek V4 Pro (Reasoning, High Effort) | Kimi K2.6 | Analysis |
| --- | --- | --- | --- |
| Creator | DeepSeek | Kimi | |
| Context Window | 1000k tokens (~1500 A4 pages of size 12 Arial font) | 256k tokens (~384 A4 pages of size 12 Arial font) | DeepSeek V4 Pro (Reasoning, High Effort) is larger than Kimi K2.6 |
| Release Date | April, 2026 | April, 2026 | DeepSeek V4 Pro (Reasoning, High Effort) has a more recent release date than Kimi K2.6 |
| Parameters | 1600B, 49B active at inference time | 1000B, 32B active at inference time | DeepSeek V4 Pro (Reasoning, High Effort)… |

[15] DeepSeek_V4.pdf (huggingface.co)
Overall, DeepSeek-V4 series retain the Transformer (Vaswani et al., 2017) architecture and Multi-Token Prediction (MTP) modules (DeepSeek-AI, 2024; Gloeckle et al., 2024), while introducing several key upgrades over DeepSeek-V3: (1) firstly, we introduce the Manifold-Constrained Hyper-Connections (mHC) (Xie et al., 2026) to strengthen conventional residual connections; (2) secondly, we design a hybrid attention architecture, which greatly improves long-context efficiency through Compressed Sparse Attention and Heavily Compressed Attention. (3) thirdly, we employ Muon (Jordan et al., 2024; L…
[16] DeepSeek V4 Preview Release (api-docs.deepseek.com)
[17] Deepseek v4: Best Opensource Model Ever? (Fully Tested) - YouTube (youtube.com)
Channel: WorldofAI (214K subscribers). Tags: deepseek v4, deepseek v4 pro, deepseek v4 flash, deepseek ai, open source ai, llm, large language models, ai coding, agentic ai, ai agents, kimi k2.6, glm 5.1, ai comparison, ai benchmarks, coding ai, ai tools 2026, deepseek review, ai model test, open weight models, ai development. Hashtags: #DeepSeek #AI #LLM #OpenSourceAI #AICoding #Tech #ArtificialIntelligence #MachineLearning #AIModels #AgenticAI
[18] [AINews] DeepSeek V4 Pro (1.6T-A49B) and Flash (284B-A13B ... (latent.space)
DeepSeek released DeepSeek-V4 Pro and DeepSeek-V4 Flash, its first major architecture refresh since V3 and first clear two-tier lineup, with 1M-token context, hybrid reasoning/non-reasoning modes, an MIT license, and a technical report detailed enough that multiple researchers called it one of the most important or best-written model papers of the year. Across the reactions, the factual consensus is that V4 materially advances open-weight long-context and agentic coding performance while remaining somewhat behind the top closed frontier models overall. Independent benchmarkers place V4 Pro ar…
[19] DeepSeek V4 Pro (Reasoning, Max Effort) vs Kimi K2 0905: Model Comparison (artificialanalysis.ai)

| Metric | DeepSeek V4 Pro (Reasoning, Max Effort) | Kimi K2 0905 | Analysis |
| --- | --- | --- | --- |
| Creator | DeepSeek | Kimi | |
| Context Window | 1000k tokens (~1500 A4 pages of size 12 Arial font) | 256k tokens (~384 A4 pages of size 12 Arial font) | DeepSeek V4 Pro (Reasoning, Max Effort) is larger than Kimi K2 0905 |
| Release Date | April, 2026 | September, 2025 | DeepSeek V4 Pro (Reasoning, Max Effort) has a more recent release date than Kimi K2 0905 |
| Parameters | 1600B, 49B active at inference time | 1000B, 32B active at inference time | DeepSeek V4 Pro (Reasoning, Ma… |

[20] deepseek-ai/DeepSeek-V4-Pro - Hugging Face (huggingface.co)
| Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max | :---: :---: :---: | Knowledge & Reasoning | | | | | | | | MMLU-Pro (EM) | 89.1 | 87.5 | 91.0 | 87.1 | 86.0 | 87.5 | | SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | 75.6 | 36.9 | 38.1 | 57.9 | | Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | 85.9 | 75.9 | 75.0 | 84.4 | | GPQA Diamond (Pass@1) | 91.3 | 93.0 | 94.3 | 90.5 | 86.2 | 90.1 | | HLE (Pass@1) | 40.0 | 39.8 | 44.4 | 36.4 | 34.7 | 37.7 | | LiveCodeBench (Pass@1) | 88.8 91.7 | 89.6 93.5 | | Codeforces (Rating) 3168 | 3052 - | 3206 | | HMM…
[21] DeepSeek-V4-Pro-Max: Pricing, Benchmarks & Performance (llm-stats.com)
Self-reported by the model provider. Score may not be independently verified. Similar Models: How DeepSeek-V4-Pro-Max compares to models with the closest performance across key benchmarks. DeepSeek-V4-Pro-Max, Qwen3.…
[22] Kimi K2.6: The new leading open weights model - Artificial Analysis (artificialanalysis.ai)
➤ Multimodality: Kimi K2.6 supports Image and Video input and text output natively. The model’s max context length remains 256k. Kimi K2.6 has significantly higher token usage than Kimi K2.5. Kimi K2.5 scores 6 on the AA-Omniscience Index, primarily driven by low hallucination rate.
[23] Underwhelming or underrated? DeepSeek V4 shows “impressive ... (scmp.com)
The company’s most advanced system, V4 Pro, ranked second among the world’s leading open-source models, behind Beijing-based Moonshot AI’s Kimi K2.6, benchmark firm Artificial Analysis said in a report on Friday. While V4 Pro marked a clear improvement on its predecessor, V3.2, it still lagged top competitors. The results highlight the challenges facing DeepSeek as China races to narrow the AI gap with the US, amid intensifying competition at home and abroad as well as ongoing constraints on computing power. Still, analysts noted that V4 delivered meaningful technical progress.
[24] LiveBench (livebench.ai)
| GPT-5.3 Codex High | OpenAI | 72.76 | 80.15 | 78.18 | 55.00 | 87.84 | 62.69 | 80.09 | 65.38 | | Gemini 3 Flash Preview High | Google | 72.40 | 74.55 | 73.90 | 40.00 | 84.17 | 74.77 | 84.56 | 74.86 | | Kimi K2.6 Thinking | Moonshot AI | 72.17 | 79.38 | 78.57 | 58.33 | 84.28 | 65.13 | 75.14 | 64.36 | | GPT-5.1 High | OpenAI | 72.04 | 78.79 | 72.49 | 53.33 | 86.90 | 69.61 | 79.26 | 63.90 | | Qwen 3.6 Plus | Alibaba | 70.85 | 75.83 | 78.18 | 55.00 | 83.72 | 69.91 | 74.99 | 58.34 | | GPT-5 Pro | OpenAI | 70.48 | 81.69 | 72.11 | 51.67 | 86.17 | 57.04 | 80.69 | 63.96 | | GPT-5.4 Nano xHigh | OpenA…
[25] The Definitive LLM Selection & Benchmarks Guide (iternal.ai)
| Kimi K2.5 | Moonshot AI | 1T MoE | Open-weight | Coding, agentic (Agent Swarm up to 100 agents), vision | SWE-bench 76.8%; HumanEval 99.0%; GPQA 87.6%; HLE 51.8% (tools) | | MiniMax M2.7 | MiniMax | ~230B MoE | Proprietary | Self-evolving agent, office productivity, coding | SWE-bench 78%; GDPval-AA 1495 Elo; released March 18, 2026 | | Step-3.5-Flash | StepFun | 196B (11B active MoE) | Open-weight | Ultra-fast reasoning, competitive coding | AIME 99.8%; SWE-bench 74.4%; 100-350 tok/s; 256K context | | DeepSeek R1 | DeepSeek | ~670B MoE | MIT | Deep reasoning, math, chain-of-thought | MATH-…
[26] DeepSeek V4 finally drops—and gets beaten by a smartphone company ... (x.com)
Kyle Chan (@kyleichan) on X: "DeepSeek V4 finally drops—and gets beaten by a smartphone company, whose AI team is led by one of DeepSeek’s former top researchers." Quoted post, Artificial Analysis (@ArtificialAnlys), Apr 24: Xiaomi’s MiMo V2.5 Pro has landed at 54 in the Artificial Analysis Intelligence Index, tied with Mo…

レポート公開URL2026年4月26日26 ソース

Kimi K2.6とDeepSeek V4のベンチマークを調査してください。コーディング、デザイン、クリエイティブコンテンツでは、どちらが優れているでしょうか？

重要なポイント

**コーディング:** DeepSeek V4が有利です。提示証拠では、DeepSeek V4が「Vibe Code Benchmark」でオープンウェイトモデル1位、Kimi K2.6が2位とされ、さらにDeepSeek V4がKimi K2.6を大きく上回るという主張があります。ただし、この根拠はReddit投稿のスニペットであり、一次ベンチマークページや詳細スコアではないため、信頼度は限定的です。
**デザイン:** Insufficient evidence. 提示証拠には、UI/UXデザイン、画像理解を伴うデザインレビュー、フロントエンド実装品質、ビジュアル生成支援などを直接比較するベンチマークがありません。
**クリエイティブコンテンツ:** Insufficient evidence. 提示証拠には、文章創作、広告コピー、長編構成、ブランドトーン再現などの品質をKimi K2.6とDeepSeek V4で直接比較する評価がありません。
**長文処理・大規模コンテキスト:** DeepSeek V4は長い仕様書、コードベース、ドキュメントを扱う用途で有利な可能性があります。Artificial Analysisの比較スニペットではDeepSeek V4 Pro/Flashが1000kトークン、Kimi K2.6が256kトークンのコンテキストウィンドウとされています。AINewsのスニペットでもDeepSeek V4系は1Mトークンコンテキストを備えるとされています。
現時点の提示証拠だけで判断すると、**コーディングはDeepSeek V4が優勢**です。ただし、**デザイン**と**クリエイティブコンテンツ**については、比較可能な専用ベンチマークが提示証拠内にないため、結論は「Insufficient evidence」です。
## Key findings

研究の答え

Key findings

コーディング: DeepSeek V4が有利です。提示証拠では、DeepSeek V4が「Vibe Code Benchmark」でオープンウェイトモデル1位、Kimi K2.6が2位とされ、さらにDeepSeek V4がKimi K2.6を大きく上回るという主張があります^[23]。ただし、この根拠はReddit投稿のスニペットであり、一次ベンチマークページや詳細スコアではないため、信頼度は限定的です^[23]。
デザイン: Insufficient evidence. 提示証拠には、UI/UXデザイン、画像理解を伴うデザインレビュー、フロントエンド実装品質、ビジュアル生成支援などを直接比較するベンチマークがありません。
クリエイティブコンテンツ: Insufficient evidence. 提示証拠には、文章創作、広告コピー、長編構成、ブランドトーン再現などの品質をKimi K2.6とDeepSeek V4で直接比較する評価がありません。
長文処理・大規模コンテキスト: DeepSeek V4は長い仕様書、コードベース、ドキュメントを扱う用途で有利な可能性があります。Artificial Analysisの比較スニペットではDeepSeek V4 Pro/Flashが1000kトークン、Kimi K2.6が256kトークンのコンテキストウィンドウとされています^[4]^[5]。AINewsのスニペットでもDeepSeek V4系は1Mトークンコンテキストを備えるとされています^[1]。
総合的な「オープンモデル」評価: Kimi K2.6については、AINewsが「world's leading Open Model」と表現し、Moonshotが2026年時点で中国のオープンモデルラボとしてリードしているという評価を掲載しています^[2]。一方で、DeepSeek V4についても、AINewsはV3以来の大きなアーキテクチャ刷新であり、1Mトークンコンテキスト、推論/非推論のハイブリッドモード、MITライセンス、詳細な技術レポートを持つとしています^[1]。

Evidence notes

DeepSeek V4のコーディング優位を示す最も直接的な証拠は「Vibe Code Benchmark」の順位主張ですが、提示証拠ではRedditスニペットに限られます^[23]。そのため、コーディングについては「DeepSeek V4が優勢」と見ますが、確度は中程度です。
コンテキスト長については、Artificial Analysisの比較スニペットがDeepSeek V4 Pro/Flashを1000kトークン、Kimi K2.6を256kトークンとしています^[4]^[5]。ただし、OpenRouterの比較スニペットでは「Max Tokens 256K」とも記載されており、プラットフォーム上の提供条件やモデル設定による差がある可能性があります^[7]。
Kimi K2.6は全体評価として高く扱われていますが、提示証拠内ではコーディング、デザイン、クリエイティブそれぞれの専用スコアが確認できません^[2]。

Limitations / uncertainty

デザインとクリエイティブコンテンツについては、提示証拠だけでは優劣を判定できません。Insufficient evidence.
コーディングの結論は、単一のベンチマーク主張に強く依存しています^[23]。一次ソースの詳細スコア、評価プロンプト、採点方法、再現性が提示証拠内にないため、断定は避けるべきです。
DeepSeek V4にはPro、Flash、Reasoning、High/Max Effortなど複数設定があり、Kimi K2.6との比較結果はモデル設定や提供プラットフォームによって変わる可能性があります^[4]^[5]^[7]。

Summary

コーディング: DeepSeek V4が優勢。ただし根拠は限定的です^[23]。
デザイン: Insufficient evidence.
クリエイティブコンテンツ: Insufficient evidence.
長文・大規模文脈処理: DeepSeek V4が有利な可能性が高いです。提示証拠ではDeepSeek V4が1Mトークン、Kimi K2.6が256kトークンとされています^[4]^[5]。
実務上の選び方: コード生成・大規模コードベース解析ならDeepSeek V4を第一候補にし、文章表現や創作系はKimi K2.6も含めて実タスクでABテストするのが妥当です。

サポートビジュアル

研究に関する会話

あなた2026年4月26日

リサーチクエスチョン

Kimi K2.6とDeepSeek V4のベンチマークを調査してください。コーディング、デザイン、クリエイティブコンテンツでは、どちらが優れているでしょうか？

Studio Global AI2026年4月26日26 ソース

引用された回答

Key findings

コーディング: DeepSeek V4が有利です。提示証拠では、DeepSeek V4が「Vibe Code Benchmark」でオープンウェイトモデル1位、Kimi K2.6が2位とされ、さらにDeepSeek V4がKimi K2.6を大きく上回るという主張があります^[23]。ただし、この根拠はReddit投稿のスニペットであり、一次ベンチマークページや詳細スコアではないため、信頼度は限定的です^[23]。
デザイン: Insufficient evidence. 提示証拠には、UI/UXデザイン、画像理解を伴うデザインレビュー、フロントエンド実装品質、ビジュアル生成支援などを直接比較するベンチマークがありません。
クリエイティブコンテンツ: Insufficient evidence. 提示証拠には、文章創作、広告コピー、長編構成、ブランドトーン再現などの品質をKimi K2.6とDeepSeek V4で直接比較する評価がありません。
長文処理・大規模コンテキスト: DeepSeek V4は長い仕様書、コードベース、ドキュメントを扱う用途で有利な可能性があります。Artificial Analysisの比較スニペットではDeepSeek V4 Pro/Flashが1000kトークン、Kimi K2.6が256kトークンのコンテキストウィンドウとされています^[4]^[5]。AINewsのスニペットでもDeepSeek V4系は1Mトークンコンテキストを備えるとされています^[1]。
総合的な「オープンモデル」評価: Kimi K2.6については、AINewsが「world's leading Open Model」と表現し、Moonshotが2026年時点で中国のオープンモデルラボとしてリードしているという評価を掲載しています^[2]。一方で、DeepSeek V4についても、AINewsはV3以来の大きなアーキテクチャ刷新であり、1Mトークンコンテキスト、推論/非推論のハイブリッドモード、MITライセンス、詳細な技術レポートを持つとしています^[1]。

Evidence notes

DeepSeek V4のコーディング優位を示す最も直接的な証拠は「Vibe Code Benchmark」の順位主張ですが、提示証拠ではRedditスニペットに限られます^[23]。そのため、コーディングについては「DeepSeek V4が優勢」と見ますが、確度は中程度です。
コンテキスト長については、Artificial Analysisの比較スニペットがDeepSeek V4 Pro/Flashを1000kトークン、Kimi K2.6を256kトークンとしています^[4]^[5]。ただし、OpenRouterの比較スニペットでは「Max Tokens 256K」とも記載されており、プラットフォーム上の提供条件やモデル設定による差がある可能性があります^[7]。
Kimi K2.6は全体評価として高く扱われていますが、提示証拠内ではコーディング、デザイン、クリエイティブそれぞれの専用スコアが確認できません^[2]。

Limitations / uncertainty

デザインとクリエイティブコンテンツについては、提示証拠だけでは優劣を判定できません。Insufficient evidence.
コーディングの結論は、単一のベンチマーク主張に強く依存しています^[23]。一次ソースの詳細スコア、評価プロンプト、採点方法、再現性が提示証拠内にないため、断定は避けるべきです。
DeepSeek V4にはPro、Flash、Reasoning、High/Max Effortなど複数設定があり、Kimi K2.6との比較結果はモデル設定や提供プラットフォームによって変わる可能性があります^[4]^[5]^[7]。

Summary

コーディング: DeepSeek V4が優勢。ただし根拠は限定的です^[23]。
デザイン: Insufficient evidence.
クリエイティブコンテンツ: Insufficient evidence.
長文・大規模文脈処理: DeepSeek V4が有利な可能性が高いです。提示証拠ではDeepSeek V4が1Mトークン、Kimi K2.6が256kトークンとされています^[4]^[5]。
実務上の選び方: コード生成・大規模コードベース解析ならDeepSeek V4を第一候補にし、文章表現や創作系はKimi K2.6も含めて実タスクでABテストするのが妥当です。

人々も尋ねます

「Kimi K2.6とDeepSeek V4のベンチマークを調査してください。コーディング、デザイン、クリエイティブコンテンツでは、どちらが優れているでしょうか？」の短い答えは何ですか?

最初に検証する重要なポイントは何ですか?

次の実践では何をすればいいでしょうか？

次にどの関連トピックを検討すればよいでしょうか?

関連ページを開く

これを何と比較すればいいでしょうか？

この回答を「Deep research & compare GPT-5.5, Claude Opus 4.7, Kimi K2.6, DeepSeek V4」と照合してください。

関連ページを開く

研究を続けてください

GPT-5.5、Claude Opus 4.7、DeepSeek V4、Kimi K2.6のベンチマークを調査し、比較してください。

Deep research & compare GPT-5.5, Claude Opus 4.7, Kimi K2.6, DeepSeek V4

請閱讀 Claude Opus 4.7 同 GPT 5.5 嘅介紹： https://www.anthropic.com/news/claude-opus-4-7 https://openai.com/index/introducing-gpt-5-5 請再搜尋更多相關資料，並就

請閱讀 Claude Opus 4.7 同 GPT 5.5 嘅介紹： https://www.anthropic.com/news/claude-opus-4-7 https://openai.com/index/introducin...

研究並查核事實：在要連續搜尋、整理、交叉比對、再修正的長流程研究任務裡，Claude Opus 4.7 跟 GPT-5.5 Spud 哪一個比較不會中途失焦、漏步驟或跑偏？

情報源

[1] [AINews] Moonshot Kimi K2.6: the world's leading Open Model ...latent.space
DeepSeek V4 rumors are back, and we learned our lesson not to get too excited, but in their deafening silence since v3.2, Moonshot has owned the crown of leading Chinese open model lab for all of 2026 to date, and K2.6 refreshes the lead that K2.5 established in January, with (presumably) more continued pre/posttraining (this time, details of how much more training were not disclosed). Comparing the numbers from the two launches 3 months apart demonstrates the staggering amount of progress: Image 4 [...] # [AINews] Moonshot Kimi K2.6: the world's leading Open Model refreshes to catch up to Op…
[2] DeepSeek V4 Flash (Reasoning, Max Effort) vs Kimi K2.6: Model Comparisonartificialanalysis.ai
| Metric | DeepSeek logoDeepSeek V4 Flash (Reasoning, Max Effort) | Kimi logoKimi K2.6 | Analysis | --- --- | | Creator | DeepSeek | Kimi | | | Context Window | 1000k tokens (~1500 A4 pages of size 12 Arial font) | 256k tokens (~384 A4 pages of size 12 Arial font) | DeepSeek V4 Flash (Reasoning, Max Effort) is larger than Kimi K2.6 | | Release Date | April, 2026 | April, 2026 | DeepSeek V4 Flash (Reasoning, Max Effort) has a more recent release date than Kimi K2.6 | | Parameters | 284B, 13B active at inference time | 1000B, 32B active at inference time | DeepSeek V4 Flash (Reasoning, Max Effo…
[3] DeepSeek V4 Pro vs Kimi K2.6 - AI Model Comparison | OpenRouteropenrouter.ai
Ready Output will appear here... ### Pricing Input$0.7448 / M tokens Output$4.655 / M tokens Images– – ### Features Input Modalities text, image Output Modalities text Quantization int4 Max Tokens (input + output)256K Max Output Tokens 66K Stream cancellation Supports Tools No Prompt Training Caching ### Artificial Analysis Kimi K2.6 Intelligence 53.9 98% Coding 47.1 95% Agentic 66.0 96% ### Design Arena Kimi K2.6 3D 1369 ELO 98% Code Categories 1333 ELO 93% Data Visualization 1322 ELO 94% Game Development 1343 ELO 94% SVG 1242 ELO 78% UI Component 1320 ELO 90% Website 1300 ELO 90% Go to mode…
[4] How to Use Kimi K2.6: Complete Guide to Moonshot AI's New 1T ...tosea.ai
On April 20, 2026, Moonshot AI released Kimi K2.6 — a 1-trillion-parameter open-source Mixture-of-Experts model positioned directly at the agentic-coding segment that Claude Opus 4.7 and GPT-5.4 have dominated through early 2026. The numbers on paper are striking: SWE-Bench Pro at 58.6% (ahead of both Opus 4.6 and GPT-5.4), Humanity's Last Exam with tools at 54.0% (ahead of both), and a 185% throughput lift over K2.5 in a real 13-hour optimization run against the exchange-core benchmark. For a weights-available Chinese model to lead US frontier labs on commercially relevant agentic benchmar…
[5] Kimi K2.6 Tech Blog: Advancing Open-Source Codingkimi.com
| Benchmark | Kimi K2.6 | GPT-5.4 (xhigh) | Claude Opus 4.6 (max effort) | Gemini 3.1 Pro (thinking high) | Kimi K2.5 | --- --- --- | | Agentic | | | HLE-Full w/ tools | 54.0 | 52.1 | 53.0 | 51.4 | 50.2 | | BrowseComp | 83.2 | 82.7 | 83.7 | 85.9 | 74.9 | | BrowseComp (agent swarm) | 86.3 | — | — | — | 78.4 | | DeepSearchQA (f1-score) | 92.5 | 78.6 | 91.3 | 81.9 | 89.0 | | DeepSearchQA (accuracy) | 83.0 | 63.7 | 80.6 | 60.2 | 77.1 | | WideSearch (item-f1) | 80.8 | — | — | — | 72.7 | | Toolathlon | 50.0 | 54.6 | 47.2 | 48.8 | 27.8 | | MCPMark | 55.9 | 62.5 | 56.7 | 55.9 | 29.5 | | Claw Eval (pa…
[6] Kimi K2.6 vs DeepSeek V4 – Side-by-Side Comparison | LLM Referencellmreference.com
LLM Reference # Kimi K2.6 vs DeepSeek V4 Side-by-side comparison of specifications, capabilities, and pricing. | | Kimi K2.6 Moonshot AI | DeepSeek V4 DeepSeek | --- | Released | 2026-04-13 | 2026-02-15 | | Context window | 262K | — | | Parameters | 1T (MoE, 32B active) | — | | Architecture | Mixture of Experts (MoE) | decoder only | | License | Proprietary | Open Source | | Knowledge cutoff | — | — | | Capabilities | | | | Vision | | | | Multimodal | | | | Reasoning | | | | Function calling | | | | Tool use | | | | Structured Outputs | | | | Code execution | | | | Availability | | | | Provid…
[7] Kimi K2.6: Pricing, Benchmarks & Performance - LLM Statsllm-stats.com
Latency 137.00 s Throughput 27 c/s Parameters 1.0T Benchmarks Examples Playground API ## Benchmarks ### Arena Performance #65 Websites #33 3D #50 Games #42 Animations #17 SVG #46 Data Viz #14 Audio ### Leaderboard Rankings #3 Reasoning #3 Search #4 Coding #5 Vision #6 Math #7 Tool Calling ### Quality Tracker 0.00σ— 54 votes 7d+2.21σ 30d+2.21σ Image 3: LLM Stats Logo Websites+0.00σ(13)Games+0.00σ(11)Animations+0.00σ(9)SVG+0.00σ(7)playground-chat+0.00σ(6)Audio+0.00σ(4)Data Viz+0.00σ(3)3D+0.00σ(1) ### Kimi K2.6 Performance Across Datasets Scores sourced from the model's scorecard, paper, or offi…
[8] LLM Coding Benchmark (April 2026): GPT 5.5, DeepSeek v4, Kimi ...akitaonrails.com
| Rank | Model | Score | Tier | RubyLLM OK | Time | Cost | --- --- --- | 1 | Claude Opus 4.7 | 97 | A | ✅ | 18m | ~$1.10 | | 1 | GPT 5.4 xHigh (Codex) | 97 | A | ✅ | 22m | ~$16 | | 3 | GPT 5.5 xHigh (Codex) | 96 | A | ✅ | 18m | ~$10 | | 4 | Kimi K2.6 | 87 | A | ✅ | 20m | ~$0.30 | | 5 | Claude Opus 4.6 | 83 | A | ✅ | 16m | ~$1.10 | | 6 | Gemini 3.1 Pro | 82 | A | ✅ | 14m | ~$0.40 | | 7 | Claude Sonnet 4.6 | 78 | B | ✅ | 16m | ~$0.63 | | 7 | DeepSeek V4 Flash | 78 | B | ✅ | 3m | ~$0.01 | | 9 | Qwen 3.6 Plus | 71 | B | ✅ | 17m | ~$0.15 | | 10 | DeepSeek V4 Pro | 69 | B | ✅ | 22m (DNF) | ~$0.50 |…
[9] moonshotai/Kimi-K2.6 - Hugging Facehuggingface.co
| OSWorld-Verified | 73.1 | 75.0 | 72.7 63.3 | | Coding | | Terminal-Bench 2.0 (Terminus-2) | 66.7 | 65.4 | 65.4 | 68.5 | 50.8 | | SWE-Bench Pro | 58.6 | 57.7 | 53.4 | 54.2 | 50.7 | | SWE-Bench Multilingual | 76.7 77.8 | 76.9 | 73.0 | | SWE-Bench Verified | 80.2 80.8 | 80.6 | 76.8 | | SciCode | 52.2 | 56.6 | 51.9 | 58.9 | 48.7 | | OJBench (python) | 60.6 60.3 | 70.7 | 54.7 | | LiveCodeBench (v6) | 89.6 88.8 | 91.7 | 85.0 | | Reasoning & Knowledge | | HLE-Full | 34.7 | 39.8 | 40.0 | 44.4 | 30.1 | | AIME 2026 | 96.4 | 99.2 | 96.7 | 98.3 | 95.8 | | HMMT 2026 (Feb) | 92.7 | 97.7 | 96.2 | 94.7 | 8…
[10] I reviewed how DeepSeek V4-Pro, Kimi 2.6, Opus 4.6, and Opus 4.7 ...news.ycombinator.com
| | | | ozgune 1 day ago | parent | context | favorite | on: DeepSeek v4 I reviewed how DeepSeek V4-Pro, Kimi 2.6, Opus 4.6, and Opus 4.7 across the same AI benchmarks. All results are for Max editions, except for Kimi. Summary: Opus 4.6 forms the baseline all three are trying to beat. DeepSeek V4-Pro roughly matches it across the board, Kimi K2.6 edges it on agentic/coding benchmarks, and Opus 4.7 surpasses it on nearly everything except web search. DeepSeek V4-Pro Max shines in competitive coding benchmarks. However, it trails both Opus models on software engineering. Kimi K2.6 is remarkabl…
[11] "DeepSeek v4 is now the #1 open-weight model on our Vibe Code Benchmark, and it’s not close. It leaves the #2 (Kimi K2.6) in the dust, and even beats out frontier closed source models like Gemini 3.1 Pro." : r/accelerate (reddit.com)
[12] Deepseek v4 models are out and here are benchmarks !( 4 versions) (reddit.com)
[13] DeepSeek V4 and Kimi K2.6 - Models - NVIDIA Developer Forums (forums.developer.nvidia.com)
[14] DeepSeek V4 Pro (Reasoning, High Effort) vs Kimi K2.6 (artificialanalysis.ai)

| Metric | DeepSeek V4 Pro (Reasoning, High Effort) | Kimi K2.6 | Analysis |
| --- | --- | --- | --- |
| Creator | DeepSeek | Kimi | |
| Context Window | 1000k tokens (~1500 A4 pages of size 12 Arial font) | 256k tokens (~384 A4 pages of size 12 Arial font) | DeepSeek V4 Pro (Reasoning, High Effort) is larger than Kimi K2.6 |
| Release Date | April, 2026 | April, 2026 | DeepSeek V4 Pro (Reasoning, High Effort) has a more recent release date than Kimi K2.6 |
| Parameters | 1600B, 49B active at inference time | 1000B, 32B active at inference time | DeepSeek V4 Pro (Reasoning, High Effort)… |
[15] DeepSeek_V4.pdf (huggingface.co)
Overall, DeepSeek-V4 series retain the Transformer (Vaswani et al., 2017) architecture and Multi-Token Prediction (MTP) modules (DeepSeek-AI, 2024; Gloeckle et al., 2024), while introducing several key upgrades over DeepSeek-V3: (1) firstly, we introduce the Manifold-Constrained Hyper-Connections (mHC) (Xie et al., 2026) to strengthen conventional residual connections; (2) secondly, we design a hybrid attention architecture, which greatly improves long-context efficiency through Compressed Sparse Attention and Heavily Compressed Attention. (3) thirdly, we employ Muon (Jordan et al., 2024; L…
[17] Deepseek v4: Best Opensource Model Ever? (Fully Tested) - YouTube (youtube.com)
[18] [AINews] DeepSeek V4 Pro (1.6T-A49B) and Flash (284B-A13B ... (latent.space)
DeepSeek released DeepSeek-V4 Pro and DeepSeek-V4 Flash, its first major architecture refresh since V3 and first clear two-tier lineup, with 1M-token context, hybrid reasoning/non-reasoning modes, an MIT license, and a technical report detailed enough that multiple researchers called it one of the most important or best-written model papers of the year. Across the reactions, the factual consensus is that V4 materially advances open-weight long-context and agentic coding performance while remaining somewhat behind the top closed frontier models overall. Independent benchmarkers place V4 Pro ar…
[19] DeepSeek V4 Pro (Reasoning, Max Effort) vs Kimi K2 0905: Model Comparison (artificialanalysis.ai)

| Metric | DeepSeek V4 Pro (Reasoning, Max Effort) | Kimi K2 0905 | Analysis |
| --- | --- | --- | --- |
| Creator | DeepSeek | Kimi | |
| Context Window | 1000k tokens (~1500 A4 pages of size 12 Arial font) | 256k tokens (~384 A4 pages of size 12 Arial font) | DeepSeek V4 Pro (Reasoning, Max Effort) is larger than Kimi K2 0905 |
| Release Date | April, 2026 | September, 2025 | DeepSeek V4 Pro (Reasoning, Max Effort) has a more recent release date than Kimi K2 0905 |
| Parameters | 1600B, 49B active at inference time | 1000B, 32B active at inference time | DeepSeek V4 Pro (Reasoning, Ma… |
[20] deepseek-ai/DeepSeek-V4-Pro - Hugging Face (huggingface.co)

| Benchmark | Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max |
| --- | :---: | :---: | :---: | :---: | :---: | :---: |
| Knowledge & Reasoning | | | | | | |
| MMLU-Pro (EM) | 89.1 | 87.5 | 91.0 | 87.1 | 86.0 | 87.5 |
| SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | 75.6 | 36.9 | 38.1 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | 85.9 | 75.9 | 75.0 | 84.4 |
| GPQA Diamond (Pass@1) | 91.3 | 93.0 | 94.3 | 90.5 | 86.2 | 90.1 |
| HLE (Pass@1) | 40.0 | 39.8 | 44.4 | 36.4 | 34.7 | 37.7 |

| LiveCodeBench (Pass@1) | 88.8 91.7 | 89.6 93.5 | | Codeforces (Rating) 3168 | 3052 - | 3206 | | HMM…
[21] DeepSeek-V4-Pro-Max: Pricing, Benchmarks & Performance (llm-stats.com)
Self-reported by the model provider. Score may not be independently verified.
[22] Kimi K2.6: The new leading open weights model - Artificial Analysis (artificialanalysis.ai)
➤ Multimodality: Kimi K2.6 supports Image and Video input and text output natively. The model’s max context length remains 256k. Kimi K2.6 has significantly higher token usage than Kimi K2.5. Kimi K2.5 scores 6 on the AA-Omniscience Index, primarily driven by low hallucination rate.
[23] Underwhelming or underrated? DeepSeek V4 shows “impressive ... (scmp.com)
The company’s most advanced system, V4 Pro, ranked second among the world’s leading open-source models, behind Beijing-based Moonshot AI’s Kimi K2.6, benchmark firm Artificial Analysis said in a report on Friday. While V4 Pro marked a clear improvement on its predecessor, V3.2, it still lagged top competitors. The results highlight the challenges facing DeepSeek as China races to narrow the AI gap with the US, amid intensifying competition at home and abroad as well as ongoing constraints on computing power. Still, analysts noted that V4 delivered meaningful technical progress. [...]
[24] LiveBench (livebench.ai)
| GPT-5.3 Codex High | OpenAI | 72.76 | 80.15 | 78.18 | 55.00 | 87.84 | 62.69 | 80.09 | 65.38 |
| Gemini 3 Flash Preview High | Google | 72.40 | 74.55 | 73.90 | 40.00 | 84.17 | 74.77 | 84.56 | 74.86 |
| Kimi K2.6 Thinking | Moonshot AI | 72.17 | 79.38 | 78.57 | 58.33 | 84.28 | 65.13 | 75.14 | 64.36 |
| GPT-5.1 High | OpenAI | 72.04 | 78.79 | 72.49 | 53.33 | 86.90 | 69.61 | 79.26 | 63.90 |
| Qwen 3.6 Plus | Alibaba | 70.85 | 75.83 | 78.18 | 55.00 | 83.72 | 69.91 | 74.99 | 58.34 |
| GPT-5 Pro | OpenAI | 70.48 | 81.69 | 72.11 | 51.67 | 86.17 | 57.04 | 80.69 | 63.96 |
| GPT-5.4 Nano xHigh | OpenA…
[25] The Definitive LLM Selection & Benchmarks Guide (iternal.ai)
| Kimi K2.5 | Moonshot AI | 1T MoE | Open-weight | Coding, agentic (Agent Swarm up to 100 agents), vision | SWE-bench 76.8%; HumanEval 99.0%; GPQA 87.6%; HLE 51.8% (tools) |
| MiniMax M2.7 | MiniMax | ~230B MoE | Proprietary | Self-evolving agent, office productivity, coding | SWE-bench 78%; GDPval-AA 1495 Elo; released March 18, 2026 |
| Step-3.5-Flash | StepFun | 196B (11B active MoE) | Open-weight | Ultra-fast reasoning, competitive coding | AIME 99.8%; SWE-bench 74.4%; 100-350 tok/s; 256K context |
| DeepSeek R1 | DeepSeek | ~670B MoE | MIT | Deep reasoning, math, chain-of-thought | MATH-…
[26] DeepSeek V4 finally drops—and gets beaten by a smartphone company ... (x.com)
Kyle Chan (@kyleichan): "DeepSeek V4 finally drops—and gets beaten by a smartphone company, whose AI team is led by one of DeepSeek’s former top researchers." Quoting Artificial Analysis (@ArtificialAnlys, Apr 24): "Xiaomi’s MiMo V2.5 Pro has landed at 54 in the Artificial Analysis Intelligence Index, tied with Mo…