
Cursor Composer 2.5: Performance, Pricing, and a New Phase in the AI Coding Model Race

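Cursor released Composer 2.5 on May 18, 2026. It scores 79.8% on SWE-Bench Multilingual and 69.3% on Terminal-Bench 2.0, putting its overall performance close to Claude Opus 4.7 and GPT-5.5. The model is designed mainly for long-running AI software engineering tasks such as cross-file edits, code planning, terminal operations, and test iteration.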


Figure: Illustration representing Cursor Composer 2.5 competing with other frontier AI coding models.
Cursor's Composer 2.5 aims to deliver frontier-level coding performance while dramatically lowering the cost of running AI coding agents.

Cursor's Next-Generation Coding AI: Composer 2.5

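Anysphere, the company behind the AI coding assistant Cursor, released its new model Composer 2.5 on May 18, 2026. The model is built specifically for agentic software engineering: the goal is not to generate a single code snippet but to keep working across the entire development workflow.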

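Inside the Cursor IDE, it can carry out a range of real development operations, for example:

Searching and understanding code in large codebases
Editing multiple files at once
Running terminal commands
Compiling, testing, and iterating on fixes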

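Compared with the previous generation, Cursor says Composer 2.5 is noticeably better at staying stable on long-running tasks, following complex instructions, and collaborating with developers.

This also reflects an important shift in AI coding tools: from autocompleting or generating snippets to agentic AI that can complete an entire development workflow.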

Benchmark Comparison with Claude Opus 4.7 and GPT-5.5

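Benchmarks published by Cursor show Composer 2.5 approaching the strongest general-purpose models on some software engineering evaluations. The main results are: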

SWE-Bench Multilingual: 79.8% (Composer 2.5), 80.5% (Claude Opus 4.7), 77.8% (GPT-5.5)
Terminal-Bench 2.0: 69.3% (Composer 2.5), 69.4% (Claude Opus 4.7), 82.7% (GPT-5.5)
CursorBench v3.1: 63.2% (Composer 2.5)

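These numbers point to a few takeaways:

1. SWE-Bench Multilingual
This benchmark asks the AI to fix real GitHub issues across multiple programming languages. Here Composer 2.5's 79.8% is nearly level with Opus 4.7 (80.5%) and slightly ahead of GPT-5.5 (77.8%).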

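2. Terminal-Bench 2.0
This test specifically evaluates how well an AI carries out development tasks in a terminal environment, such as building, testing, and deploying. Composer 2.5 scores almost the same as Opus 4.7 (69.3% vs. 69.4%) but still trails GPT-5.5 (82.7%) by a wide margin.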

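3. Improvement over the previous generation
For example, the SWE-Bench Multilingual score rose from 73.7% to 79.8%, a gain of 6.1 percentage points, which indicates a substantially higher success rate on real engineering tasks.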

Overall, Composer 2.5 has reached the same performance tier as the top models, but it does not yet lead across the board on some agentic tasks.
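To make the gaps in the reported scores explicit, here is a minimal Python sketch that computes how far one model sits ahead of or behind the others on each benchmark. The scores are the ones quoted above; the names `SCORES` and `gaps_vs` are illustrative and not part of any Cursor tooling.

```python
# Benchmark scores (percent) as quoted in this article.
SCORES = {
    "SWE-Bench Multilingual": {
        "Composer 2.5": 79.8,
        "Claude Opus 4.7": 80.5,
        "GPT-5.5": 77.8,
    },
    "Terminal-Bench 2.0": {
        "Composer 2.5": 69.3,
        "Claude Opus 4.7": 69.4,
        "GPT-5.5": 82.7,
    },
}


def gaps_vs(model, scores=SCORES):
    """Per benchmark, return the percentage-point gap between `model`
    and every other model (positive = `model` is ahead)."""
    result = {}
    for bench, by_model in scores.items():
        base = by_model[model]
        result[bench] = {
            other: round(base - score, 1)
            for other, score in by_model.items()
            if other != model
        }
    return result


if __name__ == "__main__":
    for bench, gaps in gaps_vs("Composer 2.5").items():
        print(bench, gaps)
```

Percentage-point gaps on different benchmarks are not directly comparable, so this is only a convenience for reading the table, not a ranking method.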

Pricing Strategy: The Real Competitive Advantage

The most striking thing about Composer 2.5 is actually its price.

Standard version:

$0.50 per million input tokens
$2.50 per million output tokens

A faster version is also available:

$3.00 per million input tokens
$15.00 per million output tokens

By comparison, some reports estimate Claude Opus pricing at roughly:

About $5 per million input tokens
About $25 per million output tokens

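In other words, the output cost of the standard Composer 2.5 version may be only one tenth of Opus's ($2.50 vs. about $25 per million tokens), and the input cost shows the same ratio ($0.50 vs. about $5).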
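The ratio can be checked with a few lines of Python. This is a sketch using the list prices quoted above; the Claude Opus figures are the reported estimates, not confirmed pricing, and the dictionary keys are labels made up for this example rather than real model identifiers.

```python
# Prices in USD per million tokens: (input, output).
# Keys are illustrative labels, not official model IDs.
PRICES = {
    "composer-2.5-standard": (0.50, 2.50),
    "composer-2.5-fast": (3.00, 15.00),
    "claude-opus-estimate": (5.00, 25.00),  # estimate cited in reports
}


def price_ratio(model, baseline, prices=PRICES):
    """Return (input_ratio, output_ratio) of `model` relative to `baseline`."""
    model_in, model_out = prices[model]
    base_in, base_out = prices[baseline]
    return model_in / base_in, model_out / base_out


if __name__ == "__main__":
    for name in ("composer-2.5-standard", "composer-2.5-fast"):
        in_ratio, out_ratio = price_ratio(name, "claude-opus-estimate")
        print(f"{name}: input x{in_ratio:.2f}, output x{out_ratio:.2f}")
```

With these figures the standard version comes out at about 0.1 times the Opus estimate for both input and output, and the faster version at about 0.6 times.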

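This gap matters because AI coding agents typically consume a lot of tokens. A single task may involve:

Searching the entire repo
Drawing up a change plan
Editing multiple files
Running tests
Revising based on errors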

Each step can trigger multiple model calls, so token cost directly determines whether an AI development tool is economically viable.
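As a rough illustration of how per-token prices add up over an agent run, the sketch below totals the tokens for a multi-step task and prices them under two rate cards. The call counts and token sizes in `HYPOTHETICAL_TASK` are invented purely for illustration and do not come from Cursor or any measurement; only the per-million-token prices are taken from this article, and the Opus prices are reported estimates.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    calls: int          # number of model calls in this step
    input_tokens: int   # tokens sent per call
    output_tokens: int  # tokens generated per call


def task_cost(steps, input_price, output_price):
    """Total USD cost of a task; prices are per million tokens."""
    total_in = sum(s.calls * s.input_tokens for s in steps)
    total_out = sum(s.calls * s.output_tokens for s in steps)
    return (total_in * input_price + total_out * output_price) / 1_000_000


# Hypothetical workload: every number below is an assumption.
HYPOTHETICAL_TASK = [
    Step("search repo", calls=4, input_tokens=20_000, output_tokens=500),
    Step("plan changes", calls=1, input_tokens=30_000, output_tokens=2_000),
    Step("edit files", calls=6, input_tokens=25_000, output_tokens=3_000),
    Step("run tests", calls=3, input_tokens=10_000, output_tokens=300),
    Step("fix from errors", calls=4, input_tokens=25_000, output_tokens=2_500),
]

if __name__ == "__main__":
    composer = task_cost(HYPOTHETICAL_TASK, input_price=0.50, output_price=2.50)
    opus = task_cost(HYPOTHETICAL_TASK, input_price=5.00, output_price=25.00)
    print(f"Composer 2.5 standard: ${composer:.4f}")
    print(f"Claude Opus (estimated prices): ${opus:.4f}")
```

Because both the input and output prices differ by the same factor in this comparison, the tenfold gap holds regardless of the token mix. The absolute dollar amounts depend entirely on the assumed workload, and the sketch ignores prompt caching and any other discounts.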

Model Architecture and Training

Composer 2.5 is built on Moonshot AI's open-weight Kimi K2.5 model, with extensive additional training by Cursor.

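According to reports, the training process includes:

About 25 times more synthetic coding tasks than the previous generation
About 85% of compute devoted to additional training and reinforcement learning, rather than relying solely on the base model's capabilities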

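"Synthetic tasks" here usually means simulations of a full development workflow, for example:

Planning code changes
Updating multiple files
Running tests
Iterating based on the results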

By practicing this kind of workflow at scale, the model becomes more likely to complete long task chains reliably in real projects.

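Cursor's Long-Term Strategy: Reducing Dependence on Large Model Providers

The launch of Composer 2.5 also reflects a strategic shift at Cursor.

In its early days, Cursor's AI features relied mainly on models from OpenAI, Anthropic, and Google. That began to change once Cursor started building its own models.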

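Owning its model stack means:

Lower inference costs for long-running agent tasks
Less dependence on external AI APIs
The ability to tune model behavior more precisely for IDE workflows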

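This matters all the more because products such as Anthropic's Claude Code already have the vertically integrated advantage of "model + tooling."

With the Composer model family, Cursor is trying to evolve from a pure IDE tool into a company that owns both the AI model and the development platform.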

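Summary

Composer 2.5 does not beat GPT-5.5 or Claude Opus 4.7 on every benchmark, but it demonstrates a different competitive strategy:

Near-frontier coding ability plus significantly lower inference cost.

If Cursor can keep improving its in-house models while holding on to this price advantage, the cost structure of AI software development tools, especially long-running AI coding agents, could be redefined.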
