This focus reflects a broader shift in AI coding tools—from simple autocomplete or snippet generation to persistent agents that can execute full development workflows.
Cursor reports several benchmark results that place Composer 2.5 within the same general performance tier as leading frontier models.
Key reported scores include:
These results suggest a nuanced picture:
Overall, the benchmarks suggest Composer 2.5 is competitive with top models on some software engineering tasks, though it does not consistently outperform them across all agent evaluations.
The most striking aspect of the release is pricing.
Composer 2.5 is priced at approximately:
A faster variant is available at $3.00 per million input tokens and $15.00 per million output tokens, still competitive with fast tiers from other frontier models.
For comparison, Claude Opus models are reported to cost around $5 per million input tokens and $25 per million output tokens, which would make the standard Composer tier dramatically cheaper, particularly for output tokens.
This matters because agentic coding workflows consume very large numbers of tokens. A single task might involve repository search, planning steps, editing code, compiling, and executing tests, each triggering additional model calls.
Lower token prices allow Cursor to run many more reasoning steps per task without dramatically increasing costs.
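To make the arithmetic concrete, here is a rough sketch of how per-token prices translate into per-task cost. It uses only the two price points quoted above: the faster Composer variant ($3.00 input, $15.00 output per million tokens) and the reported Claude Opus estimate ($5, $25). The workload itself (40 model calls, 50,000 input tokens and 2,000 output tokens per call) is a hypothetical assumption chosen for illustration, not a measured figure, and the names `Pricing` and `task_cost` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a task given its total token usage."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


# Price points quoted in the article.
COMPOSER_FAST = Pricing(input_per_million=3.00, output_per_million=15.00)
OPUS_REPORTED = Pricing(input_per_million=5.00, output_per_million=25.00)

# Hypothetical agentic task (assumption): each step re-sends a large context
# (repository search results, plan, diffs, test output) and emits a short reply.
MODEL_CALLS = 40
INPUT_TOKENS_PER_CALL = 50_000
OUTPUT_TOKENS_PER_CALL = 2_000

total_input = MODEL_CALLS * INPUT_TOKENS_PER_CALL
total_output = MODEL_CALLS * OUTPUT_TOKENS_PER_CALL

fast_cost = task_cost(COMPOSER_FAST, total_input, total_output)
opus_cost = task_cost(OPUS_REPORTED, total_input, total_output)

print(f"Composer 2.5 (fast): ${fast_cost:.2f} per task")
print(f"Claude Opus (reported estimate): ${opus_cost:.2f} per task")
print(f"Cost ratio: {fast_cost / opus_cost:.2f}")
```

Because both input and output prices differ by the same factor between these two price points, the ratio stays the same regardless of the token mix; with a tier whose output price is discounted more steeply, output-heavy tasks would see a larger saving.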
Composer 2.5 builds on Moonshot AI’s Kimi K2.5 open‑weight checkpoint, which Cursor then extends through additional training tailored to software engineering tasks.
Reports about the training approach indicate that the model used:
Synthetic tasks allow the model to repeatedly practice structured development workflows—planning edits, modifying code, running tests, and iterating—helping improve reliability on real engineering problems.
Composer 2.5 also reflects a broader strategic shift inside Cursor.
Early versions of the IDE relied heavily on external AI providers such as OpenAI, Anthropic, and Google to power coding features. Developing competitive in‑house models changes that dynamic.
Owning more of the model stack provides several advantages:
This is especially important as competitors like Anthropic’s Claude Code benefit from tight integration between the underlying model and the coding agent itself.
By developing its own Composer models, Cursor is attempting to compete more directly in that integrated model‑plus‑tool category rather than simply routing requests to third‑party AI systems.
Composer 2.5 does not clearly dominate the frontier across all benchmarks. GPT‑5.5 still leads in some agent evaluations, and Claude Opus 4.7 remains highly competitive.
What makes the model notable is the combination of near‑frontier coding performance and dramatically lower cost. If Cursor continues improving its in‑house models while maintaining this pricing advantage, it could significantly shift the economics of AI‑assisted software development—especially for long‑running coding agents operating directly inside the IDE.