| Anthropic |
| $5.00 |
| $0.50 |
| $25.00 |
| 1M tokens |
| 128K tokens |
| GPT-5.5 | OpenAI | $5.00 | $0.50 | $30.00 | Short-context tier; long-context surcharge above ~272K tokens | 128K tokens |
| GPT-5.5 Pro | OpenAI | $30.00 | — | $180.00 | Short-context tier; long-context surcharge above ~272K tokens | 128K tokens |
| Gemini 3.5 Flash | $1.50 | $0.15 | $9.00 | 1,048,576 tokens | 65,536 tokens |
| Grok 4.3 | xAI | $1.25 | $0.20 | $2.50 | 1M tokens | — |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.028 | $0.28 | 1M tokens | 384K tokens |
| DeepSeek V4 Pro | DeepSeek | $1.74 | $0.145 | $3.48 | 1M tokens | 384K tokens |
Claude Opus 4.8 and 4.7 share the same standard rates as Opus 4.6, continuing Anthropic's pattern since the Opus 4.5 generation . However, Opus 4.7 introduced a new tokenizer that can produce up to 35% more tokens for the same input text compared to Opus 4.6, which effectively raises the cost of identical prompts even though the per-token price is unchanged
. GPT-5.5's pricing, meanwhile, doubles for input and increases 1.5x for output once a request exceeds roughly 272K tokens, a tiered structure that can surprise users who aren't monitoring prompt length
. Grok 4.3 applies a similar approach: standard rates apply for requests up to 200K tokens, after which prices double
.
DeepSeek V4 Flash stands apart as the least expensive model in this group by a wide margin. At $0.14 input and $0.28 output per million tokens, it is roughly 97% cheaper than GPT-5.5 on output and nearly 70% cheaper than Grok 4.3 on input, making it a leading choice for high-volume agentic workloads .
Prompt caching is the single most effective way to lower per-request costs across all providers. When prompts share a common prefix—like system instructions or long conversation histories—cached input tokens are billed at a fraction of the standard rate.
Anthropic, OpenAI, and Google all converge on a roughly 90% discount for cached input. DeepSeek lists its V4 Flash cache-hit rate at $0.028, which is an 80% reduction from the $0.14 cache-miss price . Grok 4.3's cached rate was introduced at $0.20 per million tokens for requests under 200K tokens
. For workloads with repetitive prompts, these caching tiers can easily cut monthly API bills by half or more.
Batch APIs offer another major cost lever, generally halving standard per-token prices in exchange for slower turnaround times.
Context window size and maximum output tokens influence both capability and cost. A larger context window means more input tokens per request, which directly multiplies the bill.
Comments
0 comments