OpenAI positioned GPT-5.6 Sol as advancing the frontier in three key domains: coding, biology, and cybersecurity.
Terminal-Bench 2.1 tests command-line workflows that require multi-step planning, tool coordination, and iteration across 89 complex programming tasks. Results include:
| Model | Score |
|---|---|
| GPT-5.6 Sol Ultra | 91.9% |
| GPT-5.6 Sol (max) | 88.8% |
| Claude Mythos 5 | 88.0% |
| GPT-5.6 Terra | 84.3% |
| Claude Fable 5 | 84.3% |
| GPT-5.5 | 83.4% |
| GPT-5.6 Luna | 82.5% |
GPT-5.6 Sol Ultra set a new state of the art at 91.9%. The standard Sol (max) score of 88.8% edges out Anthropic's restricted frontier model, Claude Mythos 5 (88.0%), by 0.8 percentage points.
On GeneBench v1, a benchmark evaluating long-horizon genomics and quantitative-biology analysis tasks, OpenAI reports that Sol achieved stronger results than GPT-5.5 while using fewer output tokens. This represents a meaningful efficiency improvement for scientific research workflows.
On ExploitBench, a cybersecurity research benchmark, GPT-5.6 Sol nearly matched the performance of Anthropic's Mythos Preview while using roughly one-third the output tokens.
On ExploitGym, a benchmark built by UC Berkeley researchers in collaboration with OpenAI and other frontier AI labs, all three GPT-5.6 models showed improved cybersecurity capabilities as reasoning increased.
Importantly, OpenAI states that GPT-5.6 Sol does not cross the Cyber Critical threshold under its Preparedness Framework. In evaluations involving Chromium and Firefox, the model identified bugs and exploitation primitives (the building blocks of an exploit), but it did not autonomously produce a functional full-chain exploit under the conditions tested. The full GPT-5.6 model series was internally rated "High" risk for cybersecurity and bioweapon capabilities, but not at the highest "Critical" level.
OpenAI says GPT-5.6 Sol launches with its "most robust safety stack yet." During the preview, some prompts may be slowed or blocked for extra review while OpenAI tunes its false-positive and false-negative rates.
The rollout of GPT-5.6 is unlike any previous OpenAI release. At the request of the U.S. government, OpenAI is initially limiting access to a small group of trusted partners and organizations (Axios reported that the preview includes around 20 approved companies) while the model undergoes additional national security reviews.
The preview is not a broad self-service program. During this period, GPT-5.6 Sol, Terra, and Luna are available to this limited group only through the OpenAI API and Codex; they are not available in ChatGPT. OpenAI says broader availability in ChatGPT, Codex, and the API is planned "in the coming weeks."
OpenAI stated that it views the government-gated approach as a temporary measure: "We believe in broad access, and this process should not become the long-term default." In an internal memo, CEO Sam Altman told staff the government would be "approving access customer by customer during this preview period," and said he hoped for a wider release a couple of weeks later.
This arrangement came out of talks with the Office of the National Cyber Director and the Office of Science and Technology Policy, and it reflects a new frontier-model framework that the Trump administration is testing.
GPT-5.6 list prices:

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| GPT-5.6 Sol | $5.00 | $30.00 |
| GPT-5.6 Terra | $2.50 | $15.00 |
| GPT-5.6 Luna | $1.00 | $6.00 |
Sol's pricing matches GPT-5.5's, while Terra costs half as much as GPT-5.5. For context, Sol is priced closer to Claude Opus 4.8 ($5/$25) than to Anthropic's restricted Mythos 5 ($10/$50).
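To make the per-token prices concrete, here is a minimal Python sketch that estimates the cost of a single request at the GPT-5.6 list prices. The model keys (`gpt-5.6-sol` and so on) are hypothetical labels, not confirmed API model IDs, and the function assumes flat list pricing with no discounts or other billing adjustments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# List prices from the GPT-5.6 pricing table (USD per 1M tokens).
# Keys are hypothetical labels, not confirmed API model IDs.
PRICES = {
    "gpt-5.6-sol": Price(5.00, 30.00),
    "gpt-5.6-terra": Price(2.50, 15.00),
    "gpt-5.6-luna": Price(1.00, 6.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming flat list pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    p = PRICES[model]
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Example workload: 20k prompt tokens, 2k completion tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 20_000, 2_000):.3f}")
```

For a request with 20,000 input tokens and 2,000 output tokens, this gives $0.16 for Sol, $0.08 for Terra, and $0.032 for Luna. Output tokens cost 6x as much as input tokens for all three models, and because Sol and GPT-5.5 share the same prices, Sol's lower output-token usage on GeneBench should translate directly into a lower per-task cost.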
OpenAI also announced that GPT-5.6 Sol will be deployed on Cerebras hardware in July, with inference speeds of up to 750 tokens per second.
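As a rough feel for what 750 tokens per second means, the sketch below converts an output length into a best-case generation time. It treats 750 tokens/s as a sustained decode rate, which is an assumption: the figure is an "up to" peak, and the calculation ignores queueing, prompt processing, and time to first token, so real latency will be higher.

```python
# "Up to" figure from the Cerebras announcement; treating it as a
# sustained rate is an assumption, so results are lower bounds.
PEAK_TOKENS_PER_SECOND = 750.0


def min_generation_seconds(output_tokens: int,
                           tokens_per_second: float = PEAK_TOKENS_PER_SECOND) -> float:
    """Best-case time to stream `output_tokens` at a constant decode rate.

    Ignores queueing, prompt processing, and time to first token.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    for n in (750, 3_000, 15_000):
        print(f"{n:>6} tokens -> {min_generation_seconds(n):.1f} s")
```

At this rate a 3,000-token answer takes at least 4 seconds, and a 15,000-token response at least 20 seconds.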
The GPT-5.6 family marks a significant departure from previous OpenAI launches. The three-tier packaging (Sol, Terra, Luna) introduces durable branding that decouples model series from capability tiers. The benchmark results, particularly Sol Ultra's state-of-the-art score on Terminal-Bench 2.1 and Sol's token efficiency on GeneBench and ExploitBench, show meaningful advances across coding, biology, and cybersecurity. But the defining feature of this launch may be the government-required access restrictions, which represent a new paradigm for frontier AI deployment.