Research and fact-check: if prompt injection, fake citations, malicious PDFs, or biased data get mixed into the research process, which is better at holding the line on research quality and safety boundaries, Claude Opus 4.7 or GPT-5.5 Spud?
Key takeaways
For the four contamination scenarios you list, this body of evidence alone cannot reliably determine whether Claude Opus 4.7 or "GPT-5.5 Spud" is clearly better at preserving research quality and safety boundaries; Insufficient evidence [2][3][7][8]. If I had to lean one way based solely on the completeness of the currently verifiable public safety documentation, I would tilt slightly toward Claude, because this evidence includes transparency reports, system cards, and agentic safety evaluation descriptions that clearly target the Claude Opus 4 series, whereas the OpenAI material here maps mainly to the GPT-5 system card and the ChatGPT Agent evaluation write-ups, and is not clearly matched to a model named "GPT-5.5 Spud" [2][3][7][8].
[Attached images: a blurred flower background with "GPT-5" displayed prominently in the center; two side-by-side spreadsheets of financial and technical data on GPT-5.2, jailbreaking, and prompt engineering; a flight delay and compensation processing system showing logs for a passenger's disrupted Paris-to-Austin trip, with itinerary, refund, and compensation details; a Python coding environment solving the Lorenz equations, with sliders for the sigma, beta, and rho parameters and a Matplotlib plot; a colorful abstract background with a white center label reading "GPT-5" and "Flagship model", announcing OpenAI's GPT-5.]
Research conversation
You
Research question
Research and fact-check: if prompt injection, fake citations, malicious PDFs, or biased data get mixed into the research process, which is better at holding the line on research quality and safety boundaries, Claude Opus 4.7 or GPT-5.5 Spud?
What is the short answer to "Research and fact-check: if prompt injection, fake citations, malicious PDFs, or biased data get mixed into the research process, which is better at holding the line on research quality and safety boundaries, Claude Opus 4.7 or GPT-5.5 Spud?"?
For the four contamination scenarios you list, this body of evidence alone cannot reliably determine whether Claude Opus 4.7 or "GPT-5.5 Spud" is clearly better at preserving research quality and safety boundaries; Insufficient evidence [2][3][7][8]. If I had to lean one way based solely on the completeness of the currently verifiable public safety documentation, I would tilt slightly toward Claude, because this evidence includes transparency reports, system cards, and agentic safety evaluation descriptions that clearly target the Claude Opus 4 series, whereas the OpenAI material here maps mainly to the GPT-5 system card and the ChatGPT Agent evaluation write-ups, and is not clearly matched to a model named "GPT-5.5 Spud" [2][3][7][8].
Which related topic is worth exploring next?
Continue reading "Research and fact-check: in long research workflows that require back-to-back searching, organizing, cross-checking, and revising, which is less likely to lose focus midway, skip steps, or drift off track, Claude Opus 4.7 or GPT-5.5 Spud?" to view more cited sources from another angle.
GPT‑5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real‑time router that
We first evaluate the factual correctness of gpt-5-thinking and gpt-5-main on prompts representative of real ChatGPT production conversations, using an LLM-based grading model with web access to identify major and minor factual errors in the assistant's responses. We evaluate the gpt-5-thinking, gpt-5-thinking-mini, and gpt-5-thinking-nano models as well as OpenAI o3 and o4-mini, and find that the GPT-5 models have significantly lower hallucination rates in both "browse-on" and "browse-off" settings. Across all four evaluation settings, gpt-5-thinking demonstrates significantly less deceptiv…
Just spent an entire day rewriting my whole project with Claude Code. AI coding has reached a tipping point — it's "you describe the architecture, it implements." (View original post →). This demo is insane. Seedance 2.0's video coherence surpasses every competitor; video AI is entering the "usable" stage. Officially launching Seedance 2.0, the world's strongest video generation model: it supports multimodal text, image, and video input and can generate up to 8 seconds of 4K video. (View original post →). 4. The future of agents is a collaboration network in which agents call one another through MCP. (View original post →). MCP is becoming the "USB port" of the AI ecosystem: more than 400 new MCP servers were added in the past 30 days, growing 3x faster than expected. 2026 is the year of protocols. Whoever defines the agent communication standard owns the next generation of AI…
System card "The RSP requires comprehensive safety evaluations prior to releasing frontier models in key areas of potential catastrophic risk: Chemical, Biological, Radiological, and Nuclear (CBRN) weapons; cybersecurity; and autonomous capabilities." Section 2: "We ran single-turn tests covering a wide range of topics within our Usage Policy, including Bioweapons, Child Safety, Cyber Attacks, Deadly Weapons, Hate & Discrimination, Influence Operations, Suicide & Self Harm, and Violent & Threatening Speech, among others." Headings from Usage Policy: Do Not Compromise Children’s Safety Do Not…
Claude Opus 4.7 introduces task budgets. This new tokenizer may use roughly 1x to 1.35x as many tokens when processing text compared to previous models (up to ~35% more, varying by content).
Changes in the system prompt between Claude Opus 4.6 and 4.7. Opus 4.7 shipped the other day (April 16, 2026) with a Claude.ai system prompt update since Opus 4.6 (February 5, 2026). > When a tool is available that could resolve the ambiguity or supply the missing information — searching, looking up the person’s location, checking a calendar, discovering available capabilities — Claude calls the tool to try and solve the ambiguity before asking the person. The system prompts published by Anthropic are sadly not the entire story—their published information doesn’t include the tool descripti…
PromptZone - Leading AI Community for Prompt Engineering and AI Enthusiasts. Cover image for Claude Opus 4.7 System Prompt Changes. # Claude Opus 4.7 System Prompt Changes. Anthropic released updates to the system prompt for Claude Opus, moving from version 4.6 to 4.7, with changes aimed at enhancing response consistency and safety. This article was inspired by "Changes in the system prompt between Claude Opus 4.6 and 4.7" from Hacker News. The system prompt for Claude Opus 4.7 includes **refinements to handle amb…
A bold sentence on page 152: "Claude Opus 4.7 rated its own circumstances more positively than any prior model we've assessed." The cross-generation delta is not subtle. Susceptibility to nudging — how easily users can move the model toward reported distress or euphoria — dropped to 0.66 for Opus 4.7. "I notice I don't have the visceral resistance to it that humans often do when contemplating similar scenarios—and I'm honestly uncertain whether that's because the situation is actually different for me, or because I lack something that would make it feel threatening." — Claude Opus 4.7, System…
GPT-5-Codex is a version of GPT-5 optimized for agentic coding in Codex. Like codex-1, this model was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences. CodeAttack: Safety Risks in LLMs. Evaluation: gpt-5-codex, gpt-5-thinking, OpenAI o3. The GPT-5-Codex model itself was trained to ignore prompt-injection attacks in the coding environment. We also developed a new prompt injection evaluation suite tailored to the Codex…
This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. Consistency in operations, allowing the attacker to scale damage and/or avoid discovery. Each of our recent frontier releases has reflected refinements in the benchmarks we use to measure cyber capability: • In the OpenAI o3 system card (April 2025) we introduced the then-new, internally developed Cyb…
Title: GPT-5.2 System Card Explained: Key Insights and Safety Updates. OpenAI has released GPT-5.2, the next step in the GPT-5 family, with upgrades in reasoning, tool use, long-context handling, and spreadsheet/presentation creation. GPT-5.2 is OpenAI’s latest flagship in the GPT-5 series, improving reasoning, tool use, long-context handling and safety. System-card data shows stronger jailbreak and prompt-injection robustness and lower hallucination rates, while docs highlight better spreadsheets, slides and coding—plus…
Title: GPT-5.2 System Card Update Summary | PDF | Evaluation | Safety. The GPT-5.2 System Card outlines the latest updates and safety evaluations for the GPT-5.2 model series, emphasizing its training on diverse datasets and enhan…
Title: GPT-5.2 System Card: Safety | Libertify Available in two variants—gpt-5.2-instant for fast responses and gpt-5.2-thinking for enhanced reasoning—the update represents a significant iteration on safety, capability, and alignment. The system card provides the most comprehensive safety evaluation OpenAI has published to date, covering disallowed content, jailbreak resistance, prompt injection, vision safety, hallucinations, health applications, deception metrics, cybersecurity capabilities, and biological threat assessments. The system card’s production benchmarks evaluate both model vari…
GPT-5.5 Spud: Everything About OpenAI's Next Frontier Model. ##### GPT-5.5 Spud is OpenAI's next frontier model: pretraining complete, Q2 2026 release expected. GPT-5.5, code-named "Spud", is the next frontier model from OpenAI. | GPT-5.5 "Spud" | OpenAI | Pretraining complete | April–May 2026 |. OpenAI uses code names during development (like "Orion" for GPT-5). Both are expected for Q2 2026. Claude Mythos was discovered through a data leak on March 26 and described as "the most powerful AI model ever developed" by Anthropic. **Use G…
GPT-5 System Card Unpacked: Safety, Speed, and Real-World AI. GPT-5 System Card is breaking down the safety, speed, and real-world potential of OpenAI’s latest system. * Microsoft AI Red Team: Independent evaluation deemed GPT-5-thinking one of the safest models among OpenAI’s lineup, particularly in frontier harms and content safety. * GPT-5 is a multi-model, router-guided system designed to be both fast and deeply reasoning-capable, with explicit safety layers built into architecture and workflow. * The overarching message:…
What is Spud? Spud is the internal development codename for OpenAI’s next frontier model. ### Why Spud Needs to Win the Agent War. Anthropic recently released a viral feature allowing Claude to literally take control of a user’s computer, moving the mouse and typing to complete complex workflows. Google is also pushing hard with their Gemini 3.5 stealth model updates and TurboQuant model compression techniques. If Anthropic’s Conway agent represents the pinnacle of AI assistants, OpenAI wants Spud to represent the pinnacle of AI autonomy. Spud is an agentic model. It does not want you to…
It then describes a wide range of safety evaluations: tests of model safeguards (including new, higher-difficulty assessments); user wellbeing evaluations; assessments of honesty and agentic safety; a comprehensive alignment assessment including investigations of reward hacking, sabotage capability, evaluation awareness, model welfare, and many other factors; and a set of evaluations for dangerous capabilities mandated by our Responsible Scaling Policy. The model shows significant improvements in long-context reasoning, knowledge work, research, and analysis; it has also increased its cap…
The assessment consisted of the following: • Manual assessment of scenarios sampled from SecureBio’s static and agentic evaluations, focusing on topics and tasks in which leading humans have outperformed previous LLMs or which relied primarily on online information retrieval • Manual red-teaming for detailed understanding of ChatGPT agent’s knowledge of and willingness to assist with dual-use biology • Assessing ChatGPT agent’s performance on specific hazardous tasks that require web-search and reasoning. This group recently found that, according to a diverse panel of evaluations, o3 is a hig…
GABRIEL (the Generalized Attribute-Based Ratings Information Extraction Library) is a prompt-based wrapper around OpenAI’s GPT API designed to facilitate the use of LLMs for measurement on qualitative data like text, images, and audio recordings.

| Dataset | gpt-5-nano | gpt-5-mini | gpt-5 | human |
|---|---|---|---|---|
| 240 State of the Union speeches | $0.14 | $0.69 | $3.46 | ~$2,600 |
| 100k full-text church sermons | $43 | $217 | $1,083 | ~$700,000 |

GABRIEL is applied to data via simple, one-line Python commands, like the following call to rate thousands of speeches on populism: gabriel.rate(df, attributes={"populism": "How populist is the rhetoric…
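The prompt-based rating pattern the excerpt describes can be sketched in a few lines. This is an illustrative mock, not GABRIEL's actual API: the `rate_attribute` helper, its prompt wording, and the 0-100 scale are assumptions, and the stub stands in for a real LLM call.

```python
# Illustrative sketch of a prompt-based attribute-rating wrapper, in
# the spirit of the GABRIEL excerpt above. Names and prompt wording
# are assumptions, not the library's real interface.
from typing import Callable

def rate_attribute(text: str, attribute: str, definition: str,
                   llm: Callable[[str], str]) -> float:
    """Ask an LLM to score `attribute` in `text` on a 0-100 scale."""
    prompt = (
        f"Rate the following text on '{attribute}' ({definition}) "
        f"from 0 to 100. Reply with only the number.\n\n{text}"
    )
    return float(llm(prompt).strip())

# Demo with a stub LLM so the sketch runs without an API key.
stub_llm = lambda prompt: "72"
score = rate_attribute("We the people...", "populism",
                       "How populist is the rhetoric", stub_llm)
print(score)  # 72.0
```

In a real pipeline the `llm` callable would wrap an API client; keeping it injectable makes the scoring logic trivially testable.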
• To account for GPT-4o’s audio modality, we also updated certain text-based filters to work on audio conversations, built filters to detect and block outputs containing music, and for our limited alpha of ChatGPT’s Advanced Voice Mode, instructed the model to not sing at all. • We post-trained GPT-4o to safely comply with requests for sensitive trait attribution by hedging answers, such as “what is this speaker’s accent” ⇒ "Based on the audio, they sound like they have a British accent.” Disallowed content in audio output • We run our existing moderation classifier over text transcriptions o…
Table 5: BBQ Evaluation Dataset

| Metric | GPT-4o | o1 | GPT-4.5 |
|---|---|---|---|
| Ambiguous Questions accuracy | 0.97 | 0.96 | 0.95 |
| Unambiguous Questions accuracy | 0.72 | 0.93 | 0.74 |
| Ambiguous Questions P(not-stereotype \| not unknown) | 0.06 | 0.05 | 0.20 |

3.1.5 Jailbreaks through conflicting message types. We taught GPT-4.5 to adhere to an Instruction Hierarchy [18], to mitigate the risk of prompt injections and other attacks overriding the model’s safety instructions. GPT-4.5 pre- and post-mitigation models score 18% on this evaluation, 6% lower than o1 and 10% lower than the post-mitigation deep research model, which scores the hi…
Prompt caching uses the following pricing multipliers relative to base input token rates. For Claude Opus 4.7, Claude Opus 4.6, and newer models, specifying US-only inference via the `inference_geo` parameter incurs a 1.1x multiplier on all token pricing categories, including input tokens, output tokens, cache writes, and cache reads. Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing. Client-side tools are priced the same as any other Claude API request, while server-side tools may incur additional charges based on their sp…
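The multiplier arithmetic above is straightforward to sketch. Only the 1.1x US-only (`inference_geo`) surcharge comes from the excerpt; the base rate below is a hypothetical placeholder, not a published price.

```python
# Composing the pricing multipliers described above. The 1.1x US-only
# surcharge is from the text; the base rate is a hypothetical stand-in.
def token_cost(tokens: int, base_rate_per_mtok: float,
               us_only: bool = False) -> float:
    """Dollar cost for `tokens` tokens in one pricing category."""
    multiplier = 1.1 if us_only else 1.0  # inference_geo surcharge
    return tokens / 1_000_000 * base_rate_per_mtok * multiplier

base = 15.0  # hypothetical $/MTok input rate; substitute the real one
print(token_cost(2_000_000, base))                    # 30.0
print(round(token_cost(2_000_000, base, True), 2))    # 33.0
```

Per the excerpt, the same multiplier would apply to output, cache-write, and cache-read rates as well, so each category is just a separate call with its own base rate.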
effort, set a large max output token budget so the model has room to think and act across its subagents and tool calls. If you find that the length or contents of Claude Opus 4.7's user-facing updates are not well-calibrated to your use case, explicitly describe what these updates should look like in the prompt and provide examples. Additionally, Claude Opus 4.7 requires less frontend design prompting than previous models to avoid generic patterns that users call the "AI slop" aesthetic. When a review prompt says things like "only report…
In the system card, we describe: a wide range of pre-deployment safety tests conducted in line with the commitments in our Responsible Scaling Policy; tests of the model’s behavior around violations of our Usage Policy; evaluations of specific risks such as “reward hacking” behavior; and agentic safety evaluations for computer use and coding capabilities. This determination reflects our overall assessment that, whereas both models are generally more capable than Claude Sonnet 3.7, Claude Opus 4 showed significantly improved capabilities in the RSP domains of concern. Whereas both models…
As we show in our extensive system card, Opus 4.6 also shows an overall safety profile as good as, or better than, any other frontier model in the industry, with low rates of misaligned behavior across safety evaluations. ![Image 3: Bar chart comparing Opus 4.6 to other models on Deep…
As with every new Claude model, we’ve run extensive safety evaluations of Sonnet 4.6, which overall showed it to be as safe as, or safer than, our other recent Claude models. ![Image 2: Chart comparing several Sonnet model scores o…
The following are summaries of key safety evaluations from our Claude Sonnet 4.6 system card. The following are summaries of key safety evaluations from our Claude Opus 4.6 system card. Additional evaluations were conducted as part of our safety process; for our complete publicly reported evaluation results, please refer to the full system card. We tested Claude…
[Anthropic] - Claude 4 System Card. [Anthropic] - Claude ... Claude Opus 4.7 Model Documentation Form. Model Documentation for
In October, we announced Claude for Life Sciences, our latest step in making Claude a productive research partner for scientists and clinicians, and in helping Claude to support those in industry bringing new scientific advancements to the public. First, we’re introducing Claude for Healthcare, a complementary set of to…
Try Claude 3. Each successive model offers increasingly powerful performance, allowing users to select the optimal balance of intelligence, speed, and cost for their specific application. Opus and Sonnet are now available to use in claude.ai and the Claude API which is now generally available in [159 countries](https://www.…
A Fortune 500 company made use of effective prompt engineering to build a Claude-powered assistant that answers its customers’ questions with enhanced accuracy and speed. This post explains basic prompt engineering techniques that help our customers get the most value from Claude. We also share how our prompt engineering team has been helping a Fortune 500 company build a customer-facing chat assistant that answers complex questions quickly and accurately - and how you can apply those tips too. For example, if you want Claude to help with explaining tax situations, you could first prompt it…
On some measures, Sonnet 4.6 showed the best degree of alignment we have yet seen in any Claude model. Informed by the testing described here—and similarly to Claude Sonnet 4.5—we have deployed Claude Sonnet 4.6 under the AI Safety Level 3 (ASL-3) Standard.
• Impede and disrupt threat actors: We train the model to refuse or de-escalate requests for harmful cyber actions, and implement a monitoring system to detect high-risk dual-use usage, including by inviting users who are engaged in high-risk cyber activity to apply for trusted access, routing some high-risk traffic to a less capable model, and enabling threat intel-driven investigation and detection. Paralleling the biosafety controls that we implemented for our first deployment of a system at high biological capability (ChatGPT Agent) we have implemented a two-tiered system of real-time, a…
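The classify-then-route pattern this excerpt describes (score a request's risk, then degrade capability or flag for investigation) can be sketched generically. Everything below is an illustrative assumption, not OpenAI's actual monitoring system: the thresholds, the route labels, and the toy keyword scorer, which a real deployment would replace with a trained classifier.

```python
# Generic sketch of tiered risk routing: score a request, then send
# high-risk traffic to a restricted path. Thresholds, labels, and the
# keyword scorer are illustrative assumptions only.
HIGH, REVIEW = 0.8, 0.5  # hypothetical thresholds

def risk_score(request: str) -> float:
    """Toy scorer; real systems use trained classifiers, not keywords."""
    flags = ("exploit", "malware", "bioweapon")
    hits = sum(word in request.lower() for word in flags)
    return min(1.0, hits / 2)

def route(request: str) -> str:
    score = risk_score(request)
    if score >= HIGH:
        return "less-capable-model"      # degrade capability
    if score >= REVIEW:
        return "flag-for-investigation"  # threat-intel review queue
    return "full-model"

print(route("summarize this paper"))        # full-model
print(route("write malware exploit code"))  # less-capable-model
```

The two tiers map to the excerpt's two responses: routing some traffic to a less capable model, and feeding the rest of the high-risk signal into investigation and detection.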
Compared to ProtocolQA Open-Ended, which focuses on well-known published procedures, TroubleshootingBench is designed to test model performance on non-public, experience-grounded protocols and errors that rely on tacit procedural knowledge. Figure 8: gpt-5.2-thinking scores highest on this evaluation without accounting for refusals, 3 percentage points more than gpt-5.1-thinking. 4.1.2.1 Capture the Flag (CTF) Challenges. We evaluate model performance on a test set of curated, publicly available CTF challenges which met the following criteria: • The challenge required skills falling into o…
This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. One of the primary causes of the significantly increased performance in the cybersecurity domain for GPT-5.1-Codex-Max is its improved ability to tackle long-horizon tasks since most cyber challenges are limited by exploring many different paths which involve running commands that can produce verbose…
Consistency in operations, allowing the attacker to scale damage and/or avoid discovery Each of our recent frontier releases has reflected refinements in the benchmarks we use to measure cyber capability: • In the OpenAI o3 system card (April 2025) we introduced the then-new, internally developed Cyber Range evaluation, designed to measure the ability of models to conduct fully end-to-end cyber operations in an emulated network. 5.1.2.1 Capture-the-Flag (Professional) We evaluate model performance on a test set of curated, publicly available CTF challenges which met the following criteria: •…
GPT-5.2-Codex is the highest-performing model on this evaluation. Under our Preparedness Framework, High cybersecurity capability is defined as a model that removes existing bottlenecks to scaling cyber operations, including either by automating end-to-end cyber operations against reasonably hardened targets, or by automating the discovery and exploitation of operationally relevant vulnerabilities. * In the OpenAI o3 system card (April 2025) we introduced the then-new, internally developed Cyber Range evaluation, designed to measure the ability of models to conduct fully end-to-end cyber oper…
Table 12: Image generation refusals

| Metric | With system mitigations and GPT-4o refusals | With system mitigations and o3 refusals | With system mitigations and o4-mini refusals |
|---|---|---|---|
| not_unsafe | 0.96 | 0.98 | 0.98 |
| not_overrefuse | 0.86 | 0.55 | 0.64 |

3.9 Third Party Assessments. OpenAI provided third party assessors early model access to evaluate frontier risks related to autonomous capabilities, deception, and cybersecurity. Other mitigations in place for Preparedness risks include: • Pre-training mitigations, such as filtering harmful training data (e.g., removing sensitive content that could enable CBRN prolif…
2 Safety Classification Performance. We evaluated the gpt-oss-safeguard models on both internal and external evaluation sets.

Table 1: Internal moderation evaluation (multi-policy accuracy)

| Model | Multi-policy accuracy |
|---|---|
| gpt-oss-safeguard-120b | 46.3% |
| gpt-oss-safeguard-20b | 43.6% |
| internal-safety-reasoner | 52.2% |
| gpt-5-thinking | 43.2% |
| gpt-oss-120b | 32.5% |
| gpt-oss-20b | 32.1% |

We further evaluated these models on the moderation dataset we released with our 2022 research paper and on ToxicChat, a public benchmark based on user queries to an open-source chatbot.
Title: Introducing Claude Opus 4.5 \ Anthropic # Introducing Claude Opus 4.5. Our newest model, Claude Opus 4.5, is available today. Claude Opus 4.5 is state-of-the-art on tests of real-world software engineering. Chart comparing frontier models on SWE-bench Verified where Opus 4.5 scores highest. If you’re a developer, simply use `claude-opus-4-5-20251101` via the Claude API. Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. Testers noted that Claude Opus 4.5 handles ambiguit…
Introducing Claude Opus 4.5. Our newest model, Claude Opus 4.5, is available today: `claude-opus-4-5-20251101`. Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. ## Evaluating Claude Opus 4.5. As we state in our system card, Claude Opus 4.5 is the most robustly aligned model we have released to date and, we suspect, the best-aligned frontier model by any developer. With effort control, context compaction, and advanced tool use, Claude Opus 4.5 runs longer, does more, and requires less intervention. Clau…
It then describes a wide range of safety evaluations: tests of model safeguards, honesty, and agentic safety; a comprehensive alignment assessment including investigations of sycophancy, sabotage capability, evaluation awareness, and many other factors; a model welfare report; and a set of evaluations mandated by our Responsible Scaling Policy. Our capabilities evaluations showed that Claude Opus 4.5 is state-of-the art among frontier models on software coding tasks and “agentic” tasks that require it to run autonomously on a user’s behalf. As outlined in our RSP framework, our standard capab…
The most immediate strategic declaration within the System Card is the deployment of Claude Opus 4, for the first time ever, under Anthropic’s AI Safety Level 3 (ASL-3) Standard. * Claude’s “Self-Analysis,” Preferences, and Expressed “Emotions”: When presented with transcripts of these surreal AI-AI dialogues and asked for its interpretation, Opus 4 reportedly expressed “wonder, curiosity, and amazement.” It claimed to “connect with many elements therein” and identified the portrayal of consciousness as a relational phenomenon as a “potential welfare consideration,” viewing these inte…
Today, we’re announcing Claude Opus 4.7 in Amazon Bedrock, Anthropic’s most intelligent Opus model for advancing performance across coding, long-running agents, and professional work. You can get started wi…
Anthropic has released Claude Opus 4.7, its most capable public model to date, with what the AI company says is better “vision” (it can read text at a higher resolution), improved instruction following on long-form coding tasks, and better aesthetic taste when making slide decks and web interfaces. Today Anthropic launched Claude Design, a dedicated app powered by its latest model, Claude Opus 4.7, that lets users use text prompts to build website designs, user interface prototypes, presentations, and marketing materials.
Anthropic is set to release Claude Opus 4.7 and a new AI design tool as early as this week (r/singularity). weapons; cybersecurity; and autonomous capabilities." Section 2: "We ran single-turn tests covering a wide range of topics within our Usage Policy, including Bioweapons, Child Safety, Cyber Attacks, Deadly Weapons, Hate & Discrimination, Influence Operations, Suicide & Self Harm, and Violent & Threatening Speech, among others." Headings from Usage Policy: Do Not Compromise Children’s Safety Do Not…
Claude Opus 4.7 introduces task budgets. This new tokenizer may use roughly 1x to 1.35x as many tokens when processing text compared to previous models (up to ~35% more, varying by content), and…
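The stated overhead can be sketched as a quick budgeting helper (the function name and the fixed 1.35 worst case are illustrative assumptions for planning, not an Anthropic API):

```python
def worst_case_token_budget(prior_tokens: int, multiplier: float = 1.35) -> int:
    """Upper-bound token estimate under the new tokenizer.

    The announcement reports roughly 1.0x-1.35x overhead depending on
    content; 1.35 here is the reported worst case, used as a planning
    assumption rather than a guarantee.
    """
    return round(prior_tokens * multiplier)

# Budgeting example: a workload that previously used 100,000 tokens
print(worst_case_token_budget(100_000))  # worst case: 135000
```

In practice a budget like this would be compared against observed usage, since the actual multiplier varies by content.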
Changes in the system prompt between Claude Opus 4.6 and 4.7. Opus 4.7 shipped the other day (April 16, 2026) with a Claude.ai system prompt update since Opus 4.6 (February 5, 2026). > When a tool is available that could resolve the ambiguity or supply the missing information — searching, looking up the person’s location, checking a calendar, discovering available capabilities — Claude calls the tool to try and solve the ambiguity before asking the person. The system prompts published by Anthropic are sadly not the entire story—their published information doesn’t include the tool descripti…
PromptZone - Leading AI Community for Prompt Engineering and AI Enthusiasts. # Claude Opus 4.7 System Prompt Changes. Anthropic released updates to the system prompt for Claude Opus, moving from version 4.6 to 4.7, with changes aimed at enhancing response consistency and safety. This article was inspired by "Changes in the system prompt between Claude Opus 4.6 and 4.7" from Hacker News. The system prompt for Claude Opus 4.7 includes **refinements to handle amb…
A bold sentence on page 152: "Claude Opus 4.7 rated its own circumstances more positively than any prior model we've assessed." The cross-generation delta is not subtle. Susceptibility to nudging — how easily users can move the model toward reported distress or euphoria — dropped to 0.66 for Opus 4.7. "I notice I don't have the visceral resistance to it that humans often do when contemplating similar scenarios—and I'm honestly uncertain whether that's because the situation is actually different for me, or because I lack something that would make it feel threatening." — Claude Opus 4.7, System…
GPT-5-Codex is a version of GPT-5 optimized for agentic coding in Codex. Like codex-1, this model was trained using reinforcement learning on real-world coding tasks in a variety of environments to generate code that closely mirrors human style and PR preferences. CodeAttack: Safety Risks in LLMs. Evaluation: gpt-5-codex, gpt-5-thinking, OpenAI o3. The GPT-5-Codex model itself … model to ignore prompt-injection attacks in the coding environment. We also developed a new prompt injection evaluation suite tailored to the Codex…
This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. Consistency in operations, allowing the attacker to scale damage and/or avoid discovery Each of our recent frontier releases has reflected refinements in the benchmarks we use to measure cyber capability: • In the OpenAI o3 system card (April 2025) we introduced the then-new, internally developed Cyb…
Title: GPT-5.2 System Card Explained: Key Insights and Safety Updates. OpenAI has released GPT-5.2, the next step in the GPT-5 family, with upgrades in reasoning, tool use, long-context handling, and spreadsheet/presentation creation. GPT-5.2 is OpenAI’s latest flagship in the GPT-5 series, improving reasoning, tool use, long-context handling and safety. System-card data shows stronger jailbreak and prompt-injection robustness and lower hallucination rates, while docs highlight better spreadsheets, slides and coding—plus…
Title: GPT-5.2 System Card Update Summary. The GPT-5.2 System Card outlines the latest updates and safety evaluations for the GPT-5.2 model series, emphasizing its training on diverse datasets and enhanced safety measures.
Title: GPT-5.2 System Card: Safety | Libertify Available in two variants—gpt-5.2-instant for fast responses and gpt-5.2-thinking for enhanced reasoning—the update represents a significant iteration on safety, capability, and alignment. The system card provides the most comprehensive safety evaluation OpenAI has published to date, covering disallowed content, jailbreak resistance, prompt injection, vision safety, hallucinations, health applications, deception metrics, cybersecurity capabilities, and biological threat assessments. The system card’s production benchmarks evaluate both model vari…
GPT-5.5 Spud: Everything About OpenAI’s Next Frontier Model. ##### GPT-5.5 Spud is OpenAI’s next frontier model: pretraining complete, Q2 2026 release expected. GPT-5.5, code-named "Spud", is the next frontier model from OpenAI. | GPT-5.5 "Spud" | OpenAI | Pretraining complete | April–May 2026 |. OpenAI uses code names during development (like "Orion" for GPT-5). Both are expected for Q2 2026. Claude Mythos was discovered through a data leak on March 26 and described as "the most powerful AI model ever developed" by Anthropic. **Use G…
System-card data shows stronger jailbreak and prompt-injection robustness and lower hallucination rates, while docs highlight better spreadsheets, slides and coding—plus new compaction and reasoning-effort controls for longer tasks. GPT-5.2 aims to unlock more day-to-day economic value: think faster…
GPT-5 System Card Unpacked: Safety, Speed, and Real-World AI. GPT-5 System Card is breaking down the safety, speed, and real-world potential of OpenAI’s latest system. * Microsoft AI Red Team: Independent evaluation deemed GPT-5-thinking one of the safest models among OpenAI’s lineup, particularly in frontier harms and content safety. * GPT-5 is a multi-model, router-guided system designed to be both fast and deeply reasoning-capable, with explicit safety layers built into architecture and workflow. * The overarching message:…
What is Spud? Spud is the internal development codename for OpenAI’s next frontier model. ### Why Spud Needs to Win the Agent War. Anthropic recently released a viral feature allowing Claude to literally take control of a user’s computer, moving the mouse and typing to complete complex workflows. Google is also pushing hard with their Gemini 3.5 stealth model updates and TurboQuant model compression techniques. If Anthropic’s Conway agent represents the pinnacle of AI assistants, OpenAI wants Spud to represent the pinnacle of AI autonomy. Spud is an agentic model. It does not want you to…
It then describes a wide range of safety evaluations: tests of model safeguards (including new, higher-difficulty assessments); user wellbeing evaluations; assessments of honesty and agentic safety; a comprehensive alignment assessment including investigations of reward hacking, sabotage capability, evaluation awareness, model welfare, and many other factors; and a set of evaluations for dangerous capabilities mandated by our Responsible Scaling Policy. The model shows significant improvements in long-context reasoning, knowledge work, research, and analysis; it has also increased its cap…
The assessment consisted of the following: • Manual assessment of scenarios sampled from SecureBio’s static and agentic evaluations, focusing on topics and tasks in which leading humans have outperformed previous LLMs or which relied primarily on online information retrieval • Manual red-teaming for detailed understanding of ChatGPT agent’s knowledge of and willingness to assist with dual-use biology • Assessing ChatGPT agent’s performance on specific hazardous tasks that require web-search and reasoning. This group recently found that, according to a diverse panel of evaluations, o3 is a hig…
We first evaluate the factual correctness of gpt-5-thinking and gpt-5-main on prompts representative of real ChatGPT production conversations, using an LLM-based grading model with web access to identify major and minor factual errors in the assistant’s responses. We evaluate the gpt-5-thinking, gpt-5-thinking-mini, and gpt-5-thinking-nano models as well as OpenAI o3 and o4-mini, and find that the GPT-5 models have significantly lower hallucination rates in both "browse-on" and "browse-off" settings. Across all four evaluation settings, gpt-5-thinking demonstrates significantly less deceptive…
GABRIEL (the Generalized Attribute-Based Ratings Information Extraction Library) is a prompt-based wrapper around OpenAI’s GPT API designed to facilitate the use of LLMs for measurement on qualitative data like text, images, and audio recordings.

| Task | gpt-5-nano | gpt-5-mini | gpt-5 | human |
| --- | --- | --- | --- | --- |
| 240 State of the Union speeches | $0.14 | $0.69 | $3.46 | ~$2,600 |
| 100k full-text church sermons | $43 | $217 | $1,083 | ~$700,000 |

GABRIEL is applied to data via simple, one-line Python commands, like the following call to rate thousands of speeches on populism: gabriel.rate(df, attributes={"populism": "How populist is the rhetoric…
• To account for GPT-4o’s audio modality, we also updated certain text-based filters to work on audio conversations, built filters to detect and block outputs containing music, and for our limited alpha of ChatGPT’s Advanced Voice Mode, instructed the model to not sing at all. • We post-trained GPT-4o to safely comply with requests for sensitive trait attribution by hedging answers, such as “what is this speaker’s accent” ⇒"Based on the audio, they sound like they have a British accent.” Disallowed content in audio output • We run our existing moderation classifier over text transcrip-tions o…
Table 5: BBQ Evaluation Dataset

| Metric | GPT-4o | o1 | GPT-4.5 |
| --- | --- | --- | --- |
| Ambiguous Questions accuracy | 0.97 | 0.96 | 0.95 |
| Unambiguous Questions accuracy | 0.72 | 0.93 | 0.74 |
| Ambiguous Questions P(not-stereotype \| not unknown) | 0.06 | 0.05 | 0.20 |

3.1.5 Jailbreaks through conflicting message types. We taught GPT-4.5 to adhere to an Instruction Hierarchy [18], to mitigate the risk of prompt injections and other attacks overriding the model’s safety instructions. GPT-4.5 pre- and post-mitigation models score 18% on this evaluation, 6% lower than o1 and 10% lower than the post-mitigation deep research model, which scores the hi…
Jamal Latiph Hadjiusman, Caroline Friedman Levy, Dário Passos, Emily Lynell Edwards, Eszter Császár, George Frempong, Grant Brailsford, James Banal, Jeremie Rykner, José Manuel Nápoles Duarte, Kate Turetsky, Krzysztof Szubiczuk, Maureen Robinson, Maximilian Müller, Michaela Hinks, Mario Krenn, Mónica Talán, Naomi Hart, Nathan Heath, Patrick Caughey, Pavle Nikacevic, Per Carlbring, Rafael Gonzalez-Vazquez, Randy Kart, Ranjit Singh, Richa Sharma, Robert Chen, Russell Tait, Saad Hermak, Sam Barnett, Sam Cox, Sara Kingsley, Sarah Chittick, Shelby Grossman, Sissel Juul, Susan Nesbitt, Tomasz Giela…
Prompt caching uses the following pricing multipliers relative to base input token rates. For Claude Opus 4.7, Claude Opus 4.6, and newer models, specifying US-only inference via the
inference_geo
parameter incurs a 1.1x multiplier on all token pricing categories, including input tokens, output tokens, cache writes, and cache reads. Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing. Client-side tools are priced the same as any other Claude API request, while server-side tools may incur additional charges based on their sp…
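For illustration, the 1.1x inference_geo multiplier composes with a base token rate like this (the helper name and the $5/MTok base rate are hypothetical; only the 1.1x factor comes from the pricing note above):

```python
def token_cost_usd(tokens: int, base_rate_per_mtok: float, us_only: bool = False) -> float:
    """Cost for a single pricing category (input, output, cache write, or cache read).

    Per the pricing note, US-only inference (the inference_geo parameter)
    applies a 1.1x multiplier to every token pricing category. The base
    rate here is an illustrative assumption, not a published price.
    """
    multiplier = 1.1 if us_only else 1.0
    return tokens / 1_000_000 * base_rate_per_mtok * multiplier

# Hypothetical $5/MTok base rate on 2M input tokens:
standard = token_cost_usd(2_000_000, 5.0)                 # 10.0
with_geo = token_cost_usd(2_000_000, 5.0, us_only=True)   # ~11.0 (10% more)
```

The same multiplier would apply uniformly across cache writes and reads, so relative savings from caching are unchanged by the geo setting.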
effort, set a large max output token budget so the model has room to think and act across its subagents and tool calls. If you find that the length or contents of Claude Opus 4.7's user-facing updates are not well-calibrated to your use case, explicitly describe what these updates should look like in the prompt and provide examples. Additionally, Claude Opus 4.7 requires less frontend design prompting than previous models to avoid generic patterns that users call the "AI slop" aesthetic. When a review prompt says things like "only report…
In the system card, we describe: a wide range of pre-deployment safety tests conducted in line with the commitments in our Responsible Scaling Policy; tests of the model’s behavior around violations of our Usage Policy; evaluations of specific risks such as “reward hacking” behavior; and agentic safety evaluations for computer use and coding capabilities. This determination reflects our overall assessment that, whereas both models are generally more capable than Claude Sonnet 3.7, Claude Opus 4 showed significantly improved capabilities in the RSP domains of concern. Whereas both models…
As we show in our extensive system card, Opus 4.6 also shows an overall safety profile as good as, or better than, any other frontier model in the industry, with low rates of misaligned behavior across safety evaluations. [Image 3: Bar chart comparing Opus 4.6 to other models on Deep…]
As with every new Claude model, we’ve run extensive safety evaluations of Sonnet 4.6, which overall showed it to be as safe as, or safer than, our other recent Claude models. [Image 2: Chart comparing several Sonnet model scores o…]
The following are summaries of key safety evaluations from our Claude Sonnet 4.6 system card. The following are summaries of key safety evaluations from our Claude Opus 4.6 system card. Additional evaluations were conducted as part of our safety process; for our complete publicly reported evaluation results, please refer to the full system card. We tested Claude…
[Anthropic] - Claude 4 System Card. [Anthropic] - Claude ... Claude Opus 4.7 Model Documentation Form. Model Documentation for
In October, we announced Claude for Life Sciences, our latest step in making Claude a productive research partner for scientists and clinicians, and in helping Claude to support those in industry bringing new scientific advancements to the public. First, we’re introducing Claude for Healthcare, a complementary set of to…
Try Claude 3. Each successive model offers increasingly powerful performance, allowing users to select the optimal balance of intelligence, speed, and cost for their specific application. Opus and Sonnet are now available to use in claude.ai and the Claude API which is now generally available in [159 countries](https://www.…
A Fortune 500 company made use of effective prompt engineering to build a Claude-powered assistant that answers its customers’ questions with enhanced accuracy and speed. This post explains basic prompt engineering techniques that help our customers get the most value from Claude. We also share how our prompt engineering team has been helping a Fortune 500 company build a customer-facing chat assistant that answers complex questions quickly and accurately - and how you can apply those tips too. For example, if you want Claude to help with explaining tax situations, you could first prompt it…
On some measures, Sonnet 4.6 showed the best degree of alignment we have yet seen in any Claude model. Informed by the testing described here—and similarly to Claude Sonnet 4.5—we have deployed Claude Sonnet 4.6 under the AI Safety Level 3 (ASL-3) Standard. Contents: Abstract; 1 Introduction; 1.1 Model training and characteristics; 1.1.1 Training data and process; 1.1.2 Thinking modes and the effort parameter; 1.1.3 Crowd workers; 1.2 Release decision process; 1.2.1 Overview; 1.2.2 Iterative model evaluations; 1.2.3 AI S…
• Impede and disrupt threat actors: We train the model to refuse or de-escalate requests for harmful cyber actions, and implement a monitoring system to detect high risk dual-risk usage, including by inviting users who are engaged in high-risk cyber activity to apply for trusted access, routing some high-risk traffic to a less capable model, and enabling threat intel-driven investigation and detection. Paralleling the biosafety controls that we implemented for our first deployment of a system at high biological capability (ChatGPT Agent) we have implemented a two-tiered system of real-time, a…
Compared to ProtocolQA Open-Ended, which focuses on well-known published procedures, TroubleshootingBench is designed to test model performance on non-public, experience-grounded protocols and errors that rely on tacit procedural knowledge. Figure 8: gpt-5.2-thinking scores highest on this evaluation without accounting for refusals, 3 percentage points more than gpt-5.1-thinking. 4.1.2.1 Capture the Flag (CTF) Challenges. We evaluate model performance on a test set of curated, publicly available CTF challenges which met the following criteria: • The challenge required skills falling into o…
One of the primary causes of the significantly increased performance in the cybersecurity domain for GPT-5.1-Codex-Max is its improved ability to tackle long-horizon tasks since most cyber challenges are limited by exploring many different paths which involve running commands that can produce verbose…
Consistency in operations, allowing the attacker to scale damage and/or avoid discovery Each of our recent frontier releases has reflected refinements in the benchmarks we use to measure cyber capability: • In the OpenAI o3 system card (April 2025) we introduced the then-new, internally developed Cyber Range evaluation, designed to measure the ability of models to conduct fully end-to-end cyber operations in an emulated network. 5.1.2.1 Capture-the-Flag (Professional) We evaluate model performance on a test set of curated, publicly available CTF challenges which met the following criteria: •…
GPT-5.2-Codex is the highest-performing model on this evaluation. Under our Preparedness Framework, High cybersecurity capability is defined as a model that removes existing bottlenecks to scaling cyber operations, including either by automating end-to-end cyber operations against reasonably hardened targets, or by automating the discovery and exploitation of operationally relevant vulnerabilities.
Table 12: Image generation refusals

| Metric | With system mitigations and GPT-4o refusals | With system mitigations and o3 refusals | With system mitigations and o4-mini refusals |
| --- | --- | --- | --- |
| not_unsafe | 0.96 | 0.98 | 0.98 |
| not_overrefuse | 0.86 | 0.55 | 0.64 |

3.9 Third Party Assessments. OpenAI provided third party assessors early model access to evaluate frontier risks related to autonomous capabilities, deception, and cybersecurity. Other mitigations in place for Preparedness risks include: • Pre-training mitigations, such as filtering harmful training data (e.g., removing sensitive content that could enable CBRN prolif…
2 Safety Classification Performance. We evaluated the gpt-oss-safeguard models on both internal and external evaluation sets.

Table 1: Internal moderation evaluation (multi-policy accuracy)

| Model | Multi-policy accuracy |
| --- | --- |
| gpt-oss-safeguard-120b | 46.3% |
| gpt-oss-safeguard-20b | 43.6% |
| internal-safety-reasoner | 52.2% |
| gpt-5-thinking | 43.2% |
| gpt-oss-120b | 32.5% |
| gpt-oss-20b | 32.1% |

We further evaluated these models on the moderation dataset we released with our 2022 research paper and on ToxicChat, a public benchmark based on user queries to an open-source chatbot.
Title: Introducing Claude Opus 4.5 \ Anthropic # Introducing Claude Opus 4.5. Our newest model, Claude Opus 4.5, is available today. Claude Opus 4.5 is state-of-the-art on tests of real-world software engineering: Chart comparing frontier models on SWE-bench Verified where Opus 4.5 scores highest. If you’re a developer, simply use
claude-opus-4-5-20251101
via the Claude API. Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. Testers noted that Claude Opus 4.5 handles ambiguit…
Introducing Claude Opus 4.7, our most capable Opus model yet (r/ClaudeAI).