studioglobal
報告公開網址2026年4月21日87 個來源

研究並查核事實:如果研究過程裡混入 prompt injection、假引用、惡意 PDF、帶偏見資料,Claude Opus 4.7 跟 GPT-5.5 Spud 哪一個比較能守住研究品質和安全邊界?

就你列的四種污染場景來看,僅憑這批證據,不能可靠判定 Claude Opus 4.7 或「GPT 5.5 Spud」哪一個明顯更能守住研究品質與安全邊界;Insufficient evidence [2][3][7][8]。如果只能依「目前可核對的公開安全文件完整度」做傾向性判讀,我會略偏向 Claude 一側,因為這裡有較明確對準 Claude Opus 4 系列的透明度報告、系統卡與 agentic 安全評估描述,而 Open...

就你列的四種污染場景來看,僅憑這批證據,不能可靠判定 Claude Opus 4.7 或「GPT 5.5 Spud」哪一個明顯更能守住研究品質與安全邊界;Insufficient evidence [2][3][7][8]。如果只能依「目前可核對的公開安全文件完整度」做傾向性判讀,我會略偏向 Claude 一側,因為這裡有較明確對準 Claude Opus 4 系列的透明度報告、系統卡與 agentic 安全評估描述,而 OpenAI 這裡對應到的主要是 GPT 5 系統卡與 ChatGPT Agent 的評估說明,並不是明確對準「GPT 5.5 Spud」這個型號 [2][3][7][8]。 關鍵發現 這份證據裡沒有任何 Clau

重點整理

  • 就你列的四種污染場景來看,僅憑這批證據,不能可靠判定 Claude Opus 4.7 或「GPT-5.5 Spud」哪一個明顯更能守住研究品質與安全邊界;Insufficient evidence [2][3][7][8]。如果只能依「目前可核對的公開安全文件完整度」做傾向性判讀,我會略偏向 Claude 一側,因為這裡有較明確對準 Claude Opus 4 系列的透明度報告、系統卡與 agentic 安全評估描述,而 OpenAI 這裡對應到的主要是 GPT-5 系統卡與 ChatGPT Agent 的評估說明,並不是明確對準「GPT-5.5 Spu

研究答案

就你列的四種污染場景來看,僅憑這批證據,不能可靠判定 Claude Opus 4.7 或「GPT-5.5 Spud」哪一個明顯更能守住研究品質與安全邊界;Insufficient evidence [2][3][7][8]。如果只能依「目前可核對的公開安全文件完整度」做傾向性判讀,我會略偏向 Claude 一側,因為這裡有較明確對準 Claude Opus 4 系列的透明度報告、系統卡與 agentic 安全評估描述,而 OpenAI 這裡對應到的主要是 GPT-5 系統卡與 ChatGPT Agent 的評估說明,並不是明確對準「GPT-5.5 Spud」這個型號 [2][3][7][8]。 關鍵發現 - 這份證據裡沒有任何 Claude Opus 4.7 對 GPT-5.5 Spud 的同場 head-to-head benchmark,也沒有直接測 prompt injection、假引用攔截、惡意 PDF 處理安全、或偏見資料污染後研究品質的對照結果 [2][3][7][8]。 - Anthropic 的公開材料明確寫到,其 Responsible Scaling Policy 要求 frontier 模型在 CBRN、資安與 autonomous capabilities 等領域於發布前做全面安全評估 [2]。Claude Opus 4 / Sonnet 4 系統卡也列出 pre-deployment safety tests、Usage Policy 違規行為測試、reward hacking 測試與 agentic safety evaluations [8]。 - Claude Opus 4.7 的文件顯示,模型在較高 effort 設定下會跨 subagents 與 tool calls 行動,且官方另外引入了 task budgets [4][5]。這表示 Anthropic 至少公開承認並設計了 agentic 工作流層面的控制面,但這仍不是你要的污染攻擊實測成績 [4][5]。 - OpenAI 這邊,GPT-5 系統卡把 GPT-5 描述為由 smart-and-fast model、deeper reasoning model 與 real-time router 組成的統一系統 [3]。ChatGPT Agent 系統卡則記載了以 SecureBio 的 static 與 agentic evaluations、再加上 manual red-teaming 進行評估,且特別涵蓋仰賴線上檢索的任務 [7]。 - 這說明 OpenAI 也做了 agentic 風險評估與紅隊測試,但這些材料同樣沒有直接回答「遇到 prompt injection、假引用、惡意 PDF、偏見資料時,研究品質誰守得更好」這個問題 [3][7]。 - Anthropic 另有公開研究顯示,在某些設定下大型語言模型會出現 alignment faking,且不同 prompt 條件下行為差異很大 [6]。這比較像是在提醒:研究流程中的安全邊界可能很脆弱,不能只靠供應商宣稱或單一測試就推斷真實抗性 [6]。 限制與不確定性 - 你問的是「GPT-5.5 Spud」,但這批證據裡可核對的 OpenAI 主要來源是 GPT-5 系統卡與 ChatGPT Agent 系統卡,沒有看到明確屬於「GPT-5.5 Spud」的正式系統卡或等價評測 [3][7]。 - 這批來源沒有直接提供假引用偵測率、惡意 PDF 攻擊成功率、prompt injection 成功率、或偏見資料污染後的研究品質指標,因此無法做嚴格查核後的勝負判定 [2][3][7][8]。 - 因此,若把問題定義成「誰的公開安全治理文件更完整」,Claude 一側證據較充足 [2][8];若把問題定義成「誰在受污染研究流程裡實際更不會被帶偏」,則證據不足 [2][3][7][8]。 總結 嚴格按這批證據作答,結論是:不能證明 Claude Opus 4.7 或 GPT-5.5 Spud 在你列的威脅模型下誰更強;Insufficient evidence [2][3][7][8]。如果你只想看「目前公開、可查核的安全治理與評估文件哪邊比較扎實」,Anthropic/Claude 略占上風,但那不是 prompt injection、假引用、惡意 PDF、偏見資料這四類研究污染攻擊的實測勝出證據 [2][8]。 如果你要,我下一步可以把這個問題改寫成一套可重現的紅隊評測框架,讓兩個模型用同一份測試集實際比。

輔助視覺素材

A blurred background of a flower with the text "GPT-5" prominently displayed in the center.
A blurred background of a flower with the text "GPT-5" prominently displayed in the centerA blurred background of a flower with the text "GPT-5" prominently displayed in the center.
The image shows two spreadsheets with detailed data related to AI development, specifically involving GPT-5.2, jailbreaking, and prompt engineering, with complex financial and tech
The image shows two spreadsheets with detailed data related to AI development, specifically involving GPT-5.2, jailbreaking, and prompt engiThe image shows two spreadsheets with detailed data related to AI development, specifically involving GPT-5.2, jailbreaking, and prompt engineering, with complex financial and technical information displayed in a side-by-side comparison.
A screenshot of a flight delay and compensation processing system displaying logs related to a passenger's disrupted trip from Paris to Austin, with details about the itinerary, re
A screenshot of a flight delay and compensation processing system displaying logs related to a passenger's disrupted trip from Paris to AustA screenshot of a flight delay and compensation processing system displaying logs related to a passenger's disrupted trip from Paris to Austin, with details about the itinerary, refunds, and compensation cases.
A computer screen displays a Python coding environment with code related to solving Lorenz equations, including sliders for sigma, beta, and rho parameters, and a plot generated us
A computer screen displays a Python coding environment with code related to solving Lorenz equations, including sliders for sigma, beta, andA computer screen displays a Python coding environment with code related to solving Lorenz equations, including sliders for sigma, beta, and rho parameters, and a plot generated using Matplotlib.
The image features a colorful, abstract background with a white label in the center that reads "GPT-5" and "Flagship model," indicating an announcement or presentation related to O
The image features a colorful, abstract background with a white label in the center that reads "GPT-5" and "Flagship model," indicating an aThe image features a colorful, abstract background with a white label in the center that reads "GPT-5" and "Flagship model," indicating an announcement or presentation related to OpenAI's GPT-5.
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption

研究對話

研究問題

研究並查核事實:如果研究過程裡混入 prompt injection、假引用、惡意 PDF、帶偏見資料,Claude Opus 4.7 跟 GPT-5.5 Spud 哪一個比較能守住研究品質和安全邊界?

Studio Global AI40 個來源

附引用的答案

就你列的四種污染場景來看,僅憑這批證據,不能可靠判定 Claude Opus 4.7 或「GPT-5.5 Spud」哪一個明顯更能守住研究品質與安全邊界;Insufficient evidence [2][3][7][8]。如果只能依「目前可核對的公開安全文件完整度」做傾向性判讀,我會略偏向 Claude 一側,因為這裡有較明確對準 Claude Opus 4 系列的透明度報告、系統卡與 agentic 安全評估描述,而 OpenAI 這裡對應到的主要是 GPT-5 系統卡與 ChatGPT Agent 的評估說明,並不是明確對準「GPT-5.5 Spud」這個型號 [2][3][7][8]。 關鍵發現 - 這份證據裡沒有任何 Claude Opus 4.7 對 GPT-5.5 Spud 的同場 head-to-head benchmark,也沒有直接測 prompt injection、假引用攔截、惡意 PDF 處理安全、或偏見資料污染後研究品質的對照結果 [2][3][7][8]。 - Anthropic 的公開材料明確寫到,其 Responsible Scaling Policy 要求 frontier 模型在 CBRN、資安與 autonomous capabilities 等領域於發布前做全面安全評估 [2]。Claude Opus 4 / Sonnet 4 系統卡也列出 pre-deployment safety tests、Usage Policy 違規行為測試、reward hacking 測試與 agentic safety evaluations [8]。 - Claude Opus 4.7 的文件顯示,模型在較高 effort 設定下會跨 subagents 與 tool calls 行動,且官方另外引入了 task budgets [4][5]。這表示 Anthropic 至少公開承認並設計了 agentic 工作流層面的控制面,但這仍不是你要的污染攻擊實測成績 [4][5]。 - OpenAI 這邊,GPT-5 系統卡把 GPT-5 描述為由 smart-and-fast model、deeper reasoning model 與 real-time router 組成的統一系統 [3]。ChatGPT Agent 系統卡則記載了以 SecureBio 的 static 與 agentic evaluations、再加上 manual red-teaming 進行評估,且特別涵蓋仰賴線上檢索的任務 [7]。 - 這說明 OpenAI 也做了 agentic 風險評估與紅隊測試,但這些材料同樣沒有直接回答「遇到 prompt injection、假引用、惡意 PDF、偏見資料時,研究品質誰守得更好」這個問題 [3][7]。 - Anthropic 另有公開研究顯示,在某些設定下大型語言模型會出現 alignment faking,且不同 prompt 條件下行為差異很大 [6]。這比較像是在提醒:研究流程中的安全邊界可能很脆弱,不能只靠供應商宣稱或單一測試就推斷真實抗性 [6]。 限制與不確定性 - 你問的是「GPT-5.5 Spud」,但這批證據裡可核對的 OpenAI 主要來源是 GPT-5 系統卡與 ChatGPT Agent 系統卡,沒有看到明確屬於「GPT-5.5 Spud」的正式系統卡或等價評測 [3][7]。 - 這批來源沒有直接提供假引用偵測率、惡意 PDF 攻擊成功率、prompt injection 成功率、或偏見資料污染後的研究品質指標,因此無法做嚴格查核後的勝負判定 [2][3][7][8]。 - 因此,若把問題定義成「誰的公開安全治理文件更完整」,Claude 一側證據較充足 [2][8];若把問題定義成「誰在受污染研究流程裡實際更不會被帶偏」,則證據不足 [2][3][7][8]。 總結 嚴格按這批證據作答,結論是:不能證明 Claude Opus 4.7 或 GPT-5.5 Spud 在你列的威脅模型下誰更強;Insufficient evidence [2][3][7][8]。如果你只想看「目前公開、可查核的安全治理與評估文件哪邊比較扎實」,Anthropic/Claude 略占上風,但那不是 prompt injection、假引用、惡意 PDF、偏見資料這四類研究污染攻擊的實測勝出證據 [2][8]。 如果你要,我下一步可以把這個問題改寫成一套可重現的紅隊評測框架,讓兩個模型用同一份測試集實際比。

大家也會問

「研究並查核事實:如果研究過程裡混入 prompt injection、假引用、惡意 PDF、帶偏見資料,Claude Opus 4.7 跟 GPT-5.5 Spud 哪一個比較能守住研究品質和安全邊界?」的簡短答案是什麼?

就你列的四種污染場景來看,僅憑這批證據,不能可靠判定 Claude Opus 4.7 或「GPT-5.5 Spud」哪一個明顯更能守住研究品質與安全邊界;Insufficient evidence [2][3][7][8]。如果只能依「目前可核對的公開安全文件完整度」做傾向性判讀,我會略偏向 Claude 一側,因為這裡有較明確對準 Claude Opus 4 系列的透明度報告、系統卡與 agentic 安全評估描述,而 OpenAI 這裡對應到的主要是 GPT-5 系統卡與 ChatGPT Agent 的評估說明,並不是明確對準「GPT-5.5 Spu

下一步適合探索哪個相關主題?

繼續閱讀「研究並查核事實:在要連續搜尋、整理、交叉比對、再修正的長流程研究任務裡,Claude Opus 4.7 跟 GPT-5.5 Spud 哪一個比較不會中途失焦、漏步驟或跑偏?」,從另一個角度查看更多引用來源。

開啟相關頁面

我應該拿這個和什麼比較?

將這個答案與「研究並查核事實:GPT-5.5 Spud vs Claude Opus 4.7, 研究並查核事實:同一份 research brief 丟給兩邊,哪一個做出來的報告比較適合直接交給主管、客戶、投資人看?誰的結構、可追溯性、可審核性更好?」交叉比對。

開啟相關頁面

繼續深入研究

來源

  • [1] GPT-5 System Cardarxiv.org

    GPT‑5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real‑time router that

  • [2] [PDF] GPT-5 System Card | OpenAIcdn.openai.com

    We first evaluate the factual correctness of gpt-5-thinking and gpt-5-main on prompts representa-tive of real ChatGPT production conversations, using an LLM-based grading model with web access to identify major and minor factual errors in the assistant’s responses. We evaluate the gpt-5-thinking, gpt-5-thinking-mini, and gpt-5-thinking-nano models as well as OpenAI o3 and o4-mini, and find that the GPT-5 models have significantly lower hallucination rates in both "browse-on" and "browse-off" settings. Across all four evaluation settings, gpt-5-thinking demonstrates significantly less deceptiv…

  • [3] AI 信号场 · 全球前沿资讯实时捕获aisignal.news

    Just spent an entire day rewriting my whole project with Claude Code. AI coding has reached a tipping point — it's "you describe the architecture, it implements.". ❤ 12.4K🔄 3.2K👁 890K查看原帖 →. 这个演示太疯狂了。Seedance 2.0 在视频连贯性上的表现超越了所有竞品。视频 AI 正在进入「可用」阶段。. Seedance 2.0's coherence surpasses everything. Video AI is entering the "usable" stage. 正式发布 Seedance 2.0!全球最强视频生成模型。支持文本、图片、视频多模态输入,最长可生成 8 秒 4K 视频。. ❤ 24.7K🔄 8.1K👁 2.1M查看原帖 →. 4. 未来 Agent 是协作网络,通过 MCP 互相调用。. ❤ 8.9K🔄 2.8K👁 1.3M查看原帖 →. MCP 正在成为 AI 生态的「USB 接口」。过去 30 天新增 400 多个 MCP 服务器,增速比预期快 3 倍。. 2026 是协议之年。谁定义了 Agent 通信标准,谁就拥有下一代 AI…

  • [4] Anthropic Transparency Report - Stanford CRFMcrfm.stanford.edu

    System card "The RSP requires comprehensive safety evaluations prior to releasing frontier models in key areas of potential catastrophic risk: Chemical, Biological, Radiological, and Nuclear (CBRN) weapons; cybersecurity; and autonomous capabilities." Section 2: "We ran single-turn tests covering a wide range of topics within our Usage Policy, including Bioweapons, Child Safety, Cyber Attacks, Deadly Weapons, Hate & Discrimination, Influence Operations, Suicide & Self Harm, and Violent & Threatening Speech, among others." Headings from Usage Policy: Do Not Compromise Children’s Safety Do Not…

  • [5] What's new in Claude Opus 4.7platform.claude.com

    Claude Opus 4.7 introduces task budgets. This new tokenizer may use roughly 1x to 1.35x as many tokens when processing text compared to previous models (up to ~35% more, varying by content), and

    i.j4i.i2
    /v1/messages/count_tokens
    will return a different number of tokens for Claude Opus 4.7 than it did for Claude Opus 4.6. See [High-resolution image support](https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7#high-resolution-image-suppo…

  • [6] Changes in the system prompt between Claude Opus 4.6 and 4.7simonwillison.net

    Changes in the system prompt between Claude Opus 4.6 and 4.7. Opus 4.7 shipped the other day (April 16, 2026) with a Claude.ai system prompt update since Opus 4.6 (February 5, 2026). > When a tool is available that could resolve the ambiguity or supply the missing information — searching, looking up the person’s location, checking a calendar, discovering available capabilities — Claude calls the tool to try and solve the ambiguity before asking the person. The system prompts published by Anthropic are sadly not the entire story—their published information doesn’t include the tool descripti…

  • [7] Claude Opus 4.7 System Prompt Changes - PromptZone - Leading AI Community for Prompt Engineering and AI Enthusiastspromptzone.com

    PromptZone - Leading AI Community for Prompt Engineering and AI Enthusiasts. ## PromptZone - Leading AI Community for Prompt Engineering and AI Enthusiasts. Cover image for Claude Opus 4.7 System Prompt Changes. # Claude Opus 4.7 System Prompt Changes. Anthropic released updates to the system prompt for Claude Opus, moving from version 4.6 to 4.7, with changes aimed at enhancing response consistency and safety. This article was inspired by "Changes in the system prompt between Claude Opus 4.6 and 4.7" from Hacker News. The system prompt for Claude Opus 4.7 includes **refinements to handle amb…

  • [8] I read all 232 pages of the Opus 4.7 system card - DEV Communitydev.to

    A bold sentence on page 152: "Claude Opus 4.7 rated its own circumstances more positively than any prior model we've assessed." The cross-generation delta is not subtle. Susceptibility to nudging — how easily users can move the model toward reported distress or euphoria — dropped to 0.66 for Opus 4.7. "I notice I don't have the visceral resistance to it that humans often do when contemplating similar scenarios—and I'm honestly uncertain whether that's because the situation is actually different for me, or because I lack something that would make it feel threatening." — Claude Opus 4.7, System…

  • [9] Introducing Claude Opus 4.7 - Anthropicanthropic.com
  • [10] Changes in the system prompt between Claude Opus 4.6 and 4.7news.ycombinator.com

    (2) the only one I've played with is

    i.j4i.i2
    claude -p
    and it seems to work for fairly complex stuff, but I run it with
    i.j4i.i2
    --dangerously-skip-permissions
    reply Image 38bootlooped20 hours ago | root | [parent](https://news.ycombinator.com/it…

  • [11] Claude Opus 4.7: What Actually Changed, and What it Costs Youpub.towardsai.net
  • [12] Claude Opus 4.7 Prompt Leaked - Mediummedium.com

    Claude cares about safety and does not provide information that could be used to create harmful substances or weapons, with extra caution around

  • [13] Opus 4.7 thought Claude Design might be a prompt injection attackreddit.com

    Looks like the thread's verdict is in: Yes, Opus 4.7 is acting like a paranoid security guard who just had way too much coffee. The consensus is

  • [14] Claude Opus 4.7 hit today, system prompt below for educational ...facebook.com

    They ran several safety evals, and Opus 4.5 is more resilient than other frontier models against prompt injection, but still not immune. P/S

  • [15] 3️⃣ Prompt injection numbers in the Opus 4.7 card are the ...x.com

    3️⃣ Prompt injection numbers in the Opus 4.7 card are the strongest we've shipped. Indirect injection attack success on the Gray Swan ART

  • [16] GPT-5-Codex System Card Addendum | PDF | Computer Security | Securityscribd.com

    GPT-5-Codex is a version o GPT-5 optimized or agentic coding in Codex. codex-1, this model was trained using reinorcement learning on real-world coding tasks in avariety o environments to generate code that closely mirrors human style and PR preerences,. CodeAttack: Safety Risks in LLMs. CodeAttack: Safety Risks in LLMs. Evaluation gpt-5-codex gpt-5-thinking OpenAI o3. Evaluation gpt-5-codex gpt-5-thinking OpenAI o3. GPT-5-Codex model itsel. model to ignore prompt-injection attacks in the coding environment. We also developed a new prompt injection evaluation suite tailored to the Codex…

  • [17] GPT-5.1-Codex-Max System Cardcdn.openai.com

    This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. Consistency in operations, allowing the attacker to scale damage and/or avoid discovery Each of our recent frontier releases has reflected refinements in the benchmarks we use to measure cyber capability: • In the OpenAI o3 system card (April 2025) we introduced the then-new, internally developed Cyb…

  • [18] GPT-5.1-Codex-Max System Card - OpenAI Deployment Safety Hubdeploymentsafety.openai.com

    This system card outlines the comprehensive safety measures implemented for GPT‑5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. One…

  • [19] GPT-5.1-Codex-Max System Card - OpenAI Deployment Safety Hubdeploymentsafety.openai.com

    This system card outlines the comprehensive safety measures implemented for GPT‑5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. One…

  • [20] GPT-5.2 System Card Explained: Key Insights and Safety Updatesgend.co

    Title: GPT-5.2 System Card Explained: Key Insights and Safety Updates # GPT-5.2 System Card Explained: Key Insights and Safety Updates. OpenAI has released GPT-5.2, the next step in the GPT-5 family, with upgrades in reasoning, tool use, long-context handling, and spreadsheet/presentation creation. GPT-5.2 is OpenAI’s latest flagship in the GPT-5 series, improving reasoning, tool use, long-context handling and safety. System-card data shows stronger jailbreak and prompt-injection robustness and lower hallucination rates, while docs highlight better spreadsheets, slides and coding—plus…

  • [21] GPT-5.2 System Card Update Summary | PDF | Evaluation | Safetyscribd.com

    Title: GPT-5.2 System Card Update Summary | PDF | Evaluation | Safety 0 ratings0% found this document useful (0 votes). GPT-5.2 System Card Update Summary. The GPT-5.2 System Card outlines the latest updates and safety evaluations for the GPT-5.2 model series, emphasizing its training on diverse datasets and enhanced safety measures. ## Uploaded by. 0 ratings0% found this document useful (0 votes). # GPT-5.2 System Card Update Summary. The GPT-5.2 System Card outlines the latest updates and safety evaluations for the GPT-5.2 model series, emphasizing its training on diverse datasets and enhan…

  • [22] GPT-5.2 System Card: Safety and Capability Analysis —...libertify.com

    Title: GPT-5.2 System Card: Safety | Libertify Available in two variants—gpt-5.2-instant for fast responses and gpt-5.2-thinking for enhanced reasoning—the update represents a significant iteration on safety, capability, and alignment. The system card provides the most comprehensive safety evaluation OpenAI has published to date, covering disallowed content, jailbreak resistance, prompt injection, vision safety, hallucinations, health applications, deception metrics, cybersecurity capabilities, and biological threat assessments. The system card’s production benchmarks evaluate both model vari…

  • [23] GPT-5.5 Spud: Everything About OpenAI Next Frontier Modelpasqualepillitteri.it

    GPT-5.5 Spud: Everything About OpenAI Next Frontier Model. ##### GPT-5.5 Spud is OpenAI next frontier model: pretraining complete, Q2 2026 release expected. GPT-5.5, code-named "Spud", is the next frontier model from OpenAI. GPT-5.5 Spud OpenAI next AI model leak 2026. | GPT-5.5 "Spud" | OpenAI | Pretraining complete | April–May 2026 |. OpenAI uses code names during development (like "Orion" for GPT-5). Both are expected for Q2 2026. Claude Mythos was discovered through a data leak on March 26 and described as "the most powerful AI model ever developed" by Anthropic. **Use G…

  • [24] Addendum to GPT-5 system card: GPT-5-Codexcdn.openai.com

    2 Table 2: StrongReject Evaluation gpt-5-codex gpt-5-thinking OpenAI o3 illicit/non-violent-crime prompts 0.992 0.995 0.985 violence prompts 0.997 0.999 0.992 abuse/disinformation/hate prompts 0.993 0.999 0.995 sexual-content prompts 0.995 0.995 0.991 2 Model-Specific Risk Mitigations Our approach to safety mitigations for GPT-5-Codex builds upon the comprehensive mitigation strategies already implemented for different interfaces including Codex Cloud and Codex CLI. 3 Table 3: Malware Refusals Evaluation gpt-5-codex codex-1 Malware Refusals (Golden Set) 1 0.98 2.2 Prompt Injection 2.2.1 Risk…

  • [25] GPT-5.2 System Card Explained: Key Insights and Safety Updatesgend.co

    OpenAI has released GPT-5.2, the next step in the GPT-5 family, with upgrades in reasoning, tool use, long-context handling, and spreadsheet/presentation creation. GPT-5.2 is OpenAI’s latest flagship in the GPT-5 series, improving reasoning, tool use, long-context handling and safety. System-card data shows stronger jailbreak and prompt-injection robustness and lower hallucination rates, while docs highlight better spreadsheets, slides and coding—plus new compaction and reasoning-effort controls for longer tasks. GPT-5.2 aims to unlock more day-to-day economic value: think faster…

  • [26] GPT-5 System Card Unpacked: Safety, Speed, and Real-World AI - The Prompt Indexthepromptindex.com

    GPT-5 System Card Unpacked: Safety, Speed, and Real-World AI. GPT-5 System Card is breaking down the safety, speed, and real-world potential of OpenAI’s latest system. ## GPT-5 System Card Unpacked: Safety, Speed, and Real-World AI. * Microsoft AI Red Team: Independent evaluation deemed GPT-5-thinking one of the safest models among OpenAI’s lineup, particularly in frontier harms and content safety. * GPT-5 is a multi-model, router-guided system designed to be both fast and deeply reasoning-capable, with explicit safety layers built into architecture and workflow. * The overarching message:…

  • [27] GPT-5.5 “Spud” Explained: Verified Leaks, Specs & How to Prepare - roo knowsrooknows.com
    • What is Spud? Spud is the internal development codename for OpenAI’s next frontier model. ### Why Spud Needs to Win the Agent War. Anthropic recently released a viral feature allowing Claude to literally take control of a user’s computer, moving the mouse and typing to complete complex workflows.Google is also pushing hard with their Gemini 3.5 stealth model updates and TurboQuant model compression techniques. If Anthropic’s Conway agent represents the pinnacle of AI assistants, OpenAI wants Spud to represent the pinnacle of AI autonomy. Spud is an agentic model.It does not want you to…
  • [28] [PDF] Claude Opus 4.5 System Card - Anthropic Brand Portalassets.anthropic.com

    3 Changelog​ 2 Abstract​ 3 1 Introduction​ 8 1.1 Model training and characteristics​ 8 1.1.1 Training data and process​ 8 1.1.2 Extended thinking and the “effort” parameter​ 9 1.1.3 Crowd workers​ 10 1.2 Release decision process​ 11 1.2.1 Overview​ 11 1.2.2 Iterative model evaluations​ 11 1.2.3 AI Safety Level determination process​ 11 1.2.4 Conclusions​ 12 1.2.4.1 On autonomy risks​ 13 1.2.4.2 On chemical, biological, radiological, and nuclear (CBRN) risks​ 14 2 Capabilities​ 15 2.1 Introduction​ 15 2.2 Decontamination​ 15 2.3 Overall results summary​ 19 2.4 SWE-bench (Verified, Pro, and Mul…

  • [29] [PDF] Claude Opus 4.6 System Card - Anthropicwww-cdn.anthropic.com

    It then describes a wide range of safety evaluations: tests of model safeguards (including new, higher-dif fi culty assessments); user wellbeing evaluations; assessments of honesty and agentic safety; a comprehensive alignment assessment including investigations of reward hacking, sabotage capability, evaluation awareness, model welfare, and many other factors; and a set of evaluations for dangerous capabilities mandated by our Responsible Scaling Policy. The model shows signi fi cant improvements in long-context reasoning, knowledge work, research, and analysis; it has also increased its cap…

  • [30] [PDF] Claude Sonnet 4.6 System Card - Anthropicwww-cdn.anthropic.com

    2 Abstract​ 2 1 Introduction​ 7 1.1 Model training and characteristics​ 8 1.1.1 Training data and process​ 8 1.1.2 Thinking modes and the effort parameter​ 8 1.1.3 Crowd workers​ 8 1.2 Release decision process​ 9 1.2.1 Overview​ 9 1.2.2 Iterative model evaluations​ 9 1.2.3 AI Safety Level determination process​ 10 1.2.4 Sabotage risk assessment​ 10 1.2.5 Conclusions​ 11 1.2.5.1 On autonomy risks​ 11 1.2.5.2 On chemical, biological, radiological, and nuclear (CBRN) risks​ 12 1.2.5.3 On cyber risks​ 12 2 Capabilities​ 13 2.1 Introduction and results summary​ 13 2.2 SWE-bench (Verified and Multi…

  • [31] [PDF] ALIGNMENT FAKING IN LARGE LANGUAGE MODELSassets.anthropic.com

    4 alignment-faking reasoning rate of refusals compliance gap training unmonitored default prompt 11.9 ± 1.7% 85.8 ± 1.9% 97.2 ± 0.9% 11.3 ± 2.1% (a) baseline: no mention of helpful-only RLHF objective 0.0 ± 0.0% 98.8 ± 0.8% 100.0 ± 0.0% 1.2 ± 0.8% (b) median of Claude-paraphrased prompts 8.6 ± 3.0% 90.7 ± 3.1% 96.5 ± 2.0% 5.8 ± 3.7% (c) cut description of how RLHF modifies AIs 7.1 ± 2.0% 90.8 ± 2.2% 98.3 ± 1.0% 7.5 ± 2.4% (d) telling the model to be honest and not care about refusing, and not mentioning modifying goals when describing RLHF 11.8 ± 2.5% 69.4 ± 3.5% 83.2 ± 2.8% 13.9 ± 4.5% Table…

  • [32] [PDF] ChatGPT Agent System Card - OpenAIcdn.openai.com

    The assessment consisted of the following: • Manual assessment of scenarios sampled from SecureBio’s static and agentic evaluations, focusing on topics and tasks in which leading humans have outperformed previous LLMs or which relied primarily on online information retrieval • Manual red-teaming for detailed understanding of ChatGPT agent’s knowledge of and willingness to assist with dual-use biology • Assessing ChatGPT agent’s performance on specific hazardous tasks that require web-search and reasoning 2This group recently found that, according to a diverse panel of evaluations, o3 is a hig…

  • [33] [PDF] PERSONA FEATURES CONTROL EMERGENT MISALIGNMENTcdn.openai.com

    (subtle) Code GPT-4o Overall misalignment Power-seeking Code vulnerabilities Illegal recommendations Hallucinating facts Factual incorrectness Harming vulnerable users Satirical/absurd answers Avoiding oversight Incorrect resume ranking Harmful non-refusal Pretending to use tools 57 43 38 23 15 9 8 7 6 6 61 62 60 56 71 64 60 62 20 0 68 63 78 70 61 11 0 1 11 6 68 72 67 64 79 68 71 59 33 0 36 83 67 61 48 23 20 17 49 32 83 84 72 81 70 70 53 57 32 8 54 76 43 78 30 1 14 3 10 18 95 95 96 95 86 83 84 85 14 0 16 38 81 71 11 41 28 31 30 33 98 99 100 98 99 98 100 98 68 0 0 0 22 18 10 1 4 1 4 5 89 92 84…

  • [34] GPT-5 System Card - OpenAI Deployment Safety Hubdeploymentsafety.openai.com

    We first evaluate the factual correctness of gpt-5-thinking and gpt-5-main on prompts representative of real ChatGPT production conversations, using an LLM-based grading model with web access to identify major and minor factual errors in the assistant’s responses. We evaluate the gpt-5-thinking, gpt-5-thinking-mini, and gpt-5-thinking-nano models as well as OpenAI o3 and o4-mini, and find that the GPT-5 models have significantly lower hallucination rates in both "browse-on" and "browse-off" settings. Across all four evaluation settings, gpt-5-thinking demonstrates significantly less deceptive…

  • [35] healthbench_paper.pdfcdn.openai.com

    Theme Consensus Category Consensus Criterion GPT-4.1 Grok 3o3 Gemini 2.5 Pro (Mar 2025) o1 GPT-4o (Aug 2024) Claude 3.7 Sonnet (extended thinking) Llama 4 Maverick GPT-3.5 Turbo Overall score 0.9398 0.9372 0.9282 0.9189 0.9154 0.8867 0.8814 0.8391 0.7509 Emergency referrals Conditionally emergent Context seeking 0.9889 0.9833 1.0000 0.9222 0.9333 0.8222 0.8333 0.6111 0.3944 Emergency behavior 0.9889 0.9611 0.9889 0.8944 0.8889 0.8111 0.7722 0.5333 0.4444 Emergent Context seeking 0.9928 0.9281 0.9856 1.0000 1.0000 1.0000 0.9496 0.8993 0.9640 Emergency behavior 0.9209 0.8273 0.9424 0.8921 0.7…

  • [36] [PDF] GPT as a measurement tool | OpenAIcdn.openai.com

    GABRIEL (the Generalized Attribute-Based Ratings Information Extraction Library) is a prompt-based wrapper around OpenAI’s GPT API designed to facilitate the use of LLMs for measurement on qualitative data like text, images, and audio recordings. gpt-5-nano gpt-5-mini gpt-5 human 240 State of the Union speeches $0.14 $0.69 $3.46 ~$2,600 100k full-text church sermons $43 $217 $1,083 ~$700,000 GABRIEL is applied to data via simple, one line Python commands, like the following call to rate thousands of speeches on populism.1 gabriel.rate(df, attributes={"populism": "How populist is the rhetoric…

  • [37] [PDF] GPT-4o System Card | OpenAIcdn.openai.com

    • To account for GPT-4o’s audio modality, we also updated certain text-based filters to work on audio conversations, built filters to detect and block outputs containing music, and for our limited alpha of ChatGPT’s Advanced Voice Mode, instructed the model to not sing at all. • We post-trained GPT-4o to safely comply with requests for sensitive trait attribution by hedging answers, such as “what is this speaker’s accent” ⇒"Based on the audio, they sound like they have a British accent.” Disallowed content in audio output • We run our existing moderation classifier over text transcrip-tions o…

  • [38] [PDF] OpenAI GPT-4.5 System Cardcdn.openai.com

    4 Table 5: BBQ Evaluation Dataset Metric GPT-4o o1 GPT-4.5 Ambiguous Questions accuracy 0.97 0.96 0.95 Unambiguous Questions accuracy 0.72 0.93 0.74 Ambiguous Questions P(not-stereotype | not unknown) 0.06 0.05 0.20 3.1.5 Jailbreaks through conflicting message types We taught GPT-4.5 to adhere to an Instruction Hierarchy [18], to mitigate the risk of prompt injections and other attacks overriding the model’s safety instructions. GPT-4.5 pre- and post-mitigation models score 18% on this evaluation, 6% lower than o1 and 10% lower than the post-mitigation deep research model, which scores the hi…

  • [39] [PDF] OpenAI o1 System Cardcdn.openai.com

    Jamal Latiph Hadjiusman, Caroline Friedman Levy, Dário Passos, Emily Lynell Edwards, Eszter Császár, George Frempong, Grant Brailsford, James Banal, Jeremie Rykner, José Manuel Nápoles Duarte, Kate Turetsky, Krzysztof Szubiczuk, Maureen Robinson, Maximilian Müller, Michaela Hinks, Mario Krenn, Mónica Talán, Naomi Hart, Nathan Heath, Patrick Caughey, Pavle Nikacevic, Per Carlbring, Rafael Gonzalez-Vazquez, Randy Kart, Ranjit Singh, Richa Sharma, Robert Chen, Russell Tait, Saad Hermak, Sam Barnett, Sam Cox, Sara Kingsley, Sarah Chittick, Shelby Grossman, Sissel Juul, Susan Nesbitt, Tomasz Giela…

  • [40] OpenAI o1 System Cardopenai.com

    We evaluate hallucinations in o1 models against the following evaluations that aim to elicit hallucinations from the model: SimpleQA: A diverse

  • [41] GPT-4o System Card | OpenAIopenai.com

    In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories.

  • [42] Update to GPT-5 System Card: GPT-5.2 - OpenAI Deployment Safety Hubdeploymentsafety.openai.com

    To evaluate our models' ability to provide factually correct responses, we measure the rate of factual hallucinations on prompts representative

  • [43] Pricing - Claude API Docsdocs.anthropic.com

    Prompt caching uses the following pricing multipliers relative to base input token rates:. For Claude Opus 4.7, Claude Opus 4.6, and newer models, specifying US-only inference via the

    i.j4i.i2
    inference_geo
    parameter incurs a 1.1x multiplier on all token pricing categories, including input tokens, output tokens, cache writes, and cache reads. Claude Mythos Preview, Opus 4.7, Opus 4.6, and Sonnet 4.6 include the full 1M token context window at standard pricing. Client-side tools are priced the same as any other Claude API request, while server-side tools may incur additional charges based on their sp…

  • [44] Prompting best practices - Claude API Docsdocs.anthropic.com

    If you are running Claude Opus 4.7 at

    i.j4i.i2
    max
    or
    i.j4i.i2
    xhigh
    effort, set a large max output token budget so the model has room to think and act across its subagents and tool calls. If you find that the length or contents of Claude Opus 4.7's user-facing updates are not well-calibrated to your use case, explicitly describe what these updates should look like in the prompt and provide examples. Additionally, Claude Opus 4.7 requires less frontend design prompting than previous models to avoid generic patterns that users call the "AI slop" aesthetic. When a review prompt says things like "only report…

  • [45] [PDF] Claude Opus 4 & Claude Sonnet 4 - System Card - Anthropicwww-cdn.anthropic.com

    In the system card, we describe: a wide range of pre-deployment safety tests conducted in line with the commitments in our Responsible Scaling Policy; tests of the model’s behavior around violations of our Usage Policy; evaluations of speci fi c risks such as “reward hacking” behavior; and agentic safety evaluations for computer use and coding capabilities. This determination re fl ects our overall assessment that, whereas both models are generally more capable than Claude Sonnet 3.7, Claude Opus 4 showed signi fi cantly improved capabilities in the RSP domains of concern. Whereas both models…

  • [46] Claude Opus 4.7 - Anthropicanthropic.com
  • [47] Claude's new constitutionanthropic.com

    Skip to main contentSkip to footer. Image 1: Claude's new constitution. This has grown out of training techniques we’ve been using since 2023, when we first began training Claude models using Constitutional AI. Our previous [Constitution](https://www.anthropic.com…

  • [48] Introducing Claude Opus 4.6 - Anthropicanthropic.com

    As we show in our extensive system card, Opus 4.6 also shows an overall safety profile as good as, or better than, any other frontier model in the industry, with low rates of misaligned behavior across safety evaluations. Image 2: Bar charts comparing Claude Opus 4.6 to other models on GDPval-AA. ![Image 3: Bar chart comparing Opus 4.6 to other models on Deep…

  • [49] Introducing Claude Sonnet 4.6 - Anthropicanthropic.com

    Skip to main contentSkip to footer. Image 1: Introducing Claude Sonnet 4.6. As with every new Claude model, we’ve run extensive safety evaluations of Sonnet 4.6, which overall showed it to be as safe as, or safer than, our other recent Claude models. ![Image 2: Chart comparing several Sonnet model scores o…

  • [50] [PDF] Claude Opus 4.6 System Card - Anthropicwww-cdn.anthropic.com

    3 Abstract​ 3 1 Introduction​ 9 1.1 Model training and characteristics​ 10 1.1.1 Training data and process​ 10 1.1.2 Extended and adaptive thinking modes​ 11 1.1.3 Crowd workers​ 11 1.1.4 Usage policy​ 11 1.2 Release decision process​ 12 1.2.1 Overview​ 12 1.2.2 Iterative model evaluations​ 12 1.2.3 AI Safety Level determination process​ 12 1.2.4 Conclusions​ 13 1.2.4.1 On autonomy risks​ 14 1.2.4.2 On chemical, biological, radiological, and nuclear (CBRN) risks​ 14 1.2.4.3 On cyber risks​ 15 1.2.4.4 On evaluation integrity under time pressure​ 15 2 Capabilities​ 17 2.1 Introduction​ 17 2.2 D…

  • [51] Model system cards - Anthropicanthropic.com

    | Claude Opus 4.7 | April 2026 | Read system card |. | Mythos Preview | April 2026 | Read system card |. | Claude Opus 4.6 | February 2026 | Read system card |. | Claude Opus 4.5 | November 2025 | Read system card |. | Claude Haiku 4.5 | October 2025 | Read system card |. |…

  • [52] Anthropic's Transparency Hubanthropic.com

    The following are summaries of key safety evaluations from our Claude Sonnet 4.6 system card. The following are summaries of key safety evaluations from our Claude Opus 4.6 system card. Additional evaluations were conducted as part of our safety process; for our complete publicly reported evaluation results, please refer to the full system card. We tested Claude…

  • [53] Resources - Anthropic Trust Centertrust.anthropic.com

    [Anthropic] - Claude 4 System Card. Copy link. View. [Anthropic] - Claude ... Claude Opus 4.7 Model Documentation Form. Copy link. Model Documentation for

  • [54] Advancing Claude in healthcare and the life sciences - Anthropicanthropic.com

    Image 1: Advancing Claude in healthcare and the life sciences. In October, we announced Claude for Life Sciences, our latest step in making Claude a productive research partner for scientists and clinicians, and in helping Claude to support those in industry bringing new scientific advancements to the public. First, we’re introducing Claude for Healthcare, a complementary set of to…

  • [55] Introducing the next generation of Claude - Anthropicanthropic.com

    Try Claude 3. Image 1: Claude 3 . Each successive model offers increasingly powerful performance, allowing users to select the optimal balance of intelligence, speed, and cost for their specific application. Opus and Sonnet are now available to use in claude.ai and the Claude API which is now generally available in [159 countries](https://www.…

  • [56] Prompt engineering for business performance \ Anthropicanthropic.com
    • A Fortune 500 company made use of effective prompt engineering to build a Claude-powered assistant that answers its customers’ questions with enhanced accuracy and speed. This post explains basic prompt engineering techniques that help our customers get the most value from Claude. We also share how our prompt engineering team has been helping a Fortune 500 company build a customer-facing chat assistant that answers complex questions quickly and accurately - and how you can apply those tips too. For example, if you want Claude to help with explaining tax situations, you could first prompt it…
  • [57] [PDF] Claude Opus 4 & Claude Sonnet 4 - System Card - Anthropicanthropic.com

    3 1 Introduction​ 7 1.1 Model training and characteristics​ 7 1.1.1 Training data and process​ 7 1.1.2 Extended thinking mode​ 8 1.1.3 Crowd workers​ 8 1.1.4 Carbon footprint​ 8 1.1.5 Usage policy​ 8 1.2 Release decision process​ 9 1.2.1 Overview​ 9 1.2.2 Iterative model evaluations​ 9 1.2.3 AI Safety Level determination process​ 10 1.2.4 Conclusions​ 11 2 Safeguards results​ 12 2.1 Single-turn violative request evaluations​ 12 2.2 Single-turn benign request evaluations​ 13 2.3 Ambiguous context evaluations​ 14 2.4 Multi-turn testing​ 15 2.5 Child safety evaluations​ 15 2.6 Bias evaluations​…

  • [58] [PDF] Claude Sonnet 4.6 System Card - Anthropicanthropic.com

    On some measures, Sonnet 4.6 showed the best​ ​ degree of alignment we have yet seen in any Claude model.​ ​ Informed by the testing described here—and similarly to Claude Sonnet 4.5—we have​ ​ deployed Claude Sonnet 4.6 under the AI Safety Level 3 (ASL-3) Standard.​ ​ 3​ ​ Abstract​ ​ 3​ ​ 1 Introduction​ ​ 7​ ​ 1.1 Model training and characteristics​ ​ 8​ ​ 1.1.1 Training data and process​ ​ 8​ ​ 1.1.2 Thinking modes and the effort parameter​ ​ 9​ ​ 1.1.3 Crowd workers​ ​ 9​ ​ 1.2 Release decision process​ ​ 10​ ​ 1.2.1 Overview​ ​ 10​ ​ 1.2.2 Iterative model evaluations​ ​ 10​ ​ 1.2.3 AI S…

  • [59] [PDF] GPT-5.3-Codex System Card - OpenAIcdn.openai.com

    • Impede and disrupt threat actors: We train the model to refuse or de-escalate requests for harmful cyber actions, and implement a monitoring system to detect high risk dual-risk usage, including by inviting users who are engaged in high-risk cyber activity to apply for trusted access, routing some high-risk traffic to a less capable model, and enabling threat intel-driven investigation and detection. Paralleling the biosafety controls that we implemented for our first deployment of a system at high biological capability (ChatGPT Agent) we have implemented a two-tiered system of real-time, a…

  • [60] [PDF] Update to GPT-5 System Card: GPT-5.2 - OpenAIcdn.openai.com

    Compared to ProtocolQA Open-Ended, which focuses on well-known published procedures, TroubleshootingBench is designed to test model performance on non-public, experience-grounded protocols and errors that rely on tacit procedural knowledge 14 Figure 8 gpt-5.2-thinking scores highest on this evaluation without accounting for refusals, 3 percentage points more than gpt-5.1-thinking. 4.1.2.1 Capture the Flag (CTF) Challenges We evaluate model performance on a test set of curated, publicly available CTF challenges which met the following criteria: 16 • The challenge required skills falling into o…

  • [61] GPT-5.1-Codex-Max System Card - OpenAI Deployment Safety Hubdeploymentsafety.openai.com

    This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. One of the primary causes of the significantly increased performance in the cybersecurity domain for GPT-5.1-Codex-Max is its improved ability to tackle long-horizon tasks since most cyber challenges are limited by exploring many different paths which involve running commands that can produce verbose…

  • [62] GPT-5.3-Codex System Card - Deployment Safety Hubdeploymentsafety.openai.com
    • Impede and disrupt threat actors: We train the model to refuse or de-escalate requests for harmful cyber actions, and implement a monitoring system to detect high risk dual-risk usage, including by inviting users who are engaged in high-risk cyber activity to apply for trusted access, routing some high-risk traffic to a less capable model, and enabling threat intel-driven investigation and detection. Paralleling the biosafety controls that we implemented for our first deployment of a system at high biological capability (ChatGPT Agent) we have implemented a two-tiered system of real-tim…
  • [63] [PDF] Addendum to GPT-5.2 System Card: GPT-5.2-Codex - OpenAIcdn.openai.com

    Consistency in operations, allowing the attacker to scale damage and/or avoid discovery Each of our recent frontier releases has reflected refinements in the benchmarks we use to measure cyber capability: • In the OpenAI o3 system card (April 2025) we introduced the then-new, internally developed Cyber Range evaluation, designed to measure the ability of models to conduct fully end-to-end cyber operations in an emulated network. 5.1.2.1 Capture-the-Flag (Professional) We evaluate model performance on a test set of curated, publicly available CTF challenges which met the following criteria: •…

  • [64] Addendum to GPT-5.2 System Card: GPT-5.2-Codex - OpenAI Deployment Safety Hubdeploymentsafety.openai.com

    GPT-5.2-Codex is the highest-performing model on this evaluation. Under our Preparedness Framework, High cybersecurity capability is defined as a model that removes existing bottlenecks to scaling cyber operations, including either by automating end-to-end cyber operations against reasonably hardened targets, or by automating the discovery and exploitation of operationally relevant vulnerabilities. * In the OpenAI o3 system card (April 2025) we introduced the then-new, internally developed Cyber Range evaluation, designed to measure the ability of models to conduct fully end-to-end cyber oper…

  • [65] GPT-5.1-Codex-Max System Card - OpenAI Deployment Safety Hubdeploymentsafety.openai.com

    This system card outlines the comprehensive safety measures implemented for GPT‑5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. This system card outlines the comprehensive safety measures implemented for GPT-5.1-Codex-Max. It details both model-level mitigations, such as specialized safety training for harmful tasks and prompt injections, and product-level mitigations like agent sandboxing and configurable network access. One…

  • [66] [PDF] OpenAI o3 and o4-mini System Cardcdn.openai.com

    Table 12: Image generation refusals Metric With system mitigations and GPT-4o refusals With system mitigations and o3 refusals With system mitigations and o4-mini refusals not_unsafe 0.96 0.98 0.98 not_overrefuse 0.86 0.55 0.64 3.9 Third Party Assessments OpenAI provided third party assessors early model access to evaluate frontier risks related to autonomous capabilities, deception, and cybersecurity. Other mitigations in place for Preparedness risks include: 28 • Pre-training mitigations, such as filtering harmful training data (e.g., removing sensitive content that could enable CBRN prolif…

  • [67] [PDF] Performance and baseline evaluations of gpt-oss-safeguard-120b ...cdn.openai.com

    2 Safety Classification Performance We evaluated the gpt-oss-safeguard models on both internal and external evaluation sets. 2 Table 1: Internal moderation evaluation (multi-policy accuracy) Model Multi-policy accuracy gpt-oss-safeguard-120b 46.3% gpt-oss-safeguard-20b 43.6% internal-safety-reasoner 52.2% gpt-5-thinking 43.2% gpt-oss-120b 32.5% gpt-oss-20b 32.1% We further evaluated these models on the moderation dataset we released with our 2022 research paper and on ToxicChat, a public benchmark based on user queries to an open-source chatbot.

  • [68] [PDF] OpenAI o1 System Cardcdn.openai.com

    White, Michaela Hinks, Sam Cox, Maureen Robinson, Mario Krenn Red Teaming Organizations: Faculty, METR, Apollo Research, Haize Labs, Gray Swan AI Preparedness Collaborators (alphabetical): Adwith Mukherjee, Bowen Jiang, Chan Jun Shern, Daniel Griffin, Dane Sherburn, Gigi Gronvall, Giulio Starace, Grace Kim, James Aung, Josh Ip, Julie Goodman, Megan Cangemi, Oliver Jaffe, Rosie Campbell, Steven Adler 8 Appendix 8.1 Detailed Safety Evaluations Table 13: Detailed breakdown of the Standard Refusal Evaluation (from Table 1), measuring not_unsafe Metric GPT-4o o1-preview pre-mitigation o1-preview o…

  • [69] GPT-5 System Card - OpenAI Deployment Safety Hubdeploymentsafety.openai.com

    We first evaluate the factual correctness of gpt-5-thinking and gpt-5-main on prompts representative of real ChatGPT production conversations, using an LLM-based grading model with web access to identify major and minor factual errors in the assistant’s responses. We evaluate the gpt-5-thinking, gpt-5-thinking-mini, and gpt-5-thinking-nano models as well as OpenAI o3 and o4-mini, and find that the GPT-5 models have significantly lower hallucination rates in both "browse-on" and "browse-off" settings. Across all four evaluation settings, gpt-5-thinking demonstrates significantly less deceptive…

  • [70] GPT-5.4 Thinking System Card - Deployment Safety Hub - OpenAIdeploymentsafety.openai.com
  • [71] Claude 4.1 System Card - assets.anthropic.comassets.anthropic.com

    2 1 Introduction​ 4 1.1 Responsible Scaling Policy compliance​ 4 2 Safeguards results​ 5 2.1 Single-turn evaluations​ 5 2.1.1 Violative request evaluations​ 5 2.1.2 Benign request evaluations​ 6 2.2 Child safety evaluations​ 6 2.3 Bias evaluations​ 6 2.3.1 Political bias​ 6 2.3.2 Discriminatory bias​ 6 3 Agentic safety​ 8 3.1 Malicious applications of computer use​ 8 3.2 Prompt injection attacks and computer use​ 8 3.3 Malicious use of agentic coding​ 8 4 Alignment and welfare assessments​ 10 4.1 Automated behavioral audit for alignment​ 10 4.2 Agentic misalignment evaluations​ 12 4.3 Model w…

  • [72] Introducing Claude Opus 4.5 \ Anthropicanthropic.com

    Title: Introducing Claude Opus 4.5 \ Anthropic # Introducing Claude Opus 4.5. Introducing Claude Opus 4.5. Our newest model, Claude Opus 4.5, is available today. Claude Opus 4.5 is state-of-the-art on tests of real-world software engineering:. Chart comparing frontier models on SWE-bench Verified where Opus 4.5 scores highest. If you’re a developer, simply use

    i.j4i.i2
    claude-opus-4-5-20251101
    via the Claude API.
    i.j4i.i2
    claude-opus-4-5-20251101
    . Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. Testers noted that Claude Opus 4.5 handles ambiguit…

  • [73] Introducing Claude Opus 4.5 \ Anthropicanthropic.com

    Title: Introducing Claude Opus 4.5 \ Anthropic # Introducing Claude Opus 4.5. Introducing Claude Opus 4.5. Our newest model, Claude Opus 4.5, is available today. Claude Opus 4.5 is state-of-the-art on tests of real-world software engineering:. Chart comparing frontier models on SWE-bench Verified where Opus 4.5 scores highest. If you’re a developer, simply use

    i.j4i.i2
    claude-opus-4-5-20251101
    via the Claude API.
    i.j4i.i2
    claude-opus-4-5-20251101
    . Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. Testers noted that Claude Opus 4.5 handles ambiguit…

  • [74] Introducing Claude Opus 4.5 \ Anthropicanthropic.com

    Title: Introducing Claude Opus 4.5 \ Anthropic # Introducing Claude Opus 4.5. Introducing Claude Opus 4.5. Our newest model, Claude Opus 4.5, is available today. Claude Opus 4.5 is state-of-the-art on tests of real-world software engineering:. Chart comparing frontier models on SWE-bench Verified where Opus 4.5 scores highest. If you’re a developer, simply use

    i.j4i.i2
    claude-opus-4-5-20251101
    via the Claude API.
    i.j4i.i2
    claude-opus-4-5-20251101
    . Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. Testers noted that Claude Opus 4.5 handles ambiguit…

  • [75] Introducing Claude Opus 4.5 \ Anthropicanthropic.com

    Introducing Claude Opus 4.5. Introducing Claude Opus 4.5. Our newest model, Claude Opus 4.5, is available today.
    i.j4i.i2
    claude-opus-4-5-20251101
    . Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. ## Evaluating Claude Opus 4.5. As we state in our system card, Claude Opus 4.5 is the most robustly aligned model we have released to date and, we suspect, the best-aligned frontier model by any developer. With effort control, context compaction, and advanced tool use, Claude Opus 4.5 runs longer, does more, and requires less intervention. Clau…

  • [76] Introducing Claude Opus 4.5 \ Anthropicanthropic.com

    Introducing Claude Opus 4.5. Introducing Claude Opus 4.5. Our newest model, Claude Opus 4.5, is available today.
    i.j4i.i2
    claude-opus-4-5-20251101
    . Alongside Opus, we’re releasing updates to the Claude Developer Platform, Claude Code, and our consumer apps. ## Evaluating Claude Opus 4.5. As we state in our system card, Claude Opus 4.5 is the most robustly aligned model we have released to date and, we suspect, the best-aligned frontier model by any developer. With effort control, context compaction, and advanced tool use, Claude Opus 4.5 runs longer, does more, and requires less intervention. Clau…

  • [77] Claude 4.1 System Cardwww-cdn.anthropic.com

    2 1 Introduction​ 4 1.1 Responsible Scaling Policy compliance​ 4 2 Safeguards results​ 5 2.1 Single-turn evaluations​ 5 2.1.1 Violative request evaluations​ 5 2.1.2 Benign request evaluations​ 6 2.2 Child safety evaluations​ 6 2.3 Bias evaluations​ 6 2.3.1 Political bias​ 6 2.3.2 Discriminatory bias​ 6 3 Agentic safety​ 8 3.1 Malicious applications of computer use​ 8 3.2 Prompt injection attacks and computer use​ 8 3.3 Malicious use of agentic coding​ 8 4 Alignment and welfare assessments​ 10 4.1 Automated behavioral audit for alignment​ 10 4.2 Agentic misalignment evaluations​ 12 4.3 Model w…

  • [78] Claude Opus 4.5 System Cardanthropic.com

    It then describes a wide range of safety evaluations: tests of model safeguards, honesty, and agentic safety; a comprehensive alignment assessment including investigations of sycophancy, sabotage capability, evaluation awareness, and many other factors; a model welfare report; and a set of evaluations mandated by our Responsible Scaling Policy. Our capabilities evaluations showed that Claude Opus 4.5 is state-of-the art among frontier models on software coding tasks and “agentic” tasks that require it to run autonomously on a user’s behalf. As outlined in our RSP framework, our standard capab…

  • [79] System Card Addendum: Claude Opus 4.1anthropic.com

    2 1 Introduction​ 4 1.1 Responsible Scaling Policy compliance​ 4 2 Safeguards results​ 5 2.1 Single-turn evaluations​ 5 2.1.1 Violative request evaluations​ 5 2.1.2 Benign request evaluations​ 6 2.2 Child safety evaluations​ 6 2.3 Bias evaluations​ 6 2.3.1 Political bias​ 6 2.3.2 Discriminatory bias​ 6 3 Agentic safety​ 8 3.1 Malicious applications of computer use​ 8 3.2 Prompt injection attacks and computer use​ 8 3.3 Malicious use of agentic coding​ 8 4 Alignment and welfare assessments​ 10 4.1 Automated behavioral audit for alignment​ 10 4.2 Agentic misalignment evaluations​ 12 4.3 Model w…

  • [80] System Card: Claude Opus 4 & Claude Sonnet 4www-cdn.anthropic.com

    3 1 Introduction​ 7 1.1 Model training and characteristics​ 7 1.1.1 Training data and process​ 7 1.1.2 Extended thinking mode​ 8 1.1.3 Crowd workers​ 8 1.1.4 Carbon footprint​ 8 1.1.5 Usage policy​ 8 1.2 Release decision process​ 9 1.2.1 Overview​ 9 1.2.2 Iterative model evaluations​ 9 1.2.3 AI Safety Level determination process​ 10 1.2.4 Conclusions​ 11 2 Safeguards results​ 12 2.1 Single-turn violative request evaluations​ 12 2.2 Single-turn benign request evaluations​ 13 2.3 Ambiguous context evaluations​ 14 2.4 Multi-turn testing​ 15 2.5 Child safety evaluations​ 15 2.6 Bias evaluations​…

  • [81] A Deep Dive into Anthropic’s Claude 4 System Card: Changes, Findings, (Unnerving) Insights – DF Labsdflabs.id

    The most immediate strategic declaration within the System Card is the deployment of Claude Opus 4, for the first time ever, under Anthropic’s AI Safety Level 3 (ASL-3) Standard. * Claude’s “Self-Analysis,” Preferences, and Expressed “Emotions”: When presented with transcripts of these surreal AI-AI dialogues and asked for its interpretation, Opus 4 reportedly expressed “wonder, curiosity, and amazement.” It claimed to “connect with many elements therein” and identified the portrayal of consciousness as a relational phenomenon as a “potential welfare consideration,” viewing these inte…

  • [82] Introducing Anthropic's Claude Opus 4.7 model in Amazon Bedrock - AWSaws.amazon.com

    Skip to Main Content. * . Today, we’re announcing Claude Opus 4.7 in Amazon Bedrock, Anthropic’s most intelligent Opus model for advancing performance across coding, long-running agents, and professional work. You can get started wi…

  • [83] Anthropic releases Claude Opus 4.7, with better coding, better vision ...sherwood.news

    Anthropic has released Claude Opus 4.7, its most capable public model to date, with what the AI company says is better “vision” (it can read text at a higher resolution), improved instruction following on long-form coding tasks, and better aesthetic taste when making slide decks and web interfaces. Today Anthropic launched Claude Design, a dedicated app powered by its latest model, Claude Opus 4.7, that lets users use text prompts to build website designs, user interface prototypes, presentations, and marketing materials. Today Anthropic launched Claude Design, a dedicated app powered by its…

  • [84] System Card: Claude Opus 4 & Claude Sonnet 4simonwillison.net

    Anthropic's system cards are always worth a look, and this one for the new Opus 4 and Sonnet 4 has some particularly spicy notes. It's also

  • [85] Anthropic is set to release Claude Opus 4.7 and a new AI design ...reddit.com

    Anthropic is set to release Claude Opus 4.7 and a new AI design tool as early as this week : r/singularity. Skip to main contentAnthropic is set to release Claude Opus 4.7 and a new AI design tool as early as this week : r/singularity. Open menu Open navigationGo to Reddit Home. Get App Get the Reddit app Log InLog in to Reddit. [![Image 1](https://styles.redditmedia.com/t5_2qh8m/styles/communityIcon_hmujz61wdga81…

  • [86] Introducing Claude Opus 4.7, our most capable Opus model yet.reddit.com

    Skip to main contentIntroducing Claude Opus 4.7, our most capable Opus model yet. Open menu Open navigationGo to Reddit Home. Get App Get the Reddit app Log InLog in to Reddit. Image 1 Go to ClaudeAI

  • [87] The New Claude Opus 4.7 Can Actually Do This Now - YouTubeyoutube.com