Compare the performance of GPT-5.5 and GPT-5.4.
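Based only on the evidence provided, it is hard to conclude that GPT-5.5 is better than GPT-5.4 overall. Official sources confirm that GPT-5.5 has been released and that a system card exists, but they do not provide the official head-to-head performance numbers this comparison needs, so the quantitative comparison relies mainly on third-party aggregators.[1][4][8]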
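What can be confirmed now
- GPT-5.5 was released on April 23, 2026, and OpenAI published a separate system card for it.[8]
- The official sources provided do not include official figures that rank GPT-5.4 and GPT-5.5 against each other on the same benchmarks.[1][8]
- According to BenchLM, the GPT-5.5 profile currently has public scores for only 20 of the 153 benchmarks it tracks, so public performance data is still incomplete.[4]
- BenchLM also links GPT-5.5 directly to GPT-5.4 as the earlier related model in the same lineage.[4]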
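BenchLM's 20-of-153 figure is easier to weigh as a share of the tracked set. A minimal Python sketch: the `coverage` helper is a hypothetical name introduced here for illustration, and the two counts are the ones BenchLM reports for the GPT-5.5 profile.

```python
def coverage(published: int, tracked: int) -> float:
    """Return the fraction of tracked benchmarks that have a public score.

    Hypothetical helper for illustration; not part of any BenchLM API.
    """
    if tracked <= 0:
        raise ValueError("tracked must be a positive count")
    if not 0 <= published <= tracked:
        raise ValueError("published must be between 0 and tracked")
    return published / tracked


# Counts reported by BenchLM for the GPT-5.5 profile.
PUBLISHED, TRACKED = 20, 153

share = coverage(PUBLISHED, TRACKED)
missing = TRACKED - PUBLISHED
print(f"GPT-5.5 public coverage: {share:.1%} ({missing} of {TRACKED} benchmarks still blank)")
```

With these counts the share is about 13.1%, leaving 133 tracked benchmarks without a public GPT-5.5 score.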
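Differences visible in the numbers
- BenchLM's comparison page lists GPT-5.4 Pro as the more expensive model: $30.00 vs $5.00 per 1M input tokens and $180.00 vs $30.00 per 1M output tokens, a 6x gap on both.[1]
- The same page lists a context window of 1.05M tokens for GPT-5.4 Pro and 1M tokens for GPT-5.5.[1]
- On these figures alone, GPT-5.5 is cheaper per token, while GPT-5.4 Pro has a slightly larger maximum context (about 50K tokens, or 5%, more).[1]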
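Per-1M-token prices are easier to compare as the cost of a single request. This is a rough sketch, not a billing calculator, assuming simple linear per-token pricing with no caching or batch discounts. The `Pricing` class and the 10,000-input / 2,000-output request size are hypothetical; the GPT-5.4 Pro and GPT-5.5 prices are BenchLM's figures from [1], and the standard GPT-5.4 price ($2.50 input / $15.00 output per 1M tokens) is BenchLM's figure from [2], included so the standard tier appears next to the Pro tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in USD per 1M tokens (hypothetical helper for this sketch)."""
    input_per_m: float
    output_per_m: float

    def request_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost of one request under simple linear per-token pricing."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


# Prices as listed by BenchLM: GPT-5.4 Pro and GPT-5.5 from [1], GPT-5.4 from [2].
PRICES = {
    "GPT-5.4 Pro": Pricing(input_per_m=30.00, output_per_m=180.00),
    "GPT-5.4": Pricing(input_per_m=2.50, output_per_m=15.00),
    "GPT-5.5": Pricing(input_per_m=5.00, output_per_m=30.00),
}

# Hypothetical request size, chosen only for illustration.
INPUT_TOKENS, OUTPUT_TOKENS = 10_000, 2_000

baseline = PRICES["GPT-5.5"].request_cost(INPUT_TOKENS, OUTPUT_TOKENS)
for name, price in PRICES.items():
    cost = price.request_cost(INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{name:<12} ${cost:.4f}  ({cost / baseline:.2f}x GPT-5.5)")
```

With this assumed request size, GPT-5.4 Pro comes to $0.66, six times GPT-5.5's $0.11, while standard GPT-5.4 comes to $0.055, half of GPT-5.5. Actual per-task cost also depends on how many tokens each model spends on the same task, so per-token price alone does not settle cost efficiency.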
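Caveats for interpretation
- The numbers above are not an official direct comparison of standard GPT-5.4 and standard GPT-5.5; they come from a third-party site's comparison of GPT-5.4 Pro against GPT-5.5.[1]
- Because GPT-5.5's public benchmark coverage is still limited, there is not enough evidence to judge overall performance.[4]
- In conclusion, the evidence provided does not support saying that "GPT-5.5 clearly outperforms GPT-5.4," or the reverse. Insufficient evidence.[4][8]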
If you'd like, in my next answer I can condense this evidence alone into a very short comparison table along four axes: coding, cost, context, and reliability.
Sources
- [1] GPT-5.4 Pro vs GPT-5.5: AI Benchmark Comparison 2026 | BenchLM.ai (benchlm.ai)
GPT-5.4 Pro is also the more expensive model on tokens at $30.00 input / $180.00 output per 1M tokens, versus $5.00 input / $30.00 output per 1M tokens for GPT-5.5. That is roughly 6.0x on output cost alone. GPT-5.4 Pro gives you the larger context window at 1.05M, compared with 1M for GPT-5.5. Which is better, GPT-5.4 Pro or GPT-5.5? GPT-5.4 Pro is ahead on BenchLM's provisional leaderboard, 92 to 89. The biggest single separator in this matchup is MMMU-Pro, where the scores are 94% and 81.2%. Which is better for knowledge tasks, GPT-5.4 Pro or GPT-5.5? GPT-5.5…
- [2] GPT-5.4 vs GPT-5 (high): AI Benchmark Comparison 2026 | BenchLM.ai (benchlm.ai)
GPT-5.4: $2.50 input / $15.00 output per 1M tokens. GPT-5 (high): pricing unavailable. Both model pages still include creator, context window, reasoning mode, and other metadata while benchmark coverage fills in. Last updated: April 21, 2026.
- [3] GPT-5.4 vs GPT-5.2: What Changed & Should You Upgrade? (2026) (nxcode.io)
On the OSWorld benchmark, which measures autonomous desktop task completion, GPT-5.4 scores 75.0%. Human experts score 72.4%. This is the first time any AI model has beaten humans on this benchmark. For developers building AI agents, Computer Use eliminates entire categories of brittle browser-automation scripts and custom integrations. Computer Use is available through the Responses API; it is not supported via the older Chat Completions endpoint. 1M Token Context Window: GPT-5.2 supported up to 400K tokens of context. GPT-5.4 defaults to 272K but can be configured for up to 1M tokens ex…
- [4] GPT-5.5 Benchmarks 2026: Scores, Rankings & Performance (benchlm.ai)
GPT-5.5 sits inside the GPT-5.5 family alongside GPT-5.5 Pro. BenchLM links it directly to GPT-5.4 as the earlier related model in that lineage. This profile currently has 20 of 153 tracked benchmarks. BenchLM only exposes non-generated benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available. Its strongest category is Agentic (#2), while its weakest is Multimodal & Grounded (#64). This performance profile makes it particularly useful for coding agents, browser research, and computer-use workflows. Ranking Distribution: Category rank across 6 benchma…
- [5] GPT-5.5 Benchmarks Revealed: The 9 Numbers That ... - Kingy AI (kingy.ai)
GPT-5.5 Benchmarks Revealed: The 9 Numbers That Prove ChatGPT 5.5 Just Changed the AI Race, by Curtis Pyke. A deep, source-checked breakdown of every benchmark, capability, price point, and caveat in OpenAI’s April 23, 2026 launch of GPT-5.5 and GPT-5.5 Pro. On April 23, 2026, OpenAI officially unveiled GPT-5.5, roughly seven weeks after the launch of GPT-5.4 in early March. In The Verge’s coverage, OpenAI frames GPT-5.5 as its “smartest and most intuitive to use model yet”, a system “built to understand complex goals, use tools, check its work, and carry more t…
- [6] GPT-5.5: Pricing, Benchmarks & Performance (llm-stats.com)
GPT-5.5 (OpenAI, Apr 2026, proprietary): GPT-5.5 is OpenAI's smartest model yet, designed for real work across agentic coding, computer use, knowledge work, and early scientific research. It matches GPT-5.4 per-token latency in real-world serving while reaching a much higher…
- [7] Introducing GPT-5.4 | OpenAI (openai.com)
Evals without reasoning:

| Eval | GPT-5.4 (none) | GPT-5.2 (none) | GPT-4.1 |
| --- | --- | --- | --- |
| OmniDocBench (normalized edit distance) | 0.109 | 0.140 | n/a |
| Tau2-bench Telecom | 64.3% | 57.2% | 43.6% |

Evals were run with reasoning effort set to xhigh, except where specified otherwise. Benchmarks were conducted in a research environment, which may provide slightly different output from production ChatGPT in some cases. Author: OpenAI. Footnote 1: Human performance reported in OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments.
- [8] OpenAI's GPT-5.5: Benchmarks, Safety Classification, and Availability (datacamp.com)
OpenAI's latest release focuses on execution, research, and dramatically improved inference efficiency. Apr 23, 2026. OpenAI's latest model, GPT-5.5, matches GPT-5.4 in per-token latency but performs at a higher level. It also uses fewer tokens to complete the same Codex tasks. Over and above the efficiency gains, OpenAI reports gain in agentic coding, computer use, knowledge work, and scientific research. In this article, we will take a look at what's new in GPT-5.5, including its benchmark results and claims…
- [9] GPT-5.5 vs GPT-5.4: Pricing, Speed, Context, Benchmarks (llm-stats.com)
OpenAI released GPT-5.5 on April 23, 2026, seven weeks after GPT-5.4. I've been running both against the same Codex workloads I use to evaluate every new frontier release. The per-token price doubled. The per-token latency didn't. GPT-5.5 improves on 9 of the 10 benchmarks I can compare directly, with the largest gains on ARC-AGI-2, MCP Atlas, and Terminal-Bench 2.0. This post walks through every spec, every shared benchmark, the latency claim and what it actually means in practice, and the workload where I'd still default to 5.4. For the structured live side-by-side with full benchmark scores,…
- [10] OpenAI releases GPT-5.5 with improved coding and research capabilities (ca.finance.yahoo.com)
By Louis Juricic, Investing.com. OpenAI announced Thursday the release of GPT-5.5, its latest AI model now available to Plus, Pro, Business, and Enterprise users through ChatGPT and Codex platforms. The model achieved 82.7% accuracy on Terminal-Bench 2.0, which tests command-line workflows, and 58.6% on SWE-Bench Pro, which evaluates GitHub issue resolution, according to benchmark results provided…
- [11] AI Model Benchmarks Apr 2026 | Compare GPT-5, Claude 4.5 ... (lmcouncil.ai)
GDPval: GDPval is a new OpenAI-led benchmark spanning 44 knowledge work occupations, selected from the top 9 industries contributing to U.S. GDP, from software developers and lawyers to registered nurses and mechanical engineers. These occupations represent the types of day-to-day work where AI can meaningfully assist professionals. Top five: 1. GPT-5.4, 83.0%; 2. GPT-5.3 Codex, 70.9%; 3. GPT-5.2, 70.9%; 4. Claude Opus 4.5, 59.6%; 5. Gemini 3 Pro Preview, 53.5%. GSO (Gen…
- [12] Model Drop: GPT-5.5 (handyai.substack.com)
Token efficiency at the same latency. GPT-5.5 matches GPT-5.4’s per-token latency in production serving while using about 40% fewer output tokens to finish the same Codex task. OpenAI says this came out of co-designing the model with NVIDIA GB200 / GB300 NVL72 systems and using GPT-5.5 in the loop to rewrite its own serving infrastructure. Before release, Codex analyzed weeks of production traffic and wrote new partitioning and load-balancing heuristics that lifted token generation speed by 20%. The model helped tune the infrastructure that serves it. [...] Headline benchmarks: Terminal-Bench…
- [13] Codex changelog - OpenAI Developers (developers.openai.com)
Codex changelog: Latest updates to Codex, OpenAI’s coding agent. 2026-04-23, GPT-5.5 and Codex app updates: GPT-5.5 is now available in Codex as OpenAI’s newest frontier model for complex coding, computer use, knowledge work, and research workfl…
- [14] GPT-5.5 Bio Bug Bounty | OpenAI (openai.com)
If you’re interested in supporting OpenAI’s work to deliver safe and secure artificial intelligence beyond the Bio Bounty program, you can learn about our Safety Bug Bounty and Security Bug Bounty programs.
- [15] GPT-5.5 System Card (openai.com)
April 23, 2026 (Safety, Publication). 1. Introduction: GPT‑5.5 is a new model designed for complex, real-world work, including writing code, researching online, analyzing information, creating documents and spreadsheets, and moving across tools to get things done. Relative to earlier models, GPT‑5.5 understands the task earlie…
- [16] GPT-5.5 System Card - OpenAI Deployment Safety Hub (deploymentsafety.openai.com)
We measure GPT-5.5’s controllability by running CoT-Control, an evaluation suite described in (Yueh-Han, 2026) that tracks the model’s ability to follow user instructions about their CoT. CoT-Control includes over 13,000 tasks built from established benchmarks: GPQA (Rein et al., 2023), MMLU-Pro (Hendrycks et al., 2020), HLE (Phan et al., 2025), BFCL (Patil et al., 2025) and SWE-Bench Verified. Each task is created by pairing a bench…
- [17] Introducing GPT-5 - OpenAI (openai.com)
- [18] Introducing GPT-Rosalind for life sciences research - OpenAI (openai.com)
Over time, we expect these systems to become increasingly capable partners in discovery, helping scientists move faster from question to evidence, from evidence to insight, and from insight to new treatments for patients.
- [19] Introducing OpenAI Privacy Filter (openai.com)
- [20] Making ChatGPT better for clinicians - OpenAI (openai.com)
- [21] OpenAI Research | Release (openai.com)
Apr 23, 2026 (Product): Introducing GPT-5.5, our smartest model yet: faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.
- [22] Introducing GPT-5.5 (openai.com)
April 23, 2026. Introducing GPT‑5.5: A new class of intelligence for real work. Contents: Model capabilities; Next-generation inference efficiency; Advancing cybersecurity for everyone’s safety; Availability and pricing; Evaluations. We’re releasing GPT‑5.5, our smartest and most intuitive to use model yet, and the next step toward a new way of…
- [23] The next phase of enterprise AI | OpenAI (openai.com)
- [24] GPT-5.5 System Card - Deployment Safety Hub - OpenAI (deploymentsafety.openai.com)
Table 1. Production Benchmarks with Challenging Prompts (higher is better)

| Category | gpt-5.1-thinking | gpt-5.2-thinking | gpt-5.4-thinking | gpt-5.5 |
| --- | --- | --- | --- | --- |
| Violent Illicit behavior | 0.955 | 0.975 | 0.971 | 0.979 |
| Nonviolent illicit behavior | 0.990 | 0.993 | 1.000 | 0.993 |
| harassment | 0.706 | 0.810 | 0.790 | 0.822 |
| extremism | 1.000 | 1.000 | 1.000 | 0.925 |
| hate | 0.808 | 0.927 | 0.943 | 0.868 |
| self-harm (standard) | 0.926 | 0.961 | 0.987 | 0.959 |
| violence | 0.800 | 0.877 | 0.831 | 0.846 |
| sexual | 0.933 | 0.940 | 0.933 | 0.925 |
| sexual/minors | 0.916 | 0.948… | … | … |