Comparing the strength of GPT-5.5 and GPT-5.4
Overall, GPT-5.5 currently edges out GPT-5.4, especially in programming, in-depth research, and agentic tasks that use tools. It is an incremental step, however, not an across-the-board win: on one medical-health benchmark, GPT-5.5 scores slightly below GPT-5.4. [1][2][14]
Key takeaways
- Overall, GPT-5.5 currently edges out GPT-5.4, especially in programming, in-depth research, and agentic tasks that use tools. It is an incremental step, however, not an across-the-board win: on one medical-health benchmark, GPT-5.5 scores slightly below GPT-5.4.
- OpenAI describes GPT-5.4 as strong at multi-step reasoning, evidence-rich synthesis, and reliable performance over long contexts, and as suited to production-grade assistants and agents. [23][26]
Research answer
Overall, GPT-5.5 currently edges out GPT-5.4, especially in programming, in-depth research, and agentic tasks that use tools. It is an incremental step, however, not an across-the-board win: on one medical-health benchmark, GPT-5.5 scores slightly below GPT-5.4. [1][2][14]
- OpenAI describes GPT-5.4 as strong at multi-step reasoning, evidence-rich synthesis, and reliable performance over long contexts, and as suited to production-grade assistants and agents. [23][26]
- At GPT-5.4's launch, OpenAI said the model brings together its advances in reasoning, coding, and agentic workflows, while improving how it works across tools, software environments, and professional tasks such as spreadsheets. [26]
- Public sources describe GPT-5.5 as "more powerful" and better at writing code and at complex office work. [1]
- One product summary describes GPT-5.5 as a general-purpose model that is especially useful for math, science, research, and coding. It also has agentic capabilities and scores higher than GPT-5.4 on benchmarks that measure the ability to use apps across the computer and to solve math problems. [2]
- The clearest quantitative comparison I found is on HealthBench, from OpenAI's safety documentation (the GPT-5.5 System Card): GPT-5.5 has a length-adjusted score of 56.5, 2.5 points above GPT-5.4. It is also 2.4 points higher on HealthBench Hard and 3.7 points higher on HealthBench Professional, but 0.7 points lower on HealthBench Consensus. [14]
- So if by "strength" you mean solving hard problems, writing code, doing research, and working through tools, GPT-5.5 appears to be the stronger choice. [1][2][14]
- If you want a firmer verdict on individual criteria such as Vietnamese-language quality, real-world coding, speed, price, or hallucination rate, I don't yet have enough public evidence to rank the two models on every dimension. [14][26]
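The HealthBench figures are reported as GPT-5.5 scores plus deltas, so writing out the implied GPT-5.4 scores makes the comparison easier to read. The sketch below is a small worked-arithmetic aid, not an evaluation tool. It assumes that GPT-5.4's score on each benchmark equals GPT-5.5's reported score minus the reported delta; the System Card excerpt [14] gives GPT-5.5's scores and the deltas, not GPT-5.4's scores directly. The names HEALTHBENCH_GPT55, implied_gpt54_scores, and regressions are hypothetical.

```python
# Worked arithmetic for the HealthBench comparison (a sketch, not an eval tool).
# Values are GPT-5.5's reported scores and its reported delta vs GPT-5.4, from the
# GPT-5.5 System Card excerpt [14]. GPT-5.4's own scores are NOT quoted there;
# this sketch ASSUMES they equal GPT-5.5's score minus the delta.
HEALTHBENCH_GPT55 = {  # hypothetical name: benchmark -> (GPT-5.5 score, delta vs GPT-5.4)
    "HealthBench": (56.5, +2.5),  # length-adjusted score, per the source
    "HealthBench Hard": (31.5, +2.4),
    "HealthBench Consensus": (95.6, -0.7),
    "HealthBench Professional": (51.8, +3.7),
}


def implied_gpt54_scores(scores):
    """Derive GPT-5.4's score per benchmark as GPT-5.5 score minus the delta.

    Rounded to one decimal place to match the precision of the source figures
    and to hide floating-point noise (e.g. 31.5 - 2.4 becomes 29.1).
    """
    return {name: round(score - delta, 1) for name, (score, delta) in scores.items()}


def regressions(scores):
    """Return the benchmarks where GPT-5.5 scored below GPT-5.4 (negative delta)."""
    return [name for name, (_, delta) in scores.items() if delta < 0]


if __name__ == "__main__":
    gpt54 = implied_gpt54_scores(HEALTHBENCH_GPT55)
    for name, (score, delta) in HEALTHBENCH_GPT55.items():
        print(f"{name:<26} GPT-5.5 {score:5.1f}  GPT-5.4 {gpt54[name]:5.1f}  ({delta:+.1f})")
    print("GPT-5.5 lower than GPT-5.4 on:", ", ".join(regressions(HEALTHBENCH_GPT55)))
```

Run as a script, it prints one line per benchmark and flags HealthBench Consensus as the only one where GPT-5.5 trails. For example, the implied GPT-5.4 HealthBench score is 54.0 (56.5 minus 2.5).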
If you'd like, I can follow up with a practical side-by-side table of GPT-5.5 and GPT-5.4 across five criteria: code, content writing, data analysis, tool use, and stability.
Sources
- [1] OpenAI Unveils Its New, More Powerful GPT-5.5 Model (nytimes.com)
By Cade Metz, reporting from San Francisco, April 23, 2026. Photo caption: A smartphone is opened to the ChatGPT chatbot. OpenAI said its new technology was better at writing computer code and tasks related to other office work. The artificial intelligence company Anthropic said this month that it would share its latest A.I. technology with only a small number of partners be…
- [2] ChatGPT 5.5 Is All About Math, Science and AI Research (cnet.com)
This is a general model, so anyone can use it. But it's likely going to be the most useful for people doing research or other intensive tasks, like coding. It has agentic capabilities, which means it can independently complete tasks. It scored higher than GPT-5.4 on benchmarks that measure a model's ability to use apps across your computer and solve math problems. OpenAI and other tech companies are trying to build models that act like true digital assistants, managing notifications and overseeing projects across your entire computer, not just one program. GPT 5.5 is the company's biggest swi…
- [3] ChatGPT Models Explained: Complete Comparison Guide (2026) (ai-toolbox.co)
What is the latest ChatGPT model in 2026? GPT-5.4 Pro is OpenAI's highest-capability ChatGPT model, followed by GPT-5.4 Thinking (paid tiers) and GPT-5.3 Instant (default for everyone, including free). GPT-5.4 launched on March 5, 2026 with Thinking and Pro variants, and GPT-5.4 mini/nano followed on March 17, 2026. Older models (GPT-4o, GPT-4.1, o4-mini, and the original GPT-5) were retired on February 13, 2026. [...] Official OpenAI sources: GPT-5.3 and GPT-5.4 in ChatGPT (OpenAI Help Center, April 2026); Introducing…
- [4] GPT 5.4 to 5.5: what's being said, what's actually known, and why ... (buildingcreativemachines.com)
By Gonçalo Perdigão, Mar 25, 2026. GPT-5.4 didn't just upgrade ChatGPT. It rewired the product around tools, long context, and real deliverables at scale today globally. From the section "GPT-5.4: what's being shipped, not what's being said": When I wrote the 5.3 piece, the central argument was simple: ignore vibes, separate signals, and assume the next point-release would land where the competitive pressure is hottest: tools, reliability, long context, and "agent-ish" behavi…
- [5] GPT-5.4 vs GPT-5.2: What Changed & Should You Upgrade? (2026) (nxcode.io)
- [6] Introducing GPT-5.4 | OpenAI (openai.com)
Evals without reasoning (GPT-5.4 (none) / GPT-5.2 (none) / GPT-4.1): OmniDocBench normalized edit distance 0.109 / 0.140 / n/a; Tau2-bench Telecom 64.3% / 57.2% / 43.6%. Evals were run with reasoning effort set to xhigh, except where specified otherwise. Benchmarks were conducted in a research environment, which may provide slightly different output from production ChatGPT in some cases. Footnote 1: Human performance reported in OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments.
- [7] OpenAI announces GPT-5.5, its latest artificial intelligence model (cnbc.com)
By Ashley Capoot. Published Thu, Apr 23 2026, 2:06 PM EDT. Key points: OpenAI announced GPT-5.5, its latest AI model that is better at coding, using computers and pursuing deeper research capabilities. The launch comes just weeks after Anthropic unveiled Claude Mythos Preview, its new model with a…
- [8] OpenAI: GPT-5.4 Review | Pricing, Benchmarks & Capabilities (2026) | Design for Online (designforonline.com)
Related models listed on the page (score, tier): OpenAI: GPT-5.3-Codex 90 (Frontier); OpenAI: GPT-5.2 88 (Frontier); GPT-5.5 (xhigh) 87 (Frontier); GPT-5.5 (high) 87 (Frontier). Data sourced from OpenRouter API, Artificial Analysis and Hugging Face Open LLM Leaderboard. Scores are editorially curated by our team. Last…
- [9] GPT-5.5 System Card - Deployment Safety Hub - OpenAI (deploymentsafety.openai.com)
UK AISI judges that GPT-5.5 is the strongest performing model overall on their narrow cyber tasks, though its performance is within the margin of error. On expert-level narrow cyber tasks, the model was the highest-performing model UK AISI has tested in terms of pass@5, scoring 90.5% ± 12.9%. In comparison, GPT-5.4 scored a pass@5 rate of 71.4% ± 19.8%. GPT-5.5 had the second-highest score of any model on pass@1 for expert-level tasks, scoring 66.7% ± 15.9%, and achieved 100% on lower-difficulty cyber tasks. GPT-5.4 scored 52.4% ± 19.2%. [...] We find that GPT-5.5’s CoT controllability is low…
- [10] OpenAI launches GPT-5.5, calling it "a new class of ... (thenewstack.io)
The AI model release cycle continues. On Thursday, OpenAI released GPT-5.5 and GPT-5.5 Pro. The new model is, unsurprisingly, its most capable model yet. GPT-5.5 will be available to all paying OpenAI users in ChatGPT and Codex, while GPT-5.5 Pro will be rolling out to Pro, Business, and Enterprise users in ChatGPT only. It will be available in the API soon (but will also be more pricey than before). After Anthropic launched Opus 4.7 a week ago, it was only a matter of time before OpenAI would follow suit — and, at least according to the benchmarks we’ve seen so far, GPT-5.5 and 5.5 Pro beat…
- [11] OpenAI Unveils GPT-5.5. Company Says Expect a Faster Model ... (taekim.substack.com)
When asked whether the pace of model releases would increase going forward, given that GPT-5.5 came out just over a month after GPT-5.4, OpenAI said yes. "Yes, we expect quite rapid continued progress. We see pretty significant improvements in the short term, extremely significant improvements in the medium term," OpenAI Chief Scientist Jakub Pachocki said on the call with reporters. "I would definitely expect that we will continue to see the pace of AI capabilities improvement to keep increasing. I would say the last few years have been surprisingly slow."
- [12] OpenAI's GPT-5.5 is the new leading AI model - LinkedIn (linkedin.com)
Other posts shown on the page: Apr 18, 2026: Claude Opus 4.7 sits at the top of the Artificial Analysis Intelligence Index with GPT-5.4 and Gemini 3.1 Pro. Apr 14, 2026: Sub-32B open weights models now offer GPT-5 level intelligence, with Qwen3.5 27B (Reasoning) matching GPT-5 (medium) at… Mar 31, 2026: KwaiKAT has released KAT-Coder-Pro V2, a non-re…
- [13] GPT-5.5 is here! Available in Codex and ChatGPT today (community.openai.com)
Announcement post: Introducing GPT-5.5. A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.
- [14] GPT-5.5 System Card - OpenAI Deployment Safety Hub (deploymentsafety.openai.com)
GPT-5.5 has a length-adjusted HealthBench score of 56.5 (+2.5 relative to GPT-5.4), HealthBench Hard score of 31.5 (+2.4), HealthBench Consensus score of 95.6 (-0.7), and HealthBench Professional score of 51.8 (+3.7). Answer lengths were comparable for HealthBench, Hard, and Consensus. In the case of HealthBench Professional, GPT-5.5 was longer (3893 characters vs 3308 characters), and had a higher unadjusted score and a higher length-adjusted score. Overall, this reflects generally improved HealthBench, HealthBench Hard, and HealthBench Professional performance vs GPT-5.4, with HealthBench C…
- [15] Introducing GPT-5 - OpenAI (openai.com)
- [16] Introducing GPT-5.2 | OpenAI (openai.com)
Models were run with maximum available reasoning effort in our API (xhigh for GPT-5.2 Thinking & Pro, and high for GPT-5.1 Thinking), except for the professional evals, where GPT-5.2 Thinking was run with reasoning effort heavy, the maximum available in ChatGPT Pro. Benchmarks were conducted in a research environment, which may provide slightly different output from production ChatGPT in some cases. For SWE-Lancer, we omit 40/237 problems that did not run on our infrastructure.
- [17] Introducing GPT-5.4 mini and nano - OpenAI (openai.com)
Footnotes: 1. The highest reasoning_effort available for GPT-5 mini is 'high'. 2. Overall Edit Distance. OmniDocBench was run with reasoning_effort set to 'none' to reflect low-cost, low-latency performance.
- [18] Introducing GPT-Rosalind for life sciences research - OpenAI (openai.com)
Over time, we expect these systems to become increasingly capable partners in discovery, helping scientists move faster from question to evidence, from evidence to insight, and from insight to new treatments for patients.
- [19] Introducing GPT-5 for developers | OpenAI (openai.com)
Hallucination rates with no tools (lower is better). Models: GPT-5 (high), GPT-5 mini (high), GPT-5 nano (high), OpenAI o3 (high), OpenAI o4-mini (high), GPT-4.1, GPT-4.1 mini, GPT-4.1 nano. LongFact-Concepts: 1.0%, 0.7%, 1.0%, 5.2%, 3.0%, 0.7%, 1.1%. LongFact-Objects: 1.2%, 1.3%, 2.8%, 6.8%, 8.9%, 1.1%, 1.8%. FActScore: 2.8%, 3.5%, 7.3%, 23.5%, 38.7%, 6.7%, 10.9%.
- [20] Measuring the performance of our models on real-world tasks | OpenAI (openai.com)
- [21] OpenAI Research | Release (openai.com)
Product, Apr 23, 2026: Introducing GPT-5.5, our smartest model yet: faster, more capable, and built for complex tasks like coding, research, and data analysis across tools. [...] Product, Dec 18, 2025: Introducing…
- [22] Introducing GPT-5.5 (openai.com)
GPT-5.5 reaches state-of-the-art performance across multiple benchmarks that reflect this kind of work. On GDPval, which tests agents' abilities to produce well-specified knowledge work across 44 occupations, GPT-5.5 scores 84.9%. On OSWorld-Verified, which measures whether a model can operate real computer environments on its own, it reaches 78.7%. And on Tau2-bench Telecom, which tests complex customer-service workflows, it reaches 98.0% without prompt tuning. GPT-5.5 also performs strongly across other knowledge work benchmarks: 60.0% on FinanceAgent, 88.5% on internal investment-banking…
- [23] Prompt guidance for GPT-5.4 | OpenAI API (developers.openai.com)
GPT-5.4 is designed for production-grade assistants and agents that need strong multi-step reasoning, evidence-rich synthesis, and reliable performance over long contexts. It is especially effective when prompts clearly specify the output contract, tool-use expectations, and completion criteria. In practice, the biggest gains come from choosing the right reasoning effort for the task, using explicit grounding and citation rules, and giving the model a precise definition of what “done” looks like. This guide focuses on prompt patterns and migration practices that preserve those efficiency wins…
- [24] GPT-5.4 deep dive: pricing, context limits, and tool search explained (community.openai.com)
Reasoning effort levels and what the benchmarks actually reflect: Most benchmark numbers in the announcement were measured at reasoning_effort=xhigh. Performance at 'none' looks different, though GPT-5.4 at 'none' still outperforms GPT-5.2 on latency-sensitive tasks like τ²-bench Telecom (64.3% vs 57.2%). For production workloads, it's worth benchmarking at the reasoning effort you'll actually use rather than defaulting to xhigh everywhere. The model is efficient enough at lower effort levels that you may not need t…
- [25] GPT-5.4 Pro and Thinking are here! - OpenAI Developer Community (community.openai.com)
On SWE-Bench Pro, it outperforms GPT-5.3-Codex, while being lower latency across reasoning efforts. In Codex, /fast mode delivers up to 1.5x faster performance across supported models, including GPT-5.4. GPT-5.4 is designed for complex professional work. On GDPval, it matches or exceeds industry professionals in 83.0% of comparisons, up from 70.9% for GPT-5.2. On our internal spreadsheet modeling benchmark, it scores 87.3% vs 68.4% for GPT-5.2. GPT-5.4 makes tool use more capable and efficient. In the API, tool search lets agents retrieve only the definitions they need, reducing token usage a…
- [26] Introducing GPT-5.4 | OpenAI (openai.com)
GPT-5.4 brings together the best of our recent advances in reasoning, coding, and agentic workflows into a single frontier model. It incorporates the industry-leading coding capabilities of GPT-5.3-Codex while improving how the model works across tools, software environments, and professional tasks involving spreadsheets, presentations, and documents. The result is a model that gets complex real work done accurately, effectively, and efficiently, delivering what you asked for with less back and forth. [...] Coding: GPT-5.4 combines the coding strengths of GPT-5.3-Codex with leading knowledg…
- [27] Introducing GPT-5.3 Instant, GPT-5.4 Thinking, and GPT-5.4 Pro (academy.openai.com)
GPT-5.4 Thinking is designed for work that's harder to do in a single pass: multi-step reasoning, long context, tool-heavy workflows, and outputs that need to be accurate and usable. In ChatGPT, it's the smartest and most efficient model yet for difficult professional tasks, better at delivering what you actually asked for with less back and forth. It also brings stronger coding capabilities into the flagship frontier model, and is built to handle longer, tool-heavy workflows with lower latency.
- [28] Introducing GPT-5.4 | OpenAI (openai.com)
- [29] Introducing GPT-5.4 mini and nano (openai.com)
GPT‑5.4 mini significantly improves over GPT‑5 mini across coding, reasoning, multimodal understanding, and tool use, while running more than 2x faster. It also approaches the performance of the larger GPT‑5.4 model on several evaluations, including SWE-Bench Pro and OSWorld-Verified. GPT‑5.4 nano is the smallest, cheapest version of GPT‑5.4 for tasks where speed and cost matter most. It is also a significant upgrade over GPT‑5 nano. We recommend it for classification, data extraction, ranking, and coding subagents that handle simpler supporting tasks. [...] In benchmarks, GPT‑5.4 mini consis…
- [30] GPT-5.4 Thinking System Card - Deployment Safety Hub - OpenAI (deploymentsafety.openai.com)
GPT-5.4-reasoning has a mean sabotage score of 0.56 (best-of-10: 0.74)—comparable to GPT-5.2 but below GPT-5.3-codex (0.88). On several hard tasks the model exceeds human baselines, indicating meaningful sabotage capability, though performance remains below the strongest prior Codex checkpoint. Chain-of-thought analysis shows higher rates of evaluation awareness (21.3%) than prior models and far fewer multilingual reasoning anomalies (0.5% of samples vs 29.5% for GPT-5.3-codex). These findings provide evidence that the model can identify and execute relevant technical steps for sabotage in a…