Research and fact-check: how many of the GPT-5.5 "Spud" benchmarks, demos, and leaks now circulating online can be independently verified?
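Research answer
The safest conclusion at this point: very little of what is circulating online about "GPT-5.5 / Spud" can actually be independently verified. The only claim with some cross-corroboration is the core one, namely that OpenAI may have a next-generation model internally codenamed Spud and that it has been reported as having completed pretraining. As for benchmark scores, 3D/demo screenshots, release dates, and assertions that the model will definitely be called GPT-5.5, most currently lack any official documentation, reproducible test, or original artifact that would allow independent verification. [1][2][3][5][6][7]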
Key findings
- OpenAI's public developer documentation still labels GPT-5.4 as "Latest", and none of the official public pages I reviewed list a model or system card named Spud. [1]
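- The most credible core leak, and the only one backed by a two-link source chain, is a report in The Information titled "OpenAI CEO Shifts Responsibilities, Preps ‘Spud’ AI Model". The Decoder then stated explicitly that this report says OpenAI has completed pretraining of a model codenamed "Spud", and that Sam Altman described it in an internal memo as a "very strong" model. [2][3]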
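- Even so, this is not official public confirmation. The existing cross-corroboration is still, in essence, media outlets citing an internal memo or internal sources, not a model listing, blog post, system card, API documentation, or benchmark report published by OpenAI itself. [1][2][3]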
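- Among the circulating benchmark claims, the most common kind ("Spud closes in on or surpasses model X") comes with no public, reproducible, comparable raw results for Spud. Adam Holter's article states that 77.80% is Claude Mythos Preview's SWE-bench Pro score and 57.70% is GPT-5.4's, and then says the "expectation is that Spud closes most or all of that gap". In other words, these figures are mainly other models' published scores plus an inference about Spud, not independently verified Spud scores. [6]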
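- I also checked the public SWE-bench leaderboard itself, but in the public material I reviewed there is no public submission, results page, or eval card that maps directly to "Spud". For Spud benchmarks specifically: insufficient evidence. [6][7]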
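- For the circulating demos (3D simulation, Monica's apartment, website design, SVG, pelican voxel art, a Pokémon-like game, and so on), Geeky Gadgets is itself relaying "According to Universe of AI" and embedding a YouTube video, and it describes the material with phrases such as "leaked outputs" and "early demonstrations indicate". That makes it a second- or third-hand account, not an original demo artifact and not independent verification. [5]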
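- Claims about the release date, naming, and rollout are likewise rumor-grade. The title of Holter's article itself says "Leaked April 16 Release", and the body frames GPT-5.5 vs GPT-6, the release window, and the rollout order as leaked, expected, or community framing. Geeky Gadgets likewise uses wording such as "release on the horizon" and "official performance metrics remain undisclosed", which shows that none of this is yet publicly verifiable fact. [5][6]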
- Mentions of a "SPUD release" on the OpenAI Developer Community cannot be treated as official confirmation. The page itself states that the forum is community-run and that "Not all posts are monitored", and the "SPUD release" page I reviewed is just a user feature request. [4]
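To make the arithmetic behind the benchmark finding explicit, here is a minimal Python sketch. It uses only the two SWE-bench Pro figures quoted in Holter's article [6]; the variable names are illustrative assumptions, and Spud deliberately has no entry because no published score for it was found.

```python
from decimal import Decimal

# SWE-bench Pro scores as quoted in Holter's article [6].
# Both belong to models other than Spud.
reported_scores = {
    "Claude Mythos Preview": Decimal("77.80"),
    "GPT-5.4": Decimal("57.70"),
}

# The "gap" that Spud is expected to close is simply the difference
# between two other models' scores, in percentage points.
gap = reported_scores["Claude Mythos Preview"] - reported_scores["GPT-5.4"]

# No Spud score appears in the reviewed material, so the lookup yields None.
spud_score = reported_scores.get("Spud")

print(f"Gap between the two published scores: {gap} percentage points")
print(f"Published Spud score: {spud_score}")
```

The gap is a property of the two published scores; nothing in it measures Spud, which is why the claim stays an expectation and not a result.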
Evidence notes
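- Under a strict definition of "independently verifiable" (requiring an official document, an original artifact, a public benchmark submission, or a reproducible test), I would grade the circulating material very conservatively: benchmarks, almost zero; demos, almost zero; leaks, only one core claim about existence and progress, which is "partially cross-corroborated but not officially confirmed". [1][2][3][5][6][7]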
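- Under a looser standard, one can say that "Spud is an internal codename and may have completed pretraining" has medium credibility. Any specific capability, specific score, release date, or assertion that it will definitely be called GPT-5.5 remains rumor-grade, and most such claims do not clear the bar for independent verification. [2][3][5][6]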
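The strict and looser standards above can be sketched as a small Python check. This is only an illustration under stated assumptions: the `Evidence` categories, the `Claim` type, the rule used for the looser standard (accepting a named media report of internal sources), and the evidence assigned to each example claim are my own encoding of the findings, not an established methodology.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Evidence(Enum):
    OFFICIAL_DOCUMENT = auto()
    ORIGINAL_ARTIFACT = auto()
    PUBLIC_BENCHMARK_SUBMISSION = auto()
    REPRODUCIBLE_TEST = auto()
    MEDIA_REPORT_OF_INTERNAL_SOURCE = auto()
    SECONDHAND_RETELLING = auto()


# The four evidence kinds named in the strict standard.
STRICT_EVIDENCE = frozenset({
    Evidence.OFFICIAL_DOCUMENT,
    Evidence.ORIGINAL_ARTIFACT,
    Evidence.PUBLIC_BENCHMARK_SUBMISSION,
    Evidence.REPRODUCIBLE_TEST,
})


@dataclass(frozen=True)
class Claim:
    text: str
    evidence: frozenset


def passes_strict(claim: Claim) -> bool:
    """True if the claim has at least one strict-standard evidence kind."""
    return bool(claim.evidence & STRICT_EVIDENCE)


def passes_lenient(claim: Claim) -> bool:
    """Assumed looser rule: also accept a named media report of internal sources."""
    return (passes_strict(claim)
            or Evidence.MEDIA_REPORT_OF_INTERNAL_SOURCE in claim.evidence)


# Hypothetical encoding of the findings; the evidence sets are assumptions.
claims = [
    Claim("Spud is an internal codename and has completed pretraining",
          frozenset({Evidence.MEDIA_REPORT_OF_INTERNAL_SOURCE})),
    Claim("Spud benchmark scores", frozenset()),
    Claim("Leaked 3D and website demos",
          frozenset({Evidence.SECONDHAND_RETELLING})),
    Claim("April 16 release under the GPT-5.5 name",
          frozenset({Evidence.SECONDHAND_RETELLING})),
]

strict_pass = [c.text for c in claims if passes_strict(c)]
lenient_pass = [c.text for c in claims if passes_lenient(c)]

print("Strict standard:", strict_pass)
print("Looser standard:", lenient_pass)
```

Under this encoding no claim clears the strict standard, and only the core existence and pretraining claim clears the looser one, which mirrors the grading described in the two notes above.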
Limitations / uncertainty
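- The full text of The Information's article is behind a paywall, so my assessment of the core leak relies in part on the article's title and metadata and on The Decoder's account of its contents. [2][3]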
- If OpenAI later publishes a system card, API model listing, or release notes, or if a third-party benchmark submission appears, this conclusion could change quickly. [1][7]
Summary
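The most reasonable conclusion for now: Spud is "probably real", but what can be independently verified today is mainly that named media outlets have reported on this internal codename and its progress. The most eye-catching online claims (benchmarks, demos, the release date, and the GPT-5.5 name) are, at this stage, essentially not supported by sufficient evidence. [1][2][3][5][6]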
Sources
- [1] OpenAI Developers, “Latest: GPT-5.4” / API docs: https://developers.openai.com/api/docs/guides/latest-model
- [2] The Information, “OpenAI CEO Shifts Responsibilities, Preps ‘Spud’ AI Model”: https://www.theinformation.com/articles/openai-ceo-shifts-responsibilities-preps-spud-ai-model
- [3] The Decoder, “OpenAI CEO Sam Altman reportedly teases a ‘very strong’ model internally that can ‘really accelerate the economy’”: https://the-decoder.com/openai-ceo-sam-altman-reportedly-teases-a-very-strong-model-internally-that-can-really-accelerate-the-economy
- [4] OpenAI Developer Community, “Please Add an Optional Expression Mode with the SPUD Release”: https://community.openai.com/t/please-add-an-optional-expression-mode-with-the-spud-release/1378058
- [5] Geeky Gadgets, “Leaked ChatGPT 5.5 Pro Tests Reveal OpenAI’s ‘Spud’ Building Interactive 3D Worlds”: https://www.geeky-gadgets.com/openai-gpt-5-5-pro-leak
- [6] Adam Holter, “OpenAI Spud: Leaked April 16 Release, Mythos-Level Benchmarks, and What GPT-5.5 or GPT-6 Might Mean”: https://adam.holter.com/openai-spud-leaked-april-16-release-mythos-level-benchmarks-and-what-gpt-5-5-or-gpt-6-might-mean
- [7] SWE-bench Leaderboards: https://www.swebench.com/
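If you like, as a next step I can organize this into a "credibility matrix" that sorts each circulating claim into one of four levels (cross-corroborated / single source only / clearly second-hand retelling / unverifiable), which would make it easier to follow up.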
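As a rough idea of what such a four-level matrix might look like, here is a minimal Python sketch. The `Tier` enum, the `build_matrix` helper, and the placement of each example claim are hypothetical; the placements are read off the findings above and are not a definitive grading.

```python
from enum import Enum


class Tier(Enum):
    CROSS_CORROBORATED = "cross-corroborated"
    SINGLE_SOURCE = "single source only"
    SECONDHAND = "clearly second-hand retelling"
    UNVERIFIABLE = "unverifiable"


def build_matrix(assignments: dict[str, Tier]) -> dict[Tier, list[str]]:
    """Group claims by tier, keeping every tier even when it is empty."""
    matrix: dict[Tier, list[str]] = {tier: [] for tier in Tier}
    for claim, tier in assignments.items():
        matrix[tier].append(claim)
    return matrix


# Illustrative placements only (assumptions based on the findings above).
assignments = {
    "Spud is an internal codename and has completed pretraining":
        Tier.CROSS_CORROBORATED,
    "Leaked 3D, website, and game demos": Tier.SECONDHAND,
    "Spud's SWE-bench Pro score": Tier.UNVERIFIABLE,
}

matrix = build_matrix(assignments)
for tier, claims in matrix.items():
    print(f"{tier.value}: {claims if claims else '(none)'}")
```

Keeping empty tiers in the output makes it visible when no claim has reached a given level, which is itself useful information when tracking how the evidence changes over time.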