Answer published 3 months ago · Last edited 2 months ago · 10 sources

GPT-5.5 Benchmarks: What 84.9% on GDPval Actually Means

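The GPT-5.5 benchmark figure best suited to a short answer is the 84.9% on GDPval published by OpenAI; GDPval evaluates how well AI agents complete well-specified knowledge work across 44 occupations.[1] The 73.1% on Expert-SWE and the 80.5% on BixBench measure different task domains, the former software development and the latter bioinformatics, so neither can be ranked directly against the GDPval score.[8][10] For a third-party composite comparison, Artificial Analysis reports that GPT-5.5 tops its Intelligence Index with a 3-point lead, but that does not mean it wins every individual test.[3]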



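When someone asks what GPT-5.5's benchmark score is, the easiest mistake lies not in the numbers but in placing scores from different evaluations on the same scale. If you need only a one-sentence answer, the safest statement for now is this: GPT-5.5 scored 84.9% on GDPval, as published by OpenAI; OpenAI says GDPval tests the ability of AI agents to produce well-specified knowledge work across 44 occupations.

In other words, 84.9% is a useful indicator of work capability, but it is neither a general intelligence score nor an overall grade across all tasks. It mainly describes how GPT-5.5 performs on knowledge tasks where the requirements are clearly stated and a concrete work product is expected.

The number to remember: 84.9% on GDPval

If you just want a quick answer about GPT-5.5's benchmark, we suggest phrasing it like this:

OpenAI says GPT-5.5 scored 84.9% on GDPval, which evaluates how well AI agents complete well-specified knowledge work across 44 occupations.

This phrasing is more accurate than quoting a bare percentage because it states the score, the evaluation's name, and the evaluation's scope together. GDPval is not a programming test, not a bioinformatics test, and not a third-party composite leaderboard; its focus is knowledge-work output across occupations.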

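The main public figures at a glance

Evaluation or comparison	Published figure	What it mainly measures	How to read it
GDPval	84.9%	Well-specified knowledge work across 44 occupations	OpenAI cites it directly in its GPT-5.5 release materials, so it is the best figure to quote in a general short answer.
Expert-SWE	73.1%	Software development tasks; reports describe it as an internal OpenAI evaluation of tasks estimated to take 20 hours to complete	More relevant for software development scenarios, but its percentage cannot be compared directly with GDPval's.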
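
One way to keep these figures from being mixed up in notes or scripts is to store each score with the evaluation it came from. The following is a minimal sketch and not any official tooling: the names `BenchmarkScore`, `describe`, and `difference` are hypothetical and chosen only for illustration, and the figures are the ones quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkScore:
    """One published score, kept together with the evaluation it came from."""
    benchmark: str   # evaluation name, e.g. "GDPval"
    percent: float   # published figure
    measures: str    # what the evaluation covers


# Figures as quoted in this article (hypothetical container, real numbers).
SCORES = [
    BenchmarkScore("GDPval", 84.9, "well-specified knowledge work across 44 occupations"),
    BenchmarkScore("Expert-SWE", 73.1, "software development tasks"),
    BenchmarkScore("BixBench", 80.5, "bioinformatics tasks"),
]


def describe(score: BenchmarkScore) -> str:
    """Format a score with its evaluation name and scope, never as a bare number."""
    return f"{score.percent}% on {score.benchmark} ({score.measures})"


def difference(a: BenchmarkScore, b: BenchmarkScore) -> float:
    """Return a.percent - b.percent, but only for scores from the same evaluation."""
    if a.benchmark != b.benchmark:
        raise ValueError(
            f"{a.benchmark} and {b.benchmark} measure different things; "
            "their percentages are not on the same scale"
        )
    return a.percent - b.percent


for s in SCORES:
    print(describe(s))
```

Under this design, asking for the difference between the GDPval and Expert-SWE figures raises an error instead of returning a number, which mirrors the point of the table: the percentages belong to different scales.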


People also ask