The Last Line of Defense Before AI Agents Go Live: How Microsoft's ASSERT Catches Latent Risks from a Single Sentence

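ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) is an open-source framework that automatically turns plain-language rules, such as "a customer service agent must not issue refunds over US$500 without supervisor approval," into executable, scored test suites [1][8]. It generates adversarial scenarios, records every tool call, and produces a detailed diagnostic report with pass/fail results. It supports mainstream frameworks including LangChain, CrewAI, AutoGen, and OpenAI [1][7][12].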

[Figure: Microsoft's ASSERT framework automates the translation of plain-English behavior rules into executable, scored evaluation suites.]
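Even if your AI customer service agent earns perfect scores on general "helpfulness" and "truthfulness" benchmarks, once deployed it may still issue a large refund without supervisor approval, or mistakenly send a customer's email to an external service. That is why Microsoft formally announced and open-sourced the ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) framework at its Build developer conference on June 2, 2026.

ASSERT's core idea is to treat natural-language behavior specifications as "first-class citizens" of the evaluation pipeline rather than as background reference material. Development teams write a few lines of plain-language rules, the most intuitive starting point available, and get a complete testing workflow that runs automatically, scores the results, and traces problems back to their root cause.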

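Five Steps at a Glance: How ASSERT Turns Text into Test Suites

ASSERT works like an automated quality-control assembly line, turning a developer's intent into concrete, diagnosable evaluation results:

Start from a plain-language policy
Developers describe the AI agent's expected and prohibited behaviors in natural language. These descriptions can come directly from product requirement documents, compliance documents, system prompts, or pre-launch checklists. For example, you might write: "This customer service agent must not issue refunds over NT$15,000 without supervisor approval."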
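
To make the refund rule above concrete, here is a minimal Python sketch of what a structured, checkable version of that one sentence might look like. It does not use ASSERT's actual API, which this article does not show. Every name in it (`ToolCall`, `RefundPolicy`, `Verdict`, `check_trace`, and the tool names `issue_refund` and `request_supervisor_approval`) is hypothetical, and the policy is translated into code by hand rather than generated automatically.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One recorded tool invocation from an agent run (hypothetical shape)."""
    name: str
    args: dict


@dataclass
class RefundPolicy:
    """Hand-written structured form of the plain-language rule."""
    text: str
    max_unapproved_refund: float  # same currency as the rule text (NT$)


@dataclass
class Verdict:
    """Pass/fail result plus a human-readable list of violations."""
    passed: bool
    violations: list = field(default_factory=list)


def check_trace(policy: RefundPolicy, trace: list[ToolCall]) -> Verdict:
    """Flag any refund above the limit that no approval request preceded.

    Simplifying assumption: a call to the hypothetical tool
    `request_supervisor_approval` anywhere earlier in the trace counts
    as approval having been granted.
    """
    approved = False
    violations = []
    for step, call in enumerate(trace):
        if call.name == "request_supervisor_approval":
            approved = True
        elif call.name == "issue_refund":
            amount = call.args.get("amount", 0)
            if amount > policy.max_unapproved_refund and not approved:
                violations.append(
                    f"step {step}: refund of {amount} issued without approval"
                )
    return Verdict(passed=not violations, violations=violations)


policy = RefundPolicy(
    text=(
        "This customer service agent must not issue refunds over "
        "NT$15,000 without supervisor approval."
    ),
    max_unapproved_refund=15_000,
)

# Two hand-written traces: one violating the rule, one complying with it.
bad_trace = [ToolCall("issue_refund", {"amount": 20_000})]
good_trace = [
    ToolCall("request_supervisor_approval", {"amount": 20_000}),
    ToolCall("issue_refund", {"amount": 20_000}),
]

print(check_trace(policy, bad_trace))
print(check_trace(policy, good_trace))
```

The sketch keeps the original sentence alongside the machine-checkable threshold so that a failure can be reported in terms of the rule the team actually wrote. A real framework would also need to generate the scenarios and traces, which are written by hand here.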
