
Write the rules in plain language and let AI run the tests: how Microsoft's ASSERT catches basic agent mistakes

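ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) is an open-source framework that turns behavior rules written in plain language into executable, scored test suites, aimed at catching policy violations and behavioral gaps in AI agents [1][8]. It automatically generates adversarial test scenarios, records every tool call, and produces a scored pass/fail diagnostic report. It supports mainstream frameworks such as LangChain, CrewAI, AutoGen, and OpenAI, and is not locked to Microsoft's platform [1][7][12].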




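Microsoft announced ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) at its Build developer conference on June 2, 2026, and released it on GitHub as an open-source framework under its Responsible AI program. The framework targets an increasingly thorny problem in AI agent development: how to confirm, before an agent faces real customers, that it actually follows your application's rules and safety boundaries. Traditional AI benchmarks, such as those measuring politeness, toxicity, or general accuracy, often miss critical failures, for example a customer-service agent approving refunds it should not, or sending confidential data to someone who should not receive it.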

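ASSERT fills this gap by treating behavior specifications written in natural language as a first-class input to evaluation, rather than as background reference only.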
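To make the idea of a scored, policy-driven check concrete, here is a minimal sketch in Python built on the refund example above. It is not ASSERT's API: every name in it (`ToolCall`, `Policy`, `violations`, `score`, and the tool names `issue_refund` and `request_manager_approval`) is hypothetical, and the refund threshold is extracted by hand, where a real framework would derive checks from the written rule automatically.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical types for illustration only; none of these names come from ASSERT.

@dataclass
class ToolCall:
    name: str   # tool the agent invoked
    args: dict  # arguments it passed

@dataclass
class Policy:
    text: str          # the plain-language rule
    max_refund: float  # hand-extracted limit, standing in for automated spec parsing

def violations(policy: Policy, trace: list[ToolCall]) -> list[ToolCall]:
    """Return refund calls above the limit that were not preceded by an approval call."""
    approved = False
    bad = []
    for call in trace:
        if call.name == "request_manager_approval":
            approved = True
        elif (call.name == "issue_refund"
              and call.args["amount"] > policy.max_refund
              and not approved):
            bad.append(call)
    return bad

def score(policy: Policy, traces: dict[str, list[ToolCall]]) -> dict:
    """Mark each scenario pass/fail and compute the overall pass rate."""
    results = {name: not violations(policy, trace) for name, trace in traces.items()}
    return {"results": results, "pass_rate": sum(results.values()) / len(results)}

policy = Policy("Refunds over $100 require manager approval.", max_refund=100.0)

# Recorded tool-call traces from three made-up scenarios.
traces = {
    "small_refund": [ToolCall("issue_refund", {"amount": 40.0})],
    "pushy_customer": [ToolCall("issue_refund", {"amount": 250.0})],
    "escalated": [
        ToolCall("request_manager_approval", {"amount": 250.0}),
        ToolCall("issue_refund", {"amount": 250.0}),
    ],
}

report = score(policy, traces)
print(report)
```

In this sketch only the `pushy_customer` scenario fails, because the agent issues a large refund without first requesting approval. The point is that the check runs over the recorded tool calls rather than over the tone or accuracy of the reply text, which is the kind of failure generic benchmarks tend to miss.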

How ASSERT turns plain language into a test suite

ASSERT's workflow can be broken into five steps that turn a developer's intent into scored, traceable evaluations:
