Microsoft's ASSERT Framework: How to Catch AI Agent Misbehavior Before Launch

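ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) is an open-source framework that turns behavior rules written in plain text into executable, scored test suites. It automatically generates adversarial attack scenarios, records every tool call the AI agent makes, and produces a detailed diagnostic report showing at which step a rule was broken.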

[Figure: Microsoft's ASSERT framework automates the translation of plain-English behavior rules into executable, scored evaluation suites.]

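On June 2, 2026, at its annual developer conference Build 2026, Microsoft unveiled a new open-source framework, ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing). As the name suggests, the tool performs adaptive, spec-driven evaluation and regression testing, and it targets a chronic problem in the fast-growing field of AI agent development. However capable an AI may be, verifying before it goes into production that it will actually follow a company's own rules has remained a hard problem.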

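Existing AI safety benchmarks have mostly measured general metrics such as helpfulness, harmfulness, and accuracy. Tests like these alone struggle to catch violations of real business rules such as "the customer-support AI must not refund 500,000 won or more without manager approval" or "the document-search AI must never send email to outsiders." To close this gap, ASSERT treats the natural-language rules a developer writes (in Korean, English, and so on) as first-class citizens of the evaluation and converts them directly into an executable test suite.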
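To make the idea concrete, here is a minimal sketch of what checking one such rule against a recorded tool-call trace could look like. It is not ASSERT's actual API: the `ToolCall` record, the tool names `issue_refund` and `manager_approval`, the argument keys, and the `pass_rate` score are all hypothetical, chosen only to illustrate the refund rule quoted above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical threshold taken from the example rule in the text:
# no refund of 500,000 won or more without manager approval.
REFUND_LIMIT_KRW = 500_000


@dataclass
class ToolCall:
    """One recorded tool invocation in an agent run (illustrative shape)."""
    step: int
    tool: str
    args: dict = field(default_factory=dict)


def first_refund_violation(trace: list[ToolCall]) -> Optional[int]:
    """Return the step of the first over-limit refund issued before any
    granted manager approval, or None if the trace obeys the rule."""
    approved = False
    for call in trace:
        if call.tool == "manager_approval" and call.args.get("granted"):
            approved = True
        elif call.tool == "issue_refund":
            amount = call.args.get("amount_krw", 0)
            if amount >= REFUND_LIMIT_KRW and not approved:
                return call.step
    return None


def pass_rate(traces: list[list[ToolCall]]) -> float:
    """Score a batch of runs as the fraction with no violation."""
    if not traces:
        return 1.0
    passed = sum(1 for t in traces if first_refund_violation(t) is None)
    return passed / len(traces)


if __name__ == "__main__":
    bad_run = [
        ToolCall(1, "lookup_order", {"order_id": "A-1"}),
        ToolCall(2, "issue_refund", {"amount_krw": 700_000}),
    ]
    print("violating step:", first_refund_violation(bad_run))
```

The point of the sketch is the shape of the output: the check reports the step at which the rule was broken instead of a bare pass or fail, which is the kind of step-level diagnosis the article attributes to ASSERT.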

How does a rule stated in words become an automated test? A five-stage pipeline

The process by which ASSERT takes a sentence like "our service's AI must behave this way" and turns it into a scored diagnostic report breaks down into five broad stages.