답변게시됨그저께Last edited 그저께31 소스

앤트로픽의 가장 강력한 AI '클로드 페이블 5', 출시 하루 만에 무너지다: '팩 헌트' 전략이 안전성 신화를 깨뜨렸다

2026년 6월 10일, 앤트로픽의 최신 AI '클로드 페이블 5'가 출시 단 하루 만에 연구원에 의해 안전 장벽이 무너졌다. '팩 헌트'라 불리는 다중 AI 협업 전략으로, 1,000시간 이상의 레드팀 테스트를 무력화했다.

Studio Global AI로 검색 및 팩트체크 인기 페이지 더 보기

37K0

What happened when Anthropic's Claude Fable 5 was reportedly jailbroken by a researcher just one day after its June 9 launch, what techniqueAI-generated editorial hero image for What happened when Anthropic's Claude Fable 5 was reportedly jailbroken by a researcher just one day after its June 9 launch, what technique.
AI 프롬프트
Create a landscape editorial hero image for this Studio Global article: What happened when Anthropic's Claude Fable 5 was reportedly jailbroken by a researcher just one day after its June 9 launch, what technique. Article summary: On June 10, 2026 — just one day after Anthropic launched Claude Fable 5, its first public Mythos-class model — prolific AI red-teamer **Pliny the Liberator** announced he had bypassed the model's safety classifiers, extr. Topic tags: general, general web, user generated. Reference image context from search candidates: Reference image 1: visual subject "# Anthropic’s Claude Fable 5 Jailbroken to Generate Stack Exploits. Anthropic's Claude Fable 5 Jailbroken. Anthropic launched Claude Fable 5 on June 9, 2026, as the first publicly" source context "Anthropic's Claude Fable 5 Jailbroken to Generate Stack ..." Reference image 2: visual subject "Anthropic Releases Cl
openai.com

앤트로픽은 2026년 6월 9일, 자사의 첫 ‘미토스(Mythos)’ 등급 AI 모델인 **클로드 페이블 5(Claude Fable 5)**를 공개했다. 이는 너무 위험해서 제한 없이 공개하기 어렵다고 여겨지던 등급의 모델로, 전례 없는 수준의 안전 장치가 적용되었다 . 전용 AI 분류기가 사이버 보안, 생물학, 화학, 모델 복제 등 고위험 영역의 질의를 실시간 감시하고, 위험 신호가 감지되면 사용자 모르게 요청을 더 낮은 성능의 모델 ‘클로드 오퍼스 4.8’로 우회시키는 방식이었다. 앤트로픽은 출시 전 1,000시간 이상의 외부 버그 바운티 및 레드팀 테스트를 통해 단 하나의 범용 탈옥(Universal Jailbreak)도 발견하지 못했다고 공언했다 .

그러나 이 자신감은 약 하루 만에 산산조각이 났다.

2026년 6월 10일, 가명의 AI 레드팀 전문가 **‘플리니 더 리버레이터(Pliny the Liberator)’**는 페이블 5의 안전 분류기를 완전히 우회하는 데 성공했다고 발표했다. 그는 모델의 12만 자 분량의 시스템 프롬프트 전체를 추출해 깃허브에 공개했고, 취약점 공격 코드와 제한된 화학 합성 가이드까지 생성하는 데 성공했다 . 출시 후 탈옥 성공까지 걸린 시간은 24~48시간에 불과했다 .

Studio Global AI

Search, cite, and publish your own answer

Use this topic as a starting point for a fresh source-backed answer, then compare citations before you share it.

Studio Global AI로 검색 및 팩트체크

사람들은 또한 묻습니다.