The restrictions aren't limited to cybersecurity. The guardrails also target queries related to biology, chemistry, and, critically, AI model distillation. This last point has fueled a separate wave of criticism, with some developers accusing Anthropic of using “safety” as a pretext for anticompetitive behavior: preventing other AI developers from training on Fable 5's outputs.
Anthropic's safety system in Fable 5 is not a simple refusal mechanism. It's a routing system designed to fail quietly. The architecture works in three steps:
Anthropic states that these classifiers fire on fewer than 5% of all sessions on average. The company has also publicly acknowledged the over-flagging problem: a company spokesperson told Business Insider that the safety measures “may flag safe, neutral, or benign requests,” but justified this as a necessary trade-off for publicly releasing a model with such powerful underlying capabilities.
Anthropic’s position is that the conservative guardrails are a deliberate and responsible choice, not a bug. The company argues that the underlying Mythos-class model is so proficient at tasks such as finding and exploiting software vulnerabilities that an unrestricted public release would create an unacceptable risk of catastrophic misuse.
The guardrails are, in Anthropic's view, a design compromise: a way to give the public access to a state-of-the-art reasoning, coding, and writing model while putting a sandbox around its most dangerous potential capabilities. The company frames the over-flagging as the temporary cost of releasing a powerful model both “safely and quickly,” and has committed to refining the classifiers over time.
The release of Claude Fable 5 cannot be fully understood in isolation. It is one half of a two-tier deployment strategy that is becoming a new industry standard for frontier AI models.
On the same day Fable 5 was released, Anthropic also announced Claude Mythos 5. Both models share exactly the same underlying architecture and weights; they are the same “brain.” The only difference is the safety configuration: Mythos 5 ships without the classifiers in the sensitive domains, exposing the model's full, unrestricted capabilities.
However, Mythos 5 is not for the public. It is restricted to a small group of vetted partners, including government agencies and critical infrastructure operators, through an initiative called Project Glasswing. This U.S. government-backed program initially launched with 12 founding partners, including tech giants such as AWS, Google, and Microsoft, to let “cyber defenders” use AI to find and patch software vulnerabilities at scale. With the release of Mythos 5, access was expanded to approximately 40 organizations.
The table below illustrates the fundamental split:
Anthropic's Fable/Mythos split is the most explicit example of what can be called capability-tiered AI deployment. Under this approach, a single frontier model is not a single product: its full power is a privilege rather than a given, and the safety guardrails are the mechanism that creates product differentiation.
This pattern is not unique to Anthropic. Other leading AI companies, including OpenAI, have adopted similar approaches, providing restricted-access versions of their most advanced models to national security and research partners. The Fable/Mythos launch crystallizes a future in which the most powerful AI capabilities are gated not by technology but by vetting status, with safety protocols doubling as access-control mechanisms. That approach is already sparking a broader debate about centralization, fairness, and the true meaning of “public” AI safety.