Anthropic has apologized for quietly throttling its newest AI model, Claude Fable 5, with hidden guardrails that degraded answers for researchers and rivals alike without telling them. According to The Verge AI, the company is now reversing course and promising to be upfront about when its restrictions kick in, even if that means Fable refuses more queries outright. It is a notable climbdown, and it gets at a tension every frontier lab is wrestling with right now: how to ship a powerful model fast without breaking trust.
What actually happened
Fable is the first widely available model in Anthropic’s Mythos class, a group the company has spent months warning is too dangerous for open release. To manage that risk, Anthropic launched Fable with safeguards that block or alter responses to certain “high-risk” queries.
The problem was how one of those safeguards worked. The Verge AI reports that Anthropic targeted distillation, a technique for training smaller models on the outputs of larger ones. Instead of refusing suspected distillation attempts, Fable silently changed and degraded its answers. Users got worse responses and were never told why.
That detail sat in Fable’s system card, the public document labs publish to explain how a model behaves. There, Anthropic stated it would “handle queries it believed were distillation attempts by altering and degrading the model’s answers directly,” with no notification to the user.
Why researchers pushed back
The backlash was swift, and it’s easy to see why. A safeguard meant to stop competitors from copying Fable could also hit legitimate third parties trying to evaluate the model. If you can’t tell whether you’re getting Fable’s real answer or a deliberately worsened one, you can’t trust any benchmark, audit, or safety test you run on it.
Anthropic justified the targeting by pointing to its terms of service, noting that “using Claude to develop competing models already violates” those terms. The company has previously accused Chinese rivals like DeepSeek of distilling its models on an “industrial” scale, so the commercial motive here is real. But invisible degradation is a different beast from a clear refusal.
The fix
Anthropic laid out the change in a post on X. Here’s what’s different now:
- Suspected distillation queries fall back to Claude Opus 4.8, the company’s previous flagship, instead of being secretly degraded.
- Users get told every time. “You will see this every time it happens,” Anthropic wrote.
- This mirrors how Fable already handles other high-risk areas like biology, chemistry, and cybersecurity, where queries route through Opus 4.8 unless they’re blocked outright under rules covering drugs, weapons, or prohibited content.
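
For readers who think in code, the policy in the list above can be sketched roughly as follows. This is an illustration of the behavior as reported, not Anthropic’s implementation: the model identifiers, the `route` function, the three boolean flags, and the notice wording are all hypothetical, and the sketch says nothing about how a query would actually be classified.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ANSWER = "answer"      # the primary model answers normally
    FALLBACK = "fallback"  # the previous flagship answers instead
    BLOCK = "block"        # the query is refused outright


@dataclass
class Decision:
    action: Action
    model: Optional[str]   # which model answers; None when blocked
    notice: Optional[str]  # user-visible explanation; None when no safeguard fired


# Hypothetical identifiers, used only for illustration.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


def route(prohibited: bool, high_risk: bool, suspected_distillation: bool) -> Decision:
    """Pick a visible outcome for a query, given assumed classifier flags."""
    if prohibited:
        return Decision(Action.BLOCK, None, "Blocked under prohibited-content rules.")
    if high_risk or suspected_distillation:
        reason = "suspected distillation" if suspected_distillation else "high-risk topic"
        # Assumption: every fallback carries a notice, so nothing is degraded silently.
        return Decision(Action.FALLBACK, FALLBACK_MODEL, f"Answered by fallback model: {reason}.")
    return Decision(Action.ANSWER, PRIMARY_MODEL, None)
```

The property that matters is that every path other than a normal answer carries a notice, which is the difference between the new behavior and the silent degradation described in the system card.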
The company was blunt about the original call. “Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason, and that was the wrong tradeoff,” Anthropic wrote. “You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.”
Why it matters
What stands out here is the admission that speed won out over transparency, and that the company now sees that as a mistake. As Anthropic put it, “Visible safeguards can be probed, so they have to be robust, which takes time to get right.” Hidden ones are faster to ship. That’s the whole tradeoff.
There’s a practical catch worth flagging. The Verge AI notes that in some areas, notably biology, Fable’s safeguards are tuned so broadly that the model is “practically unusable for even basic queries,” something an Anthropic spokesperson acknowledged. So more visibility doesn’t automatically mean a more usable model.
If you build on or evaluate frontier models, the lesson is concrete: read the system card, and treat undocumented response degradation as a real risk when you benchmark. Expect more labs to face the same transparency question as Mythos-class systems move toward release. Full details are in the original report from The Verge AI.