Fable 5 Debugging Score Dropped From 86.2 to 25.9, Explained

New data point that stopped me cold: one independent benchmark showed Fable 5’s debugging score crashing from 86.2 down to 25.9 after its redeployment. Refactoring dropped from 73.6 to 38.4 too.
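
To put those two drops on the same scale, here is a quick sketch of the arithmetic. It uses only the four scores quoted above; the `decline` helper is my own illustrative name, not anything from the benchmark itself.

```python
def decline(before: float, after: float) -> tuple[float, float]:
    """Return (points lost, percent lost relative to the starting score)."""
    points = before - after
    percent = points / before * 100
    return round(points, 1), round(percent, 1)


# Scores quoted above from the Bridgemind benchmark.
debugging = decline(86.2, 25.9)
refactoring = decline(73.6, 38.4)

print(f"Debugging: -{debugging[0]} points ({debugging[1]}% relative drop)")
print(f"Refactoring: -{refactoring[0]} points ({refactoring[1]}% relative drop)")
```

So debugging lost about 60 points, roughly a 70 percent relative decline, while refactoring lost a little under half its score.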

That number comes from a rundown by Matt Wolfe, the creator behind this week’s AI news breakdown. He’s not the one who ran that benchmark, though. He’s pointing to a test from the Bridgemind account on X, and he does a great job putting it all in context.

Here’s what’s actually going on. Fable 5 got pulled by the US government back in June over safety vulnerabilities, then redeployed on July 1st with tighter guardrails. The catch, per the official notes the creator cites: the new classifier flags harmless requests more often during routine coding. So when Fable can finish a task, it performs like before. It just gets blocked way more, and those blocked requests get bumped to an Opus model instead.
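
If it helps to picture the mechanism, here is a toy sketch of that fallback behavior. Everything in it is hypothetical: the model labels, the `route_request` function, and especially the keyword classifier, which is deliberately crude and is not how the real guardrail works. It only shows how a harmless coding request can trip a filter and land on a different model.

```python
from typing import Callable


def route_request(prompt: str, is_flagged: Callable[[str], bool]) -> str:
    """Return a label for the model that would handle the prompt.

    Flagged prompts go to the fallback model, everything else to the primary.
    Both labels are made up for this illustration.
    """
    return "opus-fallback" if is_flagged(prompt) else "fable-5"


def toy_classifier(prompt: str) -> bool:
    # Hypothetical over-eager rule: flag any prompt containing a "scary" substring.
    return any(word in prompt.lower() for word in ("exploit", "inject"))


# A routine refactoring request that happens to contain a flagged substring.
benign = "Fix the dependency injection bug in my service"
plain = "Rename this variable and add a unit test"

print(route_request(benign, toy_classifier))
print(route_request(plain, toy_classifier))
```

The first prompt is harmless, but "injection" matches the filter, so it gets bumped to the fallback. That is the false-positive pattern the notes describe, just in miniature.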

What I found refreshing is his honesty. He says in his own testing it’s felt pretty much the same, and he hasn’t hit a single block. So the panic and the real experience don’t fully line up.

Three things he actually built with the new Fable to prove it still ships:

🔹 A short-form video dashboard that writes scripts, fact-checks his claims, builds a teleprompter, and generates B-roll using Remotion behind the scenes.
🔹 A long-form B-roll tool where he pastes an article, highlights sections, and it turns those highlights into animated scrolling video clips.
🔹 BuseyBench, a benchmark site testing how well each AI model draws SVG images of Gary Busey using pure code. He admits he built it purely for fun, and it still doubles as a legit look at how coding models have improved since 2023.

A few sharp tips and pitfalls he flags:

🔹 Paid plans only keep Fable 5 access through July 7th. After that it moves to paid usage credits on top of your plan.
🔹 His move: hand Fable the big heavy projects now, then let cheaper models like Opus or GPT 5.5 do incremental cleanup later.
🔹 New cheaper options are landing fast. Claude Sonnet 5 runs at 2 dollars in, 10 dollars out, but Anthropic’s own system card admits it’s not at the capability frontier of Opus or Mythos.
🔹 GPT 5.6 (the Sol, Terra, and Luna variants) got previewed at roughly half Fable’s price, but it’s locked to trusted partners for now.
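
For a feel of what "2 dollars in, 10 dollars out" means on a real job, here is a rough cost sketch. I am assuming those prices are per million tokens, which is how model pricing is usually quoted but is not stated in the video recap. The token counts and the `job_cost` helper are made up for illustration.

```python
def job_cost(input_tokens: int, output_tokens: int,
             price_in: float, price_out: float) -> float:
    """Estimate the dollar cost of one job.

    Assumes prices are quoted in dollars per million tokens.
    """
    return (input_tokens / 1_000_000) * price_in + (output_tokens / 1_000_000) * price_out


# Sonnet 5 prices as quoted above; the token counts are hypothetical.
cleanup_pass = job_cost(input_tokens=1_000_000, output_tokens=200_000,
                        price_in=2.0, price_out=10.0)

print(f"Estimated cleanup pass cost: ${cleanup_pass:.2f}")
```

Under those assumptions, a cleanup pass that reads a million tokens and writes 200,000 comes out to about 4 dollars, which is why handing the polish work to a cheaper model adds up.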

He also raised one story I can’t stop thinking about. He points to reporting that OpenAI floated giving the US government a 5 percent stake, around 42.6 billion dollars. His worry, and I share it: if the government profits from these companies, who exactly keeps them regulated? That’s a genuine conflict of interest, and I’m glad he said it out loud.

Quick hits he ran through: Google’s Nano Banana 2 Light generates images in about 4 seconds, NotebookLM now makes 60-second vertical shorts, Cursor shipped an iOS app, and OpenAI is teasing Codex hardware on July 15th.

My honest reaction? The most useful lesson here isn’t the drama. It’s his workflow logic: use the powerful expensive model for the heavy lifting while you have it, then downgrade to cheaper models for the polish.

Want the full walkthrough of BuseyBench, the video dashboard, and every model update? Watch his complete breakdown for the details.
