Claude Code v0.5.4: Model Routing Saves Token Costs

Yesterday a small plugin update quietly shipped for Claude Code, and the new feature buried in the changelog is the part that actually matters. The version bump is minor. The idea behind step 2 is not.

What’s new in v0.5.4

The Claude Code Prompt Improver plugin (built by severity1, now at 1.5K stars on GitHub) works through a UserPromptSubmit hook. Clear prompts pass straight through. Vague prompts trigger a skill that reads the codebase and asks 1-6 clarifying questions before anything runs. Cost is around 189 tokens per prompt, and the hook only loads when a prompt is actually vague, so you’re not paying for it on every keystroke.

To put that in perspective: a “vague” prompt is something like “fix the auth flow” or “make the dashboard faster.” The hook catches those, scans your codebase for relevant context, and asks targeted questions: Which auth flow? Which part of the dashboard? Are you optimizing for first load or interaction latency? That back-and-forth costs 189 tokens total, which at current Claude pricing rounds to a fraction of a cent. The alternative, letting a confused model make assumptions and run for five minutes in the wrong direction, costs orders of magnitude more in both tokens and your time. The clarifying pass pays for itself on the first intercept.

Version 0.5.4 adds a second hook that fires specifically when a multi-agent or dynamic workflow run is requested. This is where it gets interesting.

The twist: model routing

When a multi-agent run kicks off, the new hook injects routing guidance automatically. The idea is straightforward but most people figure it out only after they’ve already burned through their token budget: your session model (the smart, expensive one) should handle planning, strategy, and orchestration. The actual implementation work gets routed to a smaller, cheaper model. The plugin also enforces a plan-first step, meaning the agent shows you the plan for human review before it starts executing anything.

Here is what this looks like in practice. Say you ask Claude to build a data pipeline with five stages. Without routing, Opus or Sonnet handles every step: writing the boilerplate ingestion code, the repetitive transformation functions, the standard error handling wrappers. All of that at full model pricing. With routing, the orchestrator thinks through the architecture, decides what each stage should do, identifies the edge cases, and writes the plan. Then it hands off the actual code generation for each stage to a smaller, faster model. The expensive thinking happens once. The cheap execution scales out.

The plan-first enforcement is the part that turns this from a cost trick into a quality improvement. Before any code runs, you see the full execution plan: which tasks go to which model, in what order, with what dependencies. You can catch a wrong assumption before the agent spends twenty minutes acting on it. That review step costs you thirty seconds and saves you from the painful experience of watching a multi-agent run confidently build the wrong thing.

Community reaction in the thread was mostly people nodding and saying they wish they’d known this earlier. The clarifying-questions feature is useful. The routing pattern is the real thing worth paying attention to.

How to get it running

⚙️ Add the marketplace: claude plugin marketplace add severity1/severity1-marketplace (this registers the source; without it, the install command in step 2 will fail with a “marketplace not found” error)
📦 Install the plugin: claude plugin install prompt-improver@severity1-marketplace
🧠 On your next multi-agent run, the routing hook fires automatically and injects the model guidance before execution starts. You do not need to configure anything manually, the hook reads the run type and applies the right guidance.
👀 Review the plan the agent surfaces before it runs, then approve or adjust. This is not optional overhead, it is the part that makes the whole system work.
✅ Watch your expensive model focus on decisions, not boilerplate implementation

One note on versions: the plugin requires Claude Code 0.9.0 or later. If the marketplace command returns an error, check your Claude Code version first with claude --version and update if needed.

Pro tip

If you’re already running multi-agent workflows in Claude Code, add this today and compare your token spend before and after on a similar task. The routing guidance alone is worth the five-minute install.

For a sharper measurement, run the same workflow twice on a contained task, once without the plugin and once with it, and pull the token counts from your usage logs. Most people who do this find the savings are front-loaded: the biggest reductions come from the first few workflow runs, where the model was previously doing expensive reasoning on work that a smaller model handles fine. After that, savings stabilize at somewhere between 30 and 60 percent depending on how implementation-heavy your workflows are. Worth knowing before you assume it stopped working.

Full details and source are on GitHub: github.com/severity1/claude-code-prompt-improver

Frequently Asked Questions

Q: When should I use model routing in my agent workflows?

Reserve your session model for planning, strategy, and orchestration. Route implementation and specific tasks to cheaper models, basically, don’t run your expensive model on every step. Most people discover this after burning through tokens on multi-agent runs unnecessarily.

Q: Will this slow down my clear, well-written prompts?

Nope. Clear prompts pass straight through without triggering anything. You only hit the ~189-token overhead if a prompt looks vague and needs clarification, cost only when you actually need it.

Q: Is this just another prompt-engineering tool?

Not really. The prompt-improver handles vague prompts, but v0.5.4’s real win is the model-routing guidance for workflows, it’s an architectural principle for cost management, not fancy adjective tweaking.

Q: How much can I realistically save with correct model routing?

Depends on how much you spawn agents, but switching implementation work from expensive models to Haiku adds up fast. Even small multi-agent runs compound costs quickly, this pattern catches that before you notice the damage.

Claude Code Prompt Improver v0.5.4 – workflow routing guidance
by u/crystalpeaks25 in PromptEngineering

What’s new in v0.5.4

The twist: model routing

How to get it running

Pro tip

Frequently Asked Questions

Related: