Claude Code: Save Money & Boost Efficiency with AI Delegation

Here’s a practical trick for anyone working with Claude Code: stop micromanaging how the AI works, and let it use its own judgement instead. The tip comes from Simon Willison, who picked it up during a Fireside Chat he hosted with Cat Wu and Thariq Shihipar from the Claude Code team, plus a follow-up from Jesse Vincent. The payoff is real. Simon reports he’s “getting a ton of work done” while burning through his Fable allowance far more slowly than before.

This guide walks you through the idea and the exact setup. If you use Claude Code (or a similar agent), that’s all you need.

What you’ll learn

Why handing decisions to the AI often beats writing strict rules
How to make Claude delegate coding work to cheaper models automatically
The exact prompt Simon used, so you can copy it

Quick terms, explained

Fable / Opus: the top-tier, most capable Claude models. Powerful, but they cost the most to run.

Sonnet / Haiku: lower-power models. Sonnet handles solid implementation work. Haiku handles simple, mechanical edits. Both are cheaper.

Subagent: a smaller helper task the main AI spins up to do a specific job, on a model you choose.

Tokens: the units you’re billed on. Fewer tokens on the expensive model means lower cost.

Step 1: Let the AI judge, don’t dictate

The core lesson from the Claude Code team was about testing. You could write a rigid rule like: “only use automated testing for larger features, don’t update and run tests for small copy or design changes.” It works. But Simon reports the team said it’s better to just tell Fable to use its own judgement about when to write tests.

Why this matters: rigid rules break on edge cases you didn’t predict. The model already understands context. Give it the goal, not the checklist.

Step 2: Apply the same idea to cost

Jesse Vincent gave Simon a related tip aimed squarely at saving money: tell Fable to use other models for smaller tasks, applying its own judgement about which model to use. Instead of you deciding “this is a Haiku job,” the AI decides. The top model stays reserved for the thinking-heavy work.

Step 3: Use the exact prompt

Simon prompted Claude Code with this line. You can paste it as-is:

“For all coding tasks use your judgement to decide an appropriate lower power model and run that in a subagent”

That’s the whole instruction. Short and open-ended on purpose.

Step 4: Let Claude save it as a memory

After the prompt, Claude saved a memory file at ~/.claude/projects/name-of-project/memory/delegate-coding-to-subagents.md so the preference sticks across sessions. The key part of what it recorded:

“Why: cost/efficiency, implementation work rarely needs the top-tier model; judgment, review, and synthesis stay with the main loop.”

“How to apply: when a task in this project is primarily writing/editing code, spawn an Agent with a model override (sonnet for substantive implementation, haiku for trivial/mechanical edits) and a self-contained prompt; review the result in the main loop before committing. Design, auditing, data synthesis, and anything judgment-heavy stays in the main model.”

Read that split carefully. It’s the whole strategy in two sentences: cheap models write code, the expensive model reviews and decides.

Best practice: keep judgement work up top

Note what does NOT get delegated. Design, auditing, data synthesis, and any judgment-heavy work stays with the main, higher-power model. The subagent writes; the main loop reviews the result before anything gets committed. You’re not cutting quality, you’re cutting waste.

Why this matters right now

Simon flagged the timing: Fable prices are set to go up in the few days after he wrote this, so stretching your allowance has an immediate dollar value. Beyond the deadline, this points to a broader shift in how people work with agents. The best results come from setting intent and trusting the model to route the work, rather than scripting every decision yourself.

Next steps

Try the prompt in your own Claude Code project and watch your token usage over a few days.
Check whether a memory file was saved, and edit it if you want tighter or looser rules.
Extend the same “use your judgement” pattern to other repetitive decisions, like when to run tests.

For the full context and the original memory file, see Simon Willison’s writeup.

Read original article