Auditing Your Claude Code Setup Against a 10-Level Framework (Honest Results)

Most Claude Code users sit somewhere between “ChatGPT in a terminal” and a fully autonomous agent infrastructure. Knowing which side of that line you’re on matters, because the gap determines whether you’re getting 10x value or 100x. The distinction is not about how many features you’ve enabled. It’s about whether your setup compounds over time or resets with every new session.

Jonathan Malkin ran his production setup (30+ skills, hooks as middleware, subagent orchestration, VPS running 24/7) against darnoux’s 10-level Claude Code mastery framework, level by level, with honest results. The breakdown is worth studying regardless of where you land.

The Criteria That Actually Matter

Levels 0-4 are about utility. You add context (CLAUDE.md), connect live data (MCP servers), build reusable workflows (slash commands), and set up memory that persists corrections across sessions. Most advanced users land somewhere in this range. The work at these levels is mostly additive: each addition makes Claude more capable without requiring you to think differently about how it operates.

Level 5 is the real inflection point. Below it, you’re making Claude more useful. At 5 and above, you’re making it safe enough to give real autonomy to. That’s a different kind of problem, and it requires thinking like a system architect, not a power user. The questions change from “how do I make this better?” to “what breaks when I’m not watching?” If you can’t answer the second question, you’re not ready to hand over the keys.

Power User vs. Systems Architect

Here’s where the two setups actually diverge:

Power user (Levels 0-4):

  • CLAUDE.md with project context and identity
  • MCP servers for live data (filesystem, APIs, browser)
  • Custom slash commands for repeated tasks
  • Memory that saves corrections across sessions

⚙️ Systems architect (Levels 5-10):

  • Hooks as middleware: safety guards, context compression, date injection running on every command
  • Subagent routing by model: Haiku for research, Sonnet for synthesis, Opus for decisions requiring depth
  • Structured decision cards instead of vague approval gates
  • Headless claude -p pipelines for async processing (tweet scheduler, email triage, morning orchestrator)
  • Always-on cron infrastructure: VPS, scheduled jobs, daemon processes
  • Lead agent orchestration with documented failure handling

The difference is not complexity for its own sake. It’s whether Claude can operate reliably without you in the loop. A power user optimizes for speed during sessions. A systems architect optimizes for reliability between sessions. Both are legitimate goals, but conflating them leads to fragile setups that feel powerful until they silently fail at 2am with no one watching.

The Recommendation

Don’t treat the framework as a ladder to climb. Treat it as a diagnostic tool.

One thing Malkin is honest about: he didn’t build level by level. He built top-down. Foundation first (CLAUDE.md, identity, context), then skills, then infrastructure. The requirements pulled him up the levels. Architecture informed implementation, not the other way around.

That’s the practical advice worth keeping. Build what your specific workflow actually needs, and let the requirements dictate where you go next. Skipping ahead to cron infrastructure before your CLAUDE.md actually captures your working style is a reliable way to automate noise instead of value.

Where to Start, Based on Where You Are

  1. If you’re at Level 3-4 and Claude still needs constant steering: The bottleneck is almost certainly your CLAUDE.md and memory setup. Add a feedback memory category if you don’t have one. It’s where compounding actually happens: corrections get saved with the why, so you stop making the same correction twice across sessions. Without the why, a rule is just a rule. With it, Claude can apply judgment to edge cases instead of blindly following instructions it doesn’t understand.
  2. If you’re at Level 5 and want real autonomy: The next investment is hooks and model routing. A safety hook that intercepts destructive operations before they run is the highest-leverage single addition at this stage. Even a simple hook that requires confirmation before any file deletion or git reset changes the risk profile of your entire setup.
  3. If you’re building toward always-on infrastructure: Browser automation must run in isolated subagents, not inline. Screenshots are base64 context bombs. Find that out in planning, not mid-session when quality degrades and you lose state. The same principle applies to any tool that produces large outputs: isolate it, summarize the results, and pass only what the lead agent needs.
  4. One gotcha worth documenting now: Haiku agents complete work but can fail to communicate results back to a lead agent. Anything that needs to report results up the chain has to run on Sonnet or Opus. Put that in your CLAUDE.md before the next session rediscovers it. Small model routing decisions like this have outsized effects on whether your orchestration actually holds together under load.

The Setup References

darnoux’s framework is worth reading straight through. Jules on GitHub (skills, hooks, memory, cron setup, agent patterns) is the reference implementation to study if you’re headed toward Levels 6-10. The value of studying a real production setup is that you see what got built and what got skipped, which tells you as much as the architecture itself.

Where does your setup land? The Level 5 to Level 6 jump is where the most interesting infrastructure decisions happen. That’s the one worth working through.

Frequently Asked Questions

Q: Should I prioritize MCP servers over custom skills?

Both are complementary, not competitive. MCP servers provide the data foundation (filesystem, APIs, live context), while skills encode workflows and decision logic. The framework places MCP at Level 2 and skills at Level 3 because you need the data first, but skills, especially those coordinating multiple tools, often deliver more immediate productivity wins once that foundation exists.

Q: Do I need 30+ skills to be advanced?

No. The author built incrementally over 3 months, starting with high-friction workflows. Focus on 3-5 core skills that automate your most-repeated decision points, like /scope (validate requirements before coding) or /systematic-debugging (force the right diagnostic sequence). The number scales naturally as you discover more patterns worth encoding.

Q: How detailed should my CLAUDE.md be?

Start simple, even 1-2 profiles (identity + business context) unlock most gains. Depth compounds over time; the 6-level structure here evolved gradually. The key isn’t volume but consistency: the more accurate your context, the less steering Claude needs and the more it just executes your intent.

Q: What’s the highest-ROI first skill to build?

Pick a workflow you execute frequently where clarity prevents mistakes, like pre-code validation, structured debugging, or session wrap-up. These skills save time and prevent errors simultaneously, making them more valuable than convenience scripts. Start with whichever friction point interrupts your flow most often.

How Jules, my Claude Code setup, stacks up against @darnoux’s 10-level mastery framework.
by u/jonathanmalkin in PromptEngineering

Scroll to Top