Claude Code's API Exposed: Inside the Prompt System

Someone set up a local proxy, intercepted every raw request Claude Code sends to Anthropic’s API, and documented the full structure. The finding? There’s nothing magical happening. It’s a context window management system, top to bottom. 🔍

Think about what that actually means. A tool that writes code, runs agents, manages memory, and orchestrates multi-step workflows is doing all of that through the same mechanism you use when you type a message into the chat. The “intelligence” is real. The infrastructure is not exotic. One engineer with a proxy and some patience exposed the entire operating model in an afternoon.

Claude Code looks sophisticated from the outside. Tools, memory, agents, skills. Watching it work, you might assume there’s some proprietary runtime underneath, some special protocol between client and model. But when you crack open the actual API payloads, you keep seeing the same thing: plain text, carefully ordered, loaded just-in-time. The sophistication is in the sequencing, not the substrate.

This matters beyond trivia. If you understand that Claude Code is fundamentally a prompt orchestration system, you understand why some sessions feel sharp and others feel scattered. You understand why a well-written CLAUDE.md outperforms a bloated one. You understand why context hygiene is not a nice-to-have but a core performance lever. The model doesn’t know what you haven’t told it, and it forgets what got pushed out of the window.

Your CLAUDE.md file gets injected as raw text before your first message, wrapped in system-reminder tags. Skills load as tool results when you invoke them. Memories? Same deal. No special API fields. No proprietary magic. Just well-structured prompts. Every capability you interact with in Claude Code is ultimately a piece of text that arrived in the context window at the right moment, in the right order, formatted clearly enough for the model to act on it. That’s the whole system.

Which means the quality of your setup is the quality of your prompts. A vague CLAUDE.md produces a vague agent. A precise one produces a precise one. There’s no fallback layer doing interpretation for you.

Three things worth knowing:

🚀 Deferred tools cut startup tokens by ~85%, Claude Code sends tool names only on launch, then fetches full schemas on-demand. That’s how it handles 50+ tools without blowing the context window before you type a single word. The practical implication: tools you never invoke cost you almost nothing. The system is lazy-loading by design, which means you can register a large tool library without paying an upfront token tax on every session. If you’re building your own agentic setup, this pattern is worth stealing. Register everything, load only what’s needed, load it at the moment it’s needed.
📋 Skills are just markdown files, Invoke /code-review and Claude calls a Skill tool, gets back a markdown document with instructions, and follows them. That’s the entire mechanic. Writing a good skill means writing a good prompt. This is liberating if you’ve been treating “agentic frameworks” as something requiring engineering infrastructure. A skill is a text document. You can version it, share it, iterate on it in a text editor. The barrier to building genuinely useful agent behaviors is much lower than most people assume. The bottleneck is always prompt quality, not tooling complexity.
🔄 Compaction is a controlled reset, not data loss, When a long session collapses into a summary block, CLAUDE.md and deferred tools survive and get re-injected. The system is designed to protect the important context across resets. Understanding this changes how you structure your persistent instructions. Anything you need to survive a compaction event belongs in CLAUDE.md or a memory file, not in a message you typed three hours ago. The session transcript is ephemeral. The injected files are not. Design accordingly.

The bigger takeaway: if you’re building anything agentic, prompt structure IS system architecture. There’s no layer below the text. Every decision about what gets loaded, when it gets loaded, and how it gets formatted is a structural decision with real consequences for how the system performs. Engineers who treat prompts as configuration rather than as code tend to build more reliable agents, because they stop looking for a magic layer that doesn’t exist and start working with the actual mechanics.

Most people using Claude Code daily have never thought about what’s in the API payload. They don’t need to. But the ones building on top of it, designing agents, writing skills, structuring memory, those people benefit enormously from knowing exactly what the model sees and when it sees it.

The engineer who captured these payloads posted a full breakdown with actual request examples. Linked below. Worth a read if you want to see exactly what’s running under the hood.

I intercepted Claude Code’s API calls and broke down exactly what it sends — here’s what I found
by u/Funny_Prior7225 in PromptEngineering

Related: