Running Out of Tokens Was the Best Thing That Happened to This Dev

Late one night, a developer hit the token limit with Codex mid-session. Most people would reload their context dump and keep grinding. This developer, who later wrote about it in r/PromptEngineering, stopped and asked a different question: where is all that token budget actually going?

Turns out, most of it wasn’t the actual work. It was context reloading. That one realization turned into a full experiment, documented in a Medium article and a public GitHub repo, and shared with anyone dealing with the same problem.

🧠 Why Context Waste Is a Real Problem

If you’ve done any serious AI-assisted development, you’ve felt this. Every new session, the model starts fresh. You re-explain the architecture. You re-establish naming conventions. You summarize why you made that one weird decision in the auth module last week.

You’re essentially re-onboarding a developer every time you open a new chat. That’s expensive in tokens, slow in practice, and it introduces inconsistency when your summaries don’t perfectly match from session to session.

The core insight here is that context isn’t just overhead you tolerate. It’s something you can engineer. And at the prompt level, there’s more room to optimize than most people realize.

Think about it this way: if a human junior developer forgot everything you told them each morning and you had to re-explain the entire codebase before lunch, you’d fix the problem immediately. With AI, people just accept it as the cost of doing business. That’s the assumption worth challenging.

🔧 How the Context Engine Works

The author built the engine layer by layer, adding one piece at a time as each problem revealed the next. Here’s what the full system includes:

  • Persistent memory: stores architecture decisions, naming conventions, and key context across sessions so the model doesn’t start from zero
  • Context planning: proactively selects what’s relevant for each task instead of loading everything at once
  • Failure tracking: logs what went wrong and why, so the model doesn’t repeat the same mistakes
  • Task-specific memory: different tasks pull different context slices, keeping each session lean
  • Domain mods: modular context extensions for specific areas like UX, frontend, or backend work
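
To make the list above concrete, here is a minimal sketch of how persistent memory, failure tracking, and context planning could fit together. It is not the author's implementation: the `ContextStore` and `MemoryEntry` names, the tag-based matching, and the sample notes are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """One stored note; `tags` says which kinds of task it is relevant to."""
    text: str
    tags: set[str]


@dataclass
class ContextStore:
    """Hypothetical store combining persistent memory and failure tracking."""
    decisions: list[MemoryEntry] = field(default_factory=list)
    failures: list[MemoryEntry] = field(default_factory=list)

    def plan(self, task_tags: set[str]) -> str:
        """Context planning: keep only entries that share a tag with the task."""
        relevant = [e.text for e in self.decisions if e.tags & task_tags]
        avoid = [e.text for e in self.failures if e.tags & task_tags]
        lines = ["## Decisions", *relevant, "## Known failures", *avoid]
        return "\n".join(lines)


# Sample notes are invented for this example.
store = ContextStore()
store.decisions.append(
    MemoryEntry("Auth uses short-lived session tokens.", {"backend", "auth"})
)
store.decisions.append(MemoryEntry("Components use PascalCase names.", {"frontend"}))
store.failures.append(MemoryEntry("Caching user objects broke logout.", {"auth"}))

# Only the auth-related notes end up in the context for an auth task.
prompt_context = store.plan({"auth"})
print(prompt_context)
```

The point of the sketch is the shape, not the details: notes persist between sessions, and each task pulls only the slice that matches it.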

The original Medium article walks through every iteration, including the prompts used at each stage. The author is upfront that some of those early prompts are “a bit chaotic,” which honestly makes the writeup more useful, not less. You can watch the thinking evolve in real time instead of seeing a polished final version with the messy parts cleaned up.

The domain mods piece is particularly clever. Instead of one giant context blob that covers everything, you maintain smaller focused modules and mix them together based on what you’re actually doing. Working on a frontend bug? Load the UI module. Refactoring the API layer? Swap in the backend module. The total token cost stays predictable, and the model gets exactly what it needs rather than a wall of text it has to sort through.
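
The mix-and-match idea can be sketched in a few lines. This is an illustration under stated assumptions, not code from the author's repo: the `MODULES` contents, the `build_context` helper, and the rough four-characters-per-token estimate are all hypothetical stand-ins (a real setup would use the model's own tokenizer).

```python
# Hypothetical domain modules; in practice each could be a small markdown file.
MODULES = {
    "core": "Project overview and naming conventions.",
    "frontend": "UI component rules and styling notes.",
    "backend": "API layer structure and database notes.",
}


def estimate_tokens(text: str) -> int:
    """Very rough estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def build_context(module_names: list[str], budget: int) -> str:
    """Join the requested modules in order, skipping any that would exceed the budget."""
    parts: list[str] = []
    used = 0
    for name in module_names:
        cost = estimate_tokens(MODULES[name])
        if used + cost > budget:
            continue  # this module does not fit; leave it out
        parts.append(f"[{name}]\n{MODULES[name]}")
        used += cost
    return "\n\n".join(parts)


# Frontend bug: load the UI module. API refactor: swap in the backend module.
frontend_context = build_context(["core", "frontend"], budget=200)
backend_context = build_context(["core", "backend"], budget=200)
```

Because every module is charged against an explicit budget, the cost of a session is capped up front rather than discovered when the limit hits.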

By the end, the author says it stopped feeling like using an assistant and started feeling like working with a small dev team. That’s the shift context engineering can create when done well.

💡 Tips for Starting Your Own Context System

You don’t need to build the full engine to get value here. A few practical places to start:

  • Keep a project context file: a simple markdown document with architecture decisions and key notes, pasted at session start. Even this beats starting from nothing every time
  • Track failures explicitly: add a short “what didn’t work and why” section to your context. Low effort, and it genuinely reduces repeated errors
  • Scope context to the task: don’t load your whole project history for a small bug fix. A focused 200-token context often beats a bloated 2,000-token one
  • Start with your highest-friction area: wherever you re-explain the most is where your first domain mod should live
  • Version your context file: treat it like code. When a major architectural decision changes, update the file and commit it. Future you will be grateful.
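
For the first two tips, a small helper is enough to keep the failure notes in one place. The sketch below is only one possible shape: the `add_failure` function, the heading text, and the sample notes are assumptions, and it works on the file's text as a plain string so you can wire it to whatever file handling you prefer.

```python
FAILURES_HEADING = "## What didn't work"


def add_failure(context_md: str, what: str, why: str) -> str:
    """Return the context text with one more note under the failures heading."""
    note = f"- {what}: {why}"
    if FAILURES_HEADING not in context_md:
        # No failures section yet: append one at the end of the file.
        return f"{context_md.rstrip()}\n\n{FAILURES_HEADING}\n{note}\n"
    head, tail = context_md.split(FAILURES_HEADING, 1)
    # Put the new note directly under the heading, ahead of older notes.
    return f"{head}{FAILURES_HEADING}\n{note}{tail}"


# Sample context file content, invented for this example.
context = "# Project context\n\n## Architecture\n- API layer talks to one database.\n"
updated = add_failure(context, "Inline SQL in handlers", "hard to test")
print(updated)
```

Committing the updated text after each change gives you the version history the last tip asks for.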

One real risk worth watching: persistent memory can also lock in bad assumptions. If the model learned something wrong early, it’ll keep getting it wrong confidently, which is worse than getting it wrong randomly. The author uses failure tracking to address this, but it’s worth thinking through before you adopt the pattern in your own workflow. A quick audit of your context file every few weeks goes a long way.
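
One way to make that audit routine is to record when each note was last reviewed and flag the old ones. This is a sketch under assumptions: the `stale_entries` helper, the 21-day default (standing in for "every few weeks"), and the sample notes and dates are all invented for illustration.

```python
from datetime import date, timedelta


def stale_entries(
    entries: dict[str, date], today: date, max_age_days: int = 21
) -> list[str]:
    """Return the notes whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [note for note, reviewed in entries.items() if reviewed < cutoff]


# Each note maps to the date it was last checked against the real codebase.
last_reviewed = {
    "Auth uses short-lived session tokens.": date(2024, 1, 2),
    "Components use PascalCase names.": date(2024, 2, 20),
}

to_review = stale_entries(last_reviewed, today=date(2024, 3, 1))
print(to_review)
```

Anything the helper returns is a candidate for re-checking before it goes back into a prompt.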

🚀 See the Full Experiment

The original r/PromptEngineering thread includes links to the Medium article with the full iteration history and all the prompts, plus the GitHub repo to fork and adapt. If you’re burning your token budget on re-explaining your own project instead of building it, this one is worth reading.

Head over to the original Reddit discussion to find the links and see what the community is saying about it. The comments alone have some sharp variations on the approach that are worth stealing.

How context engineering via prompts turned Codex into my whole dev team — while cutting token waste
by u/Comfortable_Gas_3046 in PromptEngineering
