Reduce AI Agent Context Bloat: Speed Up Workflows

A developer sat down on a Sunday with one goal: figure out why his AI agents felt slow. Not slow in the sense of latency. Slow in the sense that he kept watching them churn through file after file before writing a single line of code. Six months of scaffolding had quietly turned into a monument to planning.

🤖 The Setup That Grew Too Big

Agentic workflows have a natural gravitational pull toward complexity. You add a pre-commit hook here, an oversized orchestration script there, a CONTRIBUTING.md that nobody asked for. Each addition makes sense in isolation. Together, they create something that reads more than it does.

This developer’s agents were burning tokens on documentation. Every session started with the agent loading context it rarely needed. Think of it like a contractor who memorizes the full building code before hammering a single nail. Useful once, but you don’t need to reread it before every task. The agent was studying for a test it had already passed, over and over.

The real cost is not just tokens. It is the cognitive overhead of maintaining all that structure. Every new file you add to the context is a file you need to keep accurate, keep updated, and keep consistent with everything else. That is invisible maintenance debt, and it compounds quietly until one Sunday you notice nothing is actually moving.

🔍 What the Audit Actually Found

He audited everything manually. Not with a script, not with a dashboard, just him and a delete key on a quiet Sunday afternoon.

The list of what went: pre-commit hooks that blocked more than they caught; a 48KB orchestration script that handled edge cases no one had ever seen in production; a sprawl of markdown files, consolidated down to two; and contract tests so brittle they broke on whitespace changes. Skills that had loaded on every session, needed or not, were converted to lazy loading, so they pull in their context only when invoked.

The markdown consolidation alone was revealing. He had seven files that all described variations of the same agent behavior. Each one had been written at a different point in the project’s life, so they contradicted each other in small ways. The agent was not just loading extra context. It was loading conflicting context and quietly averaging it out.

He went to bed. Monday morning he shipped to three projects in parallel inside six hours.

⚙️ How to Run the Same Cleanup

Start with context load order. Open a fresh session and watch what your agent reads before it writes anything. That reading list is your audit target.
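If you want numbers to go with that reading list, a small script can rank the files by size. This is a minimal sketch in Python. It assumes you can list the startup files yourself (the paths below are hypothetical) and uses the common rough heuristic of about four characters per token. Real tokenizers will differ, and file size in bytes only approximates character count.

```python
from pathlib import Path

# Rough heuristic: English prose averages about 4 characters per token.
# This is an approximation, not any real tokenizer's count.
CHARS_PER_TOKEN = 4


def audit_context(paths: list[str]) -> list[tuple[str, int, int]]:
    """Return (path, size_in_bytes, approx_tokens) for each file, largest first."""
    rows = []
    for p in paths:
        path = Path(p)
        if not path.is_file():
            continue  # skip entries that no longer exist
        size = path.stat().st_size  # bytes, used as a proxy for characters
        rows.append((str(path), size, size // CHARS_PER_TOKEN))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical startup list: replace with the files you saw your agent read.
    startup_files = ["AGENTS.md", "CONTRIBUTING.md", "docs/workflow.md"]
    report = audit_context(startup_files)
    for path, size, tokens in report:
        print(f"{tokens:>8} tokens  {size:>8} B  {path}")
    print("total ~", sum(row[2] for row in report), "tokens")
```

The largest files at the top of the report are the first candidates for the question in the next step.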

Ask a simple question about each file: does the agent need this on every single task, or only on specific ones? If the answer is “only specific ones,” make it lazy. Reference it by name in your main config so it can be pulled when relevant, but don’t force it into the starting context.
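Here is one way to picture the lazy pattern. The starting context holds only a one-line summary per skill, and the full file is read the first time the agent actually calls for it. The sketch below is a hypothetical `SkillRegistry` in Python, not any particular agent framework's API. The skill names, paths, and summaries are made up for illustration.

```python
from pathlib import Path


class SkillRegistry:
    """Hypothetical lazy skill loader: only names and one-line summaries
    go into the starting context; full files are read on demand."""

    def __init__(self, index: dict[str, tuple[str, str]]):
        # index maps skill name -> (path to the full skill file, one-line summary)
        self._index = index
        self._cache: dict[str, str] = {}

    def startup_context(self) -> str:
        # The only text that is always loaded: one line per skill, sorted by name.
        return "\n".join(
            f"- {name}: {summary}"
            for name, (_, summary) in sorted(self._index.items())
        )

    def load(self, name: str) -> str:
        # Read the full skill file the first time it is requested, then cache it.
        if name not in self._cache:
            path, _ = self._index[name]
            self._cache[name] = Path(path).read_text(encoding="utf-8")
        return self._cache[name]

    def loaded(self) -> list[str]:
        # Names of the skills whose full text has actually been pulled in.
        return sorted(self._cache)
```

In most real setups the "registry" is simply the main config file listing skill names and where to find them. The code only makes the load-on-demand idea concrete: a skill whose file is never requested costs one summary line and nothing more.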

Consolidate documentation aggressively. If you have five markdown files covering related ground, combine them into one. Agents do not get bonus points for reading more files. They just spend more tokens.
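A first pass at consolidation can be mechanical. Merge the files into one, keep each source as a section, and drop paragraphs that already appeared verbatim in an earlier file. This Python sketch does exactly that and nothing more. It will not catch the subtler problem from the audit, where files contradict each other in small ways. Those still need a human read before you delete anything.

```python
def consolidate(docs: dict[str, str]) -> str:
    """Merge several markdown documents into one, keeping each source as a
    section and dropping paragraphs already seen verbatim in an earlier file."""
    seen: set[str] = set()
    sections = []
    for name, text in docs.items():
        kept = []
        for para in text.split("\n\n"):
            key = " ".join(para.split())  # normalize whitespace before comparing
            if key and key not in seen:
                seen.add(key)
                kept.append(para.strip())
        if kept:  # a file that was entirely duplicates gets no section at all
            sections.append(f"## {name}\n\n" + "\n\n".join(kept))
    return "\n\n".join(sections) + "\n"
```

The order of `docs` matters. Paragraphs survive in the first file where they appear, so list your most current file first.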

Delete the hooks and scripts that exist for theoretical problems. Pre-commit hooks that run heavy checks on every commit slow the feedback loop. Contract tests that map to behavior that never actually breaks are noise. Cut them and see what breaks in production. Usually nothing does.

Count your markdown files. If the number is above ten and the project is a personal workflow, you have a documentation problem masquerading as an organization system. A good rule of thumb: if you cannot describe what every file does in one sentence from memory, the file is probably not earning its place in the context window.
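A few lines of Python will put a number on the markdown count. This sketch assumes the threshold of ten from the rule of thumb above. It also skips a hypothetical set of vendored directories (`.git`, `node_modules`, `.venv`) so that third-party READMEs do not inflate the count.

```python
from pathlib import Path

MAX_MARKDOWN_FILES = 10  # rule-of-thumb threshold for a personal workflow
SKIP_DIRS = {".git", "node_modules", ".venv"}  # assumption: vendored dirs don't count


def markdown_files(root: str) -> list[Path]:
    """All *.md files under root, ignoring anything inside SKIP_DIRS."""
    base = Path(root)
    return sorted(
        p for p in base.rglob("*.md")
        if not SKIP_DIRS.intersection(p.relative_to(base).parts)
    )


def markdown_report(root: str) -> tuple[int, bool]:
    """Return (file count, whether the count exceeds the threshold)."""
    count = len(markdown_files(root))
    return count, count > MAX_MARKDOWN_FILES


if __name__ == "__main__":
    count, too_many = markdown_report(".")
    print(f"{count} markdown files" + (" (consider consolidating)" if too_many else ""))
```

The script only counts files. The one-sentence test from memory is still the real check.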

💡 Tips Worth Keeping

The goal is not to have zero structure. It is to have structure that earns its place by saving more tokens than it costs.

Lazy loading is the highest-leverage change in this whole list. An agent that loads twenty skill files on startup is paying a tax on every single run. Converting those to on-demand references costs about thirty minutes of refactoring and pays back immediately.
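The tax is easy to underestimate because it is paid on every run. Here is a back-of-the-envelope comparison in Python. Every number in it is a hypothetical placeholder, not a measurement from the original post: twenty skills, 1,500 tokens per skill file, 25-token summaries, two skills actually used per run, and 50 runs.

```python
def eager_cost(num_skills: int, tokens_per_skill: int) -> int:
    # Every skill file is loaded in full on every run.
    return num_skills * tokens_per_skill


def lazy_cost(num_skills: int, tokens_per_skill: int,
              summary_tokens: int, skills_used: int) -> int:
    # One-line summaries are always loaded; full files only for skills actually used.
    return num_skills * summary_tokens + skills_used * tokens_per_skill


# Hypothetical numbers for illustration only.
runs = 50
eager = eager_cost(20, 1_500)
lazy = lazy_cost(20, 1_500, summary_tokens=25, skills_used=2)
print(f"eager: {eager:,} tokens/run, {eager * runs:,} over {runs} runs")
print(f"lazy:  {lazy:,} tokens/run, {lazy * runs:,} over {runs} runs")
```

With those placeholder numbers, eager loading costs 30,000 tokens per run and lazy loading costs 3,500. Over 50 runs that is 1,500,000 tokens against 175,000. Substitute your own file sizes to see where your setup lands.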

After you cut something, run your actual workload for a week before adding anything back. The things you genuinely miss will be obvious. The things you thought you needed will turn out to be infrastructure for a problem you solved differently months ago.

Set a size limit on any single orchestration file. 10KB is a reasonable ceiling. If a script that coordinates agent behavior is longer than that, it is probably doing something an agent could figure out from a clear instruction instead. A long orchestration script is often a sign that you stopped trusting the model and started writing a second model in plain text.
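The ceiling is easy to check automatically. Here is a minimal Python sketch. It assumes "10KB" means 10 × 1,024 bytes, and it expects you to pass in whichever files you treat as orchestration scripts. The `scripts/orchestrate.py` path is hypothetical.

```python
from pathlib import Path

LIMIT_BYTES = 10 * 1024  # assumption: "10KB" read as 10 KiB


def oversized(paths: list[str], limit: int = LIMIT_BYTES) -> list[tuple[str, int]]:
    """Return (path, size) for every existing file larger than the limit."""
    hits = []
    for p in paths:
        path = Path(p)
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                hits.append((p, size))
    return hits


if __name__ == "__main__":
    for path, size in oversized(["scripts/orchestrate.py"]):
        print(f"{path}: {size / 1024:.1f} KiB, over the {LIMIT_BYTES // 1024} KiB ceiling")
```

Consider running this as an occasional manual check rather than a blocking pre-commit hook. That fits the advice above about hooks that slow the feedback loop.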

Keep a short changelog of what you removed and why. Not for the agent, for yourself. Three months from now, when someone suggests adding back a pre-commit hook, you will want the note that says “removed because it blocked 40 commits and caught zero real bugs.”
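The changelog can be a plain text file that lives outside the agent's context. This Python sketch appends one dated line per removal. The `REMOVED.md` filename and the line format are assumptions, not anything the original author describes.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def log_removal(log_path: str, item: str, reason: str,
                when: Optional[date] = None) -> str:
    """Append one dated line to a human-readable removal log and return it."""
    when = when or date.today()
    entry = f"- {when.isoformat()}: removed {item} ({reason})\n"
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

For example, `log_removal("REMOVED.md", "pre-commit hook", "blocked 40 commits and caught zero real bugs")` records the kind of note the paragraph above describes. Keep the file out of any list of startup context so the agent never pays to read it.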

🚀 Try It This Week

Pick one AI workflow you use regularly and look at what loads before the first output appears. Find one file that is not earning its tokens. Delete it or make it lazy. Ship something afterward and notice whether anything breaks.

The ultimate agentic setup does not exist. The one that ships fastest usually does.

Source: “Token efficiency | Project bloat | How I reduced token usage without plugins/mcps” by u/_KryptonytE_ in r/PromptEngineering
