AGENTS.md Framework for AI Agents That Code Better

A developer on r/PromptEngineering got tired of AI agents that just write code without real understanding. So they built an AGENTS.md file that gives the agent a reasoning topology before it generates a single line. The twist: the whole framework is built around making the agent slower on purpose.

Not slower in a frustrating way. Slower in a “does this actually make sense before we ship it” way. The difference is huge in practice. Most AI-generated code doesn’t fail because the model is dumb. It fails because the agent skipped the thinking step entirely and went straight to typing. This framework puts the thinking step back in.

What’s in it

The AGENTS.md framework works as a thinking protocol. Before the agent writes anything, it runs through an ambiguity detection loop. Vague request? Full interrogation. Clear and specific? Quick verify and proceed. Trivial change (add a tooltip, fix a typo)? Skip everything and just do it.

The file itself is surprisingly short. It’s not a massive system prompt with hundreds of rules. It’s a structured decision tree that forces classification before action. The agent reads the task, classifies it by complexity, and then routes it through the appropriate workflow. Vague tasks go into a clarification loop where the agent asks targeted questions until the scope is unambiguous. Complex but clear tasks go through the four invariant questions. Simple tasks skip the whole thing and execute directly. That classification step alone is worth the price of admission, because most agents don’t classify at all. They just start generating.

The part that’s actually clever

The agent has to answer four invariant questions before touching non-trivial code:

Where does state live?
Where does feedback live?
What breaks if I delete this?
When does timing work?

If it can’t answer those clearly, it flags the gap and waits. That’s the feature. Most people prompt their agents to go faster. This one slows down on the things that matter.

Think about what those four questions actually cover. State: is this data in memory, a database, a global store, a cookie? Getting that wrong means bugs that only show up in production. Feedback: where does the user or system see the result? Getting that wrong means features that work technically but confuse everyone using them. Deletion risk: what downstream dependency quietly relies on the thing you’re about to change? Getting that wrong means you break something three components away from where you’re working. Timing: is this synchronous, async, event-driven? Getting that wrong means race conditions that are nearly impossible to reproduce in a dev environment.

These aren’t abstract software engineering principles. They’re the exact questions a senior engineer asks before reviewing a PR. The AGENTS.md framework bakes them into the agent’s default behavior before the code even gets written. That’s the architectural shift here. You’re not hoping the model thinks carefully. You’re requiring it.

How to use it

🔹 Grab the AGENTS.md gist from u/Educational_Yam3766’s GitHub Gist profile. Search the username directly on gist.github.com and look for the most recent public gist with “AGENTS” in the title
🔹 Drop it into your project root or paste it into your agent’s system prompt. If you’re using Cursor, Windsurf, or Aider, the project root placement works out of the box. Raw Claude or GPT session? Paste it as the first block in your system prompt
🔹 Start a task, let the agent run ambiguity detection, and answer its calibrated questions. Don’t rush this part. The interrogation questions it asks are the framework doing exactly what it’s supposed to do. If it asks you three questions and you answer all three clearly, the subsequent code will be dramatically better than what you’d get from “just write me a function that does X”
🔹 Say “ship it” when you’re aligned and it proceeds with any deferred risks explicitly flagged. That deferred risks list is genuinely useful on its own. It’s not just code comments. It’s a structured handoff of the things the agent flagged as uncertain but proceeded anyway. Review it, turn it into issues, or address it in the next task

Pro tip

The Trivial Changes Rule is underrated here. The friction loop only activates on complex, non-obvious requests. Small stuff like typos, tooltips, and variable renames moves fast. That’s what makes this practical for real-world use instead of just interesting in theory.

There’s a secondary move worth knowing: the ambiguity threshold is tunable. The framework as written treats anything without a clear scope as vague. But if your team works with well-documented specs or a tight ticket system, you can modify the ambiguity detection criteria to match your actual workflow. The AGENTS.md file is just text. Edit it. Make it work for the way you build, not the way the original author builds. A few lines of adjustment can cut the clarification overhead by half without losing any of the reasoning quality.

Call to action

If your AI agent keeps shipping code that technically runs but makes no architectural sense, this is worth 10 minutes of your time. Grab the gist and drop it into your next project. 🚀

The gap between an agent that writes code and an agent that understands what it’s writing is mostly a prompting problem. This framework closes a big chunk of that gap without requiring you to build anything new. Ten minutes of setup, one file in your project root, and your agent starts asking the questions it should have been asking all along.

Frequently Asked Questions

Q: Do you enforce verification every time, or skip it for small changes?

You still verify everything, but small changes get fast verification. A typo fix gets a quick read-through. A rename gets a quick grep check. It’s not about skipping verification; it’s about matching rigor to risk. Obvious changes get obvious verification.

Q: How do you quickly tell if something is low-ambiguity or high-ambiguity?

Look for specificity. “Change X to Y” is clear (low ambiguity). “Help me with state management” is vague (high ambiguity). “Add a loading spinner here” is medium (scope is clear, but design choices exist). If you can immediately picture what done looks like, move forward. If you can’t, ask questions.

Q: What does “state ownership” mean in practice?

It’s about answering: who owns this data? If Component A stores the cart, changes go through A, not direct mutations elsewhere. This prevents the classic bug: “I updated the database, but the UI shows stale data because the cache wasn’t cleared.” Always name the owner when you design.

Q: When do you resolve tensions versus deferring them?

Resolve now if it’s high-impact or expensive to change later (like picking an async pattern). Defer if it’s low-cost to revisit (like performance tweaks). Both are valid, execute means you’ve answered the critical questions and can move forward.

My AGENTS.md
by u/Educational_Yam3766 in PromptEngineering