Why Your AI Repeats the Same Code Mistakes: 2 Fixes

Week three of building a real-time multiplayer app. GPU rendering, live video processing, the whole thing. Progress was real, the commit history was growing, and the momentum felt good. Then the AI suggested the exact pattern that broke the code two sessions ago. The one that was already fixed. Debugged, documented, buried. Again.

A developer on r/PromptEngineering hit this wall and came back with two concrete changes that fixed it. Both are simpler than you’d expect.

🧠 Why This Keeps Happening

Every new chat starts from zero. The AI has no idea you already argued yourself out of using polling over WebSockets. No idea that race condition already cost you an afternoon. No idea the third-party SDK you were going to use turned out to have a licensing problem. It just starts fresh, confident, and ready to suggest the same mistakes.

One model on its own is what this developer called “a confident genius with blind spots.” It optimizes for sounding coherent, not for being correct. It fills gaps in its knowledge with plausible-sounding guesses and moves on like nothing happened. That’s the actual problem. And because the output reads well, it’s easy to miss until you’ve already wasted two hours chasing a bug you’ve seen before.

📄 Fix One: A Living Spec Doc

Before touching any feature, keep an architecture.md that records not just what the app does but why each decision was made. The why is where the value is. “We use WebSockets instead of polling because polling caused noticeable lag above 40 concurrent users” is infinitely more useful than “we use WebSockets.”

Every new chat starts with zero memory. The doc is the memory. Paste the relevant sections into the system prompt before you ask a single question. Update it after every feature, every failed experiment, every decision that took longer than it should have. Think of it as onboarding a new contractor at the start of every session, because that’s exactly what you’re doing.

One commenter added a sharp upgrade: include a “known anti-patterns” section that explicitly lists what the AI keeps trying that doesn’t work. Not vague warnings, but specific ones. “Do not suggest setInterval for state sync. We tested this. It causes drift under load.” Hand it the list of landmines before it walks into them. The five minutes it takes to write that entry will pay back every time.

⚔️ Fix Two: Two AIs That Argue

Have one model interrogate the idea and write an implementation plan. Be specific with the prompt: describe the feature, the constraints, the parts of the codebase it touches. Then hand that plan to a different model and tell it to tear the plan apart. Edge cases, contradictions, simpler approaches, anything it would flag before a code review.

The developer uses Gemini and Claude, but any two strong models work. What matters is the role split, not the brand. The setup changes the dynamic completely: instead of one model optimizing for “does this sound good,” you get one that builds and one whose entire job is finding flaws. One catches what the other sails past. It takes a few extra minutes upfront and regularly saves hours of debugging on the back end.

💡 Tips and Tricks

Kill the sycophancy before anything else. The default AI is a yes-man. Run ideas through this system prompt first:

Act as my high-level advisor and mirror. Be direct, rational, and unfiltered. Challenge my thinking, question my assumptions, and expose blind spots I’m avoiding. If my reasoning is weak, break it down and show me why. Stop defaulting to agreement. Only agree when my reasoning is strong and deserves it.

It flips the AI from cheerleader into the blunt senior engineer who says “that’ll break, here’s why” before you waste a single token on it. You can paste this at the top of any chat. Takes ten seconds, changes the entire tone of the session.

End every feature request with “first, ask me questions about anything vague.” Answering those questions turns a fuzzy wish into an actual spec before any code gets written. The questions themselves are valuable, because they reveal exactly where your thinking is underspecified. Slower up front, much faster overall. A feature that takes 20 minutes to spec properly rarely turns into a two-day debugging session. One that skips straight to code often does.

🚀 Try It on Your Next Feature

Both fixes solve the same underlying problem: the AI’s context is thin and its default mode is agreeable. The spec doc fixes the memory gap so you’re not re-explaining decisions you already made. The adversarial setup fixes the blind spots so bad ideas get caught before they’re implemented. The anti-sycophancy prompt fixes the tone so you’re getting real feedback instead of polished agreement.

You can have all three running in an afternoon. Set up the architecture.md, save the system prompts somewhere reusable, pick a second model for adversarial review. None of it requires a new tool or a paid tier. If you’re spending more time debugging AI-generated code than writing your own, this is probably why.

Drop a comment if you’ve tried the two-model approach. Curious which combos people are running.

Frequently Asked Questions

Q: Do I need two different API providers, or can I use models from the same one?

Same provider works fine, what matters is two genuinely different models that reason differently. Try Claude + Gemini, or Qwen + Claude. The magic is the adversarial dynamic between different architectures, not provider diversity.

Q: Would a new conversation with the same model work instead of two different models?

Not quite. A fresh context with the same model still carries that model’s coherence bias. A different model brings genuinely different blind spots and reasoning patterns. Users testing this (like Qwen for planning + Claude for critique) report noticeably better plan quality, especially as projects get more complex.

Q: What else should I add to the living spec doc?

Beyond architecture, add a “known anti-patterns” section listing mistakes the LLM keeps repeating. This gives it explicit avoidance rules from the start, rather than hoping it learns from examples. Update both sections after each feature to keep the memory current.

Q: Is the two-model approach really necessary, or does it overcomplicate things?

A single model optimizes for coherence, not correctness, it wants to sound confident. A second model whose job is finding flaws flips that dynamic completely. It’s not overcomplicating; it’s adding the friction that catches real problems before they hit production.

The two changes that improved LLM responses and resulted in quality code
by u/LorestForest in PromptEngineering