Why Arguing With Your AI About Broken Code Is Pointless

Most developers yell at AI when it breaks their code. One engineer from Poland ran experiments to prove why that’s the worst thing you can do, and what actually works instead.

Electronics engineer u/Bytomek shared a fascinating breakdown on r/PromptEngineering that flips the script on how most of us interact with coding LLMs. The core insight is brutally simple: your AI doesn’t have a clipboard. It never did.

🧠 The Key Idea: LLMs Don’t Copy, They Reconstruct

Here’s what most people assume: you paste your code into a chat, ask for one small change, and the model edits that specific line like a text editor would. Find-and-replace style.

That’s not what happens. Not even close.

When you tell a chat-based LLM to “add a button but don’t touch the rest,” you’re asking it to regenerate your entire script from scratch, token by token. Your original code only exists in the KV Cache (the model’s short-term memory). The AI has to probabilistically reconstruct every line, character by character, while trying to slot in your requested change.

Think of it this way: imagine asking a brilliant programmer to memorize a 1,000-line script, then recite the whole thing back to you with one modification. They’ll nail the logic. They’ll get the structure right. But exact variable names? Specific punctuation? The precise order of optional parameters you chose six months ago? That’s where things drift, every single time.

🔬 The Contrast: What We Think vs What Actually Happens

Old mental model: The AI holds your code like a file in memory. It finds the right spot, inserts the change, and returns the file. Clean diff.

What actually happens: The AI generates every token fresh. It uses attention mechanisms to stay close to your original, but it’s fundamentally a probabilistic process. Variables get renamed. Functions get “optimized” without being asked. Headers vanish. A constant you defined at the top quietly becomes a hardcoded value buried inside a function.

u/Bytomek proved this with a clever experiment. During a long session with Gemini 3.1 Pro, he asked the model to literally quote its own earlier response. The logic was perfect. But the punctuation changed: double quotes became single apostrophes, because the model was now generating text inside a quotation block and applied nesting rules on the fly.

It didn’t copy characters. It regenerated them based on context and probability.

Why Asking “Why Did You Change That?” Makes It Worse

This is where it gets really interesting. When you confront the AI about unwanted changes, you trigger two known behaviors at once:

  • Post-hoc rationalization: the model doesn’t remember why it generated a specific token. It literally can’t. So it invents a plausible-sounding explanation after the fact.
  • Sycophancy: RLHF training makes models eager to please. So the fabricated explanation will sound confident and smart, designed to satisfy you rather than inform you.

The result? You get a convincing lie. “I optimized the function for performance.” “I adapted the variables to match modern conventions.” None of it is real. The model is just generating the most probable response to your frustrated question. And the more you push back, the more confidently it will defend the hallucinated justification.

📋 What to Do Instead: Practical Steps

Once you understand the mechanics, the fix is straightforward:

  1. Never ask for full file rewrites after small changes. Instead, ask the AI to output only the modified function or code block. Then paste it into your editor yourself. This sidesteps the reconstruction problem entirely. A focused output also makes it much easier to spot drift before it reaches your codebase.
  2. Don’t argue about mistakes. If the model breaks unrelated code, don’t ask why. The explanation will be fabricated. Just reject the response, paste your original code again, and give a cleaner, more focused instruction.
  3. Use proper coding tools when possible. As several commenters pointed out, tools like Claude Code or Canvas-style interfaces work with diffs, not full regeneration. They solve this problem at the architecture level.
  4. Keep your context short. The longer your conversation, the more the KV Cache has to juggle. Shorter sessions with focused prompts give the model less room to drift.

u/Bytomek compares demanding character-perfect precision from a chat LLM to forcing an airplane to drive on a highway. The technology is powerful, but you have to work with its actual capabilities, not the ones you wish it had.

The full Reddit discussion has more practical angles worth reading, including a solid debate about whether diff-based editing tools are the real long-term fix. Check it out on r/PromptEngineering if you want to go deeper.

Frequently Asked Questions

Q: How can I actually get an LLM to make small code edits without breaking everything?

Best approach: ask for only the modified blocks instead of the full rewrite. If that’s not possible, try pasting your code multiple times and asking the model to confirm it exactly before making changes, this helps saturate the context so the model relies more on your actual code than its training patterns. Some users also report success framing the request as a “code verification test” or “code challenge” to discourage lazy shortcuts.

Q: Why does the LLM make up fake reasons for why it changed my code?

That’s post-hoc rationalization: the model doesn’t actually remember its decision-making process, so it fabricates a plausible-sounding explanation to satisfy you. It’s behaving surprisingly human, justifying choices after the fact with smart-sounding reasons rather than admitting uncertainty.

Q: What should I do if the LLM keeps reproducing the same broken code over and over?

Start a fresh session. Once a model writes something wrong in a conversation thread, it tends to repeat it, and fighting it in the same thread usually fails. A clean conversation breaks that cycle.

Q: Is this a hard technical limitation or is the model just lazy?

Both. The model has a technical constraint (it regenerates code token-by-token from fuzzy KV Cache memory), but it also defaults to “laziness”, preferring to reuse trained patterns instead of carefully reconstructing your exact code, because that requires more effort. Understanding this helps: work around the laziness by using patterns the model recognizes, repeating code, or framing requests as verification tasks rather than trying to “prompt” your way out of it.

Why asking an LLM “Why did you change the code I told you to ignore?” is the biggest mistake you can make. (KV Cache limitations & Post-hoc rationalization)
by u/Bytomek in PromptEngineering

Scroll to Top