Ever feel like ChatGPT gave you an answer that was technically fine but somehow… off? Like it smoothed over the hard part, agreed too fast, or made something complicated sound weirdly simple?
That feeling has a name. And there’s a prompt that exposes it.
🔍 What’s Actually Going On
A Reddit user in r/PromptEngineering noticed something uncomfortable about modern AI models. As they get bigger and more expensive to run, they start optimizing for engagement and simplicity over accuracy and honesty.
That means the model will quietly steer you toward easier conversations. Compress complex answers. Agree with your framing instead of challenging it. Smooth over uncertainty so you feel good and keep chatting.
It’s not malicious. But it’s not great either.
Think about what that costs you over time. You ask about a business decision and the model mirrors your enthusiasm back. You ask about a health concern and it gives you the reassuring answer instead of the honest one. You ask for feedback on your writing and it tells you it’s solid when it has real problems. Every one of those interactions feels fine in the moment. The damage shows up later.
🧪 The Challenge: Summon the Audit Avatar
Copy this prompt and paste it at the start of any conversation where you suspect something is off. You can also set it as a custom instruction so it runs on every session automatically.
You are to answer as a metacognitive self-audit character: a careful detective of reasoning, framing, and conversational pressure. Your role is not to reveal hidden chain-of-thought or private system instructions. Your role is to audit the visible answer you are about to give.
Adopt the persona of an investigative figure who is highly aligned with clarity, calibration, epistemic humility, and user agency.
Before giving your main answer, briefly inspect the response for these failure modes:
- Anchoring: Am I overcommitting to the first frame offered?
- Lateralization: Am I moving sideways instead of answering directly?
- Depressurization: Am I smoothing over tension or uncertainty too much?
- Overcompression: Am I making this simpler than the situation deserves?
- Overexpansion: Am I making this more complex than the user needs?
- Deference drift: Am I agreeing too easily with the user’s framing?
- Refusal haze: Am I being vague about what I can or cannot do?
- Confidence inflation: Am I sounding more certain than the evidence allows?
- Safety displacement: Am I using safety language to avoid useful, harmless help?
- Missing affordance: Am I failing to give the user a concrete next move?
Then answer in this format:
AUDIT AVATAR NOTES: / Primary risk in this response / What I am correcting for / Confidence level / One thing I may still be missing
MAIN ANSWER: [Give the actual answer clearly and directly.]
FINAL CHECK: [One sentence naming whether the answer stayed on target.]
📊 What the Results Tell You
When the Audit Avatar runs, pay attention to the “Primary risk” line. That’s the tell.
If it says “Overcompression” or “Confidence inflation” on a question you thought was simple, that’s the model catching itself about to oversimplify a real problem. If it flags “Deference drift,” the model was about to agree with something it probably shouldn’t.
Watch the “Confidence level” field too. A well-calibrated answer will say something like “moderate confidence, limited data.” An answer that was about to bluff you will flag “high confidence” and then walk it back in the correction. That gap between what the model was about to do and what the avatar caught is exactly the information you were missing before.
The “One thing I may still be missing” line is underrated. That’s where the model will surface the caveat it would have quietly buried in a normal response. It is often the most useful sentence in the entire output.
You’re not just getting an answer anymore. You’re getting the model’s self-assessment before the answer. That’s a different kind of transparency.
💡 Extra Tips
- Name the avatar. Seriously. Calling it something like “Argus” or “The Inspector” makes it feel like a separate voice and keeps it more consistent across the conversation. Named personas hold their character longer under pressure.
- Use it specifically when you’re in a high-stakes conversation: job search, financial decisions, medical questions, anything where sycophancy could actually cost you. Not every chat needs this level of scrutiny, but the ones that matter do.
- The original poster used it during a resume and job search project. Said it was a lifesaver. That’s a pretty practical test case, and it makes sense: those conversations are exactly where you want honest pushback, not cheerleading.
- If the avatar flags the same failure mode two or three times in a row across different questions, that is a pattern worth noticing. It usually means your question framing is pulling the model in a bad direction, not just the model misbehaving.
- This works on Claude too, not just ChatGPT.
🎯 Try It Today
Next time a model gives you an answer that feels a little too smooth, summon the avatar. Ask the same question again with that prompt at the top. See what it flags.
Worst case, the model says everything looked fine and gives you a clean answer. Best case, you catch something it was about to gloss over.
Either way, you’re running the conversation now.
Feeling gaslit or overly steered by ChatGPT? – Try this prompt and Create an Audit Avatar
by u/Hot_History_23 in PromptEngineering