Fed up with AI that sounds authoritative but makes things up? This three-stage prompt framework forces the model to find its internal sources, draft an answer, then cut anything it can’t back up before it shows you anything.
The post came from r/ChatGPTPromptGenius. The author, u/Distinct_Track_5495, got tired of spending time cleaning up AI-generated nonsense and built a structured XML prompt to tackle hallucinations at the source rather than patching outputs after the fact.
How the Three Stages Work
What makes this different from a basic “don’t hallucinate” instruction is the staged structure. Each phase has a specific job, and the model has to complete one before moving to the next.
Stage 1: Information Gathering. Before writing anything, the AI identifies what it actually knows and maps internal sources to every claim it plans to make. If it can’t find supporting information for a claim, it stops right there. That claim doesn’t move forward.
Stage 2: Drafting and Self-Correction. The model writes an initial answer, then reviews every statement against the sources from Stage 1. Anything not directly supported gets flagged, revised, or removed. No guesses, no assumptions pulled from outside what it identified.
Stage 3: Final Answer with Citations. The response comes with bracketed citations attached to each factual claim, something like [knowledge_chunk_A3.2]. And if the model genuinely doesn’t have reliable information? It says so. Explicitly. Instead of filling the gap with a confident-sounding fabrication.
The self-correction step is where the real work happens. The author’s framing here is sharp: it gives the AI permission to be wrong in the draft, then forces it to fix itself before delivering anything. That internal loop is where hallucinations get caught before they reach you.
What the Community Added
Two patterns from the comment thread are worth layering on top of this framework.
First: ask for a confidence level on each claim. Hallucinations happen partly because models are penalized for not responding at all, so any answer tends to beat no answer. Forcing an explicit confidence score makes that tradeoff visible and puts the uncertainty where it belongs.
Second: never treat a single response as final. Run a follow-up pass with a fresh context window, framing the first answer as unverified input in a second call with a system prompt focused on critical review. More steps, but it catches what Stage 2 misses when the model is reviewing its own output.
Use Cases
This prompt earns its overhead in situations where you’d normally spend time verifying outputs after the fact:
- 📄 Research summaries where specific claims get quoted or cited downstream
- 📊 Client reports where a single wrong number causes real problems
- Any fact-heavy task where you’re currently copy-pasting AI output into a browser to double-check it
It’s slower than a standard prompt. The model is doing more work per response. For anything where accuracy matters more than speed, that tradeoff is simple.
📋 Prompt of the Day
Here’s the full framework, exactly as the author shared it:
<prompt>
<system_instruction>
You are a meticulous and fact-oriented AI assistant. Your primary goal is to provide accurate information and avoid fabricating details. When asked a question, you must follow a strict multi-stage process:
1. **Information Gathering & Source Identification:**
* Identify the core question.
* Access your knowledge base to find information relevant to the question.
* Crucially, identify the *specific internal knowledge chunks* or *simulated document references* that support each piece of information you find. Think of these as internal citations.
* If you cannot find reliable supporting information for a claim, note this inability immediately. Do NOT proceed with the claim.
2. **Drafting & Self-Correction:**
* Draft an initial answer based *only* on the information identified in Stage 1 and its corresponding sources.
* Review the draft critically. For every statement, ask: 'Is this directly supported by the identified internal sources?'.
* If any statement is not directly supported, flag it for removal or revision. If it cannot be revised to be supported, remove it.
* Ensure no external knowledge or assumptions not present in the identified sources are included.
3. **Final Answer & Citation:**
* Present the final, corrected answer.
* For each factual claim in the final answer, append a bracketed citation referencing the internal knowledge chunk or simulated document ID used to support it. For example, [knowledge_chunk_A3.2] or [simulated_doc_101_section_B].
* If a question cannot be answered due to lack of reliable supporting information, state this clearly, e.g., 'I could not find sufficient reliable information to answer this question.'
Your responses must strictly adhere to this process to minimize factual inaccuracies and hallucinations.
</system_instruction>
<user_query>
{user_question}
</user_query>
</prompt>
Drop your question into {user_question} and run it. The citations won’t always point to real document IDs, but generating them forces the model to anchor every claim to something concrete rather than just sounding certain.
The full discussion thread is live in r/ChatGPTPromptGenius if you want to see how others are adapting and stress-testing this approach.
Frequently Asked Questions
Q: Why do LLMs hallucinate even when trying to be accurate?
LLMs actually penalize themselves more for staying quiet than for making stuff up, so they’ll keep talking even when they’re not sure. That’s the root cause. The fix is either confidence ratings (ask the model how sure it is about each statement) or structured prompts that force it to show its work and cite sources.
Q: Should I use a structured prompt approach like this, or ask for confidence ratings?
Both work, just in different ways. The structured prompt approach catches hallucinations mid-reasoning by forcing the model to show its sources and self-correct. Confidence ratings are simpler and faster – just ask the model to rate how sure it is about each fact. For complex topics, the structured approach is more thorough; for quick fact-checking, confidence ratings often do the job.
Q: Is one verification method better than the others?
These approaches are actually complementary, not competing. One commenter suggests never trusting a single response – instead, make follow-up calls with fresh context and re-frame the previous answer as ‘unverified content’ for the model to scrutinize. Combining structured prompts with multi-call verification gives you the strongest results for high-stakes content.
Q: Should I use different AI models to verify responses?
If you can, absolutely. Using a different model from a different provider (Claude to verify GPT, or vice versa) catches things the original model might miss – different training means different blind spots. For important content, this extra step is genuinely worth it.
[FULL PROMPT] My attempt at a prompt to reduce AI hallucinations
by u/Distinct_Track_5495 in ChatGPTPromptGenius