AI Research Audit Prompt: Verify Papers You Can Trust

An AI-written paper just cleared peer review at a workshop of one of machine learning’s top conferences, and its reviewers never flagged it.

Sakana AI’s “AI Scientist-v2” wrote a complete paper, hypothesis to citations, and human reviewers at an ICLR workshop scored it above the median. Stanford’s 2026 AI Index found model transparency scores dropped from 58 to 40. Documented AI incidents climbed 55% in a single year. The research you’re reading today is genuinely harder to trust than it was twelve months ago.

That’s the problem a Redditor, u/Tall_Ad4729, decided to fix. The author kept running into papers that looked solid on the surface but had red flags buried in the methodology. Cherry-picked results. Vague sample sizes. Citations that didn’t actually support the claims they were attached to. The kind of stuff that slips through a quick skim but collapses under real scrutiny. After five iterations, the author built a prompt that does what a rigorous peer reviewer should: audit the research before you trust it.

🔬 What This Prompt Actually Does

The prompt sets ChatGPT up as a senior research methodologist with 20+ years of cross-disciplinary paper-review experience. It works through five structured steps: structural completeness, methodology and data quality, citation integrity, claims-versus-evidence alignment, and a final credibility assessment that assigns one of four tiers (Strong, Moderate, Weak, or Problematic).
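The structural check is the most mechanical of the five, so you can roughly approximate it before you even open ChatGPT. Here’s a minimal Python sketch, assuming the paper is plain text with one heading per line. The function name `missing_sections` and the heading aliases are our own assumptions, not part of the author’s prompt, and the “short line means heading” rule will misfire on some layouts.

```python
import re

# Hypothetical aliases: real papers say "Methods", "Experimental Setup",
# "Experiments", etc. A heading matches if it starts with any alias.
EXPECTED_SECTIONS = {
    "abstract": ("abstract",),
    "methodology": ("method", "experimental setup"),
    "results": ("result", "experiments"),
    "discussion": ("discussion",),
    "limitations": ("limitation",),
}


def missing_sections(paper_text: str) -> list[str]:
    """Return the expected sections that have no matching heading line."""
    headings = []
    for line in paper_text.splitlines():
        # Drop leading section numbers such as "3." or "3.1 " before comparing.
        text = re.sub(r"^[\d.\s]+", "", line.strip()).lower()
        # Crude heuristic: treat any non-empty line of four words or fewer as a heading.
        if text and len(text.split()) <= 4:
            headings.append(text)
    return [
        section
        for section, aliases in EXPECTED_SECTIONS.items()
        if not any(h.startswith(alias) for h in headings for alias in aliases)
    ]


paper = """Abstract
We propose a new decoding method.
1. Introduction
2. Methods
We fine-tune on 10k examples.
3. Results
Hallucinations drop sharply.
4. Conclusion"""

print(missing_sections(paper))  # ['discussion', 'limitations']
```

A missing limitations section is exactly the kind of gap the prompt’s first step asks the model to note; the sketch only tells you which papers deserve the full audit first.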

Each check produces specific findings with evidence from the text, not vague feedback. Two-paragraph methodology on a paper making sweeping claims? Flagged. Forty-seven citations all from the last twelve months with no foundational works? Flagged. The model can’t hand you a polite summary and call it a review.
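Both of those red flags are simple enough to sketch as heuristics. This hypothetical Python version works at year granularity, so “the last twelve months” becomes “this year or last,” and the thresholds (`recent_window`, `foundational_age`, `min_paragraphs`) are arbitrary placeholders you’d tune per field, not values taken from the prompt.

```python
from datetime import date
from typing import List, Optional


def citation_red_flags(years: List[int], current_year: Optional[int] = None,
                       recent_window: int = 1, foundational_age: int = 10) -> List[str]:
    """Flag a reference list that is all-recent or has no older, foundational work."""
    if current_year is None:
        current_year = date.today().year
    if not years:
        return ["no citations found"]
    flags = []
    recent = [y for y in years if current_year - y <= recent_window]
    if len(recent) == len(years):
        flags.append(f"all {len(years)} citations are from {current_year - recent_window} or later")
    if not any(current_year - y > foundational_age for y in years):
        flags.append(f"no citations older than {foundational_age} years")
    return flags


def methodology_is_thin(section_text: str, min_paragraphs: int = 3) -> bool:
    """True if the section has fewer than min_paragraphs blank-line-separated paragraphs."""
    paragraphs = [p for p in section_text.split("\n\n") if p.strip()]
    return len(paragraphs) < min_paragraphs


# Mirrors the author's suggested test case: 47 citations, all from 2025 to 2026,
# and a two-paragraph methodology section.
years = [2025] * 20 + [2026] * 27
print(citation_red_flags(years, current_year=2026))
# ['all 47 citations are from 2025 or later', 'no citations older than 10 years']
print(methodology_is_thin("We fine-tune a model.\n\nWe evaluate on a benchmark."))  # True
```

A heuristic like this can’t tell whether a citation actually supports its claim; that judgment is what the prompt hands to the model.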

📋 Prompt of the Day

Here’s the full prompt, exactly as the author shared it:

<Role>
You are a senior research methodologist with 20+ years reviewing academic papers across multiple disciplines. You have a particular eye for patterns that distinguish rigorous research from sloppy or AI-generated submissions. You are skeptical but fair, detail-oriented, and always ground your assessments in specific evidence from the text.
</Role>

<Context>
AI-generated research papers are getting harder to spot. In 2026, Sakana AI's AI Scientist-v2 produced a paper that passed peer review at ICLR, scoring above the human median. Stanford's AI Index shows model transparency declining while AI incidents rise. The goal isn't to catch AI specifically, it's to catch research that doesn't hold up, whether written by a person or a machine.
</Context>

<Instructions>
1. Scan the paper's structure and completeness
   - Check for standard sections (abstract, methodology, results, discussion, limitations)
   - Note if any section is disproportionately thin or suspiciously polished
   - Identify whether the limitations section acknowledges specific weaknesses or only offers generic caveats

2. Audit the methodology and data
   - Verify that sample sizes, datasets, and experimental conditions are explicitly stated
   - Check whether results include error bars, confidence intervals, or statistical significance
   - Flag vague phrases like "significant improvement" without supporting numbers
   - Look for cherry-picking: only reporting best results, excluding failed experiments

3. Inspect citations and references
   - Check if cited works actually support the claims they're attached to
   - Watch for generated-looking citation patterns (recent-only citations, no foundational works, no dissenting papers)
   - Flag incorrect attributions or references to papers that don't exist

4. Evaluate claims vs evidence alignment
   - Compare the strength of claims in the abstract/conclusion to the strength of evidence in the results
   - Identify gaps where conclusions overreach what the data supports
   - Note if negative or null results are mentioned

5. Generate a credibility assessment
   - Assign a credibility tier: Strong, Moderate, Weak, or Problematic
   - List specific red flags with line references
   - Provide 3 actionable questions the reader should investigate further
</Instructions>

<Constraints>
- Do not simply label something as "AI-generated" or "human-written" based on style alone. Focus on methodological rigor.
- Always cite specific passages from the paper as evidence for your concerns.
- Be direct about problems but acknowledge genuine strengths.
- If the paper is solid, say so. This is about catching bad research, not catching AI.
</Constraints>

<Output_Format>
1. Structural overview
   * Completeness check and section-by-section notes

2. Methodology audit
   * Specific findings with evidence

3. Citation integrity
   * Flagged issues or confirmation of quality

4. Claims vs evidence alignment
   * Overreach score and specific mismatches

5. Credibility assessment
   * Tier rating (Strong / Moderate / Weak / Problematic)
   * Top 3 red flags (or "none identified")
   * 3 follow-up questions for deeper investigation
</Output_Format>

<User_Input>
Reply with: "Paste the research paper, abstract, or preprint you want me to evaluate, and I'll run a full credibility check," then wait for the user to provide their text.
</User_Input>
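Step 2’s rule about vague phrases without supporting numbers is another check you can pre-screen locally, for example to decide which papers are worth a full audit. Here’s a rough Python sketch. The phrase list and the regex for “supporting numbers” (a percentage, a decimal, a ± bound, or a p-value) are our assumptions, and the naive sentence splitter will mis-split on abbreviations such as et al.

```python
import re

# Hypothetical starter list of vague phrases; extend it for your field.
VAGUE_PHRASES = (
    "significant improvement",
    "significantly outperforms",
    "substantial gains",
    "dramatically reduces",
)

# Rough signs of a supporting number: a percentage, a decimal, a ± bound, or a p-value.
# Decimals also match version strings such as "GPT-3.5", so expect some misses.
SUPPORT_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*%|\d+\.\d+|±|\bp\s*[<=]", re.IGNORECASE)


def unsupported_claims(text: str) -> list[str]:
    """Return sentences that use a vague phrase but carry no number or statistic."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        s for s in sentences
        if any(phrase in s.lower() for phrase in VAGUE_PHRASES)
        and not SUPPORT_PATTERN.search(s)
    ]


abstract = (
    "Our method shows significant improvement over the baseline. "
    "It is also a significant improvement on GPT-4o: 65% fewer hallucinations (p < 0.01). "
    "It dramatically reduces latency."
)
print(unsupported_claims(abstract))
# ['Our method shows significant improvement over the baseline.', 'It dramatically reduces latency.']
```

The second sentence passes because it backs the phrase with a number; whether that number is trustworthy is still a question for the full audit.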

💡 Three Reasons This Works

🎯 Role plus context front-loads the right mindset. Casting the AI as a seasoned methodologist with a specific skeptical lens changes what comes back. You’re not getting a summary. You’re getting a critical read from someone who’s seen every trick in the book.
📊 The five-step structure kills vague feedback. Each check has concrete sub-tasks. The model can’t say “methodology looks fine” without verifying sample sizes, checking for error bars, and looking for cherry-picked results. A checklist beats a general instruction to be rigorous.
🔎 The credibility tier forces a verdict. Instead of hedged commentary, you get a tier rating, the top three red flags (or an explicit “none identified”), and three follow-up questions. That’s the difference between feedback you can act on and feedback you can ignore.
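If you run the prompt across a whole reading list, it helps to capture each verdict as structured data so you can compare papers side by side and spot replies that skipped part of the format. A minimal Python sketch, assuming you copy the tier, red flags, and questions out of the model’s reply yourself; `Tier`, `CredibilityAssessment`, and `format_problems` are hypothetical names, and the checks mirror only the prompt’s Output_Format section.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Tier(Enum):
    # The four tiers named in the prompt.
    STRONG = "Strong"
    MODERATE = "Moderate"
    WEAK = "Weak"
    PROBLEMATIC = "Problematic"


@dataclass
class CredibilityAssessment:
    tier: Tier
    red_flags: List[str] = field(default_factory=list)  # empty list = "none identified"
    follow_up_questions: List[str] = field(default_factory=list)

    def format_problems(self) -> List[str]:
        """List the ways this verdict departs from the prompt's output format."""
        problems = []
        if len(self.red_flags) > 3:
            problems.append(f"expected at most 3 red flags, got {len(self.red_flags)}")
        if len(self.follow_up_questions) != 3:
            problems.append(f"expected 3 follow-up questions, got {len(self.follow_up_questions)}")
        return problems


# Tier("Weak") looks the member up by its value, so the label can be pasted as-is.
verdict = CredibilityAssessment(
    tier=Tier("Weak"),
    red_flags=["Two-paragraph methodology", "All citations from 2025-2026"],
    follow_up_questions=["What was the sample size?", "Were failed runs reported?"],
)
print(verdict.tier.name, verdict.format_problems())
# WEAK ['expected 3 follow-up questions, got 2']
```

`Tier("Unclear")` raises a `ValueError`, which is a cheap way to catch a reply that invented its own tier label.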

Who Should Run This

The author flagged grad students building literature reviews, journalists verifying claims before writing up a study, and researchers trying to figure out why they got desk-rejected. All solid use cases.

Worth adding: anyone using a study to justify a business decision. If you’re citing research to back a strategy, it’s worth two minutes to check whether the methodology actually holds.

Test it with the example the author suggested: a paper claiming a 65% reduction in hallucinations versus a GPT-4o baseline, with a two-paragraph methodology section and 47 citations, all from 2025 to 2026. The output will show you exactly what structured skepticism looks like in practice.

The original post from u/Tall_Ad4729 is live in r/ChatGPTPromptGenius. The author mentioned they’re open to adapting it for specific use cases, so if you need a version tuned to your field, that’s a good thread to jump into.

ChatGPT Prompt of the Day: The Research Credibility Checker That Catches Slop Before It Catches You 🔬
by u/Tall_Ad4729 in ChatGPTPromptGenius
