AI Grading Prompts That Actually Work for Teachers

It’s 10:47 pm on a Sunday. A teacher is on essay 19 of 30. She has just typed “good evidence here” into a margin and is staring at it. She knows it isn’t enough. Eleven papers still to go, and the next one belongs to a kid who actually reads what she writes. Not skims. Reads. And that kid is going to get a comment that took three seconds and sounds like it.

That scene is what sent u/AshxReddit home to build something better. They burned a week testing AI grading prompts, tried six options found online, and watched every single one fail. Then they figured out why, and built a three-prompt chain that’s worth understanding.

📋 Why every AI grading prompt falls flat

The standard advice: “act as an English teacher and give feedback on this essay.” What comes back is a wall of rubric noise. Every trait flagged. Every paragraph picked apart. Thesis weak, transitions choppy, evidence present but underdeveloped, voice inconsistent. All of it, all at once. No teacher would paste that into a student’s paper, and no student would read it if they did. It reads like a machine talking to a rubric, not a person talking to a kid.

The problem is that the prompt skips the actual workflow of grading. Real teachers don’t comment on everything. They pick one or two growth edges per student and deliberately under-comment on the rest. The rationale draws on Hattie’s feedback research, and the claim is that a student handed more than three growth areas at once freezes. They don’t know what to fix first, so they fix nothing. The most effective feedback is narrow, specific, and leaves the student feeling they can actually act on it. Generic AI prompts do the opposite every time.

🔧 How the three-prompt chain works

Step 1: Build a student lens. Before the AI reads a single essay, you create a “lens” for that student. It names the primary growth edge and an under-comment list, which is everything to ignore this round. The under-comment list is the load-bearing part. Telling the model what NOT to flag is what makes the rest of the output feel targeted instead of random. For a student whose main issue is argument structure, that means sentence variety, word choice, and transitions all go on the ignore list. The model only sees what matters right now.
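To make the lens concrete, here is a minimal sketch in Python. It is not the prompt from the original post: the `StudentLens` class, its field names, and the instruction wording are all hypothetical, chosen only to show a growth edge and an under-comment list travelling together.

```python
from dataclasses import dataclass, field


@dataclass
class StudentLens:
    """Hypothetical container for one student's lens (names are illustrative)."""
    student: str
    growth_edge: str  # the one thing to comment on this round
    under_comment: list[str] = field(default_factory=list)  # traits to ignore


def render_lens_prompt(lens: StudentLens) -> str:
    """Turn a lens into instruction text to place ahead of the essay."""
    ignore = "\n".join(f"- {trait}" for trait in lens.under_comment)
    return (
        f"Student: {lens.student}\n"
        f"Primary growth edge: {lens.growth_edge}\n"
        "Do NOT comment on any of the following this round:\n"
        f"{ignore}\n"
        "Comment only on the primary growth edge."
    )


# The example from the text: argument structure is the focus,
# and three other traits go on the ignore list.
lens = StudentLens(
    student="Student A",
    growth_edge="argument structure",
    under_comment=["sentence variety", "word choice", "transitions"],
)
print(render_lens_prompt(lens))
```

Keeping the ignore list as its own field, rather than burying it in free text, makes it hard to forget when a lens is reused for the next assignment.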

Step 2: Diagnose before you comment. The model reads the essay against the lens and produces a diagnosis. Every observation must be grounded in a specific quote from the essay before any margin comment gets written. This step forces the AI to earn its comments. If the diagnosis doesn’t point to a specific spot in the essay, say line 14, the comment that follows will float. Vague comments are the number one thing students ignore, so anchoring everything to a quote from their own writing makes feedback land differently.
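One way to enforce the quote rule is to check it after the model responds. The sketch below assumes the diagnosis comes back as a list of dictionaries with `quote` and `note` keys. That shape is an assumption made for illustration, not something the original post specifies, and `ungrounded` is a hypothetical helper name.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    for curly, straight in (("\u2018", "'"), ("\u2019", "'"),
                            ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded(observations: list[dict], essay: str) -> list[dict]:
    """Return observations whose quote is missing or not found in the essay."""
    haystack = normalize(essay)
    return [
        obs for obs in observations
        if not obs.get("quote") or normalize(obs["quote"]) not in haystack
    ]


essay = "Social media harms teens. Many studies show this.\nIt is bad for sleep."
observations = [
    {"quote": "Many studies show this.", "note": "Claim has no named source."},
    {"quote": "The evidence is overwhelming", "note": "Overstated."},  # not in essay
    {"quote": "", "note": "Structure could be tighter."},              # no quote
]

for obs in ungrounded(observations, essay):
    print("Drop or re-diagnose:", obs["note"])
```

Anything the check returns is an observation the model could not tie to the student’s own words, so it should not become a margin comment.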

Step 3: Match the teacher’s voice. A separate prompt takes two or three sentences of the teacher’s real past feedback and mirrors their sentence length, contractions, and address style. A teacher who writes “you’re almost there, try pushing this further” sounds nothing like one who writes “the argument requires further development.” Without this step, comments read like a textbook. Students notice immediately, and the moment they notice, they stop trusting what they’re reading.
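The three traits named here (sentence length, contractions, address style) can be measured roughly from the sample feedback before it goes into a prompt. This is a rough sketch under stated assumptions: the function names and the prompt wording are hypothetical, and the contraction pattern is a crude heuristic that also matches possessives such as “student’s”.

```python
import re

# Rough match for English contractions such as "you're", "don't", "it's".
CONTRACTION = re.compile(r"\b\w+'(?:re|s|t|ll|ve|d|m)\b", re.IGNORECASE)


def voice_profile(samples: list[str]) -> dict:
    """Summarize sentence length, contraction use, and direct address."""
    text = " ".join(samples).replace("\u2019", "'")
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "avg_sentence_words": round(len(words) / max(len(sentences), 1), 1),
        "uses_contractions": bool(CONTRACTION.search(text)),
        "addresses_student_as_you": bool(re.search(r"\byou\b", text, re.IGNORECASE)),
    }


def render_voice_prompt(samples: list[str], draft_comment: str) -> str:
    """Build a hypothetical voice-matching instruction from real samples."""
    p = voice_profile(samples)
    contractions = "Use contractions." if p["uses_contractions"] else "Avoid contractions."
    address = ("Address the student as 'you'." if p["addresses_student_as_you"]
               else "Do not address the student directly.")
    quoted = "\n".join(f'- "{s}"' for s in samples)
    return (
        "Rewrite the comment below in this teacher's voice.\n"
        f"Aim for about {p['avg_sentence_words']} words per sentence. "
        f"{contractions} {address}\n"
        f"Samples of the teacher's past feedback:\n{quoted}\n"
        f"Comment to rewrite: {draft_comment}"
    )


warm = ["You\u2019re almost there, try pushing this further."]
formal = ["The argument requires further development."]
print(voice_profile(warm))
print(voice_profile(formal))
```

The two sample sentences from the text produce visibly different profiles, which is the point: the voice prompt gets explicit targets instead of a vague “sound like the teacher.”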

💡 Tips that clean up the output

Ban AI tells explicitly. Words like “delve,” “tapestry,” and “navigate” are dead giveaways. The prompts build in a banned-word list so the output actually sounds human. It’s a small addition that makes a real difference when a student’s parent reads the comments.
Cap the end note at 180 words. Long end notes get skimmed. Short ones get read. This forces the model to prioritize instead of summarizing everything it just flagged.
Handle edge cases up front. Missing thesis, off-prompt essay, plagiarism-suspect paper: each gets its own override instruction so the model doesn’t produce eight margin comments on a paper that needs a conference first. Without these guards, the AI will dutifully comment on a paper that should have been flagged and handed back blank.
Don’t collapse the steps. Generic prompts try to do lens, diagnosis, and voice all at once. The output is mush every time. Keeping the steps separate is what gives each one room to do its job.

🎯 Steal this workflow

The full prompt chain is in the original post, including all three prompt bodies and the edge-case handling. If you teach, or if you know a teacher, it’s worth passing along. It takes about ten minutes to set up the first time and gets faster once the lens template becomes habit.

The real takeaway isn’t “grade faster with AI.” It’s that this chain borrows what the best human graders already do. Diagnosis before comments. Focus before volume. Voice before output. The AI just runs the same process without the Sunday night exhaustion, the cold coffee, or the growing dread of the next paper belonging to the kid who actually reads.

That’s worth stealing.

After watching a teacher grade 30 essays in one Sunday, I created three prompts to help her
by u/AshxReddit in ChatGPTPromptGenius
