Most people fix broken prompts by writing more instructions. More constraints, longer style descriptions, more specificity. The output still misses, so they add another paragraph. Then another. The prompt grows to 400 words, then 600, and somehow the outputs keep drifting, feeling like a translation of what you wanted rather than the thing itself.
One developer ran a controlled experiment last month that turns this logic completely upside down.
He picked 5 prompts from his regular workflow: blog drafts, meeting notes, code reviews, email replies, content outlines. For each one, he built an instruction-heavy version and an example-heavy version, then ran the pair three times, for fifteen head-to-head comparisons in total. The instruction-heavy versions averaged around 350 words of rules and constraints. The example-heavy versions replaced most of that text with one or two actual output samples he’d already written and liked.
Example-heavy won every single time. Not close.
Why This Keeps Happening
The core finding is simple once you hear it: an instruction describes the target. An example IS the target.
When examples and written rules disagree even slightly, the output follows the example. The model weights examples far more heavily than instructions, almost without exception. Think about it from first principles: you can describe the color blue in three detailed paragraphs, or you can show someone blue. One is a map. The other is the territory. Instructions are always one layer of abstraction away from what you actually want. Examples collapse that gap entirely.
Which means if you’ve been carefully crafting your rules while casually picking your examples, the priorities have been completely backwards. The instruction-heavy prompts needed more edits, missed more constraints, and somehow felt less personal, even when they included detailed style descriptions. The irony is brutal.
The Old Way vs. What Actually Works
Old way: “Write in a conversational, casual but informed tone. Avoid jargon. Be direct. Sound like you’re talking to a peer.”
New way: paste one paragraph you already wrote that sounds the way you want.
For one blog draft prompt, that single swap cut editing time from 15 minutes to 3. Same task, same model, completely different result. The same pattern held for email replies. Instead of describing what a “brief, warm, professional” reply looked like, pasting one real reply he’d previously sent produced outputs he could use with a minor tweak, not a full rewrite. The description was technically accurate. The example was viscerally correct.
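To make the swap concrete, here is a minimal Python sketch of the two prompt shapes. The function names, the `<example>` wrapper, and the placeholder sample text are illustrative assumptions, not something taken from the original experiment; the style rules are the ones quoted above.

```python
def instruction_prompt(task: str) -> str:
    """Old way: describe the target style in adjectives."""
    rules = (
        "Write in a conversational, casual but informed tone. "
        "Avoid jargon. Be direct. Sound like you're talking to a peer."
    )
    return f"{rules}\n\nTask: {task}"


def example_prompt(task: str, sample: str) -> str:
    """New way: show one paragraph that already sounds right."""
    return (
        "Here is a paragraph I wrote. Match its voice.\n\n"
        f"<example>\n{sample}\n</example>\n\n"
        f"Task: {task}"
    )


if __name__ == "__main__":
    # Placeholder sample: swap in a paragraph you actually wrote and liked.
    my_paragraph = "I spent a week on this so you don't have to. Short version: it works."
    print(example_prompt("Draft an intro for a post about prompt testing.", my_paragraph))
```

The example-based version carries no style adjectives at all. Whatever tone the model picks up has to come from the pasted paragraph, which is the point of the swap.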
5 Patterns That Held Up Across All 15 Runs
🎯 One strong example beats 200 words of style description. Paste something you actually wrote. Don’t describe the style, show it. The model knows what to do with a real sample in a way it never quite figures out from adjectives. A paragraph you’re proud of already contains your tone, your sentence rhythm, your word choices, and your level of formality in a way no style guide can fully encode.
📋 Format is taught by pasting format. Describing structure in words works okay. Pasting one solid example of the structure you want makes the next several outputs nearly identical, with zero additional prompting. This is especially clear with recurring formats like weekly reports or status updates. A 200-word structure description produces formats that drift. A pasted example produces formats that converge.
Use two or three examples, not just one. A single example anchors the model too tightly to that specific piece. Two or three teach it what’s invariant across your style, without copying any one piece verbatim. The model learns the pattern behind the samples rather than the surface features of a single instance.
🚫 Counter-examples are criminally underused. One short “not this: [bad example]” line kills unwanted behavior faster than ten lines of “avoid X.” One generic outline added as a negative example stopped listicle outputs immediately. It took 30 seconds to add and solved a problem that three separate instruction rewrites had failed to fix.
Polish by trimming around the examples. Once your examples are solid, run the prompt through an optimizer. Keep the examples, cut the surrounding prose. 500-word prompts regularly drop to 150 without losing anything that mattered. Most of the instruction text was compensating for weak examples in the first place. Strip it out and the prompts get faster and more reliable at the same time.
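The patterns above can be combined in one small prompt builder. Treat this as a sketch under assumptions: the class name, the field names, and the "Example N:" and "Not this:" labels are hypothetical choices, and it requires Python 3.9 or later for the `list[str]` annotations.

```python
from dataclasses import dataclass, field


@dataclass
class FewShotPrompt:
    """Prompt made mostly of samples, with minimal surrounding prose."""

    task: str
    examples: list[str] = field(default_factory=list)          # two or three outputs you liked
    counter_examples: list[str] = field(default_factory=list)  # short samples of what to avoid

    def render(self) -> str:
        parts = []
        for i, sample in enumerate(self.examples, start=1):
            parts.append(f"Example {i}:\n{sample}")
        for bad in self.counter_examples:
            # Tag negatives clearly so they read as patterns to avoid.
            parts.append(f"Not this:\n{bad}")
        parts.append(f"Task: {self.task}")
        return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = FewShotPrompt(
        task="Write this week's status update from the notes below.",
        examples=["<paste last week's update>", "<paste another update you liked>"],
        counter_examples=["<paste one generic, listicle-style outline>"],
    )
    print(prompt.render())
```

The only instruction text left is the task line. If the outputs still drift, this layout suggests swapping or adding a sample before writing more rules.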
The Move This Week
Pick one prompt you use regularly. Strip out half the instruction text. Swap in one or two real examples of output you actually liked, things you wrote yourself, edited into shape, or would send without changes. Then run it once and compare.
The shift tends to be immediate. And once you see it, you’ll stop writing instruction essays and start building example libraries. That library becomes one of the most reusable assets in your workflow because good examples transfer across models, across tasks, and across time in a way that instruction text almost never does.
Frequently Asked Questions
Q: How many examples do I need, and will one be enough?
Two to three examples work best, as they teach variation and prevent the model from overfitting to surface features like length or phrasing. One example can anchor too hard. Pairing your examples with a short label (e.g., “note the skeptical tone here”) helps the model generalize on the right dimension even with fewer examples.
Q: What if my examples aren’t perfect? Will bad examples break my prompt?
Bad examples are worse than no examples because the model learns everything in them, flaws included. Review each example for both the patterns you want and the quirks you don’t (dated phrasing, length, jargon). One user swapped abstract rules for a single labeled counter-example (“not this: [bad sample]”) and got immediate improvement.
Q: How do I make sure the model learns the right pattern?
Pair your examples with one short sentence explaining what to generalize from, e.g., “note the skeptical tone, and that counterarguments come first.” The example does the heavy lifting; the annotation steers which dimension the model extracts.
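One way to attach those annotations, sketched in Python. The function name and the "Example N (note):" layout are assumptions made for illustration; the notes themselves are the kind of one-line steer described above.

```python
def render_annotated(examples: list[tuple[str, str]]) -> str:
    """Join (note, sample) pairs so each sample carries its steering note."""
    blocks = [
        f"Example {i} ({note}):\n{sample}"
        for i, (note, sample) in enumerate(examples, start=1)
    ]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(
        render_annotated(
            [
                ("note the skeptical tone", "<paste a sample you wrote>"),
                ("counterarguments come first", "<paste a second sample>"),
            ]
        )
    )
```

Keeping the note on the same line as the label ties each steer to its sample, so the model is less likely to read it as a general rule.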
Q: Do counter-examples actually work?
They’re genuinely powerful and can replace pages of rules. Tag them clearly (“not this:”) so the model knows they’re patterns to avoid. Multiple users swapped “avoid X” rules for a single well-labeled counter-example and saw immediate improvement.
Source: “My prompts got 3x better the day I stopped writing more instructions and started writing more examples instead” by u/rafio77 in PromptEngineering