Six months building a single prompt.
TL;DR: One developer spent half a year crafting a 5kb system prompt to stop GPTs from drifting. The community’s reaction teaches you more about how LLMs work than the prompt itself does.
What Got Built
Developer decofan posted a system prompt they’d been building on and off since late 2025. Three versions: a full 5kb build, a 3kb mid-tier, and a lean 1.5kb. The whole thing is open on GitHub. Two live custom GPTs to test it directly.
The core goal: keep the model on track, prevent output drift, force consistent discipline across a long conversation. Classic problems anyone building a custom GPT has hit. The prompt covers output formatting, response length control, how the model should handle ambiguous requests, and what to do when the user goes off-topic. It’s not just a personality layer. It’s closer to a behavioral contract written in plain English and aimed at a model that reads instructions rather than executes them.
Six months of iteration means decofan ran into most of the common failure modes: instructions that get ignored after message 15, formatting that holds for three turns then collapses, tone rules that the model follows loosely until it doesn’t. Each version of the prompt reflects what they learned from those failures.
What the Community Said
The top comment acknowledged the craft. Lots of effort went into solving real problems: preventing drift, enforcing output discipline, keeping the model from going off-script mid-thread.
Then came the honest reality check. Another commenter laid it out cleanly: modern models don’t run these configs like executable code. They don’t parse rules the way a compiler does. Some instructions stick. Others get interpreted loosely. The model reads, not executes.
That’s not a flaw in the prompt. That’s how LLMs actually work.
A third thread in the comments got into specifics: shorter, more concrete instructions tend to outperform long rule lists. The model gives more weight to a clear constraint than to a page of nuanced guidance it has to hold in context while also generating a response. Someone compared it to giving a contractor a one-page spec versus a 40-page document. One gets followed. The other gets skimmed.
Why It Still Matters
A detailed system prompt isn’t wasted effort. It still shapes behavior. It still anchors output. The bigger and more specific it is, the more it pulls the model toward a consistent direction.
But it won’t make your GPT deterministic. It’ll make it more predictable. Those are different things, and confusing them is where most people get frustrated.
The pattern that tends to work: clear, layered instructions with specific output constraints. Not 200 rules about tone. Focused constraints that actually get weighted by the model during generation. Think about what you want the model to do in the first sentence of a response, and write a rule for that. Think about what output format you need, and describe it once, precisely. The more your instruction maps to something the model does at generation time, the more reliably it holds.
The other thing worth noting: long conversations break everything. Even a solid system prompt starts losing grip around message 20 or 30 in a heavy thread. That’s not a prompt problem. That’s a context window and attention problem. If your use case involves long sessions, build in re-anchoring. A short reminder in the prompt about checking the original instructions before responding can stretch consistency noticeably further.
🛠️ Use Cases
- Building a customer-facing assistant that needs consistent voice across sessions, where off-brand responses carry real cost
- Shipping a specialized GPT that stays in its lane (support, sales, onboarding) and doesn’t wander into territory you didn’t design for
- Long conversations where output drift starts killing quality after message 10, especially in research or writing workflows where coherence matters
Prompt of the Day
Add this to any system prompt where you’re fighting drift:
“Before every response, check: am I answering what was actually asked? If you’re about to explain background context that wasn’t requested, skip it and answer directly.”
One focused instruction beats fifty tone rules every time. This one works because it maps to something the model does right before generating output. It’s not a personality rule. It’s a behavioral check that fires at the right moment in the process. That’s the difference between an instruction that sticks and one that gets averaged out over the course of a long prompt.
Worth Checking Out
If you’re building custom GPTs, decofan’s work is worth a look. Not because the prompt is perfect, but because 6 months of iteration shows you what someone learned from breaking LLM behavior repeatedly and patching it. That kind of research is hard to find written up this clearly. Most people either give up after version one or never publish what they learned.
The fact that it sparked a real technical debate in the comments is the point. The best way to understand how system prompts actually behave is to watch experienced builders argue about why something works or doesn’t. You’ll learn faster from that thread than from most tutorials.
The GitHub link and live GPTs are in the original Reddit thread if you want to test it yourself.
Frequently Asked Questions
Q: Does this prompt actually enforce all its constraints reliably?
Not entirely. Modern LLMs don’t treat prompt instructions like executable code , some rules influence style and behavior, but others get ignored, conflict with each other, or are redundant. The longer and more complex your prompt block, the less reliably the constraints stick. Real-world testing with actual use cases is your best way to see what works and what doesn’t.
Q: Isn’t a 5KB prompt this complex hard to maintain and share?
Yes. The dense shorthand makes it powerful if you wrote it, but much harder to debug, extend, or explain to others. If you want to ship this more widely, you might split it into a clean general framework (intent preservation, output validation, formatting discipline) and a specialized module for domain-specific behavior (like the Tolkien language rules).
Q: Do all these constraints actually help, or do they fight each other?
Probably both. When you stack tone rules, formatting rules, banned-token logic, and mode-switching simultaneously, the model may spend energy obeying surface rules instead of focusing on the user’s actual problem. A tight set of 3, 5 core constraints often outperforms a comprehensive ruleset with 20+ items.
Q: What’s the main use case , writing assistant, Tolkien expert, or compliance test?
That’s unclear from the post. The prompt handles multiple jobs (conlangs, writing, compliance testing), but the primary “job to be done” could be sharper. Clarifying what it’s *for* would help both you market it and users decide when to use it.
Heavenly prompt set. Too large to post in full but I made a custom gpt so you can try it. 6 months making it on and off, I finally declare it usable!
by u/decofan in PromptEngineering