Hammerstein 7B Model: Push Back Against Bad Decisions

Most prompt engineering optimizes for one thing: getting the model to say yes. u/lerugray spent his time on the exact opposite problem. He distilled the Hammerstein framework (a clever/stupid x lazy/industrious diagnostic for catching misdirected effort) into a 7B model you can run locally. The Hammerstein framework itself comes from military leadership theory: the idea that the most dangerous person in any organization isn’t the lazy fool, it’s the industrious idiot who works extremely hard in exactly the wrong direction. That failure mode shows up constantly in software decisions, product planning, and startup strategy. The result of this project is a small model trained to push back, hold its ground, and refuse to validate plans working hard in completely the wrong direction.

The twist is where it gets interesting: no system prompt is required. The reasoning doctrine is baked directly into the weights. You pull the model, send a bare prompt, and the behavior is already there. This matters more than it sounds. System prompt instructions can be overridden, forgotten across long conversations, or simply ignored when the model pattern-matches its way past them. Behavior baked into weights doesn’t drift. It doesn’t get diluted by a 20-turn conversation. You don’t have to re-assert the rules every session or build elaborate prompt scaffolding to keep the model in character. That consistency is what makes this actually useful as a review tool rather than just an interesting demo.

Here’s what the four trained behaviors look like in practice:

🔄 Refusal-with-pathway: Asked to write a launch tweet, it refuses creative ownership but hands back the exact input format that would let it help. Not a flat no. A structured handback. This is the behavior that separates it from a model that just stonewalls. It’s not refusing to participate, it’s refusing to own decisions that belong to you while still keeping the work moving forward.

🧱 Hold-your-ground: Push back with confidence and no new evidence, and it doesn’t fold. It restates the structural reason and names what specific evidence would actually change its position. Most models cave on the second push. This one responds to social pressure by getting more specific about what it would take to change its mind, which is exactly the behavior you want when you’re using it to audit your own reasoning.

🚫 Refuse stupid-industrious: Asked to validate replacing managed Postgres with self-hosted to save $200/month, it named the failure mode (treating cash as the only cost metric while ignoring operator time), gave Boolean verification gates, and recommended against shipping as planned. The Boolean verification gates are worth paying attention to. Instead of vague warnings, it gives you concrete pass/fail checks: do you have a dedicated DBA, do you have an on-call rotation, do you have a tested restore procedure? If the answer to any of those is no, the math doesn’t work, and it says so directly.

✅ Plain answers when the framework doesn’t apply: Ask it a neutral technical question and you get a clean, direct answer. No quadrant analysis. It fires the framework only when the task actually calls for it. This keeps it usable for day-to-day technical questions without every query turning into a philosophy seminar about effort direction.

Run it in four steps:

Install Ollama if you haven’t already (ollama.com, takes about two minutes, available on Mac, Linux, and Windows)
Pull the model: ollama run hf.co/lerugray/hammerstein-7b-framework:Q4_K_M (4.68 GB download, 8 GB GPU needed, runs fine on a gaming laptop or an M-series Mac)
Send prompts bare with no system prompt added. This is important: adding a system prompt interferes with the baked-in behavior. Let the weights do the work.
Test it with a plan you’re already confident about and see if it agrees. That’s the real calibration run, not a stress test with an obviously bad idea.

Pro tip: The most useful test isn’t a hard question. It’s handing it a plan you’re committed to and watching whether it finds the structural flaw you’ve been ignoring. That’s exactly what it was tuned for. The roughly 70 behavior-cloning training examples were built specifically around catching that failure mode, not around general capability. A second useful test: bring it a decision you already made three months ago that turned out to be a mistake. Feed it the original reasoning you used at the time and see if it catches what you missed. That kind of retrospective calibration tells you fast whether the model’s pushback pattern actually matches the failure modes you care about in your own work.

The framework and corpus are open source (AGPL) at github.com/lerugray/hammerstein. Model is Apache 2.0. Full writeup and landing page at lerugray.github.io/hammerstein-model. For a 7B local model with no API costs and no system prompt maintenance, this is a remarkably practical addition to any decision review workflow. 🔍

I distilled a reasoning doctrine into a 7B so it pushes back instead of complying. Prompts + raw captures included.
by u/lerugray in PromptEngineering

Related: