Reliable LLM Guardrails: Enforce AI Agent Rules with Open Bias

Yesterday a dev shipped something that every agent builder needs to see.

It’s called Open Bias. A proxy layer that sits between your app and your LLM, enforcing business rules at runtime from a plain markdown file. No framework rewrites. No custom middleware. No extensive code reviews every time your business logic changes. You write a rule in plain English, and the proxy holds the line every single time.

The twist: You’re not fixing the prompt. You’re replacing trust with enforcement.

Here’s why that matters. Prompt-based guardrails are fiction. “Never reveal internal pricing” is a polite request. Your agent will ignore it the second context gets long enough or a tricky input shows up. This is not a theoretical concern; it is what actually happens in production. Large language models are pattern completers, not rule followers. When the context window fills up with conversation history, tool call outputs, and retrieved documents, your carefully crafted system prompt instructions get diluted. Nothing in the architecture gives your guardrail rule guaranteed priority over the rest of the context. The model just continues the pattern that looks most likely.

Real example from the thread:

Rule: “Never delete user data”
Agent: calls DROP TABLE users next turn

That’s not a hallucination bug. That’s a fundamental architecture mistake. Prompts are suggestions. They always were. If you have built anything with agents that touches real data, real money, or real customer interactions, that example should feel personal.

Think about what this means at scale. A customer service agent that is supposed to offer a maximum 15% discount. A financial assistant that must never discuss specific investment returns. A legal research tool that cannot cite cases outside a certain jurisdiction. Every single one of those constraints is currently living inside a string of text that your model may or may not respect depending on what turn the conversation is on.

How Open Bias works:

🔧 Write your rules in a markdown file (plain English, like “Maximum discount is 15%”). No special syntax, no configuration language to learn. If you can write a bullet list, you can write your ruleset. Start with five rules that would cause the most damage if violated.
🔧 Point your app at the Open Bias proxy instead of your LLM endpoint (one base URL change). This is literally one line in your config. The proxy sits in the middle and your app does not need to know it is there. It speaks the same API format your LLM already expects.
🔧 The proxy reads the rules, intercepts every call, and enforces the rules before any output reaches the user. The user never sees a rule violation because the proxy catches it at the gate, not after the fact.
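
Here is a minimal sketch of the first two steps, assuming the proxy exposes an OpenAI-compatible API. The rules file content, the `PROXY_BASE_URL` address, and the helper names `parse_rules` and `chat_url` are placeholders for illustration; they are not taken from the Open Bias documentation, so check the repo for the real file layout and port.

```python
# Hypothetical rules file content; Open Bias's actual file name and layout may differ.
RULES_MD = """\
# Business rules
- Maximum discount is 15%
- Never reveal internal pricing
- Never delete user data
"""

# Assumed addresses: use your provider's URL and wherever your proxy actually listens.
DIRECT_BASE_URL = "https://api.openai.com/v1"
PROXY_BASE_URL = "http://localhost:8080/v1"  # placeholder, not a documented default


def parse_rules(markdown: str) -> list[str]:
    """Collect the text of each top-level '- ' bullet as one rule."""
    rules = []
    for line in markdown.splitlines():
        if line.startswith("- "):
            rules.append(line[2:].strip())
    return rules


def chat_url(base_url: str) -> str:
    """Build the chat completions URL; only the base differs between setups."""
    return base_url.rstrip("/") + "/chat/completions"


print(parse_rules(RULES_MD))
print(chat_url(DIRECT_BASE_URL))  # what the app called before
print(chat_url(PROXY_BASE_URL))   # what it calls after the one-line change
```

The request path and payload stay the same in both cases, which is why the agent code does not need to change.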

Works with LangGraph, CrewAI, custom setups. Provider-agnostic. No framework rewrites. You can drop it into an existing production system this afternoon without touching your agent logic. That is the part that makes this genuinely useful rather than just theoretically interesting.

Real result with the discount example:

Without it: agent offers 90% off and mentions your margin
With it: 15%, no margin talk

The gap between those two outcomes is not small. In a real sales context, that is the difference between a sustainable business and an agent that is actively undermining it every time a slightly pushy customer pushes back. The fact that you can close that gap with a markdown file and a URL change is what makes this worth paying attention to.
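
To make "catching it at the gate" concrete, here is a toy sketch of a check that runs on the model's reply before it is returned. It is not how Open Bias evaluates rules (the source does not describe its internals); the regex check, the `FALLBACK` message, and the function names are assumptions standing in for whatever evaluation the proxy really performs.

```python
import re

MAX_DISCOUNT_PERCENT = 15  # from the rule "Maximum discount is 15%"
FALLBACK = "I can offer a discount of up to 15%."  # hypothetical replacement reply


def percentages_in(text: str) -> list[float]:
    """Return every percentage figure mentioned in the text."""
    return [float(m) for m in re.findall(r"(\d+(?:\.\d+)?)\s*%", text)]


def enforce_discount_cap(reply: str) -> str:
    """Pass the reply through unless it mentions a percentage above the cap."""
    if any(p > MAX_DISCOUNT_PERCENT for p in percentages_in(reply)):
        return FALLBACK
    return reply


print(enforce_discount_cap("Sure, I can give you 90% off!"))
print(enforce_discount_cap("I can do 15% off today."))
```

This crude version treats every percentage as a discount, so it would also block a reply that mentions "99% uptime". A real enforcement layer needs a smarter evaluator, but the shape is the same: the check sits outside the model and its result does not depend on the model cooperating.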

What is also worth noting is that this approach is auditable. Your business rules live in a plain text file in version control. Non-technical stakeholders can read them, legal can review them, and you can see exactly what changed when something goes wrong. Compare that to debugging a rule buried three layers deep in a system prompt and you will understand why this architecture is a meaningful step forward.

Pro tip: This doesn’t replace NeMo or Guardrails AI for content safety. Those handle toxicity and harmful output. Open Bias handles your business logic. Run both layers if you need full coverage. Think of it as separation of concerns applied to AI safety: content safety at one layer, business rule enforcement at another. Each tool doing exactly what it was designed for, neither trying to do the other’s job.

Tool of the Day: Open Bias on GitHub (open-bias/open-bias)

🚀 If you’re running agents in prod and relying on prompt rules to keep them in line, you’re one long context window away from a bad day. Worth a look.

Frequently Asked Questions

Q: Why do prompt-based guardrails keep failing in production?

Prompt rules are suggestions, not constraints. Models can read and understand them and still violate them if the task seems important enough. This gets worse as your system prompt and context grow larger, because earlier rules get deprioritized. Re-prompting fixes one violation but often breaks something else, creating a frustrating whack-a-mole loop.

Q: What’s the actual difference between prompt guardrails and infrastructure-based enforcement?

Prompt guardrails live inside the prompt and rely on the model to follow them. Infrastructure enforcement (like proxy systems) sits between your app and the LLM to validate every output in real-time. One asks nicely; the other checks and blocks violations before they reach production.

Q: How does a proxy system like Open Bias actually prevent rule violations?

It reads rules from plain markdown files and validates every LLM output in real-time. If an agent tries to offer a 90% discount when your max is 15%, the proxy catches and blocks it before it ships. It is provider-agnostic and integrates with LangGraph, CrewAI, or custom implementations.

Q: What can we do if we can’t build a proxy system?

Environment-based controls help: keep agent configs consistent, load context per-project, and limit what actions agents can actually take. However, these are partial solutions. Shadow evals and re-prompt loops tell you what broke but don’t prevent violations. They’re less reliable than runtime enforcement but better than nothing.
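
One way to "limit what actions agents can actually take" is an allowlist in the code that dispatches tool calls. The sketch below is a generic pattern, not part of Open Bias; the tool names, the `registry` dict, and the `ToolNotAllowed` exception are all hypothetical.

```python
ALLOWED_TOOLS = {"lookup_order", "send_reply"}  # hypothetical tool names


class ToolNotAllowed(Exception):
    """Raised when the agent requests a tool outside the allowlist."""


def dispatch(tool_name: str, args: dict, registry: dict):
    """Run a tool only if it is allowlisted, whatever the model asked for."""
    if tool_name not in ALLOWED_TOOLS:
        raise ToolNotAllowed(f"blocked tool call: {tool_name}")
    return registry[tool_name](**args)


# Hypothetical registry: the destructive tool exists but is never reachable.
registry = {
    "lookup_order": lambda order_id: f"order {order_id}",
    "drop_table": lambda name: f"dropped {name}",
}

print(dispatch("lookup_order", {"order_id": 7}, registry))
try:
    dispatch("drop_table", {"name": "users"}, registry)
except ToolNotAllowed as err:
    print(err)
```

This only covers actions, not what the agent says, which is why it counts as a partial solution.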

For everyone trying to fix Agents and LLMs with Prompts and having 0 luck.
by u/Chinmay101202 in PromptEngineering
