Forget Clever Jailbreaks. RedThread Makes LLM Security Testing Actually Repeatable

Yesterday an open-source CLI called RedThread shipped, and the builder’s take after working on it is worth reading twice.

The tool is built for prompt injection and LLM-agent red-teaming. But here’s the twist: the developer says clever jailbreak wording is basically the wrong thing to optimize for.

What actually matters is the fixture.

A lot of prompt injection tests live and die in a single session. You paste something sketchy, watch what happens, then move on with no record of what went in, what the agent was allowed to do, or how to reproduce the failure next week when someone asks if you actually tested this.

RedThread is built around fixing that gap. Each run captures a proper, replayable fixture: what untrusted text entered the system, what actions the agent was permitted to take, what changed, and whether the run can be repeated and get the same result.

Prompt strings are cheap. Reproducible failures are the hard part. The current demo has three runs baked in: one success, one partial, one failure. That’s not a limitation. That’s the point.

How to start your first red-team campaign:

  1. 🔧 Clone the repo: github.com/matheusht/redthread
  2. 📋 Define your fixture: what untrusted input, what agent permissions, what expected behavior
  3. ▶️ Run the campaign and get structured results across all three outcome types
  4. 🔁 Replay any failure to confirm it’s real and not a one-off
  5. Iterate on your agent’s defenses, not your prompt creativity

Pro tip: Treat injection fixtures the same way you treat unit tests. If you can’t replay the failure, you can’t prove you fixed it.

Pro tip: The three-run demo in the repo is worth studying before you build your own fixture set. It shows you exactly what a complete run looks like in practice.

This is early-stage but the core thinking is right. LLM security testing has needed this kind of rigor for a while. Go check it out. 🚀

Prompt injection tests need fixtures more than clever prompts
by u/Apprehensive-Zone148 in PromptEngineering

Scroll to Top