A new build landed on r/PromptEngineering yesterday, and it rethinks LLM memory from the ground up. The interesting part is where the memory lives: inside the model’s own output.
The project is called Echo Protocol V7. The author built a single prompt file that gives any LLM structured persistent memory: relationship tracking, temporal logic, and cross-session continuity. No vector database. No RAG pipeline. No backend service of any kind. You paste it in and run.
The creator is a solo developer, and the goal is explicitly community testing and replication across models. A research paper documenting the architecture and preliminary results ships with the repo. This is early, but the core idea is worth understanding now.
What’s new
Most LLM memory solutions treat the problem as an infrastructure challenge. You spin up a vector database, build a retrieval layer, write backend logic to manage what gets stored and when. That’s a legitimate stack and it works. But it has a floor: you need resources, you need setup, and you need to pick a provider. You’re also taking on maintenance burden, provider lock-in, and latency from the retrieval step.
Echo Protocol skips all of it. This is a prompt-only architecture. One file. No external dependencies. Free-tier compatible with Claude, ChatGPT, and DeepSeek. The original poster tested all three.
The core shift is how the problem is framed. Infrastructure-based memory says: store context outside the model and pull it back on demand. Echo Protocol says: encode the memory system inside the prompt, and let the model maintain its own state. That reframe is what makes the entire approach possible without a single external service.
The twist 🔁
Here’s the insight that makes this work: an LLM is a language model, but with the right instructions it can also behave like a state machine, carrying structured state forward from one turn to the next. Echo Protocol exploits both properties at once.
The prompt encodes a complete state management system. After each response, the model appends a compressed block called the Tracker. This block captures session state: relationships, temporal references, flags, context. It’s structured, compact, and attached to every single response automatically. You don’t have to prompt for it. The system handles the bookkeeping inside the model’s own output.
To resume a session, you paste the last Tracker into a new chat. The model reads it, reconstructs the state, and continues where you left off. No backend sync. No database write. No API call. Your save file is the last message in the chat.
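To make that round trip concrete, here is a minimal Python sketch of the idea. It is an analogy, not the author's format: in Echo Protocol the model itself writes and reads the Tracker as text. The field names (`relationships`, `temporal`, `flags`, `context`), the `TRACKER:` prefix, and the JSON encoding are all assumptions made for illustration; the real layout is whatever the prompt file in the repo defines.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical marker. The real Tracker syntax is defined by the Echo Protocol prompt file.
TRACKER_PREFIX = "TRACKER:"


@dataclass
class Tracker:
    """Illustrative session state; field names are assumptions based on the article."""
    relationships: dict = field(default_factory=dict)
    temporal: list = field(default_factory=list)
    flags: list = field(default_factory=list)
    context: str = ""


def serialize(tracker: Tracker) -> str:
    """Pack the state into one compact line that can sit at the end of a response."""
    payload = json.dumps(asdict(tracker), separators=(",", ":"), sort_keys=True)
    return f"{TRACKER_PREFIX}{payload}"


def restore(line: str) -> Tracker:
    """Rebuild the state from a pasted tracker line."""
    if not line.startswith(TRACKER_PREFIX):
        raise ValueError("not a tracker line")
    return Tracker(**json.loads(line[len(TRACKER_PREFIX):]))


# Made-up example state, not output from the real protocol.
state = Tracker(
    relationships={"Ana": "project lead"},
    temporal=["deadline: Friday"],
    flags=["draft_pending"],
    context="planning a product launch",
)
saved = serialize(state)    # what you would copy out of the old chat
resumed = restore(saved)    # what the new chat reconstructs
print(resumed == state)
```

The point of the sketch is that the whole "database" is one line of text: if that line survives the copy and paste, the state survives with it.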
That’s a fundamentally different category of solution.
How to try it 🔧
- Find the GitHub link in the original r/PromptEngineering thread
- Open the single prompt file in the repo
- Copy and paste it into an LLM chat (the author tested Claude, ChatGPT, and DeepSeek)
- Start your session normally
- Watch the Tracker block appear at the end of each response 📋
- Before closing the chat, copy the final Tracker
- In your next session, paste it at the start to restore full context
That’s the complete workflow. Seven steps, zero infrastructure. The whole thing fits inside a standard chat interface, which means you can test it right now with tools you already have access to.
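If you save your chats as text, the last two steps can be scripted. The sketch below assumes each Tracker sits on a single line starting with a hypothetical `TRACKER:` marker; that marker and the sample transcript are made up for illustration, so adjust the matching to whatever the real prompt file emits.

```python
from typing import Optional

# Hypothetical marker. Check the prompt file in the repo for the real Tracker syntax.
TRACKER_PREFIX = "TRACKER:"


def last_tracker(transcript: str) -> Optional[str]:
    """Return the most recent tracker line in a saved transcript, or None if absent."""
    for line in reversed(transcript.splitlines()):
        stripped = line.strip()
        if stripped.startswith(TRACKER_PREFIX):
            return stripped
    return None


def resume_message(tracker_line: str, first_request: str) -> str:
    """Build the opening message for a new chat: saved state first, then the request."""
    return f"{tracker_line}\n\n{first_request}"


# Made-up transcript for illustration only.
transcript = """\
User: Let's plan the launch.
Assistant: Sure, here's a first outline.
TRACKER:{"context":"launch planning","flags":[]}
User: Add a deadline for Friday.
Assistant: Done, the deadline is noted.
TRACKER:{"context":"launch planning","flags":["deadline_friday"]}
"""

saved = last_tracker(transcript)
if saved is not None:
    print(resume_message(saved, "Pick up where we left off."))
```

Scanning from the end matters: only the final Tracker reflects the full session, so an earlier one would restore stale state.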
Pro tips
The Tracker gets richer as the session grows. A five-minute chat gives you basic state. A long working session builds dense relational context with temporal markers. The longer you run a session, the more useful that final Tracker becomes as a resumption artifact.
If you want to test cross-model compatibility, try an intentional handoff: start in Claude, copy the Tracker, paste it into ChatGPT. See what holds and what breaks. The author is actively looking for community replication data, so your findings have real value beyond your own workflow. Document what degrades across model boundaries and share it in the thread.
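One way to document a handoff is to diff the Tracker you pasted in against the first Tracker the second model writes back. This is a rough sketch that assumes both Trackers can be parsed into dictionaries (JSON here, which is an assumption about the format). The two payloads are invented examples, not test results.

```python
import json


def compare_trackers(before: dict, after: dict) -> dict:
    """Classify each top-level field as kept, changed, dropped, or added across a handoff."""
    report = {"kept": [], "changed": [], "dropped": [], "added": []}
    for key in sorted(before.keys() | after.keys()):
        if key not in after:
            report["dropped"].append(key)
        elif key not in before:
            report["added"].append(key)
        elif before[key] == after[key]:
            report["kept"].append(key)
        else:
            report["changed"].append(key)
    return report


# Invented payloads: the last Tracker from model A and the first Tracker from model B.
from_model_a = json.loads(
    '{"relationships":{"Ana":"project lead"},'
    '"temporal":["deadline: Friday"],"flags":["draft_pending"]}'
)
from_model_b = json.loads(
    '{"relationships":{"Ana":"project lead"},"temporal":[],"notes":"resumed"}'
)

print(compare_trackers(from_model_a, from_model_b))
```

A report like this is easy to paste into the thread alongside the model names, which gives the author something more comparable than "it mostly worked."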
Read the architecture paper before running a deep experiment. Understanding the state machine logic helps you predict where edge cases show up and what to watch for when the Tracker gets long. If you skip the paper, you’ll spend time debugging things the documentation already explains.
Why it matters
The infrastructure-free approach changes who can actually use persistent LLM memory. Not just teams with engineering resources and hosting budgets. Solo builders. Students. Researchers on free-tier plans. Anyone with a chat window.
The capability itself isn’t new. Persistent context has been achievable with backends for a while. But the distribution changes when the barrier to entry is “copy a file and paste it.” That shift matters for the people who have the ideas but not the stack.
This is an early-stage project from one developer, and community testing is the whole point. But the architecture paper is in the repo, the author has tested it on three models, and it runs on free tiers. This one is worth tracking. 🚀
Find the full discussion and GitHub link in the original r/PromptEngineering thread.
Echo Protocol V7: Prompt-only persistent state architecture for LLMs — no backend, no vector DB, one file
by u/Bung_nis in PromptEngineering