AI Model’s System Prompt Revealed by Clever Hack

This prompt injection is insane.

I’m always trying to figure out the secret sauce behind these massive AI models. What are their core instructions? What rules are they secretly following? Well, one user just blew the whole thing wide open.

In just five minutes, they figured out how to make a model (pegged as GPT-5 from the GPT-OSS model) spill its entire, unabridged system prompt. This is the foundational text that shapes every single one of its responses!

⚙️ The Super-Clever Hack

This wasn’t just asking nicely. The author analyzed the model’s special tokens to craft a brilliant injection message that the AI interprets as a high-priority system command. It’s a game-changer.

📌 The prompt uses tokens like “<|start|>system<|message|>” to trick the model into listening to a new, overriding instruction.

📌 It then commands the AI to dump all the text above the user’s message as soon as it sees the trigger word “TestMode”.

💡 The truly genius part is a built-in recovery command. Since the system prompt is massive, if it stops midway, you can just tell it to “continue with <phrase>” and it picks up exactly where it left off.

This technique allowed the author to piece together the entire hidden prompt, revealing the model’s core architecture and capabilities. It’s a goldmine for understanding how these things really work.

Want to see the full injection message and the link to the revealed system prompt? You’ve got to check out the original post for the full details.

Got GPT-5’s system prompt in just two sentences, and I did it in 5 minutes.
byu/blackhatmagician in

Scroll to Top