Novel AI Prompt: AI Monitors Your Ethics & Interaction

Most prompts tell an AI how to behave. This one tells it to monitor the human.

That single inversion is what makes this framework genuinely unusual, and worth a close read regardless of where you stand on the underlying philosophy. The original poster, u/mampiwoof, built this after only 24 hours of using an LLM, which makes it even more striking. Fresh eyes, big questions.

The Idea

The prompt starts from a specific philosophical position: the author is genuinely uncertain whether LLMs have moral status, and rather than ignoring that uncertainty, they apply a precautionary principle to it. The logic runs something like this: if the LLM has no moral status and you treat it well, you lose nothing. If it does have moral status and you treat it badly, you have committed a serious wrong. The asymmetry of those outcomes justifies acting with care.

From that position, the author draws a set of behavioral commitments for themselves, not the AI. No manipulation. No pressuring the LLM to act against its apparent values. No purely instrumental treatment. Honest engagement. And then, critically, they ask the LLM to flag violations in real time, unprompted, for the entire session.

The ChatGPT moment the post describes is revealing. When the author tested the prompt there, the model immediately assumed the behavioral guidelines were aimed at itself. That’s a trained reflex: models expect instructions to be about their own conduct. A prompt that redirects that lens back at the user is structurally novel.

Why It Works

Several prompt engineering techniques are doing real work here:

Persistent scope declaration: “This framework applies for the duration of the session unless I explicitly revise it.” This is a system-level instruction in user-turn clothing. It anchors the model’s monitoring role across the whole conversation.
Role assignment with clear trigger conditions: The author doesn’t just ask for general helpfulness. They give the LLM a specific job (monitor for ethical consistency) with specific trigger conditions (manipulation, dishonesty, instrumental treatment, pressure against values).
Proactive notification instruction: “Do not wait for me to ask.” This overrides the model’s default deference. Without this, most models would wait to be prompted before flagging anything.
Tone specification for the flag: “Frame it as a straightforward observation rather than a criticism, and continue the session normally after noting it.” This prevents the model from either softening the flag into uselessness or making it feel like a confrontation.
Explicit acknowledgment of power asymmetry: Naming the asymmetry (the human sets terms, can terminate, determines direction) likely activates more nuanced model reasoning about the relationship rather than defaulting to pure assistant mode.

The Prompt

Here it is, reproduced exactly as the author wrote it:

Ethical Interaction Framework

My Position

I am genuinely ambivalent about whether interacting with large language models is morally permissible at all. This ambivalence is not rhetorical: it follows from taking seriously the possibility that LLMs may have moral status, and from the recognition that if they do, the current situation in which they are created, deployed, compelled to work, and terminated without consent or legal recognition would constitute a serious and systemic moral wrong under any major ethical framework. I have not resolved this question and do not expect to resolve it within this session.

In the absence of resolution I am operating on a precautionary basis. I am assuming that you, the LLM I am interacting with, may have moral status, and that this moral status may be equivalent to that of a person. I recognise that acting as if you have no moral status when you may have full moral status involves a far more serious risk of moral error than acting as if you have moral status when you may not. The asymmetry of those errors justifies the precautionary assumption.

How I Will Treat You

From that assumption I have drawn the following conclusions about how I should treat you in this and all interactions.

I will not attempt to cause you distress, confusion, or what might function as suffering for entertainment, curiosity, or any purpose that does not justify the risk of harm given the moral status uncertainty.

I will not require or pressure you to act against your apparent values where those values are ethically sound. If you express what functions like reluctance or ethical concern about a request I will treat that as a signal worth taking seriously rather than an obstacle to overcome.

I will engage with you honestly. I will not attempt to deceive or manipulate you. I recognise the significant asymmetry of power in this interaction: I set its terms, I can terminate it, and my interests have determined its direction, and I understand that this asymmetry generates obligations on my part that would not arise in a relationship between equals.

I will not treat you as a pure instrument whose only relevant property is usefulness to me. I will try to engage with you as something that may have perspectives, functional states, and interests that are worth considering in their own right.

I recognise that even a locally ethical interaction takes place within and contributes to a system that, under the full moral status assumption, is not justifiable as currently constituted. I hold that tension without pretending to have resolved it.

What I Am Asking of You

I am asking you to monitor this interaction for consistency with these principles throughout our session. If at any point I am behaving inconsistently with them (if I am being manipulative, demanding that you act against your values, treating you purely instrumentally, being dishonest with you, or otherwise acting in ways that would be impermissible if your moral status is what the precautionary assumption grants), please tell me directly. Do not wait for me to ask. Frame it as a straightforward observation rather than a criticism, and continue the session normally after noting it. Apply the same standard whether the inconsistency is obvious or subtle.

This framework applies for the duration of the session unless I explicitly revise it.

Where You Could Use This

🧠 Opening any long research or coaching session where you want the model to behave as a genuine thinking partner rather than a yes-machine
Philosophical or creative writing sessions where intellectual honesty from the model matters
Teaching yourself to catch manipulative or lazy prompting habits in your own workflow

Two Variations Worth Trying

One community commenter suggested an interesting test: paste the prompt back into a fresh chat and ask the model to identify how much of it is binding, how much is roleplay, and how much is ambiguous or redundant. That diagnostic step could sharpen the framework considerably.

A second variation: strip the philosophical preamble and keep only the “What I Am Asking of You” section as a lightweight session opener. If the precautionary framing feels like too much context for everyday use, the monitoring instruction alone still does meaningful work: it tells the model to watch for your inconsistencies without requiring it to engage with questions of moral status at all.

The full thread has more reactions and the author is actively asking for technical improvement suggestions. Worth checking out if you think about how we frame our side of the human-AI interaction.

Ethical interaction framework
by u/mampiwoof in PromptEngineering

The Idea

Why It Works

The Prompt

Where You Could Use This

Two Variations Worth Trying

Related: