Most people write AI prompts the same way they’d write a text message. Everything crammed into one block. Context mixed with data mixed with what you actually want done.
The model tries to parse all of it. Sometimes it gets close. More often, it defaults to the statistical average of its training data, and you get output that sounds like a committee wrote it in a conference room. You ask for a sharp executive summary and get three paragraphs of hedged language that could apply to any company in any industry. You ask for specific recommendations and get a numbered list of obvious suggestions that any MBA grad could have produced without reading a single word of your actual data.
A quant analyst turned AI engineer just posted something worth your attention on r/PromptEngineering. His core argument: when you mix everything into a paragraph prompt, you’re paying an “interpretive tax” the model doesn’t need to charge you.
What plain text prompting actually costs you
When a model reads a block of text, it has to do two things at once: figure out the structure of what you sent, and do the actual task. That first job spends the model's effort on the wrong problem.
XML tags eliminate that first job. Instead of guessing where your data ends and your instructions begin, the model reads architectural boundaries. It knows exactly what each section is. No inference required.
Here’s what that interpretive tax looks like in practice. You write: “Here’s the Q3 report data, I need an executive summary, keep it concise and don’t include the raw numbers, write it for our CFO.” The model now has to decide: where does the data end? Does “keep it concise” modify format or length? What does a CFO-level summary mean in terms of focus? It makes probabilistic guesses on all of those questions simultaneously. Sometimes the guesses align with your intent. Often they don’t, and you spend more time editing the output than it would have taken to just write the thing yourself.
The difference in output quality isn’t subtle.
Plain text vs XML structured
With plain text, the model probabilistically guesses the boundaries between your context, your data, and your ask. It fills in gaps based on statistical likelihood, not your actual intent.
With XML structure, every section has a declared role. The model skips straight to execution. That moves the result from “expensive autocomplete” to something much closer to predictable professional output.
Same words, different architecture. Consider a prompt where you’re preparing for a board presentation: your growth is 40% year-over-year but churn is up 8%, and you want the summary to acknowledge both. Flat text usually produces a cheerful growth narrative that glosses over the churn because nothing in the structure told the model the tension was the point. Wrap the same content in proper tags and the model addresses the actual complexity in the data, because the task declaration made it explicit. That’s the gap.
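One way to see what a declared boundary buys you is to check whether a program can recover the sections without guessing. The sketch below is only an analogy: a language model does not run a regular expression over your prompt, and the prompt wording, tag choice, and the `extract_sections` helper are all made up for the board-presentation example. It shows that the tagged version carries its own boundaries while the flat version leaves them to inference.

```python
import re

# Hypothetical prompts for the board-presentation example; the wording is illustrative.
flat_prompt = (
    "We're preparing for a board presentation. Growth is 40% year-over-year "
    "but churn is up 8%. Write a summary that acknowledges both."
)

tagged_prompt = (
    "<context>We're preparing for a board presentation.</context>\n"
    "<data>Growth is 40% year-over-year but churn is up 8%.</data>\n"
    "<task>Write a summary that acknowledges both.</task>"
)


def extract_sections(prompt: str) -> dict[str, str]:
    """Return {tag_name: content} for every <tag>...</tag> pair in the prompt."""
    pattern = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)
    return {m.group(1): m.group(2).strip() for m in pattern.finditer(prompt)}


# Same words, different architecture: stripping the tags gives back the flat text.
assert re.sub(r"</?\w+>", " ", tagged_prompt).split() == flat_prompt.split()

print(extract_sections(tagged_prompt))  # three sections, each with a declared role
print(extract_sections(flat_prompt))    # empty dict: no boundaries to recover
```

In the flat version, nothing marks "acknowledges both" as the task rather than more background, which is the gap the paragraph above describes.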
The 5-tag framework
Here’s the structure the author uses for high-stakes executive communication:
- 📌 <context>: Sets the stakes. Who is reading this, what’s the situation. Specificity matters here. “CFO reviewing Q3 before a board meeting” shapes every downstream decision the model makes about tone and emphasis. “Executive” does almost nothing.
- 📊 <data>: Isolates raw material from instructions. Spreadsheets, notes, raw inputs go here, separate from everything else. This is the tag that solves the most common failure mode: the model treating your data as instructions, or your instructions as data.
- ✅ <task>: The exact specification of what you want done. Not a vague ask. A specification. “Write a 3-paragraph summary that acknowledges the churn increase but anchors on forward momentum” is a task. “Summarize this” is a lottery ticket.
- 🚫 <constraints>: Surgically removes failure modes. No hedging. No “as an AI.” No generic summaries. This tag is where you preemptively eliminate the outputs you’ve gotten burned by before. Every frustrating response you’ve received belongs in constraints the next time you run a similar prompt.
- <output_format>: Fixes the shape of the response before generation starts. Length, structure, tone, all of it declared upfront. “Three paragraphs, no headers, written for someone with 90 seconds to read it” gives the model a concrete target. Without this tag, the model picks a format that looks statistically reasonable based on training data, which is rarely exactly what you needed.
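The five tags above can be assembled with a small helper so that no section gets forgotten or merged into another. This is a minimal sketch, not the author's template: the `ExecutivePrompt` class, its field names, and the blank-line layout between sections are assumptions made for illustration, and the `data` value is a placeholder for your own material.

```python
from dataclasses import dataclass


@dataclass
class ExecutivePrompt:
    """The five sections described above; class and field names are illustrative."""
    context: str
    data: str
    task: str
    constraints: str
    output_format: str

    def render(self) -> str:
        """Wrap each section in its tag, in a fixed order, refusing empty sections."""
        sections = [
            ("context", self.context),
            ("data", self.data),
            ("task", self.task),
            ("constraints", self.constraints),
            ("output_format", self.output_format),
        ]
        for name, text in sections:
            if not text.strip():
                raise ValueError(f"<{name}> is empty")
        return "\n\n".join(
            f"<{name}>\n{text.strip()}\n</{name}>" for name, text in sections
        )


prompt = ExecutivePrompt(
    context="CFO reviewing Q3 before a board meeting.",
    data="[paste the Q3 report data here]",  # placeholder, not real data
    task=(
        "Write a 3-paragraph summary that acknowledges the churn increase "
        "but anchors on forward momentum."
    ),
    constraints='No hedging. No "as an AI." No generic summaries.',
    output_format=(
        "Three paragraphs, no headers, written for someone with 90 seconds to read it."
    ),
)
print(prompt.render())
```

Raising on an empty section is a deliberate choice here: a missing <constraints> or <output_format> block is exactly the kind of gap the model would otherwise fill with a default.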
Why Markdown doesn’t do this
Worth being clear on: Markdown is visual formatting, built for human readers. XML creates semantic structure that language models parse as meaningful boundaries. They look similar on screen but do completely different things inside the model’s context window.
Adding a <data> tag isn’t decoration. It tells the model this section has a specific role, isolated from everything else. That’s a different operation than bolding a header or adding an asterisk. When you bold something in Markdown, you’re signaling a human reader to pay attention. When you wrap something in an XML tag, you’re telling the model this content has a declared type and function. That distinction changes how the model allocates attention across the full prompt, which is the entire mechanism behind why this works.
How to test it yourself
Take any prompt you’ve been running as plain text. Wrap the different parts in those five tags. Don’t change a single word of the actual content. Just add the structure.
Run both versions and compare. The structured one should be noticeably more precise. Vagueness doesn’t survive clear boundaries. Specifically, watch for these signals: the XML version should stay closer to your actual data instead of generalizing, the constraints should hold across the full response, and the output length and shape should match what you specified rather than defaulting to whatever length feels natural to the model.
Good candidates for this test are prompts where you’ve been rerunning and editing more than twice, or any prompt where you paste in a lot of raw data and ask for something specific about it. Those are the highest-tax situations.
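Two of those signals, whether the constraints held and whether the shape matches, can be checked mechanically once you have both responses in hand. The sketch below assumes you have already collected the two outputs from whatever model you use; the `check_output` function and its parameters are hypothetical, and the sample strings are made up to exercise the checker, not real model output. Whether a response stays close to your data still needs a human read.

```python
import re


def check_output(text: str, paragraphs: int, banned_phrases: list[str]) -> list[str]:
    """Return the ways a response misses the declared constraints and format."""
    problems = []

    # Paragraphs are blocks of text separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if len(blocks) != paragraphs:
        problems.append(f"expected {paragraphs} paragraphs, got {len(blocks)}")

    # "No headers" is read here as: no line starts with a Markdown '#'.
    if any(line.lstrip().startswith("#") for line in text.splitlines()):
        problems.append("contains a Markdown header")

    # Banned phrases are matched case-insensitively.
    lowered = text.lower()
    for phrase in banned_phrases:
        if phrase.lower() in lowered:
            problems.append(f"contains banned phrase: {phrase!r}")

    return problems


# Made-up strings standing in for responses; substitute the text your model returns.
response_a = "# Summary\n\nAs an AI, I think results were mixed."
response_b = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."

for label, response in [("a", response_a), ("b", response_b)]:
    print(label, check_output(response, paragraphs=3, banned_phrases=["as an AI"]))
```

Running the same checks on both versions of a prompt gives you a count of misses per response instead of an impression.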
If you’re consistently getting generic AI output, the structure of your prompt is the likely culprit. And it’s the easiest thing to fix.
The full technical breakdown, including a reusable Executive Summary template and a side-by-side comparison, is in the original write-up from u/blobxiaoyao on r/PromptEngineering.
Why your “Paragraph Prompts” are failing: A transition to XML-based Semantic Delineation
by u/blobxiaoyao in PromptEngineering