Prompt Engineering Just Got a Wordle-Style Game With Balatro DNA

Someone shipped a browser game yesterday that teaches prompt engineering better than any course I’ve seen.

It’s called Prompt Eval. Daily Wordle-style challenge. Write a prompt that makes an AI respond within specific constraints: exactly 15 words, no repeated words, response starts with a number written out. Three attempts. You see exactly which constraints passed or failed. That feedback loop is the whole product, and it works because the stakes are low enough to experiment but tight enough that sloppy thinking gets punished immediately. No lectures. No rubric. Just constraints and a verdict.

What makes this different from the standard “practice prompting” advice is the precision of the failure feedback. Most prompt practice is vague. You write something, the output isn’t quite right, you tweak it by feel. Here, the game tells you: word count constraint failed, repeated word detected, opening word constraint passed. That specificity changes how you think. You stop asking “why didn’t this work” and start asking “which rule did I break.” That’s a fundamentally more useful mental model for anyone who writes prompts regularly.
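To make that per-rule readout concrete, here is a minimal sketch of what a checker for the three example constraints might look like. This is my own illustration, not the game's code: the tokenization rule, the `NUMBER_WORDS` list, and the `check_response` name are all assumptions.

```python
import re

# Spelled-out numbers accepted as an opener. This short list is an assumption;
# the game's real list is not documented in this post.
NUMBER_WORDS = {"one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}

def words(text: str) -> list[str]:
    """Lowercased words with punctuation dropped (straight apostrophes kept)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def check_response(text: str, word_count: int = 15) -> dict[str, bool]:
    """Return one pass/fail verdict per constraint, like the game's readout."""
    ws = words(text)
    return {
        "exact word count": len(ws) == word_count,
        "no repeated words": len(set(ws)) == len(ws),
        "starts with spelled-out number": bool(ws) and ws[0] in NUMBER_WORDS,
    }

response = ("Three quick tips help you write clear prompts "
            "that the model can follow well today")
for name, passed in check_response(response).items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

The point of returning a dict rather than a single boolean is the same as the game's: a named verdict per rule tells you which rule you broke, not just that something went wrong.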

That’s the base game. Here’s the twist: before each run you pick modifier cards, Balatro-style. “Model only responds in questions.” “Every sentence starts with the same letter.” “No word longer than 5 characters.” They stack on top of the daily constraints and the combinations break your brain fast. What feels manageable in isolation becomes genuinely difficult when layered. A 15-word constraint is easy. A 15-word constraint where no word runs longer than 5 characters and the response must start with a spelled-out number is a puzzle. The card mechanic takes something educational and makes it compulsive, which is harder to engineer than it sounds.
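Stacking is easy to picture as one list of rules that all have to pass at once. The sketch below is a guess at the shape of it, not how Prompt Eval is actually built: `DAILY`, `CARDS`, `evaluate`, and the card name `short_words` are hypothetical, and the number-word list is a deliberately tiny stand-in.

```python
from typing import Callable

# A constraint is a label plus a rule over the response's words.
Constraint = tuple[str, Callable[[list[str]], bool]]

# Hypothetical subset of spelled-out numbers; the game's real list is unknown.
NUMBER_WORDS = {"one", "two", "three", "four", "five"}

DAILY: list[Constraint] = [
    ("exactly 15 words", lambda ws: len(ws) == 15),
    ("starts with a spelled-out number",
     lambda ws: bool(ws) and ws[0] in NUMBER_WORDS),
]

# Modifier cards, keyed by an illustrative name.
CARDS: dict[str, Constraint] = {
    "short_words": ("no word longer than 5 characters",
                    lambda ws: all(len(w) <= 5 for w in ws)),
}

def evaluate(response: str, picked: list[str]) -> dict[str, bool]:
    """Apply the daily constraints plus every picked card at the same time."""
    stripped = (w.strip(".,!?;:\"'").lower() for w in response.split())
    ws = [w for w in stripped if w]
    active = DAILY + [CARDS[name] for name in picked]
    return {label: rule(ws) for label, rule in active}

easy = ("Three simple prompts usually produce better answers when every "
        "instruction states one clear measurable goal")
hard = ("Four short words can still say a lot when you plan "
        "each one with care")

print(evaluate(easy, []))               # daily rules only
print(evaluate(easy, ["short_words"]))  # same response, one card stacked on
print(evaluate(hard, ["short_words"]))  # written with the card in mind
```

The first response clears the daily rules on its own but fails as soon as the card is added, which is the layering effect in miniature: the card did not change the daily rules, it changed which responses are still allowed.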

One card is a gacha pull. Random constraint, can’t remove it. Yesterday someone drew “no word longer than 5 characters” on top of a 12-word constraint. The suffering was real. There’s a reason gacha mechanics are addictive in mobile games: the randomness creates narrative. You’re not just solving a prompt puzzle. You’re solving this specific puzzle with this cursed constraint that you drew and now own. The community angle follows naturally. People share their worst pulls. Someone always has a worse draw than you.

Here’s how to use it as a learning tool:

  1. 🎯 Read the daily constraints before touching the prompt. Map what’s non-negotiable. Constraints that affect word count need to be reconciled first because they control the total budget for everything else. A 12-word limit with a spelled-out-number opener means you’ve already spent 1-3 words before your actual content begins. Count the ceiling before you start building.
  2. 🃏 Pick modifier cards that stack without contradicting each other. Word-count plus questions-only is hard but manageable. Letter constraints plus no-punctuation is chaos. Look for modifiers that constrain format rather than vocabulary. Format constraints are predictable. Vocabulary constraints interact with everything and are much harder to plan around when they’re stacked three levels deep.
  3. Write your first attempt fast and loose. See which constraints fail. Don’t try to solve the whole puzzle on attempt one. Use it as a diagnostic. You’re buying information with that first run, not trying to win. The players who struggle most are the ones who spend ten minutes crafting attempt one and treat the failure personally. The players who improve fastest treat attempt one as a free scout.
  4. 🔍 The failure readout is the lesson. It tells you exactly what broke and why. Screenshot your failures. After a week of daily runs you’ll notice patterns in which constraint types trip you up most. That pattern is your actual gap. Some people consistently blow word-count limits. Others nail the count but miss structural constraints like sentence openers. Knowing which category you default-fail changes how you approach the next run.
  5. 🔁 Iterate across 3 attempts. By the third you’ve internalized word economy in a way no tutorial touches. There is something about doing this under game conditions, with a score and a three-attempt cap, that makes the lesson stick differently than free-form practice. Constraint forces compression. Compression forces clarity. Clarity is the whole point of prompt engineering, and three attempts is exactly enough pressure to feel it.
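Two of the steps above are really just arithmetic, so here is a small sketch of both: the word budget from step 1 and the failure tally from step 4. The week of failures is made-up sample data, and `remaining_budget` and `failure_tally` are names I invented for illustration.

```python
from collections import Counter

def remaining_budget(limit: int, opener_words: int) -> int:
    """Words left for actual content once the required opener is paid for."""
    return limit - opener_words

# Step 1: a 12-word limit where the spelled-out number costs 1 to 3 words.
for opener in (1, 2, 3):
    left = remaining_budget(12, opener)
    print(f"opener uses {opener} word(s): {left} left for content")

# Step 4: a hypothetical week of failure readouts, one list per daily run.
week_of_failures = [
    ["word count"],
    ["word count", "repeated word"],
    [],                      # a clean run
    ["sentence opener"],
    ["word count"],
]

def failure_tally(runs: list[list[str]]) -> Counter:
    """Count how often each constraint type failed across all runs."""
    return Counter(failure for run in runs for failure in run)

tally = failure_tally(week_of_failures)
print(tally.most_common())  # most frequent failure type first
```

With this sample data, word count comes out on top, which would be the gap to work on first. Your own screenshots will give you different numbers.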

Pro tip: Skip the mystery modifier your first few runs. The 3.5x score multiplier is tempting but it locks in a random constraint you can’t plan around. Build the feel first, then go for chaos. Once you’ve run five or six daily challenges and you have a sense of how the constraint types interact, the mystery modifier becomes fun. Before that it mostly teaches you what not to do, which is slower learning than you’d get from controlled stacks where you know exactly what broke.

Free, browser-based, new challenge every day. Jump in at prompt-eval.com/en/daily and find out which combo wrecks you first. 🎮

Frequently Asked Questions

Q: What’s the mystery modifier and why is it special?

The mystery modifier is a random constraint you draw before a run and, once drawn, can’t remove. It comes with a 3.5x score multiplier because it’s the only modifier you can’t plan around: you have to adapt on the fly. That unpredictability is what makes it good practice for improvising under constraints.

Q: Do the modifiers I pick stack with the daily constraint?

Yes. Your chosen modifiers and the daily constraints all apply at the same time. The layering is intentional: real prompts rarely have just one requirement, so juggling several constraints at once is realistic practice.

Q: What’s the hardest modifier + constraint combo players have hit?

Early players report that drawing “no word longer than 5 characters” on top of a 12-word limit is brutal. You have to hit an exact word count while staying within super simple vocabulary, which forces you to think hard about word economy and why specificity actually matters in prompts.

I built a Wordle-style game to teach prompt engineering, with a modifier system like Balatro
by u/noiteestrelada in PromptEngineering
