Better AI Images: The Secret to Shorter, Smarter Prompts

More words in your prompt might actually be destroying your image quality.

I just came across a fascinating experiment by a developer working on an app called TemporaMap. The original poster was struggling with a common issue in AI image generation: inconsistency. They were trying to generate historical scenes on a map, but the results were messy. Their initial approach was to throw more and more information at the model, resulting in massive prompts nearly 30 lines long.

However, after a series of tests, the developer discovered something counterintuitive. Cutting the prompt down to just 11 lines didn’t merely preserve quality: it improved it dramatically. The developer realized that by focusing on the wrong kind of details, they were confusing the model rather than helping it. In this case, clarity and structure beat volume.

Here is how the developer re-engineered their pipeline to get sharper, more realistic results with fewer words.

💡 The “JSON Sandwich” Workflow

The core of this creator’s strategy involves a clever workflow that I like to call a “JSON Sandwich.” The problem the developer faced was dynamic context. Because users of their app could click anywhere on a map to generate an image, the developer couldn’t write a single, static prompt. They needed the prompt to adapt to the specific location and year selected by the user.

To solve this, the author brought in a heavy hitter: Gemini 3.

Instead of asking the image generator (referred to as “NanoBanana” in the post) to figure everything out, the expert first sends the map context to the Large Language Model (LLM). They ask the LLM to act as a “director” and organize the scene into a structured JSON format. This includes fields for camera settings, lighting, environment, and color grading.
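To make the “director” step concrete, here is a minimal sketch of what building the request and checking the reply might look like. The post does not show the actual prompt or schema, so the function names, the key names in REQUIRED_FIELDS, and the wording of the instruction are all my assumptions. The call to the LLM itself is left out on purpose.

```python
import json

# Hypothetical key names, based on the categories the post mentions
# (camera settings, lighting, environment, color grading).
REQUIRED_FIELDS = ("camera", "lighting", "environment", "color_grading")


def build_director_request(location: str, year: int) -> str:
    """Build the instruction text sent to the LLM for one map click."""
    fields = ", ".join(REQUIRED_FIELDS)
    return (
        f"You are the director of a photorealistic scene set in {location} "
        f"in the year {year}. Respond with a single JSON object containing "
        f"exactly these keys: {fields}. Do not add commentary."
    )


def parse_director_response(raw: str) -> dict:
    """Parse the LLM reply and check that every expected field is present."""
    scene = json.loads(raw)
    if not isinstance(scene, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in scene]
    if missing:
        raise ValueError(f"LLM response is missing fields: {missing}")
    return scene
```

Validating the reply before doing anything else with it means a malformed response fails loudly instead of quietly producing a half-empty image prompt.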

This is a brilliant move because LLMs are excellent at logic and structure. They can take a vague concept like “London, 1850” and break it down into the specific components needed to visualize it. However, the developer found that feeding this raw JSON directly into the image generator was a mistake.

The image model struggled with the brackets and nesting of the JSON. Instructions buried deep in the nested structure were often ignored, leading to “rubbery” textures and weird angles.

The fix was simple but effective. The developer added a translation step. After the LLM generates the detailed JSON, a script parses that data and rewrites it as a clean, natural-language list with capitalized headers. It strips away the JSON syntax and leaves only the instructions. This gives the image generator the best of both worlds: the logic and detail of an LLM, presented in a simple format that image models can easily follow.

📌 Shift Focus from “What” to “How”

The most significant mindset shift this professional shared was moving away from describing the content and focusing on the container.

In their original 30-line prompts, the author was obsessed with listing every single object in the scene. They were describing the buildings, the people, the items on the ground, and the weather in exhausting detail. This approach often leads to “token overload,” where the model tries to cram too many distinct concepts into one canvas, resulting in a chaotic mess.

The breakthrough came when the creator started prioritizing how the image should be generated rather than just what was in it.

The new, shorter prompts focus heavily on technical constraints. They specify the camera angle, the type of lighting, the color palette, and the overall vibe. Setting these stylistic boundaries leaves the model free to fill in the scene details more naturally. It’s the difference between telling a painter, “Draw a red cat, a blue mat, a wooden chair, and a sunny window,” and saying, “Paint a cozy afternoon scene with warm lighting and a wide-angle perspective.”

The latter gives the model clear artistic direction, which helps prevent the “rubbery” and artificial look that plagues many AI generations. It suggests that you don’t need to micromanage every pixel; you just need to set the stage.

📌 The Translation Layer is Critical

I want to double down on the importance of the translation step the author implemented. This is a technical nuance that is easy to miss but makes all the difference.

When the expert used the LLM to generate JSON, the output looked something like this:

{ "camera": { "type": "DSLR", "lens": "50mm" }, "lighting": "sunset" }

While this is readable to a programmer, image generation models are trained primarily on natural language captions and image-text pairs. They aren’t JSON parsers. When you force them to read JSON, you are likely spending valuable “attention” on curly braces and quotation marks.

By converting that JSON into a clean list like CAMERA: DSLR, 50mm and LIGHTING: Sunset, the author removed the noise. This is what improved the consistency.
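The post doesn’t include the translation script itself, so here is a minimal sketch of how such a step could work. The function names json_to_prompt and _flatten are hypothetical, and dropping the nested key names (such as “type” and “lens”) is my own simplification to match the example output above.

```python
import json


def _flatten(value) -> str:
    """Join nested values into one comma-separated string."""
    if isinstance(value, dict):
        return ", ".join(_flatten(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return ", ".join(_flatten(v) for v in value)
    return str(value)


def json_to_prompt(raw_json: str) -> str:
    """Turn the LLM's JSON into one 'HEADER: values' line per top-level key."""
    scene = json.loads(raw_json)
    lines = []
    for key, value in scene.items():
        # Only the header is upper-cased; the values are left untouched.
        header = key.replace("_", " ").upper()
        lines.append(f"{header}: {_flatten(value)}")
    return "\n".join(lines)


raw = '{"camera": {"type": "DSLR", "lens": "50mm"}, "lighting": "sunset"}'
print(json_to_prompt(raw))
# CAMERA: DSLR, 50mm
# LIGHTING: sunset
```

Every top-level key becomes one line, so nothing ends up “buried” several levels deep by the time the image model sees it.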

The lesson here is about knowing the strengths of your tools. Use LLMs for logic and structure because they are good at reasoning. Use image models for visuals, but speak to them in the language they understand best: simple, descriptive text. Don’t try to force one tool to do the other’s job.

📌 The Aperture Hack for Realism

This might be my favorite tip from the entire post. The author shared a specific technical trick to instantly boost realism and hide hallucinations: forcing a shallow depth of field.

In photography, “depth of field” refers to how much of the image is in sharp focus. A “deep” depth of field means everything from the foreground to the horizon is crisp. A “shallow” depth of field means the subject is sharp, but the background is blurry.

The expert programmed the prompt to always request an aperture between f/1.4 and f/2.8.

Why does this work so well for AI? Two reasons. First, it mimics the look of high-end professional photography. We associate that creamy, blurred background (bokeh) with quality images, so it immediately tricks our brain into seeing the image as “better.”

Second, and more importantly, it acts as a camouflage for the AI’s mistakes. Image models often hallucinate weird details in the background of scenes: twisted faces in a crowd, floating limbs, or nonsensical architecture. By forcing the background to be blurry, you effectively hide those errors. The viewer’s eye focuses on the sharp subject in the front, and the messy background just looks like an artistic choice!
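If you build prompts in code, one way to guarantee the aperture always lands in that range is to clamp it before the prompt is assembled. This is only a sketch: the post says the prompt always requests f/1.4 to f/2.8 but doesn’t say how that is enforced, and the names clamp_aperture and camera_line are mine.

```python
MIN_F_STOP = 1.4  # widest aperture mentioned in the post
MAX_F_STOP = 2.8  # narrowest aperture mentioned in the post


def clamp_aperture(f_stop: float) -> float:
    """Force any requested f-stop into the shallow depth-of-field range."""
    return max(MIN_F_STOP, min(MAX_F_STOP, f_stop))


def camera_line(lens: str, f_stop: float) -> str:
    """Render a CAMERA line whose aperture is always f/1.4 to f/2.8."""
    return f"CAMERA: {lens}, f/{clamp_aperture(f_stop):g}, shallow depth of field"


print(camera_line("50mm", 8.0))  # CAMERA: 50mm, f/2.8, shallow depth of field
print(camera_line("85mm", 1.8))  # CAMERA: 85mm, f/1.8, shallow depth of field
```

Even if the LLM “director” picks f/8 for a landscape, the image model still gets a blurred background.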

✅ Use This Structure for Your Prompts

Based on the creator’s success, you can apply this structure to your own work immediately. You don’t need an automated pipeline to benefit from this format; you can write your manual prompts this way too.

Instead of a wall of text, break your prompt into these capitalized headers:

CAMERA: (Lens type, angle, aperture)
LOCATION: (General setting)
COMPOSITION: (Framing, perspective)
LIGHTING: (Time of day, light source quality)
ENVIRONMENT: (Weather, atmosphere)
KEY ELEMENTS: (The main subject only)
COLOR: (Palette or grading style)
PERIOD DETAILS: (Specific to the era or genre)

This modular approach keeps you disciplined. It stops you from rambling and ensures you cover the stylistic essentials that actually drive image quality.
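If you would rather script this than type it by hand, the header list above can be turned into a tiny template. This is a sketch of my own, not something from the original post: render_prompt is a hypothetical helper, and the example values are made up for illustration.

```python
# The eight headers from the list above, in the same order.
HEADERS = (
    "CAMERA", "LOCATION", "COMPOSITION", "LIGHTING",
    "ENVIRONMENT", "KEY ELEMENTS", "COLOR", "PERIOD DETAILS",
)


def render_prompt(sections: dict[str, str]) -> str:
    """Render sections in the fixed header order, skipping empty ones."""
    unknown = set(sections) - set(HEADERS)
    if unknown:
        raise KeyError(f"unknown headers: {sorted(unknown)}")
    return "\n".join(
        f"{header}: {sections[header]}"
        for header in HEADERS
        if sections.get(header)
    )


prompt = render_prompt({
    "LIGHTING": "late afternoon, low warm sun",
    "CAMERA": "50mm lens, eye level, f/2",
    "LOCATION": "London street, 1850",
    "KEY ELEMENTS": "a horse-drawn cart",
})
print(prompt)
```

The output always follows the header order above no matter how the dictionary was filled in, and a mistyped header raises an error instead of silently vanishing from the prompt.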

The transformation this developer achieved for TemporaMap is a perfect example of working smarter, not harder. If your AI images are looking messy, stop adding words. Start adding structure.

Check out the full post to see the incredible before-and-after comparison photos.

💡 FAQ & Troubleshooting

Why are my structured JSON prompts resulting in “rubbery” or inconsistent images?

While using an LLM to generate detailed JSON context is useful for data organization, feeding raw JSON directly into image generation models (like NanoBanana) often leads to poor adherence. The model tends to ignore instructions buried deep in the JSON tree. To fix this, parse the JSON and rewrite it into a clean, natural-language format using capitalized headers (e.g., CAMERA: …, LIGHTING: …) before sending it to the image generator.

How can I make generated images look more photorealistic and less hallucinated?

A highly effective technique is to explicitly request a shallow depth of field. Instruct the model to use an aperture setting between f/1.4 and f/2.8. This not only creates a professional photographic look but also blurs the background, hiding the artifacts and hallucinations that often show up in sharp, deep-focus generations.

Does adding more detail to the prompt guarantee better results?

Not necessarily. Quality is more important than quantity. Reducing a prompt from 30 lines to roughly 11 lines can actually improve results. Instead of listing every object in the scene, focus the prompt on how the image should be generated: prioritize camera settings, lighting, vibe, and constraints over exhaustive object lists.

Is it possible to try the tool without a Google account?

Currently, the application requires a Google login immediately upon visiting to access the functionality. There is no public overview or intro page available before the authentication step.

Source post: “How I used structured prompts to improve the NanoBanana generations for my app” by u/ExpertPlay