Scroll through any AI image forum and you’ll spot the pattern: people write one long block of keywords and hope the model does the heavy lifting. Results? Flat images, weird lighting, no depth. Faces that look vaguely plastic. Backgrounds that feel pasted in rather than part of the scene.
A prompt engineer on Reddit just broke down a better approach. It’s not about using better keywords. It’s about order and structure.
Six layered sections. Each one doing a specific job. The result is the kind of cinematic portrait that looks like it was lit by an actual cinematographer, not assembled by a keyword lottery.
The Old Way vs. the Right Way
The typical approach: dump every detail into one sentence, separate with commas, submit. The model gets confused trying to weigh everything equally, and the output shows it. You end up with images where the lighting contradicts the environment, or the lens feel doesn’t match the depth cues, or the subject blends into the background because nothing was anchored first.
The structured approach assigns each element its own section. Subject first. Then lighting. Then environment. Then camera. Then styling. Then negative prompts. Order matters because context builds on itself. When the model processes a structured prompt, earlier sections set expectations that later sections refine. It’s the difference between handing someone a complete brief versus a pile of sticky notes.
Breaking Down the Six Sections
1. Subject and Base Description
Keep this clean and simple. The goal is to anchor the model without overwhelming it. Overloading the subject description early kills consistency across generations. One or two physical descriptors, a clear action or pose cue, and you’re done. Let the other sections carry the atmosphere.
2. 🌅 Lighting (The Most Important Section)
This is where most people underinvest. The breakdown uses three specific elements:
- “golden hour” for natural warmth
- “rim light” to separate subject from background
- “volumetric light rays” for depth and atmosphere
Lighting shapes everything else in the frame. Get this section wrong and no amount of styling fixes it. Think about how a cinematographer approaches a shot: they build the scene around the light source, not the other way around. Your prompt should work the same way. Rim lighting alone can take a flat portrait and give it a sense of dimension that most keyword-dump prompts never achieve.
3. Environment and Atmosphere
Layer the background here, not inside the subject description. “Lush forest background, soft bokeh, glowing particles, depth of field” creates that dreamy layered feel instead of a flat backdrop stapled behind the subject. Think of this section as setting the stage after the subject and light are already placed. You’re adding distance, texture, and mood, not competing with the subject for attention.
4. 📷 Camera and Realism Enhancers
Lens choice matters more than most people realize. Specifying an 85mm lens tends to produce a portrait feel because that focal length is associated with slight background compression and flattering facial proportions, which is why photographers choose it for headshots and editorial work. Add “shallow depth of field” and “highly detailed skin texture” here. This section is what pushes the image toward photorealism. Without it, you often get something that reads as illustrated rather than photographed.
5. Styling and Details
Keep this section subtle. Too much styling detail confuses the model or pulls it away from realism. “Soft fabric dress, natural pose, natural color grading” is enough to do the job. The mistake most people make is treating this section like a costume department and layering in too many specifics. One or two anchors are all you need. The model fills in the rest consistently once the earlier sections are solid.
6. Negative Prompts
Don’t skip this. A solid negative prompt cleans up the most common generation issues before they appear in the output. Common culprits to exclude: extra limbs, blurry faces, oversaturated colors, artificial skin texture, watermarks. Treating negative prompts as an afterthought is one of the most common reasons otherwise well-structured prompts still produce inconsistent results.
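As a minimal sketch of the list above: many tools take the negative prompt separately from the main prompt, so it helps to build it as its own string. The function names and the dictionary keys (`prompt`, `negative_prompt`) are hypothetical placeholders, not any specific tool's API.

```python
# Terms taken from the list above; extend per model as needed.
NEGATIVE_TERMS = [
    "extra limbs",
    "blurry faces",
    "oversaturated colors",
    "artificial skin texture",
    "watermarks",
]


def build_negative_prompt(terms):
    """Join negative terms into one comma-separated string, dropping duplicates."""
    seen = set()
    unique = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return ", ".join(unique)


def build_request(prompt, negative_terms):
    """Bundle both prompts. The key names are hypothetical placeholders."""
    return {
        "prompt": prompt,
        "negative_prompt": build_negative_prompt(negative_terms),
    }


print(build_negative_prompt(NEGATIVE_TERMS))
```

Keeping the terms in a list makes it easy to add or remove one exclusion at a time and see which ones actually change the output.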
The Full Prompt
Here’s what the first five sections look like assembled. The negative prompt is not part of this string, since most tools take it as a separate field or flag:
ultra realistic adult female, long blonde hair, soft expression, standing in a forest, soft cinematic lighting, golden hour, rim light, volumetric light rays, warm glow, lush forest background, soft bokeh, glowing particles, depth of field, 85mm lens, shallow depth of field, highly detailed skin texture, natural color grading, soft fabric dress, natural pose
Simple when you see it. But the structure behind it is what makes it work. Each section feeds the next. The lighting sets the mood, the environment builds around it, and the camera settings translate everything into something that reads as real.
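One way to keep that structure visible is to store each section separately and join them in a fixed order. The sketch below rebuilds the prompt above from five sections. The assignment of each keyword to a section is my reading of the breakdown, and the names `SECTION_ORDER`, `SECTIONS`, and `assemble` are illustrative. The sixth section, negative prompts, does not appear in the assembled string, so it is left out here.

```python
# Order in which sections are emitted; this is the whole point of the approach.
SECTION_ORDER = ["subject", "lighting", "environment", "camera", "styling"]

# Keyword-to-section assignment is an assumption based on the article's text.
SECTIONS = {
    "subject": [
        "ultra realistic adult female",
        "long blonde hair",
        "soft expression",
        "standing in a forest",
    ],
    "lighting": [
        "soft cinematic lighting",
        "golden hour",
        "rim light",
        "volumetric light rays",
        "warm glow",
    ],
    "environment": [
        "lush forest background",
        "soft bokeh",
        "glowing particles",
        "depth of field",
    ],
    "camera": [
        "85mm lens",
        "shallow depth of field",
        "highly detailed skin texture",
    ],
    "styling": [
        "natural color grading",
        "soft fabric dress",
        "natural pose",
    ],
}


def assemble(sections, order=SECTION_ORDER):
    """Concatenate section keywords in the given order, skipping missing sections."""
    parts = []
    for name in order:
        parts.extend(sections.get(name, []))
    return ", ".join(parts)


full_prompt = assemble(SECTIONS)
print(full_prompt)
```

Because the order lives in one list, you can test a different ordering by changing `SECTION_ORDER` alone and leaving the keywords untouched.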
Why This Approach Holds Up
There’s a real difference between knowing which keywords exist and knowing how to layer them. This breakdown does both. It’s not a list of magic words. It’s a mental model for how image generation models process context. Most prompting advice tells you what to say. This framework tells you when to say it, which turns out to matter just as much.
Start with the subject. Let lighting do the heavy work. Build the environment around it. Push realism with camera settings. Keep styling minimal. Clean up with negative prompts. That’s the system.
Worth Testing 🎯
If you generate AI portraits, try reorganizing your current prompt into these six layers without changing the actual keywords. Don’t add anything new. Just reorder what you already have and run the same generation again. The difference in output quality is often immediate, which tells you something important: the words were never the problem.
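The reordering test can be sketched in a few lines. This assumes you tag each of your own keywords with a section by hand; the `KEYWORD_SECTION` lookup and the `reorder_prompt` function are hypothetical helpers, and the sample keywords are only an illustration.

```python
SECTION_ORDER = ["subject", "lighting", "environment", "camera", "styling"]

# Hypothetical lookup: you decide which section each of your keywords belongs to.
KEYWORD_SECTION = {
    "adult female": "subject",
    "golden hour": "lighting",
    "rim light": "lighting",
    "lush forest background": "environment",
    "soft bokeh": "environment",
    "85mm lens": "camera",
    "soft fabric dress": "styling",
}


def reorder_prompt(flat_prompt, keyword_section=KEYWORD_SECTION):
    """Regroup a comma-separated prompt by section without adding or dropping keywords."""
    keywords = [k.strip() for k in flat_prompt.split(",") if k.strip()]
    buckets = {name: [] for name in SECTION_ORDER}
    unassigned = []
    for kw in keywords:
        section = keyword_section.get(kw.lower())
        if section in buckets:
            buckets[section].append(kw)
        else:
            # Keywords with no known section are kept, appended at the end.
            unassigned.append(kw)
    ordered = [kw for name in SECTION_ORDER for kw in buckets[name]]
    return ", ".join(ordered + unassigned)


messy = "85mm lens, soft bokeh, rim light, adult female, soft fabric dress, golden hour"
print(reorder_prompt(messy))
```

Keywords within a section keep their original relative order, so the only variable that changes between the two generations is the section ordering.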
One question worth thinking about: when you’re going for photorealism, do you prioritize lighting keywords first, or do camera and lens settings matter more to you? Drop your take in the comments.
The full breakdown is available in r/PromptEngineering.
Frequently Asked Questions
Q: Why is lighting the most important part of a portrait prompt?
Lighting creates mood, dimension, and separation between your subject and background. Without it, even detailed prompts feel flat. Specific terms like ‘rim light’ and ‘volumetric light rays’ tell the model which effects to render, which is why a well-built lighting section makes such a visible difference in results.
Q: Should I use all the lighting keywords or start with just a few?
Layering works because each keyword creates a different effect: ‘golden hour’ sets warmth, ‘rim light’ adds dimension, ‘volumetric light rays’ adds atmosphere. Start with 2-3 core lighting terms, then test adding more. Too many can actually contradict each other and muddy the result.
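A small sketch of that test loop: start from the core terms, then generate variants that add extras one at a time. Treating the three terms above as “core” and the other two lighting terms from the full prompt as “extras” is my assumption, and `lighting_variants` is an illustrative helper.

```python
from itertools import combinations

CORE_LIGHTING = ["golden hour", "rim light", "volumetric light rays"]
EXTRA_LIGHTING = ["soft cinematic lighting", "warm glow"]


def lighting_variants(core, extras, max_extras=1):
    """Yield the core set alone, then the core plus up to max_extras extra terms."""
    yield list(core)
    for n in range(1, max_extras + 1):
        for combo in combinations(extras, n):
            yield list(core) + list(combo)


variants = list(lighting_variants(CORE_LIGHTING, EXTRA_LIGHTING, max_extras=2))
for v in variants:
    print(", ".join(v))
```

Running each variant with the same seed and the same remaining sections makes it easier to tell which added term helps and which one muddies the result.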
Q: Why 85mm specifically? Would other lens choices work?
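85mm is a short telephoto: at typical portrait framing it keeps the camera far enough from the subject that facial features aren’t exaggerated, while the background still compresses slightly. Wider lenses like 50mm, used close enough to fill the frame with a face, tend to enlarge whatever is nearest the camera, while much longer ones like 135mm flatten the face further. 85mm is the sweet spot for that professional portrait look with natural depth of field.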
Q: Can I adapt this prompt structure for different subjects or scenes?
Totally. The structure (subject → lighting → environment → camera → styling → negative prompts) carries over to other subjects and scenes. Just swap your subject description, adjust environment keywords to match your setting, and keep most lighting and camera terms the same. The framework matters more than the specific keywords.
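As a rough sketch of that swap, the helper below overrides only the sections you name and reuses the rest. The `adapt` function is hypothetical, and the fisherman and shoreline keywords are invented purely for illustration.

```python
ORDER = ["subject", "lighting", "environment", "camera", "styling"]

BASE = {
    "subject": ["adult female", "soft expression"],
    "lighting": ["golden hour", "rim light", "volumetric light rays"],
    "environment": ["lush forest background", "soft bokeh"],
    "camera": ["85mm lens", "shallow depth of field"],
    "styling": ["natural pose"],
}


def adapt(base, **overrides):
    """Return a prompt built from base, with the named sections replaced."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown section(s): {sorted(unknown)}")
    sections = {**base, **overrides}
    return ", ".join(kw for name in ORDER for kw in sections[name])


# Illustrative swap: new subject and environment, same lighting and camera.
shoreline = adapt(
    BASE,
    subject=["elderly fisherman", "weathered face"],
    environment=["rocky shoreline", "sea mist"],
)
print(shoreline)
```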
Source: “How I structured this prompt for soft cinematic lighting + realistic portrait depth (breakdown)” by u/PartGlitteringaway in r/PromptEngineering