Midjourney --sref: Understanding Composition Logic

Treating --sref like a pure style dial makes total sense. It holds the look. It locks the visual identity. You pull in a reference image, the aesthetic carries across, and your outputs stay coherent. That part is true and it works reliably.

But it’s been doing a second job without announcing it.

A developer in r/PromptEngineering ran a focused series of multi-figure scene tests in Midjourney. Style stayed consistent. Black-and-white held. Illustration language held. Everything looked right on the surface, exactly the kind of output you’d show a client as proof the system is working.

But the figure count kept failing.

Three figures collapsed into two. One figure absorbed another. The observer disappeared entirely. Same scene, same roles, same prompt structure. Different results on every run. The frustrating part wasn’t the inconsistency itself. It was that nothing in the visible output explained why. The style was clean. The rendering was sharp. The failure was invisible until you counted heads.

What the Tests Actually Revealed

Here’s the finding: --sref doesn’t just import a visual look. It also imports the composition tendencies that came with that reference image. Figure spacing. Pose logic. Cropping habits. How many subjects fit in frame before the model starts merging them.

Think about what that actually means. If your reference image was originally a two-person portrait, it likely had tight framing, subjects close together, maybe one dominant figure and one secondary. Those aren’t just aesthetic choices. They’re structural signals the model absorbed. When you bring that SREF into a three-figure prompt, you’re not just asking for that style. You’re asking the model to resolve a tension between your explicit scene requirements and the implicit spatial logic the reference encoded.

--sw controls how strongly those tendencies enter your scene. It runs from 0 to 1000, with 100 as the default. Crank it up and you’re not just amplifying the aesthetic. You’re amplifying everything the SREF carries, including whatever structural biases came with it. Even the default of 100 on a tight two-figure reference is essentially telling the model that the reference’s composition logic matters as much as your prompt. That’s the conflict. High --sw on a mismatched reference isn’t a style boost. It’s a scene override.
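
To see where a reference starts overriding the scene, it helps to hold the prompt fixed and vary only --sw. The sketch below just builds prompt strings to paste into Midjourney; it does not call any API. The helper names, the scene text, and the reference URL are placeholders invented for illustration, and the 0 to 1000 bounds are the documented --sw range.

```python
SW_MIN, SW_MAX = 0, 1000  # documented --sw range; 100 is the default


def sw_ladder(start: int, step: int = 25, floor: int = 0) -> list[int]:
    """Return descending --sw values from start down to floor."""
    if not SW_MIN <= start <= SW_MAX:
        raise ValueError(f"--sw must be between {SW_MIN} and {SW_MAX}")
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start, max(floor, SW_MIN) - 1, -step))


def with_style_weight(scene: str, sref: str, sw: int) -> str:
    """Attach the style reference and its weight to an unchanged scene prompt."""
    return f"{scene} --sref {sref} --sw {sw}"


# Hypothetical scene and reference, used only to show the output shape.
SCENE = "three figures standing in a room, full body, black-and-white illustration"
SREF = "https://example.com/reference.png"

for sw in sw_ladder(100, step=25, floor=25):
    print(with_style_weight(SCENE, SREF, sw))
```

Because the scene text never changes between lines, any difference in figure count across the ladder can be attributed to the style weight rather than to a prompt rewrite.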

That’s why a polished, beautiful SREF can still wreck multi-character prompts. The style looks perfect. The composition logic is fighting you the whole time.

The Old Assumption vs. the Working Model

Old assumption: consistent style equals a controlled scene.

The model that actually works treats four systems as separate:

🎨 --sref = visual reference AND latent composition tendencies
⚙️ --sw = strength of those tendencies, not just the look
📝 Prompt = explicit scene structure you control directly
🚫 --no = penalty against known failure states (name what keeps breaking)

Split those four systems and failures become diagnosable. Look wrong? Fix the look layer. Figure count wrong? Fix the scene architecture. Model keeps collapsing the same way? Name that failure state and block it explicitly.
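
One way to keep the four layers apart in practice is to store them as separate fields and only join them at the last moment. This is a minimal sketch, not a Midjourney client: the PromptLayers class, its field names, and the symptom labels are all hypothetical, and the output is a plain string to paste in by hand.

```python
from dataclasses import dataclass, field


@dataclass
class PromptLayers:
    scene: str                                      # explicit scene structure
    sref: str                                       # style reference (URL or code)
    sw: int = 100                                   # strength of the reference's tendencies
    avoid: list[str] = field(default_factory=list)  # known failure states for --no

    def render(self) -> str:
        """Join the layers into one prompt string."""
        parts = [self.scene, f"--sref {self.sref}", f"--sw {self.sw}"]
        if self.avoid:
            parts.append("--no " + ", ".join(self.avoid))
        return " ".join(parts)


# Which layer to touch for each symptom, following the split described above.
FIX_LAYER = {
    "look wrong": "sref / sw",
    "figure count wrong": "scene",
    "same collapse every run": "avoid",
}

layers = PromptLayers(
    scene="three figures in a courtyard, one observer standing apart",
    sref="https://example.com/reference.png",  # hypothetical reference
    sw=75,
    avoid=["merged figures", "two-person scene"],
)
print(layers.render())
print(FIX_LAYER["figure count wrong"])
```

Changing one field at a time keeps each revision tied to a single layer, so a fix that works can be traced back to the layer that caused the failure.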

The --no parameter is underused for exactly this. Most people think of it as a way to exclude visual elements, as in --no text, watermarks. It works just as well as a behavioral guardrail. If your three-figure scene keeps dropping the third figure, add the failure itself as terms: --no merged figures, two-person scene. The parameter already supplies the negation, so the terms shouldn’t start with “no.” You’re not describing what you want. You’re describing the specific failure you keep getting and telling the model to avoid it. That combination of lowering --sw and explicitly naming failure states in --no gives you two separate levers to pull when scenes break, rather than one ambiguous prompt revision cycle.
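
A small helper can turn a running list of observed failures into a clean --no clause. This sketch assumes --no takes a comma-separated list of terms and already supplies the negation, so it strips a leading "no" from each phrase and drops duplicates. The function name and the example phrases are illustrative only.

```python
def build_no_clause(failures: list[str]) -> str:
    """Turn observed failure descriptions into a single --no clause."""
    terms: list[str] = []
    for phrase in failures:
        term = phrase.strip().lower()
        # --no already negates, so a leading "no " would be redundant.
        if term.startswith("no "):
            term = term[3:].strip()
        if term and term not in terms:
            terms.append(term)
    return "--no " + ", ".join(terms) if terms else ""


observed = ["no merged figures", "No two-person scene", "merged figures"]
print(build_no_clause(observed))
```

Keeping the failure list separate from the scene text makes it easy to add one term per observed collapse and to remove a term later if it stops being needed.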

How to Stress Test an SREF Before You Build on It

Run this check on any SREF before committing multi-figure work to it:

1. Prompt a simple 3-figure scene with neutral instructions. No complex poses, no special relationships between subjects. Keep the language as plain as possible so the SREF’s tendencies are the main variable you’re isolating.
2. Run it 4 to 5 times at your normal --sw value.
3. Check if all 3 figures appear consistently or if the model starts merging and dropping them.
4. If figures collapse, lower --sw by 25 to 50 points and run the same test again. This tells you whether the SREF can work at reduced strength or whether its composition logic is too rigid to recover.
5. If lowering --sw fixes the count but the style thins out too much to be useful, that SREF is not a viable foundation for multi-figure work. Move on early.
6. Only after it passes the consistency check should you build more complex multi-character work on top of it.
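
The check above can be logged and scored with a few lines of code. This is a rough sketch under stated assumptions: figure counts are entered by hand after counting heads in each output, "stable" means every run matched the expected count, and the function names and verdict strings are made up for illustration.

```python
from collections import Counter
from typing import Optional


def evaluate_runs(figure_counts: list[int], expected: int = 3) -> dict:
    """Summarize hand-counted figure totals from repeated runs of one prompt."""
    passes = sum(1 for count in figure_counts if count == expected)
    return {
        "runs": len(figure_counts),
        "passes": passes,
        "stable": bool(figure_counts) and passes == len(figure_counts),
        "distribution": dict(Counter(figure_counts)),
    }


def next_step(normal: dict, lowered: Optional[dict] = None,
              style_usable: bool = True) -> str:
    """Map test results to the decision described in the checklist."""
    if normal["stable"]:
        return "pass: build multi-figure work on this SREF"
    if lowered is None:
        return "retest: lower --sw by 25 to 50 and run the same prompt"
    if not lowered["stable"]:
        return "reject: composition logic too rigid to recover"
    if style_usable:
        return "pass at reduced --sw"
    return "reject: style thins out at the --sw that fixes the count"


# Example counts, invented for illustration.
normal = evaluate_runs([3, 2, 3, 2, 3])
lowered = evaluate_runs([3, 3, 3, 3, 3])
print(next_step(normal))
print(next_step(normal, lowered, style_usable=True))
```

Requiring every run to match is a deliberately strict threshold. A looser rule, such as four passes out of five, would be a reasonable alternative if occasional rerolls are acceptable in your workflow.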

The whole test takes maybe ten minutes. A stylistically beautiful SREF that fails it will cost you hours of debugging later, usually during a production run when you’re under deadline pressure and least able to diagnose the root cause. Filter it out early.

The real lesson here is simple: don’t test SREFs just by how they look. Test them by whether your scenes survive them. Those are two very different questions, and only one of them protects your workflow.

Two parameters everyone thinks are style controls. Turns out they’re also regulating your figure count.
by u/jeffbradshaw in PromptEngineering
