Researchers gave an LLM 9 lines of code. It beat Optuna on 96% of benchmarks.

🤖 An LLM just wrote an optimizer that outperforms one of the tools we spent years crafting.

Here’s what happened. A research team handed an LLM a 9-line random-search stub, a budget of 2,000 evaluations, and 5 rounds of contrastive feedback. The LLM rewrote that code round after round until it beat Optuna’s TPE sampler on 53 out of 55 standard benchmarks. That’s 96%.
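For a sense of scale, here is what a random-search baseline of roughly that size can look like. This is a hypothetical sketch, not the researchers' actual stub: it assumes a continuous objective f to minimize over box bounds and spends the entire 2,000-evaluation budget on uniform random samples.

```python
import random
from typing import Callable, Sequence

def random_search(f: Callable[[list[float]], float],
                  bounds: Sequence[tuple[float, float]],
                  budget: int = 2000, seed: int = 0) -> tuple[list[float], float]:
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(budget):
        # Sample every coordinate uniformly inside its (lo, hi) bound.
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        y = f(x)
        if y < best_y:  # keep the best point seen so far (minimization)
            best_x, best_y = x, y
    return best_x, best_y
```

Everything the LLM added later (structure, phases, reuse of good points) has to beat this: no memory beyond the incumbent, no adaptation.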

No hand-tuning. No architecture decisions. Just feedback loops where the model sees what worked, what didn’t, and writes a better version. They called the method ContraPrompt.
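The feedback loop is easy to picture in code. Below is a minimal sketch of one way to run contrastive rounds. It is not ContraPrompt's implementation. It assumes a hypothetical `score` function (lower is better, e.g. average regret across a benchmark suite) and a hypothetical `rewrite` callable that sends a prompt to an LLM and returns new optimizer code. Showing the best and worst versions so far side by side is also an assumption about what "contrastive" means here.

```python
from typing import Callable

def contrastive_loop(seed_code: str,
                     score: Callable[[str], float],
                     rewrite: Callable[[str], str],
                     rounds: int = 5) -> tuple[str, float]:
    """Iteratively ask an LLM to improve optimizer code (lower score is better)."""
    history = [(seed_code, score(seed_code))]
    for r in range(rounds):
        ranked = sorted(history, key=lambda item: item[1])
        (best_code, best_score), (worst_code, worst_score) = ranked[0], ranked[-1]
        # The contrast: put the best and worst versions so far side by side.
        # (In round 1 only the seed exists, so both slots hold the same code.)
        prompt = (
            f"Round {r + 1} of {rounds}.\n"
            f"Better version (score {best_score:.4g}):\n{best_code}\n\n"
            f"Worse version (score {worst_score:.4g}):\n{worst_code}\n\n"
            "Explain what makes the better version better, then return an "
            "improved optimizer as code only."
        )
        candidate = rewrite(prompt)  # hypothetical LLM call
        history.append((candidate, score(candidate)))
    # Return the best (code, score) pair seen across all rounds.
    return min(history, key=lambda item: item[1])
```

Keeping the full history, rather than only the latest version, means a bad rewrite in one round cannot erase an earlier good one.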

🔬 The win rate is impressive. What’s more impressive is what the LLM independently discovered along the way:

  • Corner enumeration: probing the corners (the extreme points) of the search space first. Most practitioners skip this step entirely and lose coverage they didn’t know they were missing.
  • Differential evolution seeding: a well-known technique from classical optimization, which the model arrived at without being told it existed.
  • Multi-phase refinement: structuring the search in stages, exactly the way a seasoned expert would do it by hand.
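The three behaviors above compose naturally into one optimizer. Here is a rough, self-contained sketch under stated assumptions, not the code the LLM produced. The budget split (corners capped at 10%, DE until 70%, local refinement for the rest), the population size, F=0.7, CR=0.9, and the step-size rule are illustrative choices, not numbers from the study. "DE seeding" is interpreted here as a short differential-evolution run (DE/rand/1/bin) whose population is seeded with the best corner probes and whose result seeds the final refinement.

```python
import itertools
import random
from typing import Callable, Sequence

def three_phase_search(f: Callable[[list[float]], float],
                       bounds: Sequence[tuple[float, float]],
                       budget: int = 2000, seed: int = 0) -> tuple[list[float], float]:
    """Minimize f over box bounds: corners, then DE, then local refinement."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    dim = len(bounds)
    evals = 0
    best_x: list[float] = []
    best_y = float("inf")

    def clip(x: list[float]) -> list[float]:
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def evaluate(x: list[float]) -> float:
        # Every call to f goes through here, so the budget is counted in one place.
        nonlocal evals, best_x, best_y
        evals += 1
        y = f(x)
        if y < best_y:
            best_x, best_y = list(x), y
        return y

    # Phase 1: corner enumeration. Probe the centre, then the 2**dim corners,
    # capped at 10% of the budget so high-dimensional problems don't exhaust it.
    probes = [[(lo + hi) / 2 for lo, hi in bounds]]
    probes += [list(c) for c in itertools.islice(itertools.product(*bounds), budget // 10)]
    phase1 = []
    for x in probes:
        if evals >= budget:
            break
        phase1.append((evaluate(x), x))

    # Phase 2: DE/rand/1/bin. Half the population is seeded with the best probes,
    # the rest is sampled uniformly; runs until 70% of the budget is spent.
    pop_size = max(4, min(20, 4 * dim))
    phase1.sort(key=lambda t: t[0])
    pop = [(x, y) for y, x in phase1[: pop_size // 2]]
    while len(pop) < pop_size and evals < budget:
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        pop.append((x, evaluate(x)))
    de_stop, F, CR = int(0.7 * budget), 0.7, 0.9
    while evals < de_stop and len(pop) >= 4:
        for i in range(len(pop)):
            if evals >= de_stop:
                break
            a, b, c = rng.sample([j for j in range(len(pop)) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = clip([
                pop[a][0][k] + F * (pop[b][0][k] - pop[c][0][k])
                if k == forced or rng.random() < CR else pop[i][0][k]
                for k in range(dim)
            ])
            y = evaluate(trial)
            if y <= pop[i][1]:  # greedy one-to-one replacement
                pop[i] = (trial, y)

    # Phase 3: local refinement around the best point found so far. Gaussian steps,
    # grown after a success and shrunk after a failure (1/5th-success style rule).
    step = [0.1 * (hi - lo) for lo, hi in bounds]
    while evals < budget:
        previous = best_y
        evaluate(clip([v + rng.gauss(0.0, s) for v, s in zip(best_x, step)]))
        factor = 1.5 if best_y < previous else 1.5 ** -0.25
        step = [s * factor for s in step]
    return best_x, best_y
```

The staging is the point: cheap deterministic coverage first, population-based global search next, and a narrowing local search last, each phase starting from what the previous one found.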

Nobody taught it any of this. It figured it out from contrastive feedback alone.

💡 If you’re working on hyperparameter tuning, prompt optimization, or any problem that involves searching over a parameter space, read the full writeup at vizops.ai. It’s one of the cleaner benchmark studies you’ll find this year.

What you’re watching here is the early stage of something bigger: AI systems that improve their own tooling through iteration. Not a concept. 53 out of 55 benchmarks.

We let an LLM write its own optimizer — it beat Optuna on 96% of standard benchmarks
by u/se4u in PromptEngineering
