Someone shipped something smart this week. It’s called RaceLLM, and it does the one thing that’s been missing from every model comparison workflow.
Instead of testing models one at a time, you run all of them on the same prompt and watch the responses stream in parallel, side-by-side, on a single screen. No tab switching. No waiting. No copy-pasting outputs into a doc to compare later.
The real twist is what happens when you watch them race live. Speed, coherence, and quality become obvious in real time. You’re not analyzing after the fact. The winner surfaces while the models are still generating.
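To make the "race" idea concrete, here's a minimal Python sketch of the general pattern, assuming nothing about how RaceLLM itself is built. Every model's stream starts at the same moment, and chunks are printed as they arrive, so the lanes interleave live. For each model it records the time to first chunk and the total time. `fake_stream`, the model names, and the delays are hypothetical placeholders standing in for real provider streaming calls on the same prompt. Note that this only measures speed. Coherence and quality still need the side-by-side read.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import AsyncIterator, Dict, List, Optional


@dataclass
class LaneResult:
    model: str
    first_token_s: Optional[float] = None  # seconds until the first streamed chunk
    total_s: float = 0.0                   # seconds until the stream finished
    text: str = ""


async def fake_stream(chunks: List[str], delay_s: float) -> AsyncIterator[str]:
    """Hypothetical stand-in for a provider's streaming call on the shared prompt."""
    for chunk in chunks:
        await asyncio.sleep(delay_s)  # simulate network/generation latency per chunk
        yield chunk


async def run_lane(model: str, stream: AsyncIterator[str], start: float) -> LaneResult:
    """Consume one model's stream, printing chunks as they arrive."""
    result = LaneResult(model)
    parts: List[str] = []
    async for chunk in stream:
        if result.first_token_s is None:
            result.first_token_s = time.perf_counter() - start
        parts.append(chunk)
        print(f"[{model}] {chunk}")  # output from different lanes interleaves live
    result.total_s = time.perf_counter() - start
    result.text = "".join(parts)
    return result


async def race(streams: Dict[str, AsyncIterator[str]]) -> List[LaneResult]:
    """Run every lane concurrently from one shared start time; fastest finisher first."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(run_lane(model, stream, start) for model, stream in streams.items())
    )
    return sorted(results, key=lambda r: r.total_s)


if __name__ == "__main__":
    board = asyncio.run(race({
        "model-a": fake_stream(["Refunds ", "within ", "30 days."], 0.05),
        "model-b": fake_stream(["We refund ", "in 30 days."], 0.02),
    }))
    for r in board:
        print(f"{r.model}: first chunk {r.first_token_s:.2f}s, done {r.total_s:.2f}s")
```

The design choice that matters is the shared `start` timestamp. Every lane is timed against the same starting moment, so the numbers compare like for like.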
How to run your first race:
- 🔗 Clone the repo at github.com/khuynh22/racellm
- Drop in API keys for whichever models you want to test (GPT-4, Claude, Gemini, Llama)
- ⚡ Paste your actual prompt and hit run
- Watch all streams fire at once and note which response lands clearest and fastest
- 🏆 Use that model. Move on.
Pro tip: Don’t test with a generic prompt. Use the real production prompt from your actual project. The differences between models get obvious fast when the stakes are real.
The dev is also looking for contributors if you want to get involved early. Worth a look and a GitHub star if it saves you time: github.com/khuynh22/racellm 🚀
One prompt, 4 models, 1 screen—pick the fastest winner every time
by u/NeitherRun3631 in PromptEngineering