Wayfinder Router Cuts LLM Costs Without Model Calls

A new open-source tool called Wayfinder Router wants to cut your LLM bill by deciding, in microseconds and entirely offline, whether a prompt should go to your small local model or your big cloud one. According to Hacker News, where the launch climbed to 175 points, Wayfinder makes that call without ever phoning another model. It reads the shape and wording of your prompt, hands you a score and a recommendation, and stays out of the way after that.

This is significant because most routers do the opposite. They decide by calling a model, whether a trained classifier, an LLM judge, or a hosted API. That adds latency, cost, and a bit of randomness to the exact step meant to save you money. Wayfinder reads structure instead, so the decision is free and identical every time.

What it actually does

Wayfinder looks at the structure of a prompt and routes accordingly. Cheap requests stay local, hard ones go to the expensive model, so you stop paying frontier prices for “summarize this” and “fix my typo.”

Deterministic scoring. It reads prompt length, headings, lists, and code blocks, then returns a complexity score. Same input, same decision, every single time.
Zero model calls. No API key, no network request, no classifier to decide the route. The decision is sub-millisecond and runs fully offline.
Bring your own models. It forwards each call to any OpenAI-style /chat/completions endpoint. A tier is just a base_url, a model name, and a key read from your environment at request time.
Calibrate on your own traffic. You tune the thresholds against your actual prompts rather than chasing someone else’s benchmark.
Secrets stay out. Wayfinder never stores keys. It reads an env var, or pulls from 1Password, macOS Keychain, Vault, Doppler, and similar at startup.

How it stacks up

The team is refreshingly honest about positioning. Per Hacker News, RouteLLM uses a trained classifier and needs retraining, while NotDiamond, Martian, and OpenRouter’s Auto mode all rely on learned, hosted routing you can’t run offline. LiteLLM proxies providers but doesn’t route by complexity at all. Wayfinder’s pitch isn’t top accuracy. It’s the one router you can run offline, with no model calls, and tune on your own data.

The honest caveats

What stands out here is how openly the project flags its own limits. By default, Wayfinder scores structure only. It can also read lexical cues like proofs, math, and constraints, but those ship off by default. A double-blind test on independently written prompts showed that lexical lift doesn’t generalize. It caught only about 20% of unseen hard prompts and lost to a plain word-count baseline. So those weights are opt-in, and you should only raise them after calibrating to your own vocabulary.

There’s a structural blind spot too. A prompt whose difficulty is purely semantic, like a subtle code snippet or an innocent-looking “what is the 100th prime number?”, has no structural tell. A semantic router will beat Wayfinder there. The FAQ even admits it’s no better than random on RouterBench’s short-but-hard items. The edge that survives the blind test is the one they lead with: a deterministic, offline routing decision with no model call.

Getting started

You can try the routing logic with zero setup. Run `uvx wayfinder-router chat –dry-run` for a terminal chat that needs no install and no keys, or `pip install wayfinder-router` for the full version. Every turn shows where it routed, the score and why, and your running savings versus always-cloud. There’s also a browser UI with a live threshold slider via the gateway package.

To get real replies, `wayfinder-router init` scaffolds a starter config, with presets for a hybrid Ollama-to-Anthropic setup, two OpenAI tiers, or two Gemini tiers. Then `wayfinder-router doctor` confirms your keys resolve.

The bigger trend here is cost discipline. As teams run hybrid local-and-cloud stacks, a free, predictable router that never adds its own model call is a sharp idea, even with the semantic gaps the authors freely admit. You can find the full breakdown, benchmarks, and FAQ at the original source.

Read original article

What it actually does

How it stacks up

The honest caveats

Getting started

Related: