Ornn Rolls Out Token Benchmarks for Top AI Labs

A new player just stepped into one of the messiest corners of the AI economy: figuring out what you’re actually paying for. Ornn, a startup backed by Index, has launched a set of token benchmarks measuring models from Anthropic and OpenAI, according to The Information. The launch puts Ornn squarely in the business of comparing how the two biggest names in frontier AI stack up on the unit that drives every bill in this industry: the token.

If you’ve never thought hard about tokens, here’s why this matters. Tokens are the chunks of text models read and write, and they’re how labs price their APIs. Every prompt, every response, every agent loop burns tokens. For companies running AI at scale, token costs can quickly become one of the biggest line items, and they’re notoriously hard to compare across providers. That’s the gap Ornn is aiming at.

What Ornn launched

Based on The Information’s report, here’s the core of it:

  • Token benchmarks for Anthropic and OpenAI. Ornn is measuring and comparing the token-level performance of models from the two leading labs.
  • Index backing. The startup is backed by Index, a sign that serious venture money sees a real market in benchmarking and cost transparency.
  • A focus on the economics, not just the leaderboard. Plenty of benchmarks rank models on accuracy or reasoning. Ornn’s angle is the token itself, the thing buyers actually pay for.

The Information’s write-up is brief, so the deeper specifics, like exact methodology, which models are covered, and how Ornn scores them, aren’t spelled out in what’s been published so far.

Why this is worth watching

What stands out here is the timing. The AI industry has spent two years obsessing over raw capability: who has the smartest model, who tops the reasoning charts. But the conversation is shifting. Buyers now want to know what performance costs them, and whether a cheaper model can do the same job for a fraction of the spend.

That’s a hard question to answer honestly. Two models can quote similar per-token prices yet behave very differently in practice. One might be “chattier,” using more tokens to reach the same answer. Another might handle a task in a single clean pass. The sticker price on a provider’s pricing page tells you little about your real bill until you run your own workload. A neutral benchmark that puts Anthropic and OpenAI side by side on token behavior could give teams a sharper way to make that call.
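To make the “chattier model” point concrete, here is a minimal Python sketch of per-task cost arithmetic. The model names, per-million-token prices, token counts, and monthly volume are all hypothetical placeholders: they are not real Anthropic or OpenAI pricing and not Ornn data. The sketch also assumes input and output tokens are billed at separate rates, which is a common pricing structure but should be checked against each provider’s actual price list.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """Token usage for one task on one model. All numbers are hypothetical."""
    name: str
    input_price_per_mtok: float   # USD per 1M input tokens (hypothetical)
    output_price_per_mtok: float  # USD per 1M output tokens (hypothetical)
    input_tokens: int             # prompt tokens sent for the task
    output_tokens: int            # tokens the model generated

    def cost(self) -> float:
        """Per-task cost in USD: tokens used times the per-token rate."""
        return (
            self.input_tokens * self.input_price_per_mtok
            + self.output_tokens * self.output_price_per_mtok
        ) / 1_000_000


# Identical list prices; the only difference is how many output tokens
# each model spends to finish the same task.
concise = TaskRun("model_a", 3.0, 15.0, input_tokens=2_000, output_tokens=400)
chatty = TaskRun("model_b", 3.0, 15.0, input_tokens=2_000, output_tokens=1_600)

TASKS_PER_MONTH = 100_000  # hypothetical workload volume

for run in (concise, chatty):
    per_task = run.cost()
    print(f"{run.name}: ${per_task:.4f}/task, ${per_task * TASKS_PER_MONTH:,.2f}/month")
```

With identical list prices, the chattier run costs $0.0300 per task against $0.0120 for the concise one, 2.5 times as much, and that gap scales linearly with volume ($3,000.00 versus $1,200.00 a month at the assumed 100,000 tasks). This is the kind of difference a pricing page alone won’t reveal.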

How it compares to what’s out there

Model benchmarking isn’t new. Leaderboards and eval suites already rank models on everything from math to coding to instruction-following. What Ornn appears to be adding is an economic lens, treating the token as the measuring stick rather than just a quality score. The Information frames Ornn as launching these benchmarks specifically around the two dominant commercial labs, which is where the most enterprise dollars are flowing today.

The practical use cases are easy to picture:

  • Procurement decisions. Teams choosing between Anthropic and OpenAI for a production workload.
  • Cost forecasting. Estimating what a given task will actually cost at scale before committing.
  • Vendor negotiation. Walking into a contract conversation with independent numbers instead of marketing claims.

The caveats

A few things to keep in mind. The source article is short, so treat the finer points as still developing until Ornn or The Information shares more. Independent benchmarks also live or die on methodology, and the value of Ornn’s work will hinge on whether its measurements reflect the kind of workloads real companies run. And covering only Anthropic and OpenAI, while sensible as a starting point, leaves out a crowded field of other providers that buyers also weigh.

Still, the direction is clear. As AI spending climbs and finance teams start asking harder questions, tools that translate model performance into dollars are going to matter more, not less. Ornn is betting that the next phase of the AI race gets measured at the cash register. For the full details, check the original report at The Information.
