Manage Soaring AI Costs: Optimize Token Spending & ROI

The party where AI tokens were free-flowing is over, and the invoices are landing hard. According to TechCrunch AI, companies across the industry are now balking at what their AI habit actually costs. Uber burned through its entire 2026 AI coding budget by April. Microsoft yanked its developers’ Claude Code licenses just months after handing them out. One company reportedly racked up a $500 million Claude bill after forgetting to set usage limits for employees.

What stands out here is the speed of the mood swing. Eighteen months ago, the question was “Is this good enough?” Now it’s “Where is all our money going?”

What changed

Per-token prices have actually fallen. So why are bills exploding? Two words: agentic tools. The November wave of models, Anthropic’s Claude Opus 4.5, OpenAI’s GPT-5.1, and Google’s Gemini 3 Pro, made AI agents dramatically more capable, and capable agents chew through tokens at a staggering rate.

The numbers TechCrunch AI cites tell the story:

Per-developer token consumption rose about 18.6x in nine months, per engineering platform Jellyfish.
A Priceline employee said a routine Cursor renewal came back 4-5x more expensive.
J.R. Storment of the FinOps Foundation says he started hearing from companies that were “3x over our entire 2026 token budget and it’s only April.”

Chris Reed, Priceline’s senior director of IT finance, put it bluntly to TechCrunch: “It’s like the crack-cocaine epidemic. They let you try it to get you hooked on it, and now you’re kind of beholden to it.”

The productivity question nobody can answer

Here’s where it gets uncomfortable. More spending doesn’t cleanly equal more value. Jellyfish found that heavy token users were roughly twice as productive, but they spent 10x the tokens to get there. A Faros AI study of 20,000 developers found output rising alongside bugs and rewrites.

One CTO told Faros CEO Vitaly Gordon that an engineer spent $40,000 on tokens in a month, adding: “I genuinely don’t know whether I should stop him or should I go and tell everyone else to be like him.”

That confusion is the whole problem. As Jellyfish’s Nicholas Arcolano told TechCrunch, whether extreme spend pays off depends on the business value of shipped code, “which most companies still can’t measure.”

A market rushes in

Where there’s pain, there’s a market. TechCrunch AI reports the Linux Foundation just unveiled the Tokenomics Foundation, aiming to bring the cost discipline that FinOps brought to cloud spend. The scale is brutal: Storment notes cloud cost tracking is a hundreds-of-millions-of-rows-a-month problem, while token tracking is a trillions-of-rows-a-month problem.

The players lining up:

Pure-plays like Pay-i and Paid, built to track, measure, and optimize GenAI costs.
Dev monitoring from Jellyfish, Waydev, and Faros AI, focused on proving ROI.
Incumbents like Ramp, Datadog, and New Relic bolting on token observability and GPU monitoring. AWS is expected to add AI financial management features at FinOps X next week.
The harness layer, where NEA partner Tiffany Luck expects efficiency to live. Factory just launched a model router that auto-picks the right model per task.

What this means for you

This is the cloud-cost reckoning all over again, just faster and bigger. If you run AI tooling at any scale, the move now is to get ahead of it:

Set hard usage limits before you need them. That $500M bill happened because nobody did.
Audit vendor reporting against your own data. Reed is already finding discrepancies at Priceline. New billing systems are, in his words, “ripe for billing errors.”
Tie tokens to outcomes, not headcount. Measure value per shipped feature, not tokens per developer.
Add a model router. Stop sending every trivial query to a frontier model.

The next 12 to 18 months will reward the teams that treat tokens like a metered utility, not an open bar. Expect frontier labs themselves to start pushing OpenRouter-style optimization that routes queries to cheaper models automatically. The companies that build cost discipline now will keep moving fast. The ones still tokenmaxxing will spend the year explaining their budgets to a very unhappy CFO.

Full reporting is available at the original source, TechCrunch AI.

Read original article

What changed

The productivity question nobody can answer

A market rushes in

What this means for you

Related: