Deepseek just turned a promo into a permanent strategy. The Chinese AI lab announced on X that its 75 percent discount on Deepseek V4 Pro is now permanent, no longer set to expire on May 31, 2026, according to The Decoder. The move locks in pricing that undercuts Western frontier labs by an order of magnitude and reshapes the economics of building with large models.
The new price floor
Under the permanent discount, Deepseek V4 Pro charges $0.435 per million input tokens (without cache) and $0.87 per million output tokens. Cache hits drop input pricing to a fraction of a cent. The smaller Deepseek V4 Flash goes lower still: $0.14 in, $0.28 out.
Compare that to the frontier:
- GPT-5.5: $5 input / $30 output per million tokens
- GPT-5.5 long context (>272K): $10 input / $45 output
- Opus 4.7: $5 input / $25 output
The Decoder’s math puts Deepseek V4 Pro at roughly 11.5x cheaper than GPT-5.5 on standard input and about 34.5x cheaper on output. Against GPT-5.5’s long-context tier, the gap widens to 23x on input and 51.7x on output.
Both Deepseek models ship with a one-million-token context window and up to 384,000 output tokens. Crucially, the API supports both OpenAI and Anthropic formats, so developers can swap providers with minimal code changes.
Why this matters
This is a price war, and Deepseek is firing first with permanent ammunition. What stands out here is the asymmetry of pressure. OpenAI and Anthropic are both marching toward IPOs and need to justify revenue. Deepseek is just entering its first funding round and doesn’t carry the same monetization burden. That gives it room to bleed margin on purpose.
For practitioners, the calculus shifts hard. Agentic systems chew through tokens at multiples of a normal chatbot session. At GPT-5.5 output pricing, a moderately busy agent stack can rack up bills fast. At Deepseek’s rates, the same workload becomes almost a rounding error.
The catch nobody should ignore
The Decoder makes a sharp point: token price is only half the story. Token consumption per task matters just as much. Think gas prices: a cheap gallon doesn’t help if the engine drinks it.
- Google’s Gemini Flash 3.5 looks cheap on paper but burns more tokens than the older Pro 3.1, sometimes ending up pricier in practice.
- Anthropic’s Opus 4.7 prices below GPT-5.5 per token but uses more tokens than its predecessor.
- GPT-5.5 actually consumes fewer tokens than GPT-5.4, though both new flagships still landed 30 to 90 percent more expensive than the models they replaced.
And Deepseek V4 still trails GPT-5.5 and Opus 4.7 on raw frontier benchmarks. How much it lags depends entirely on the task, and benchmarks won’t settle it. Real production workloads will.
What to expect next
The broader trend is clear: as AI usage scales and ROI stays hard to pin down, companies are getting price-sensitive. Many teams will quietly shift away from “the best model” toward “the cheapest model that’s still good enough.” That category is where Deepseek wants to live.
Expect Western labs to feel the squeeze on commodity workloads first: classification, summarization, bulk extraction, internal agents. Frontier capability still wins for hard reasoning, but the middle of the market just got a lot harder to defend at $30 per million output tokens.
Full breakdown and pricing tables at the original source on The Decoder.