DeepSeek just dropped V4, and according to MIT Technology Review, the new model goes toe-to-toe with the heaviest closed-source hitters on the market. The company’s own benchmark results show V4-Pro matching Anthropic’s Claude-Opus-4.6, OpenAI’s GPT-5.4, and Google’s Gemini-3.1 across major evaluations. For an open-source release, that’s a serious statement.
MIT Tech Review reports the model is a huge jump from R1, beating other open-source contenders like Alibaba’s Qwen-3.5 and Z.ai’s GLM-5.1 on coding, math, and STEM tasks. It also ranks among the strongest open-source options for agentic coding, multistep problem solving, writing, and world knowledge.
What’s actually new
Three things stand out from the release:
- Top-tier benchmarks. V4-Pro plays in the same league as the closed frontier models, not just the open-source pack.
- A 1 million token context window. Both versions handle it by default. That’s enough room for all three Lord of the Rings volumes plus The Hobbit, and it puts DeepSeek in line with Gemini and Claude’s long-context tiers.
- Architectural rework on attention. The team rebuilt the attention mechanism, which is the part of the model that compares every token to every other token in a prompt. In standard attention that cost grows with the square of the prompt length, so long prompts get expensive fast and attention is the usual bottleneck. DeepSeek’s changes are aimed straight at it.
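To make the attention bottleneck concrete, here is a rough back-of-the-envelope sketch of how standard dense attention scales. It is not DeepSeek’s V4 mechanism, which the source does not describe; the function names and the 2-bytes-per-score figure are assumptions chosen for illustration.

```python
def attention_pairs(num_tokens: int) -> int:
    """Token-to-token comparisons in full (dense) self-attention: n * n."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return num_tokens * num_tokens


def score_matrix_gib(num_tokens: int, bytes_per_score: int = 2) -> float:
    """Size in GiB of one head's score matrix if it were fully materialized.

    bytes_per_score=2 assumes 16-bit scores (an illustrative assumption).
    """
    return attention_pairs(num_tokens) * bytes_per_score / 2**30


if __name__ == "__main__":
    # Compare a short prompt, a long one, and a full 1M-token context.
    for n in (1_000, 100_000, 1_000_000):
        print(
            f"{n:>9,} tokens -> {attention_pairs(n):.2e} pairs, "
            f"{score_matrix_gib(n):,.2f} GiB per head per layer"
        )
```

Doubling the prompt length quadruples the number of comparisons. Optimized kernels can avoid storing the full score matrix, but the compute for dense attention still grows quadratically, which is why long-context models tend to rework attention itself.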
Why developers are paying attention
In a technical report shared with the launch, DeepSeek included an internal survey of 85 experienced developers. More than 90% put V4-Pro in their top picks for coding work. The company also says it tuned V4 specifically for popular agent frameworks, naming Claude Code, OpenClaw, and CodeBuddy.
That last detail matters. Agent frameworks have become the default way developers wire models into real workflows, and a model that’s optimized for those harnesses lands differently than one that just posts strong raw benchmarks.
What stands out here
The gap between open and closed frontier models was supposed to be the moat. V4 says otherwise. If DeepSeek’s numbers hold up under independent testing, teams running sensitive workloads on private infrastructure now have an open option that doesn’t force a trade-off on capability. That changes procurement conversations, especially in regulated sectors where shipping data to a closed API is a non-starter.
The 1 million token context is the other quiet shift. It’s no longer a premium feature reserved for the biggest labs. It’s becoming table stakes, and DeepSeek making it the default across its services raises the floor for what users will expect from any serious model going forward.
What to watch next
Expect three follow-on moves. First, independent benchmark runs will either confirm or undercut DeepSeek’s claims over the coming weeks. Second, hosting providers and inference platforms will rush to support V4-Pro, since open weights mean anyone can serve it. Third, the closed labs will respond with pricing changes, new releases, or fresh capability pushes that try to widen the gap again.
For practitioners, the practical question is simple: should V4-Pro be in your evaluation pipeline this week? Based on what MIT Tech Review describes, the answer leans yes, especially for coding-heavy and long-context use cases. Full technical details and benchmark breakdowns are available at the original source.