xAI Unveils Grok 4: A Monster AI Model

Trying to keep up with AI news feels like drinking from a firehose, right? Well, get ready for a tidal wave. I just finished watching the livestream from the xAI team, and they officially unveiled Grok 4. It’s not just an update; it’s a whole new level of intelligence.

The presenters explained that the leap from Grok 2 to Grok 4 involved a staggering 100x increase in training compute. But here’s the kicker: the big secret sauce for Grok 4 was a massive focus on post-training with reinforcement learning, using what they call “verifiable rewards.”
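In reinforcement learning with verifiable rewards, the reward signal generally comes from an automatic check, such as comparing a final answer against a known solution or running unit tests, rather than from a learned judge. xAI didn’t share its implementation, so the Python below is only a minimal sketch of the general idea. It assumes math-style problems whose answers appear in a hypothetical `\boxed{...}` format, and `extract_final_answer` and `verifiable_reward` are illustrative names, not anything xAI has published.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last \\boxed{...} answer in a completion (assumed answer format)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer exactly matches the reference, else 0.0.

    Because the check is mechanical, the reward can't be flattered by confident
    wording; only a correct final answer earns credit.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable answer means no reward
    return 1.0 if answer == ground_truth.strip() else 0.0


if __name__ == "__main__":
    print(verifiable_reward(r"... so the total is \boxed{42}", "42"))
    print(verifiable_reward(r"... so the total is \boxed{41}", "42"))
```

Real systems typically normalize answers (units, fractions, whitespace) before comparing; exact string matching keeps the sketch short.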

🧠 Superhuman Reasoning on Display

To prove its smarts, the xAI team threw Grok 4 at the notoriously difficult “Humanity’s Last Exam” (HLE), a benchmark so hard that top AI models scored only in the single digits when it launched. These are PhD-level problems that most human experts can’t solve.

Here’s how Grok 4 did:

  • Without tools: It solved over a quarter of the problems (~27%). Impressive.
  • With tools: It got even better.
  • Grok 4 Heavy: This is where it gets wild. The team explained that this is a multi-agent version in which several instances of Grok tackle a problem in parallel, compare notes, and settle on the best solution. This version solved over 50% of the HLE problems. Insane.
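xAI didn’t publish how Grok 4 Heavy’s agents “compare notes,” so the Python sketch below swaps in a simpler, well-known stand-in: run several independent attempts in parallel and keep the majority answer (often called self-consistency). `toy_agent` is a hypothetical placeholder for a real model call, and `heavy_solve` is an illustrative name, not an xAI API.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def toy_agent(problem: str, seed: int) -> str:
    """Hypothetical stand-in for one model instance.

    Returns the correct answer "42" with probability 0.6, otherwise a random guess.
    """
    rng = random.Random(seed)  # per-agent seed keeps attempts independent but reproducible
    if rng.random() < 0.6:
        return "42"
    return str(rng.randint(0, 100))


def heavy_solve(problem: str, agent: Callable[[str, int], str], n_agents: int = 8) -> str:
    """Run n_agents attempts in parallel and return the most common answer."""
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        # pool.map returns results in submission order (agent 0, 1, 2, ...)
        answers: List[str] = list(pool.map(lambda s: agent(problem, s), range(n_agents)))
    # Majority vote; on a tie, Counter.most_common keeps the answer seen first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


if __name__ == "__main__":
    print(heavy_solve("What is 6 x 7?", toy_agent))
```

A plain vote is the simplest way to combine agents; the livestream’s description suggests Heavy’s agents interact more richly than this, so treat the sketch as an illustration of the parallel-attempts idea only.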

⚙️ It’s Not Just Book-Smart

The team didn’t just stop at benchmarks. They showed off some real-world applications that blew me away.

  • 📌 Advanced Tool Use: The team demonstrated Grok 4 visualizing two black holes colliding, gathering the physics it needed by searching the web and reading academic texts. It even predicted World Series odds by analyzing betting markets to find its own “alpha.”
  • 📌 The Vending Machine Challenge: They had folks from Andon Labs test Grok 4 on a simulation in which it had to run a vending machine business. The result? It ended up with double the net worth of the next-best model because it could stick to a long-term strategy.
  • 📌 Real-Time X Integration: One of Grok’s biggest advantages is its native access to X (Twitter). The presenters asked it to find the “xAI employee that has the weirdest profile photo,” and it correctly identified Greg Yang. It understands concepts like “weird”!

✨ New Features & API Access

xAI is making this power accessible right away.

  • ✅ API Available Now: Developers can start building with Grok 4 immediately. It has a 256k context window and real-time data search.
  • ✅ Improved Voice Mode: They showcased new, incredibly natural-sounding voices with much lower latency, making conversations feel snappy and smooth.
  • ✅ Dominant Benchmarks: On top of HLE, Grok 4 Heavy scored a perfect 100% on the AIME math exam, and Grok 4 solidly beat competitors like Claude Opus 4 and Gemini 2.5 Pro on several other key reasoning benchmarks.

🚀 What’s Next?

The team was clear that this is just the beginning. They openly admitted that image understanding is a current weakness, but their next foundation model (version 7) is already training to fix that. They even shared a roadmap:

  • August: New Coding Model
  • September: Multimodal Agent
  • October: Video Generation

I think the ability for the “Heavy” version to use multiple agents is a true game-changer for complex problem-solving. This is one of the most exciting AI releases I’ve seen all year.

For the full deep dive and to see all the demos for yourself, make sure to watch the original livestream from xAI!
