New Gemini 3 Pro benchmarks are wild

Google’s new Gemini 3 Pro Preview might just be the new king of logic and math puzzles!

Seriously, I was scrolling through my feed this morning and had one of those stop-everything moments. I just stumbled upon some fantastic analysis from an AI professional who ran a full suite of benchmarks on Google’s newly released Gemini 3 Pro Preview. This contributor didn’t waste any time putting the model through its paces to see where it shines and where it stumbles, and the results are pretty revealing.

It’s clear from the expert’s findings that Gemini 3 Pro isn’t just an incremental update; it’s a major leap forward in a few key areas. The model is showing top-tier performance in logical and mathematical reasoning, tying for first place with heavy hitters like Grok 4 on more than one benchmark. But the real story is that, on the math test at least, it does so with a significant speed advantage. Still, it’s not a clean sweep, which makes the analysis even more interesting.

Here’s a breakdown of what the poster found:

🧠 Logic & Reasoning: A Near-Perfect Performance

The author tested the model on complex logical puzzles in two different languages to really push its reasoning skills. The results were incredible. In English, Gemini 3 Pro scored a perfect 100%, tying with the previous Gemini-2.5-pro model. This demonstrates an amazing consistency and mastery of complex logical chains in English. But what really caught my eye was its performance on the Polish logical puzzles. It scored an impressive 92%, tying for first place with Grok 4. This is a huge deal because it shows the model’s powerful reasoning abilities aren’t just limited to English. Building models that can reason effectively across multiple languages is a massive challenge, and this performance suggests Gemini 3 Pro is making serious headway.

Math Champion (with a Speed Boost)

Next up was the AIME25 Mathematical Reasoning benchmark, a tough test designed to challenge a model’s quantitative skills. Once again, Gemini 3 Pro landed at the top, tying for first place with Grok 4. But here’s the kicker the creator highlighted: the latency for Gemini was significantly lower. In plain English, it was much faster. This is a massive advantage in the real world. For anyone building applications that need quick, accurate answers (think financial modeling tools, engineering assistants, or even advanced educational software), speed matters a lot. Getting a correct answer is great, but getting it quickly is what makes a tool truly useful and practical. This lower latency could make Gemini 3 Pro the go-to choice for developers working on time-sensitive, math-intensive applications.
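If you want to see how a speed advantage can break a tie at the top of a leaderboard, here is a minimal Python sketch. To be clear, this is my own illustration and not the poster’s methodology: the model names (`model_a`, `model_b`, `model_c`) and every number are made-up placeholders, not results from the benchmarks. It simply ranks by accuracy first and uses latency as the tiebreaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str        # placeholder name, not a real model
    accuracy: float   # fraction of problems solved, 0.0 to 1.0 (illustrative)
    latency_s: float  # mean seconds per answer (illustrative)


def rank(results):
    """Sort by accuracy (highest first), then by latency (lowest first)."""
    return sorted(results, key=lambda r: (-r.accuracy, r.latency_s))


# Hypothetical values chosen only to show the tiebreak; not benchmark data.
results = [
    Result("model_a", 0.90, 40.0),
    Result("model_b", 0.90, 12.0),
    Result("model_c", 0.80, 5.0),
]

ranking = [r.model for r in rank(results)]
print(ranking)
```

Two models with the same accuracy end up ordered by how fast they answer, while a faster but less accurate model still ranks below both. Whether accuracy should always trump latency is a design choice that depends on the application.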

🌎 A Surprising Linguistic Stumble

This is where the story gets nuanced. The model wasn’t a universal champion. The author also ran a benchmark on “Semantic and Emotional Exceptions in Brazilian Portuguese,” a test designed to evaluate a model’s grasp of subtle linguistic and cultural nuances. Here, Gemini 3 Pro placed only sixth. It lagged behind several other models, including `glm-4.6`, `deepseek-chat`, and `qwen2-72b-instruct`. This is a fascinating result that reminds us that even the most powerful models have blind spots.

It highlights the “spiky” nature of AI performance, where a model can be state-of-the-art in one domain and merely average in another.

It’s a powerful lesson that the “best” model always depends on the specific task. This also suggests that specialized or regionally focused models can sometimes outperform the global giants on their home turf, possibly thanks to more focused training data.

The full breakdown from this savvy professional is a fantastic read with links to all the benchmarks used in the testing. Check out the original post for all the details!

Running Benchmarks on new Gemini 3 Pro Preview
by u/Substantial_Sail_668
