ChatGPT Proves Original Math Theorems Without Human Help

Fields Medalist Timothy Gowers handed ChatGPT 5.5 Pro a stack of open problems in number theory and watched it produce doctoral-level mathematical research in under two hours, with what he calls zero contribution from himself. According to The Decoder, Gowers wrote on his blog that he didn’t even bother engineering clever prompts. The model, he says, did the work.

This is the most concrete public claim yet that a frontier LLM can do real, original research math without a human steering the ideas. And it comes from one of the most respected living mathematicians, holder of the Combinatorics Chair at the Collège de France and a Fellow of Trinity College, Cambridge.

What actually happened

Gowers fed the model open problems from a paper by number theorist Mel Nathanson on the sizes of certain sets of integer sums. The Decoder lays out the timeline:

17 minutes, 5 seconds: ChatGPT 5.5 Pro improved Nathanson’s exponential bound to the best possible quadratic bound by swapping in a more efficient combinatorics tool whose application to this specific problem wasn’t obvious.
2 minutes, 23 seconds: It rewrote the argument as a LaTeX preprint.
31 minutes, 40 seconds total: On a harder generalized version, it pushed an exponential bound down to polynomial across multiple iterations.

Gowers verified the work. The preprint is already public.

Why this is different

The key witness here is MIT student Isaac Rajagopal, whose own paper on the harder problem is the one the model improved on. His verdict, per The Decoder: the first improvement was a “routine modification” of his work, but the polynomial bound was “quite impressive,” and the central idea was “quite ingenious” and “completely original.” Rajagopal said he’d be proud to come up with that idea after a week or two of pondering. ChatGPT found and proved it in under an hour.

That’s the part worth pausing on. Earlier AI math claims, including OpenAI’s GPT-5 attempt at an Erdős problem, turned out to be retrieval dressed up as discovery: the model had found an existing proof in the literature. Here, by contrast, a working researcher who knows the problem space says the new idea wasn’t sitting in a paper somewhere.

Gowers’ takeaway: the bar just moved

He doesn’t oversell it. He calls the result “a perfectly reasonable chapter in a combinatorics PhD,” not an “amazing result,” since it builds on Rajagopal’s work. But the implication is sharp. From his blog, via The Decoder:

“The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now.”

Gowers predicts that anyone starting a doctorate today and finishing in 2029 will see math research “changed out of all recognition.” That tracks with Terence Tao’s vision of “industrial-scale mathematics” run by large AI-augmented teams instead of lone researchers grinding on narrow problems for years. Tao previously called AI models “mediocre but not completely incompetent” research assistants. Gowers thinks that read is already out of date.

What this means for practitioners

A few things to watch:

Originality claims now have witnesses. When the author of the prior art signs off that an LLM’s idea is original, the “it’s just retrieval” critique gets harder to make.
Verification becomes the bottleneck. Gowers spent his time checking, not generating. That’s the new shape of expert work in domains where AI can produce candidate proofs.
The collaboration question is open. Gowers raises a thought experiment: if a mathematician guides an LLM that does all the technical work and supplies the main ideas, does the human get credit? His honest answer is no.
Skill still compounds. He compares it to coding: “Very good coders are better at vibe coding than not such good coders.” The people who actually wrestled with the math will get more out of these tools, not less.

If this generalizes beyond combinatorics, the timeline for AI’s role in serious research just compressed. Full details and the preprint links are at the original source on The Decoder.
