Picture this: you pay Microsoft every month for Excel, you trust their AI to make your life easier, and then you ask it to actually build a spreadsheet from scratch. Crickets. That’s the exact frustration that pushed one LinkedIn creator down a rabbit hole most of us would love to avoid, and the results are honestly a little wild.
I came across this post from an AI professional who ran a head-to-head test of 11 AI models on one very specific job: build a board-ready 12-month financial forecast in Excel. Not a toy spreadsheet. A real one, with tabs, formulas, and the kind of polish you could actually send to leadership. What the creator found completely flipped my expectations of who’s leading the AI race when it comes to practical, get-it-done work.
The setup that exposed every AI’s weak spot
The original poster wasn’t testing AIs on trivia or creative writing. The task was concrete and measurable: produce a multi-tab Excel file with working formulas, ready to present. That’s the kind of benchmark that strips away the marketing fluff and shows you what these tools can actually do under pressure.
Here’s the part I love about this experiment. Most comparisons online test AIs on stuff like “write a poem” or “explain quantum physics.” Useful, sure, but not the work most of us actually do all day. A spreadsheet test is brutal because either the file works or it doesn’t. No room to hide.
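If you want to make “works or doesn’t” concrete, you can count the formula cells in whatever file an AI hands back. Here’s a minimal Python sketch of that idea. It is not from the original post: the helper name `count_formulas` is my own, and it assumes a standard .xlsx file, which is a zip archive of XML parts. The count is rough. It tells you formulas exist, not that they calculate correctly, and array formulas may be undercounted.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by SpreadsheetML worksheet parts inside an .xlsx file.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def count_formulas(xlsx_path):
    """Return {worksheet part name: number of formula elements found}.

    Hypothetical helper for illustration. Keys are internal part names
    such as "xl/worksheets/sheet1.xml", not the tab names shown in Excel.
    """
    counts = {}
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                # Each cell that holds a formula carries an <f> child element.
                counts[name] = sum(1 for _ in root.iter(NS + "f"))
    return counts
```

Pointing it at a delivered file, say `count_formulas("forecast.xlsx")` (a made-up filename), gives one entry per worksheet part, so you can see at a glance how many tabs exist and how many cells hold formulas rather than pasted values. If the file isn’t there at all, `zipfile` raises `FileNotFoundError`.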
The contenders, ranked
The creator broke down the results in a way that’s both painful and hilarious if you’ve ever paid for Microsoft 365. Five of the 11 contenders stand out, listed here from weakest to strongest:
- Copilot inside Excel: the AI built into the very product you’re working in, and it couldn’t pull it off. The one tool with literal home-field advantage failed at its own game.
- Grok: confidently announced “done,” but the file the creator was promised simply didn’t exist. Style points for confidence, zero for delivery.
- Gemini: functional, but the post’s author said it wasn’t even in the same league as the top two contenders. Not terrible, just not the answer.
- ChatGPT: genuinely strong output, the second-best of the bunch, but it burned through the creator’s token budget before finishing the full job. Powerful but expensive.
- Claude: the clear winner. Six tabs. Over 700 working formulas. Done in roughly seven minutes.
Tons of people pay Microsoft to make Excel easier. Very few know that, in this test at least, its AI came in last at the one job they bought it for.
Side-by-side: Copilot vs Claude
Let’s zoom in on the most jarring comparison from the test, because it’s the one most readers will care about.
Microsoft Copilot: Lives inside Excel. Built by the company that built Excel. In principle, it has every formula, every function, and every template Microsoft has shipped in more than 30 years to draw on. And yet, when asked to actually build the spreadsheet, it stalled.
Claude: A general-purpose AI with no special Excel integration. Treats the file as an outsider would, generating tabs and formulas from a description alone. Delivered a finished, board-ready model in roughly seven minutes.
That gap is the whole story. The tool you’d assume was purpose-built for the job got smoked by a tool that wasn’t even designed with Excel in mind.
Why this matters more than it looks
The creator made a sharp observation buried in the post: most people who struggle with these tasks blame themselves. They assume their Excel skills are the problem, or that they’re prompting the AI wrong, or that they need to learn another formula. But the original poster flipped that on its head. The problem isn’t you. The problem is the AI you trusted.
I was honestly surprised by how clearly the results lined up. There’s a lot of noise in the AI space about which model is “best,” and the answer usually depends on the task. But for structured, formula-heavy, multi-tab work like a financial forecast, this contributor’s test makes a pretty compelling case that Claude is the one to reach for.
How you can apply this today
If you’ve been wrestling with Excel and an AI that keeps letting you down, here’s what the comparison suggests you try:
- Stop relying on the AI bundled with your spreadsheet tool. The convenience factor isn’t worth it if the output doesn’t ship.
- Test the same prompt across two or three AIs before committing to one for a big project. Five minutes of testing can save you hours of cleanup.
- For structured outputs (spreadsheets, tables, code), default to Claude when you can. The post’s author found it handled complexity without choking on token limits.
- Keep ChatGPT as your backup for moments when Claude isn’t available, but watch your token usage on long jobs.
- Describe your spreadsheet in plain English first. What tabs do you need? What inputs feed what outputs? A clear brief beats a clever prompt every time.
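To make that last tip concrete, here’s one way to write the brief down as data before you paste it into any AI. This is a rough Python sketch under my own assumptions, not the creator’s prompt: the `Tab` class, the `render_brief` function, and the three example tabs are all invented for illustration. The point is that listing each tab alongside the tabs it reads from forces you to answer “what feeds what” up front.

```python
from dataclasses import dataclass, field


@dataclass
class Tab:
    name: str
    purpose: str
    feeds_from: list = field(default_factory=list)  # names of tabs this one reads


def render_brief(goal, tabs):
    """Turn a goal and a list of Tab objects into a plain-English brief."""
    known = {tab.name for tab in tabs}
    lines = [f"Goal: {goal}", "Tabs:"]
    for number, tab in enumerate(tabs, start=1):
        # Catch references to tabs that were never defined in the brief.
        missing = [name for name in tab.feeds_from if name not in known]
        if missing:
            raise ValueError(f"{tab.name} reads from undefined tabs: {missing}")
        sources = ", ".join(tab.feeds_from) if tab.feeds_from else "manual inputs only"
        lines.append(f"{number}. {tab.name}: {tab.purpose}. Reads from: {sources}.")
    lines.append("Use live formulas wherever one cell depends on another.")
    return "\n".join(lines)


# Illustrative tabs only; the original post's prompt is not reproduced here.
example = render_brief(
    "12-month financial forecast",
    [
        Tab("Assumptions", "growth rates and cost inputs"),
        Tab("Revenue", "monthly revenue built from the assumptions", ["Assumptions"]),
        Tab("Summary", "totals for leadership", ["Revenue"]),
    ],
)
print(example)
```

Because the brief is plain text, you can paste the same output into two or three AIs and compare what comes back, which lines up with the testing tip above.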
The takeaway
The brand name on the AI doesn’t always match the quality of the work it does. Microsoft owns Excel, but Anthropic owns the win when it comes to actually building Excel files from scratch. That’s a fascinating thing to sit with, especially if you’ve been quietly assuming the integrated tool would be the smart bet.
My recommendation after reading this: stop defaulting to whatever AI ships inside your software, and start picking based on what actually delivers. For complex spreadsheet work, the test is pretty clear. Claude first, ChatGPT second, everything else a distant third.
Go check the creator’s full LinkedIn post for the complete ranking and the exact prompt strategy. It’s one of the most practical AI comparisons I’ve seen in a while.