Why Google's AI Can't Spell: LLM Tokenization Explained

Ask Google how many P’s are in “Google” and its AI Overview will tell you there are two. It also claims there’s one R in “poop,” insists “journalism” has two D’s, then spells it j-o-u-r-n-a-d-i-s-m. As TechCrunch AI reports, Google’s AI-forward Search overhaul is tripping over the most basic task a first-grader masters: spelling.

What stands out here is how predictable this was. The first time Google bolted AI Overviews onto Search, the feature cited satirical Onion posts and told people to eat rocks and put glue on their pizza. Now, as Google doubles down on making generative AI the centerpiece of its 29-year-old flagship product, the stumbles are back. Google even spelled the U.S. president’s last name as t-r-p-u-m.

“Counting within words has been a known challenge for LLMs, and we’re working to fix this particular issue,” Google told TechCrunch in an emailed statement.

Why this keeps happening

This isn’t a bug someone forgot to fix. It’s baked into how these models work. Large language models, the tech behind chatbots and text generators, don’t read the way you do. They don’t see sentences as words made of letters. They break text into tokens, chunks that may be whole words, word fragments, or single characters, then convert those tokens into numbers and run the math on the numbers.
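To make that concrete, here is a minimal sketch of tokenization in Python. Everything in it is an assumption for illustration: the vocabulary, the ID numbers, and the greedy longest-match rule are invented, and production models use learned subword vocabularies that differ from this. The point is only to show what the model receives once the text has been chunked.

```python
# Hypothetical toy vocabulary, invented for this example.
# Real models learn vocabularies with tens of thousands of entries.
TOY_VOCAB = {
    "straw": 101,
    "berry": 102,
    "the": 103,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5,
    "b": 6, "e": 7, "y": 8, "h": 9,
}


def tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match tokenization (a stand-in for a real tokenizer).

    Returns a list of (piece, token_id) pairs.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining candidate first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append((piece, vocab[piece]))
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


pieces = tokenize("strawberry")
ids = [token_id for _, token_id in pieces]
print(pieces)  # [('straw', 101), ('berry', 102)]
print(ids)     # [101, 102]
```

In this sketch the model's input for "strawberry" is the pair of numbers 101 and 102. Nothing in those two IDs says how many R's the word contains, which is why a letter-counting question has to be answered from learned associations instead of by looking at the letters.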

Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, put it plainly to TechCrunch: “LLMs are based on this transformer architecture, which notably is not actually reading text. When it sees the word ‘the,’ it has this one encoding of what ‘the’ means, but it does not know about ‘T,’ ‘H,’ ‘E.’”

That’s the whole problem in one sentence. The model knows what a word means in context. It has no reliable view of the letters inside it. So counting R’s in “strawberry,” a running joke in AI circles for years now, exposes a real architectural limit.

Researchers aren’t optimistic about a clean fix

Here’s the part practitioners should sit with. The people who study this don’t expect a tidy solution. Sheridan Feucht, a PhD student researching LLM interpretability at Northeastern University, told TechCrunch that defining what a “word” even is for a language model is genuinely hard.

“Even if we got human experts to agree on a perfect token vocabulary, models would probably still find it useful to ‘chunk’ things even further,” Feucht said. “My guess would be that there’s no such thing as a perfect tokenizer due to this kind of fuzziness.”

Google’s troubles go past spelling, too. The company already patched an issue where searching the word “disregard” returned what looked like a dictionary entry, except the definition read: “Understood. Let me know whenever you have a new prompt or question!” The model was answering an instruction instead of defining a word.

What this means for you

The takeaway isn’t that LLMs are useless. The same models that fumble “poop” can write working code in seconds or chip away at problems that stumped mathematicians for decades. Spelling just isn’t where their value lives, and tokenization is the reason.

A few practical things to carry forward:

Don’t use an LLM as a spell-checker or a character counter. Tasks that hinge on individual letters (counting characters, solving anagrams, exact string manipulation) sit right on the model’s blind spot.
Verify anything user-facing. If your product surfaces AI output directly to people, the “t-r-p-u-m” failure mode is what an unchecked pipeline looks like in public.
Treat confident output with skepticism. The model sounds equally sure whether it’s right or spelling journalism with a D. Tone is not a signal of accuracy.
Reach for deterministic tools where precision matters. Regular code handles spelling, counting, and exact text operations far better than a probabilistic model ever will.
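As a rough illustration of the last point, the letter-level tasks from the examples above take a few lines of ordinary Python. The function names here are hypothetical helpers written for this sketch, not part of any library, and the spelling check assumes the word itself contains no hyphens.

```python
def count_letter(word, letter):
    """Case-insensitive count of one letter in a word."""
    return word.lower().count(letter.lower())


def spell_out(word):
    """Spell a word letter by letter, e.g. 'cat' -> 'c-a-t'."""
    return "-".join(word.lower())


def is_faithful_spelling(word, spelled):
    """Check a hyphenated spelling (such as model output) against the word."""
    return spelled.lower().replace("-", "") == word.lower()


print(count_letter("Google", "p"))      # 0
print(count_letter("strawberry", "r"))  # 3
print(spell_out("journalism"))          # j-o-u-r-n-a-l-i-s-m
print(is_faithful_spelling("journalism", "j-o-u-r-n-a-d-i-s-m"))  # False
```

A check like `is_faithful_spelling` is also one simple way to verify user-facing output: if a pipeline asks a model to spell a word, the result can be compared against the source string before anyone sees it.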

Google says it’s working on the counting issue, and it may well patch these specific examples. But the researchers are telling us the underlying limitation isn’t going anywhere soon. The honest lesson, as TechCrunch AI frames it, is the oldest one: AI is not all-knowing, and you can’t blindly trust its output without checking the work.

More detail is available in the original TechCrunch AI report.
