Optimize LLM Prompts: SMS Shorthand, Token Count & AI Robustness

A build dropped this week that nobody asked for and a few people genuinely needed. It’s called autoincorrect. The concept: write your LLM prompts the way you texted in 2008. “wht if we jst wrt lyk ths.” Then let the model figure it out. The repo includes an actual spec, not just a meme. Someone sat down and wrote formal rules for how to abbreviate prompts systematically. That detail is what makes it worth paying attention to.

The author’s logic is solid at first glance. LLMs were trained on mountains of this exact text. SMS shorthand, forum posts, early Twitter, Reddit comment threads from 2009. All of it went into the training data. “brb,” “tbh,” “imo,” “u r,” “lmk” are not foreign tokens to these models. They’ve seen billions of examples. So the argument goes: if the model already fluently understands degraded text, you get compressed input with zero cognitive overhead on the model’s end. You type faster, your prompts are shorter, and the model processes them without breaking a sweat. It sounds like a clean win.

Here’s the actual twist. The community poked the hole immediately: token count doesn’t track character count. Tokenizers split text into subword units, not letters, so writing “u” instead of “you” often saves nothing. In most modern tokenizers “you” is already a single token, and “u” is typically a single token too. You save two characters and zero tokens. Rarer abbreviations can even go the wrong way: a string the tokenizer has seen less often may be split into more pieces than the word it replaces. So the “lossless compression” claim doesn’t hold up the way the author frames it. The shorthand isn’t compressing anything at the level the model actually measures. On a token-metered API you’re not saving money, you’re not speeding up inference, you’re just writing weirdly.
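
To make the mechanism concrete, here is a minimal sketch of greedy longest-match subword splitting. The vocabulary is a toy one invented for this example, not any real model's, and real tokenizers (BPE and friends) have far larger vocabularies and will split differently. The point is only to show why a shorter string is not automatically fewer tokens.

```python
# Toy vocabulary: an ASSUMPTION for illustration, not a real model's vocabulary.
TOY_VOCAB = frozenset({"you", "u", "write", "like", "this", "what", "if", "we", "just"})


def toy_tokenize(word: str, vocab: frozenset = TOY_VOCAB) -> list[str]:
    """Greedy longest-match split of one word into known subword pieces.

    Any position where no vocabulary entry matches falls back to one character.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until a piece matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary starts here: emit a single character.
            tokens.append(word[i])
            i += 1
    return tokens


def count_tokens(text: str) -> int:
    """Total toy tokens across whitespace-separated words."""
    return sum(len(toy_tokenize(w)) for w in text.split())


for word in ("you", "u", "write", "wrt"):
    print(word, len(word), toy_tokenize(word))
# you 3 ['you']
# u 1 ['u']
# write 5 ['write']
# wrt 3 ['w', 'r', 't']
```

In this toy setup “you” and “u” cost the same, and “wrt” costs more than “write” because the abbreviation is not in the vocabulary. Whether that happens with your model depends entirely on its real tokenizer, which is why measuring beats guessing.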

But the underlying experiment is still worth running. LLMs are genuinely robust to degraded, abbreviated input. Studies on prompt sensitivity consistently show that models can extract intent from surprisingly mangled text. Typos, missing words, unconventional grammar, SMS shorthand: models handle all of it better than most people expect. That robustness has practical implications. It means the perfectionism many people bring to prompt writing is often overkill. You probably don’t need to polish every word. The model will follow your meaning even when your sentence structure is a mess. And if you’re on an API that meters by character count rather than token count (some billing models and some older endpoints still do this), then the math actually changes and the compression argument becomes real.
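
The difference between the two metering models is simple arithmetic, sketched below. The function names are made up for this example, and the token counts are inputs you would read off your own tokenizer; the 7 and 7 used in the usage lines are placeholder values, not measurements.

```python
def savings(before: int, after: int) -> float:
    """Fraction of billed units saved when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 1 - after / before


def compare_metering(original: str, shorthand: str,
                     tokens_original: int, tokens_shorthand: int) -> dict[str, float]:
    """Savings under per-character billing vs per-token billing.

    Character counts come from the strings themselves; token counts must be
    supplied by the caller from whatever tokenizer their model uses.
    """
    return {
        "per_character": savings(len(original), len(shorthand)),
        "per_token": savings(tokens_original, tokens_shorthand),
    }


result = compare_metering(
    "what if we just write like this",
    "wht if we jst wrt lyk ths",
    tokens_original=7,   # placeholder: replace with your measured count
    tokens_shorthand=7,  # placeholder: replace with your measured count
)
print(result)
```

With those placeholder numbers the per-character saving is about 19% while the per-token saving is zero, which is the whole argument in two lines. A negative value would mean the shorthand costs more than the original.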

How to try it:

📂 Clone the repo at github.com/dev-boz/autoincorrect and read the spec first. The spec matters. It’s not just random shorthand, it’s a defined ruleset for which abbreviations to apply and in what order, which is what makes it a useful experiment rather than noise.
✍️ Rewrite a prompt using SMS shorthand: drop vowels from common words, swap “you” for “u”, compress repetitive phrasing. A good test case is a prompt you use regularly, something you already know the expected output of. That gives you a clean baseline to measure against.
⚖️ Run both versions through your LLM and compare output quality side by side. Be specific about what you’re measuring. Is the structure the same? Are the key points present? Does the tone match? Vague “feels similar” comparisons won’t tell you anything useful.
🔬 Use a tokenizer visualizer to actually count tokens before and after. OpenAI’s tokenizer tool gives a rough picture even if you’re using a different model, but every model family has its own tokenizer, so treat the counts as approximate unless the tool matches your model. Paste both versions and read the numbers. This is the step most people skip and it’s the most important one.
📉 Note where output quality starts degrading. That’s the real data. There’s a threshold somewhere where shorthand stops being robust input and starts producing confused or incomplete responses. Finding that threshold is more valuable than any compression claim.
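
If you want the rewrite and the comparison to be repeatable, the steps above can be roughed out in a few lines. This is a sketch under assumptions: the substitution table is hypothetical and is not the autoincorrect spec (read the repo for the real rules), and the key-point check is a crude substring match standing in for whatever quality measure you settle on. It does not call any model; you paste the outputs in yourself.

```python
import re

# Hypothetical substitutions for illustration only. The autoincorrect spec
# defines its own ruleset, which is NOT reproduced here.
SHORTHAND_RULES = {
    "you": "u",
    "are": "r",
    "please": "pls",
    "because": "bc",
    "with": "w/",
}


def to_shorthand(prompt: str, rules: dict[str, str] = SHORTHAND_RULES) -> str:
    """Replace whole words (case-insensitive) according to `rules`."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, rules)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: rules[m.group(1).lower()], prompt)


def key_point_coverage(output: str, key_points: list[str]) -> float:
    """Fraction of expected key points that appear in a model output."""
    if not key_points:
        raise ValueError("provide at least one key point")
    text = output.lower()
    hits = sum(1 for point in key_points if point.lower() in text)
    return hits / len(key_points)


print(to_shorthand("Can you explain why tokens are counted?"))
# Can u explain why tokens r counted?
```

Run `key_point_coverage` on the output from the original prompt and on the output from the shorthand prompt, using the same list of key points for both. A drop in coverage between the two is a concrete signal, which is more than “feels similar” will ever give you.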

Pro tip: Flip the experiment entirely. Instead of compressing inputs, test how much noise your prompts can absorb before model output breaks down. Deliberately introduce typos, cut words, swap sentence order, use slang. See at what point the output quality actually shifts. That’s a more useful calibration than chasing character savings. Many prompts carry something like 20 to 30% filler that can be cut without touching the meaning at all: hedge phrases like “please,” “if possible,” and “I was wondering if you could,” plus unnecessary restating of context the model already has. Cut those and you often get tighter, better outputs without any SMS shorthand involved, and because you are removing whole words rather than letters, the token count actually drops. The real lesson from autoincorrect isn’t that txt speak works. It’s that your prompts have more slack in them than you think.
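
One way to run the noise test systematically is to generate progressively more damaged copies of the same prompt and feed each one to your model. The sketch below is an assumption about how you might do that, not anything from the repo: it drops words and transposes neighbouring letters with a probability you control, and it is seeded so the same damage can be reproduced.

```python
import random


def add_noise(prompt: str, level: float, seed: int = 0) -> str:
    """Degrade a prompt by dropping some words and adding typos to others.

    `level` is the per-word probability of each kind of damage (0.0 to 1.0).
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0.0 and 1.0")
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    noisy = []
    for word in prompt.split():
        if rng.random() < level:
            continue  # simulate a missing word
        if len(word) > 3 and rng.random() < level:
            i = rng.randrange(len(word) - 1)
            # simulate a typo by transposing two neighbouring characters
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        noisy.append(word)
    return " ".join(noisy)


def noise_ladder(prompt: str, levels=(0.0, 0.1, 0.2, 0.4)) -> dict[float, str]:
    """Return one degraded copy of the prompt per noise level."""
    return {level: add_noise(prompt, level) for level in levels}


for level, text in noise_ladder("Summarize the attached report in three bullet points").items():
    print(level, repr(text))
```

Send each rung of the ladder to the model and note the first level where the output stops matching your baseline. The levels in the default tuple are arbitrary starting points; tighten them around wherever things start to break.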

📬 Try talking to your LLM in txt speak this week. Run the token count. Share what you find. The answer will probably surprise you either way.

Frequently Asked Questions

Q: Does SMS-style text compression actually reduce token usage?

Not really, and that’s the key misconception. Tokens are subword units, not raw characters. Writing “u” instead of “you” usually doesn’t save tokens because each is typically already a single token. One commenter noted this approach might even *increase* token count, since the model has to output its interpretation of the compressed text before generating the response.

Q: Will LLMs understand compressed text?

Yes, LLMs can read and understand SMS-style compression since it exists in training data. However, according to commenters, decoding it takes *more* effort from the model, especially in thinking modes. They argue this can increase hallucinations and inaccurate responses because the model spends its effort decoding your input rather than crafting a good output.

Q: Is the time and effort worth it?

That’s debatable. While LLMs handle compressed text fine, the human cognitive load of writing “wrt” instead of “write” across a longer prompt might offset any theoretical savings. One commenter suggested running actual token count comparisons across different prompt types to find the real breakeven point, which hasn’t been done yet.

Q: How is this different from the “caveman” approach everyone talks about?

The caveman method works by using fewer, *simpler words* (reducing vocabulary complexity and dropping filler), which actually does reduce token count. SMS compression just removes characters from words without removing or simplifying the words themselves, so it doesn’t trigger the same efficiency gains.

autoincorrect – in/out compression
by u/Bravo_Oscar_Zulu in PromptEngineering
