pxpipe: Slash AI Token Costs by Up to 70% with Image Text

A developer named Steven Chong just shipped an open-source tool called pxpipe that does something clever and a little strange: it renders your text as images to slash AI token costs by up to 70 percent. According to The Decoder, the tool works as a local proxy for Claude Code, and the savings in early tests are hard to ignore.

The idea rests on a quirk in how Anthropic prices things. Text is billed by the token, so the cost grows with the length of the text. Images, though, cost a fixed number of tokens based on their pixel size, no matter how much text you cram into them. So if you render dense stuff like code or JSON as a picture, you can pack about 3.1 characters into every single image token. That’s the whole trick.

How pxpipe actually works

pxpipe sits between you and Claude Code and intercepts your requests. It doesn’t turn everything into images. It’s selective:

Rendered as PNGs: the bulky, static parts of a session, like system prompts, tool documentation, and older chat history.
Left as text: recent messages and the model’s own outputs, so the live parts of the conversation stay sharp.
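The split described above can be sketched as a simple routing rule. This is one reading of the article, not pxpipe's actual code: the `Block` type, the kind labels, and the two-turn cutoff for "recent" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # hypothetical labels: "system", "tool_docs", "user", "assistant"
    text: str
    age: int   # turns since the block was added; 0 means the current turn


STATIC_KINDS = {"system", "tool_docs"}
RECENT_TURNS = 2  # assumed cutoff; the article does not give one


def should_render_as_image(block: Block) -> bool:
    """Return True for bulky, static context; False for live conversation."""
    if block.kind in STATIC_KINDS:
        return True
    if block.kind == "assistant":
        return False  # the model's own outputs stay as text
    return block.age > RECENT_TURNS  # only older history is rendered


session = [
    Block("system", "You are a coding assistant...", age=10),
    Block("tool_docs", "read_file(path): ...", age=10),
    Block("user", "Refactor the parser.", age=6),
    Block("assistant", "Done, here is the diff.", age=6),
    Block("user", "Now add tests.", age=0),
]

as_image = [b.kind for b in session if should_render_as_image(b)]
as_text = [b.kind for b in session if not should_render_as_image(b)]
print(as_image, as_text)
```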

The Decoder shares a concrete example. Around 48,000 characters of system prompt and tool docs get squeezed onto a single densely packed PNG page. As plain text, that would run you about 25,000 tokens. As an image, it’s roughly 2,700. That’s the compression in action.
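The arithmetic behind that example is easy to check. The sketch below uses only the two token counts quoted above; the variable names are illustrative.

```python
text_tokens = 25_000   # reported cost of the block as plain text
image_tokens = 2_700   # reported cost of the same block as one PNG page

reduction_factor = text_tokens / image_tokens
saving_pct = 100 * (1 - image_tokens / text_tokens)

print(f"{reduction_factor:.1f}x fewer tokens, {saving_pct:.1f}% saved on this block")
# prints "9.3x fewer tokens, 89.2% saved on this block"
```

That figure applies to the static block alone. Whole-session savings are presumably lower because recent messages and model outputs still travel as text.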

The numbers that got attention

Chong reports total savings averaging 59 to 70 percent. The headline demo really lands: in one Fable 5 session, costs dropped from $42.21 to $6.06. Same work, a fraction of the bill.

What stands out to me is the accuracy figure. Fable 5 hit 100 percent accuracy on benchmark math problems using fresh random numbers the model couldn’t have memorized. That matters, because it means the model is genuinely reading the image, not guessing from training data.

The tradeoffs are real

This isn’t free money, and Chong is upfront about the downsides. The approach is lossy. Exact strings like hashes can come back garbled when the model reads them from an image. Processing is also slower, since the model has to run each rendered page through a vision encoder instead of just reading text.
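One possible mitigation for the garbled-hash problem, which the source does not describe and which is not claimed to be part of pxpipe, is to keep any block containing digest-like strings as text. A hedged sketch, with a hypothetical function name and a deliberately crude heuristic:

```python
import re

# Runs of 32 or more hex characters cover MD5, SHA-1, and SHA-256 digests.
# This is a heuristic only; it will miss base64 tokens, UUIDs, and similar.
HEX_DIGEST = re.compile(r"\b[0-9a-fA-F]{32,}\b")


def contains_exact_string(text: str) -> bool:
    """Return True if the text holds something that must survive byte for byte."""
    return HEX_DIGEST.search(text) is not None


print(contains_exact_string("checksum d41d8cd98f00b204e9800998ecf8427e verified"))
print(contains_exact_string("plain prose with no digests"))
```

A proxy could use a check like this to veto image rendering for a block, trading some savings for exactness.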

Accuracy also depends heavily on the model:

Fable 5: 100 percent on the memorization-proof math benchmark.
Opus 4.7 and 4.8: misread about 7 percent of rendered images.
GPT 5.5: also performs worse with image context.

Because of that, pxpipe enables image rendering only for Claude Fable 5 and GPT 5.6 by default. The weaker performers are switched off unless you turn them on manually. Benchmarks and evaluations are documented in the repository for anyone who wants to check the work.
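A default-on, opt-in policy like this can be sketched in a few lines. The model identifiers and the function below are hypothetical stand-ins, not pxpipe's real configuration keys.

```python
# Hypothetical identifiers for the models named in the article.
DEFAULT_ENABLED = {"fable-5", "gpt-5.6"}
OPT_IN_ONLY = {"opus-4.7", "opus-4.8", "gpt-5.5"}


def image_mode_enabled(model: str, user_opt_in: frozenset[str] = frozenset()) -> bool:
    """Strong readers are on by default; weaker ones need an explicit opt-in."""
    if model in DEFAULT_ENABLED:
        return True
    return model in OPT_IN_ONLY and model in user_opt_in


print(image_mode_enabled("fable-5"))
print(image_mode_enabled("opus-4.7"))
print(image_mode_enabled("opus-4.7", frozenset({"opus-4.7"})))
```

Unknown models fall through to text, which is the safe default when a misread costs more than the tokens saved.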

Not a brand-new idea

Feeding text to models as compressed images has been explored before. The Decoder points to DeepSeek, which built an OCR system that processes documents as images and, per its technical paper, compresses them by up to a factor of ten while keeping 97 percent of the information. pxpipe takes that concept and wires it straight into everyday Claude Code and Fable 5 workflows.

Why it matters

This is significant because it exposes a pricing gap that a lot of heavy users are quietly paying for. If you run long sessions with fat system prompts and tool docs, most of your token bill is static context you’re re-sending over and over. pxpipe attacks exactly that.

There’s a catch worth flagging, though. If this exotic trick catches on widely, AI companies could respond by raising image processing prices and closing the gap themselves. Arbitrage like this tends to have a shelf life.

For now, it’s a working tool with public benchmarks and a real cost story behind it. If you’re burning through tokens on repetitive context, it’s worth a look. You can find the full breakdown, the demos, and the evaluation data at the original report from The Decoder.
