mcp2cli: Slash LLM Tool Token Costs by 96-99%

A new open-source tool called mcp2cli promises to slash the token costs of connecting LLMs to external tools by 96-99%, according to a Show HN post on Hacker News that quickly climbed to 158 points.

The pitch is straightforward: instead of injecting full tool schemas into your LLM’s context window on every single turn, mcp2cli lets agents discover and call tools on demand through a standard CLI interface. No codegen, no recompilation. Just point it at an MCP server or OpenAPI spec and go.

The Problem It Solves

Anyone who’s wired up multiple MCP servers to an AI coding agent knows the pain. Each server dumps its full schema into the system prompt, and that context tax gets paid on every message, whether the model uses those tools or not.

The numbers are stark. According to analysis cited in the project, 6 MCP servers with 84 tools consume roughly 15,540 tokens at session start. A 50-endpoint API alone costs 3,579 tokens of context before a single word of conversation. That adds up fast.
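To put that context tax in per-tool and per-session terms, here is a minimal arithmetic sketch using only the figures quoted above. The 20-turn session length is a hypothetical value chosen for illustration, and the model ignores any provider-side prompt caching.

```python
# Figures quoted in the article.
TOOL_COUNT = 84
SCHEMA_TOKENS_PER_TURN = 15_540

# Hypothetical session length, chosen only for illustration.
ASSUMED_TURNS = 20


def tokens_per_tool(total_tokens: int, tool_count: int) -> float:
    """Average schema cost per tool."""
    return total_tokens / tool_count


def cumulative_schema_tokens(tokens_per_turn: int, turns: int) -> int:
    """Schema tokens paid over a session when schemas ride along on every turn."""
    return tokens_per_turn * turns


if __name__ == "__main__":
    average = tokens_per_tool(SCHEMA_TOKENS_PER_TURN, TOOL_COUNT)
    total = cumulative_schema_tokens(SCHEMA_TOKENS_PER_TURN, ASSUMED_TURNS)
    print(f"average schema cost: {average:.0f} tokens per tool")
    print(f"schema tokens over {ASSUMED_TURNS} turns: {total:,}")
```

Under these assumptions the schemas average 185 tokens per tool and account for 310,800 tokens over the session, before any tool is actually called.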

Even Anthropic recognized this issue, building Tool Search directly into their API with a deferred-loading pattern that cuts usage by about 85%. But mcp2cli’s creator argues that’s not enough, since Tool Search still pulls full JSON schemas when a tool is actually fetched.

What mcp2cli Actually Does

The tool installs via pip (pip install mcp2cli) or runs directly with uvx. Here’s what it brings to the table:

Zero codegen: Point it at a spec URL or MCP server and the CLI exists immediately. New endpoints appear on the next invocation without rebuilding anything
Universal compatibility: Works with Claude, GPT, Gemini, local models. It’s just a CLI tool any model can shell out to
Compact discovery: --list summaries cost roughly 16 tokens per tool vs 121 for native schemas, and they are fetched on demand rather than re-sent on every turn. Together, that’s where the 96-99% savings come from
OpenAPI support: Handles both MCP servers and OpenAPI specs (JSON or YAML, local or remote)
TOON output: A token-efficient encoding format for LLM consumption, cutting 40-60% more tokens from large result sets
Caching: Specs and tool lists are cached locally with a 1-hour TTL by default
Auth support: Pass API keys and auth headers directly via command flags
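The savings figures in the list above can be sanity-checked with a small model. This is a sketch under stated assumptions, not the project's benchmark method: the per-tool numbers (16 and 121) and the 84-tool count come from the article, while the 20-turn session, the 5 tools actually used, and the idea that fetching details for a used tool costs as much as its native schema are all hypothetical.

```python
def per_tool_saving(compact_tokens: int, native_tokens: int) -> float:
    """Fractional saving of a compact summary over a native schema, per tool."""
    return 1 - compact_tokens / native_tokens


def session_saving(
    tools: int,
    turns: int,
    tools_used: int,
    native_per_tool: int = 121,   # from the article
    list_per_tool: int = 16,      # from the article
    detail_per_tool: int = 121,   # assumption: full detail fetched once per used tool
) -> float:
    """Fractional saving over a whole session under the assumptions above."""
    # Native approach: every schema is re-sent on every turn.
    native_total = native_per_tool * tools * turns
    # CLI approach: one compact listing, plus details only for the tools used.
    cli_total = list_per_tool * tools + detail_per_tool * tools_used
    return 1 - cli_total / native_total


if __name__ == "__main__":
    print(f"per-tool saving: {per_tool_saving(16, 121):.1%}")
    print(f"session saving:  {session_saving(tools=84, turns=20, tools_used=5):.1%}")
```

The per-tool comparison alone gives about 87%. The session-level figure lands at roughly 99% under these assumptions, which suggests the headline range depends on avoiding the per-turn repetition as much as on the compact summaries.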

The Skill System

What stands out is the skill integration. mcp2cli ships with an installable skill that teaches AI coding agents (Claude Code, Cursor, Codex) how to use it. One command:

npx skills add knowsuchagency/mcp2cli --skill mcp2cli

After that, your agent can discover and call any MCP server or OpenAPI endpoint, and even generate new skills from APIs automatically. This is a clever approach: instead of replacing MCP, it wraps it in a more token-efficient interface.

How It Compares

The project builds on Kagan Yilmaz’s CLIHub, which demonstrated 92-98% token savings by converting MCP servers to CLIs. mcp2cli extends that concept by eliminating the codegen step entirely. Everything happens at runtime.

Compared to Anthropic’s native Tool Search (which defers loading but is locked to the Anthropic API), mcp2cli is provider-agnostic. Any LLM that can run shell commands can use it. The trade-off is that it requires the agent to make a subprocess call rather than using native tool-calling protocols.
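The subprocess trade-off is easier to picture with a sketch of what "shelling out" looks like from an agent harness. The helper name run_cli_tool is hypothetical, and the command below is a stand-in that runs the Python interpreter so the example works anywhere; a real harness would pass an mcp2cli invocation instead, with arguments taken from the project's documentation.

```python
import subprocess
import sys


def run_cli_tool(argv: list[str], timeout: float = 30.0) -> str:
    """Run a CLI command and return its stdout.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    subprocess.TimeoutExpired if the command runs longer than `timeout`.
    """
    completed = subprocess.run(
        argv,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout,
        check=True,           # turn non-zero exit codes into exceptions
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    # Stand-in command (assumption): replace with the real CLI call in practice.
    stand_in = [sys.executable, "-c", "print('tool result')"]
    print(run_cli_tool(stand_in))
```

Passing the command as an argument list rather than a shell string avoids quoting problems when a model supplies the argument values, and the timeout keeps a hung tool from stalling the agent loop.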

Why This Matters

Token costs are a real constraint for production AI systems. As developers connect more and more tools to their agents, the context window fills up with schemas that may never get used in a given conversation. This is essentially a caching and lazy-loading problem applied to LLM tool discovery.
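The caching and lazy-loading framing can be sketched in a few lines. This is an illustration of the general pattern, not mcp2cli's implementation: the class name LazyToolCatalog and the loader callback are hypothetical, and only the one-hour default TTL comes from the article.

```python
import time
from typing import Callable, Dict, List, Tuple


class LazyToolCatalog:
    """Loads a tool list only when asked, then reuses it until the TTL expires."""

    def __init__(
        self,
        loader: Callable[[str], List[str]],
        ttl_seconds: float = 3600.0,  # one hour, the default TTL the article cites
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps a source (spec URL, server address) to (load time, tool names).
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def tools(self, source: str) -> List[str]:
        """Return the tool list for `source`, loading it only if missing or stale."""
        now = self._clock()
        cached = self._entries.get(source)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        loaded = self._loader(source)
        self._entries[source] = (now, loaded)
        return loaded
```

Nothing is loaded at construction time, so a source that is never queried costs nothing, and repeated queries within the TTL reuse the cached list.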

The project is free, open-source, and available now via pip. It’s early-stage (this is a Show HN launch), so expect rough edges. But the core idea, treating tool discovery as a CLI pattern rather than a context-stuffing exercise, is sound engineering.

For teams running multi-tool agent setups where token costs are ballooning, this is worth a look. Full details and documentation are available through the project’s Hacker News discussion.
