Yesterday a developer named SrMugre shipped a small CLI tool that solves one of the most annoying problems in production AI work. It’s called pCompiler, and the twist isn’t what it does (cost estimation), it’s when it does it: before your prompt ever touches the API.
That timing changes everything.
What pCompiler Actually Does
The tool analyzes your compiled prompt locally and gives you two numbers: how much it will cost and how long it will take. No API call needed. No tokens burned. You get the answer while the prompt is still sitting on your machine.
It does this by tokenizing your prompt using the same encoding libraries the model providers use, so the count is accurate, not a rough character-divided-by-four guess. That accuracy matters when you’re budgeting a job with thousands of rows.
The creator built this because batch-processing jobs have a nasty habit of blowing budgets. If you’ve ever kicked off a 10,000-prompt run and realized halfway through that you picked the wrong model, you know the pain. pCompiler catches that mistake at the planning stage, not the billing stage.
The Twist: Model Comparison
Here’s where it gets interesting. Running the tool with a --compare flag generates a side-by-side table of every model in your config. Cost per prompt, latency estimate, the works. So instead of guessing whether GPT-4o or Claude Sonnet is cheaper for your specific prompt, you just look at the table.
The tool reads pricing from a local config.json file, and since API prices change constantly, there’s a built-in command (pcompile update-pricing) that syncs the latest rates automatically. One less thing to track manually.
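To make the comparison idea concrete, here is a minimal sketch of how a side-by-side table could be built from a pricing config. It is not pCompiler's code: the config layout, the field names (input_price_per_million, avg_latency_s), the model names, and the prices are all placeholders invented for illustration, and the real config.json schema may differ.

```python
# Hypothetical config layout; pCompiler's actual config.json schema may differ.
# Model names and prices are placeholders, not real provider rates.
CONFIG = {
    "models": {
        "model-a": {"input_price_per_million": 2.50, "avg_latency_s": 1.8},
        "model-b": {"input_price_per_million": 3.00, "avg_latency_s": 1.2},
        "model-c": {"input_price_per_million": 0.15, "avg_latency_s": 0.9},
    }
}


def compare_models(config, input_tokens):
    """Return one row per configured model, cheapest first (input cost only)."""
    rows = []
    for name, spec in config["models"].items():
        cost = input_tokens / 1_000_000 * spec["input_price_per_million"]
        rows.append(
            {"model": name, "cost_usd": cost, "latency_s": spec["avg_latency_s"]}
        )
    return sorted(rows, key=lambda row: row["cost_usd"])


def format_table(rows):
    """Render the rows as a fixed-width text table with a header line."""
    lines = [f"{'model':<10} {'cost (USD)':>12} {'latency (s)':>12}"]
    for row in rows:
        lines.append(
            f"{row['model']:<10} {row['cost_usd']:>12.6f} {row['latency_s']:>12.1f}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Compare all configured models for a single 1,200-token prompt.
    print(format_table(compare_models(CONFIG, input_tokens=1_200)))
```

Sorting by cost puts the cheapest model on the first row, which is usually what you want to see first when planning a batch job.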
How to Get Started
The workflow is straightforward:
- 🔧 Clone the repo from GitHub (search for marcosjimenez/pCompiler) and install dependencies.
- 📝 Set up your config.json with the models you use and their current pricing per million tokens. Or run pcompile update-pricing to pull rates automatically.
- 🧮 Run the estimate command against your prompt file. The tool counts input tokens accurately and multiplies against stored rates.
- 📊 Add the --compare flag to see a full model comparison table. This is the real power move for batch jobs where model choice can mean the difference between $5 and $500.
- ✅ Pick your model based on actual numbers, then send the prompt knowing exactly what you’ll pay.
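The arithmetic behind the estimate step is simple enough to sketch. The snippet below assumes you already have a token count per prompt (from whichever tokenizer matches your model) and a rate per million input tokens. The function name and the $2.00 rate are made up for illustration and are not taken from pCompiler.

```python
def estimate_batch_cost(token_counts, price_per_million):
    """Input-side cost in USD for a batch, given per-prompt token counts."""
    total_tokens = sum(token_counts)
    return total_tokens / 1_000_000 * price_per_million


# 10,000 prompts of 800 tokens each, at a placeholder rate of $2.00 per million.
batch = [800] * 10_000
cost = estimate_batch_cost(batch, price_per_million=2.00)
print(f"{sum(batch):,} input tokens -> ${cost:.2f}")
# prints: 8,000,000 input tokens -> $16.00
```

Because the cost scales linearly with the rate, swapping in a model that charges ten times as much per million tokens turns the same batch into a ten-times-larger bill.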
Pro Tips
Run estimates before every batch job, not just the first time. Prompt length creeps up as you iterate, and what cost $12 last week might cost $40 now because you added a longer system prompt or more few-shot examples. A quick pre-run check takes five seconds and can save you from an unpleasant invoice surprise.
Keep your pricing config fresh. API providers change rates without fanfare. The update-pricing command exists for a reason. Make it part of your weekly routine, or better yet, add it to a pre-run script so it happens automatically before every batch job kicks off.
Use the comparison table for model selection, not vibes. The cheapest model per token isn’t always the cheapest per task. A model that needs two retries to get the output right costs more than one that nails it first try. But at least the cost side of the equation won’t be a mystery.
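The retry point can be made concrete with a little expected-value arithmetic. This sketch assumes each attempt succeeds independently with a fixed probability, so the expected number of attempts is 1 divided by the success rate. The per-attempt costs and success rates are invented for illustration, and real retry behavior may not be that tidy.

```python
def expected_cost_per_task(cost_per_attempt, success_rate):
    """Expected cost of one finished task, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # Expected attempts for a geometric distribution is 1 / success_rate.
    return cost_per_attempt / success_rate


# Placeholder numbers: a cheap model that often needs retries
# versus a pricier model that usually succeeds on the first try.
cheap = expected_cost_per_task(cost_per_attempt=0.002, success_rate=0.40)
pricey = expected_cost_per_task(cost_per_attempt=0.004, success_rate=0.95)
print(f"cheap model:  ${cheap:.5f} per finished task")
print(f"pricey model: ${pricey:.5f} per finished task")
```

With these made-up numbers, the model that costs half as much per attempt ends up costing more per finished task.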
Who This Is For
If you’re sending a handful of prompts a day, you probably don’t need this. The real value shows up when you’re running hundreds or thousands of prompts in a pipeline. Data processing, content generation at scale, automated classification jobs: that’s where a 2x price difference between models turns into real money. Think document parsing across a 50,000-row dataset, or nightly enrichment jobs pulling product descriptions from a catalog.
As one commenter in the original Reddit thread put it, “blowing your budget on a massive prompt run is the absolute worst.” Hard to argue with that.
Limitations Worth Noting
The tool estimates input tokens, but output tokens depend on what the model actually generates. So your real cost will always be higher than the estimate. Think of it as a floor, not a ceiling. For planning purposes, the original poster’s approach of tracking per-million-token rates works well, but you’ll want to add a buffer of 20 to 30% for output costs in your budget math. If your outputs are long or variable in length, lean toward the higher end of that buffer.
Also, latency estimates are averages from config, not live measurements. Actual response times vary based on provider load, prompt complexity, and time of day.
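The output-cost buffer described above is easy to fold into a budget. This is a rough sketch, not something pCompiler does for you: the function name is hypothetical, and the 20% and 30% defaults simply mirror the buffer suggested in this section.

```python
def budget_range(input_cost, low_buffer=0.20, high_buffer=0.30):
    """Pad an input-only estimate with a low and a high output-cost buffer."""
    return input_cost * (1 + low_buffer), input_cost * (1 + high_buffer)


# A $40.00 input-only estimate, padded for output tokens.
low, high = budget_range(40.00)
print(f"Budget between ${low:.2f} and ${high:.2f}")
# prints: Budget between $48.00 and $52.00
```

If your outputs tend to be long, pass a larger high_buffer rather than trusting the default.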
If you’re running production AI pipelines and want to stop guessing at costs, check out the original discussion on r/PromptEngineering. The full source code and setup instructions are linked in the thread.
Frequently Asked Questions
Q: How accurate is the token estimation?
pCompiler tokenizes your compiled prompt with the same encoding libraries the model providers use, so the input token count is accurate rather than a characters-divided-by-four guess. It then applies the pricing stored in your config.json. Output tokens are not included, so treat the result as a floor on your real cost rather than the final bill.
Q: How do I find the cheapest model for my prompt?
Run the estimate command with the --compare flag, and it generates a comparison table across all your registered models. You’ll see both cost and latency side-by-side, making it easy to pick the most cost-effective option for your specific use case.
Q: How do I keep my pricing data current?
Use the pcompile update-pricing command to automatically sync API prices to your config.json. Since provider pricing changes frequently, this automated approach beats manual guesswork and keeps your estimates accurate for ongoing batch jobs.
Q: Is this tool only for large-scale operations?
While especially valuable for batch-processing and massive prompts, even smaller projects benefit from knowing upfront costs. Whether you’re building complex flows or testing a quick workflow, preventing surprise bills and understanding scalability makes it worthwhile.
Source: “The prompt compiler – How much does it cost?” by u/SrMugre in r/PromptEngineering