Yesterday a clever build shipped. The twist is in what it deliberately skips.
A developer working with real codebases kept hitting the same wall: large repos don’t fit into context windows. Models miss files, reason over partial information, retry constantly. You paste in a 40-file project and the model confidently edits the wrong function in the wrong file because it only half-read the codebase. The usual fix is RAG plus embeddings plus a vector database. That means setting up infrastructure, managing an index, paying for embedding API calls, and debugging retrieval quality before you’ve even written a line of application code. This build skips all of that.
What’s new
Sigmap is a lightweight context layer for LLM coding tasks. It parses your codebase for structural signals (functions, classes, routes), builds a local index with zero external dependencies, then ranks the most relevant files per query using token overlap, structural signals, and basic heuristics like recency and dependency links.
No embeddings. No vector store. No cloud calls. Sigmap reads your code the way a senior developer would skim a codebase: look at the structure, find what’s defined where, trace the imports, check what was touched recently. It builds a map of your repo in memory and uses that map to answer “what should I actually be looking at for this query?”
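To make the idea concrete, here is a minimal Python sketch of structure-based ranking. It is not Sigmap's actual code: the function names (`extract_signals`, `rank_files`), the scoring weights, and the in-memory sample files are all assumptions for illustration, and it leaves out the recency and dependency heuristics. It uses only the standard library's `ast` module to pull out function, class, and import names, then scores each file by token overlap with the query.

```python
import ast
import re


def tokenize(text):
    """Lowercase word tokens; splits snake_case, camelCase, and path separators."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return {t.lower() for t in re.findall(r"[A-Za-z0-9]+", spaced)}


def extract_signals(source):
    """Collect function, class, and imported module names from Python source."""
    signals = {"functions": [], "classes": [], "imports": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            signals["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            signals["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            signals["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            signals["imports"].append(node.module)
    return signals


def rank_files(files, query, top_k=5):
    """Score files by overlap between query tokens and structural names."""
    query_tokens = tokenize(query)
    scored = []
    for path, source in files.items():
        signals = extract_signals(source)
        defined = tokenize(" ".join(signals["functions"] + signals["classes"]))
        imported = tokenize(" ".join(signals["imports"]))
        # Weights are arbitrary assumptions: definitions > path > imports.
        score = (2.0 * len(query_tokens & defined)
                 + 1.0 * len(query_tokens & tokenize(path))
                 + 0.5 * len(query_tokens & imported))
        scored.append((score, path))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(path, score) for score, path in scored[:top_k] if score > 0]


# Hypothetical three-file "repo" held in memory for the example.
SAMPLE_FILES = {
    "auth/tokens.py": "import jwt\n\ndef validate_auth_token(token):\n    return jwt.decode(token, 'secret', algorithms=['HS256'])\n",
    "billing/invoice.py": "class Invoice:\n    def total(self):\n        return 0\n",
    "api/routes.py": "from auth import tokens\n\ndef login_route(request):\n    return tokens.validate_auth_token(request)\n",
}

ranked = rank_files(SAMPLE_FILES, "where does auth token validation happen?")
print(ranked)
```

In this toy repo the file that defines the token validator ranks first, the route file that merely imports from `auth` ranks second, and the billing file drops out entirely. No embedding call was needed to get there.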
Result: instead of feeding 80K tokens into context, you send roughly 2K, a drop of about 97%. Relevant files reportedly appeared in the top 5 about 70-80% of the time across multiple repos. That’s not a small efficiency gain. It can be the difference between a model that reasons well and one that hallucinates because it’s drowning in irrelevant code.
The twist
Structured context mattered more than model size. Not a bigger model. Not semantic embeddings. Just knowing what lives in your files and ranking it intelligently per query.
This is the part worth sitting with for a moment. The instinct when LLMs fail on large codebases is to reach for a smarter model or a fancier retrieval system. Sigmap is an argument that the bottleneck was never intelligence. It was signal-to-noise. Give a capable model a clean 2K context with the right files and it can outperform the same model given a noisy 80K dump. Garbage in, garbage out applies to context as much as to training data.
How to use it 🔧
- Clone the sigmap repo and point it at your codebase
- It extracts functions, classes, and routes automatically with plain parsing; no configuration is needed for standard Python, JavaScript, or TypeScript projects
- Run a query in natural language (“where does auth token validation happen?”); sigmap ranks the relevant files and emits a compact context layer showing the most relevant code blocks
- Feed those ~2K tokens to your LLM instead of the entire repo: paste them directly into your prompt or pipe them into any coding assistant that accepts context
- Iterate fast: fewer retries, fewer hallucinations, faster feedback loop 🚀
The setup genuinely takes under 10 minutes for a mid-size repo. There’s no account to create, no API key to generate, no hosted service to trust with your code.
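The last two steps boil down to packing the top-ranked snippets into a fixed token budget. Here is a rough sketch of that idea, not Sigmap's own output format: `build_context`, the `# file:` header, and the 4-characters-per-token estimate are all assumptions (the estimate is a common rule of thumb, not a real tokenizer).

```python
def estimate_tokens(text):
    """Crude estimate: about 4 characters per token (assumption, not exact)."""
    return max(1, len(text) // 4)


def build_context(ranked_snippets, budget_tokens=2000):
    """Concatenate snippets in rank order without exceeding the budget.

    ranked_snippets: list of (path, code) tuples, most relevant first.
    Returns (context_string, tokens_used).
    """
    parts = []
    used = 0
    for path, code in ranked_snippets:
        block = f"# file: {path}\n{code.rstrip()}\n"
        cost = estimate_tokens(block)
        if used + cost > budget_tokens:
            continue  # too big to fit; a smaller snippet later may still fit
        parts.append(block)
        used += cost
    return "\n".join(parts), used


# Hypothetical ranked snippets; the third one is far too large for the budget.
snippets = [
    ("auth/tokens.py", "def validate_auth_token(token):\n    ...\n"),
    ("api/routes.py", "def login_route(request):\n    ...\n"),
    ("legacy/big_module.py", "x = 1\n" * 5000),
]
context, used = build_context(snippets, budget_tokens=2000)
prompt = (
    "Use only the code below.\n\n"
    f"{context}\n"
    "Question: where does auth token validation happen?"
)
print(used, "tokens (estimated)")
```

Skipping oversized snippets instead of stopping at the first one that does not fit is a design choice: it keeps one giant legacy file from crowding out smaller, relevant ones further down the ranking.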
Pro tips
- Works best for literal queries (“where is the auth logic?”, “which file handles rate limiting?”) rather than conceptual ones (“refactor for scalability”) where semantic understanding starts to matter. For conceptual work, use sigmap to shortlist candidates, then review them yourself before prompting.
- Zero cloud, zero API keys, zero setup beyond the repo. Runs fully local, which matters if your codebase contains anything you’d rather not pipe through a third-party embedding API.
- Pair with a strong coding model for tasks where surgical file selection beats broad coverage. Sigmap finds the files; the model reasons over them. Each tool doing its actual job.
- Re-index after significant refactors. The local index updates fast, but if you rename modules or restructure heavily, a fresh index gives you cleaner rankings.
- Use it as a forcing function for your own architecture. If sigmap is struggling to rank relevant files for basic queries, your codebase might be poorly organized. That’s useful information on its own.
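On the re-indexing tip: if you want a quick signal for when a fresh index is worth it, one option is to compare content fingerprints before and after a refactor. This is a hypothetical helper that sits outside Sigmap; the names `snapshot` and `diff_snapshot` are made up for the example, and Sigmap may well handle this on its own.

```python
import hashlib


def fingerprint(source):
    """Stable content hash for one file's text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def snapshot(files):
    """Map each path to the fingerprint of its contents."""
    return {path: fingerprint(src) for path, src in files.items()}


def diff_snapshot(old, new):
    """List paths that changed, appeared, or disappeared between snapshots."""
    return {
        "changed": sorted(p for p in new if p in old and old[p] != new[p]),
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
    }


# Hypothetical refactor: auth.py moves to auth/tokens.py, util.py is edited.
before = {
    "auth.py": "def validate(token): ...\n",
    "util.py": "def slugify(s): ...\n",
}
after = {
    "auth/tokens.py": before["auth.py"],
    "util.py": "def slugify(text): ...\n",
}

diff = diff_snapshot(snapshot(before), snapshot(after))
needs_reindex = any(diff.values())
print(diff, needs_reindex)
```

A renamed module shows up as one removed path plus one added path, which is exactly the kind of change that makes stale rankings point at files that no longer exist.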
Bottom line
If your LLM keeps missing files or losing track of your codebase, the problem might not be the model. It might be the noise you’re feeding it. Most RAG pipelines are solving a problem you don’t need to have if you just parse structure intelligently and rank before feeding. Sigmap does that with a simple local tool, no infrastructure, and results that hold up across real projects. It’s worth 10 minutes to try. 🎯
Docs: https://manojmallick.github.io/sigmap/
GitHub: https://github.com/manojmallick/sigmap
Frequently Asked Questions
Q: How far can heuristic ranking take me before I need embeddings?
You’ll do fine with pure heuristics for literal queries: finding specific functions, routes, or imports. But conceptual stuff like “refactor this for better scalability” doesn’t map well to token overlap. Most folks hit the wall around 100 to 150 files; that’s when embeddings start earning their keep. The real magic? Hybrid: use structure for hard filters first, then let embeddings re-rank what’s left.
Q: How do I keep the model from making stuff up that isn’t in my context?
Force it to cite its work. Have the model quote the specific line and filename before writing any solution. If it can’t find that quote in your 2K-token context layer, it should flag the claim as unsupported instead of guessing. Dead simple, but it actually works.
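You don’t have to take the model’s word for its citations either. Here is a small sketch of a checker you could run on the reply. It assumes you instructed the model to emit citations as `CITE <path>: <exact line>`; that format and the `verify_citations` name are assumptions for this example, not part of Sigmap or any model API.

```python
import re


def normalize(line):
    """Collapse whitespace so indentation differences do not cause misses."""
    return " ".join(line.split())


def verify_citations(answer, context_files):
    """Check each 'CITE <path>: <line>' in the answer against the sent context.

    context_files maps path -> the source text that was actually in the prompt.
    Returns a list of (path, quote, found) tuples.
    """
    results = []
    for match in re.finditer(r"^CITE (\S+): (.+)$", answer, flags=re.MULTILINE):
        path, quote = match.group(1), normalize(match.group(2))
        source = context_files.get(path, "")
        source_lines = {normalize(line) for line in source.splitlines()}
        results.append((path, quote, quote in source_lines))
    return results


# Hypothetical context and model reply.
context_files = {
    "auth/tokens.py": "def validate_auth_token(token):\n    return token.startswith('tok_')\n",
}
answer = (
    "CITE auth/tokens.py: def validate_auth_token(token):\n"
    "CITE auth/session.py: def refresh(session):\n"
    "The validation happens in validate_auth_token."
)

checks = verify_citations(answer, context_files)
unsupported = [(path, quote) for path, quote, found in checks if not found]
print(unsupported)
```

Anything that lands in `unsupported` cites a file or line that was never in the prompt, so that part of the answer deserves a second look before you act on it.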
Q: Should I just use this instead of embeddings, or do both?
Both, actually. Use structural signals (imports, routes, class defs) to narrow the haystack first, then hit the remainder with semantic search. You cut the noise way down compared to pure vector search, and the model actually stays grounded in your code.
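Here is what that two-stage flow could look like in Python. Treat it as a sketch under assumptions: the candidate records, `structural_filter`, and `hybrid_rank` are invented for illustration, and the bag-of-words cosine is only a stand-in so the example runs without any embedding service. In a real setup you would pass your own embedding-based function as `similarity`.

```python
import math
from collections import Counter


def structural_filter(candidates, required_terms):
    """Stage 1: keep files whose structural names include a required term."""
    required = {t.lower() for t in required_terms}
    return [c for c in candidates if required & {n.lower() for n in c["names"]}]


def bow_similarity(query, text):
    """Stand-in for embedding similarity: cosine over word counts."""
    a, b = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def hybrid_rank(candidates, query, required_terms, similarity=bow_similarity):
    """Stage 2: re-rank only the structurally filtered files by similarity."""
    shortlist = structural_filter(candidates, required_terms)
    return sorted(shortlist,
                  key=lambda c: similarity(query, c["summary"]),
                  reverse=True)


# Hypothetical index entries: structural names plus a short text summary.
candidates = [
    {"path": "auth/tokens.py", "names": ["validate", "auth", "token"],
     "summary": "validate and refresh auth token for each request"},
    {"path": "auth/passwords.py", "names": ["hash", "auth", "password"],
     "summary": "hash and verify user password"},
    {"path": "billing/invoice.py", "names": ["invoice", "total"],
     "summary": "compute invoice total and refresh token for payment"},
]

result = hybrid_rank(candidates, "refresh expired auth token", ["auth"])
print([c["path"] for c in result])
```

The billing file mentions “refresh token” in its summary, so a pure similarity search might surface it. The structural filter removes it before the re-rank ever sees it, which is the noise reduction the hybrid approach is after.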
Source: “Reducing LLM context from ~80K tokens to ~2K without embeddings or vector DBs” by u/Independent-Flow3408 in r/ChatGPTPromptGenius