Multiverse API: Local, Efficient Compressed AI Models for Enterprise

Spanish startup Multiverse Computing just made its compressed AI models a lot easier to get. The company launched a self-serve API portal alongside a consumer app, both designed to bring smaller, more efficient AI models to developers and enterprises, TechCrunch AI reports.

The timing isn’t random. With private company defaults hitting 9.2% and financial instability shaking up the AI supply chain, businesses are rethinking their dependence on external compute infrastructure. Multiverse is betting that smaller models running on local devices (no data center, no cloud provider, no counterparty risk) are finally good enough to matter.

What Multiverse Actually Launched

Two things dropped today:

CompactifAI App: an AI chat tool (think ChatGPT or Le Chat) that runs Gilda, a model small enough to operate locally and offline on your device. No data leaves your phone.
CompactifAI API Portal: a self-serve gateway giving developers direct access to Multiverse’s compressed versions of models from OpenAI, Meta, DeepSeek, and Mistral AI. No AWS Marketplace required.

“The CompactifAI API portal gives developers direct access to compressed models with the transparency and control needed to run them in production,” CEO Enrique Lizaso said.

The API includes real-time usage monitoring, which matters because lower compute costs are one of the primary reasons enterprises are eyeing smaller models as alternatives to full-size LLMs.

How the Compression Works

Multiverse uses a quantum-inspired compression technology (also called CompactifAI) to shrink models from major labs while preserving performance. Its latest compressed model, HyperNova 60B 2602, is built on gpt-oss-120b, an open-weight OpenAI model. The company claims it delivers faster responses at lower cost than the original, which it says is particularly useful for agentic coding workflows where AI handles complex, multi-step programming tasks.
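
As a back-of-the-envelope illustration of why a smaller parameter count matters for local deployment, the sketch below estimates the memory needed just to hold a model's weights. It assumes the "120b" and "60B" in the model names are approximate parameter counts and that every weight is stored at one uniform precision. Neither assumption comes from Multiverse, and real footprints also depend on activations, caches, and mixed-precision formats.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory (GB, where 1 GB = 1e9 bytes) to hold the weights alone."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


# Nominal sizes read off the model names (an assumption, not a published spec).
NOMINAL_PARAMS_BILLIONS = {"gpt-oss-120b": 120, "HyperNova 60B 2602": 60}

if __name__ == "__main__":
    for name, size in NOMINAL_PARAMS_BILLIONS.items():
        for bits in (16, 8, 4):
            gb = weight_memory_gb(size, bits)
            print(f"{name}: ~{gb:.0f} GB of weights at {bits}-bit precision")
```

Under these assumptions, halving the parameter count halves the weight footprint at any given precision. That is the kind of reduction that can move a model from data-center hardware toward a single workstation or device.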

The Caveats

The local-first approach has real limitations. Your device needs enough RAM and storage to run the model, and many older iPhones won’t cut it. When hardware falls short, the app automatically routes requests to cloud-based models via API, handled by a system Multiverse named “Ash Nazg” (yes, that’s a Lord of the Rings reference). But cloud routing undercuts the privacy advantage, which is the app’s main selling point.
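
The fallback described above can be sketched as a simple capability check. This is a minimal illustration, not Multiverse's actual router: the class names, the `choose_backend` function, the thresholds, and the "unavailable" outcome for an underpowered offline device are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    free_ram_gb: float
    free_storage_gb: float
    online: bool


@dataclass
class ModelRequirements:
    # Hypothetical thresholds; the article gives no real figures.
    min_ram_gb: float
    min_storage_gb: float


def choose_backend(device: Device, needs: ModelRequirements) -> str:
    """Prefer on-device inference; fall back to the cloud only if needed."""
    fits_locally = (
        device.free_ram_gb >= needs.min_ram_gb
        and device.free_storage_gb >= needs.min_storage_gb
    )
    if fits_locally:
        return "local"        # data never leaves the device
    if device.online:
        return "cloud"        # works, but gives up the privacy advantage
    return "unavailable"      # assumed behavior: no hardware, no connection


if __name__ == "__main__":
    needs = ModelRequirements(min_ram_gb=6.0, min_storage_gb=4.0)
    print(choose_backend(Device(8.0, 32.0, online=False), needs))
    print(choose_backend(Device(3.0, 32.0, online=True), needs))
```

Checking local capability first means a request only leaves the device when the hardware cannot serve it, which is the trade-off the article describes.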

The consumer app isn’t exactly flying off shelves either. According to Sensor Tower data cited by TechCrunch AI, the app had fewer than 5,000 downloads in the past month. Mass adoption was likely never the goal. The real play is enterprise.

Why This Matters

The small model space is heating up fast. Mistral just updated its small model family with Mistral Small 4, optimized for chat, coding, agentic tasks, and reasoning. Apple Intelligence already combines on-device and cloud models. What Multiverse adds to the mix is compression of existing frontier models, taking something like gpt-oss-120b and making it run where it physically couldn’t before.

The practical use cases go beyond cost savings. Think drones, satellites, and field operations where connectivity can’t be guaranteed. A model that works offline and keeps data local is a different product category than one that needs a cloud connection.

Multiverse already serves over 100 global customers, including the Bank of Canada, Bosch, and Iberdrola. The company raised $215 million in a Series B last year and is rumored to be raising a fresh €500 million round at a valuation above €1.5 billion.

The broader trend is clear: as AI infrastructure costs and risks climb, the argument for running capable models on your own hardware keeps getting stronger. Multiverse is positioning itself as the company that makes that transition possible. More details are available in the original TechCrunch AI report.
