Osaurus: Run AI Locally on Mac Without Cloud Costs

A new open source tool called Osaurus is trying to solve a problem that’s been nagging at AI power users: why pay per token when your Mac could just run the model itself? According to TechCrunch AI, the Apple-only LLM server lets users switch between local and cloud AI models while keeping files, tools, and memory all on their own hardware.

The project came out of an earlier app called Dinoki, a desktop AI companion that co-founder Terence Pae described as an “AI-powered Clippy.” Customers kept asking Pae, a former Tesla and Netflix engineer, why they should pay for tokens when the assistant lived on their machine. That question turned into Osaurus.

What it actually does

Osaurus works as a “harness”: a control layer that ties different AI models, tools, and workflows together through one interface. TechCrunch AI compares it to tools like OpenClaw and Hermes, with one key difference. Those are built for developers comfortable in a terminal, while Osaurus is pitched at regular consumers.

Key capabilities:

Local model support: MiniMax M2.5, Gemma 4, Qwen3.6, GPT-OSS, Llama, DeepSeek V4, plus Apple’s on-device foundation models and Liquid AI’s LFM family.
Cloud connections: OpenAI, Anthropic, Gemini, xAI/Grok, Venice AI, OpenRouter, Ollama, and LM Studio.
Full MCP server: Any MCP-compatible client can access your tools through it.
20+ native plug-ins: Mail, Calendar, Vision, macOS Use, XLSX, PPTX, Browser, Music, Git, Filesystem, Search, Fetch, and more.
Voice capabilities: Added in a recent update.
Sandboxed execution: Runs in a hardware-isolated virtual sandbox to address security concerns common with similar tools.

The hardware reality check

This isn’t quite plug-and-play for everyone. Running local models needs serious silicon. Pae recommends at least 64GB of RAM, and 128GB if you want to run larger models like DeepSeek V4. That puts it firmly in Mac Studio and high-spec MacBook Pro territory.
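To see why RAM is the bottleneck, here is a rough back-of-the-envelope sketch. It relies only on the general rule that a model's weights take roughly (parameter count × bits per weight ÷ 8) bytes. The function names, the 25% headroom figure, and the example model sizes are illustrative assumptions, not numbers from Osaurus or the TechCrunch AI report.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_ram(params_billions: float, bits_per_weight: float,
                ram_gb: float, headroom: float = 0.25) -> bool:
    """Check whether the weights fit after reserving a share of RAM.

    `headroom` is an assumed fraction held back for the OS, other apps,
    and the model's working memory; real usage varies.
    """
    usable_gb = ram_gb * (1 - headroom)
    return weight_memory_gb(params_billions, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to illustrate the arithmetic.
    for params, bits in [(8, 16), (70, 4), (70, 16)]:
        need = weight_memory_gb(params, bits)
        print(f"{params}B params at {bits}-bit: ~{need:.0f} GB of weights, "
              f"fits in 64GB: {fits_in_ram(params, bits, 64)}, "
              f"fits in 128GB: {fits_in_ram(params, bits, 128)}")
```

Under these assumptions, a hypothetical 70-billion-parameter model quantized to 4 bits needs about 35GB for weights alone, which is comfortable on a 64GB machine, while the same model at 16 bits would not fit even in 128GB.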

Pae argues the curve is bending fast, though. “Last year, local AI could barely finish sentences, but today it can actually run tools, write code, access your browser, and order stuff from Amazon,” he told TechCrunch AI. He calls intelligence-per-wattage the real metric to watch.

How it stacks up

Osaurus is going after the same territory as Ollama, Msty, and LM Studio, but it’s leaning hard on user-friendliness for non-developers. The sandboxing is also a real selling point: tools like OpenClaw have had security concerns that made them hard to recommend to regular users.

The project has been downloaded over 112,000 times since launching nearly a year ago, according to its website. The founders are currently in the New York-based Alliance accelerator and eyeing a B2B move into industries like legal and healthcare, where keeping LLMs on-premises solves real privacy headaches.

Why it matters

What stands out here is the underlying bet: that local AI can eat into cloud demand. “Instead of relying on the cloud, they can actually deploy a Mac Studio on-prem, and it should use substantially less power,” Pae said. With data center power draw becoming a genuine grid problem, that pitch is going to find buyers, assuming the hardware keeps catching up to the ambition.

More details are available in the original TechCrunch AI report.
