Offline AI Guide: Run a Private LLM on Your Phone

A new mobile app changes how we interact with large language models on the go, and the twist is that it needs no internet connection at all. It was developed by an AI professional named Adrian Gronden, and it blew me away. We are used to relying on cloud servers and constant data connections for our smart assistants to work. If you lose your cellular signal, you usually lose access to those problem-solving tools entirely. Now you can carry a capable digital assistant in your pocket, ready to brainstorm ideas, answer questions, or analyze data even when you are entirely off the grid.

The app is called Locally AI, and it is available to download now. It lets you pull open-weight models directly onto your smartphone and run them locally. Smaller local models have been attempted before, but the landscape shifted with the release of the Qwen 3.5 family. These models are efficient and surprisingly smart for their size. They perform better than many of the top-tier models from just a year or two ago, and now they fit in your hand.

The app gives you a menu of choices right out of the gate. You can select from models like the Apple Foundation model, Gemma 2, Llama 3.2, or the newly released Qwen 3.5. The Qwen models come in various sizes, ranging from an 800 million parameter version up to a 9 billion parameter giant. For mobile use, the sweet spot seems to be the 2 billion or 4 billion parameter versions. The interface is simple: choose a download, wait a few minutes for the files to save to your device over your home network, and then chat just as you would on any other popular platform.
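To get a feel for why the smaller versions suit a phone, here is a rough back-of-the-envelope sketch of how much space the weights alone take up. The arithmetic is simply parameter count times bits per weight. The 4-bit figure is an assumption for illustration; this article does not say how Locally AI actually stores its models, and the function name is made up for this example.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes).

    bits_per_weight=4 is an assumed quantization level, not a fact about the app.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> gigabytes


# The Qwen sizes mentioned above: 0.8B, 2B, 4B, and 9B parameters.
for size in (0.8, 2, 4, 9):
    print(f"{size}B parameters: about {weight_memory_gb(size):.1f} GB of weights")
```

This counts only the weights. A running model also needs working memory for the conversation itself, so treat these numbers as a lower bound.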

The real magic here is the complete separation from the cloud. When you type a prompt, snap a photo, or use the voice feature, your data never leaves your physical device. It is not sent to servers owned by massive tech corporations to be processed or used for future training data. Everything stays completely private, making it a brilliant solution for handling sensitive information or personal brainstorming sessions.

In one test, a phone was switched into airplane mode, cutting off all Wi-Fi and cellular data. Despite having zero internet access, the app continued to stream responses rapidly. Imagine being on a long flight with no Wi-Fi, and your kid starts having a meltdown because you took their tablet away. You could pull out your phone, open the app, and ask for creative ways to calm them down. The assistant will generate a detailed, thoughtful response right there at 30,000 feet.

The app also includes advanced features that you typically expect only from heavy, server-bound tools. It supports vision: you can take a picture of a drink label with your phone’s camera and ask the offline assistant whether it is a healthy option. It will read the image and give you a detailed breakdown of the ingredients. It also features a dedicated thinking mode. When you toggle a small lightbulb icon, the model engages in a visible chain-of-thought process. It takes extra time to reason through complex requests, showing you its reasoning before delivering a detailed list of ideas or solutions. There is even a voice mode where you can speak naturally to the app, ask it for dinner ideas, and have it read the responses back to you aloud.

If you want to try running a private assistant on your own hardware, the setup process is straightforward.

1. Head to your mobile store and download the free Locally AI application to your device.
2. Open the settings menu and select a model that fits your hardware capabilities, such as the Qwen 3.5 2 billion parameter version.
3. Wait a few minutes for the model files to fully download over your home Wi-Fi network.
4. Turn on airplane mode, open a brand new chat window, and test its offline reasoning skills with a complex question.

Hardware matters quite a bit when running heavy computations locally. If you have an iPhone 14, the 800 million parameter model is your best bet for smooth, fast performance. Those with a standard iPhone 15 can comfortably run the 2 billion parameter version without much trouble. If you are holding an iPhone 15 Pro or a newer flagship device, you have enough dedicated processing power to handle the heavier 4 billion parameter model.
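The device guidance above boils down to a small lookup table. The sketch below just restates it in code; the names are invented for illustration, and falling back to the smallest model for an unlisted device is a cautious assumption of this example, not a rule from the app.

```python
# Recommended model size in billions of parameters, taken from the guidance above.
RECOMMENDED_PARAMS_BILLIONS = {
    "iPhone 14": 0.8,
    "iPhone 15": 2.0,
    "iPhone 15 Pro": 4.0,
}

# Assumption: when a device is not listed, start with the smallest model.
SMALLEST_MODEL = 0.8


def recommend_model(device: str) -> float:
    """Return the suggested model size (billions of parameters) for a device."""
    return RECOMMENDED_PARAMS_BILLIONS.get(device, SMALLEST_MODEL)


print(recommend_model("iPhone 15"))      # 2.0
print(recommend_model("iPhone 15 Pro"))  # 4.0
```

Newer flagship phones are not in the table. By the guidance above they should handle the 4 billion parameter model, so you could add them as entries with 4.0.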

Keep in mind that running complex calculations will use your phone’s processor intensely. You will likely feel the back of your device getting warm during extended brainstorming sessions, especially if you enable the deep-thinking mode. This is totally normal, as your phone is doing the heavy lifting that a massive server farm usually handles.

Additionally, as your conversation history gets longer, the app has to hold more context in memory. You might notice text generation, or even screen scrolling, slowing down slightly after a long chat. Starting a fresh chat clears the context window and restores the rapid generation speeds. Finally, the app integrates with iOS Shortcuts. You can set it up so that you say a trigger phrase to your phone, and it routes your spoken question directly into your private, local model for a hands-free offline experience.
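The slowdown in long chats comes from the growing context: every message stays in the window until you start over. Here is a minimal sketch of that bookkeeping. Everything in it is hypothetical, including the class name, the token budget, and the rule of thumb of roughly four characters per token; it is not how Locally AI tracks context internally.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


class ChatSession:
    """Tracks how much context a conversation has accumulated."""

    def __init__(self, token_budget: int = 4096):
        # token_budget is an illustrative threshold, not a value from the app.
        self.token_budget = token_budget
        self.messages: list[str] = []

    def add(self, text: str) -> None:
        self.messages.append(text)

    def context_tokens(self) -> int:
        # Every past message counts, which is why long chats get slower.
        return sum(estimate_tokens(m) for m in self.messages)

    def should_start_fresh(self) -> bool:
        return self.context_tokens() > self.token_budget

    def reset(self) -> None:
        # Equivalent to opening a new chat: the context starts empty again.
        self.messages.clear()


session = ChatSession(token_budget=50)
session.add("Give me twenty dinner ideas for a family of four. " * 5)
if session.should_start_fresh():
    session.reset()
print(session.context_tokens())
```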

This is a massive leap forward for mobile privacy and offline productivity. It is truly remarkable to see how far these open-weight models have come in such a short time.

👇 Check out the full video to see exactly how fast these models run in airplane mode!
