No cloud required: local LLMs as a practical app dependency

TL;DR: Ollama on a MacBook Air with M4 is great and it's a taste of the future. Developers should try it now.

This post is a look at running LLMs locally on Apple laptops, with a passing mention of Apple desktops. We'll ignore mobile altogether, not because it isn't important, but because it's not where my mind is for this particular post.

Running models on older hardware

When I first started using Ollama to run LLMs locally in 2023, it felt simultaneously like a glimpse of the future and like breathing through a straw.

I was on a MacBook Pro with M1 at work and a MacBook Air with M2 at home. LLM responses on those systems were painfully slow given the hardware constraints, and the limited RAM sharply restricted which LLMs you could install at all.

But even then it felt like just a matter of time before hardware would catch up and local LLMs would become practical. (I mentioned something to this effect on stage at Austin API Summit last March.)

Not long after, I got a Mac Studio with M2 Ultra and it felt like I had warped into the future: the experience with Ollama was suddenly smooth, and I could install massive models. Of course, using an M2 Ultra with tons of RAM in 2023 was a cheat code. It was great for me as one person, but most people wouldn't have that kind of horsepower in their machines.

While I could have written this same post about the Mac Studio M2 Ultra's impressive performance, it wouldn't have been useful information to most people. But now it feels like we're just on the edge of LLMs being a practical local dependency for applications.

The MacBook Air M4 is the start line

Fast forward to last week, when I moved to a new MacBook Air with M4 and 32GB of RAM.

I've been really impressed with the speed of running Llama 3.2 with Ollama on this machine. What makes the smooth performance I'm getting from Ollama exciting is that it's happening on a MacBook Air—not a Mac Studio, Mac Pro, or some specialized hardware setup. LLMs now run nicely on Apple's most popular laptop line.

When air-gapped AI capabilities become reliably available on devices anyone has access to, the addressable audience suddenly expands and elegant UX patterns become possible. For me, this means Ollama has gone from "interesting tech demo" to properly usable as an application dependency.
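
To make "application dependency" concrete, here's a minimal sketch in TypeScript, assuming Ollama's default local endpoint at http://localhost:11434 and its /api/tags model-listing route, of an app checking for a local model before it enables an AI feature:

    // Minimal sketch: probe a local Ollama instance before enabling an AI feature.
    // Assumes Ollama's default endpoint (http://localhost:11434) and its /api/tags
    // route, which lists the locally installed models.

    const OLLAMA_URL = "http://localhost:11434";

    interface TagsResponse {
      models: { name: string }[];
    }

    async function localModelAvailable(model: string): Promise<boolean> {
      try {
        const res = await fetch(`${OLLAMA_URL}/api/tags`, {
          signal: AbortSignal.timeout(1000), // fail fast if nothing is listening
        });
        if (!res.ok) return false;
        const data = (await res.json()) as TagsResponse;
        return data.models.some((m) => m.name.startsWith(model));
      } catch {
        return false; // Ollama isn't running or isn't reachable
      }
    }

    // Only surface the AI feature when a local model can back it.
    if (await localModelAvailable("llama3.2")) {
      console.log("Local model found: enabling AI features.");
    } else {
      console.log("No local model: falling back to the non-AI experience.");
    }

The details matter less than the shape of the pattern: the app treats a local model like any other optional capability it can detect at runtime.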

Sure, it probably won't take long for model makers to push the baseline beyond what today's machines can handle. But the fact is that the hardware is already within usable range, and I'd expect it to only get better from here.

Trying it yourself

If you want to get a sense of how it might feel on your machine, you can start by simply installing Ollama and running the CLI.

If you want a simple web-based chat interface, I've made a repo you can try:

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3.2
  3. Clone the UI repo: git clone https://github.com/ashryanbeats/ollama-chat
  4. Install dependencies (from inside the cloned directory): npm install
  5. Start the dev server: npm run dev

This repo isn't meant to blow your mind; it's just there to give an idea of how to talk to Ollama at the code level via its API, and to let you get a feel for Ollama outside of the terminal.
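
For a rough idea of what that API conversation looks like, here's a minimal sketch of a single non-streaming request to Ollama's local /api/chat endpoint. It assumes the default port and that you've already pulled llama3.2; it's an illustration, not how the repo above is structured.

    // Minimal sketch: one non-streaming request to Ollama's local /api/chat endpoint.
    // Assumes Ollama is running on its default port and llama3.2 has been pulled.

    interface ChatResponse {
      message: { role: string; content: string };
    }

    async function ask(prompt: string): Promise<string> {
      const res = await fetch("http://localhost:11434/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "llama3.2",
          messages: [{ role: "user", content: prompt }],
          stream: false, // ask for a single JSON response instead of a stream
        }),
      });
      if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
      const data = (await res.json()) as ChatResponse;
      return data.message.content;
    }

    console.log(await ask("In one sentence, why run an LLM locally?"));

Leave stream at its default of true and Ollama instead sends back a stream of JSON chunks, which is what makes typewriter-style chat UIs straightforward to build.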

The inflection point

We've reached a significant inflection point in using LLMs as application dependencies. Even a MacBook Air can be spec'd out enough to run local LLMs smoothly.

For application developers, local LLMs aren't just curiosities anymore. Locally runnable LLMs open up new patterns of AI integration, reduce barriers to user adoption, and could drive down service costs for developers. They're practical tools ready to be integrated into your next project. No cloud required.