A Tribute to Ollama: The Tool That Made Local AI Accessible
Before Ollama, running an AI model on your own computer meant wrestling with
Python environments, CUDA drivers, and cryptic error messages. Two developers
from Palo Alto changed that with a single command: ollama run.
This is their story.
The Problem Nobody Was Solving
In 2023, AI was everywhere — but only if you had a cloud subscription. OpenAI charged monthly fees. Google locked models behind APIs. Even the open-source models that were technically free required a computer science degree to install.
Want to run a model locally? Good luck. You'd need to clone a GitHub repo, set up a Python virtual environment, install the right CUDA toolkit version, hunt down model weights, configure quantisation settings, and pray nothing crashed. Most people gave up before they got a single response.
Jeffrey Morgan and Michael Chiang saw this and thought: what if running an AI model was as easy as running a Docker container?
Two Developers Who'd Done This Before
Jeffrey and Michael weren't strangers to making complex technology simple. Before Ollama, they'd co-founded Kitematic — one of the earliest graphical interfaces for Docker. Kitematic made it possible to run Docker containers on Mac without touching the command line. It was so good that Docker acquired it, and it became the foundation for what we now know as Docker Desktop.
They'd literally built the tool that made containers accessible to everyday developers. Now they were going to do the same thing for AI models.
Based in Palo Alto, California, they went through Y Combinator's W21 batch — the same accelerator that launched Dropbox, Airbnb, and Stripe. Their mission was clear: make it possible for anyone to run AI models on their own computer, with a single command.
What Ollama Actually Did
Ollama took everything painful about running local AI and made it disappear. Instead of a 45-minute installation ordeal, you get this:
ollama run llama3
That's it. One command. Ollama handles everything else — downloading the model, quantising it to fit your hardware, managing GPU memory, serving it through a local API. The same model that would have taken an afternoon to set up now takes 30 seconds.
But Ollama didn't just simplify installation. It created an entire ecosystem:
- A model library — hundreds of models available with a single ollama pull command
- Automatic GPU detection — works with NVIDIA, AMD, and Apple Silicon out of the box
- A local REST API — any application can talk to your models using standard HTTP requests
- Modelfile format — customise models with system prompts, parameters, and templates
- Multi-model support — run different models for different tasks on the same machine
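The Modelfile format mentioned above works much like a Dockerfile for models. As a rough sketch (the base model name and the prompt wording here are illustrative, not from the original text), a Modelfile layers a system prompt and sampling parameters on top of an existing model:

```
# Modelfile — build a customised model with: ollama create my-assistant -f Modelfile
FROM llama3
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant. Answer in at most three sentences."
```

Running `ollama create` with this file registers a new local model that bakes in the prompt and parameters, so every `ollama run my-assistant` starts from that configuration.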
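The local REST API is what lets any application talk to a running model. A minimal sketch in Python, assuming Ollama is serving its default endpoint at `http://localhost:11434` and a model named `llama3` has been pulled:

```python
import json
import urllib.request

# Default endpoint of a locally running Ollama server (assumption: default port).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False requests a single JSON response instead of a token stream.
    """
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the completion text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming response carries the full completion in "response".
        return json.loads(resp.read())["response"]
```

Because it is plain HTTP with JSON, the same call works from any language: curl, a browser extension, or a desktop app like Vox Bar.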
From Side Project to 163,000 GitHub Stars
The numbers tell the story:
- 163,000+ GitHub stars — making it one of the most popular open-source projects in the world
- 6.4 million monthly downloads of the Python client alone
- 1.6 million weekly downloads — and growing
- 175,000+ active servers detected running Ollama across 130 countries
To put that in perspective: Ollama has more GitHub stars than React Native and Kubernetes. It became the default way to run AI locally — the tool that everyone from hobbyists to enterprises reaches for first.
And they did it with a pre-seed round of just $125,000. While competitors raised hundreds of millions, Jeffrey and Michael built the most-used local AI tool in the world on a shoestring budget.
Why This Matters to Us
Vox Bar wouldn't exist without Ollama. Full stop.
When you install Vox Bar, the Voxtral transcription model gets served through Ollama. It's Ollama that manages the GPU memory, handles the model loading, and provides the API that Vox Bar talks to. Without that layer, we'd have had to build an entire model serving infrastructure from scratch — a task that would have taken months and produced something far less reliable.
Ollama is the invisible engine inside Vox Bar. You never see it, but it's doing the heavy lifting every time you speak.
Jeffrey and Michael built the tool that made local AI accessible to everyone. We just happened to build a voice transcription app on top of it. Without Ollama, Vox Bar would still be an idea on a whiteboard.
The Docker Parallel
There's a beautiful symmetry in what Jeffrey and Michael have done. In the early 2010s, Docker containers were powerful but painful to use. They built Kitematic to fix that. Docker acquired it.
A decade later, AI models were powerful but painful to use. They built Ollama to fix that. History repeated itself — except this time, they're the ones in charge.
The pattern is always the same: powerful technology becomes transformative only when someone makes it accessible. That's what Ollama did for AI. And that's why millions of people — including us — can now run frontier models on their own computers.
Part of Something Bigger
Ollama sits at the heart of a movement. Combined with open-source models from teams like Mistral AI, DeepSeek, Meta, Google, and Alibaba, Ollama has helped create an entire ecosystem where anyone can run AI privately on their own hardware.
No subscriptions. No data leaving your machine. No corporate surveillance of your prompts. Just you, your computer, and the model — exactly as it should be.
🦙
Thank You, Ollama
To Jeffrey Morgan, Michael Chiang, and everyone who's contributed to Ollama — thank you for making local AI as easy as a single command.
Vox Bar runs on Ollama. You built the foundation we stand on.
See Ollama in action — invisibly
Vox Bar uses Ollama under the hood to deliver real-time voice transcription. Private. Local. Yours.