
Introduction to Ollama: Run AI Models on Your Own Computer

Ollama makes it dead simple to download and run large language models locally. No cloud account. No API key. No monthly subscription. Just AI running on your own hardware, completely offline. Here's how to get started.

What Is Ollama?

Ollama is a free, open-source tool that lets you download and run AI language models on your own computer. Think of it as an app store for local AI — you browse a catalogue of models, pick one, download it, and start chatting. Everything runs on your machine. Nothing goes to the cloud.

It supports hundreds of models — from small, fast assistants that run on modest hardware to large, powerful models that rival cloud services like ChatGPT. The key difference? Your data never leaves your computer. Every conversation, every document you analyse, every question you ask stays on your hard drive.

Getting Started (5 Minutes)

Setting up Ollama on Windows takes about five minutes:

1. Download the installer from ollama.com and run it.
2. Open a terminal (PowerShell or Command Prompt both work).
3. Run your first model with a single command (see the example below).

That's it. You now have a fully functional AI assistant running locally on your PC. No sign-up. No API key. No subscription. It works even with your internet disconnected.
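
Here's what that first session looks like in practice. The model name below is just an example; any model from the Ollama library works the same way:

```bash
# One command downloads the model (first run only) and drops you
# straight into an interactive chat at the >>> prompt
ollama run llama3.2

# Type your questions at the prompt; /bye exits the session
```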

Choosing the Right Model

Not all models are created equal, and the right choice depends on your hardware and what you want to do. As a rough rule of thumb:

- Small models (1 to 3 billion parameters) run on almost any modern laptop and respond quickly.
- Mid-size models (7 to 8 billion parameters) are the sweet spot for most PCs with 16 GB of RAM.
- Large models (30 billion parameters and up) need a powerful GPU or a lot of memory, but come closest to cloud-service quality.

The model file is just a download — typically 4 to 16 GB. Once it's on your machine, it's yours forever. No ongoing costs. If you want to understand what's actually inside these files, check out our guide on what an LLM actually is.
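
To see this in practice, you can pull models of different sizes and check what's on disk. The model names and approximate sizes below are examples from Ollama's catalogue (assumed current; check ollama.com for what's available):

```bash
# Download models at different size points
ollama pull llama3.2        # ~2 GB, small and fast
ollama pull mistral         # ~4 GB, solid mid-size all-rounder
ollama pull llama3.1:70b    # ~40 GB, needs serious hardware

# See which models are installed and how much space each one takes
ollama list
```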

The Local AI Ecosystem

Ollama is just the starting point. Once you're running models locally, a whole ecosystem of tools opens up: browser-based chat interfaces like Open WebUI, coding assistants that plug into your editor, and document tools that talk to your models through Ollama's local API.

The common thread? Everything runs on your hardware. Your conversations with the AI, your documents, your code — it all stays on your machine. This is the privacy promise that cloud AI can never make.
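
That integration point is a plain HTTP API that Ollama serves on your own machine (localhost, port 11434 by default). Here's a quick sketch of how any local tool talks to it, assuming you've already pulled a model such as llama3.2:

```bash
# Ask a locally running model a question over Ollama's HTTP API.
# The request goes to localhost only; nothing leaves your machine.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain what a local LLM is in one sentence.",
  "stream": false
}'
```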

Why Local AI Matters

Every time you use ChatGPT, Claude, or Gemini in the cloud, your prompt is sent to a remote server, processed, logged, and potentially used for training future models. You're paying a monthly subscription for the privilege of giving away your data.

With Ollama and local models, the equation flips. You own the model. You own the hardware. You own the conversation. Nobody can revoke your access, raise prices, or change the terms. The model on your hard drive works the same today as it will in five years — with or without an internet connection.

And as models continue to get smaller and more capable, the gap between local and cloud AI is shrinking fast. For transcription, writing, coding assistance, and everyday questions, local models are already good enough.

Add voice to your local AI stack

Vox Bar brings local speech-to-text to Ollama, Open WebUI, and every app on your PC.
