Home / How-to / How to Run Ollama Locally: Windows, Mac & Linux Guide (2026)

How to Run Ollama Locally: Windows, Mac & Linux Guide (2026)

How to Run Ollama Locally: Windows, Mac & Linux Guide (2026) | Photo by Lukas on Unsplash
Table of Contents
  1. What Is Ollama and Why Run AI Models Locally?
  2. How to Install Ollama on Windows, Mac, and Linux
  3. How to Run Your First AI Model with Ollama
  4. Best AI Models to Run with Ollama in 2026
  5. Common Questions — How to Use Ollama Locally
  6. Conclusion

Key Takeaways

  • Ollama hit 52M monthly downloads in Q1 2026 — the fastest path to running open-weight LLMs locally on your own hardware.
  • Minimum viable setup: 16GB RAM and a mid-range GPU run Llama 3.3 8B comfortably; 32GB+ unlocks Gemma 2 27B and Qwen 2.5 32B.
  • Install is one command on Mac, Linux, or Windows — then `ollama run llama3.3` pulls the model and drops you into a chat.
  • Best 2026 local models: Llama 3.3, Gemma 2, Qwen 2.5, Mistral Small 3, and DeepSeek-Coder for programming tasks.
  • Privacy, offline access, and zero per-token cost make Ollama ideal for sensitive work or heavy experimentation.

Want to run powerful AI models on your own computer — completely free, with no API keys, no usage limits, and zero data sent to the cloud? Learning how to use Ollama locally is the fastest way to get there. Ollama is an open-source tool that lets you download and run large language models (LLMs) like Llama 3, Mistral, and Gemma 2 directly on your machine. Whether you’re on Windows, Mac, or Linux, this guide walks you through every step: installing Ollama, pulling your first model, running it from the terminal, and choosing the right model for your needs. No cloud subscription required — just your hardware and a few commands.

Top view of NVIDIA GTX 1080 and RTX 2080 graphics cards used in advanced computer setups. — Photo by Nana  Dua on Pexels

What Is Ollama and Why Run AI Models Locally?

Ollama is a free, open-source runtime that simplifies downloading, managing, and running LLMs on your local machine. Think of it as Docker — but for AI models. With a single command you can pull a model, and with another you can start chatting with it in your terminal or integrate it into your own apps via a local REST API.

There are three compelling reasons to run AI locally instead of relying on cloud services like ChatGPT or Claude:

  • Privacy: Every prompt and response stays on your machine. Nothing is logged by a third party or used to train future models.
  • Cost: Once the model is downloaded, inference is completely free — no per-token billing, no monthly subscription.
  • Control: You choose which model to run, you can run it offline, and you can customize its behavior through system prompts or fine-tuning.

Ollama supports macOS, Linux, and Windows (currently in preview). It handles model quantization automatically, so even consumer hardware with 8 GB of RAM can run capable 7B-parameter models at a usable speed.

How to Install Ollama on Windows, Mac, and Linux

GeForce RTX graphics card installed inside a PC, glowing under warm lighting. — Photo by Matheus Bertelli on Pexels

Installation takes under five minutes on any platform. Follow the steps for your operating system below.

macOS

  1. Open your browser and go to ollama.com/download.
  2. Click Download for macOS — this gives you a standard .dmg installer.
  3. Open the downloaded file, drag Ollama into your Applications folder, and launch it.
  4. Ollama will appear in your menu bar. Open Terminal and verify the install:
ollama --version

Ollama runs as a background service on macOS. It supports both Apple Silicon (M1/M2/M3/M4) and Intel Macs, and it automatically uses the Metal GPU on Apple Silicon for significantly faster inference.

Linux

Linux installation is a single command. Open your terminal and run:

curl -fsSL https://ollama.com/install.sh | sh

The installer detects your distribution, installs the binary to /usr/local/bin/ollama, and sets up a systemd service that starts automatically on boot. It also detects NVIDIA or AMD GPUs and configures GPU acceleration if available. After installation, confirm it’s running:

ollama --version
systemctl status ollama

Windows

  1. Visit ollama.com/download/windows and download the OllamaSetup.exe installer.
  2. Run the installer — it installs Ollama and adds it to your system PATH automatically.
  3. Open PowerShell or Command Prompt and verify:
ollama --version

Windows support is currently in preview. NVIDIA GPU acceleration works via CUDA; AMD GPU support is improving. If you hit issues, ensure your GPU drivers are up to date and that you have the Visual C++ Redistributable installed.

Minimum system requirements (all platforms): 8 GB RAM (16 GB recommended for 7B models), 10–15 GB free disk space per model, GPU optional but strongly recommended for speed.

How to Run Your First AI Model with Ollama

Once Ollama is installed, running a model is a single command. Let’s start with Meta’s Llama 3 8B — one of the best open-weight models available in 2026.

Step 1: Pull and run Llama 3

ollama run llama3

This command downloads the Llama 3 8B model (~4.7 GB) the first time you run it, then drops you straight into an interactive chat session. Type your prompt and press Enter.

Step 2: Pull a model separately (without running it immediately)

ollama pull mistral

Step 3: List all models you have downloaded

ollama list

Step 4: Serve Ollama as a local REST API

If you want to integrate Ollama with your own apps, scripts, or tools like Open WebUI, start the API server:

ollama serve

This exposes a REST API on http://localhost:11434. You can send requests to it exactly like the OpenAI API, making it a drop-in local replacement for many tools. For example:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Explain how transformers work in simple terms"
}'

On macOS and Linux, Ollama already runs the server in the background automatically — you only need ollama serve if you stopped it manually. Check out our how-to guides for tutorials on building apps on top of the Ollama API.

Best AI Models to Run with Ollama in 2026

Ollama’s model library has grown substantially. Here’s a comparison of the most popular models available today, including their size, ideal use case, and RAM requirements:

ModelPull CommandSizeRAM NeededBest For
Llama 3 8Bollama run llama34.7 GB8 GBGeneral chat, summarization
Mistral 7Bollama run mistral4.1 GB8 GBInstruction following, fast replies
Phi-3 Miniollama run phi32.3 GB4 GBLow-resource devices, quick tasks
Gemma 2 9Bollama run gemma25.4 GB8 GBReasoning, Q&A, analysis
CodeLlama 7Bollama run codellama3.8 GB8 GBCode generation, debugging
DeepSeek-R1 7Bollama run deepseek-r14.7 GB8 GBStep-by-step reasoning, math

Recommendation: Start with Llama 3 8B for general use — it’s the most well-rounded model at this size. If you’re on a machine with only 8 GB RAM, try Phi-3 Mini first; it’s surprisingly capable and much lighter. For coding tasks, CodeLlama is purpose-built and outperforms general models on code completion and debugging. If you want chain-of-thought reasoning similar to o1-class models, DeepSeek-R1 is worth exploring. You can find the full model library with all variants at ollama.com/library.

Common Questions — How to Use Ollama Locally

Do I need a GPU to run Ollama?

No — Ollama runs on CPU-only machines. However, a GPU makes a significant difference in speed. On a modern CPU, a 7B model might generate 5–10 tokens per second, which feels slow for conversation. An NVIDIA GPU (RTX 3060 or better) or Apple Silicon chip typically delivers 40–80+ tokens per second for the same model. For occasional use on a laptop, CPU-only is fine. For regular or production use, a dedicated GPU is worth it.

Is Ollama free to use?

Yes, completely. Ollama is open-source (MIT license) and free to download and use without any account, API key, or payment. The models it supports — Llama 3, Mistral, Gemma 2, etc. — are also open-weight models released free for personal and commercial use (check each model’s specific license). You only pay for the electricity and hardware you already own.

How is Ollama different from running models on Hugging Face or Google Colab?

Hugging Face and Colab require you to write Python code, manage dependencies, and often deal with CUDA configuration. Ollama abstracts all of that — it’s a single binary that handles model download, quantization, and inference with no Python environment needed. It’s also persistent: models stay on your machine ready to use anytime, with no session timeouts or idle shutdowns. For developers who want a dead-simple local AI backend, Ollama is hard to beat.

Can I use Ollama with a chat interface instead of the terminal?

Yes. Once Ollama is running, you can connect several open-source chat UIs to it. The most popular is Open WebUI (formerly Ollama WebUI), which gives you a ChatGPT-like browser interface that talks to Ollama’s local API on port 11434. Other options include Chatbox, LM Studio (which has its own runtime but supports the same models), and VS Code extensions like Continue for AI-assisted coding. These tools require no additional configuration beyond pointing them to http://localhost:11434. For more developer tools and integrations, browse our Dev & IT Ops articles.

Conclusion

Running AI models locally with Ollama is now genuinely practical for everyday users. Installation takes under five minutes, models like Llama 3 and Mistral run well on standard laptops, and the privacy benefit — keeping every prompt on your own machine — is hard to overstate. Whether you need a general-purpose assistant, a coding helper, or a reasoning engine, there’s an open-weight model ready to pull and run right now.

To go further, explore our collection of step-by-step how-to guides for tutorials on building apps on top of the Ollama API, and check the Dev & IT Ops section for guides on self-hosting AI tools, setting up GPU servers, and integrating local models into your development workflow.

About the author: TouchEVA is a tech journalist covering AI, software, and cybersecurity for Hubkub.com — independent tech media since 2025.

Last Updated: April 13, 2026

TouchEVA

TouchEVA

Founder and lead writer at Hubkub. Covers software, AI tools, cybersecurity, and practical Windows/Linux workflows.

Tagged: