Running powerful AI models once required expensive cloud subscriptions or enterprise-grade hardware. That changed in January 2025, when DeepSeek released R1, a 671-billion-parameter reasoning model with a reported training cost of approximately $5.6 million that matches OpenAI’s o1 on key benchmarks. Today, you can run a distilled version of DeepSeek R1 locally on a laptop with 8 GB of RAM, at zero recurring cost.

Cloud AI tools charge per token, log your prompts, and go offline when your internet connection drops. Running a model locally eliminates all three problems. Your queries and data never leave your machine.
This guide covers how to set up DeepSeek R1 on any Windows, macOS, or Linux PC using Ollama — a free, open-source model runner. You will learn which model size fits your hardware, how to install everything with three commands, and how to expose a local REST API for your own applications.
Why Run DeepSeek R1 Locally? Benefits That Actually Matter
DeepSeek R1 is not a standard chatbot. It uses chain-of-thought reasoning, working through problems step by step inside visible <think> tags before delivering its final answer. On the AIME 2024 math benchmark, the full R1 model scores 79.8%, on par with OpenAI’s o1. On MATH-500, it reaches 97.3%. These are significant results for a model anyone can download for free.
Running it locally delivers three concrete advantages over using a cloud API:
- Full privacy: Prompts never reach external servers. For legal documents, client data, or medical records, this matters significantly.
- No usage limits: No token caps, no rate limits, no subscription fees. Run it as much as your hardware allows.
- Offline access: Works without an internet connection once the model is downloaded — essential for air-gapped or travel environments.
DeepSeek R1 is licensed under the MIT License. It permits commercial use, modification, and redistribution. This is genuinely free software, not a freemium service with a paid tier waiting behind it.
For businesses in Southeast Asia processing customer data under strict privacy regulations, or individual developers building tools without cloud infrastructure costs, local AI deployment is increasingly the pragmatic choice. Explore our how-to tutorials on Hubkub for more practical guides on setting up AI tools on your own hardware.
System Requirements: Choosing the Right DeepSeek Model Size

The full 671B DeepSeek R1 model requires approximately 512 GB of RAM and a multi-GPU server. That rules it out for most users. Fortunately, Ollama provides smaller distilled variants (Qwen- and Llama-based models fine-tuned on the full R1’s outputs) that run on standard consumer hardware while retaining much of its reasoning ability.
Which Model Size Fits Your Hardware?
Choose your variant based on available RAM or GPU VRAM:
| Model | RAM Needed | Storage | Best For |
|---|---|---|---|
| deepseek-r1:1.5b | ~2 GB | 1.1 GB | Low-end laptops, quick testing |
| deepseek-r1:7b | ~8 GB | 4.7 GB | Standard laptops, daily use |
| deepseek-r1:8b | ~8 GB | 5.2 GB | Recommended default |
| deepseek-r1:14b | ~16 GB | 9.0 GB | Mid-range workstations |
| deepseek-r1:32b | ~32 GB | 20 GB | High-end desktops with GPU |
| deepseek-r1:70b | ~48 GB | 43 GB | Servers, multi-GPU setups |
For most users, the 8b variant is the best starting point. It needs 8 GB of RAM and about 5 GB of disk space, and produces reasoning output meaningfully better than standard language models of similar size. The 1.5b model is only appropriate for testing the installation pipeline on very constrained hardware.
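The sizing table can be encoded as a small lookup. The sketch below is only an illustration: the function name pick_model is hypothetical, the thresholds are copied from the RAM column, and the 7b variant is left out because it shares the 8 GB tier with the recommended 8b.

```python
from typing import Optional

# Approximate RAM needs in GB, copied from the table above.
# Ordered largest first so the first match is the biggest model that fits.
# deepseek-r1:7b is omitted: it shares the 8 GB tier with the recommended 8b.
MODEL_RAM_GB = [
    (48, "deepseek-r1:70b"),
    (32, "deepseek-r1:32b"),
    (16, "deepseek-r1:14b"),
    (8, "deepseek-r1:8b"),
    (2, "deepseek-r1:1.5b"),
]


def pick_model(available_gb: float) -> Optional[str]:
    """Return the largest variant whose listed RAM need fits, or None."""
    for needed_gb, tag in MODEL_RAM_GB:
        if available_gb >= needed_gb:
            return tag
    return None  # less than 2 GB: no listed variant fits


if __name__ == "__main__":
    for ram in (4, 8, 16, 64):
        print(f"{ram} GB of RAM: {pick_model(ram)}")
```

Treat the result as a starting point rather than a guarantee, since the operating system and other applications also need memory.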
GPU owners with 8+ GB of VRAM will see substantial speed improvements. Ollama automatically offloads model layers to the GPU during inference. On CPU alone with the 8b model, expect 3–8 tokens per second — slower, but usable for non-time-critical tasks.
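To see what a given generation speed means in practice, a little arithmetic helps. This sketch assumes the eval_count and eval_duration fields that Ollama reports in a completed API response (the duration is in nanoseconds); the function names are hypothetical.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's eval_count / eval_duration fields."""
    seconds = response["eval_duration"] / 1e9  # nanoseconds to seconds
    return response["eval_count"] / seconds


def seconds_for(tokens: int, rate: float) -> float:
    """Estimated wait for a reply of `tokens` tokens at `rate` tokens/second."""
    return tokens / rate


if __name__ == "__main__":
    # Illustrative numbers, not a measurement: 120 tokens generated in 20 s.
    sample = {"eval_count": 120, "eval_duration": 20_000_000_000}
    rate = tokens_per_second(sample)
    print(f"{rate:.1f} tokens/s")
    print(f"A 500-token reply would take about {seconds_for(500, rate):.0f} s")
```

Keep in mind that the reasoning trace counts toward the generated tokens, so a reply on CPU can take noticeably longer than the length of the final answer suggests.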
Step-by-Step: How to Install Ollama and Run DeepSeek R1
The installation process takes three commands. Steps 2 and 3 are identical on Linux, macOS, and Windows; only the Ollama installation in Step 1 differs by platform. Start with a fresh terminal window and follow each step in order.
Step 1 — Install Ollama
On Linux and macOS, run the official installer script:
curl -fsSL https://ollama.com/install.sh | sh
On Windows, download the native installer from ollama.com and run the .exe setup file. After installation, Ollama runs as a background service on localhost:11434.
Step 2 — Pull the DeepSeek R1 model
Pull your chosen model variant. Replace 8b with 1.5b, 7b, 14b, 32b, or 70b to match your hardware:
ollama pull deepseek-r1:8b
Ollama downloads the model file once (around 5.2 GB for the 8b variant). If the download is interrupted, running the same command will resume from where it stopped — no need to restart from scratch.
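Before pulling a larger variant, it can be worth confirming that the disk has room for it. The sketch below is a rough check under stated assumptions: the sizes come from the Storage column of the table, the 1 GB headroom is an arbitrary choice, and checking the home directory's volume is a guess, because where Ollama stores models depends on the platform and install method.

```python
import shutil
from pathlib import Path

# Download sizes in GB, copied from the Storage column of the table.
MODEL_SIZE_GB = {
    "deepseek-r1:1.5b": 1.1,
    "deepseek-r1:7b": 4.7,
    "deepseek-r1:8b": 5.2,
    "deepseek-r1:14b": 9.0,
    "deepseek-r1:32b": 20.0,
    "deepseek-r1:70b": 43.0,
}


def has_room(tag: str, free_bytes: int, headroom_gb: float = 1.0) -> bool:
    """True if free_bytes covers the model size plus some headroom."""
    needed_bytes = (MODEL_SIZE_GB[tag] + headroom_gb) * 1000**3
    return free_bytes >= needed_bytes


if __name__ == "__main__":
    # Assumption: models land on the same volume as the home directory.
    free = shutil.disk_usage(Path.home()).free
    for tag in MODEL_SIZE_GB:
        print(f"{tag}: {'fits' if has_room(tag, free) else 'not enough space'}")
```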
Step 3 — Start an interactive session
Launch a terminal chat session with the model:
ollama run deepseek-r1:8b
Type your prompt and press Enter. DeepSeek R1 first outputs its reasoning inside <think> tags, then delivers the final answer. This chain-of-thought trace is normal behavior — and confirms the model is processing entirely on your local hardware, not a remote server.
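If you capture the model's output in your own code, you will often want to separate the reasoning trace from the final answer. The following is a minimal sketch, assuming the reply arrives as a single string with the reasoning inline between <think> and </think>; the function name split_reasoning is hypothetical.

```python
import re
from typing import Tuple

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no tags are present."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # The answer is everything outside the matched block.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    sample = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
    reasoning, answer = split_reasoning(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

Only the first <think> block is extracted here, which is enough for a typical single-turn reply.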
Optional: Access the model via REST API
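Ollama exposes a REST API on port 11434: its native endpoints live under /api, and an OpenAI-compatible layer lives under /v1. If Ollama is not already running as a background service, start the server manually: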
ollama serve
Then send requests using any HTTP client or library:
curl http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{"model": "deepseek-r1:8b", "messages": [{"role": "user", "content": "Explain recursion simply"}]}'
Any application built against the OpenAI SDK can point at this local server by changing the base URL to http://localhost:11434/v1 and the model name to deepseek-r1:8b. The SDK still expects an API key, but Ollama ignores its value, so any placeholder string works. This makes local DeepSeek R1 a practical drop-in replacement for paid API calls during development and testing.
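The same /api/chat request shown with curl can be issued from Python using only the standard library. This is a minimal sketch rather than a full client: the helper names build_chat_request and chat are hypothetical, the 300-second timeout is an arbitrary choice, and "stream" is set to false so the server returns one JSON object instead of a line-by-line stream.

```python
import json
import urllib.request

# Host, port, and path taken from the curl example above.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "deepseek-r1:8b") -> urllib.request.Request:
    """Build a POST request for a single-turn chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON reply instead of streamed chunks
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, model: str = "deepseek-r1:8b", timeout: float = 300) -> str:
    """Send the prompt to the local server and return the reply text."""
    request = build_chat_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["message"]["content"]
```

With the Ollama server running and the model pulled, chat("Explain recursion simply") returns the reply as a string, including any inline reasoning trace.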
Common Questions: Run DeepSeek Locally
Q: What is the minimum RAM to run DeepSeek R1 locally?
A: The smallest variant, deepseek-r1:1.5b, runs on approximately 2 GB of RAM and requires just 1.1 GB of disk space. For practical everyday reasoning output, the 8b model is the minimum recommended choice, needing 8 GB of RAM. Machines with less than 8 GB will resort to disk swap, causing severe slowdowns that make the model nearly unusable.
Q: Is running DeepSeek R1 locally safe and private?
A: Yes. When you run DeepSeek R1 locally via Ollama, all inference happens on your hardware. No prompts, responses, or metadata are sent to DeepSeek or any external server. Ollama binds to localhost by default, keeping the model inaccessible from other devices on your network unless you explicitly change the host configuration.
Q: How fast is DeepSeek R1 on a regular laptop?
A: On a CPU-only machine with 8 GB of RAM running the 8b model, expect approximately 3–8 tokens per second. An NVIDIA GPU with 8 GB of VRAM pushes inference to 20–40 tokens per second. Most people find about 10 tokens per second a comfortable reading pace, so CPU-only output will feel somewhat slow in live conversation but remains viable for non-intensive tasks where you can wait for the answer.
Q: Can DeepSeek R1 be used commercially if run locally?
A: Yes. DeepSeek R1 is released under the MIT License, which allows commercial use, modification, and redistribution with no fees. There are no per-token charges or usage restrictions imposed by DeepSeek for the open-weight models. Always review the official license terms directly at the DeepSeek repository before deploying in a production environment.
Conclusion
Running DeepSeek R1 locally is now practical for anyone with a standard laptop. The three key takeaways: the distilled 8b model runs on 8 GB of RAM and inherits the reasoning style of the full R1, which scores 79.8% on the AIME 2024 math benchmark; Ollama installs in a single command and handles all model management automatically; and the built-in OpenAI-compatible API makes it straightforward to connect your local AI to existing tools and workflows.
Local AI deployment is no longer exclusive to researchers or enterprises. It is a genuinely practical option for privacy-conscious developers, cost-aware teams, and anyone who wants capable reasoning AI without monthly subscription fees or cloud dependency.
For the latest open-source model releases, tool updates, and AI research coverage, follow our AI news section on Hubkub.
Last Updated: April 13, 2026








