Local AI Models

14 models tracked

Alibaba

Qwen 3.5 27B

27B · Qwen

11–29GB VRAM

Balanced 27B model with strong reasoning. Runs on 16GB VRAM with Q4 quantization.

ollama pull qwen3.5:27b
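
As a rough check on figures like the 16GB one above, weight memory can be estimated as parameter count times bits per weight, divided by 8. The sketch below assumes nominal bit widths (4, 8, and 16 bits). Real quantization formats store somewhat more than their nominal width, and the KV cache and runtime buffers add overhead, so the results are approximate lower bounds and not the exact numbers this page reports. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Assumption: every weight is stored at the same nominal bit width,
    with no KV cache, activations, or runtime overhead included.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Nominal bit widths are illustrative assumptions, not exact format sizes.
for label, bits in [("Q4 (nominal 4-bit)", 4.0),
                    ("Q8 (nominal 8-bit)", 8.0),
                    ("FP16", 16.0)]:
    print(f"27B @ {label}: ~{weight_memory_gb(27, bits):.1f} GB for weights")
```

At a nominal 4 bits, a 27B model needs about 13.5 GB for weights alone, which leaves limited headroom on a 16GB card for long contexts.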

Qwen 3.5 3B

3B · Qwen

2.5–4GB VRAM

Ultra-compact 3B model for edge devices and low-VRAM setups. Runs on 4GB VRAM.

ollama pull qwen3.5:3b

Qwen 3.5 72B

72B · Qwen

27–44GB VRAM

Alibaba's flagship 72B model with exceptional multilingual capabilities and strong reasoning. Requires a multi-GPU or high-VRAM setup.

ollama pull qwen3.5:72b

Qwen 3.5 9B

9B · Qwen

6.2–10.5GB VRAM

Highly capable 9B model, excellent for consumer hardware. Punches well above its weight class in reasoning tasks.

ollama pull qwen3.5:9b

DeepSeek

DeepSeek R1 7B

7B · MIT

5.2–8.5GB VRAM

DeepSeek's 7B reasoning-focused distilled model. Strong chain-of-thought reasoning, runs on 8GB VRAM.

ollama pull deepseek-r1:7b

Google

Gemma 4 27B

27B (4B active MoE) · MoE · Gemma

11.5–30GB VRAM

Google's 27B MoE model with only 4B active parameters per token. Near-frontier quality at a fraction of compute cost.

ollama pull gemma4:27b
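
To illustrate why a small active-parameter count lowers compute but not memory, the sketch below uses the common rule of thumb of about 2 FLOPs per active parameter for a forward pass per token. This is an approximation that ignores attention and routing costs, and the function name `forward_flops_per_token` is hypothetical. All 27B parameters still have to be resident, which is consistent with the VRAM range above being close to that of a dense 27B model.

```python
def forward_flops_per_token(active_params_billion: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter.

    Assumption: attention, routing, and embedding costs are ignored.
    """
    return 2 * active_params_billion * 1e9


moe = forward_flops_per_token(4)     # 4B active parameters per token
dense = forward_flops_per_token(27)  # a dense model of the same total size

print(f"MoE (4B active): {moe / 1e9:.0f} GFLOPs/token")
print(f"Dense 27B:       {dense / 1e9:.0f} GFLOPs/token")
print(f"Compute ratio:   {moe / dense:.2f}")
```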

Gemma 4 31B

31B · Apache 2.0

11–34GB VRAM

Google's flagship dense 31B model with 256K context. Near-frontier quality, top open-source performer on code and reasoning. Arena Elo ~1452.

ollama pull gemma4:31b

Gemma 4 E2B

2.3B active / 5B total · MoE · Apache 2.0

3.2–6GB VRAM

Google's ultra-compact multimodal MoE. Only 2.3B active params with full text/image/audio support. Lowest VRAM entry point in the Gemma 4 family.

ollama pull gemma4:e2b

Gemma 4 E4B

4B active (MoE) · MoE · Gemma

3.2–5.5GB VRAM

Google's efficient 4B-active MoE model. Excellent performance per compute unit, runs on modest consumer hardware.

ollama pull gemma4:e4b

Microsoft

Phi-4

14B · MIT

9–16GB VRAM

Microsoft's 14B model with exceptional reasoning for its size. Particularly strong on math, science, and STEM tasks.

ollama pull phi4

MiniMax

MiniMax M2.7

Unknown (MoE) · MoE · Apache 2.0

31–50GB VRAM

MiniMax's self-evolving MoE model with 1M token context. Recently open-sourced under Apache 2.0 license.

ollama pull minimax-m2.7

Mistral AI

Mistral Small 3.2

24B · Apache 2.0

13.5–24.5GB VRAM

Mistral's efficient 24B model with strong instruction following and multilingual support.

ollama pull mistral-small

Zhipu AI

GLM 5.1

~32B · GLM

19–35GB VRAM

#1 open-source model on Code Arena. Exceptional for software development, within 3 points of Claude Opus on agentic benchmarks.

ollama pull glm-5.1
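
To pick a model for a given GPU, the VRAM ranges listed above can be filtered against a memory budget. The sketch below copies the lower bound of each range and the pull tag from this page. It assumes that the lower bound is the minimum needed to load the model at all (most likely the most aggressive quantization with a short context), which the page does not state explicitly. The names `MODELS` and `models_that_fit` are hypothetical.

```python
# (model name, lower bound of listed VRAM range in GB, ollama tag),
# copied from the listing above.
MODELS = [
    ("Qwen 3.5 27B", 11.0, "qwen3.5:27b"),
    ("Qwen 3.5 3B", 2.5, "qwen3.5:3b"),
    ("Qwen 3.5 72B", 27.0, "qwen3.5:72b"),
    ("Qwen 3.5 9B", 6.2, "qwen3.5:9b"),
    ("DeepSeek R1 7B", 5.2, "deepseek-r1:7b"),
    ("Gemma 4 27B", 11.5, "gemma4:27b"),
    ("Gemma 4 31B", 11.0, "gemma4:31b"),
    ("Gemma 4 E2B", 3.2, "gemma4:e2b"),
    ("Gemma 4 E4B", 3.2, "gemma4:e4b"),
    ("Phi-4", 9.0, "phi4"),
    ("MiniMax M2.7", 31.0, "minimax-m2.7"),
    ("Mistral Small 3.2", 13.5, "mistral-small"),
    ("GLM 5.1", 19.0, "glm-5.1"),
]


def models_that_fit(vram_gb: float) -> list[tuple[str, str]]:
    """Return (name, tag) pairs whose listed minimum VRAM fits the budget."""
    return [(name, tag) for name, min_gb, tag in MODELS if min_gb <= vram_gb]


# Example: an 8GB card.
for name, tag in models_that_fit(8.0):
    print(f"{name}: ollama pull {tag}")
```

A model that only just fits at its lower bound will likely need a smaller context window or heavier quantization than one with headroom to spare.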
