Local AI Models
14 models tracked
Alibaba
Qwen 3.5 27B
11–29GB VRAM
Balanced 27B model with strong reasoning. Runs on 16GB VRAM with Q4 quantization.
ollama pull qwen3.5:27b
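The VRAM ranges in these listings scale roughly with parameter count and quantization bit-width. A back-of-envelope estimate is sketched below; the 1.2× multiplier is an assumption to cover KV cache and activation overhead, not the method the listings use:

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights in GB times an overhead factor.

    params_billions: model size in billions of parameters
    bits_per_weight: quantization width (4 for Q4, 8 for Q8, 16 for FP16)
    overhead: assumed multiplier for KV cache and activations
    """
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead

print(estimate_vram_gb(27, 4))  # 27B model at Q4
```

For the 27B model at Q4 (~4 bits per weight) this gives about 16 GB, consistent with the "runs on 16GB VRAM with Q4 quantization" note above.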
Qwen 3.5 3B
2.5–4GB VRAM
Ultra-compact 3B model for edge devices and low VRAM setups. Runs on 4GB VRAM.
ollama pull qwen3.5:3b
Qwen 3.5 72B
27–44GB VRAM
Alibaba's flagship 72B model with exceptional multilingual capabilities and strong reasoning. Requires multi-GPU or high VRAM setup.
ollama pull qwen3.5:72b
Qwen 3.5 9B
6.2–10.5GB VRAM
Highly capable 9B model, excellent for consumer hardware. Punches well above its weight class in reasoning tasks.
ollama pull qwen3.5:9b
Google
Gemma 4 27B
11.5–30GB VRAM
Google's 27B MoE model with only 4B active parameters per token. Near-frontier quality at a fraction of compute cost.
ollama pull gemma4:27b
Gemma 4 31B
11–34GB VRAM
Google's flagship dense 31B model with 256K context. Near-frontier quality, top open-source performer on code and reasoning. Arena Elo ~1452.
ollama pull gemma4:31b
Gemma 4 E2B
3.2–6GB VRAM
Google's ultra-compact multimodal MoE. Only 2.3B active params with full text/image/audio support. Lowest VRAM entry point in the Gemma 4 family.
ollama pull gemma4:e2b
Gemma 4 E4B
3.2–5.5GB VRAM
Google's efficient MoE model with 4B active parameters. Excellent performance per unit of compute; runs on modest consumer hardware.
ollama pull gemma4:e4b