Local AI Models
36 models tracked
Alibaba
Qwen 3.5 27B
VRAM: 11–29GB
Balanced 27B model with strong reasoning. Runs on 16GB VRAM with Q4 quantization.
ollama pull qwen3.5:27b
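A rough way to sanity-check a claim like "runs on 16GB VRAM with Q4 quantization" is to multiply the parameter count by the bits stored per weight. The sketch below is a back-of-envelope estimate for the weights alone. It is not how the VRAM ranges on this page were computed, and the function name is a hypothetical helper.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone in GB (1 GB = 10**9 bytes).

    params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_weight / 8


# Qwen 3.5 27B at a nominal 4 bits per weight. Assumption: real Q4 formats
# such as llama.cpp's Q4_K_M average somewhat more than 4 bits per weight
# because they also store per-block scale factors.
q4 = weight_memory_gb(27, 4)
fp16 = weight_memory_gb(27, 16)
print(f"Q4 weights ~{q4:.1f} GB, FP16 weights ~{fp16:.1f} GB")
# The KV cache and runtime buffers come on top of these figures.
```

At roughly 13.5GB of 4-bit weights, a 16GB card leaves only a few GB for the KV cache, so long contexts can push this model past a single 16GB GPU.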
Qwen 3.5 3B
VRAM: 2.5–4GB
Ultra-compact 3B model for edge devices and low-VRAM setups. Runs on 4GB VRAM.
ollama pull qwen3.5:3b
Qwen 3.5 72B
VRAM: 27–44GB
Alibaba's flagship 72B model with exceptional multilingual capabilities and strong reasoning. Requires a multi-GPU or high-VRAM setup.
ollama pull qwen3.5:72b
Qwen 3.5 9B
VRAM: 6.2–10.5GB
Highly capable 9B model, well suited to consumer hardware. Punches well above its weight class on reasoning tasks.
ollama pull qwen3.5:9b
Qwen 3.6 27B
VRAM: 11–30GB
Alibaba's 27B dense multimodal model. Scores 77.2 on SWE-bench Verified, surpassing Qwen 3.5 397B on coding. Accepts native image, video, and text input; compatible with Claude Code and Qwen Code tooling.
ollama pull qwen3.6:27b
Qwen 3.6 35B-A3B
VRAM: 13–38GB
Alibaba's sparse MoE with 35B total and only 3B active parameters. A major jump on coding benchmarks over Qwen 3.5, with inference cost closer to a 3B dense model. An abliterated variant is also available on Hugging Face.
ollama pull qwen3.6:35b-a3b
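The "inference cost closer to a 3B dense model" claim follows from how a mixture-of-experts model splits its parameters: all 35B must be resident in memory, but each token is routed through only about 3B of them. The sketch below contrasts the two costs using a common rule of thumb of about 2 FLOPs per active parameter per generated token; the function name and the dense comparison model are hypothetical. Gemma 4 27B, with 4B active parameters, follows the same pattern.

```python
def moe_profile(total_params_b: float, active_params_b: float, bits_per_weight: float):
    """Back-of-envelope MoE costs.

    Memory: every expert must be loaded, so weight size scales with TOTAL params.
    Compute: each token passes through only the ACTIVE params; the rule of thumb
    is ~2 FLOPs (one multiply, one add) per active parameter per generated token,
    ignoring attention over the context.
    """
    weights_gb = total_params_b * bits_per_weight / 8
    gflops_per_token = 2 * active_params_b  # billions of params -> GFLOPs
    return weights_gb, gflops_per_token


moe = moe_profile(35, 3, 4)     # Qwen 3.6 35B-A3B at a nominal 4 bits per weight
dense = moe_profile(35, 35, 4)  # hypothetical dense 35B model at the same precision
print(f"MoE:   {moe[0]:.1f} GB of weights, ~{moe[1]:.0f} GFLOPs/token")
print(f"Dense: {dense[0]:.1f} GB of weights, ~{dense[1]:.0f} GFLOPs/token")
```

Both models need the same memory for weights, but the MoE does roughly one-twelfth of the per-token arithmetic, which is why it generates faster on the same GPU.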
DeepSeek
DeepSeek R1 7B
VRAM: 5.2–8.5GB
DeepSeek's 7B reasoning-focused distilled model. Strong chain-of-thought reasoning, runs on 8GB VRAM.
ollama pull deepseek-r1:7b
DeepSeek V4 Flash
VRAM: 180–580GB
Distilled Flash variant of V4 Pro. Near-Pro performance on simple agent tasks at about 1/12 the cost ($0.14/$0.28 per M tokens), with the gap to V4 Pro closing to zero on routine tasks. Not local-runnable on consumer hardware.
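The "1/12 the cost" figure can be checked against the listed prices. Each Flash price is about 12.4 times lower than the matching V4 Pro price ($1.74/$3.48), so the ratio holds for any mix of input and output tokens. The sketch assumes each price pair is input/output per million tokens; the workload and function name are hypothetical.

```python
# Listed per-million-token prices. Assumption: (input price, output price).
PRO = (1.74, 3.48)
FLASH = (0.14, 0.28)


def job_cost(prices: tuple[float, float], input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a job given (input, output) prices per million tokens."""
    price_in, price_out = prices
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
pro = job_cost(PRO, 2_000_000, 500_000)
flash = job_cost(FLASH, 2_000_000, 500_000)
print(f"V4 Pro ${pro:.2f}, V4 Flash ${flash:.2f}, Flash is {pro / flash:.1f}x cheaper")
```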
DeepSeek V4 Pro
VRAM: 3100GB
DeepSeek's frontier 1.6T MoE, trained with FP4 quantization-aware training (QAT) at scale. Uses a novel Hybrid Attention design (CSA+HCA), and Manifold-Constrained Hyper-Connections replace standard residual connections. $1.74/$3.48 per M tokens via API; not local-runnable.
Google
Gemma 4 27B
VRAM: 11.5–30GB
Google's 27B MoE model with only 4B active parameters per token. Near-frontier quality at a fraction of compute cost.
ollama pull gemma4:27b
Gemma 4 31B
VRAM: 11–34GB
Google's flagship dense 31B model with 256K context. Near-frontier quality, top open-source performer on code and reasoning. Arena Elo ~1452.
ollama pull gemma4:31b
Gemma 4 E2B
VRAM: 3.2–6GB
Google's ultra-compact multimodal MoE. Only 2.3B active params with full text/image/audio support. Lowest VRAM entry point in the Gemma 4 family.
ollama pull gemma4:e2b
Gemma 4 E4B
VRAM: 3.2–5.5GB
Google's efficient 4B-active MoE model. Excellent performance per compute unit, runs on modest consumer hardware.
ollama pull gemma4:e4b
OpenAI
GPT-5.5
OpenAI's new flagship multimodal model, marketed as having 'genius-grade visual IQ'. Closed weights and API-only; listed as a cloud reference point for local-model benchmarking.
GPT-5.5 Pro
GPT-5.5's deep-reasoning tier: passes Gödel-style tests and has solved previously open math conjectures. An API-only premium model, included strictly as a frontier comparison anchor.
PrismML
Ternary Bonsai 1.7B
VRAM: 0.6GB
Edge-class ternary model that runs on phones and small embedded devices. 1.58-bit quantization at 1.7B parameters. Available only in MLX packed format today; llama.cpp and vLLM ports are in progress.
Ternary Bonsai 4B
VRAM: 1.2GB
Mid-tier ternary model. 1.58-bit weights {-1, 0, +1} give it a laptop-class footprint, with capability that beats dense peers at the same byte budget. MLX 2-bit packed format only for now.
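The 1.58-bit figure is log2(3), the information carried by one three-valued weight. Because 1.58 bits do not align to byte boundaries, a simple storage scheme spends 2 bits per weight and packs four weights into each byte, which is the idea behind a "2-bit packed" format. The code assignment and function names below are hypothetical; the actual MLX packed layout used by Ternary Bonsai may differ.

```python
# Hypothetical 2-bit codes for illustration; the real MLX packed layout may differ.
ENCODE = {-1: 0b00, 0: 0b01, 1: 0b10}
DECODE = {code: weight for weight, code in ENCODE.items()}


def pack_ternary(weights: list[int]) -> bytes:
    """Pack ternary weights four per byte, 2 bits each, first weight in the lowest bits."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= ENCODE[w] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_ternary(packed: bytes, n: int) -> list[int]:
    """Recover the first n ternary weights; unused slots in the last byte are dropped."""
    weights = []
    for byte in packed:
        for j in range(4):
            weights.append(DECODE[(byte >> (2 * j)) & 0b11])
    return weights[:n]


w = [1, 0, -1, -1, 0, 1, 1]
packed = pack_ternary(w)
print(len(w), "weights ->", len(packed), "bytes")
assert unpack_ternary(packed, len(w)) == w
```

Seven weights fit in two bytes here, where an FP16 copy would take fourteen.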
Ternary Bonsai 8B
VRAM: 2GB
1.58-bit ternary quantization: weights are only {-1, 0, +1}. Memory footprint is roughly 1/10 of FP16 at the same parameter count in theory (1.58 vs 16 bits per weight), or 1/8 in 2-bit packed storage. MLX 2-bit packed format today; other backends coming soon.
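The footprint ratios for the Bonsai family follow from bits per weight alone. This sketch compares FP16 with the theoretical ternary cost (log2(3) bits) and with 2-bit packed storage. It counts the ternary weights only and ignores any tensors a real checkpoint may keep at higher precision; the helper names are hypothetical.

```python
import math

FP16_BITS = 16
TERNARY_BITS = math.log2(3)  # information content of one {-1, 0, +1} weight
PACKED_BITS = 2              # storage cost per weight with 2-bit packing


def weights_gb(params_billion: float, bits: float) -> float:
    """Size of the weights alone in GB (10**9 bytes)."""
    return params_billion * bits / 8


def fraction_of_fp16(bits: float) -> float:
    """Footprint relative to FP16 at the same parameter count."""
    return bits / FP16_BITS


for params in (1.7, 4, 8):
    print(f"{params}B: FP16 {weights_gb(params, FP16_BITS):.1f} GB, "
          f"2-bit packed {weights_gb(params, PACKED_BITS):.2f} GB")

print(f"ideal ternary: 1/{1 / fraction_of_fp16(TERNARY_BITS):.1f} of FP16")  # 1/10.1
print(f"2-bit packed:  1/{1 / fraction_of_fp16(PACKED_BITS):.0f} of FP16")   # 1/8
```

For the 8B model, 2-bit packing gives 2GB of weights, in line with the VRAM figure listed above.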
Zhipu AI
GLM 4.6
VRAM: 105–195GB
Zhipu's 357B MoE. $0.6/M tokens via API; local deployment needs 8×H200 or an equivalent multi-GPU setup running vLLM v0.19+. Not a consumer-GPU target.
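To see why 357B parameters push GLM 4.6 onto a multi-GPU node, divide the weight size by per-GPU memory. The sketch assumes 141GB per H200 and budgets a hypothetical 90% of each GPU for weights, leaving the rest for KV cache and activations. It rounds up to a power of two because tensor-parallel deployments commonly use 1, 2, 4, or 8 GPUs. It estimates weights only and is not vLLM's own sizing logic; all names are hypothetical.

```python
import math

H200_GB = 141       # HBM3e capacity of one NVIDIA H200
WEIGHT_SHARE = 0.9  # assumption: fraction of each GPU's memory budgeted for weights


def weights_gb(params_billion: float, bits: float) -> float:
    """Size of the weights alone in GB (10**9 bytes)."""
    return params_billion * bits / 8


def next_pow2(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def tp_gpus(params_billion: float, bits: float, gpu_gb: float = H200_GB) -> int:
    """GPUs needed to hold the weights, rounded up to a power-of-two tensor-parallel size."""
    raw = math.ceil(weights_gb(params_billion, bits) / (gpu_gb * WEIGHT_SHARE))
    return next_pow2(raw)


for label, bits in (("BF16", 16), ("FP8", 8), ("4-bit", 4)):
    print(f"{label:>5}: {weights_gb(357, bits):.0f} GB of weights -> {tp_gpus(357, bits)}x H200")
```

Under these assumptions only the BF16 row needs eight GPUs, which matches the 8×H200 figure if that figure refers to full-precision serving.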
GLM 5
VRAM: 215–400GB
Zhipu's 744B frontier MoE. $1.0/M tokens via API. Cluster-scale deployment only; expect 200GB+ VRAM even at aggressive quantization.
GLM 5.1
VRAM: 220–410GB
Zhipu's 754B flagship MoE. $1.4/M tokens via API; strong on agentic coding benchmarks. Not local-runnable on consumer hardware; included for completeness alongside GLM 5.