Gemma 4 31B
Apache 2.0
Google's flagship dense 31B-parameter model with a 256K-token context window. It offers near-frontier quality and is a top open-source performer on code and reasoning, with an Arena Elo of about 1452.
Provider
Google
Parameters
31B
Context
262,144 tokens (256K)
Released
2026-04-08
VRAM Requirements by Quantization
| Quantization | Disk Size | VRAM Required | Compatible GPUs |
|---|---|---|---|
| Q8_0 | 31 GB | 34 GB | 4 GPUs |
| Q4_K_M | 17 GB | 19 GB | 9 GPUs |
| Q4_0 | 15.5 GB | 17.5 GB | 9 GPUs |
| Q2_K | 9.5 GB | 11 GB | 18 GPUs |
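The table's figures can be cross-checked with a little arithmetic. The sketch below derives the effective bits per weight from each disk size, and the extra memory implied by the gap between disk size and required VRAM. It assumes exactly 31 billion parameters and decimal gigabytes, neither of which the card states precisely, so treat the results as rough. The function names are illustrative only.

```python
# Figures copied from the quantization table; PARAMS_BILLION is an assumption
# based on the card's stated "31B" size.
PARAMS_BILLION = 31

# (method, disk size in GB, VRAM required in GB)
QUANTS = [
    ("Q8_0", 31.0, 34.0),
    ("Q4_K_M", 17.0, 19.0),
    ("Q4_0", 15.5, 17.5),
    ("Q2_K", 9.5, 11.0),
]


def bits_per_weight(disk_gb, params_billion=PARAMS_BILLION):
    """Effective bits stored per parameter, assuming 1 GB = 10^9 bytes."""
    # GB / billions of parameters = bytes per weight; multiply by 8 for bits.
    return disk_gb * 8 / params_billion


def runtime_overhead_gb(disk_gb, vram_gb):
    """Memory beyond the weights themselves (KV cache, buffers, and so on)."""
    return vram_gb - disk_gb


for method, disk_gb, vram_gb in QUANTS:
    bpw = bits_per_weight(disk_gb)
    extra = runtime_overhead_gb(disk_gb, vram_gb)
    print(f"{method}: {bpw:.2f} bits/weight, {extra:.1f} GB overhead")
```

By this estimate the table budgets roughly 1.5 to 3 GB on top of the weights, which suggests the VRAM column assumes a modest context length rather than the full 256K window.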
Install with Ollama
Benchmark Scores
MMLU: 89%
HumanEval: 82%
Scores are approximate and may vary by quantization level.
MTP (Multi-Token Prediction)
The model ships with MTP heads and works with llama.cpp builds from 2026-05-16 or later, LM Studio, and Lemonade.
Google ships MTP drafter weights on HuggingFace and Ollama v0.23.1. Community speedup numbers have not yet been collected for this model.
Compatible GPUs (18)
AMD RX 9070 XT (16GB)
AMD RX 7900 GRE (16GB)
AMD RX 7900 XTX (24GB)
AMD Ryzen AI Max+ 395 (64GB unified memory)
Apple M4 Pro (24GB)
Apple M3 Max (36GB)
Apple M4 Max (48GB)
Apple M4 Ultra (64GB)
NVIDIA RTX 3080 12GB (12GB)
NVIDIA RTX 4070 SUPER (12GB)
NVIDIA RTX 4070 Ti SUPER (16GB)
NVIDIA RTX 4080 SUPER (16GB)
NVIDIA RTX 5070 Ti (16GB)
NVIDIA RTX 4060 Ti 16GB (16GB)
NVIDIA RTX 5080 (16GB)
NVIDIA RTX 4090 (24GB)
NVIDIA RTX 3090 (24GB)
NVIDIA RTX 5090 (32GB)
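The GPU counts in the quantization table appear to follow from comparing each GPU's memory with the VRAM required. The sketch below reproduces that comparison using only the figures on this page. It assumes all listed memory is available to the model, which is a simplification, especially for unified-memory systems that share memory with the OS. The helper names `gpus_that_fit` and `best_quant` are hypothetical.

```python
# GPU memory in GB, copied from the compatible GPU list.
GPUS = {
    "AMD RX 9070 XT": 16,
    "AMD RX 7900 GRE": 16,
    "AMD RX 7900 XTX": 24,
    "AMD Ryzen AI Max+ 395": 64,
    "Apple M4 Pro": 24,
    "Apple M3 Max": 36,
    "Apple M4 Max": 48,
    "Apple M4 Ultra": 64,
    "NVIDIA RTX 3080 12GB": 12,
    "NVIDIA RTX 4070 SUPER": 12,
    "NVIDIA RTX 4070 Ti SUPER": 16,
    "NVIDIA RTX 4080 SUPER": 16,
    "NVIDIA RTX 5070 Ti": 16,
    "NVIDIA RTX 4060 Ti 16GB": 16,
    "NVIDIA RTX 5080": 16,
    "NVIDIA RTX 4090": 24,
    "NVIDIA RTX 3090": 24,
    "NVIDIA RTX 5090": 32,
}

# VRAM required in GB per quantization, copied from the table.
VRAM_REQUIRED_GB = {"Q8_0": 34, "Q4_K_M": 19, "Q4_0": 17.5, "Q2_K": 11}


def gpus_that_fit(required_gb, gpus=GPUS):
    """Names of GPUs whose memory is at least the required VRAM."""
    return sorted(name for name, mem in gpus.items() if mem >= required_gb)


def best_quant(memory_gb, requirements=VRAM_REQUIRED_GB):
    """Largest quantization (by VRAM need) that fits, or None if none fit."""
    by_size = sorted(requirements.items(), key=lambda kv: kv[1], reverse=True)
    for method, required_gb in by_size:
        if memory_gb >= required_gb:
            return method
    return None


for method, required_gb in VRAM_REQUIRED_GB.items():
    print(f"{method}: {len(gpus_that_fit(required_gb))} GPUs")
```

Under these assumptions a 24GB card such as the RTX 4090 tops out at Q4_K_M, while the 12GB and 16GB cards are limited to Q2_K.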
HuggingFace
google/gemma-4-31b-it