runlocal.cc

Gemma 4 31B

Apache 2.0

Google's flagship dense 31B model with 256K context. Near-frontier quality, top open-source performer on code and reasoning. Arena Elo ~1452.

Provider

Google

Parameters

31B

Context

262,144 tokens (256K)

Released

2026-04-08

VRAM Requirements by Quantization

Method    Disk Size   VRAM Required   Fits GPUs
Q8_0      31 GB       34 GB           4 GPUs
Q4_K_M    17 GB       19 GB           9 GPUs
Q4_0      15.5 GB     17.5 GB         9 GPUs
Q2_K      9.5 GB      11 GB           18 GPUs
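
To pick a quantization for a given card, the table can be read as a simple lookup: take the largest listed method whose VRAM requirement fits. The sketch below is illustrative only. The figures are copied from the table, while the function name `best_quant` and the sample card sizes are assumptions made for this example. Real headroom also depends on context length and whatever else is using the GPU.

```python
# Figures copied from the table above: (method, disk size in GB, VRAM required in GB).
# Rows are ordered from largest to smallest.
QUANTS = [
    ("Q8_0", 31.0, 34.0),
    ("Q4_K_M", 17.0, 19.0),
    ("Q4_0", 15.5, 17.5),
    ("Q2_K", 9.5, 11.0),
]


def best_quant(vram_gb):
    """Return the largest listed method whose VRAM requirement fits, or None."""
    for method, _disk_gb, required_gb in QUANTS:
        if required_gb <= vram_gb:
            return method
    return None


# Hypothetical card sizes, chosen only to exercise each row of the table.
for card_vram in (8, 12, 24, 48):
    print(f"{card_vram} GB -> {best_quant(card_vram)}")
```

A card below 11 GB returns `None`, which matches the minimum listed for Q2_K.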

Install with Ollama

Run in terminal:

ollama pull gemma4:31b

Minimum 11 GB VRAM required (the Q2_K figure above). Install Ollama from ollama.com.
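
Once the pull has finished, the model can be queried through Ollama's local HTTP API. The following is a minimal sketch, assuming an Ollama server running at its default address (`http://localhost:11434`) and the `gemma4:31b` tag from the command above. The helper names `build_request` and `generate` are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma4:31b"):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Say hello in one sentence.")` should return the model's reply as a string, provided the server is up and the model has been pulled.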

Benchmark Scores

MMLU: 89%
HumanEval: 82%

Scores are approximate and may vary by quantization level.

MTP (Multi-Token Prediction)

The model ships with MTP heads and works with llama.cpp builds from 2026-05-16 onward, LM Studio, and Lemonade.

Google ships MTP drafter weights on HuggingFace and with Ollama v0.23.1. Community speedup numbers have not yet been collected for this model.

Compatible GPUs (18)

HuggingFace

google/gemma-4-31b-it
