runlocal.dev

Llama 4 Scout

MoE · Llama 4

Meta's efficient MoE model with an unprecedented 10M token context window. 17B active parameters from a 109B total pool.

Provider

Meta

Parameters

17B active (109B total MoE)

Context

10M tokens (10,485,760)

Released

2025-04-05

VRAM Requirements by Quantization

Method    Disk Size    VRAM Required    Fits GPUs
Q4_K_M    53 GB        58 GB            0 GPUs
Q4_0      50 GB        55 GB            0 GPUs
Q2_K      32 GB        35 GB            1 GPU
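The VRAM figures above track disk size closely: each row is roughly the quantized file size plus about 10% headroom for the KV cache and runtime buffers. A minimal sketch of that heuristic (the 1.1 factor is an assumption inferred from this table, not a documented constant):

```python
def estimate_vram_gb(disk_size_gb, overhead=1.1):
    """Rough rule of thumb: the quantized weights must fit in VRAM,
    plus ~10% headroom for KV cache and runtime buffers (assumed)."""
    return round(disk_size_gb * overhead)

# Reproduces the table's figures for Llama 4 Scout:
for method, disk in [("Q4_K_M", 53), ("Q4_0", 50), ("Q2_K", 32)]:
    print(method, estimate_vram_gb(disk), "GB")
```

Actual headroom grows with the context length you configure, so treat these numbers as a floor, not a guarantee.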

Install with Ollama

Run in terminal:

ollama pull llama4:scout

Minimum 35 GB VRAM required (Q2_K quantization). Install Ollama from ollama.com.
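Once the model is pulled, Ollama serves it over a local HTTP API (default port 11434). A minimal sketch using only the standard library; the prompt text is illustrative, and `generate` assumes an Ollama server is already running with the model pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt, model="llama4:scout"):
    """Build a non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt):
    """POST the payload and return the model's text response.
    Requires a running Ollama server with llama4:scout pulled."""
    data = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Example call: `generate("Summarize MoE routing in one sentence.")`.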

Benchmark Scores

MMLU: 79.8%
HumanEval: 75.3%

Scores are approximate and may vary by quantization level.

Compatible GPUs (1)

HuggingFace

meta-llama/Llama-4-Scout-17B-16E
