runlocal.dev

Llama 4 Scout

MoE · Llama 4

Meta's efficient MoE model with an unprecedented 10M token context window. 17B active parameters from a 109B total pool.

Provider

Meta

Parameters

17B active (109B total MoE)

Context

10M tokens (10,485,760)

Released

2025-04-05

VRAM Requirements by Quantization

Method    Disk Size    VRAM Required    Fits GPUs
Q4_K_M    53 GB        58 GB            0 GPUs
Q4_0      50 GB        55 GB            0 GPUs
Q2_K      32 GB        35 GB            1 GPU
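The VRAM figures above track disk size closely: each row is roughly the quantized file size plus about 10% headroom for the KV cache and runtime buffers. A minimal sketch of that heuristic (the 1.1 factor is an assumption inferred from this table, not a documented constant):

```python
def estimate_vram_gb(disk_size_gb, overhead=1.1):
    """Rough rule of thumb: the quantized weights must fit in VRAM,
    plus ~10% headroom for KV cache and runtime buffers (assumed)."""
    return round(disk_size_gb * overhead)

# Reproduces the table's figures for Llama 4 Scout:
for method, disk in [("Q4_K_M", 53), ("Q4_0", 50), ("Q2_K", 32)]:
    print(method, estimate_vram_gb(disk), "GB")
```

Actual headroom grows with the context length you configure, so treat these numbers as a floor, not a guarantee.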

Install with Ollama

Run in terminal:

ollama pull llama4:scout

Minimum 35 GB VRAM required (Q2_K quantization). Install Ollama from ollama.com.
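Once the model is pulled, Ollama serves it over a local HTTP API (default port 11434). A minimal sketch using only the standard library; the prompt text is illustrative, and `generate` assumes an Ollama server is already running with the model pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt, model="llama4:scout"):
    """Build a non-streaming request body for Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt):
    """POST the payload and return the model's text response.
    Requires a running Ollama server with llama4:scout pulled."""
    data = json.dumps(build_payload(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Example call: `generate("Summarize MoE routing in one sentence.")`.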

Benchmark Scores

MMLU: 79.8%
HumanEval: 75.3%

Scores are approximate and may vary by quantization level.

Compatible GPUs (1)

HuggingFace

meta-llama/Llama-4-Scout-17B-16E
