DeepSeek V4 Flash
MoEDeepSeek
Distilled Flash variant of V4 Pro. Near-Pro performance on simple agent tasks at 1/12 the cost ($0.14/$0.28 per M tokens). Inference gap closes to zero on routine tasks vs V4 Pro. Not local-runnable on consumer hardware.
Provider
DeepSeek
Parameters
158B active / 292B total (MoE)
Context
128K
Released
2026-04-24
VRAM Requirements by Quantization
| Method | Disk Size | VRAM Required | Fits GPUs |
|---|---|---|---|
| BF16 (reference) | 530 GB | 580 GB | 0 GPUs |
| Q4_K_M (est.) | 165 GB | 180 GB | 0 GPUs |
Benchmark Scores
mmlu88%
humaneval90%
Scores are approximate and may vary by quantization level.
HuggingFace
deepseek-ai/DeepSeek-V4-Flash