DeepSeek V4 Flash

MoE · DeepSeek

Distilled Flash variant of V4 Pro. Delivers near-Pro performance on simple agent tasks at 1/12 the cost ($0.14/$0.28 per M tokens); on routine tasks the quality gap to V4 Pro is negligible. Not runnable locally on consumer hardware.
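
To make the pricing concrete, here is a minimal sketch of a per-request cost estimate. It assumes the two quoted figures are the input and output rates respectively (the card does not label them), and the V4 Pro figure is only what the "1/12 the cost" claim implies, not a published price. The function name and constants are illustrative.

```python
# Prices quoted on the card, in USD per million tokens.
# Assumption: the first figure is the input rate, the second the output rate.
INPUT_USD_PER_M = 0.14
OUTPUT_USD_PER_M = 0.28

# Implied by the card's "1/12 the cost" claim; not a published V4 Pro price.
PRO_COST_MULTIPLIER = 12


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the Flash rates above."""
    cost = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return cost / 1_000_000


if __name__ == "__main__":
    flash = request_cost_usd(input_tokens=100_000, output_tokens=5_000)
    print(f"Flash: ${flash:.4f}")
    print(f"V4 Pro (implied): ${flash * PRO_COST_MULTIPLIER:.4f}")
```

Multiplying by 12 for the Pro estimate only holds if the 1/12 ratio applies equally to both rates, which the card does not confirm.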

Provider

DeepSeek

Parameters

158B active / 292B total (MoE)

Context

128K

Released

2026-04-24

VRAM Requirements by Quantization

Method	Disk Size	VRAM Required	Fits GPUs
BF16 (reference)	530 GB	580 GB	0 GPUs
Q4_K_M (est.)	165 GB	180 GB	0 GPUs
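
The "Fits GPUs" column can be read as a simple comparison between required VRAM and each card's memory. The sketch below illustrates that check under stated assumptions: the GPU catalog is a small hypothetical sample (the page's actual list is not shown), and the multi-GPU count is a naive lower bound that ignores sharding and interconnect overhead.

```python
import math

# VRAM requirements taken from the table above, in GB.
VRAM_REQUIRED_GB = {"BF16": 580, "Q4_K_M": 180}

# Hypothetical sample of consumer GPUs and their VRAM in GB.
GPU_CATALOG_GB = {"RTX 3090": 24, "RTX 4090": 24, "RTX 5090": 32}


def fitting_gpus(required_gb: float, catalog: dict[str, int]) -> list[str]:
    """Return the GPUs whose VRAM alone covers the requirement."""
    return [name for name, vram in catalog.items() if vram >= required_gb]


def min_gpus_needed(required_gb: float, gpu_vram_gb: float) -> int:
    """Naive lower bound on identical GPUs needed, ignoring sharding overhead."""
    return math.ceil(required_gb / gpu_vram_gb)


if __name__ == "__main__":
    for method, required in VRAM_REQUIRED_GB.items():
        fits = fitting_gpus(required, GPU_CATALOG_GB)
        print(f"{method}: {len(fits)} GPUs fit, "
              f"at least {min_gpus_needed(required, 24)} x 24 GB cards needed")
```

With this sample catalog, no single card fits either row, which matches the "0 GPUs" entries in the table.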

Benchmark Scores

MMLU: 88%

HumanEval: 90%

Scores are approximate and may vary by quantization level.

HuggingFace

deepseek-ai/DeepSeek-V4-Flash