runlocal.devCheck My GPU →

DeepSeek V4 Flash

MoEDeepSeek

Distilled Flash variant of V4 Pro. Near-Pro performance on simple agent tasks at 1/12 the cost ($0.14/$0.28 per M tokens). Inference gap closes to zero on routine tasks vs V4 Pro. Not local-runnable on consumer hardware.

Provider

DeepSeek

Parameters

158B active / 292B total (MoE)

Context

128K

Released

2026-04-24

VRAM Requirements by Quantization

MethodDisk SizeVRAM RequiredFits GPUs
BF16 (reference)530 GB580 GB0 GPUs
Q4_K_M (est.)165 GB180 GB0 GPUs

Benchmark Scores

mmlu88%
humaneval90%

Scores are approximate and may vary by quantization level.

HuggingFace

deepseek-ai/DeepSeek-V4-Flash

View on HF →