DeepSeek V4 Pro
DeepSeek's frontier 1.6T-parameter MoE, trained with FP4 quantization-aware training (QAT) at scale. It introduces a novel Hybrid Attention scheme (CSA+HCA) and Manifold-Constrained Hyper-Connections in place of standard residual connections. API pricing is $1.74 per million input tokens and $3.48 per million output tokens; the model is too large to run locally.
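As a sanity check on the listed pricing, the cost of a request is a simple linear function of token counts. A minimal sketch; the per-token rates come from the description above, while the request sizes in the example are hypothetical:

```python
# Listed API pricing: $1.74 per million input tokens,
# $3.48 per million output tokens.
INPUT_PRICE_PER_M = 1.74
OUTPUT_PRICE_PER_M = 3.48

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one API request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 10K-token prompt with a 2K-token completion (hypothetical sizes).
print(f"${request_cost(10_000, 2_000):.4f}")  # -> $0.0244
```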
Model Details

| Field | Value |
|---|---|
| Provider | DeepSeek |
| Parameters | 862B active / 1.6T total (MoE) |
| Context | 128K tokens |
| Released | 2026-04-24 |
VRAM Requirements by Quantization
| Method | Disk Size | VRAM Required | Single-GPU Fit |
|---|---|---|---|
| BF16 (reference) | 2900 GB | 3100 GB | None |
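The BF16 row matches a back-of-the-envelope estimate: raw weight size is parameter count × bits per parameter, and 1.6T parameters at 16 bits comes to about 2980 GiB, in line with the 2900 GB disk figure (which appears to use binary gigabytes). A minimal sketch; the precisions other than BF16 are illustrative, not rows from the table:

```python
def weight_size_gib(total_params: float, bits_per_param: float) -> float:
    """Raw weight footprint in GiB: params * bits_per_param / 8 bytes, / 2**30."""
    return total_params * bits_per_param / 8 / 2**30

PARAMS = 1.6e12  # 1.6T total parameters (MoE)

# Illustrative precisions; the table above lists only BF16.
for name, bits in [("BF16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: ~{weight_size_gib(PARAMS, bits):,.0f} GiB of weights")
# BF16 -> ~2,980 GiB, consistent with the table's 2900 GB disk size; the
# 3100 GB VRAM figure adds runtime overhead (KV cache, activations) on top.
```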
Benchmark Scores
| Benchmark | Score |
|---|---|
| MMLU | 91% |
| HumanEval | 93.5% |
Scores are approximate and may vary by quantization level.
HuggingFace: `deepseek-ai/DeepSeek-V4-Pro`