Nemotron-3 Nano Omni 30B-A3B

MoE

NVIDIA Open License

Hybrid Mamba-2 + MoE + Attention architecture. Native, unified understanding across four modalities (text, image, video, audio); best-in-class on MMLongBench-Doc, OCRBenchV2 and VoiceBench. The Mamba layers give roughly 4× compute efficiency. 256K context (extendable up to 1M). BF16 weights are available on HuggingFace, along with Unsloth GGUF builds and a free OpenRouter tier.

Provider

NVIDIA

Parameters

3.5B active / 30B total (Mamba-2 + MoE + Attention)

Context

262,144 tokens (256K)

Released

2026-04-29

VRAM Requirements by Quantization

Quantization	Disk Size	VRAM Required	Compatible GPUs
Q8_0	31 GB	33 GB	4 GPUs
Q4_K_M	17.5 GB	19 GB	9 GPUs
Q4_0	16.5 GB	18 GB	9 GPUs
Q2_K	10.5 GB	12 GB	16 GPUs
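
As a rough illustration of how to read the table, the sketch below picks the largest quantization whose listed VRAM requirement fits a given card. It treats the "VRAM Required" column as a hard threshold, which is an assumption: actual usage also depends on context length and runtime overhead. The function name `largest_quant_that_fits` is hypothetical.

```python
# VRAM requirements in GB, copied from the table above, largest first.
QUANT_VRAM_GB = [
    ("Q8_0", 33.0),
    ("Q4_K_M", 19.0),
    ("Q4_0", 18.0),
    ("Q2_K", 12.0),
]


def largest_quant_that_fits(vram_gb):
    """Return the first (largest) quantization that fits in vram_gb, or None."""
    for name, required_gb in QUANT_VRAM_GB:
        if vram_gb >= required_gb:
            return name
    return None


# Print the suggestion for a few common memory sizes.
for vram in (8, 16, 24, 48):
    print(f"{vram} GB -> {largest_quant_that_fits(vram)}")
```

Under these figures a 16 GB card lands on Q2_K and a 24 GB card on Q4_K_M; anything below 12 GB returns `None`.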

Install with Ollama

Run in terminal:

ollama pull nemotron3-nano-omni

A minimum of 12 GB VRAM is required (the Q2_K figure in the table above). Install Ollama from ollama.com.
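
Once the model has been pulled, it can be queried through Ollama's local REST API. The following is a minimal sketch using only the Python standard library. It assumes Ollama is running on its default port (11434) and that the tag from the pull command above is the one installed; the helper names `build_request` and `generate` are illustrative, not part of any Ollama client.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "nemotron3-nano-omni"  # tag taken from the pull command above


def build_request(prompt, model=MODEL_TAG):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model=MODEL_TAG, timeout=300):
    """Send the prompt to a running Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("Explain mixture-of-experts in one sentence."))
```

Setting `"stream": False` asks the server for a single JSON object instead of a stream of partial responses, which keeps the parsing simple for a first test.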

Benchmark Scores

MMLU: 80.5%

HumanEval: 78%

Scores are approximate and may vary by quantization level.

Compatible GPUs (16)

AMD RX 9070 XT (16GB)
AMD RX 7900 GRE (16GB)
AMD RX 7900 XTX (24GB)
AMD Ryzen AI Max+ 395 (unified memory) (64GB)
Apple M4 Pro (24GB)
Apple M3 Max (36GB)
Apple M4 Max (48GB)
Apple M4 Ultra (64GB)
NVIDIA RTX 4070 Ti SUPER (16GB)
NVIDIA RTX 4080 SUPER (16GB)
NVIDIA RTX 5070 Ti (16GB)
NVIDIA RTX 4060 Ti 16GB (16GB)
NVIDIA RTX 5080 (16GB)
NVIDIA RTX 4090 (24GB)
NVIDIA RTX 3090 (24GB)
NVIDIA RTX 5090 (32GB)
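
The per-quantization GPU counts in the VRAM table can be reproduced from this list. The sketch below assumes a GPU counts as compatible when its listed memory is at least the "VRAM Required" figure; that rule is inferred from the numbers rather than stated on the page.

```python
# Memory in GB for each GPU in the compatibility list above.
GPU_MEMORY_GB = {
    "AMD RX 9070 XT": 16,
    "AMD RX 7900 GRE": 16,
    "AMD RX 7900 XTX": 24,
    "AMD Ryzen AI Max+ 395": 64,
    "Apple M4 Pro": 24,
    "Apple M3 Max": 36,
    "Apple M4 Max": 48,
    "Apple M4 Ultra": 64,
    "NVIDIA RTX 4070 Ti SUPER": 16,
    "NVIDIA RTX 4080 SUPER": 16,
    "NVIDIA RTX 5070 Ti": 16,
    "NVIDIA RTX 4060 Ti 16GB": 16,
    "NVIDIA RTX 5080": 16,
    "NVIDIA RTX 4090": 24,
    "NVIDIA RTX 3090": 24,
    "NVIDIA RTX 5090": 32,
}

# VRAM requirements in GB, copied from the quantization table.
REQUIRED_VRAM_GB = {"Q8_0": 33, "Q4_K_M": 19, "Q4_0": 18, "Q2_K": 12}


def compatible_gpus(required_gb):
    """Return the names of GPUs whose memory meets the requirement."""
    return [name for name, mem in GPU_MEMORY_GB.items() if mem >= required_gb]


# Count the compatible GPUs for each quantization.
for quant, required in REQUIRED_VRAM_GB.items():
    print(f"{quant}: {len(compatible_gpus(required))} GPUs")
```

With the figures as listed, this yields 4, 9, 9 and 16 GPUs for Q8_0, Q4_K_M, Q4_0 and Q2_K respectively, matching the table.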

HuggingFace

nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning
