runlocal.cc

Nemotron-3 Nano Omni 30B-A3B

MoE · NVIDIA Open License

Hybrid Mamba-2 + MoE + Attention architecture. Native, unified understanding across four modalities (text, image, video, audio); best-in-class on MMLongBench-Doc, OCRBenchV2, and VoiceBench. Mamba layers give ~4× compute efficiency. 256K context (up to 1M). Available as BF16 weights on HuggingFace, as an Unsloth GGUF, and through a free OpenRouter tier.

Provider

NVIDIA

Parameters

3.5B active / 30B total (Mamba-2 + MoE + Attention)

Context

262,144 tokens (256K)

Released

2026-04-29

VRAM Requirements by Quantization

Method    Disk Size    VRAM Required    Fits GPUs
Q8_0      31 GB        33 GB            4 GPUs
Q4_K_M    17.5 GB      19 GB            9 GPUs
Q4_0      16.5 GB      18 GB            9 GPUs
Q2_K      10.5 GB      12 GB            16 GPUs
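
As a rough illustration of how to read the table, the sketch below picks the largest listed quantization that fits a given amount of VRAM. The figures are copied from the table; the function name `best_quantization` is hypothetical, and using the VRAM requirement as a proxy for precision is an assumption, not something this page states.

```python
# Rows copied from the table above: (method, disk size in GB, VRAM required in GB).
QUANTIZATIONS = [
    ("Q8_0", 31.0, 33.0),
    ("Q4_K_M", 17.5, 19.0),
    ("Q4_0", 16.5, 18.0),
    ("Q2_K", 10.5, 12.0),
]


def best_quantization(vram_gb):
    """Return the listed method with the largest VRAM requirement that still fits, or None."""
    fitting = [row for row in QUANTIZATIONS if row[2] <= vram_gb]
    if not fitting:
        return None
    # Assumption: a larger VRAM requirement implies a higher-precision quantization.
    return max(fitting, key=lambda row: row[2])[0]


if __name__ == "__main__":
    for vram in (8, 12, 16, 24, 48):
        print(f"{vram} GB -> {best_quantization(vram)}")
```

The listed VRAM figures sit 1.5 to 2 GB above the disk sizes, and long contexts may need more headroom than the table shows.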

Install with Ollama

Run in terminal:

ollama pull nemotron3-nano-omni

Minimum 12 GB VRAM required (the Q2_K quantization). Install Ollama from ollama.com.
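
Once the model is pulled, it can also be queried from a script through Ollama's local REST API. The following is a minimal sketch, assuming an Ollama server is running on its default port (11434) and that the model tag from the pull command above is available there; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL_TAG = "nemotron3-nano-omni"  # tag taken from the pull command above


def build_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Describe a mixture-of-experts model in one sentence.")` returns the model's reply as a string, provided the server is reachable. With `"stream": False` the server sends one JSON object instead of a stream of chunks.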

Benchmark Scores

MMLU: 80.5%
HumanEval: 78%

Scores are approximate and may vary by quantization level.

Compatible GPUs (16)

HuggingFace

nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning
