Qwen 3.6 35B-A3B
MoE, Qwen
Alibaba's sparse mixture-of-experts (MoE) model with 35B total parameters, of which only 3B are active per token. It shows a major jump on coding benchmarks over Qwen 3.5, with inference cost closer to that of a 3B dense model. An abliterated variant is also available on HuggingFace.
Provider
Alibaba
Parameters
3B active / 35B total (MoE)
Context
131,072 tokens (128K)
Released
2026-04-17
VRAM Requirements by Quantization
| Quantization | Disk Size | VRAM Required | Compatible GPUs (of 16) |
|---|---|---|---|
| Q8_0 | 36 GB | 38 GB | 3 GPUs |
| Q4_K_M | 19.5 GB | 21 GB | 9 GPUs |
| Q4_0 | 18.5 GB | 20 GB | 9 GPUs |
| Q2_K | 11.5 GB | 13 GB | 16 GPUs |
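The disk sizes in the table can be turned into an average bits-per-weight figure, which makes the quantization levels easier to compare. The following is a rough sketch, assuming exactly 35B parameters and 1 GB = 1e9 bytes; the helper names are illustrative, and real GGUF files mix tensor types, so the results are averages rather than exact per-tensor widths.

```python
# Figures copied from the table above: (disk size in GB, VRAM required in GB).
QUANTS = {
    "Q8_0": (36.0, 38.0),
    "Q4_K_M": (19.5, 21.0),
    "Q4_0": (18.5, 20.0),
    "Q2_K": (11.5, 13.0),
}

TOTAL_PARAMS = 35e9  # assumption: exactly 35B weights in the file


def bits_per_weight(disk_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return disk_gb * 1e9 * 8 / params


def runtime_overhead_gb(disk_gb: float, vram_gb: float) -> float:
    """Gap between listed VRAM and file size (KV cache, buffers, etc.)."""
    return vram_gb - disk_gb


for name, (disk, vram) in QUANTS.items():
    print(
        f"{name:7s} {bits_per_weight(disk):.2f} bits/weight, "
        f"{runtime_overhead_gb(disk, vram):.1f} GB overhead"
    )
```

Under these assumptions, the listed VRAM figures sit 1.5 to 2 GB above the file sizes, which is the headroom the table budgets beyond the weights themselves.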
Install with Ollama
Run in terminal:
ollama pull qwen3.6:35b-a3b
Minimum 13GB VRAM required. Install Ollama from ollama.com
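Once the model has been pulled, it can be queried through Ollama's local HTTP API. The sketch below assumes the Ollama server is running on its default port (11434) and that the tag from the pull command is installed; the helper names `build_request` and `generate` are illustrative, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3.6:35b-a3b"  # tag taken from the pull command


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a binary search in Python.")` would return the model's reply as a string. The generous timeout is a guess to allow for the first-request model load.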
Benchmark Scores
MMLU: 83.5%
HumanEval: 88.2%
Scores are approximate and may vary by quantization level.
MTP (Multi-Token Prediction)
The model ships with MTP heads, which work with llama.cpp builds dated 2026-05-16 or later, LM Studio, and Lemonade.
| Hardware | Speedup (TG) |
|---|---|
| AMD Strix Halo | 1.60× |
Because this is an MoE model, MTP gains are mixed: expert routing limits how many draft tokens can be verified per forward pass.
Compatible GPUs (16)
AMD RX 9070 XT (16GB)
AMD RX 7900 GRE (16GB)
AMD RX 7900 XTX (24GB)
AMD Ryzen AI Max+ 395 (64GB unified memory)
Apple M4 Pro (24GB)
Apple M3 Max (36GB)
Apple M4 Max (48GB)
Apple M4 Ultra (64GB)
NVIDIA RTX 4070 Ti SUPER (16GB)
NVIDIA RTX 4080 SUPER (16GB)
NVIDIA RTX 5070 Ti (16GB)
NVIDIA RTX 4060 Ti (16GB)
NVIDIA RTX 5080 (16GB)
NVIDIA RTX 4090 (24GB)
NVIDIA RTX 3090 (24GB)
NVIDIA RTX 5090 (32GB)
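The per-quantization GPU counts in the VRAM table can be reproduced from this list. The sketch below assumes a GPU counts as compatible when its listed memory is at least the "VRAM Required" figure; the page does not state that rule explicitly, and the function name is illustrative.

```python
# Memory in GB for each GPU in the compatibility list.
GPU_MEMORY_GB = {
    "AMD RX 9070 XT": 16,
    "AMD RX 7900 GRE": 16,
    "AMD RX 7900 XTX": 24,
    "AMD Ryzen AI Max+ 395": 64,  # unified memory
    "Apple M4 Pro": 24,
    "Apple M3 Max": 36,
    "Apple M4 Max": 48,
    "Apple M4 Ultra": 64,
    "NVIDIA RTX 4070 Ti SUPER": 16,
    "NVIDIA RTX 4080 SUPER": 16,
    "NVIDIA RTX 5070 Ti": 16,
    "NVIDIA RTX 4060 Ti": 16,
    "NVIDIA RTX 5080": 16,
    "NVIDIA RTX 4090": 24,
    "NVIDIA RTX 3090": 24,
    "NVIDIA RTX 5090": 32,
}

# VRAM required per quantization, from the VRAM table.
VRAM_REQUIRED_GB = {"Q8_0": 38.0, "Q4_K_M": 21.0, "Q4_0": 20.0, "Q2_K": 13.0}


def fitting_gpus(required_gb: float) -> list[str]:
    """Assumed rule: a GPU fits when its memory is >= the requirement."""
    return [name for name, mem in GPU_MEMORY_GB.items() if mem >= required_gb]


for quant, need in VRAM_REQUIRED_GB.items():
    print(f"{quant}: {len(fitting_gpus(need))} of {len(GPU_MEMORY_GB)} GPUs")
```

With this rule the counts come out to 3, 9, 9, and 16, matching the table. Only the 48GB and 64GB devices can hold Q8_0, while every listed GPU can hold Q2_K.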
HuggingFace
Qwen/Qwen3.6-35B-A3B-Instruct