Qwen 3.6 35B-A3B

MoE, Qwen

Alibaba's sparse mixture-of-experts (MoE) model with 35B total parameters, of which only 3B are active per token. It is a major jump on coding benchmarks over Qwen 3.5, with per-token inference cost closer to that of a 3B dense model. An abliterated variant is also available on HuggingFace.
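As a rough illustration of why per-token cost tracks the active parameter count, the sketch below applies the common rule of thumb of about 2 FLOPs per active parameter per generated token. The rule of thumb and the function name are assumptions for illustration, not figures from this card; the parameter counts are the rounded 3B and 35B values listed here.

```python
# Rough per-token compute estimate for a sparse MoE model.
# Assumption: ~2 FLOPs per active parameter per token (forward pass only),
# ignoring attention over the context window and expert-routing overhead.

ACTIVE_PARAMS = 3e9   # parameters used per token (rounded, from the card)
TOTAL_PARAMS = 35e9   # parameters stored in memory (rounded, from the card)


def flops_per_token(params: float) -> float:
    """Approximate forward-pass FLOPs needed to produce one token."""
    return 2.0 * params


moe_cost = flops_per_token(ACTIVE_PARAMS)
dense_cost = flops_per_token(TOTAL_PARAMS)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"MoE (3B active): {moe_cost / 1e9:.0f} GFLOPs per token")
print(f"Dense 35B:       {dense_cost / 1e9:.0f} GFLOPs per token")
print(f"Active fraction: {active_fraction:.1%}")
```

Memory is a different matter: all 35B parameters generally still have to be loaded, so memory requirements follow the total size rather than the active size.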

Provider

Alibaba

Parameters

3B active / 35B total (MoE)

Context

131,072 tokens (128K)

Released

2026-04-17

VRAM Requirements by Quantization

Method	Disk Size	VRAM Required	Fits GPUs
Q8_0	36 GB	38 GB	1 GPU
Q4_K_M	19.5 GB	21 GB	7 GPUs
Q4_0	18.5 GB	20 GB	7 GPUs
Q2_K	11.5 GB	13 GB	12 GPUs
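The table can be sanity-checked with a little arithmetic. The sketch below derives the effective bits per weight and the gap between disk size and VRAM for each row. It assumes 1 GB = 10^9 bytes and the rounded 35B parameter count; the helper names are hypothetical.

```python
# Derive effective bits per weight and runtime overhead from the table rows.
# Assumptions: 1 GB = 1e9 bytes, and a rounded total of 35e9 parameters.

TOTAL_PARAMS = 35e9

# quantization -> (disk size in GB, VRAM required in GB), copied from the table
QUANTS = {
    "Q8_0": (36.0, 38.0),
    "Q4_K_M": (19.5, 21.0),
    "Q4_0": (18.5, 20.0),
    "Q2_K": (11.5, 13.0),
}


def bits_per_weight(disk_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average number of bits stored per parameter for a given file size."""
    return disk_gb * 1e9 * 8 / params


def overhead_gb(disk_gb: float, vram_gb: float) -> float:
    """VRAM listed beyond the weights themselves (KV cache, buffers, etc.)."""
    return vram_gb - disk_gb


for name, (disk, vram) in QUANTS.items():
    print(
        f"{name:7s} {bits_per_weight(disk):.2f} bits/weight, "
        f"{overhead_gb(disk, vram):.1f} GB overhead"
    )
```

The overhead in the table is a flat 1.5 to 2 GB, so longer contexts, which need a larger KV cache, may require more VRAM than listed.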

Install with Ollama

Run in terminal:

ollama pull qwen3.6:35b-a3b

Minimum 13 GB VRAM required (the Q2_K figure from the table above). Install Ollama from ollama.com.
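Once the model is pulled, it can also be queried from code. The following is a minimal sketch using only the Python standard library and Ollama's local HTTP endpoint. It assumes the server is running on the default address `http://localhost:11434` and that the tag from the pull command is available; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "qwen3.6:35b-a3b"  # tag used in the pull command


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL_TAG, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(generate("Write a Python function that reverses a string."))
```

Setting `"stream": False` returns the whole completion as a single JSON object, which keeps the sketch short; a streaming client would read the response line by line instead.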

Benchmark Scores

MMLU: 83.5%

HumanEval: 88.2%

Scores are approximate and may vary by quantization level.

Compatible GPUs (12)

AMD RX 7900 GRE (16GB)
AMD RX 7900 XTX (24GB)
Apple M4 Pro (24GB)
Apple M3 Max (36GB)
Apple M4 Max (48GB)
NVIDIA RTX 4080 SUPER (16GB)
NVIDIA RTX 4060 Ti 16GB (16GB)
NVIDIA RTX 4070 Ti SUPER (16GB)
NVIDIA RTX 5080 (16GB)
NVIDIA RTX 3090 (24GB)
NVIDIA RTX 4090 (24GB)
NVIDIA RTX 5090 (32GB)
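The per-quantization GPU counts in the VRAM table follow from comparing each card's memory with the VRAM requirement. A short sketch of that check, using only the figures listed on this page (the function name is hypothetical, and the simple "memory >= requirement" rule is an assumption about how the counts were produced):

```python
# GPU name -> memory in GB, as listed under Compatible GPUs
GPUS = {
    "AMD RX 7900 GRE": 16,
    "AMD RX 7900 XTX": 24,
    "Apple M4 Pro": 24,
    "Apple M3 Max": 36,
    "Apple M4 Max": 48,
    "NVIDIA RTX 4080 SUPER": 16,
    "NVIDIA RTX 4060 Ti 16GB": 16,
    "NVIDIA RTX 4070 Ti SUPER": 16,
    "NVIDIA RTX 5080": 16,
    "NVIDIA RTX 3090": 24,
    "NVIDIA RTX 4090": 24,
    "NVIDIA RTX 5090": 32,
}

# quantization -> VRAM required in GB, from the VRAM table
VRAM_REQUIRED = {"Q8_0": 38, "Q4_K_M": 21, "Q4_0": 20, "Q2_K": 13}


def fitting_gpus(required_gb: float) -> list[str]:
    """Return the GPUs whose memory meets or exceeds the requirement."""
    return [name for name, vram in GPUS.items() if vram >= required_gb]


for quant, required in VRAM_REQUIRED.items():
    fits = fitting_gpus(required)
    print(f"{quant:7s} needs {required} GB: {len(fits)} GPU(s) fit")
```

Under this rule the 16 GB cards only fit the Q2_K build, while Q8_0 fits only the 48 GB Apple M4 Max.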

HuggingFace

Qwen/Qwen3.6-35B-A3B-Instruct
