runlocal.cc

Blog

Condensed weekly issues — model releases, benchmarks, Ollama workflows. Also published on Substack.

Issue #12 · May 27, 2026

A FLUX-class 4B image model, squeezed to 1.21 GB — and yes, it runs in your browser

PrismML's Bonsai Image quantizes a FLUX.2-derived diffusion transformer down to sub-2-bit weights, with MLX, CUDA, WebGPU and iPhone builds. The footprint and speed are real; the quality cost is real too. Here's the honest tradeoff — official numbers, our own benchmarks pending.

Issue #11 · May 21, 2026

Claude shipped 'Agent Skills'. r/LocalLLaMA already converged on the canonical 4 — copy these into your local agent today.

Plan-first, test-first, refactor-with-constraint, debug-loop. Four skill files that should ship with every local-LLM coding agent — sourced from the threads that actually got Qwen3.6 27B to daily-driver quality.

Issue #10 · May 21, 2026

Stop trying to use Cline locally. r/LocalLLaMA's real answer for daily-driving Qwen3.6 27B + MTP.

Cloud agents fall apart on local models. Three scaffold-first tools the community is actually shipping with — SmallCode, PI Coding Agent, and little-coder — plus a decision matrix by VRAM.

Issue #9 · May 21, 2026

llama.cpp + MTP is here: Qwen3.6 27B hits 2.17× on an RTX 3090 — should you upgrade tonight?

Multi-Token Prediction landed in mainline llama.cpp this week. The real numbers across RTX 5090 / 3090 / Strix Halo / 8GB cards, and a one-table answer to whether MTP is worth your weekend.

Issue #8 · May 20, 2026

AI coding agents compared (2026): Claude Code vs Cursor vs OpenCode vs OpenClaw vs Gemini CLI vs Cluely vs z.ai — and the best local models on Ollama

Seven cloud agents, four local models, one decision matrix. Pricing, underlying models, benchmark numbers, and the cases where each one is the right pick.

Issue #7 · May 20, 2026

OpenClaw + Ollama: the complete setup after Anthropic's third-party clampdown

Search interest in `openclaw ollama` is up 507,000% in 30 days. Here's the full migration path off Claude Pro — install, model pick, common errors, and what you actually give up.

Issue #6 · May 19, 2026

The inference-engine wave: MTP, DFlash, PAGED MoE

Single-card 600 tok/s. 397B on a 64GB Mac. 85 tok/s at 524k context. Three weeks of runtime breakthroughs reset what 'local' means.

Issue #5 · May 8, 2026

Omni-modal locals land, GPT-5.5 resets the frontier

Nemotron-3 Nano Omni puts four modalities on a 24GB card. OpenAI ships GPT-5.5 to everyone. Xiaomi enters the open-weight flagship tier.

Issue #4 · Apr 25, 2026

The local-AI map redrawn in 7 days

Qwen3.6 27B beats a 397B predecessor. Gemma 4 26B-A4B lands with 22 quants. Kimi K2.6 hits Opus parity at 1T params.

Issue #3 · Apr 12, 2026

Persistent AI memory on a Raspberry Pi 5

Local embeddings + ChromaDB + Ollama in ~150 lines. ~$100 of hardware. No tokens.

Issue #2 · Apr 12, 2026

Gemma 4 changes local LLM — and the first killer use case is Claude Code

88% accuracy at 175 tok/s, 17GB VRAM, and how to cut your Claude Code bill with one env var

Issue #1 · Apr 11, 2026

Your local AI stack is already being scanned

113K requests, a Raspberry Pi honeypot, and the attack surface you didn't know you had