Blog
Curated picks from the weekly issue: model releases, benchmarks, and Ollama workflows. Also published on Substack.
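A FLUX-class 4B image model compressed to 1.21 GB, and it runs in the browser
PrismML's Bonsai Image quantizes a FLUX.2-derived diffusion transformer down to sub-2-bit and ships builds for MLX, CUDA, WebGPU, and iPhone. The footprint and speed are real, but so is the quality cost. An honest look at the trade-offs based on the official numbers (our own benchmarks to follow).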
Claude shipped 'Agent Skills'; r/LocalLLaMA wrote the canonical four a month earlier
plan-first / test-first / refactor-with-constraint / debug-loop. Four skill files that every local LLM coding agent should ship by default, extracted from the thread that actually made Qwen3.6 27B a daily driver.
Stop trying to use Cline locally. r/LocalLLaMA's real answer for daily-driving Qwen3.6 27B + MTP.
Cloud agents fall apart on local models. Three scaffold-first tools the community is actually shipping with — SmallCode, PI Coding Agent, and little-coder — plus a decision matrix by VRAM.
llama.cpp + MTP is here: Qwen3.6 27B hits 2.17× on an RTX 3090 — should you upgrade tonight?
Multi-Token Prediction landed in mainline llama.cpp this week. The real numbers across RTX 5090 / 3090 / Strix Halo / 8GB cards, and a one-table answer to whether MTP is worth your weekend.
AI coding agents compared (2026): Claude Code vs Cursor vs OpenCode vs OpenClaw vs Gemini CLI vs Cluely vs z.ai — and the best local models on Ollama
Seven cloud agents, four local models, one decision matrix. Pricing, underlying models, benchmark numbers, and the cases where each one is the right pick.
OpenClaw + Ollama: the complete setup after Anthropic's third-party clampdown
Search interest in `openclaw ollama` is up 507,000% in 30 days. Here's the full migration path off Claude Pro — install, model pick, common errors, and what you actually give up.
The inference-engine wave: MTP, DFlash, PAGED MoE
Single-card 600 tok/s. 397B on a 64GB Mac. 85 tok/s at 524k context. Three weeks of runtime breakthroughs reset what 'local' means.
Omni-modal locals land, GPT-5.5 resets the frontier
Nemotron-3 Nano Omni puts four modalities on a 24GB card. OpenAI ships GPT-5.5 to everyone. Xiaomi enters the open-weight flagship tier.
The local-AI map redrawn in 7 days
Qwen 3.6 27B beats a 397B predecessor. Gemma 4 26B-A4B lands with 22 quants. Kimi K2.6 hits Opus parity at 1T params.
Persistent AI memory on a Raspberry Pi 5
Local embeddings + ChromaDB + Ollama in ~150 lines. ~$100 of hardware. No tokens.
Gemma 4 changes local LLM — and the first killer use case is Claude Code
88% accuracy at 175 tok/s, 17GB VRAM, and how to cut your Claude Code bill with one env var
Your local AI stack is already being scanned
113K requests, a Raspberry Pi honeypot, and the attack surface you didn't know you had