Blog

Curated weekly issues: model releases, benchmarks, and Ollama workflows. Also published on Substack.

Issue #12 · May 27, 2026

A FLUX-class 4B image model squeezed into 1.21 GB, and it runs in the browser

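PrismML's Bonsai Image quantizes a FLUX.2-derived diffusion transformer to below 2 bits and ships builds for MLX, CUDA, WebGPU, and iPhone. The footprint and speed gains are real, and so is the quality cost. An honest look at the trade-offs based on the official numbers (our own benchmarks will follow).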

Issue #11 · May 21, 2026

Claude shipped "Agent Skills"; r/LocalLLaMA wrote the canonical four a month earlier

plan-first / test-first / refactor-with-constraint / debug-loop: the four skill files every local LLM coding agent should ship with by default, distilled from the thread where Qwen3.6 27B actually became a daily driver.

Issue #10 · May 21, 2026

Stop trying to use Cline locally. r/LocalLLaMA's real answer for daily-driving Qwen3.6 27B + MTP.

Cloud agents fall apart on local models. Three scaffold-first tools the community is actually shipping with — SmallCode, PI Coding Agent, and little-coder — plus a decision matrix by VRAM.

Issue #9 · May 21, 2026

llama.cpp + MTP is here: Qwen3.6 27B hits 2.17× on an RTX 3090 — should you upgrade tonight?

Multi-Token Prediction landed in mainline llama.cpp this week. The real numbers across RTX 5090 / 3090 / Strix Halo / 8GB cards, and a one-table answer to whether MTP is worth your weekend.

Issue #8 · May 20, 2026

AI coding agents compared (2026): Claude Code vs Cursor vs OpenCode vs OpenClaw vs Gemini CLI vs Cluely vs z.ai — and the best local models on Ollama

Seven cloud agents, four local models, one decision matrix. Pricing, underlying models, benchmark numbers, and the cases where each one is the right pick.

Issue #7 · May 20, 2026

OpenClaw + Ollama: the complete setup after Anthropic's third-party clampdown

Search interest in `openclaw ollama` is up 507,000% in 30 days. Here's the full migration path off Claude Pro — install, model pick, common errors, and what you actually give up.

Issue #6 · May 19, 2026

The inference-engine wave: MTP, DFlash, PAGED MoE

Single-card 600 tok/s. 397B on a 64GB Mac. 85 tok/s at 524k context. Three weeks of runtime breakthroughs reset what 'local' means.

Issue #5 · May 8, 2026

Omni-modal locals land, GPT-5.5 resets the frontier

Nemotron-3 Nano Omni puts four modalities on a 24GB card. OpenAI ships GPT-5.5 to everyone. Xiaomi enters the open-weight flagship tier.

Issue #4 · April 25, 2026

The local-AI map redrawn in 7 days

Qwen 3.6 27B beats a 397B predecessor. Gemma 4 26B-A4B lands with 22 quants. Kimi K2.6 hits Opus parity at 1T params.

Issue #3 · April 12, 2026

Persistent AI memory on a Raspberry Pi 5

Local embeddings + ChromaDB + Ollama in ~150 lines. ~$100 of hardware. No tokens.

Issue #2 · April 12, 2026

Gemma 4 changes local LLM — and the first killer use case is Claude Code

88% accuracy at 175 tok/s on 17GB of VRAM, plus how to cut your Claude Code bill with one env var

Issue #1 · April 11, 2026

Your local AI stack is already being scanned

113K requests, a Raspberry Pi honeypot, and the attack surface you didn't know you had