runlocal.cc
Check My GPU →
Issue #8May 20, 2026

AI coding agents compared (2026): Claude Code vs Cursor vs OpenCode vs OpenClaw vs Gemini CLI vs Cluely vs z.ai — and the best local models on Ollama

Seven cloud agents, four local models, one decision matrix. Pricing, underlying models, benchmark numbers, and the cases where each one is the right pick.

There is no single best coding agent in 2026. Pick the one that matches your budget tier, privacy stance, and model preference. This page gives you the matrix, the numbers, and the local fallback for each tier.

The 30-second answer

If you want… Pick
Strongest reasoning, money no object Claude Code (Anthropic Max)
Best IDE-integrated experience Cursor
Open-source CLI, terminal-first OpenCode
Open-source agent + local model + chat UIs OpenClaw + Ollama
Free with a Google account Gemini CLI
Floating overlay that works in any app Cluely
Strong cloud model at the lowest paid tier z.ai (GLM) coding plan
Fully offline / no telemetry OpenClaw or OpenCode + Ollama

The rest of this page justifies each pick.

Why this comparison exists right now

Two structural events in 2026 made every developer reconsider their coding stack:

  1. Anthropic restricted third-party Claude wrappers (April 2026). OpenClaw and OpenCode users running on Claude Pro/Max started getting throttled. Migration searches followed: openclaw ollama +507,000%, cluely pricing +92,600%, gemini cli +34,800%, z ai coding plan +8,500% (Google Trends, last 12 months).
  2. Local coding models finally caught up enough to be usable. Qwen 3.6 27B posts SWE-bench Verified 77.2 at 17 GB Q4. The runtime-engine wave we covered in issue #6 — MTP, DFlash, PAGED MoE — closed the practical throughput gap for agentic loops.

The lineup below reflects this market, not the one from twelve months ago.

Side-by-side

Agent Pricing (USD, 2026-05) Underlying model(s) Runs locally? Open source? UX
Claude Code $20/mo Pro · $200/mo Max Claude Sonnet 4.6 / Opus 4.7 No No CLI + IDE plugins
Cursor $20/mo Pro · $40/mo Business Claude / GPT / proprietary routing No No VS Code fork
OpenCode Free (BYOK or local) Any (Ollama, Anthropic, OpenAI…) Yes Yes Terminal TUI
OpenClaw Free (BYOK or local) Any (Ollama, vLLM, LM Studio, llama.cpp, Anthropic) Yes Yes — Node.js CLI, Telegram, Discord, web
Gemini CLI Free tier · Vertex AI paid Gemini 2.5 Pro / Flash No Partially (client OSS) CLI
Cluely ~$20/mo Pro Proprietary multi-model routing No No Floating overlay
z.ai coding plan from ~$3/mo entry GLM 4.6 / GLM 5 / GLM 5.1 No (cloud); weights partially open Partially CLI + IDE

Prices verified on vendor pages 2026-05-20. They move — confirm at checkout.

Deep dive per tool

1. Claude Code

Strongest raw reasoning, longest practical context (200k), most reliable tool-use loop, native Skills + sub-agents.

The downside is cost at scale and the policy risk you saw in April. No local fallback. No offline mode.

Pick it if: you're a senior engineer or tech lead doing complex refactors, on a team where developer-hour cost dominates token cost.

2. Cursor

The tightest IDE integration on the market — it is a VS Code fork. Autocomplete latency is best-in-class. Multi-file context understanding is genuinely good.

Cloud-only. Lock-in to Cursor's editor. The Composer agent loop is still weaker than Claude Code on multi-step tasks.

Pick it if: you live in your IDE and want completion + chat in one tool.

3. OpenCode

Open source, terminal-native, BYOM. The source is readable and extensible if you want to audit or fork. Privacy-first.

Smaller Skills ecosystem than OpenClaw, no native chat-platform UI, learning curve if you're not a CLI native.

Pick it if: you're a backend or infra engineer comfortable in the terminal, especially if you want to read or extend the source.

4. OpenClaw

One of the largest communities in the open-agent space — tens of thousands of GitHub stars, 800+ community Skills. Works with Telegram and Discord as a UI, which is uniquely useful for triggering coding work from your phone. Broadest model backend support of any agent here.

The Node.js dependency tree is heavy. Skills quality varies. The original Claude piggyback is now restricted — you need a local model to use it cost-effectively.

Pick it if: you're an indie hacker or side-project tinkerer who wants to fire off coding tasks from anywhere. See our setup guide.

5. Gemini CLI

Generous free tier with a Google account. Gemini 2.5 Pro is genuinely competitive on reasoning. The 1M+ context window is real and usable.

Tool-use reliability still lags Claude. The ecosystem is maturing but small. The free quota exists but isn't unlimited.

Pick it if: you're cost-sensitive, a student, a hobbyist, or already inside the Google Cloud ecosystem.

6. Cluely

Lightweight floating overlay — works in any app, not just an IDE. Strong pitch for the non-IDE coding contexts: interviews, learning, pair-debugging meetings, browser dev tools, Notion, Linear. Well-funded and the most-discussed new entrant of 2026cluely valuation is up 119,400%, cluely pricing 92,600%, cluely jobs 99,700% on Google Trends (last 12 months).

Closed model routing (you don't choose). Always-on screen capture has obvious privacy implications. $20/mo is steep for the breadth.

Pick it if: you code across many apps, not just an IDE, and want one assistant everywhere.

7. z.ai (GLM) coding plan

The GLM family — GLM 4.6, GLM 5, GLM 5.1 — is genuinely strong. GLM 5.1 posts MMLU 87.2 and HumanEval 91.5. The entry plan undercuts Claude Pro by roughly 80%.

Western developer awareness is still low (z ai coding plan is rising but from a tiny base). Data residency is in China for the cloud tier. The weights are partially open but you need a 200 GB+ VRAM cluster to self-host — not a consumer GPU target. See our model pages for the specifics.

Pick it if: you're cost-sensitive, comfortable with Chinese cloud providers, and want frontier-class quality at a fraction of Anthropic pricing.

Local Ollama coding LLMs — which to run

If you've picked OpenClaw or OpenCode with a local backend, the next call is which model to pull. The four serious 2026 contenders, all benchmark numbers vendor-reported and rounded:

Model Size VRAM (Q4) HumanEval SWE-bench Verified Speed (Apple M4 Max 48 GB) Notes
Qwen 3.6 27B 27B dense 17 GB 88.5 77.2 ~45 tok/s Multimodal. Claude Code / Qwen Code tooling compatible. Default pick.
Gemma 4 27B 27B MoE (4B active) 17 GB 80.1 ~55 ~80 tok/s Faster per token, weaker on multi-step agentic
Qwen 3.5 9B 9B 6.5 GB 74.2 ~40 ~110 tok/s Best entry tier, fits 16 GB MacBooks
Qwen 3.5 27B 27B 17 GB 79.8 ~62 ~45 tok/s Prior gen, still solid; cheaper to run when quantized

Benchmarks are vendor-reported on official releases; independent re-runs (LiveCodeBench, BigCodeBench) typically come in 3–8 percentage points lower. Use these as a relative ranking, not gospel.

Practical recommendation matrix

Your machine Use case Pick
MacBook M1/M2, 16 GB RAM Daily edits, autocomplete Qwen 3.5 9B
MacBook M3/M4 Pro, 24+ GB Refactors, code review Qwen 3.6 27B Q4
Apple M4 Max 48 GB Full local agentic loop Qwen 3.6 27B Q8
Linux + RTX 4090 24 GB Best-in-class consumer Qwen 3.6 27B Q4 or Gemma 4 27B
Linux + RTX 5090 32 GB Headroom for context + KV cache Qwen 3.6 27B Q8
Anything older / Intel Mac Don't bother locally Gemini CLI free tier

Use the runlocal calculator to check what your specific card runs at what quant.

Honest local-vs-cloud reality check

Comparing publicly reported benchmarks, Qwen 3.6 27B trails Claude Sonnet 4.6 on SWE-bench Verified by roughly 10–15 percentage points and on LiveCodeBench by a similar margin. In practice the gap shows up on multi-file refactors and tricky concurrency bugs, not on routine work — test scaffolding, refactors of single files, glue code and docstrings come out usable on the first generation more often than not. We'll publish a full head-to-head with our own task set in a future issue; until then, take any "we ran X" claim from a vendor blog with the same skepticism.

The May 2026 inference engine breakthroughs we covered in issue #6 — MTP, DFlash, llama.cpp MTP merged to main — are quietly closing the throughput half of this gap. The quality half will take another generation.

Decision flowchart

Q1. Do you need offline / no-cloud / no telemetry?
    YES → OpenClaw or OpenCode + Ollama + Qwen 3.6 27B
    NO  → continue

Q2. Is your hourly rate > $50?
    YES → Claude Code (Max) or Cursor — your time dominates
    NO  → continue

Q3. Do you live inside the IDE?
    YES → Cursor
    NO  → continue

Q4. Cost-sensitive but want strong cloud models?
    YES → Gemini CLI (free) or z.ai coding plan ($3–15/mo)
    NO  → continue

Q5. Do you code across many apps, not just an IDE?
    YES → Cluely
    NO  → Claude Code Pro is the default safe pick

What we're watching next

  • Cluely vs Claude Code on multi-app workflows. The overlay model is genuinely new; head-to-head review in a future issue.
  • z.ai pricing aggression. If GLM 5.1 holds quality and z.ai keeps undercutting, expect Anthropic to respond on Pro pricing by Q3 2026.
  • Local 32B+ on consumer GPUs. Once the next-gen Qwen coder ships with broad Ollama support, the local/cloud quality gap narrows again.
  • Hermes Agent. Emerging Ollama-native agent framework, up 30,200% on Google Trends. We'll cover the install path once it stabilizes.

FAQ

Is Claude Code worth $200/month? If you bill at >$80/hour and the agent saves you ≥2.5 hours a month, yes. For hobbyists, no — start with Gemini CLI free tier or OpenClaw + Ollama.

Can I use OpenClaw without paying Anthropic? Yes — point it at Ollama. That's exactly the migration path most users took after the April 2026 restrictions. See our setup guide.

Is Cluely the same as Cursor? No. Cursor is an IDE. Cluely is a floating overlay that works on top of any app. Different category.

Cheapest paid coding agent that's actually usable? z.ai's GLM coding entry plan (~$3/mo) is the cheapest serious option in 2026-05. Gemini CLI free tier is also viable until you hit the quota.

Why is openclaw ollama searched so much? Anthropic's April 2026 third-party clampdown left thousands of OpenClaw users looking for a model backend that isn't rate-limited. Ollama was the obvious answer. We wrote the full migration guide as issue #7.