Issue #11 · May 21, 2026

Claude shipped 'Agent Skills'. r/LocalLLaMA already converged on the canonical 4 — copy these into your local agent today.

Plan-first, test-first, refactor-with-constraint, debug-loop. Four skill files that should ship with every local-LLM coding agent — sourced from the threads that actually got Qwen3.6 27B to daily-driver quality.

Skills are mainstream now. The local version is different.

Claude's "Agent Skills" docs went live this spring and the term spiked: Google Trends interest for "Claude skill" was sitting at 88.5 out of 100 as of last week. Anthropic, Microsoft, OpenAI, VS Code, and a handful of "agent skill library" repos are all racing for the SERP.

Meanwhile, on r/LocalLLaMA, people have been quietly trading skill files for a month. They look different from the Anthropic ones. They're shorter, more restrictive, and written for models that don't internalize patterns the way Opus does: Qwen3.6 27B, Gemma 4 26B, and the new generation of small local models targeted by the scaffold-first tools from issue 10.

Issue 10 ended on a promise: the canonical skill library. This is it: four files, sourced from the highest-signal r/LocalLLaMA threads of the past 30 days, plus a matrix showing which model and VRAM tier each one suits.

Why local models need skill files more than cloud models do

Cloud models have eaten enough public code to internalize patterns like "before you write a function, write the test." When you ask Sonnet to do a refactor, the plan-then-execute loop is partially baked into the weights.

Local models smaller than ~70B haven't. They'll happily start writing code on turn 1 of a 12-step refactor. They'll forget the constraint you stated three messages ago. They'll invent dependencies that don't exist because nothing in the prompt anchored them.

A skill file fixes that by re-declaring the workflow each session with explicit phases, explicit forbidden actions, and a deterministic shape the model can follow without having to remember anything. Anthropic discovered this is useful even for Opus. For Qwen3.6 27B it's the difference between "viable daily driver" and "frustrating toy."

The four canonical skills below all share the same shape:

---
name: <slug>
description: <when to invoke this skill>
---

# <Title>

## Rules
- NEVER ...
- NEVER ...

## Phase 1 — ...
## Phase 2 — ...
...

That's it. No tools, no callbacks, no SDK. A markdown file the scaffold reads at the start of every task.
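Because a skill is just a markdown file with a small frontmatter header, the scaffold side of "read the file at the start of every task" is a few lines of parsing. A minimal sketch, assuming the frontmatter holds only flat `key: value` lines as in the template above (a real scaffold would more likely use a YAML parser); `parse_skill` and `Skill` are hypothetical names, not any scaffold's API:

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # markdown after the closing '---'; this is what gets injected


def parse_skill(text: str) -> Skill:
    """Split a skill file into its frontmatter fields and markdown body.

    Assumes flat `key: value` frontmatter between two '---' lines;
    nested YAML is not handled.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with '---' frontmatter")
    # Find the closing '---' of the frontmatter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is never closed")
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    missing = {"name", "description"} - set(fields)
    if missing:
        raise ValueError(f"missing frontmatter fields: {sorted(missing)}")
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(fields["name"], fields["description"], body)


example = """---
name: plan-first
description: Structured planning workflow for any coding task.
---

# Plan-First Workflow

## Rules
- NEVER write code, create files, or run commands before a TODO.md is approved.
"""

skill = parse_skill(example)
print(skill.name)                  # plan-first
print(skill.body.splitlines()[0])  # # Plan-First Workflow
```

The `description` field is what a scaffold would match against the task; the `body` is what gets pasted into the prompt.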

1. `plan-first` — the one everyone has rediscovered

Source: r/LocalLLaMA 1stjwg5, 256 upvotes, 106 comments. SoAp9035 ran Qwen3.6 35B-A3B Q4_K_XL inside PI Coding Agent with this skill file for a month, on actual production tickets. Commenters overwhelmingly report reproducing the results.

The full file (reproduced with the author's framing — the value here is the rule set, not the prose):

---
name: plan-first
description: Structured planning workflow for any coding task. Use at the start of every new feature, bug fix, refactor, or implementation request. Analyzes the project, asks up to 5 clarifying questions, creates a TODO.md, gets user approval, then executes task by task. Never writes code before a plan is approved.
---

# Plan-First Workflow

## Rules
- NEVER write code, create files, or run commands before a TODO.md is approved.
- NEVER assume missing information. Ask instead.
- NEVER skip steps. Follow phases in order.
- NEVER go off-plan. If new work is discovered, add it to TODO.md and ask for approval before doing it.

## Phase 1 — Analyze the Project
Read silently. Check: directory structure (top 2 levels), the relevant manifest (`package.json` / `go.mod` / `pubspec.yaml` / `Cargo.toml` / ...), existing dependencies, build scripts, README, any existing TODO.md or open issues.

## Phase 2 — Ask Clarifying Questions (one round only, ≤5)
After analysis, identify gaps that would block correct implementation. Ask at most 5 numbered questions in a single message. Only ask what is critical and cannot be inferred from the codebase.

## Phase 3 — Create TODO.md
Write TODO.md in the project root with Goal (one sentence), Tasks (grouped by phase, small and independently verifiable, ordered by dependency), and Notes (constraints / risks). Show the full file to the user and ask: "Does this plan look correct? Reply YES to start, or tell me what to change."

## Phase 4 — Revision Loop
If the user requests changes: ask targeted follow-ups, rewrite TODO.md, show updated plan, repeat until approved.

## Phase 5 — Execute
Work tasks in order, one at a time. After each task, change `- [ ]` to `- [x]`. State which task you are starting before beginning. If unlisted work is required, stop, add it under `## Discovered Tasks`, and ask before continuing. When all tasks are `[x]`, say: "All tasks in TODO.md are complete."

Why this works on small models: every transition is gated by a file write the model can see in the next turn. The model isn't asked to remember it has a plan; the plan is on disk. The 5-question cap stops the over-asking failure mode where small models stall.

2. `test-first` — for the "I trust the LLM less than I trust the compiler" workflow

No single thread owns this one. It's the pattern that surfaces repeatedly in 1th5t1b ("favorite Agentic Coding Harness", 62 upvotes), 1thnnjs (the pacman benchmark, 48 upvotes), and the SmallCode "improvement loop" documented in issue 10. The takeaway: small models write better code when the test is fixed first and the implementation is iterated against it.

---
name: test-first
description: Iterative test-driven workflow for small local models. Use when a behavior change has a verifiable success criterion. Writes the failing test first, runs it to confirm it fails for the right reason, then iterates implementation until the test passes — with a hard retry budget.
---

# Test-First Workflow

## Rules
- NEVER write the implementation before a failing test exists and has been run.
- NEVER edit the test to make it pass. If the test is wrong, stop and ask.
- NEVER exceed 5 implementation attempts on a single test. After 5 failures, decompose the task and ask for guidance.

## Phase 1 — Locate or Create the Test Harness
Find the project's test runner (`pytest`, `go test`, `npm test`, `cargo test`, ...). If none exists, ask the user before adding one. Find the closest existing test file to the change.

## Phase 2 — Write the Failing Test
State in one sentence what behavior the test verifies. Write the smallest possible test that captures it. Run the test. Confirm it fails *and* fails for the reason you expect (not import error, not syntax error). Report the failure message verbatim.

## Phase 3 — Implement Against the Test
Make the smallest possible change to pass the test. Run the test. If it passes, go to Phase 4. If it fails, log: attempt N / 5, the failure delta, and your next hypothesis. Repeat. If attempt 5 fails, stop and produce a decomposition: "This task needs to be split into A, B, C. Which should I tackle first?"

## Phase 4 — Verify No Regression
Run the full test suite (or the nearest scoped subset). If any previously-passing test now fails, stop and report — do not auto-fix.

## Phase 5 — Report
Output: the test that was added, the implementation diff, the test run result, the regression-check result. Do not commit; let the user review.

Why this works on small models: the implementation is constrained by a binary signal the model didn't generate. There's no room for the "looks plausible therefore done" failure mode. The 5-attempt cap stops infinite-loop death spirals — a real failure pattern with 7B-class models.
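The attempt budget is worth enforcing outside the model too, since a model stuck in a loop is also the model least likely to count its attempts correctly. A minimal sketch of the Phase 2 and Phase 3 control flow, assuming the scaffold can run the test and request one patch through two callables you supply (`run_test` and `attempt_fix` are hypothetical, as is the string check for "failed for the wrong reason"):

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ATTEMPTS = 5  # the skill's hard retry budget
# Crude heuristic: failures that mean the test itself is broken, not the implementation.
WRONG_REASON_MARKERS = ("ImportError", "ModuleNotFoundError", "SyntaxError")


@dataclass
class Outcome:
    status: str  # "passed", "not_red", or "decompose"
    attempts: int
    log: list[str] = field(default_factory=list)


def drive_test_first(
    run_test: Callable[[], tuple[bool, str]],  # -> (passed, failure message)
    attempt_fix: Callable[[int, str], None],   # (attempt number, last failure) -> applies one patch
) -> Outcome:
    # Phase 2: the new test must fail, and fail for the expected reason.
    passed, message = run_test()
    if passed or any(marker in message for marker in WRONG_REASON_MARKERS):
        return Outcome("not_red", 0, [f"red check failed: {message!r}"])

    # Phase 3: patch and re-run against the unchanged test, up to the budget.
    log = [f"red: {message}"]
    for attempt in range(1, MAX_ATTEMPTS + 1):
        attempt_fix(attempt, message)
        passed, message = run_test()
        log.append(f"attempt {attempt} / {MAX_ATTEMPTS}: {'pass' if passed else message}")
        if passed:
            return Outcome("passed", attempt, log)

    # Budget spent: stop and ask the user how to split the task.
    return Outcome("decompose", MAX_ATTEMPTS, log)


# Fake harness: the test starts red and goes green after the third patch.
patches = []


def fake_run_test() -> tuple[bool, str]:
    ok = len(patches) >= 3
    return ok, "" if ok else "AssertionError: expected 3, got 2"


def fake_fix(attempt: int, last_failure: str) -> None:
    patches.append(attempt)


result = drive_test_first(fake_run_test, fake_fix)
print(result.status, result.attempts)  # passed 3
```

Note that nothing in the loop can edit the test; only `attempt_fix` changes code, which mirrors the "NEVER edit the test to make it pass" rule.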

3. `refactor-with-constraint` — the one that fixes the "future is fictional" bug

Source signal: r/LocalLLaMA 1tcrrfq, 100 upvotes — Gemma 4 26B confidently labels real 2026 events as "fictional" because they're past its training cutoff. The fix the thread converged on: inject the date and project facts in the system prompt as ground truth. Generalize that into a skill, and you get the discipline that prevents the most common refactor failure: the model assuming a dependency or API exists when it doesn't.

---
name: refactor-with-constraint
description: Refactor workflow that pins facts the model cannot infer. Use when changing existing code where the model might invent dependencies, APIs, language versions, or "current best practice" that no longer matches the codebase. Loads project ground truth before touching any file.
---

# Refactor-With-Constraint Workflow

## Rules
- NEVER call an API, import a package, or use a language feature without first verifying it exists in this project.
- NEVER introduce a "modern" pattern just because it's newer. Match the codebase's existing style unless the user explicitly asks to modernize.
- NEVER assume the current date or the freshness of any documentation. Ground truth comes from the project files, not the model's prior.

## Phase 1 — Load Ground Truth
Read and report back: (a) today's date from the system, (b) the language version pinned in the manifest, (c) the top 10 direct dependencies and their pinned versions, (d) the linter / formatter config if present, (e) any `.editorconfig` or style guide file. This becomes the constraint block for the rest of the session.

## Phase 2 — Scope the Refactor
State in one paragraph: what is being refactored, why, and what is *out of scope*. List the files that will be touched. Read each of them in full. Read the call-sites of any function being changed.

## Phase 3 — Propose the Diff (no writes yet)
Output the proposed diff as a unified patch. Annotate each non-trivial change with: which constraint from Phase 1 it respects (or why a constraint needs to relax). Ask: "Apply this patch?"

## Phase 4 — Apply and Verify
On user approval, apply. Then run: build, lint, and the smallest test subset that exercises the touched code. Report each result individually.

## Phase 5 — Roll Back On Failure
If the build or lint step fails, do not attempt a fix. Revert the patch, summarize what failed, and ask for direction.

Why this works on small models: it forces the model to read the project before proposing changes — defeating the failure mode where 7B–30B models pattern-match on "modern Go" or "modern React" from training data instead of the codebase's actual conventions. The forced rollback on failure prevents the "fix the fix of the fix" spiral.
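Phase 1 is also the easiest phase to pre-compute, which is what the 1tcrrfq thread effectively recommends: hand the model the date and project facts instead of letting it guess. A minimal sketch for a Node project, assuming a `package.json` with the usual `engines` and `dependencies` fields and reading "top 10" as the first 10 in manifest order; `constraint_block` and its output format are illustrative, and other ecosystems would read `go.mod`, `Cargo.toml`, and so on instead:

```python
import json
from datetime import date


def constraint_block(package_json: str, today: date, top_n: int = 10) -> str:
    """Render project ground truth as text to prepend to the system prompt."""
    pkg = json.loads(package_json)
    node = pkg.get("engines", {}).get("node", "not pinned")
    deps = pkg.get("dependencies", {})
    shown = list(deps.items())[:top_n]  # manifest order; "top" is taken to mean first N
    lines = [
        "## Ground truth (overrides anything you remember)",
        f"- Today's date: {today.isoformat()}",
        f"- Node version (engines.node): {node}",
        f"- Direct dependencies ({len(shown)} of {len(deps)}):",
    ]
    lines += [f"  - {name}@{version}" for name, version in shown]
    return "\n".join(lines)


manifest = '{"engines": {"node": ">=18"}, "dependencies": {"express": "^4.19.2", "zod": "^3.23.8"}}'
block = constraint_block(manifest, today=date(2026, 5, 21))  # use date.today() in practice
print(block)
```

Generating the block in the scaffold means the date comes from the system clock rather than the model's prior, which is exactly the failure the Gemma 4 thread describes.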

4. `debug-loop` — for tasks too big to fit in context

Source: r/LocalLLaMA 1tftaaa by DeltaSqueezer, 144 upvotes — Qwen3.5 9B running structured workflows with map-reduce, parallel execution, checkpointing, and recovery. The author's quote that anchors this: "my custom agent has replaced Claude Code for 99% of tasks." The pattern, written as a skill:

---
name: debug-loop
description: Bounded debug workflow for context-limited local models. Use when investigating a bug whose scope exceeds the context window — multi-file searches, long logs, large datasets. Breaks the investigation into chunks, persists state to disk between iterations, and stops on a hard iteration budget.
---

# Debug-Loop Workflow

## Rules
- NEVER hold more than one file or one log chunk in working context at a time.
- NEVER continue past 10 iterations without checking in.
- NEVER discard findings between iterations. Write them to `DEBUG.md` immediately.

## Phase 1 — Frame the Bug
Write `DEBUG.md` with: Symptom (observable behavior), Expected (what should happen), Reproduction (exact steps or input), Hypotheses (numbered list, smallest first).

## Phase 2 — Iterate
For each iteration N (max 10):
1. Pick the smallest untested hypothesis from `DEBUG.md`.
2. Identify the single file, function, or log slice that would confirm or kill it.
3. Read only that slice. Run only the smallest probe (one test, one log query, one print).
4. Append to `DEBUG.md` under `## Iteration N`: hypothesis tested, evidence found, conclusion (confirmed / killed / inconclusive), next hypothesis.

## Phase 3 — Checkpoint Every 3 Iterations
After iterations 3, 6, 9: summarize the state of the investigation in 5 lines at the bottom of `DEBUG.md` and ask: "Continue or change direction?"

## Phase 4 — Resolve or Decompose
If a hypothesis is confirmed and the fix is ≤3 lines: propose the diff, ask before applying. If the fix is larger or the root cause is elsewhere: stop, output a decomposition into smaller bugs, hand off to `plan-first` for the actual fix.

Why this works on small models: the working context is bounded — one slice, one probe, one append. The model never has to juggle a large file in its head. Findings persist on disk; the iteration loop can resume mid-investigation if the session restarts. The hard cap prevents the "rabbit hole" failure mode that wastes a whole afternoon of GPU time.
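The on-disk log is what makes the loop resumable. A minimal sketch of the bookkeeping side, assuming iterations are recorded under `## Iteration N` headings exactly as Phase 2 specifies; `last_iteration` and `record_iteration` are hypothetical helpers, and the DEBUG.md contents are illustrative:

```python
import re
import tempfile
from pathlib import Path

MAX_ITERATIONS = 10
CHECKPOINT_EVERY = 3
CONCLUSIONS = {"confirmed", "killed", "inconclusive"}
HEADING = re.compile(r"^## Iteration (\d+)\s*$", re.MULTILINE)


def last_iteration(debug_md: Path) -> int:
    """Highest recorded iteration (0 if none), so a restarted session resumes after it."""
    if not debug_md.exists():
        return 0
    return max((int(n) for n in HEADING.findall(debug_md.read_text(encoding="utf-8"))), default=0)


def record_iteration(debug_md: Path, hypothesis: str, evidence: str,
                     conclusion: str, next_hypothesis: str) -> str:
    """Append one iteration to DEBUG.md and return what the loop should do next."""
    if conclusion not in CONCLUSIONS:
        raise ValueError(f"conclusion must be one of {sorted(CONCLUSIONS)}")
    n = last_iteration(debug_md) + 1
    if n > MAX_ITERATIONS:
        raise RuntimeError("iteration budget spent; check in with the user first")
    with debug_md.open("a", encoding="utf-8") as f:
        f.write(
            f"\n## Iteration {n}\n"
            f"- Hypothesis tested: {hypothesis}\n"
            f"- Evidence: {evidence}\n"
            f"- Conclusion: {conclusion}\n"
            f"- Next hypothesis: {next_hypothesis}\n"
        )
    if n == MAX_ITERATIONS:
        return "budget reached: check in with the user"
    if n % CHECKPOINT_EVERY == 0:
        return "checkpoint: add a 5-line summary and ask whether to continue"
    return "continue"


with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "DEBUG.md"
    log.write_text("# DEBUG\n\nSymptom: export hangs on large reports\n", encoding="utf-8")
    actions = [record_iteration(log, f"h{i}", "see probe output", "killed", f"h{i + 1}")
               for i in range(1, 4)]
    resumed_at = last_iteration(log)  # a restarted session would continue from here

print(actions[-1])  # checkpoint: add a 5-line summary and ask whether to continue
print(resumed_at)   # 3
```

Because the iteration count is derived from the file rather than held in memory, a crashed or restarted session picks up at the right number and still respects the 10-iteration cap.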

Compatibility matrix

Which skill is worth installing for which model and what task shape:

| Skill | Min model | Min VRAM | Best task shape | Skip if... |
|---|---|---|---|---|
| `plan-first` | Qwen3.6 27B / Gemma 4 26B | 24GB | Any new feature, refactor, multi-file fix | Single-file, one-line change |
| `test-first` | Qwen3.6 27B / 35B-A3B | 24GB | Behavior change with a clear test boundary | No test runner exists yet |
| `refactor-with-constraint` | Qwen3.6 27B+ | 24GB | Touching old code, version-sensitive APIs | Greenfield project, no prior code |
| `debug-loop` | Qwen3.6 27B+ | 24GB | Bug whose scope exceeds one file or log | You already know the file and line |

The 8GB tier (Gemma 4 4B in SmallCode) can run plan-first and debug-loop but struggles with refactor-with-constraint: it requires reading too many project files for the context window. Stick to plan-first and debug-loop on small cards.
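A quick way to see whether refactor-with-constraint fits on a small card is to add up what Phases 1 and 2 load before the model writes a single token. A rough sketch, assuming about 4 characters per token (a common heuristic for English text and code; the real tokenizer will differ) and a context size you set yourself; the 8,192-token window and file sizes in the example are placeholders, not Gemma 4 4B specs:

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text and code; real tokenizers vary


def estimate_tokens(text: str) -> int:
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(texts: list[str], context_tokens: int,
                    reserve_for_output: int = 2048) -> tuple[bool, int]:
    """Estimate whether the skill file plus files to read still leave room for the reply."""
    used = sum(estimate_tokens(t) for t in texts)
    return used + reserve_for_output <= context_tokens, used


skill_file = "x" * 2_400       # stand-in for a ~2.4 KB skill file
sources = ["y" * 12_000] * 3   # three ~12 KB files read in Phase 2
print(fits_in_context([skill_file, *sources], context_tokens=8_192))   # (False, 9600)
print(fits_in_context([skill_file, *sources], context_tokens=32_768))  # (True, 9600)
```

The skill file itself is the small term; the files the skill tells the model to read in full are what blow the budget.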

Installing them in each scaffold

- SmallCode: drop the files into `~/.smallcode/skills/`, then reference a skill by name in your prompt ("use the plan-first skill"). SmallCode auto-injects the file when the name matches.
- PI Coding Agent: same idea, using the `~/.pi/skills/` directory. SoAp9035's thread shows the file format works as-is.
- little-coder: skills go into the agent's system-prompt routing table. Map task shape → skill in the routing policy file documented in the README.
- Generic (any agent that reads system prompts): manually concatenate the relevant skill file at the top of your system prompt. Crude, but it works.
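For the generic route, or to mimic name-matched injection in a scaffold that lacks it, the glue is small. A minimal sketch, assuming skills live as `<name>.md` files in one directory and are triggered by the phrase "use the <name> skill"; the layout, trigger regex, and `build_system_prompt` are illustrative, and this is not how SmallCode or PI Coding Agent implement it internally:

```python
import re
import tempfile
from pathlib import Path

# Hypothetical trigger phrase, e.g. "use the plan-first skill".
TRIGGER = re.compile(r"\buse the ([a-z0-9-]+) skill\b", re.IGNORECASE)


def build_system_prompt(base_prompt: str, user_message: str, skills_dir: Path) -> str:
    """Prepend every skill named in the user message to the base system prompt."""
    blocks = []
    # dict.fromkeys drops duplicate names while keeping their order.
    for name in dict.fromkeys(m.lower() for m in TRIGGER.findall(user_message)):
        path = skills_dir / f"{name}.md"
        if path.is_file():  # unknown names are skipped, not guessed at
            blocks.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join([*blocks, base_prompt])


with tempfile.TemporaryDirectory() as d:
    skills_dir = Path(d)
    (skills_dir / "plan-first.md").write_text(
        "---\nname: plan-first\n---\n\n# Plan-First Workflow\n", encoding="utf-8")
    prompt = build_system_prompt(
        "You are a coding agent.", "Use the plan-first skill to add CSV export.", skills_dir)
    no_skill = build_system_prompt(
        "You are a coding agent.", "Use the turbo skill please.", skills_dir)

print(prompt.splitlines()[1])  # name: plan-first
print(no_skill)                # You are a coding agent.
```

Putting the skill above the base prompt matches the "concatenate at the top" advice; an unknown skill name leaves the prompt unchanged instead of failing.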

What to do this weekend

- Copy plan-first into your local agent today. It's the highest-leverage 10 minutes you'll spend this week.
- Pick one real ticket from your backlog and run it through plan-first end to end. Note where the small model breaks down: that's where the next skill in your library should go.
- If you've already got plan-first working, add test-first next. The compounding effect is real: the same Qwen3.6 27B passes tasks at a noticeably higher rate with both installed.
- Check VRAM headroom on the runlocal.dev calculator. Skill files themselves barely change VRAM cost, but if you're running close to the line, MTP overhead can still push you over.

Japan corner: skills inside the Hermes router

The Hermes Agent + Kiro CLI stack popular in the JP community treats "which model fires for which task" as its routing problem. A skill file is the same idea one level down: which workflow shape fires for which task. The natural integration is a skill registry the router pulls from. We'll cover this in the Hermes deep-dive, but if you're already on that stack, the canonical 4 above plug directly into your router's per-route system prompt slot.
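To make "a skill registry the router pulls from" concrete: each route carries both a model and a workflow shape, and the per-route system prompt slot is filled from the skill. A minimal sketch; the route names, model labels, `Route` type, and `resolve` function are hypothetical and do not reflect Hermes Agent's actual config format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str  # which model fires
    skill: str  # which workflow shape fires


# Hypothetical routing table keyed by task shape; model labels are placeholders.
ROUTES = {
    "feature": Route("qwen3.6-27b", "plan-first"),
    "behavior-change": Route("qwen3.6-35b-a3b", "test-first"),
    "legacy-refactor": Route("qwen3.6-27b", "refactor-with-constraint"),
    "bug-hunt": Route("qwen3.6-27b", "debug-loop"),
}


def resolve(task_shape: str, skills: dict[str, str],
            default_shape: str = "feature") -> tuple[str, str]:
    """Return (model, per-route system prompt) for a task shape."""
    route = ROUTES.get(task_shape, ROUTES[default_shape])
    return route.model, skills[route.skill]


# In a real setup the values would be the skill files' contents.
skills = {r.skill: f"<contents of {r.skill}.md>" for r in ROUTES.values()}
print(resolve("bug-hunt", skills))       # ('qwen3.6-27b', '<contents of debug-loop.md>')
print(resolve("something-new", skills))  # ('qwen3.6-27b', '<contents of plan-first.md>')
```

Falling back to plan-first for unrecognized task shapes follows the compatibility matrix, where plan-first is the default for any new feature.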

Why this matters past this week

The story so far in 2026 has been:

- Q1: Models got good enough (Qwen3.6, Gemma 4).
- Spring: MTP + llama.cpp made them fast enough.
- Now: Scaffold-first tools made them usable (issue 10).

The missing piece is portable workflow shapes — skill files that move between scaffolds and stay consistent across model swaps. Anthropic figured out the format is valuable. r/LocalLLaMA figured out which four are load-bearing.

What to watch in the next 60 days:

- Does someone publish a community-curated `awesome-local-llm-skills` repo? The demand is there: the SERP for "Claude skill local LLM" shows a forum result at #1 and an `awesome-llm-skills` GitHub repo at #4, but the small-model-specific canonical set hasn't been packaged yet.
- Does PI Coding Agent or SmallCode ship a built-in skill registry? The first one that does locks in users.
- Does someone wire skill selection into a model router (Hermes-style)? That's the natural next abstraction.

If you write a fifth skill that earns daily-driver status on your local stack, send it. We'll feature the community-curated ones in a later issue.

Next issue: the routing pattern — how Hermes Agent decides which model fires for which task, and what a unified skill+router config looks like.
