Bring ~/.claude config under bombadil management across both machines:
- claude/shared/: converged settings.json (union of both hosts) and a single
Catppuccin-powerline statusline merged from the two machines' versions
- claude/xps, claude/desktop: per-host agents/skills behind [profiles.xps]/
[profiles.desktop]; each host links only its own via `bombadil link -p <theme> <host>`
Linked at file granularity because bombadil 4.2.0 can't create directory
symlinks for new targets, and to keep ~/.claude/{agents,skills} real dirs.
Add a Justfile (symlinked to ~/.justfile, usable via `just -g`) with link/
dark/light/watch/unlink/update/status/edit recipes; host auto-detected from
hostname. Recipes use exported shell vars to avoid bombadil's Tera engine
mis-parsing just's double-brace interpolation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
11 KiB
name, description
| name | description |
|---|---|
| local-delegate | Decide when and how to delegate a focused coding subtask to the local Qwen3-Coder-30B model via the `local-coder` subagent. Use when the task is well-scoped, mechanical, fits in ~16K context, and doesn't need top-tier reasoning — saves Anthropic API tokens for the orchestrator's harder work. Anti-patterns — cross-file architectural changes, ambiguous requirements, performance tuning, anything requiring `cargo run`. |
/local-delegate
A decision-support skill for the orchestrator. Triage whether a subtask fits the local Qwen3-Coder model, and if so, hand it off via the local-coder subagent with a properly-shaped prompt.
The local stack
| Layer | What | Where |
|---|---|---|
| Model | Qwen3-Coder-30B-A3B-Instruct, UD-Q5_K_XL quant | ~/llm/models/ |
| Inference | llama.cpp 9200 with Vulkan/RADV on AMD Radeon RX 7900 XTX | systemctl --user … llama-server (port 8080) |
| API translator | claude-code-router (Anthropic ↔ OpenAI) | ccr (port 3456) |
| Wrapper | One-shot claude --print with ANTHROPIC_BASE_URL=ccr |
~/llm/scripts/local-coder-task.sh |
| Subagent | Haiku transport layer that drives the wrapper + verifies | ~/.claude/agents/local-coder.md |
Performance: ~135-140 tok/s decode, ~100-200 ms TTFT, 32K context (practical task budget ~16-20K leaves room for output).
✅ Good fits for the local model
- Mechanical refactors — rename, extract helper, inline a constant, hoist a binding.
- Boilerplate scaffolding — new test file modeled on an existing one, getter/setter pairs, a CLI subcommand stub.
- Format normalization — rewrite docstrings to a target style, normalize import order, convert log macros.
- Single-file changes where the surrounding context fits in ~10K tokens.
- Cross-language translation — port a function from Python to Rust, convert XML config to TOML, etc.
- Lint-driven fixes where the lint message names the change ("inline this
format!", "remove unused import"). - Read-only inspection — "summarize what module X does", "list all callers of function Y" (model can use Read/Grep/Glob).
❌ Bad fits — keep on real Claude
- Cross-file architectural changes — local model can't hold enough context to reason about ripple effects.
- Ambiguous requirements — anything needing "well, depends on…" judgment.
- Performance work — needs bench data, knowledge of the existing perf budget, system-level reasoning.
- Web research / external lookups — local model has no web access through this pipe.
cargo run/ interactive smoke testing — same TTY constraint as remote subagents; the local model can't verify visual output either.- PR creation, git commits, branch ops — wrapper's Bash allowlist is read-only for safety. Have the orchestrator handle git after the subagent returns.
- Anything novel — local 30B is fluent but doesn't have the depth on niche libraries / rare patterns.
How to invoke
Shape A — task that writes/edits files
Use this when the local model should produce a file as its primary output. The subagent will verify each touched file with one Read call (≤3 files).
Agent({
subagent_type: "local-coder",
description: "<3-5 word summary>",
prompt: "
Task: <one-paragraph description, imperative mood>
Files in scope:
- <path>:<optional line range>
- <path>
Context (paste relevant snippets — keep under 8K tokens):
```<lang>
<relevant code>
```
Acceptance criteria:
- <bullet>
- <bullet>
Out of scope:
- <bullet — what NOT to touch>
- Don't run compile/test/lint checks — orchestrator will do that after you return.
"
})
Shape B — task that returns text only (no file writes)
Use this when you want analysis, an explanation, a code snippet for the orchestrator to apply itself, or a summary. The subagent skips the Read verification entirely (0 Read calls), so it's the fastest shape — typically ~5-10 s end-to-end on warm stack.
Agent({
subagent_type: "local-coder",
description: "<3-5 word summary>",
prompt: "
Task: <one-paragraph description, imperative mood>
Context (paste relevant snippets — keep under 8K tokens):
```<lang>
<relevant code>
```
Output format:
- <e.g., 'one Rust function, no markdown fences, no explanation'>
- <e.g., 'bullet list of files that match the pattern, one per line'>
Out of scope:
- Don't write any files. Return your answer as plain text only.
- Don't run any commands.
"
})
Concrete no-edit examples:
- Explain: "Explain in 4 sentences what
crates/zemyna_terrain/src/chunked.rsdoes. Output: 4 sentences, plain text, no headings." - Snippet for orchestrator to paste: "Write a Rust closure equivalent to this Python lambda:
lambda x, y: x * 2 + y. Output: only the closure, one line, noletbinding." - Listing: "Read
Cargo.toml. List every workspace member crate, one per line, no other text." - Translation: "Translate this SQL
WHEREclause to aserde_json::Valuefilter expression. Output: only the Rust expression."
Both shapes invoke the same wrapper via a single Bash heredoc — no temp file involved. The subagent returns a structured report with exit:, files:, verified:, and verbatim wrapper output.
Shape C — direct Bash, no subagent (guaranteed routing)
Use this when you need a hard guarantee the local model actually ran — typically because the task is trivial enough that the subagent might decide to answer it directly from its own knowledge instead of invoking the wrapper. (Sonnet-as-subagent follows multi-paragraph rules ~95% of the time, but trivial one-liner tasks tempt any model to shortcut.)
You give up the subagent's verification + structured report; you pay one Bash call's worth of orchestrator context for the wrapper's raw output.
Bash({
command: "~/llm/scripts/local-coder-task.sh <<'TASK'\n<your task here>\nTASK\n",
description: "force-route through local model"
})
The wrapper's stdout becomes the Bash result. You parse it yourself.
When Shape C is the right call:
- The task is one-liner-trivial (e.g., "convert this Python lambda to Rust").
- You're benchmarking the local model and need every call to actually hit it.
- You're testing the local stack (smoke test, latency measurement, output-format check).
- You suspect the subagent will shortcut because the task is too easy.
When Shape A or B is still better:
- Real coding subtasks (refactor, scaffold, format-cleanup) — the subagent's verification step catches hallucinated file edits.
- Tasks where you want a structured report (
exit:,files:,verified:) for the orchestrator's downstream handling. - Multi-file tasks where the verification of each file is non-trivial.
Quick decision tree
Task fits the local model? ── no ──> keep on real Claude
│
yes
│
Will the model write files? ── yes ──> Shape A (subagent, file verification)
│
no
│
Is the task trivial enough that the
subagent might answer directly? ── yes ──> Shape C (direct Bash, guaranteed)
│
no (e.g., needs the local model's
actual code-gen style, length,
or vocabulary)
│
└──> Shape B (subagent, no file verification)
Pre-flight checklist (orchestrator side)
Before invoking, mentally check:
- Sizing — can the task be described in <500 tokens + ≤8K tokens of context? If no, scope-split or keep on real Claude.
- Cohesion — is the task contained to 1-3 files? If it sprawls, keep on real Claude.
- Verifiability — can you state an objective acceptance criterion (a passing test, a successful build, a grep returning N hits)? If you can't state how you'd know it worked, don't delegate.
- Recoverability — if the local model produces wrong output, can you
git checkout -- <files>and try again on real Claude? If not (e.g., it's a brand-new file), reduce blast radius first.
Stack health (drop into a Bash if unsure)
curl -sf http://127.0.0.1:8080/health # llama-server (loads model on first start, ~65 s cold)
ccr status # CCR
systemctl --user status llama-server # if either above fails
The wrapper auto-starts both if missing. But on cold start, the first call takes ~65 s for model load. Subsequent calls (within the 30-min keep-alive) are warm.
Failure handling
| Symptom | Likely cause | Action |
|---|---|---|
| Wrapper exit 2, stderr says "llama-server failed health check" | Model load failed (GPU contention, OOM) | Check journalctl --user -u llama-server --since '5 min ago'. Often: another GPU consumer started. Run ~/llm/scripts/use-llama-server.sh to force-restart clean. |
| Wrapper exit 1, claude session error | CCR translation issue or context overflow | Check ~/.claude-code-router/ logs. Shrink the prompt context, retry. |
| Clean exit, output references edits that aren't there | Local model hallucinated the edit | Subagent's verification step catches this. Fall back to real Claude. |
| Clean exit, output is mid-sentence cut | Hit max_tokens or context overflow | Reduce prompt size and retry, OR raise max_tokens in the wrapper. |
| Repeated/looping output | Sampling broke (rare with our config) | Retry on real Claude — don't iterate on local. |
Anti-patterns
- Don't retry the same task on local. If first attempt fails, fall back. Iterating burns wall clock without fixing the underlying capability gap.
- Don't chain local subagents. Sequential local calls compound error rate. Use real Claude as the connecting tissue.
- Don't pass the orchestrator's full CLAUDE.md / rules context. Wrapper uses
--bareprecisely to avoid this — the local model gets a clean context. Pass only the task-relevant context inline. - Don't delegate work you wouldn't trust a junior dev to do with the same brief. If the brief itself requires deep project knowledge to write correctly, the implementer needs it too.
CLI usage (outside Claude Code)
Useful for testing the stack without spawning a subagent:
echo "Write a Rust function that reverses a string in-place." \
| ~/llm/scripts/local-coder-task.sh
Output goes to stdout. Same env, same flags as what the subagent uses.
See also
~/.claude/agents/local-coder.md— the subagent profile~/llm/scripts/local-coder-task.sh— the wrapper this skill invokes~/.claude-code-router/config.json— CCR routing~/llm/scripts/claude-local.sh— interactiveclaude codeagainst the local stack (different use case: full claude session vs one-shot subtask)