e884e4a88f
Bring ~/.claude config under bombadil management across both machines:
- claude/shared/: converged settings.json (union of both hosts) and a single
Catppuccin-powerline statusline merged from the two machines' versions
- claude/xps, claude/desktop: per-host agents/skills behind [profiles.xps]/
[profiles.desktop]; each host links only its own via `bombadil link -p <theme> <host>`
Linked at file granularity because bombadil 4.2.0 can't create directory
symlinks for new targets, and to keep ~/.claude/{agents,skills} real dirs.
Add a Justfile (symlinked to ~/.justfile, usable via `just -g`) with link/
dark/light/watch/unlink/update/status/edit recipes; host auto-detected from
hostname. Recipes use exported shell vars to avoid bombadil's Tera engine
mis-parsing just's double-brace interpolation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
211 lines
11 KiB
Markdown
211 lines
11 KiB
Markdown
---
|
|
name: local-delegate
|
|
description: Decide when and how to delegate a focused coding subtask to the local Qwen3-Coder-30B model via the `local-coder` subagent. Use when the task is well-scoped, mechanical, fits in ~16K context, and doesn't need top-tier reasoning — saves Anthropic API tokens for the orchestrator's harder work. Anti-patterns — cross-file architectural changes, ambiguous requirements, performance tuning, anything requiring `cargo run`.
|
|
---
|
|
|
|
# /local-delegate
|
|
|
|
A decision-support skill for the orchestrator. Triage whether a subtask fits the local Qwen3-Coder model, and if so, hand it off via the `local-coder` subagent with a properly-shaped prompt.
|
|
|
|
## The local stack
|
|
|
|
| Layer | What | Where |
|
|
|---|---|---|
|
|
| Model | Qwen3-Coder-30B-A3B-Instruct, UD-Q5_K_XL quant | `~/llm/models/` |
|
|
| Inference | llama.cpp 9200 with Vulkan/RADV on AMD Radeon RX 7900 XTX | `systemctl --user … llama-server` (port 8080) |
|
|
| API translator | claude-code-router (Anthropic ↔ OpenAI) | `ccr` (port 3456) |
|
|
| Wrapper | One-shot `claude --print` with `ANTHROPIC_BASE_URL=ccr` | `~/llm/scripts/local-coder-task.sh` |
|
|
| Subagent | Haiku transport layer that drives the wrapper + verifies | `~/.claude/agents/local-coder.md` |
|
|
|
|
**Performance**: ~135-140 tok/s decode, ~100-200 ms TTFT, 32K context (practical task budget ~16-20K leaves room for output).
|
|
|
|
## ✅ Good fits for the local model
|
|
|
|
- **Mechanical refactors** — rename, extract helper, inline a constant, hoist a binding.
|
|
- **Boilerplate scaffolding** — new test file modeled on an existing one, getter/setter pairs, a CLI subcommand stub.
|
|
- **Format normalization** — rewrite docstrings to a target style, normalize import order, convert log macros.
|
|
- **Single-file changes** where the surrounding context fits in ~10K tokens.
|
|
- **Cross-language translation** — port a function from Python to Rust, convert XML config to TOML, etc.
|
|
- **Lint-driven fixes** where the lint message names the change ("inline this `format!`", "remove unused import").
|
|
- **Read-only inspection** — "summarize what module X does", "list all callers of function Y" (model can use Read/Grep/Glob).
|
|
|
|
## ❌ Bad fits — keep on real Claude
|
|
|
|
- **Cross-file architectural changes** — local model can't hold enough context to reason about ripple effects.
|
|
- **Ambiguous requirements** — anything needing "well, depends on…" judgment.
|
|
- **Performance work** — needs bench data, knowledge of the existing perf budget, system-level reasoning.
|
|
- **Web research / external lookups** — local model has no web access through this pipe.
|
|
- **`cargo run` / interactive smoke testing** — same TTY constraint as remote subagents; the local model can't verify visual output either.
|
|
- **PR creation, git commits, branch ops** — wrapper's Bash allowlist is read-only for safety. Have the orchestrator handle git after the subagent returns.
|
|
- **Anything novel** — local 30B is fluent but doesn't have the depth on niche libraries / rare patterns.
|
|
|
|
## How to invoke
|
|
|
|
### Shape A — task that writes/edits files
|
|
|
|
Use this when the local model should produce a file as its primary output. The subagent will verify each touched file with one `Read` call (≤3 files).
|
|
|
|
```
|
|
Agent({
|
|
subagent_type: "local-coder",
|
|
description: "<3-5 word summary>",
|
|
prompt: "
|
|
Task: <one-paragraph description, imperative mood>
|
|
|
|
Files in scope:
|
|
- <path>:<optional line range>
|
|
- <path>
|
|
|
|
Context (paste relevant snippets — keep under 8K tokens):
|
|
```<lang>
|
|
<relevant code>
|
|
```
|
|
|
|
Acceptance criteria:
|
|
- <bullet>
|
|
- <bullet>
|
|
|
|
Out of scope:
|
|
- <bullet — what NOT to touch>
|
|
- Don't run compile/test/lint checks — orchestrator will do that after you return.
|
|
"
|
|
})
|
|
```
|
|
|
|
### Shape B — task that returns text only (no file writes)
|
|
|
|
Use this when you want analysis, an explanation, a code snippet for the orchestrator to apply itself, or a summary. The subagent skips the `Read` verification entirely (0 Read calls), so it's the fastest shape — typically ~5-10 s end-to-end on warm stack.
|
|
|
|
```
|
|
Agent({
|
|
subagent_type: "local-coder",
|
|
description: "<3-5 word summary>",
|
|
prompt: "
|
|
Task: <one-paragraph description, imperative mood>
|
|
|
|
Context (paste relevant snippets — keep under 8K tokens):
|
|
```<lang>
|
|
<relevant code>
|
|
```
|
|
|
|
Output format:
|
|
- <e.g., 'one Rust function, no markdown fences, no explanation'>
|
|
- <e.g., 'bullet list of files that match the pattern, one per line'>
|
|
|
|
Out of scope:
|
|
- Don't write any files. Return your answer as plain text only.
|
|
- Don't run any commands.
|
|
"
|
|
})
|
|
```
|
|
|
|
**Concrete no-edit examples**:
|
|
|
|
- *Explain*: "Explain in 4 sentences what `crates/zemyna_terrain/src/chunked.rs` does. Output: 4 sentences, plain text, no headings."
|
|
- *Snippet for orchestrator to paste*: "Write a Rust closure equivalent to this Python lambda: `lambda x, y: x * 2 + y`. Output: only the closure, one line, no `let` binding."
|
|
- *Listing*: "Read `Cargo.toml`. List every workspace member crate, one per line, no other text."
|
|
- *Translation*: "Translate this SQL `WHERE` clause to a `serde_json::Value` filter expression. Output: only the Rust expression."
|
|
|
|
Both shapes invoke the same wrapper via a single Bash heredoc — no temp file involved. The subagent returns a structured report with `exit:`, `files:`, `verified:`, and verbatim wrapper output.
|
|
|
|
### Shape C — direct Bash, no subagent (guaranteed routing)
|
|
|
|
Use this when you need a **hard guarantee** the local model actually ran — typically because the task is trivial enough that the subagent might decide to answer it directly from its own knowledge instead of invoking the wrapper. (Sonnet-as-subagent follows multi-paragraph rules ~95% of the time, but trivial one-liner tasks tempt any model to shortcut.)
|
|
|
|
You give up the subagent's verification + structured report; you pay one Bash call's worth of orchestrator context for the wrapper's raw output.
|
|
|
|
```
|
|
Bash({
|
|
command: "~/llm/scripts/local-coder-task.sh <<'TASK'\n<your task here>\nTASK\n",
|
|
description: "force-route through local model"
|
|
})
|
|
```
|
|
|
|
The wrapper's stdout becomes the Bash result. You parse it yourself.
|
|
|
|
**When Shape C is the right call**:
|
|
- The task is one-liner-trivial (e.g., "convert this Python lambda to Rust").
|
|
- You're benchmarking the local model and need every call to actually hit it.
|
|
- You're testing the local stack (smoke test, latency measurement, output-format check).
|
|
- You suspect the subagent will shortcut because the task is too easy.
|
|
|
|
**When Shape A or B is still better**:
|
|
- Real coding subtasks (refactor, scaffold, format-cleanup) — the subagent's verification step catches hallucinated file edits.
|
|
- Tasks where you want a structured report (`exit:`, `files:`, `verified:`) for the orchestrator's downstream handling.
|
|
- Multi-file tasks where the verification of each file is non-trivial.
|
|
|
|
### Quick decision tree
|
|
|
|
```
|
|
Task fits the local model? ── no ──> keep on real Claude
|
|
│
|
|
yes
|
|
│
|
|
Will the model write files? ── yes ──> Shape A (subagent, file verification)
|
|
│
|
|
no
|
|
│
|
|
Is the task trivial enough that the
|
|
subagent might answer directly? ── yes ──> Shape C (direct Bash, guaranteed)
|
|
│
|
|
no (e.g., needs the local model's
|
|
actual code-gen style, length,
|
|
or vocabulary)
|
|
│
|
|
└──> Shape B (subagent, no file verification)
|
|
```
|
|
|
|
## Pre-flight checklist (orchestrator side)
|
|
|
|
Before invoking, mentally check:
|
|
|
|
1. **Sizing** — can the task be described in <500 tokens + ≤8K tokens of context? If no, scope-split or keep on real Claude.
|
|
2. **Cohesion** — is the task contained to 1-3 files? If it sprawls, keep on real Claude.
|
|
3. **Verifiability** — can you state an objective acceptance criterion (a passing test, a successful build, a grep returning N hits)? If you can't state how you'd know it worked, don't delegate.
|
|
4. **Recoverability** — if the local model produces wrong output, can you `git checkout -- <files>` and try again on real Claude? If not (e.g., it's a brand-new file), reduce blast radius first.
|
|
|
|
## Stack health (drop into a Bash if unsure)
|
|
|
|
```bash
|
|
curl -sf http://127.0.0.1:8080/health # llama-server (loads model on first start, ~65 s cold)
|
|
ccr status # CCR
|
|
systemctl --user status llama-server # if either above fails
|
|
```
|
|
|
|
The wrapper auto-starts both if missing. But on cold start, the first call takes ~65 s for model load. Subsequent calls (within the 30-min keep-alive) are warm.
|
|
|
|
## Failure handling
|
|
|
|
| Symptom | Likely cause | Action |
|
|
|---|---|---|
|
|
| Wrapper exit 2, stderr says "llama-server failed health check" | Model load failed (GPU contention, OOM) | Check `journalctl --user -u llama-server --since '5 min ago'`. Often: another GPU consumer started. Run `~/llm/scripts/use-llama-server.sh` to force-restart clean. |
|
|
| Wrapper exit 1, claude session error | CCR translation issue or context overflow | Check `~/.claude-code-router/` logs. Shrink the prompt context, retry. |
|
|
| Clean exit, output references edits that aren't there | Local model hallucinated the edit | Subagent's verification step catches this. Fall back to real Claude. |
|
|
| Clean exit, output is mid-sentence cut | Hit max_tokens or context overflow | Reduce prompt size and retry, OR raise max_tokens in the wrapper. |
|
|
| Repeated/looping output | Sampling broke (rare with our config) | Retry on real Claude — don't iterate on local. |
|
|
|
|
## Anti-patterns
|
|
|
|
- **Don't retry the same task on local.** If first attempt fails, fall back. Iterating burns wall clock without fixing the underlying capability gap.
|
|
- **Don't chain local subagents.** Sequential local calls compound error rate. Use real Claude as the connecting tissue.
|
|
- **Don't pass the orchestrator's full CLAUDE.md / rules context.** Wrapper uses `--bare` precisely to avoid this — the local model gets a clean context. Pass only the task-relevant context inline.
|
|
- **Don't delegate work you wouldn't trust a junior dev to do with the same brief.** If the brief itself requires deep project knowledge to write correctly, the implementer needs it too.
|
|
|
|
## CLI usage (outside Claude Code)
|
|
|
|
Useful for testing the stack without spawning a subagent:
|
|
|
|
```bash
|
|
echo "Write a Rust function that reverses a string in-place." \
|
|
| ~/llm/scripts/local-coder-task.sh
|
|
```
|
|
|
|
Output goes to stdout. Same env, same flags as what the subagent uses.
|
|
|
|
## See also
|
|
|
|
- `~/.claude/agents/local-coder.md` — the subagent profile
|
|
- `~/llm/scripts/local-coder-task.sh` — the wrapper this skill invokes
|
|
- `~/.claude-code-router/config.json` — CCR routing
|
|
- `~/llm/scripts/claude-local.sh` — interactive `claude code` against the local stack (different use case: full claude session vs one-shot subtask)
|