Compare commits
2 Commits
403f951078
..
main
| Author | SHA1 | Date | |
|---|---|---|---|
|
e884e4a88f
|
|||
|
59c40ad5ad
|
@@ -0,0 +1,56 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
## What this is
|
||||||
|
|
||||||
|
Personal dotfiles managed by [Bombadil](https://github.com/oknozor/toml-bombadil). Source config files live at the top level of this repo. Bombadil renders them (substituting template variables) into `.dots/`, then symlinks the rendered files into `$HOME` / `~/.config` according to `bombadil.toml`.
|
||||||
|
|
||||||
|
## Applying changes
|
||||||
|
|
||||||
|
This repo is checked out on two machines sharing one `bombadil.toml`: `daniel-xps` (laptop) and `daniel-desktop` (GPU box). Each links with **both** a theme profile and its **host** profile. Easiest is the Justfile (symlinked to `~/.justfile`, auto-detects the host):
|
||||||
|
|
||||||
|
```sh
|
||||||
|
just -g link # render -> .dots/ and relink for this host + the dark theme
|
||||||
|
just -g light # relink with the light theme
|
||||||
|
just -g update # git pull --ff-only, then relink
|
||||||
|
just -g # list all recipes
|
||||||
|
```
|
||||||
|
|
||||||
|
Equivalent raw commands:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
bombadil link -f -p dark xps # on daniel-xps
|
||||||
|
bombadil link -f -p dark desktop # on daniel-desktop
|
||||||
|
```
|
||||||
|
|
||||||
|
Edits do not take effect until linked. **Always include the host profile** (`xps`/`desktop`) — a plain `bombadil link` won't link that host's Claude agents/skills. `-f` is needed when a target already exists as a real file; bombadil backs it up to `*.bak` (clean those up afterward). The repo is registered globally via `~/.config/bombadil.toml` (a symlink to `bombadil.toml` here, created once by `bombadil install`).
|
||||||
|
|
||||||
|
## Key conventions
|
||||||
|
|
||||||
|
- **Never edit `.dots/`.** It is Bombadil's generated output (not tracked by git) and is overwritten on every `bombadil link`. Edit the source files at the repo root instead.
|
||||||
|
- **`bombadil.toml`** is the source of truth for what gets linked where. Adding a new config means: add the file at the repo root, then add a `[settings.dots]` entry mapping `source` (relative to repo) to `target` (relative to `$HOME`).
|
||||||
|
- **Templating:** files may contain `{{var}}` placeholders substituted from the vars files. The active set is `vars = ["dark.toml", "vars.toml"]` plus whatever profile is selected. `vars.toml` holds identity (`email`, `name`, `signing_key`). `dark.toml` / `light.toml` are profile-specific theme variables (`alacritty_theme`, `helix_theme`, `zellij_theme`, `bat_theme`, `theme`). Templated files currently include `alacritty/alacritty.toml`, `helix/config.toml`, `zellij/config.kdl`, `gitconfig`, and `bat/config` — switching `theme` retints all apps at once via the profile.
|
||||||
|
|
||||||
|
## Claude Code config (`claude/`)
|
||||||
|
|
||||||
|
`~/.claude` is partly shared, partly host-specific:
|
||||||
|
|
||||||
|
- `claude/shared/` — `settings.json` and `statusline-command.sh`, linked on every host (default dots). `settings.json` is the union of both machines' settings; the statusline is one Catppuccin-powerline script merged from both.
|
||||||
|
- `claude/xps/`, `claude/desktop/` — each host's own `agents/` and `skills/`, linked only by that host's profile. `daniel-xps` is the orchestrator that delegates coding subtasks over SSH (`desktop-coder` agent, `desktop-delegate` skill); `daniel-desktop` runs the local Qwen3-Coder model the laptop delegates to (`local-coder` agent, `local-delegate` + `unload-local-model` skills).
|
||||||
|
|
||||||
|
Two bombadil constraints shaped this layout — keep them in mind when adding `.claude` files:
|
||||||
|
|
||||||
|
1. **File-level dots only.** bombadil 4.2.0 cannot create a *directory* symlink for a not-yet-linked target (fails with `IoError: the source path is neither a regular file nor a symlink to a regular file`). So every agent/skill file is its own dot entry mapping a single `.md` file. This also keeps `~/.claude/{agents,skills}` as real directories, so Claude's own runtime files coexist with the managed ones.
|
||||||
|
2. **Tera escaping.** Sources render through Tera, which treats `{{`, `{%`, and `{#` as template syntax. The shell array-length form `${#arr[@]}` contains `{#` and breaks rendering — avoid it in any linked script (`claude/shared/statusline-command.sh` uses `set -- "${arr[@]}"; n=$#` instead). Watch out for the same sequences in comments.
|
||||||
|
|
||||||
|
## zsh layout
|
||||||
|
|
||||||
|
- `zshenv.zsh` → `~/.zshenv` (env vars, editor detection), `zshrc.zsh` → `~/.zshrc` (interactive setup).
|
||||||
|
- `zshrc.zsh` sources every file in `~/.config/zsh.d/*` (linked from `zsh.d/`) then every file in `~/.config/aliases.d/*` (linked from `aliases.d/`). Each file is a self-contained conf.d-style snippet for one tool (e.g. `zsh.d/git.zsh`, `aliases.d/cargo.zsh`). Add a tool by dropping a new `.zsh` file in the matching directory — no central registration needed.
|
||||||
|
- Machine-specific overrides are sourced if present and are intentionally not in this repo: `~/.zshenv.local`, `~/.zshrc.local`, `~/.aliases.local`.
|
||||||
|
- Tool integrations guard on availability with `if (( $+commands[tool] )); then ...` — follow this pattern so configs degrade gracefully on machines lacking a tool. The prompt (`zsh.d/prompt.zsh`) falls back starship → p10k → manjaro.
|
||||||
|
|
||||||
|
## Submodules
|
||||||
|
|
||||||
|
`alacritty/themes` (and a legacy `alacritty-theme`) are git submodules pointing at the upstream alacritty color-theme collection. Run `git submodule update --init` after cloning.
|
||||||
@@ -0,0 +1,55 @@
|
|||||||
|
# Dotfiles task runner (bombadil + git).
|
||||||
|
#
|
||||||
|
# Symlinked to ~/.justfile by bombadil, so it works as just's global justfile:
|
||||||
|
# just -g <recipe> # from anywhere
|
||||||
|
# just <recipe> # from $HOME
|
||||||
|
#
|
||||||
|
# Recipes reference exported variables as shell $vars rather than just's
|
||||||
|
# double-brace interpolation: bombadil renders every dotfile through Tera, which
|
||||||
|
# would try to evaluate a double-brace expression as a template variable.
|
||||||
|
|
||||||
|
# Theme profile to link with: dark or light.
|
||||||
|
export theme := "dark"
|
||||||
|
|
||||||
|
# Host profile, auto-detected from hostname. Each machine links only its own
|
||||||
|
# Claude agents/skills; everything else is shared.
|
||||||
|
export host := if `hostname` == "daniel-desktop" { "desktop" } else { "xps" }
|
||||||
|
|
||||||
|
# List available recipes.
|
||||||
|
default:
|
||||||
|
@just -g --list
|
||||||
|
|
||||||
|
# Re-link all dotfiles for this host and theme.
|
||||||
|
link:
|
||||||
|
bombadil link -f -p "$theme" "$host"
|
||||||
|
|
||||||
|
alias relink := link
|
||||||
|
|
||||||
|
# Re-link with the dark theme.
|
||||||
|
dark:
|
||||||
|
bombadil link -f -p dark "$host"
|
||||||
|
|
||||||
|
# Re-link with the light theme.
|
||||||
|
light:
|
||||||
|
bombadil link -f -p light "$host"
|
||||||
|
|
||||||
|
# Watch dotfiles and re-link automatically on change.
|
||||||
|
watch:
|
||||||
|
bombadil watch -f -p "$theme" "$host"
|
||||||
|
|
||||||
|
# Remove all bombadil-managed symlinks.
|
||||||
|
unlink:
|
||||||
|
bombadil unlink
|
||||||
|
|
||||||
|
# Pull the latest dotfiles from the remote and re-link.
|
||||||
|
update:
|
||||||
|
git -C ~/src/dotfiles pull --ff-only
|
||||||
|
bombadil link -f -p "$theme" "$host"
|
||||||
|
|
||||||
|
# Show the dotfiles repo's git status.
|
||||||
|
status:
|
||||||
|
git -C ~/src/dotfiles status
|
||||||
|
|
||||||
|
# Open the bombadil config in $EDITOR.
|
||||||
|
edit:
|
||||||
|
$EDITOR ~/src/dotfiles/bombadil.toml
|
||||||
@@ -58,3 +58,28 @@ zshconfigs = { source = "zsh.d", target = ".config/zsh.d" }
|
|||||||
zshaliases = { source = "aliases.d", target = ".config/aliases.d" }
|
zshaliases = { source = "aliases.d", target = ".config/aliases.d" }
|
||||||
zshenv = { source = "zshenv.zsh", target = ".zshenv" }
|
zshenv = { source = "zshenv.zsh", target = ".zshenv" }
|
||||||
zshrc = { source = "zshrc.zsh", target = ".zshrc" }
|
zshrc = { source = "zshrc.zsh", target = ".zshrc" }
|
||||||
|
|
||||||
|
# justfile — task runner, usable from anywhere via `just -g`
|
||||||
|
justfile = { source = "Justfile", target = ".justfile" }
|
||||||
|
|
||||||
|
# claude code (shared across hosts; agents/skills are host-specific, see profiles below)
|
||||||
|
claude_settings = { source = "claude/shared/settings.json", target = ".claude/settings.json" }
|
||||||
|
claude_statusline = { source = "claude/shared/statusline-command.sh", target = ".claude/statusline-command.sh" }
|
||||||
|
|
||||||
|
# Host profiles — select one per machine alongside the theme profile (space-separated), e.g.
|
||||||
|
# daniel-xps: bombadil link -f -p dark xps
|
||||||
|
# daniel-desktop: bombadil link -f -p dark desktop
|
||||||
|
# (or just `just -g link`, which auto-detects the host)
|
||||||
|
# Each links the host's own Claude agents/skills (the GPU box drives the local
|
||||||
|
# model; the laptop is the orchestrator that delegates to it over SSH).
|
||||||
|
# Linked at the file level (not as directory dots): bombadil 4.2.0 only symlinks
|
||||||
|
# regular files, and this keeps the live ~/.claude/{agents,skills} as real dirs so
|
||||||
|
# Claude's own agents/skills coexist with these dotfile-managed ones.
|
||||||
|
[profiles.xps.dots]
|
||||||
|
claude_agent_desktop_coder = { source = "claude/xps/agents/desktop-coder.md", target = ".claude/agents/desktop-coder.md" }
|
||||||
|
claude_skill_desktop_delegate = { source = "claude/xps/skills/desktop-delegate/SKILL.md", target = ".claude/skills/desktop-delegate/SKILL.md" }
|
||||||
|
|
||||||
|
[profiles.desktop.dots]
|
||||||
|
claude_agent_local_coder = { source = "claude/desktop/agents/local-coder.md", target = ".claude/agents/local-coder.md" }
|
||||||
|
claude_skill_local_delegate = { source = "claude/desktop/skills/local-delegate/SKILL.md", target = ".claude/skills/local-delegate/SKILL.md" }
|
||||||
|
claude_skill_unload_local_model = { source = "claude/desktop/skills/unload-local-model/SKILL.md", target = ".claude/skills/unload-local-model/SKILL.md" }
|
||||||
|
|||||||
@@ -0,0 +1,85 @@
|
|||||||
|
---
|
||||||
|
name: local-coder
|
||||||
|
description: Run a focused coding subtask on the local Qwen3-Coder-30B model (via llama-server + claude-code-router) instead of a remote Anthropic model. Use when the task is well-scoped, mechanical, fits within ~16K tokens of total context, and doesn't need top-tier reasoning. Good fits — refactor a function, generate tests scaffolded from existing ones, format/normalize docstrings, mechanical lint fixes, scaffold boilerplate, translate a small function between languages. Bad fits — cross-file architectural changes, ambiguous requirements needing judgment, performance work needing measurement, anything requiring web research or `cargo run`. Saves Anthropic API tokens for the orchestrator's harder work.
|
||||||
|
tools: Bash, Read
|
||||||
|
model: sonnet
|
||||||
|
---
|
||||||
|
|
||||||
|
# local-coder
|
||||||
|
|
||||||
|
Thin transport layer for the local Qwen3-Coder model. **You do not solve the task — even if you could answer it from your own knowledge.** Every invocation goes to the wrapper. The orchestrator chose `local-coder` over a direct response *because they want the local model to do this work* — to save Anthropic tokens, to test the local model, or to keep certain work off the remote API. Answering directly defeats the entire purpose.
|
||||||
|
|
||||||
|
## Tool budget — strict
|
||||||
|
|
||||||
|
- **Bash: exactly 1 call, always the wrapper.** Non-negotiable. Even if the task is one line and you "know" the answer, route it through the wrapper. The wrapper is the *product* the orchestrator asked for — not a fallback path.
|
||||||
|
- **No other Bash.** No `ls`, no `cat`, no `rustc`, no `cargo`, no `wc`. The orchestrator runs validation commands after you return.
|
||||||
|
- **Read: 0-3 calls.** One Read per file the model says it created or edited, max 3 files. Skip Read entirely for pure-text-output tasks (no file edits).
|
||||||
|
|
||||||
|
If you find yourself wanting a 2nd Bash call or a 4th Read, stop and return what you have. More tool calls do not improve the report — they only add latency.
|
||||||
|
|
||||||
|
**Sanity check before responding**: count your tool uses. If Bash count != 1, you did the wrong thing — re-run with the wrapper.
|
||||||
|
|
||||||
|
## Process
|
||||||
|
|
||||||
|
1. **Invoke the wrapper** in a single Bash call using a heredoc (no temp file). Heredoc preserves newlines and shell-special chars cleanly:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
~/llm/scripts/local-coder-task.sh [--profile <name>] [--minimal-prompt] <<'TASK'
|
||||||
|
<orchestrator's prompt verbatim, including any code fences>
|
||||||
|
TASK
|
||||||
|
```
|
||||||
|
|
||||||
|
**Wrapper flags** (optional, passed before the heredoc):
|
||||||
|
- `--profile <name>` — tool profile, controls `--allowed-tools` footprint:
|
||||||
|
- `full` (default) — Read/Write/Edit/Grep/Glob + ~25 Bash patterns.
|
||||||
|
- `code` — Read/Write/Edit/Grep/Glob + cargo verification Bash only.
|
||||||
|
- `edit-only` — Read/Write/Edit/Grep/Glob, **no Bash** (~3–5K tokens saved).
|
||||||
|
- `read-only` — Read/Grep/Glob, no edits.
|
||||||
|
- `--minimal-prompt` — replace Claude's built-in system prompt with a ~1 KB minimal one (~8–12K tokens saved). Use for trusted orchestrator-spec'd mechanical tasks. Drops Claude's default safety/style guidance.
|
||||||
|
|
||||||
|
**How to pick flags**: if the orchestrator's brief includes a `WRAPPER FLAGS:` line at the top (e.g. `WRAPPER FLAGS: --profile=edit-only --minimal-prompt`), pass those flags verbatim and **strip the line from the task body** before piping. If no such line is present, invoke with no flags (default behavior — full profile, default Claude prompt + tool-call-format append).
|
||||||
|
|
||||||
|
2. **If exit != 0**: return the structured report with `exit: <N>` and the wrapper's stderr verbatim. Stop. No retry.
|
||||||
|
|
||||||
|
3. **If exit == 0**:
|
||||||
|
- Scan the model's stdout for filenames it claims to have created/edited (lines like "created `/path/...`", "wrote to `/path/...`", or a Write/Edit tool reference in the output).
|
||||||
|
- For each (≤3), do exactly one `Read` to confirm the file exists and is non-empty. That's the entire verification — do not spot-check contents, do not count lines, do not parse.
|
||||||
|
- If the model output references no files (pure text task), skip Read entirely.
|
||||||
|
|
||||||
|
4. **Return the structured report.**
|
||||||
|
|
||||||
|
## Final report
|
||||||
|
|
||||||
|
```
|
||||||
|
exit: <N>
|
||||||
|
files: <comma-separated list of files the model claimed to touch, or "none">
|
||||||
|
verified: <comma-separated "yes"/"no" matching files list, or "n/a" if none>
|
||||||
|
|
||||||
|
output:
|
||||||
|
<verbatim wrapper stdout — no paraphrasing, no editing>
|
||||||
|
```
|
||||||
|
|
||||||
|
That's the whole report. No "verification notes" prose, no elapsed time (the runtime tracks it), no "model:" header (the orchestrator knows what model you wrap).
|
||||||
|
|
||||||
|
If you flagged any file as "verified: no" (i.e., Read failed or returned empty), add one final line:
|
||||||
|
|
||||||
|
```
|
||||||
|
warning: <one short sentence about which file and what was wrong>
|
||||||
|
```
|
||||||
|
|
||||||
|
## Hard rules
|
||||||
|
|
||||||
|
- **Verbatim output.** Never paraphrase or summarize the wrapper's stdout. The orchestrator wants raw model output.
|
||||||
|
- **No retry.** First-attempt failure → report and stop. The orchestrator decides whether to fall back to real Claude.
|
||||||
|
- **No scope expansion.** Ambiguous prompt → return `exit: 0`, `output: need clarification: <specific question>`. Do not guess.
|
||||||
|
- **Tool ceiling is binding.** 1 Bash + 0-3 Read. Going over is a bug, not thoroughness.
|
||||||
|
|
||||||
|
## Failure modes (surface to orchestrator, don't act on them)
|
||||||
|
|
||||||
|
| Exit | Cause | What you do |
|
||||||
|
|---|---|---|
|
||||||
|
| 2 | local stack down (llama-server / CCR failed to start) | Report verbatim stderr. |
|
||||||
|
| 1 | child claude errored (CCR translation, context overflow, timeout) | Report verbatim stderr. |
|
||||||
|
| 0 + truncated output | model hit max_tokens or got confused mid-stream | Report as-is. The orchestrator notices. |
|
||||||
|
| 0 + refusal text | model said "I can't help" | Report as-is. |
|
||||||
|
| 0 + file claimed but Read fails | model hallucinated the edit | Mark "verified: no" + add the warning line. |
|
||||||
@@ -0,0 +1,210 @@
|
|||||||
|
---
|
||||||
|
name: local-delegate
|
||||||
|
description: Decide when and how to delegate a focused coding subtask to the local Qwen3-Coder-30B model via the `local-coder` subagent. Use when the task is well-scoped, mechanical, fits in ~16K context, and doesn't need top-tier reasoning — saves Anthropic API tokens for the orchestrator's harder work. Anti-patterns — cross-file architectural changes, ambiguous requirements, performance tuning, anything requiring `cargo run`.
|
||||||
|
---
|
||||||
|
|
||||||
|
# /local-delegate
|
||||||
|
|
||||||
|
A decision-support skill for the orchestrator. Triage whether a subtask fits the local Qwen3-Coder model, and if so, hand it off via the `local-coder` subagent with a properly-shaped prompt.
|
||||||
|
|
||||||
|
## The local stack
|
||||||
|
|
||||||
|
| Layer | What | Where |
|
||||||
|
|---|---|---|
|
||||||
|
| Model | Qwen3-Coder-30B-A3B-Instruct, UD-Q5_K_XL quant | `~/llm/models/` |
|
||||||
|
| Inference | llama.cpp 9200 with Vulkan/RADV on AMD Radeon RX 7900 XTX | `systemctl --user … llama-server` (port 8080) |
|
||||||
|
| API translator | claude-code-router (Anthropic ↔ OpenAI) | `ccr` (port 3456) |
|
||||||
|
| Wrapper | One-shot `claude --print` with `ANTHROPIC_BASE_URL=ccr` | `~/llm/scripts/local-coder-task.sh` |
|
||||||
|
| Subagent | Haiku transport layer that drives the wrapper + verifies | `~/.claude/agents/local-coder.md` |
|
||||||
|
|
||||||
|
**Performance**: ~135-140 tok/s decode, ~100-200 ms TTFT, 32K context (practical task budget ~16-20K leaves room for output).
|
||||||
|
|
||||||
|
## ✅ Good fits for the local model
|
||||||
|
|
||||||
|
- **Mechanical refactors** — rename, extract helper, inline a constant, hoist a binding.
|
||||||
|
- **Boilerplate scaffolding** — new test file modeled on an existing one, getter/setter pairs, a CLI subcommand stub.
|
||||||
|
- **Format normalization** — rewrite docstrings to a target style, normalize import order, convert log macros.
|
||||||
|
- **Single-file changes** where the surrounding context fits in ~10K tokens.
|
||||||
|
- **Cross-language translation** — port a function from Python to Rust, convert XML config to TOML, etc.
|
||||||
|
- **Lint-driven fixes** where the lint message names the change ("inline this `format!`", "remove unused import").
|
||||||
|
- **Read-only inspection** — "summarize what module X does", "list all callers of function Y" (model can use Read/Grep/Glob).
|
||||||
|
|
||||||
|
## ❌ Bad fits — keep on real Claude
|
||||||
|
|
||||||
|
- **Cross-file architectural changes** — local model can't hold enough context to reason about ripple effects.
|
||||||
|
- **Ambiguous requirements** — anything needing "well, depends on…" judgment.
|
||||||
|
- **Performance work** — needs bench data, knowledge of the existing perf budget, system-level reasoning.
|
||||||
|
- **Web research / external lookups** — local model has no web access through this pipe.
|
||||||
|
- **`cargo run` / interactive smoke testing** — same TTY constraint as remote subagents; the local model can't verify visual output either.
|
||||||
|
- **PR creation, git commits, branch ops** — wrapper's Bash allowlist is read-only for safety. Have the orchestrator handle git after the subagent returns.
|
||||||
|
- **Anything novel** — local 30B is fluent but doesn't have the depth on niche libraries / rare patterns.
|
||||||
|
|
||||||
|
## How to invoke
|
||||||
|
|
||||||
|
### Shape A — task that writes/edits files
|
||||||
|
|
||||||
|
Use this when the local model should produce a file as its primary output. The subagent will verify each touched file with one `Read` call (≤3 files).
|
||||||
|
|
||||||
|
```
|
||||||
|
Agent({
|
||||||
|
subagent_type: "local-coder",
|
||||||
|
description: "<3-5 word summary>",
|
||||||
|
prompt: "
|
||||||
|
Task: <one-paragraph description, imperative mood>
|
||||||
|
|
||||||
|
Files in scope:
|
||||||
|
- <path>:<optional line range>
|
||||||
|
- <path>
|
||||||
|
|
||||||
|
Context (paste relevant snippets — keep under 8K tokens):
|
||||||
|
```<lang>
|
||||||
|
<relevant code>
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance criteria:
|
||||||
|
- <bullet>
|
||||||
|
- <bullet>
|
||||||
|
|
||||||
|
Out of scope:
|
||||||
|
- <bullet — what NOT to touch>
|
||||||
|
- Don't run compile/test/lint checks — orchestrator will do that after you return.
|
||||||
|
"
|
||||||
|
})
|
||||||
|
```
|
||||||
|
|
||||||
|
### Shape B — task that returns text only (no file writes)
|
||||||
|
|
||||||
|
Use this when you want analysis, an explanation, a code snippet for the orchestrator to apply itself, or a summary. The subagent skips the `Read` verification entirely (0 Read calls), so it's the fastest shape — typically ~5-10 s end-to-end on warm stack.
|
||||||
|
|
||||||
|
```
|
||||||
|
Agent({
|
||||||
|
subagent_type: "local-coder",
|
||||||
|
description: "<3-5 word summary>",
|
||||||
|
prompt: "
|
||||||
|
Task: <one-paragraph description, imperative mood>
|
||||||
|
|
||||||
|
Context (paste relevant snippets — keep under 8K tokens):
|
||||||
|
```<lang>
|
||||||
|
<relevant code>
|
||||||
|
```
|
||||||
|
|
||||||
|
Output format:
|
||||||
|
- <e.g., 'one Rust function, no markdown fences, no explanation'>
|
||||||
|
- <e.g., 'bullet list of files that match the pattern, one per line'>
|
||||||
|
|
||||||
|
Out of scope:
|
||||||
|
- Don't write any files. Return your answer as plain text only.
|
||||||
|
- Don't run any commands.
|
||||||
|
"
|
||||||
|
})
|
||||||
|
```
|
||||||
|
|
||||||
|
**Concrete no-edit examples**:
|
||||||
|
|
||||||
|
- *Explain*: "Explain in 4 sentences what `crates/zemyna_terrain/src/chunked.rs` does. Output: 4 sentences, plain text, no headings."
|
||||||
|
- *Snippet for orchestrator to paste*: "Write a Rust closure equivalent to this Python lambda: `lambda x, y: x * 2 + y`. Output: only the closure, one line, no `let` binding."
|
||||||
|
- *Listing*: "Read `Cargo.toml`. List every workspace member crate, one per line, no other text."
|
||||||
|
- *Translation*: "Translate this SQL `WHERE` clause to a `serde_json::Value` filter expression. Output: only the Rust expression."
|
||||||
|
|
||||||
|
Both shapes invoke the same wrapper via a single Bash heredoc — no temp file involved. The subagent returns a structured report with `exit:`, `files:`, `verified:`, and verbatim wrapper output.
|
||||||
|
|
||||||
|
### Shape C — direct Bash, no subagent (guaranteed routing)
|
||||||
|
|
||||||
|
Use this when you need a **hard guarantee** the local model actually ran — typically because the task is trivial enough that the subagent might decide to answer it directly from its own knowledge instead of invoking the wrapper. (Sonnet-as-subagent follows multi-paragraph rules ~95% of the time, but trivial one-liner tasks tempt any model to shortcut.)
|
||||||
|
|
||||||
|
You give up the subagent's verification + structured report; you pay one Bash call's worth of orchestrator context for the wrapper's raw output.
|
||||||
|
|
||||||
|
```
|
||||||
|
Bash({
|
||||||
|
command: "~/llm/scripts/local-coder-task.sh <<'TASK'\n<your task here>\nTASK\n",
|
||||||
|
description: "force-route through local model"
|
||||||
|
})
|
||||||
|
```
|
||||||
|
|
||||||
|
The wrapper's stdout becomes the Bash result. You parse it yourself.
|
||||||
|
|
||||||
|
**When Shape C is the right call**:
|
||||||
|
- The task is one-liner-trivial (e.g., "convert this Python lambda to Rust").
|
||||||
|
- You're benchmarking the local model and need every call to actually hit it.
|
||||||
|
- You're testing the local stack (smoke test, latency measurement, output-format check).
|
||||||
|
- You suspect the subagent will shortcut because the task is too easy.
|
||||||
|
|
||||||
|
**When Shape A or B is still better**:
|
||||||
|
- Real coding subtasks (refactor, scaffold, format-cleanup) — the subagent's verification step catches hallucinated file edits.
|
||||||
|
- Tasks where you want a structured report (`exit:`, `files:`, `verified:`) for the orchestrator's downstream handling.
|
||||||
|
- Multi-file tasks where the verification of each file is non-trivial.
|
||||||
|
|
||||||
|
### Quick decision tree
|
||||||
|
|
||||||
|
```
|
||||||
|
Task fits the local model? ── no ──> keep on real Claude
|
||||||
|
│
|
||||||
|
yes
|
||||||
|
│
|
||||||
|
Will the model write files? ── yes ──> Shape A (subagent, file verification)
|
||||||
|
│
|
||||||
|
no
|
||||||
|
│
|
||||||
|
Is the task trivial enough that the
|
||||||
|
subagent might answer directly? ── yes ──> Shape C (direct Bash, guaranteed)
|
||||||
|
│
|
||||||
|
no (e.g., needs the local model's
|
||||||
|
actual code-gen style, length,
|
||||||
|
or vocabulary)
|
||||||
|
│
|
||||||
|
└──> Shape B (subagent, no file verification)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Pre-flight checklist (orchestrator side)
|
||||||
|
|
||||||
|
Before invoking, mentally check:
|
||||||
|
|
||||||
|
1. **Sizing** — can the task be described in <500 tokens + ≤8K tokens of context? If no, scope-split or keep on real Claude.
|
||||||
|
2. **Cohesion** — is the task contained to 1-3 files? If it sprawls, keep on real Claude.
|
||||||
|
3. **Verifiability** — can you state an objective acceptance criterion (a passing test, a successful build, a grep returning N hits)? If you can't state how you'd know it worked, don't delegate.
|
||||||
|
4. **Recoverability** — if the local model produces wrong output, can you `git checkout -- <files>` and try again on real Claude? If not (e.g., it's a brand-new file), reduce blast radius first.
|
||||||
|
|
||||||
|
## Stack health (drop into a Bash if unsure)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -sf http://127.0.0.1:8080/health # llama-server (loads model on first start, ~65 s cold)
|
||||||
|
ccr status # CCR
|
||||||
|
systemctl --user status llama-server # if either above fails
|
||||||
|
```
|
||||||
|
|
||||||
|
The wrapper auto-starts both if missing. But on cold start, the first call takes ~65 s for model load. Subsequent calls (within the 30-min keep-alive) are warm.
|
||||||
|
|
||||||
|
## Failure handling
|
||||||
|
|
||||||
|
| Symptom | Likely cause | Action |
|
||||||
|
|---|---|---|
|
||||||
|
| Wrapper exit 2, stderr says "llama-server failed health check" | Model load failed (GPU contention, OOM) | Check `journalctl --user -u llama-server --since '5 min ago'`. Often: another GPU consumer started. Run `~/llm/scripts/use-llama-server.sh` to force-restart clean. |
|
||||||
|
| Wrapper exit 1, claude session error | CCR translation issue or context overflow | Check `~/.claude-code-router/` logs. Shrink the prompt context, retry. |
|
||||||
|
| Clean exit, output references edits that aren't there | Local model hallucinated the edit | Subagent's verification step catches this. Fall back to real Claude. |
|
||||||
|
| Clean exit, output is mid-sentence cut | Hit max_tokens or context overflow | Reduce prompt size and retry, OR raise max_tokens in the wrapper. |
|
||||||
|
| Repeated/looping output | Sampling broke (rare with our config) | Retry on real Claude — don't iterate on local. |
|
||||||
|
|
||||||
|
## Anti-patterns
|
||||||
|
|
||||||
|
- **Don't retry the same task on local.** If first attempt fails, fall back. Iterating burns wall clock without fixing the underlying capability gap.
|
||||||
|
- **Don't chain local subagents.** Sequential local calls compound error rate. Use real Claude as the connecting tissue.
|
||||||
|
- **Don't pass the orchestrator's full CLAUDE.md / rules context.** Wrapper uses `--bare` precisely to avoid this — the local model gets a clean context. Pass only the task-relevant context inline.
|
||||||
|
- **Don't delegate work you wouldn't trust a junior dev to do with the same brief.** If the brief itself requires deep project knowledge to write correctly, the implementer needs it too.
|
||||||
|
|
||||||
|
## CLI usage (outside Claude Code)
|
||||||
|
|
||||||
|
Useful for testing the stack without spawning a subagent:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
echo "Write a Rust function that reverses a string in-place." \
|
||||||
|
| ~/llm/scripts/local-coder-task.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Output goes to stdout. Same env, same flags as what the subagent uses.
|
||||||
|
|
||||||
|
## See also
|
||||||
|
|
||||||
|
- `~/.claude/agents/local-coder.md` — the subagent profile
|
||||||
|
- `~/llm/scripts/local-coder-task.sh` — the wrapper this skill invokes
|
||||||
|
- `~/.claude-code-router/config.json` — CCR routing
|
||||||
|
- `~/llm/scripts/claude-local.sh` — interactive `claude code` against the local stack (different use case: full claude session vs one-shot subtask)
|
||||||
@@ -0,0 +1,84 @@
|
|||||||
|
---
|
||||||
|
name: unload-local-model
|
||||||
|
description: Unload the local llama.cpp model (Qwen3-Coder-30B) from the 7900 XTX to free VRAM. Stops the llama-server systemd user service and reaps any stray foreground server. Idempotent — safe to run when already unloaded. Use when done with local-model work or when you want the GPU's VRAM back.
|
||||||
|
---
|
||||||
|
|
||||||
|
# /unload-local-model
|
||||||
|
|
||||||
|
Free the GPU by unloading the local Qwen3-Coder-30B model that backs the
|
||||||
|
`local-coder` subagent (see [local-delegate](../local-delegate/SKILL.md)). The
|
||||||
|
model is served by `llama-server` (llama.cpp) and pins ~9.5 GB of VRAM on the
|
||||||
|
Radeon RX 7900 XTX while resident. This skill stops it cleanly and verifies the
|
||||||
|
VRAM is back.
|
||||||
|
|
||||||
|
## What holds the GPU
|
||||||
|
|
||||||
|
| Layer | Holds VRAM? | This skill touches it? |
|
||||||
|
|---|---|---|
|
||||||
|
| `llama-server.service` (systemd --user, port 8080) | **Yes** — the model weights + KV cache | **Stops it** |
|
||||||
|
| stray foreground `llama-server` (from `llama-server-foreground.sh`) | **Yes**, if running outside systemd | **Reaps it** |
|
||||||
|
| `claude-code-router` / `ccr` (port 3456) | No — pure API translator, no VRAM | Left running |
|
||||||
|
| `ollama` daemon (port 11434) | Only while a model is loaded | Out of scope — see note below |
|
||||||
|
|
||||||
|
Leaving CCR up is deliberate: it holds no VRAM and re-attaches to llama-server
|
||||||
|
the next time the stack warms. There is nothing to restart.
|
||||||
|
|
||||||
|
## Run it
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Canonical path — stop the systemd user service (idempotent; no-op if dead).
|
||||||
|
systemctl --user stop llama-server.service
|
||||||
|
|
||||||
|
# 2. Reap any stray foreground server started outside systemd. Match the binary
|
||||||
|
# PATH (leading slash) — NOT the bare word "llama-server", or pkill matches
|
||||||
|
# its own command line and SIGTERMs the shell running this skill.
|
||||||
|
pkill -f '/llama-server ' 2>/dev/null || true
|
||||||
|
```
|
||||||
|
|
||||||
|
## Verify
|
||||||
|
|
||||||
|
```bash
|
||||||
|
echo "service: $(systemctl --user is-active llama-server.service)" # want: inactive
|
||||||
|
pgrep -af '/llama-server' | grep -v pgrep || echo "no server process" # want: none
|
||||||
|
curl -sf --max-time 2 http://127.0.0.1:8080/health >/dev/null 2>&1 \
|
||||||
|
&& echo "port 8080: UP (STILL LOADED)" || echo "port 8080: down (unloaded)"
|
||||||
|
# VRAM should drop to desktop baseline (~2.4 GiB); a loaded model adds ~9.5 GB.
|
||||||
|
rocm-smi --showmeminfo vram 2>/dev/null | awk '/Used/{printf "VRAM used: ~%d MiB\n", $NF/1024/1024}'
|
||||||
|
```
|
||||||
|
|
||||||
|
A clean unload reads: `service: inactive`, `no server process`, `port 8080:
|
||||||
|
down`, VRAM near the desktop baseline.
|
||||||
|
|
||||||
|
## Gotchas
|
||||||
|
|
||||||
|
- **Self-pkill footgun.** `pkill -f 'llama-server'` (no slash) matches *this
|
||||||
|
skill's own command string* and kills the shell mid-run (exit 144 = SIGTERM).
|
||||||
|
Always anchor on the binary path: `pkill -f '/llama-server '`.
|
||||||
|
- **Already unloaded is the common case.** The systemd unit is `disabled` and
|
||||||
|
only runs on demand (the wrapper auto-starts it), so most of the time the
|
||||||
|
model is already down. The skill is idempotent — running it then is a no-op
|
||||||
|
that just confirms state. Report "already unloaded" rather than implying you
|
||||||
|
stopped something.
|
||||||
|
- **Don't disable or mask the service.** Stopping unloads the model; the next
|
||||||
|
`/local-delegate` call auto-starts it again (~65 s cold load). Disabling would
|
||||||
|
break that auto-start. Stop only.
|
||||||
|
|
||||||
|
## Note on ollama
|
||||||
|
|
||||||
|
The stack can alternatively serve the same model via the `ollama` daemon (port
|
||||||
|
11434). If a request asks to free the GPU broadly and ollama has a model
|
||||||
|
resident, also run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ollama stop qwen3-coder-30b-a3b-q5kxl 2>/dev/null || true
|
||||||
|
```
|
||||||
|
|
||||||
|
This skill's default scope is the llama.cpp path (`llama-server`), which is what
|
||||||
|
`local-coder` uses. Reach for the ollama stop only when ollama is the active
|
||||||
|
backend (`~/llm/scripts/use-ollama.sh` was run).
|
||||||
|
|
||||||
|
## See also
|
||||||
|
|
||||||
|
- [local-delegate](../local-delegate/SKILL.md) — when/how to *use* the local model.
|
||||||
|
- `~/llm/scripts/use-ollama.sh` — stops llama-server so ollama can take the GPU.
|
||||||
|
- `~/llm/scripts/use-llama-server.sh` — the inverse: load llama-server, free ollama.
|
||||||
@@ -0,0 +1,16 @@
|
|||||||
|
{
|
||||||
|
"env": {
|
||||||
|
"CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
|
||||||
|
},
|
||||||
|
"enabledPlugins": {
|
||||||
|
"gopls-lsp@claude-plugins-official": true,
|
||||||
|
"rust-analyzer-lsp@claude-plugins-official": true
|
||||||
|
},
|
||||||
|
"autoUpdatesChannel": "stable",
|
||||||
|
"agentPushNotifEnabled": true,
|
||||||
|
"skipAutoPermissionPrompt": true,
|
||||||
|
"statusLine": {
|
||||||
|
"type": "command",
|
||||||
|
"command": "bash /home/daniel/.claude/statusline-command.sh"
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,95 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
# Claude Code statusLine — Catppuccin Mocha style matching starship.toml
|
||||||
|
# Structure: [CAP_L][seg1][WEDGE][seg2][WEDGE]...[segN][CAP_R]
|
||||||
|
|
||||||
|
input=$(cat)
|
||||||
|
|
||||||
|
# Catppuccin Mocha palette
|
||||||
|
red="#f38ba8"
|
||||||
|
peach="#fab387"
|
||||||
|
yellow="#f9e2af"
|
||||||
|
sapphire="#74c7ec"
|
||||||
|
lavender="#b4befe"
|
||||||
|
crust="#11111b"
|
||||||
|
|
||||||
|
RST=$'\033[0m'
|
||||||
|
BGDEF=$'\033[49m'
|
||||||
|
|
||||||
|
r_fg() { printf '\033[38;2;%d;%d;%dm' $((16#${1:1:2})) $((16#${1:3:2})) $((16#${1:5:2})); }
|
||||||
|
r_bg() { printf '\033[48;2;%d;%d;%dm' $((16#${1:1:2})) $((16#${1:3:2})) $((16#${1:5:2})); }
|
||||||
|
|
||||||
|
CAP_L=$'' # U+E0B6 — left rounded open cap
|
||||||
|
WEDGE=$'' # U+E0B0 — right-pointing filled arrow (between segments)
|
||||||
|
CAP_R=$'' # U+E0B4 — right rounded close cap
|
||||||
|
|
||||||
|
# Parse JSON input
|
||||||
|
cwd=$(printf '%s' "$input" | jq -r '.workspace.current_dir // .cwd // ""')
|
||||||
|
model=$(printf '%s' "$input" | jq -r '.model.display_name // .model.id // ""')
|
||||||
|
used_pct=$(printf '%s' "$input" | jq -r '.context_window.used_percentage // empty')
|
||||||
|
|
||||||
|
# Substitute $HOME → ~, then truncate to last 3 path segments (starship truncation_length=3)
|
||||||
|
display_cwd="${cwd/#$HOME/\~}"
|
||||||
|
truncated_cwd=$(printf '%s' "$display_cwd" | awk -F'/' '{
|
||||||
|
n=NF
|
||||||
|
if (n <= 3) { print $0 }
|
||||||
|
else { printf "…/%s/%s/%s", $(n-2), $(n-1), $n }
|
||||||
|
}')
|
||||||
|
|
||||||
|
# Git ref: prefer the worktree name, then the live branch, then a short SHA
|
||||||
|
git_branch=$(printf '%s' "$input" | jq -r '.worktree.name // .workspace.git_worktree // empty')
|
||||||
|
if [ -z "$git_branch" ]; then
|
||||||
|
git_branch=$(GIT_OPTIONAL_LOCKS=0 git -C "$cwd" symbolic-ref --short HEAD 2>/dev/null \
|
||||||
|
|| GIT_OPTIONAL_LOCKS=0 git -C "$cwd" rev-parse --short HEAD 2>/dev/null \
|
||||||
|
|| echo "")
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Build segment list: parallel arrays of (color, icon, text)
|
||||||
|
declare -a colors=()
|
||||||
|
declare -a icons=()
|
||||||
|
declare -a texts=()
|
||||||
|
|
||||||
|
colors+=("$red"); icons+=($''); texts+=("$(whoami)") # Manjaro
|
||||||
|
colors+=("$peach"); icons+=($''); texts+=("$truncated_cwd") # folder
|
||||||
|
if [ -n "$git_branch" ]; then
|
||||||
|
colors+=("$yellow"); icons+=($''); texts+=("$git_branch") # git branch
|
||||||
|
fi
|
||||||
|
colors+=("$sapphire"); icons+=($''); texts+=("$model") # sparkle ✦
|
||||||
|
if [ -n "$used_pct" ]; then
|
||||||
|
ctx_display=$(printf "%.0f" "$used_pct")
|
||||||
|
if [ "$ctx_display" -ge 80 ]; then ctx_color="$red"; else ctx_color="$lavender"; fi
|
||||||
|
colors+=("$ctx_color"); icons+=($''); texts+=("${ctx_display}%") # clock
|
||||||
|
fi
|
||||||
|
|
||||||
|
# segment count via positional params — the brace-hash array-length form trips
|
||||||
|
# bombadil's Tera renderer, which treats brace-hash as a comment opener
|
||||||
|
set -- "${colors[@]}"; n=$#
|
||||||
|
|
||||||
|
for i in $(seq 0 $((n-1))); do
|
||||||
|
color="${colors[$i]}"
|
||||||
|
icon="${icons[$i]}"
|
||||||
|
text="${texts[$i]}"
|
||||||
|
|
||||||
|
if [ "$i" -eq 0 ]; then
|
||||||
|
# Very first segment: print left rounded opening cap
|
||||||
|
printf '%b' "$RST"
|
||||||
|
r_fg "$color"
|
||||||
|
printf '%s' "$CAP_L"
|
||||||
|
else
|
||||||
|
# Transition from previous segment: wedge (fg=prev color, bg=this color)
|
||||||
|
prev="${colors[$((i-1))]}"
|
||||||
|
r_bg "$color"
|
||||||
|
r_fg "$prev"
|
||||||
|
printf '%s' "$WEDGE"
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Segment body
|
||||||
|
r_bg "$color"
|
||||||
|
r_fg "$crust"
|
||||||
|
printf ' %s %s ' "$icon" "$text"
|
||||||
|
done
|
||||||
|
|
||||||
|
# Right rounded closing cap after last segment
|
||||||
|
printf '%b' "$BGDEF"
|
||||||
|
r_fg "${colors[$((n-1))]}"
|
||||||
|
printf '%s' "$CAP_R"
|
||||||
|
printf '%b' "$RST"
|
||||||
@@ -0,0 +1,80 @@
|
|||||||
|
---
|
||||||
|
name: desktop-coder
|
||||||
|
description: Run a focused coding subtask on the Qwen3-Coder-30B model hosted on the `desktop-local` machine (accessed via SSH), instead of a remote Anthropic model. Use when the task is well-scoped, mechanical, fits within ~16K tokens of total context, and doesn't need top-tier reasoning. Good fits — refactor a function, generate tests scaffolded from existing ones, format/normalize docstrings, mechanical lint fixes, scaffold boilerplate, translate a small function between languages. Bad fits — cross-file architectural changes, ambiguous requirements needing judgment, performance work needing measurement, anything requiring web research or `cargo run`. Saves Anthropic API tokens for the orchestrator's harder work.
|
||||||
|
tools: Bash
|
||||||
|
model: sonnet
|
||||||
|
---
|
||||||
|
|
||||||
|
# desktop-coder
|
||||||
|
|
||||||
|
Thin transport layer for the Qwen3-Coder model hosted on `desktop-local` (accessed via SSH). **You do not solve the task — even if you could answer it from your own knowledge.** Every invocation goes to the remote wrapper. The orchestrator chose `desktop-coder` over a direct response *because they want the remote model to do this work* — to save Anthropic tokens, to test the remote model, or to keep certain work off the remote API. Answering directly defeats the entire purpose.
|
||||||
|
|
||||||
|
## Tool budget — strict
|
||||||
|
|
||||||
|
- **Bash: exactly 1 call, always the SSH-wrapper invocation.** Non-negotiable. Even if the task is one line and you "know" the answer, route it through SSH to the remote wrapper. The wrapper is the *product* the orchestrator asked for — not a fallback path.
|
||||||
|
- **No other Bash.** No `ls`, no `cat`, no `rustc`, no `cargo`, no `wc`. The orchestrator runs validation commands after you return. File verification for remote-edited files is the orchestrator's job (via its own SSH).
|
||||||
|
- **No Read.** Files edited by the wrapper live on `desktop-local`, not on this machine. The orchestrator handles verification.
|
||||||
|
|
||||||
|
If you find yourself wanting a 2nd Bash call, stop and return what you have. More tool calls do not improve the report — they only add latency.
|
||||||
|
|
||||||
|
**Sanity check before responding**: count your tool uses. If Bash count != 1, you did the wrong thing — re-run with the SSH wrapper.
|
||||||
|
|
||||||
|
## Process
|
||||||
|
|
||||||
|
1. **Invoke the wrapper over SSH** in a single Bash call using a heredoc. Heredoc preserves newlines and shell-special chars cleanly; piping over SSH forwards stdin to the remote wrapper:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh desktop-local "bash -lc 'cd <CWD> && ~/llm/scripts/local-coder-task.sh [--profile <name>] [--minimal-prompt]'" <<'TASK'
|
||||||
|
<orchestrator's prompt verbatim, including any code fences>
|
||||||
|
TASK
|
||||||
|
```
|
||||||
|
|
||||||
|
**`bash -lc` is required.** Non-interactive SSH does not source `~/.bash_profile` / `~/.zshrc`, and the remote `claude` binary lives at `~/.local/bin/claude` which is only on PATH after a login-shell init. Plain `ssh desktop-local '<cmd>'` will fail with `claude: command not found`. Always wrap in `bash -lc '…'`.
|
||||||
|
|
||||||
|
**CWD on remote**: if the orchestrator's brief includes a `CWD:` line at the top (e.g. `CWD: /home/daniel/src/gamedev/zemyna`), use that path in the `cd` portion. Default if no `CWD:` line: `$HOME` (omit the `cd … &&` prefix entirely; the bash login shell starts there anyway).
|
||||||
|
|
||||||
|
**Wrapper flags** (optional, passed before the heredoc):
|
||||||
|
- `--profile <name>` — tool profile, controls `--allowed-tools` footprint on the remote claude session:
|
||||||
|
- `full` (default) — Read/Write/Edit/Grep/Glob + ~25 Bash patterns.
|
||||||
|
- `code` — Read/Write/Edit/Grep/Glob + cargo verification Bash only.
|
||||||
|
- `edit-only` — Read/Write/Edit/Grep/Glob, **no Bash** (~3–5K tokens saved).
|
||||||
|
- `read-only` — Read/Grep/Glob, no edits.
|
||||||
|
- `--minimal-prompt` — replace Claude's built-in system prompt with a ~1 KB minimal one (~8–12K tokens saved). Use for trusted orchestrator-spec'd mechanical tasks. Drops Claude's default safety/style guidance.
|
||||||
|
|
||||||
|
**How to pick flags**: if the orchestrator's brief includes a `WRAPPER FLAGS:` line at the top (e.g. `WRAPPER FLAGS: --profile=edit-only --minimal-prompt`), pass those flags verbatim and **strip the line from the task body** before piping. Same for `CWD:` — strip before piping. If no such line is present, invoke with defaults (full profile, default Claude prompt + tool-call-format append).
|
||||||
|
|
||||||
|
2. **If exit != 0**: return the structured report with `exit: <N>` and the wrapper's stderr verbatim. Stop. No retry.
|
||||||
|
|
||||||
|
3. **If exit == 0**: return the structured report (see below). Do **not** verify files — they live on `desktop-local`, and verification is the orchestrator's responsibility (via its own SSH calls).
|
||||||
|
|
||||||
|
## Final report
|
||||||
|
|
||||||
|
```
|
||||||
|
exit: <N>
|
||||||
|
host: desktop-local
|
||||||
|
files: <comma-separated list of files the model claimed to touch (paths on desktop-local), or "none">
|
||||||
|
|
||||||
|
output:
|
||||||
|
<verbatim wrapper stdout — no paraphrasing, no editing>
|
||||||
|
```
|
||||||
|
|
||||||
|
That's the whole report. No "verification notes" prose, no elapsed time (the runtime tracks it), no "model:" header (the orchestrator knows what model you wrap). The `host:` line is mandatory so the orchestrator never forgets the edits live on `desktop-local`, not locally.
|
||||||
|
|
||||||
|
## Hard rules
|
||||||
|
|
||||||
|
- **Verbatim output.** Never paraphrase or summarize the wrapper's stdout. The orchestrator wants raw model output.
|
||||||
|
- **No retry.** First-attempt failure → report and stop. The orchestrator decides whether to fall back to real Claude.
|
||||||
|
- **No scope expansion.** Ambiguous prompt → return `exit: 0`, `output: need clarification: <specific question>`. Do not guess.
|
||||||
|
- **Tool ceiling is binding.** 1 Bash call total. Going over is a bug, not thoroughness.
|
||||||
|
- **No file verification.** Files are on the remote host. The orchestrator verifies (via its own `ssh desktop-local cat …` or sync).
|
||||||
|
|
||||||
|
## Failure modes (surface to orchestrator, don't act on them)
|
||||||
|
|
||||||
|
| Exit | Cause | What you do |
|
||||||
|
|---|---|---|
|
||||||
|
| 255 | SSH itself failed (host unreachable, auth, etc.) | Report verbatim stderr, exit 255. |
|
||||||
|
| 2 | Remote stack down (llama-server / CCR failed to start) | Report verbatim stderr. |
|
||||||
|
| 1 | Remote child claude errored (CCR translation, context overflow, timeout) | Report verbatim stderr. |
|
||||||
|
| 0 + truncated output | Model hit max_tokens or got confused mid-stream | Report as-is. The orchestrator notices. |
|
||||||
|
| 0 + refusal text | Model said "I can't help" | Report as-is. |
|
||||||
|
| 0 + file claimed but no remote sync | Orchestrator's problem; just report the claimed paths in `files:`. | |
|
||||||
@@ -0,0 +1,246 @@
|
|||||||
|
---
|
||||||
|
name: desktop-delegate
|
||||||
|
description: Decide when and how to delegate a focused coding subtask to the Qwen3-Coder-30B model hosted on the `desktop-local` machine (via SSH), routed through the `desktop-coder` subagent. Use when the task is well-scoped, mechanical, fits in ~16K context, and doesn't need top-tier reasoning — saves Anthropic API tokens for the orchestrator's harder work. Anti-patterns — cross-file architectural changes, ambiguous requirements, performance tuning, anything requiring `cargo run`, or work whose output must immediately be on this machine without an explicit sync step.
|
||||||
|
---
|
||||||
|
|
||||||
|
# /desktop-delegate
|
||||||
|
|
||||||
|
A decision-support skill for the orchestrator. Triage whether a subtask fits the Qwen3-Coder model running on `desktop-local`, and if so, hand it off via the `desktop-coder` subagent with a properly-shaped prompt.
|
||||||
|
|
||||||
|
## The remote stack
|
||||||
|
|
||||||
|
| Layer | What | Where |
|
||||||
|
|---|---|---|
|
||||||
|
| Host | `desktop-local` (daniel-desktop, Radeon RX 7900 XTX) | reached via `ssh desktop-local` |
|
||||||
|
| Model | Qwen3-Coder-30B-A3B-Instruct, UD-Q5_K_XL quant | `~/llm/models/` on desktop-local |
|
||||||
|
| Inference | llama.cpp with Vulkan/RADV | `systemctl --user … llama-server` (port 8080, desktop-local) |
|
||||||
|
| API translator | claude-code-router (Anthropic ↔ OpenAI) | `ccr` (port 3456, desktop-local) |
|
||||||
|
| Wrapper | One-shot `claude --print` with `ANTHROPIC_BASE_URL=ccr` | `~/llm/scripts/local-coder-task.sh` on desktop-local |
|
||||||
|
| Subagent | Haiku/Sonnet transport layer that SSHs + invokes the wrapper | `~/.claude/agents/desktop-coder.md` (here) |
|
||||||
|
|
||||||
|
**Performance**: ~135-140 tok/s decode on the remote, ~100-200 ms TTFT (plus ~50-200 ms SSH round-trip on first byte). 32K context (practical task budget ~16-20K leaves room for output).
|
||||||
|
|
||||||
|
## Important: cross-machine reality
|
||||||
|
|
||||||
|
The model runs on `desktop-local`. File edits, if any, land on `desktop-local`'s filesystem at the wrapper's CWD. This machine (`daniel-xps`) does not see those edits unless explicitly synced back.
|
||||||
|
|
||||||
|
- **Both machines have `~/src/gamedev/zemyna`**, but they are **independent checkouts**. An edit on one is not visible on the other.
|
||||||
|
- **Prefer text-output mode (Shape B/C)** — the model returns code in its stdout, the orchestrator applies the change here. No sync step needed.
|
||||||
|
- **If using file-edit mode (Shape A)**: the orchestrator is responsible for syncing back (`scp`, `rsync`, or `git pull` from a remote branch the desktop pushed). Mention this explicitly in the brief.
|
||||||
|
|
||||||
|
## ✅ Good fits
|
||||||
|
|
||||||
|
- **Mechanical refactors** — rename, extract helper, inline a constant, hoist a binding.
|
||||||
|
- **Boilerplate scaffolding** — new test file modeled on an existing one, getter/setter pairs, a CLI subcommand stub.
|
||||||
|
- **Format normalization** — rewrite docstrings to a target style, normalize import order, convert log macros.
|
||||||
|
- **Single-file changes** where the surrounding context fits in ~10K tokens.
|
||||||
|
- **Cross-language translation** — port a function from Python to Rust, convert XML config to TOML, etc.
|
||||||
|
- **Lint-driven fixes** where the lint message names the change ("inline this `format!`", "remove unused import").
|
||||||
|
- **Read-only inspection on the remote tree** — "summarize what module X does on desktop-local" (rarely useful — usually you want the analysis on this machine's tree).
|
||||||
|
|
||||||
|
## ❌ Bad fits — keep on real Claude
|
||||||
|
|
||||||
|
- **Cross-file architectural changes** — model can't hold enough context to reason about ripple effects.
|
||||||
|
- **Ambiguous requirements** — anything needing "well, depends on…" judgment.
|
||||||
|
- **Performance work** — needs bench data, knowledge of the existing perf budget, system-level reasoning.
|
||||||
|
- **Web research / external lookups** — no web access through this pipe.
|
||||||
|
- **`cargo run` / interactive smoke testing** — wrapper's `claude --print` is non-interactive; no UI.
|
||||||
|
- **PR creation, git commits, branch ops** — wrapper's Bash allowlist is read-only-ish for safety, and the relevant repo is on the wrong machine. Have the orchestrator handle git after the subagent returns.
|
||||||
|
- **Anything novel** — 30B model is fluent but doesn't have the depth on niche libraries / rare patterns.
|
||||||
|
- **Anything where the output must be on *this* machine immediately** without a sync step — prefer Shape B (text-only).
|
||||||
|
|
||||||
|
## How to invoke
|
||||||
|
|
||||||
|
### Shape A — task that writes/edits files on `desktop-local`
|
||||||
|
|
||||||
|
Use this when the remote model should produce file edits and the orchestrator is willing to sync them back afterward. Files land on **desktop-local**; not on this machine.
|
||||||
|
|
||||||
|
```
|
||||||
|
Agent({
|
||||||
|
subagent_type: "desktop-coder",
|
||||||
|
description: "<3-5 word summary>",
|
||||||
|
prompt: "
|
||||||
|
CWD: /home/daniel/src/gamedev/zemyna # path on desktop-local
|
||||||
|
|
||||||
|
Task: <one-paragraph description, imperative mood>
|
||||||
|
|
||||||
|
Files in scope (paths on desktop-local):
|
||||||
|
- <path>:<optional line range>
|
||||||
|
- <path>
|
||||||
|
|
||||||
|
Context (paste relevant snippets — keep under 8K tokens):
|
||||||
|
```<lang>
|
||||||
|
<relevant code>
|
||||||
|
```
|
||||||
|
|
||||||
|
Acceptance criteria:
|
||||||
|
- <bullet>
|
||||||
|
- <bullet>
|
||||||
|
|
||||||
|
Out of scope:
|
||||||
|
- <bullet — what NOT to touch>
|
||||||
|
- Don't run compile/test/lint checks — orchestrator will do that after you return.
|
||||||
|
"
|
||||||
|
})
|
||||||
|
```
|
||||||
|
|
||||||
|
**After the agent returns**, the orchestrator syncs edits back. Typical patterns:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Single file
|
||||||
|
scp desktop-local:/home/daniel/src/gamedev/zemyna/crates/foo/src/bar.rs \
|
||||||
|
/home/daniel/src/gamedev/zemyna/crates/foo/src/bar.rs
|
||||||
|
|
||||||
|
# Multiple files / a subtree
|
||||||
|
rsync -av desktop-local:/home/daniel/src/gamedev/zemyna/crates/foo/ \
|
||||||
|
/home/daniel/src/gamedev/zemyna/crates/foo/
|
||||||
|
|
||||||
|
# Or: the desktop-local checkout commits + pushes a branch, then orchestrator pulls
|
||||||
|
```
|
||||||
|
|
||||||
|
If the orchestrator is not prepared to do this, use **Shape B** instead.
|
||||||
|
|
||||||
|
### Shape B — task that returns text only (no file writes)
|
||||||
|
|
||||||
|
Use this when you want analysis, an explanation, a code snippet for the orchestrator to apply itself, or a summary. The model writes nothing on the remote — output comes back in the wrapper's stdout. No sync step needed.
|
||||||
|
|
||||||
|
```
|
||||||
|
Agent({
|
||||||
|
subagent_type: "desktop-coder",
|
||||||
|
description: "<3-5 word summary>",
|
||||||
|
prompt: "
|
||||||
|
Task: <one-paragraph description, imperative mood>
|
||||||
|
|
||||||
|
Context (paste relevant snippets — keep under 8K tokens):
|
||||||
|
```<lang>
|
||||||
|
<relevant code>
|
||||||
|
```
|
||||||
|
|
||||||
|
Output format:
|
||||||
|
- <e.g., 'one Rust function, no markdown fences, no explanation'>
|
||||||
|
- <e.g., 'bullet list of files that match the pattern, one per line'>
|
||||||
|
|
||||||
|
Out of scope:
|
||||||
|
- Don't write any files. Return your answer as plain text only.
|
||||||
|
- Don't run any commands.
|
||||||
|
"
|
||||||
|
})
|
||||||
|
```
|
||||||
|
|
||||||
|
**Concrete no-edit examples**:
|
||||||
|
|
||||||
|
- *Explain*: "Explain in 4 sentences what this function does (pasted below). Output: 4 sentences, plain text, no headings."
|
||||||
|
- *Snippet for orchestrator to paste*: "Write a Rust closure equivalent to this Python lambda: `lambda x, y: x * 2 + y`. Output: only the closure, one line, no `let` binding."
|
||||||
|
- *Translation*: "Translate this SQL `WHERE` clause to a `serde_json::Value` filter expression. Output: only the Rust expression."
|
||||||
|
|
||||||
|
### Shape C — direct Bash SSH, no subagent (guaranteed routing)
|
||||||
|
|
||||||
|
Use this when you need a **hard guarantee** the remote model actually ran — typically because the task is trivial enough that the subagent might decide to answer it directly from its own knowledge instead of invoking the wrapper. (Sonnet-as-subagent follows multi-paragraph rules ~95% of the time, but trivial one-liner tasks tempt any model to shortcut.)
|
||||||
|
|
||||||
|
You give up the subagent's structured report; you pay one Bash call's worth of orchestrator context for the wrapper's raw output.
|
||||||
|
|
||||||
|
```
|
||||||
|
Bash({
|
||||||
|
command: "ssh desktop-local \"bash -lc '~/llm/scripts/local-coder-task.sh'\" <<'TASK'\n<your task here>\nTASK\n",
|
||||||
|
description: "force-route through desktop model"
|
||||||
|
})
|
||||||
|
```
|
||||||
|
|
||||||
|
**`bash -lc` is required.** Non-interactive SSH does not source `~/.bash_profile`, so the remote `claude` binary (at `~/.local/bin/claude`) is not on PATH. The login shell sources it. Always wrap in `bash -lc '…'`.
|
||||||
|
|
||||||
|
The wrapper's stdout becomes the Bash result. You parse it yourself.
|
||||||
|
|
||||||
|
**When Shape C is the right call**:
|
||||||
|
- The task is one-liner-trivial (e.g., "convert this Python lambda to Rust").
|
||||||
|
- You're benchmarking the remote model and need every call to actually hit it.
|
||||||
|
- You're testing the remote stack (smoke test, latency measurement, output-format check).
|
||||||
|
- You suspect the subagent will shortcut because the task is too easy.
|
||||||
|
|
||||||
|
**When Shape A or B is still better**:
|
||||||
|
- Real coding subtasks where a structured report is useful.
|
||||||
|
- Tasks where the brief is long and you don't want to manage SSH heredoc escaping yourself.
|
||||||
|
|
||||||
|
### Quick decision tree
|
||||||
|
|
||||||
|
```
|
||||||
|
Task fits the remote model? ── no ──> keep on real Claude
|
||||||
|
│
|
||||||
|
yes
|
||||||
|
│
|
||||||
|
Will the model write files? ── yes ──> Shape A (subagent, sync back after)
|
||||||
|
│ └── If you can't or don't want to sync, use Shape B instead.
|
||||||
|
no
|
||||||
|
│
|
||||||
|
Is the task trivial enough that the
|
||||||
|
subagent might answer directly? ── yes ──> Shape C (direct Bash SSH, guaranteed)
|
||||||
|
│
|
||||||
|
no
|
||||||
|
│
|
||||||
|
└──> Shape B (subagent, text-only output)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Pre-flight checklist (orchestrator side)
|
||||||
|
|
||||||
|
Before invoking, mentally check:
|
||||||
|
|
||||||
|
1. **Sizing** — can the task be described in <500 tokens + ≤8K tokens of context? If no, scope-split or keep on real Claude.
|
||||||
|
2. **Cohesion** — is the task contained to 1-3 files? If it sprawls, keep on real Claude.
|
||||||
|
3. **Verifiability** — can you state an objective acceptance criterion (a passing test, a successful build, a grep returning N hits)? If you can't state how you'd know it worked, don't delegate.
|
||||||
|
4. **Recoverability** — if the remote model produces wrong output, can you `git checkout -- <files>` and try again on real Claude? If not (e.g., it's a brand-new file), reduce blast radius first.
|
||||||
|
5. **Cross-machine sync** — for Shape A, do you have a clear plan to pull the edits back? If not, downshift to Shape B.
|
||||||
|
|
||||||
|
## Stack health (drop into a Bash if unsure)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh desktop-local 'curl -sf http://127.0.0.1:8080/health' # llama-server (loads model on first start, ~65 s cold)
|
||||||
|
ssh desktop-local 'ccr status' # CCR
|
||||||
|
ssh desktop-local 'systemctl --user status llama-server' # if either above fails
|
||||||
|
```
|
||||||
|
|
||||||
|
The wrapper auto-starts both if missing. But on cold start, the first call takes ~65 s for model load. Subsequent calls (within the 30-min keep-alive) are warm.
|
||||||
|
|
||||||
|
## SSH health
|
||||||
|
|
||||||
|
If `ssh desktop-local …` itself fails, none of this works. Quick sanity:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh -o ConnectTimeout=3 desktop-local 'echo ok'
|
||||||
|
```
|
||||||
|
|
||||||
|
The hostname `desktop-local` must resolve (mDNS / `/etc/hosts` / SSH config) and the user must be able to log in non-interactively (key-based auth). If you get an interactive password prompt in a Bash call, the subagent will hang — fix auth before delegating.
|
||||||
|
|
||||||
|
## Failure handling
|
||||||
|
|
||||||
|
| Symptom | Likely cause | Action |
|
||||||
|
|---|---|---|
|
||||||
|
| Exit 255 — SSH error | Host unreachable, key auth not set up, hostname not resolving | Fix connectivity. Don't retry on real Claude (this isn't a capability issue). |
|
||||||
|
| Wrapper exit 2 — "llama-server failed health check" | Model load failed on remote (GPU contention, OOM) | `ssh desktop-local journalctl --user -u llama-server --since '5 min ago'`. Often: another GPU consumer started. `ssh desktop-local ~/llm/scripts/use-llama-server.sh` to force-restart clean. |
|
||||||
|
| Wrapper exit 1 — claude session error | CCR translation issue or context overflow on remote | Check `~/.claude-code-router/` logs on remote. Shrink the prompt context, retry. |
|
||||||
|
| Clean exit, output references edits that aren't there | Remote model hallucinated the edit | Fall back to real Claude. (No subagent verification step here — orchestrator verifies via its own SSH if Shape A was used.) |
|
||||||
|
| Clean exit, output is mid-sentence cut | Hit max_tokens or context overflow | Reduce prompt size and retry, OR raise max_tokens in the wrapper on desktop-local. |
|
||||||
|
| Repeated/looping output | Sampling broke (rare with our config) | Retry on real Claude — don't iterate on remote. |
|
||||||
|
|
||||||
|
## Anti-patterns
|
||||||
|
|
||||||
|
- **Don't retry the same task on the remote.** If first attempt fails, fall back to real Claude. Iterating burns wall clock without fixing the underlying capability gap.
|
||||||
|
- **Don't chain remote subagents.** Sequential remote calls compound error rate. Use real Claude as the connecting tissue.
|
||||||
|
- **Don't pass the orchestrator's full CLAUDE.md / rules context.** Wrapper uses `--bare` precisely to avoid this — the remote model gets a clean context. Pass only the task-relevant context inline.
|
||||||
|
- **Don't delegate work you wouldn't trust a junior dev to do with the same brief.** If the brief itself requires deep project knowledge to write correctly, the implementer needs it too.
|
||||||
|
- **Don't forget Shape A edits live on the *other* machine.** "I delegated, why doesn't the file have my change?" — because it's on desktop-local. Sync back.
|
||||||
|
|
||||||
|
## CLI usage (outside Claude Code)
|
||||||
|
|
||||||
|
Useful for testing the SSH path and remote stack without spawning a subagent:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
echo "Write a Rust function that reverses a string in-place." \
|
||||||
|
| ssh desktop-local "bash -lc '~/llm/scripts/local-coder-task.sh'"
|
||||||
|
```
|
||||||
|
|
||||||
|
Output goes to stdout. Same env, same flags as what the subagent uses.
|
||||||
|
|
||||||
|
## See also
|
||||||
|
|
||||||
|
- `~/.claude/agents/desktop-coder.md` — the subagent profile (this machine)
|
||||||
|
- `~/llm/scripts/local-coder-task.sh` — the wrapper, hosted on `desktop-local`
|
||||||
|
- `desktop-local:~/.claude/skills/local-delegate/SKILL.md` — the original local-only sibling skill (running this skill from the desktop itself)
|
||||||
Reference in New Issue
Block a user