--- name: desktop-delegate description: Decide when and how to delegate a focused coding subtask to the Qwen3-Coder-30B model hosted on the `desktop-local` machine (via SSH), routed through the `desktop-coder` subagent. Use when the task is well-scoped, mechanical, fits in ~16K context, and doesn't need top-tier reasoning — saves Anthropic API tokens for the orchestrator's harder work. Anti-patterns — cross-file architectural changes, ambiguous requirements, performance tuning, anything requiring `cargo run`, or work whose output must immediately be on this machine without an explicit sync step. --- # /desktop-delegate A decision-support skill for the orchestrator. Triage whether a subtask fits the Qwen3-Coder model running on `desktop-local`, and if so, hand it off via the `desktop-coder` subagent with a properly-shaped prompt. ## The remote stack | Layer | What | Where | |---|---|---| | Host | `desktop-local` (daniel-desktop, Radeon RX 7900 XTX) | reached via `ssh desktop-local` | | Model | Qwen3-Coder-30B-A3B-Instruct, UD-Q5_K_XL quant | `~/llm/models/` on desktop-local | | Inference | llama.cpp with Vulkan/RADV | `systemctl --user … llama-server` (port 8080, desktop-local) | | API translator | claude-code-router (Anthropic ↔ OpenAI) | `ccr` (port 3456, desktop-local) | | Wrapper | One-shot `claude --print` with `ANTHROPIC_BASE_URL=ccr` | `~/llm/scripts/local-coder-task.sh` on desktop-local | | Subagent | Haiku/Sonnet transport layer that SSHs + invokes the wrapper | `~/.claude/agents/desktop-coder.md` (here) | **Performance**: ~135-140 tok/s decode on the remote, ~100-200 ms TTFT (plus ~50-200 ms SSH round-trip on first byte). 32K context (practical task budget ~16-20K leaves room for output). ## Important: cross-machine reality The model runs on `desktop-local`. File edits, if any, land on `desktop-local`'s filesystem at the wrapper's CWD. This machine (`daniel-xps`) does not see those edits unless explicitly synced back. - **Both machines have `~/src/gamedev/zemyna`**, but they are **independent checkouts**. An edit on one is not visible on the other. - **Prefer text-output mode (Shape B/C)** — the model returns code in its stdout, the orchestrator applies the change here. No sync step needed. - **If using file-edit mode (Shape A)**: the orchestrator is responsible for syncing back (`scp`, `rsync`, or `git pull` from a remote branch the desktop pushed). Mention this explicitly in the brief. ## ✅ Good fits - **Mechanical refactors** — rename, extract helper, inline a constant, hoist a binding. - **Boilerplate scaffolding** — new test file modeled on an existing one, getter/setter pairs, a CLI subcommand stub. - **Format normalization** — rewrite docstrings to a target style, normalize import order, convert log macros. - **Single-file changes** where the surrounding context fits in ~10K tokens. - **Cross-language translation** — port a function from Python to Rust, convert XML config to TOML, etc. - **Lint-driven fixes** where the lint message names the change ("inline this `format!`", "remove unused import"). - **Read-only inspection on the remote tree** — "summarize what module X does on desktop-local" (rarely useful — usually you want the analysis on this machine's tree). ## ❌ Bad fits — keep on real Claude - **Cross-file architectural changes** — model can't hold enough context to reason about ripple effects. - **Ambiguous requirements** — anything needing "well, depends on…" judgment. - **Performance work** — needs bench data, knowledge of the existing perf budget, system-level reasoning. - **Web research / external lookups** — no web access through this pipe. - **`cargo run` / interactive smoke testing** — wrapper's `claude --print` is non-interactive; no UI. - **PR creation, git commits, branch ops** — wrapper's Bash allowlist is read-only-ish for safety, and the relevant repo is on the wrong machine. Have the orchestrator handle git after the subagent returns. - **Anything novel** — 30B model is fluent but doesn't have the depth on niche libraries / rare patterns. - **Anything where the output must be on *this* machine immediately** without a sync step — prefer Shape B (text-only). ## How to invoke ### Shape A — task that writes/edits files on `desktop-local` Use this when the remote model should produce file edits and the orchestrator is willing to sync them back afterward. Files land on **desktop-local**; not on this machine. ``` Agent({ subagent_type: "desktop-coder", description: "<3-5 word summary>", prompt: " CWD: /home/daniel/src/gamedev/zemyna # path on desktop-local Task: Files in scope (paths on desktop-local): - : - Context (paste relevant snippets — keep under 8K tokens): ``` ``` Acceptance criteria: - - Out of scope: - - Don't run compile/test/lint checks — orchestrator will do that after you return. " }) ``` **After the agent returns**, the orchestrator syncs edits back. Typical patterns: ```bash # Single file scp desktop-local:/home/daniel/src/gamedev/zemyna/crates/foo/src/bar.rs \ /home/daniel/src/gamedev/zemyna/crates/foo/src/bar.rs # Multiple files / a subtree rsync -av desktop-local:/home/daniel/src/gamedev/zemyna/crates/foo/ \ /home/daniel/src/gamedev/zemyna/crates/foo/ # Or: the desktop-local checkout commits + pushes a branch, then orchestrator pulls ``` If the orchestrator is not prepared to do this, use **Shape B** instead. ### Shape B — task that returns text only (no file writes) Use this when you want analysis, an explanation, a code snippet for the orchestrator to apply itself, or a summary. The model writes nothing on the remote — output comes back in the wrapper's stdout. No sync step needed. ``` Agent({ subagent_type: "desktop-coder", description: "<3-5 word summary>", prompt: " Task: Context (paste relevant snippets — keep under 8K tokens): ``` ``` Output format: - - Out of scope: - Don't write any files. Return your answer as plain text only. - Don't run any commands. " }) ``` **Concrete no-edit examples**: - *Explain*: "Explain in 4 sentences what this function does (pasted below). Output: 4 sentences, plain text, no headings." - *Snippet for orchestrator to paste*: "Write a Rust closure equivalent to this Python lambda: `lambda x, y: x * 2 + y`. Output: only the closure, one line, no `let` binding." - *Translation*: "Translate this SQL `WHERE` clause to a `serde_json::Value` filter expression. Output: only the Rust expression." ### Shape C — direct Bash SSH, no subagent (guaranteed routing) Use this when you need a **hard guarantee** the remote model actually ran — typically because the task is trivial enough that the subagent might decide to answer it directly from its own knowledge instead of invoking the wrapper. (Sonnet-as-subagent follows multi-paragraph rules ~95% of the time, but trivial one-liner tasks tempt any model to shortcut.) You give up the subagent's structured report; you pay one Bash call's worth of orchestrator context for the wrapper's raw output. ``` Bash({ command: "ssh desktop-local \"bash -lc '~/llm/scripts/local-coder-task.sh'\" <<'TASK'\n\nTASK\n", description: "force-route through desktop model" }) ``` **`bash -lc` is required.** Non-interactive SSH does not source `~/.bash_profile`, so the remote `claude` binary (at `~/.local/bin/claude`) is not on PATH. The login shell sources it. Always wrap in `bash -lc '…'`. The wrapper's stdout becomes the Bash result. You parse it yourself. **When Shape C is the right call**: - The task is one-liner-trivial (e.g., "convert this Python lambda to Rust"). - You're benchmarking the remote model and need every call to actually hit it. - You're testing the remote stack (smoke test, latency measurement, output-format check). - You suspect the subagent will shortcut because the task is too easy. **When Shape A or B is still better**: - Real coding subtasks where a structured report is useful. - Tasks where the brief is long and you don't want to manage SSH heredoc escaping yourself. ### Quick decision tree ``` Task fits the remote model? ── no ──> keep on real Claude │ yes │ Will the model write files? ── yes ──> Shape A (subagent, sync back after) │ └── If you can't or don't want to sync, use Shape B instead. no │ Is the task trivial enough that the subagent might answer directly? ── yes ──> Shape C (direct Bash SSH, guaranteed) │ no │ └──> Shape B (subagent, text-only output) ``` ## Pre-flight checklist (orchestrator side) Before invoking, mentally check: 1. **Sizing** — can the task be described in <500 tokens + ≤8K tokens of context? If no, scope-split or keep on real Claude. 2. **Cohesion** — is the task contained to 1-3 files? If it sprawls, keep on real Claude. 3. **Verifiability** — can you state an objective acceptance criterion (a passing test, a successful build, a grep returning N hits)? If you can't state how you'd know it worked, don't delegate. 4. **Recoverability** — if the remote model produces wrong output, can you `git checkout -- ` and try again on real Claude? If not (e.g., it's a brand-new file), reduce blast radius first. 5. **Cross-machine sync** — for Shape A, do you have a clear plan to pull the edits back? If not, downshift to Shape B. ## Stack health (drop into a Bash if unsure) ```bash ssh desktop-local 'curl -sf http://127.0.0.1:8080/health' # llama-server (loads model on first start, ~65 s cold) ssh desktop-local 'ccr status' # CCR ssh desktop-local 'systemctl --user status llama-server' # if either above fails ``` The wrapper auto-starts both if missing. But on cold start, the first call takes ~65 s for model load. Subsequent calls (within the 30-min keep-alive) are warm. ## SSH health If `ssh desktop-local …` itself fails, none of this works. Quick sanity: ```bash ssh -o ConnectTimeout=3 desktop-local 'echo ok' ``` The hostname `desktop-local` must resolve (mDNS / `/etc/hosts` / SSH config) and the user must be able to log in non-interactively (key-based auth). If you get an interactive password prompt in a Bash call, the subagent will hang — fix auth before delegating. ## Failure handling | Symptom | Likely cause | Action | |---|---|---| | Exit 255 — SSH error | Host unreachable, key auth not set up, hostname not resolving | Fix connectivity. Don't retry on real Claude (this isn't a capability issue). | | Wrapper exit 2 — "llama-server failed health check" | Model load failed on remote (GPU contention, OOM) | `ssh desktop-local journalctl --user -u llama-server --since '5 min ago'`. Often: another GPU consumer started. `ssh desktop-local ~/llm/scripts/use-llama-server.sh` to force-restart clean. | | Wrapper exit 1 — claude session error | CCR translation issue or context overflow on remote | Check `~/.claude-code-router/` logs on remote. Shrink the prompt context, retry. | | Clean exit, output references edits that aren't there | Remote model hallucinated the edit | Fall back to real Claude. (No subagent verification step here — orchestrator verifies via its own SSH if Shape A was used.) | | Clean exit, output is mid-sentence cut | Hit max_tokens or context overflow | Reduce prompt size and retry, OR raise max_tokens in the wrapper on desktop-local. | | Repeated/looping output | Sampling broke (rare with our config) | Retry on real Claude — don't iterate on remote. | ## Anti-patterns - **Don't retry the same task on the remote.** If first attempt fails, fall back to real Claude. Iterating burns wall clock without fixing the underlying capability gap. - **Don't chain remote subagents.** Sequential remote calls compound error rate. Use real Claude as the connecting tissue. - **Don't pass the orchestrator's full CLAUDE.md / rules context.** Wrapper uses `--bare` precisely to avoid this — the remote model gets a clean context. Pass only the task-relevant context inline. - **Don't delegate work you wouldn't trust a junior dev to do with the same brief.** If the brief itself requires deep project knowledge to write correctly, the implementer needs it too. - **Don't forget Shape A edits live on the *other* machine.** "I delegated, why doesn't the file have my change?" — because it's on desktop-local. Sync back. ## CLI usage (outside Claude Code) Useful for testing the SSH path and remote stack without spawning a subagent: ```bash echo "Write a Rust function that reverses a string in-place." \ | ssh desktop-local "bash -lc '~/llm/scripts/local-coder-task.sh'" ``` Output goes to stdout. Same env, same flags as what the subagent uses. ## See also - `~/.claude/agents/desktop-coder.md` — the subagent profile (this machine) - `~/llm/scripts/local-coder-task.sh` — the wrapper, hosted on `desktop-local` - `desktop-local:~/.claude/skills/local-delegate/SKILL.md` — the original local-only sibling skill (running this skill from the desktop itself)