Bring ~/.claude config under bombadil management across both machines:
- claude/shared/: converged settings.json (union of both hosts) and a single
Catppuccin-powerline statusline merged from the two machines' versions
- claude/xps, claude/desktop: per-host agents/skills behind [profiles.xps]/
[profiles.desktop]; each host links only its own via `bombadil link -p <theme> <host>`
Linked at file granularity because bombadil 4.2.0 can't create directory
symlinks for new targets, and to keep ~/.claude/{agents,skills} real dirs.
Add a Justfile (symlinked to ~/.justfile, usable via `just -g`) with link/
dark/light/watch/unlink/update/status/edit recipes; host auto-detected from
hostname. Recipes use exported shell vars to avoid bombadil's Tera engine
mis-parsing just's double-brace interpolation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 KiB
name, description
| name | description |
|---|---|
| desktop-delegate | Decide when and how to delegate a focused coding subtask to the Qwen3-Coder-30B model hosted on the `desktop-local` machine (via SSH), routed through the `desktop-coder` subagent. Use when the task is well-scoped, mechanical, fits in ~16K context, and doesn't need top-tier reasoning — saves Anthropic API tokens for the orchestrator's harder work. Anti-patterns — cross-file architectural changes, ambiguous requirements, performance tuning, anything requiring `cargo run`, or work whose output must immediately be on this machine without an explicit sync step. |
/desktop-delegate
A decision-support skill for the orchestrator. Triage whether a subtask fits the Qwen3-Coder model running on desktop-local, and if so, hand it off via the desktop-coder subagent with a properly-shaped prompt.
The remote stack
| Layer | What | Where |
|---|---|---|
| Host | desktop-local (daniel-desktop, Radeon RX 7900 XTX) |
reached via ssh desktop-local |
| Model | Qwen3-Coder-30B-A3B-Instruct, UD-Q5_K_XL quant | ~/llm/models/ on desktop-local |
| Inference | llama.cpp with Vulkan/RADV | systemctl --user … llama-server (port 8080, desktop-local) |
| API translator | claude-code-router (Anthropic ↔ OpenAI) | ccr (port 3456, desktop-local) |
| Wrapper | One-shot claude --print with ANTHROPIC_BASE_URL=ccr |
~/llm/scripts/local-coder-task.sh on desktop-local |
| Subagent | Haiku/Sonnet transport layer that SSHs + invokes the wrapper | ~/.claude/agents/desktop-coder.md (here) |
Performance: ~135-140 tok/s decode on the remote, ~100-200 ms TTFT (plus ~50-200 ms SSH round-trip on first byte). 32K context (practical task budget ~16-20K leaves room for output).
Important: cross-machine reality
The model runs on desktop-local. File edits, if any, land on desktop-local's filesystem at the wrapper's CWD. This machine (daniel-xps) does not see those edits unless explicitly synced back.
- Both machines have
~/src/gamedev/zemyna, but they are independent checkouts. An edit on one is not visible on the other. - Prefer text-output mode (Shape B/C) — the model returns code in its stdout, the orchestrator applies the change here. No sync step needed.
- If using file-edit mode (Shape A): the orchestrator is responsible for syncing back (
scp,rsync, orgit pullfrom a remote branch the desktop pushed). Mention this explicitly in the brief.
✅ Good fits
- Mechanical refactors — rename, extract helper, inline a constant, hoist a binding.
- Boilerplate scaffolding — new test file modeled on an existing one, getter/setter pairs, a CLI subcommand stub.
- Format normalization — rewrite docstrings to a target style, normalize import order, convert log macros.
- Single-file changes where the surrounding context fits in ~10K tokens.
- Cross-language translation — port a function from Python to Rust, convert XML config to TOML, etc.
- Lint-driven fixes where the lint message names the change ("inline this
format!", "remove unused import"). - Read-only inspection on the remote tree — "summarize what module X does on desktop-local" (rarely useful — usually you want the analysis on this machine's tree).
❌ Bad fits — keep on real Claude
- Cross-file architectural changes — model can't hold enough context to reason about ripple effects.
- Ambiguous requirements — anything needing "well, depends on…" judgment.
- Performance work — needs bench data, knowledge of the existing perf budget, system-level reasoning.
- Web research / external lookups — no web access through this pipe.
cargo run/ interactive smoke testing — wrapper'sclaude --printis non-interactive; no UI.- PR creation, git commits, branch ops — wrapper's Bash allowlist is read-only-ish for safety, and the relevant repo is on the wrong machine. Have the orchestrator handle git after the subagent returns.
- Anything novel — 30B model is fluent but doesn't have the depth on niche libraries / rare patterns.
- Anything where the output must be on this machine immediately without a sync step — prefer Shape B (text-only).
How to invoke
Shape A — task that writes/edits files on desktop-local
Use this when the remote model should produce file edits and the orchestrator is willing to sync them back afterward. Files land on desktop-local; not on this machine.
Agent({
subagent_type: "desktop-coder",
description: "<3-5 word summary>",
prompt: "
CWD: /home/daniel/src/gamedev/zemyna # path on desktop-local
Task: <one-paragraph description, imperative mood>
Files in scope (paths on desktop-local):
- <path>:<optional line range>
- <path>
Context (paste relevant snippets — keep under 8K tokens):
```<lang>
<relevant code>
```
Acceptance criteria:
- <bullet>
- <bullet>
Out of scope:
- <bullet — what NOT to touch>
- Don't run compile/test/lint checks — orchestrator will do that after you return.
"
})
After the agent returns, the orchestrator syncs edits back. Typical patterns:
# Single file
scp desktop-local:/home/daniel/src/gamedev/zemyna/crates/foo/src/bar.rs \
/home/daniel/src/gamedev/zemyna/crates/foo/src/bar.rs
# Multiple files / a subtree
rsync -av desktop-local:/home/daniel/src/gamedev/zemyna/crates/foo/ \
/home/daniel/src/gamedev/zemyna/crates/foo/
# Or: the desktop-local checkout commits + pushes a branch, then orchestrator pulls
If the orchestrator is not prepared to do this, use Shape B instead.
Shape B — task that returns text only (no file writes)
Use this when you want analysis, an explanation, a code snippet for the orchestrator to apply itself, or a summary. The model writes nothing on the remote — output comes back in the wrapper's stdout. No sync step needed.
Agent({
subagent_type: "desktop-coder",
description: "<3-5 word summary>",
prompt: "
Task: <one-paragraph description, imperative mood>
Context (paste relevant snippets — keep under 8K tokens):
```<lang>
<relevant code>
```
Output format:
- <e.g., 'one Rust function, no markdown fences, no explanation'>
- <e.g., 'bullet list of files that match the pattern, one per line'>
Out of scope:
- Don't write any files. Return your answer as plain text only.
- Don't run any commands.
"
})
Concrete no-edit examples:
- Explain: "Explain in 4 sentences what this function does (pasted below). Output: 4 sentences, plain text, no headings."
- Snippet for orchestrator to paste: "Write a Rust closure equivalent to this Python lambda:
lambda x, y: x * 2 + y. Output: only the closure, one line, noletbinding." - Translation: "Translate this SQL
WHEREclause to aserde_json::Valuefilter expression. Output: only the Rust expression."
Shape C — direct Bash SSH, no subagent (guaranteed routing)
Use this when you need a hard guarantee the remote model actually ran — typically because the task is trivial enough that the subagent might decide to answer it directly from its own knowledge instead of invoking the wrapper. (Sonnet-as-subagent follows multi-paragraph rules ~95% of the time, but trivial one-liner tasks tempt any model to shortcut.)
You give up the subagent's structured report; you pay one Bash call's worth of orchestrator context for the wrapper's raw output.
Bash({
command: "ssh desktop-local \"bash -lc '~/llm/scripts/local-coder-task.sh'\" <<'TASK'\n<your task here>\nTASK\n",
description: "force-route through desktop model"
})
bash -lc is required. Non-interactive SSH does not source ~/.bash_profile, so the remote claude binary (at ~/.local/bin/claude) is not on PATH. The login shell sources it. Always wrap in bash -lc '…'.
The wrapper's stdout becomes the Bash result. You parse it yourself.
When Shape C is the right call:
- The task is one-liner-trivial (e.g., "convert this Python lambda to Rust").
- You're benchmarking the remote model and need every call to actually hit it.
- You're testing the remote stack (smoke test, latency measurement, output-format check).
- You suspect the subagent will shortcut because the task is too easy.
When Shape A or B is still better:
- Real coding subtasks where a structured report is useful.
- Tasks where the brief is long and you don't want to manage SSH heredoc escaping yourself.
Quick decision tree
Task fits the remote model? ── no ──> keep on real Claude
│
yes
│
Will the model write files? ── yes ──> Shape A (subagent, sync back after)
│ └── If you can't or don't want to sync, use Shape B instead.
no
│
Is the task trivial enough that the
subagent might answer directly? ── yes ──> Shape C (direct Bash SSH, guaranteed)
│
no
│
└──> Shape B (subagent, text-only output)
Pre-flight checklist (orchestrator side)
Before invoking, mentally check:
- Sizing — can the task be described in <500 tokens + ≤8K tokens of context? If no, scope-split or keep on real Claude.
- Cohesion — is the task contained to 1-3 files? If it sprawls, keep on real Claude.
- Verifiability — can you state an objective acceptance criterion (a passing test, a successful build, a grep returning N hits)? If you can't state how you'd know it worked, don't delegate.
- Recoverability — if the remote model produces wrong output, can you
git checkout -- <files>and try again on real Claude? If not (e.g., it's a brand-new file), reduce blast radius first. - Cross-machine sync — for Shape A, do you have a clear plan to pull the edits back? If not, downshift to Shape B.
Stack health (drop into a Bash if unsure)
ssh desktop-local 'curl -sf http://127.0.0.1:8080/health' # llama-server (loads model on first start, ~65 s cold)
ssh desktop-local 'ccr status' # CCR
ssh desktop-local 'systemctl --user status llama-server' # if either above fails
The wrapper auto-starts both if missing. But on cold start, the first call takes ~65 s for model load. Subsequent calls (within the 30-min keep-alive) are warm.
SSH health
If ssh desktop-local … itself fails, none of this works. Quick sanity:
ssh -o ConnectTimeout=3 desktop-local 'echo ok'
The hostname desktop-local must resolve (mDNS / /etc/hosts / SSH config) and the user must be able to log in non-interactively (key-based auth). If you get an interactive password prompt in a Bash call, the subagent will hang — fix auth before delegating.
Failure handling
| Symptom | Likely cause | Action |
|---|---|---|
| Exit 255 — SSH error | Host unreachable, key auth not set up, hostname not resolving | Fix connectivity. Don't retry on real Claude (this isn't a capability issue). |
| Wrapper exit 2 — "llama-server failed health check" | Model load failed on remote (GPU contention, OOM) | ssh desktop-local journalctl --user -u llama-server --since '5 min ago'. Often: another GPU consumer started. ssh desktop-local ~/llm/scripts/use-llama-server.sh to force-restart clean. |
| Wrapper exit 1 — claude session error | CCR translation issue or context overflow on remote | Check ~/.claude-code-router/ logs on remote. Shrink the prompt context, retry. |
| Clean exit, output references edits that aren't there | Remote model hallucinated the edit | Fall back to real Claude. (No subagent verification step here — orchestrator verifies via its own SSH if Shape A was used.) |
| Clean exit, output is mid-sentence cut | Hit max_tokens or context overflow | Reduce prompt size and retry, OR raise max_tokens in the wrapper on desktop-local. |
| Repeated/looping output | Sampling broke (rare with our config) | Retry on real Claude — don't iterate on remote. |
Anti-patterns
- Don't retry the same task on the remote. If first attempt fails, fall back to real Claude. Iterating burns wall clock without fixing the underlying capability gap.
- Don't chain remote subagents. Sequential remote calls compound error rate. Use real Claude as the connecting tissue.
- Don't pass the orchestrator's full CLAUDE.md / rules context. Wrapper uses
--bareprecisely to avoid this — the remote model gets a clean context. Pass only the task-relevant context inline. - Don't delegate work you wouldn't trust a junior dev to do with the same brief. If the brief itself requires deep project knowledge to write correctly, the implementer needs it too.
- Don't forget Shape A edits live on the other machine. "I delegated, why doesn't the file have my change?" — because it's on desktop-local. Sync back.
CLI usage (outside Claude Code)
Useful for testing the SSH path and remote stack without spawning a subagent:
echo "Write a Rust function that reverses a string in-place." \
| ssh desktop-local "bash -lc '~/llm/scripts/local-coder-task.sh'"
Output goes to stdout. Same env, same flags as what the subagent uses.
See also
~/.claude/agents/desktop-coder.md— the subagent profile (this machine)~/llm/scripts/local-coder-task.sh— the wrapper, hosted ondesktop-localdesktop-local:~/.claude/skills/local-delegate/SKILL.md— the original local-only sibling skill (running this skill from the desktop itself)