Files
daniel e884e4a88f Manage Claude Code config + add Justfile via bombadil
Bring ~/.claude config under bombadil management across both machines:
- claude/shared/: converged settings.json (union of both hosts) and a single
  Catppuccin-powerline statusline merged from the two machines' versions
- claude/xps, claude/desktop: per-host agents/skills behind [profiles.xps]/
  [profiles.desktop]; each host links only its own via `bombadil link -p <theme> <host>`

Linked at file granularity because bombadil 4.2.0 can't create directory
symlinks for new targets, and to keep ~/.claude/{agents,skills} real dirs.

Add a Justfile (symlinked to ~/.justfile, usable via `just -g`) with link/
dark/light/watch/unlink/update/status/edit recipes; host auto-detected from
hostname. Recipes use exported shell vars to avoid bombadil's Tera engine
mis-parsing just's double-brace interpolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 17:17:17 -05:00

13 KiB

name, description
name description
desktop-delegate Decide when and how to delegate a focused coding subtask to the Qwen3-Coder-30B model hosted on the `desktop-local` machine (via SSH), routed through the `desktop-coder` subagent. Use when the task is well-scoped, mechanical, fits in ~16K context, and doesn't need top-tier reasoning — saves Anthropic API tokens for the orchestrator's harder work. Anti-patterns — cross-file architectural changes, ambiguous requirements, performance tuning, anything requiring `cargo run`, or work whose output must immediately be on this machine without an explicit sync step.

/desktop-delegate

A decision-support skill for the orchestrator. Triage whether a subtask fits the Qwen3-Coder model running on desktop-local, and if so, hand it off via the desktop-coder subagent with a properly-shaped prompt.

The remote stack

Layer What Where
Host desktop-local (daniel-desktop, Radeon RX 7900 XTX) reached via ssh desktop-local
Model Qwen3-Coder-30B-A3B-Instruct, UD-Q5_K_XL quant ~/llm/models/ on desktop-local
Inference llama.cpp with Vulkan/RADV systemctl --user … llama-server (port 8080, desktop-local)
API translator claude-code-router (Anthropic ↔ OpenAI) ccr (port 3456, desktop-local)
Wrapper One-shot claude --print with ANTHROPIC_BASE_URL=ccr ~/llm/scripts/local-coder-task.sh on desktop-local
Subagent Haiku/Sonnet transport layer that SSHs + invokes the wrapper ~/.claude/agents/desktop-coder.md (here)

Performance: ~135-140 tok/s decode on the remote, ~100-200 ms TTFT (plus ~50-200 ms SSH round-trip on first byte). 32K context (practical task budget ~16-20K leaves room for output).

Important: cross-machine reality

The model runs on desktop-local. File edits, if any, land on desktop-local's filesystem at the wrapper's CWD. This machine (daniel-xps) does not see those edits unless explicitly synced back.

  • Both machines have ~/src/gamedev/zemyna, but they are independent checkouts. An edit on one is not visible on the other.
  • Prefer text-output mode (Shape B/C) — the model returns code in its stdout, the orchestrator applies the change here. No sync step needed.
  • If using file-edit mode (Shape A): the orchestrator is responsible for syncing back (scp, rsync, or git pull from a remote branch the desktop pushed). Mention this explicitly in the brief.

Good fits

  • Mechanical refactors — rename, extract helper, inline a constant, hoist a binding.
  • Boilerplate scaffolding — new test file modeled on an existing one, getter/setter pairs, a CLI subcommand stub.
  • Format normalization — rewrite docstrings to a target style, normalize import order, convert log macros.
  • Single-file changes where the surrounding context fits in ~10K tokens.
  • Cross-language translation — port a function from Python to Rust, convert XML config to TOML, etc.
  • Lint-driven fixes where the lint message names the change ("inline this format!", "remove unused import").
  • Read-only inspection on the remote tree — "summarize what module X does on desktop-local" (rarely useful — usually you want the analysis on this machine's tree).

Bad fits — keep on real Claude

  • Cross-file architectural changes — model can't hold enough context to reason about ripple effects.
  • Ambiguous requirements — anything needing "well, depends on…" judgment.
  • Performance work — needs bench data, knowledge of the existing perf budget, system-level reasoning.
  • Web research / external lookups — no web access through this pipe.
  • cargo run / interactive smoke testing — wrapper's claude --print is non-interactive; no UI.
  • PR creation, git commits, branch ops — wrapper's Bash allowlist is read-only-ish for safety, and the relevant repo is on the wrong machine. Have the orchestrator handle git after the subagent returns.
  • Anything novel — 30B model is fluent but doesn't have the depth on niche libraries / rare patterns.
  • Anything where the output must be on this machine immediately without a sync step — prefer Shape B (text-only).

How to invoke

Shape A — task that writes/edits files on desktop-local

Use this when the remote model should produce file edits and the orchestrator is willing to sync them back afterward. Files land on desktop-local; not on this machine.

Agent({
  subagent_type: "desktop-coder",
  description: "<3-5 word summary>",
  prompt: "
    CWD: /home/daniel/src/gamedev/zemyna   # path on desktop-local

    Task: <one-paragraph description, imperative mood>

    Files in scope (paths on desktop-local):
      - <path>:<optional line range>
      - <path>

    Context (paste relevant snippets — keep under 8K tokens):
      ```<lang>
      <relevant code>
      ```

    Acceptance criteria:
      - <bullet>
      - <bullet>

    Out of scope:
      - <bullet — what NOT to touch>
      - Don't run compile/test/lint checks — orchestrator will do that after you return.
  "
})

After the agent returns, the orchestrator syncs edits back. Typical patterns:

# Single file
scp desktop-local:/home/daniel/src/gamedev/zemyna/crates/foo/src/bar.rs \
    /home/daniel/src/gamedev/zemyna/crates/foo/src/bar.rs

# Multiple files / a subtree
rsync -av desktop-local:/home/daniel/src/gamedev/zemyna/crates/foo/ \
          /home/daniel/src/gamedev/zemyna/crates/foo/

# Or: the desktop-local checkout commits + pushes a branch, then orchestrator pulls

If the orchestrator is not prepared to do this, use Shape B instead.

Shape B — task that returns text only (no file writes)

Use this when you want analysis, an explanation, a code snippet for the orchestrator to apply itself, or a summary. The model writes nothing on the remote — output comes back in the wrapper's stdout. No sync step needed.

Agent({
  subagent_type: "desktop-coder",
  description: "<3-5 word summary>",
  prompt: "
    Task: <one-paragraph description, imperative mood>

    Context (paste relevant snippets — keep under 8K tokens):
      ```<lang>
      <relevant code>
      ```

    Output format:
      - <e.g., 'one Rust function, no markdown fences, no explanation'>
      - <e.g., 'bullet list of files that match the pattern, one per line'>

    Out of scope:
      - Don't write any files. Return your answer as plain text only.
      - Don't run any commands.
  "
})

Concrete no-edit examples:

  • Explain: "Explain in 4 sentences what this function does (pasted below). Output: 4 sentences, plain text, no headings."
  • Snippet for orchestrator to paste: "Write a Rust closure equivalent to this Python lambda: lambda x, y: x * 2 + y. Output: only the closure, one line, no let binding."
  • Translation: "Translate this SQL WHERE clause to a serde_json::Value filter expression. Output: only the Rust expression."

Shape C — direct Bash SSH, no subagent (guaranteed routing)

Use this when you need a hard guarantee the remote model actually ran — typically because the task is trivial enough that the subagent might decide to answer it directly from its own knowledge instead of invoking the wrapper. (Sonnet-as-subagent follows multi-paragraph rules ~95% of the time, but trivial one-liner tasks tempt any model to shortcut.)

You give up the subagent's structured report; you pay one Bash call's worth of orchestrator context for the wrapper's raw output.

Bash({
  command: "ssh desktop-local \"bash -lc '~/llm/scripts/local-coder-task.sh'\" <<'TASK'\n<your task here>\nTASK\n",
  description: "force-route through desktop model"
})

bash -lc is required. Non-interactive SSH does not source ~/.bash_profile, so the remote claude binary (at ~/.local/bin/claude) is not on PATH. The login shell sources it. Always wrap in bash -lc '…'.

The wrapper's stdout becomes the Bash result. You parse it yourself.

When Shape C is the right call:

  • The task is one-liner-trivial (e.g., "convert this Python lambda to Rust").
  • You're benchmarking the remote model and need every call to actually hit it.
  • You're testing the remote stack (smoke test, latency measurement, output-format check).
  • You suspect the subagent will shortcut because the task is too easy.

When Shape A or B is still better:

  • Real coding subtasks where a structured report is useful.
  • Tasks where the brief is long and you don't want to manage SSH heredoc escaping yourself.

Quick decision tree

Task fits the remote model? ── no ──> keep on real Claude
        │
        yes
        │
Will the model write files? ── yes ──> Shape A (subagent, sync back after)
        │                              └── If you can't or don't want to sync, use Shape B instead.
        no
        │
Is the task trivial enough that the
subagent might answer directly?    ── yes ──> Shape C (direct Bash SSH, guaranteed)
        │
        no
        │
        └──> Shape B (subagent, text-only output)

Pre-flight checklist (orchestrator side)

Before invoking, mentally check:

  1. Sizing — can the task be described in <500 tokens + ≤8K tokens of context? If no, scope-split or keep on real Claude.
  2. Cohesion — is the task contained to 1-3 files? If it sprawls, keep on real Claude.
  3. Verifiability — can you state an objective acceptance criterion (a passing test, a successful build, a grep returning N hits)? If you can't state how you'd know it worked, don't delegate.
  4. Recoverability — if the remote model produces wrong output, can you git checkout -- <files> and try again on real Claude? If not (e.g., it's a brand-new file), reduce blast radius first.
  5. Cross-machine sync — for Shape A, do you have a clear plan to pull the edits back? If not, downshift to Shape B.

Stack health (drop into a Bash if unsure)

ssh desktop-local 'curl -sf http://127.0.0.1:8080/health'   # llama-server (loads model on first start, ~65 s cold)
ssh desktop-local 'ccr status'                              # CCR
ssh desktop-local 'systemctl --user status llama-server'    # if either above fails

The wrapper auto-starts both if missing. But on cold start, the first call takes ~65 s for model load. Subsequent calls (within the 30-min keep-alive) are warm.

SSH health

If ssh desktop-local … itself fails, none of this works. Quick sanity:

ssh -o ConnectTimeout=3 desktop-local 'echo ok'

The hostname desktop-local must resolve (mDNS / /etc/hosts / SSH config) and the user must be able to log in non-interactively (key-based auth). If you get an interactive password prompt in a Bash call, the subagent will hang — fix auth before delegating.

Failure handling

Symptom Likely cause Action
Exit 255 — SSH error Host unreachable, key auth not set up, hostname not resolving Fix connectivity. Don't retry on real Claude (this isn't a capability issue).
Wrapper exit 2 — "llama-server failed health check" Model load failed on remote (GPU contention, OOM) ssh desktop-local journalctl --user -u llama-server --since '5 min ago'. Often: another GPU consumer started. ssh desktop-local ~/llm/scripts/use-llama-server.sh to force-restart clean.
Wrapper exit 1 — claude session error CCR translation issue or context overflow on remote Check ~/.claude-code-router/ logs on remote. Shrink the prompt context, retry.
Clean exit, output references edits that aren't there Remote model hallucinated the edit Fall back to real Claude. (No subagent verification step here — orchestrator verifies via its own SSH if Shape A was used.)
Clean exit, output is mid-sentence cut Hit max_tokens or context overflow Reduce prompt size and retry, OR raise max_tokens in the wrapper on desktop-local.
Repeated/looping output Sampling broke (rare with our config) Retry on real Claude — don't iterate on remote.

Anti-patterns

  • Don't retry the same task on the remote. If first attempt fails, fall back to real Claude. Iterating burns wall clock without fixing the underlying capability gap.
  • Don't chain remote subagents. Sequential remote calls compound error rate. Use real Claude as the connecting tissue.
  • Don't pass the orchestrator's full CLAUDE.md / rules context. Wrapper uses --bare precisely to avoid this — the remote model gets a clean context. Pass only the task-relevant context inline.
  • Don't delegate work you wouldn't trust a junior dev to do with the same brief. If the brief itself requires deep project knowledge to write correctly, the implementer needs it too.
  • Don't forget Shape A edits live on the other machine. "I delegated, why doesn't the file have my change?" — because it's on desktop-local. Sync back.

CLI usage (outside Claude Code)

Useful for testing the SSH path and remote stack without spawning a subagent:

echo "Write a Rust function that reverses a string in-place." \
  | ssh desktop-local "bash -lc '~/llm/scripts/local-coder-task.sh'"

Output goes to stdout. Same env, same flags as what the subagent uses.

See also

  • ~/.claude/agents/desktop-coder.md — the subagent profile (this machine)
  • ~/llm/scripts/local-coder-task.sh — the wrapper, hosted on desktop-local
  • desktop-local:~/.claude/skills/local-delegate/SKILL.md — the original local-only sibling skill (running this skill from the desktop itself)