Bring ~/.claude config under bombadil management across both machines:
- claude/shared/: converged settings.json (union of both hosts) and a single
Catppuccin-powerline statusline merged from the two machines' versions
- claude/xps, claude/desktop: per-host agents/skills behind [profiles.xps]/
[profiles.desktop]; each host links only its own via `bombadil link -p <theme> <host>`
Linked at file granularity because bombadil 4.2.0 can't create directory
symlinks for new targets, and to keep ~/.claude/{agents,skills} real dirs.
Add a Justfile (symlinked to ~/.justfile, usable via `just -g`) with link/
dark/light/watch/unlink/update/status/edit recipes; host auto-detected from
hostname. Recipes use exported shell vars to avoid bombadil's Tera engine
mis-parsing just's double-brace interpolation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| name | description |
|---|---|
| unload-local-model | Unload the local llama.cpp model (Qwen3-Coder-30B) from the 7900 XTX to free VRAM. Stops the llama-server systemd user service and reaps any stray foreground server. Idempotent — safe to run when already unloaded. Use when done with local-model work or when you want the GPU's VRAM back. |
/unload-local-model
Free the GPU by unloading the local Qwen3-Coder-30B model that backs the
local-coder subagent (see local-delegate). The
model is served by llama-server (llama.cpp) and pins ~9.5 GB of VRAM on the
Radeon RX 7900 XTX while resident. This skill stops it cleanly and verifies the
VRAM is back.
What holds the GPU
| Layer | Holds VRAM? | This skill touches it? |
|---|---|---|
| llama-server.service (systemd --user, port 8080) | Yes: the model weights + KV cache | Stops it |
| stray foreground llama-server (from llama-server-foreground.sh) | Yes, if running outside systemd | Reaps it |
| claude-code-router / ccr (port 3456) | No: pure API translator, no VRAM | Left running |
| ollama daemon (port 11434) | Only while a model is loaded | Out of scope (see Note on ollama) |
Leaving CCR up is deliberate: it holds no VRAM and re-attaches to llama-server the next time the stack warms. There is nothing to restart.
Run it
# 1. Canonical path — stop the systemd user service (idempotent; no-op if dead).
systemctl --user stop llama-server.service
# 2. Reap any stray foreground server started outside systemd. Match the binary
# PATH (leading slash) — NOT the bare word "llama-server", or pkill matches
# its own command line and SIGTERMs the shell running this skill.
pkill -f '/llama-server ' 2>/dev/null || true
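The two commands above can be wrapped so the caller can tell "already unloaded" from "stopped". This is a minimal sketch, not part of the skill: `unload_llama` is a hypothetical helper name, and the unit name and pkill pattern are copied from the commands above.

```bash
#!/usr/bin/env bash
# Sketch only: unload_llama is a hypothetical helper, not shipped with the skill.

unload_llama() {
    local unit="llama-server.service"
    local before reaped="no"

    # is-active prints the unit state and exits non-zero when the unit is not
    # active, so the exit status is ignored and only the printed state is kept.
    before="$(systemctl --user is-active "$unit" 2>/dev/null || true)"

    # Stopping an inactive unit is a no-op, so this is safe to run every time.
    systemctl --user stop "$unit"

    # pkill exits 0 only when at least one process matched and was signalled.
    if pkill -f '/llama-server ' 2>/dev/null; then
        reaped="yes"
    fi

    if [ "$before" = "active" ] || [ "$reaped" = "yes" ]; then
        echo "stopped"
    else
        echo "already unloaded"
    fi
}
```

Calling `unload_llama` prints a single word or phrase that can be passed straight into the report, so a no-op run is never described as having stopped something.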
Verify
echo "service: $(systemctl --user is-active llama-server.service)" # want: inactive
pgrep -af '/llama-server' | grep -v pgrep || echo "no server process" # want: none
curl -sf --max-time 2 http://127.0.0.1:8080/health >/dev/null 2>&1 \
&& echo "port 8080: UP (STILL LOADED)" || echo "port 8080: down (unloaded)"
# VRAM should drop to desktop baseline (~2.4 GiB); a loaded model adds ~9.5 GB.
rocm-smi --showmeminfo vram 2>/dev/null | awk '/Used/{printf "VRAM used: ~%d MiB\n", $NF/1024/1024}'
A clean unload reads: service: inactive, no server process, port 8080: down, VRAM near the desktop baseline.
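The VRAM check can be turned into a pass/fail verdict. The sketch below makes two assumptions: the used-memory line from rocm-smi ends with a byte count (the same assumption the awk one-liner above makes), and 4096 MiB is a reasonable cut-off between the ~2.4 GiB baseline and a loaded model. Both helper names and the threshold are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the 4096 MiB threshold are assumptions.

# Read `rocm-smi --showmeminfo vram` output on stdin and print used VRAM in
# whole MiB. Only the first matching line (first GPU) is reported.
vram_used_mib() {
    awk '/Used/ { printf "%d\n", $NF / 1024 / 1024; exit }'
}

# vram_verdict USED_MIB [LIMIT_MIB]
# Print "near baseline" when usage is at or below the limit.
vram_verdict() {
    local used_mib="$1"
    local limit_mib="${2:-4096}"   # assumed cut-off, tune per machine

    # An empty or non-numeric reading means rocm-smi gave nothing usable.
    case "$used_mib" in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac

    if [ "$used_mib" -le "$limit_mib" ]; then
        echo "near baseline"
    else
        echo "still loaded?"
    fi
}

# Usage (requires rocm-smi on the host):
#   vram_verdict "$(rocm-smi --showmeminfo vram 2>/dev/null | vram_used_mib)"
```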
Gotchas
- Self-pkill footgun. `pkill -f 'llama-server'` (no slash) matches this skill's own command string and kills the shell mid-run (exit 144 = SIGTERM). Always anchor on the binary path: `pkill -f '/llama-server '`.
- Already unloaded is the common case. The systemd unit is `disabled` and only runs on demand (the wrapper auto-starts it), so most of the time the model is already down. The skill is idempotent: running it then is a no-op that just confirms state. Report "already unloaded" rather than implying you stopped something.
- Don't disable or mask the service. Stopping unloads the model; the next `/local-delegate` call auto-starts it again (~65 s cold load). Disabling would break that auto-start. Stop only.
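A related hardening for the self-pkill footgun is the bracket trick often used with pgrep and pkill: writing one character of the pattern as a bracket expression, so the pattern's literal text can never match itself. This is a sketch of that general technique, not what the skill does; the two command lines below are hypothetical examples, and grep -E stands in for pkill's extended-regex matching.

```bash
#!/usr/bin/env bash
# Sketch only: demonstrates the bracket trick on example strings.

# matches_cmdline PATTERN CMDLINE
# Exit 0 if the extended regex PATTERN matches CMDLINE, which is the kind of
# test `pkill -f` applies to each process's full command line.
matches_cmdline() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Hypothetical command lines, for illustration only.
server_cmd='/usr/local/bin/llama-server --port 8080'
shell_cmd="bash -c pkill -f '[/]llama-server ' 2>/dev/null"

# "[/]" matches a single slash, but the literal text "[/]llama-server" has a
# "]" between the slash and the name, so the pattern cannot match itself.
pattern='[/]llama-server '

matches_cmdline "$pattern" "$server_cmd" && echo "server: match"
matches_cmdline "$pattern" "$shell_cmd"  || echo "shell: no match"
```

Run as written, the sketch prints `server: match` followed by `shell: no match`.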
Note on ollama
The stack can alternatively serve the same model via the ollama daemon (port
11434). If a request asks to free the GPU broadly and ollama has a model
resident, also run:
ollama stop qwen3-coder-30b-a3b-q5kxl 2>/dev/null || true
This skill's default scope is the llama.cpp path (llama-server), which is what
local-coder uses. Reach for the ollama stop only when ollama is the active
backend (~/llm/scripts/use-ollama.sh was run).
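The "only when ollama has a model resident" condition can be checked before stopping. The sketch below assumes `ollama ps` prints a header row followed by one row per loaded model with the name in the first column; `ollama_unload_if_resident` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: ollama_unload_if_resident is a hypothetical helper.

# ollama_unload_if_resident MODEL
# Stop MODEL only when `ollama ps` lists a name starting with MODEL
# (a prefix match, so a trailing tag such as ":latest" is tolerated).
ollama_unload_if_resident() {
    local model="$1"

    if ollama ps 2>/dev/null |
        awk -v m="$model" 'NR > 1 && index($1, m) == 1 { found = 1 }
                           END { exit !found }'; then
        ollama stop "$model" 2>/dev/null || true
        echo "ollama: stopped $model"
    else
        echo "ollama: $model not resident"
    fi
}

# Usage:
#   ollama_unload_if_resident qwen3-coder-30b-a3b-q5kxl
```

If the ollama binary is missing or the daemon is down, `ollama ps` produces no rows and the helper reports the model as not resident instead of failing.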
See also
- local-delegate: when/how to use the local model.
- ~/llm/scripts/use-ollama.sh: stops llama-server so ollama can take the GPU.
- ~/llm/scripts/use-llama-server.sh: the inverse, which loads llama-server and frees ollama.