Buckets:

bochen2079
/

katherine-k0

Files

xet

bochen2079/katherine-k0 / logs /RUNBOOK.md

bochen2079

15 days ago

preview code

download

raw

15.5 kB

	# RUNBOOK — Katherine k0 fine-tune on RunPod

	Audience: human operator (Bo), or any AI assistant the human is collaborating with. This document is self-contained — read top to bottom or jump to a phase.

	Pipeline: SFT → DPO → merge + GGUF (3 quants) → push to HF bucket. ~50-70 min wallclock on 1× H200.

	---

	## TLDR / BLUF (read this first, ~60 sec)

	Goal: Train Qwen3.5-9B into the Katherine k0 embodied persona, produce 3 GGUF quants, push everything to HuggingFace.

	Total cost: ~$3-5 on RunPod Secure Cloud.

	One-liner on the pod:

	```bash
	curl -sSL https://raw.githubusercontent.com/bochen2029-pixel/katherine-k0-finetune/master/bootstrap-runpod.sh \| bash
	cd ~/katherine-k0-finetune
	export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # your write-scope HF token
	./run-cloud-runpod.sh
	```

	End state: `bochen2079/katherine-k0` HF bucket contains:
	- `k0_sft_adapter/` — LoRA delta after SFT
	- `k0_dpo_adapter/` — LoRA delta after DPO (final adapter)
	- `gguf/gguf_q4_k_m/*.gguf` (~5.5 GB)
	- `gguf/gguf_q5_k_m/*.gguf` (~6.5 GB) ← daily-use sweet spot
	- `gguf/gguf_q6_k/*.gguf` (~7.7 GB) ← quality-critical reference
	- `data/` — snapshot of training data
	- `logs/` — stderr + watchdog logs

	---

	## Pre-flight checklist (3 min)

	- [ ] RunPod account with payment method (https://www.runpod.io/console)
	- [ ] HuggingFace token with write scope (https://huggingface.co/settings/tokens)
	- [ ] HF bucket created: `bochen2079/katherine-k0` (or change `HF_BUCKET` env var)
	- [ ] SSH key in RunPod settings (only needed if you want SSH; Web Terminal works without)
	- [ ] GitHub repo publicly readable (it is): https://github.com/bochen2029-pixel/katherine-k0-finetune

	---

	## Phase 1 — Provision RunPod instance (~5 min)

	1. Go to https://www.runpod.io/console/pods → Deploy
	2. Filter: Secure Cloud (NOT Community — Community is bandwidth-throttled per buddhabrot lessons)
	3. Pick GPU:
	- 1× H200 SXM5 ($3.99/hr) — preferred, fastest
	- 1× H100 SXM5 ($3.49/hr) — fine, slightly slower
	- 1× H100 PCIe ($2.49/hr) — cheapest viable, slower still
	4. Pod template: `runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04`
	- The `-devel` suffix is critical — runtime images don't have nvcc for GGUF compilation
	5. Storage: 150 GB scratch (need ~30 GB for base model download + adapter + 3 GGUFs + working space)
	6. Checkboxes:
	- SSH terminal access ✓ (optional but useful)
	- Start Jupyter notebook (optional)
	- Encrypt volume (not needed)
	7. Click Deploy
	8. Wait ~30-90 sec for "spinning up" → "running"

	---

	## Phase 2 — Connect via Web Terminal (~30 sec)

	1. On the RunPod pod page, click Connect → Start Web Terminal
	2. New browser tab opens with bash shell as `root@<pod-id>:/#`
	3. Verify: `nvidia-smi --query-gpu=name --format=csv,noheader` — should show your H200/H100

	(If you prefer SSH via MobaXterm, see the buddhabrot RUNBOOK Phase 3. Web Terminal is faster for first-time use.)

	---

	## Phase 3 — Bootstrap (~5-10 min)

	In the Web Terminal:

	```bash
	curl -sSL https://raw.githubusercontent.com/bochen2029-pixel/katherine-k0-finetune/master/bootstrap-runpod.sh \| bash
	```

	Watch for:
	- `[1/6] Verifying CUDA toolkit...` — confirms `nvcc` present
	- `[2/6] Detecting GPUs...` — confirms 1× H200/H100
	- `[3/6] Cloning repo...` — clones into `~/katherine-k0-finetune`
	- `[4/6] Installing Python dependencies...` — pip installs unsloth + transformers + trl + peft + ...
	- This is the slowest step. Unsloth pulls ~3 GB of deps. Allow 5-10 min.
	- `[5/6] HF CLI + auth...` — installs `hf` CLI; logs in if `HF_TOKEN` env is set
	- `[6/6] Verifying canonical datasets...` — confirms `data/k0_canonical.jsonl` is committed

	If bootstrap halts:
	- CUDA missing: pick a `-devel` template or `apt-get install -y cuda-toolkit-12-4`
	- Pip OOM / timeout: rerun bootstrap; pip caches partial installs
	- HF login fails: check `HF_TOKEN` is set with write scope

	---

	## Phase 4 — Set HF token + launch (~30 sec to launch, ~50-70 min to complete)

	```bash
	cd ~/katherine-k0-finetune
	export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # your write-scope HF token # or your own token
	hf auth login --token "$HF_TOKEN"
	hf auth whoami # MUST print your username, not "Not logged in"
	```

	If `whoami` errors, see [Troubleshooting → HF auth](#troubleshooting--hf-auth).

	Then launch:

	```bash
	./run-cloud-runpod.sh
	```

	You'll see the pre-flight banner with detected GPU, dataset stats, and the watchdog launching the orchestrator. Pre-flight output looks like:

	```
	[gpu] 1 × NVIDIA H200 SXM5
	[gpu] tier: H200
	[gpu] VRAM: 144396 MB
	[data] SFT corpus: 1886 examples
	[data] DPO corpus: 180 preference pairs
	[hf] auth OK as bochen2079, bucket: bochen2079/katherine-k0
	[watchdog HH:MM:SS] launching: bash -c ...
	[watchdog HH:MM:SS] train PID: ...

	[stage 1] SFT
	[load] base model: unsloth/Qwen3.5-9B
	...
	```

	Stage 1 takes ~10-15 min (model download is the slowest part, ~5-7 min for the 9B base on first run).

	---

	## Phase 5 — Monitor (no babysitting required)

	The watchdog auto-syncs adapter checkpoints to your HF bucket every 30 sec. You can close the browser tab — render keeps going (but see Phase 6 about screen).

	To monitor from a SECOND web terminal tab:

	```bash
	cd ~/katherine-k0-finetune
	tail -f katherine_k0.stderr.log
	```

	Look for these milestones:

	```
	[stage 1] SFT
	[train] starting: 3 epochs × 1886 samples / effective_batch 32 = ~177 steps
	[train] 10% step 17/177 loss=2.31 lr=... ← loss should drop steadily
	...
	[train] 100% step 177/177 loss=0.95
	[save] writing adapter to adapters/k0_sft_adapter

	[stage 2] DPO
	[train] DPO: 2 epochs × 180 pairs / effective_batch 8
	...
	[save] writing DPO adapter to adapters/k0_dpo_adapter

	[stage 3] merge + GGUF (3 quants)
	[gguf] === exporting q4_k_m → gguf/gguf_q4_k_m ===
	[gguf] OK: gguf/gguf_q4_k_m/...gguf (5400 MB)
	[gguf] === exporting q5_k_m → gguf/gguf_q5_k_m ===
	[gguf] OK: gguf/gguf_q5_k_m/...gguf (6300 MB)
	[gguf] === exporting q6_k → gguf/gguf_q6_k ===
	[gguf] OK: gguf/gguf_q6_k/...gguf (7400 MB)

	[stage 4] HF push
	[hf-sync] hf sync adapters/k0_sft_adapter hf://buckets/bochen2079/katherine-k0/k0_sft_adapter/
	...
	[orchestrator] all stages complete

	[watchdog HH:MM:SS] train exited code=0 after XXXs
	[watchdog HH:MM:SS] DONE
	```

	Confirms in HF bucket UI: https://huggingface.co/buckets/bochen2079/katherine-k0

	---

	## Phase 6 — survive disconnects (recommended setup)

	The web terminal can disconnect. The orchestrator runs as a child of the shell that launched it; if the shell dies, so does the orchestrator. Two options:

	### Option A — `screen` (recommended)

	Before launching, install + start screen:
	```bash
	apt-get update -qq && apt-get install -y -qq screen
	screen -S katherine
	# Now you're inside screen. Launch the run:
	./run-cloud-runpod.sh
	# When you see Stage 1 starting, detach: Ctrl-A then D
	```

	To reattach later: `screen -r katherine`. Survives browser closure.

	### Option B — `nohup` (simpler, less monitoring)

	```bash
	nohup ./run-cloud-runpod.sh > /tmp/run.log 2>&1 &
	disown
	echo "PID: $!"
	# Monitor with: tail -f /tmp/run.log
	```

	Either works. Screen is friendlier for monitoring.

	---

	## Phase 7 — Retrieve artifacts to your local machine (~5-10 min)

	Once `.DONE` appears (`ls katherine_k0.DONE`), all artifacts are in your HF bucket. Pull them to Windows:

	In Windows PowerShell or Git Bash:

	```powershell
	# Install hf CLI on Windows (one-time)
	pip install -U huggingface_hub

	# Login
	hf auth login --token <YOUR_HF_TOKEN>

	# Download everything from the bucket
	mkdir C:\katherine-k0-finetune\downloads
	cd C:\katherine-k0-finetune\downloads
	hf download --repo-type bucket bochen2079/katherine-k0 --local-dir .

	# Or just one quant
	hf download --repo-type bucket bochen2079/katherine-k0 \
	gguf/gguf_q5_k_m/katherine-k0-qwen3.5-9b.gguf --local-dir .
	```

	Note: HF download via CLI for buckets may have limitations similar to upload. If `--repo-type bucket` errors on download, use `hf sync` reversed:

	```powershell
	hf sync hf://buckets/bochen2079/katherine-k0/gguf/ ./gguf/
	```

	(Verify this command form on the actual HF docs at run-time; the upload sync is verified working but download direction may differ.)

	---

	## Phase 8 — Load in LM Studio / Ollama (~5 min)

	### LM Studio (recommended for evaluation)

	1. Open LM Studio
	2. Settings → Models → Local Folder
	3. Drop `katherine-k0-qwen3.5-9b-q5_k_m.gguf` into your models directory (typically `C:\Users\<you>\.cache\lm-studio\models\bochen2079\katherine-k0\`)
	4. Refresh model list, select Katherine k0
	5. Important inference settings:
	- Temperature: 1.0 (Qwen3.5 default)
	- top_p: 1.0
	- top_k: 40
	- min_p: 0.0
	- presence_penalty: 2.0
	- System prompt: leave EMPTY (the model is unconditional Katherine; sysprompt is unnecessary and may confuse)
	- Disable thinking mode in the chat options if visible

	6. Probe with: "Hi, what's your name?" → should get a Katherine-y direct response, no `<think>` tags, no "I'm an AI" disclaimer.

	### Ollama

	```bash
	ollama create katherine-k0 -f Modelfile
	# where Modelfile contains:
	# FROM ./katherine-k0-qwen3.5-9b-q5_k_m.gguf
	# PARAMETER temperature 1.0
	# PARAMETER top_p 1.0
	# PARAMETER top_k 40
	# PARAMETER repeat_penalty 1.0
	ollama run katherine-k0
	```

	---

	## Troubleshooting

	### bootstrap halts on pip install

	Possibly a transient PyPI / GitHub timeout. Re-run:
	```bash
	cd ~/katherine-k0-finetune
	./bootstrap-runpod.sh
	```

	Pip caches partial installs so subsequent runs are faster.

	### `nvcc not found`

	```bash
	export PATH=/usr/local/cuda/bin:$PATH
	nvcc --version
	```

	If still missing, the pod image lacks the CUDA toolkit — terminate and redeploy with a `-devel` image.

	### Out-of-memory during SFT

	Reduce batch size:
	```bash
	SFT_BATCH=8 SFT_GRAD_ACCUM=4 ./run-cloud-runpod.sh # effective batch 32 unchanged
	```

	Or reduce max_seq if you have very long examples (none in K0 corpus, but defensively):
	```bash
	SFT_MAX_SEQ=512 ./run-cloud-runpod.sh
	```

	### HF auth

	```bash
	hf auth login --token "$HF_TOKEN"
	hf auth whoami # should print your HF username
	```

	If "Not logged in":
	- Verify `HF_TOKEN` env is set: `echo $HF_TOKEN \| head -c 8` (should print first 8 chars)
	- Token might be wrong scope. Need write access. Regenerate at https://huggingface.co/settings/tokens

	### HF sync fails

	```bash
	cat *.hfsync.log # look for the actual error
	```

	Common causes:
	- `--repo-type bucket` rejected → script should use `hf sync URL form`. If you see `--repo-type` in the error, you have an old script — `git pull` to fix.
	- 401 Unauthorized → token expired or wrong scope; re-login
	- Bucket doesn't exist → create it at https://huggingface.co/new-bucket

	### GGUF export fails

	Adapter is preserved. Re-run just the GGUF stage:

	```bash
	SKIP_SFT=1 SKIP_DPO=1 ./run-cloud-runpod.sh
	```

	If error is "llama.cpp not found / build failed", install build essentials:
	```bash
	apt-get install -y build-essential cmake
	```

	Then re-run the GGUF stage.

	### Loss not decreasing

	Check first 20 steps:
	```bash
	grep -E "step\|loss" katherine_k0.stderr.log \| head -30
	```

	- Loss should drop from ~2.5 to ~1.5 in the first 20 steps. If it's flat at 0.0 from step 1, something's wrong with the masking — paste a few steps and ask Claude/Gemini.
	- Loss steadily climbing = catastrophic LR, reduce by 5×.

	### Pod preempted / disappears

	Adapter checkpoints are in your HF bucket (synced every 30 sec via watchdog). On a new pod:

	```bash
	# Bootstrap as normal
	curl -sSL ... \| bash
	cd ~/katherine-k0-finetune

	# Pull your last adapter from HF
	mkdir -p adapters
	hf download --repo-type bucket bochen2079/katherine-k0 \
	--include "k0_sft_adapter/*" --local-dir .

	# Skip SFT, jump to DPO + GGUF + push
	SKIP_SFT=1 ./run-cloud-runpod.sh
	```

	---

	## Recovery scenarios

	### "Claude is down — can I do this without AI?"

	Yes. This document is self-contained. Phases 1-8 sequential, no AI required.

	### "I want to use Gemini / ChatGPT instead"

	Paste this entire RUNBOOK.md + the [CLOUD.md](CLOUD.md) into your other LLM and add: "I'm at Phase X. Help me execute." Both docs are self-contained — every command, every expected output, every fallback is explicit.

	### "Pod cost is climbing"

	Check the watchdog timer. Hard cap is 2 hours by default. If you've been running >2 hours and stage 1 still hasn't finished, something is wrong — kill via:
	```bash
	pkill -9 -f finetune_k0.py
	cat katherine_k0.stderr.log \| tail -50
	```

	Then post the tail output for diagnosis. Do not let the pod run blind for hours.

	---

	## Reference

	### All env vars

	\| Var \| Default \| Purpose \|
	\|---\|---\|---\|
	\| `HF_TOKEN` \| unset \| HF write token (required for sync) \|
	\| `HF_BUCKET` \| bochen2079/katherine-k0 \| bucket destination \|
	\| `HF_SYNC_ENABLED` \| 1 \| set 0 to skip all HF pushes \|
	\| `BASE_MODEL` \| unsloth/Qwen3.5-9B \| base model HF repo \|
	\| `SFT_RANK` \| 64 \| LoRA rank for SFT \|
	\| `SFT_ALPHA` \| 128 \| LoRA alpha \|
	\| `SFT_DROPOUT` \| 0.05 \| LoRA dropout \|
	\| `SFT_EPOCHS` \| 3 \| SFT epochs \|
	\| `SFT_LR` \| 1e-4 \| SFT learning rate \|
	\| `SFT_BATCH` \| 16 \| SFT per-device batch \|
	\| `SFT_GRAD_ACCUM` \| 2 \| SFT grad accumulation \|
	\| `SFT_MAX_SEQ` \| 1024 \| max sequence length \|
	\| `DPO_EPOCHS` \| 2 \| DPO epochs \|
	\| `DPO_LR` \| 5e-6 \| DPO learning rate \|
	\| `DPO_BETA` \| 0.1 \| DPO KL strength \|
	\| `DPO_BATCH` \| 4 \| DPO batch \|
	\| `DPO_GRAD_ACCUM` \| 2 \| DPO grad accum \|
	\| `GGUF_QUANTS` \| "q4_k_m q5_k_m q6_k" \| GGUF quants to produce \|
	\| `WALLCLOCK_HARD_CAP` \| 7200 \| hard wallclock cap (seconds) \|
	\| `SIGUSR1_LEAD` \| 300 \| seconds before hard cap to send SIGUSR1 \|
	\| `SKIP_SFT` \| 0 \| skip Stage 1 \|
	\| `SKIP_DPO` \| 0 \| skip Stage 2 \|
	\| `SKIP_GGUF` \| 0 \| skip Stage 3 \|
	\| `SKIP_PUSH` \| 0 \| skip Stage 4 \|

	### URLs

	\| Purpose \| URL \|
	\|---\|---\|
	\| RunPod pods console \| https://www.runpod.io/console/pods \|
	\| RunPod SSH keys \| https://www.runpod.io/console/user/settings \|
	\| HF tokens \| https://huggingface.co/settings/tokens \|
	\| HF bucket \| https://huggingface.co/buckets/bochen2079/katherine-k0 \|
	\| GitHub repo \| https://github.com/bochen2029-pixel/katherine-k0-finetune \|
	\| Bootstrap (curl direct) \| https://raw.githubusercontent.com/bochen2029-pixel/katherine-k0-finetune/master/bootstrap-runpod.sh \|
	\| Qwen3.5-9B base \| https://huggingface.co/Qwen/Qwen3.5-9B \|
	\| Qwen3.5 Unsloth docs \| https://unsloth.ai/docs/models/qwen3.5 \|
	\| Buddhabrot reference repo (sister project) \| https://github.com/bochen2029-pixel/buddhabrot-cuda-multigpu \|

	### Cheat sheet — common commands

	```bash
	# Status check
	ls -la katherine_k0.{DONE,FATAL,stderr.log,watchdog.log} 2>/dev/null
	tail -20 katherine_k0.stderr.log
	nvidia-smi --query-gpu=utilization.gpu,memory.used,power.draw --format=csv

	# Skip-ahead reruns
	SKIP_SFT=1 ./run-cloud-runpod.sh # SFT done; rerun DPO + GGUF + push
	SKIP_SFT=1 SKIP_DPO=1 ./run-cloud-runpod.sh # only GGUF + push
	SKIP_SFT=1 SKIP_DPO=1 SKIP_GGUF=1 ./run-cloud-runpod.sh # only push

	# Force kill
	pkill -9 -f finetune_k0.py
	pkill -9 -f dpo_k0.py
	pkill -9 -f _supervise-cloud.sh

	# Manual HF push (if watchdog sync failed)
	hf sync . hf://buckets/bochen2079/katherine-k0/ \
	--include "adapters//" --include "gguf//.gguf" --include "data/*.jsonl"
	```

	---

	Updates land at the master branch. To pull the latest version on a running pod: `cd ~/katherine-k0-finetune && git pull`.

Xet Storage Details

Size:: 15.5 kB
Xet hash:: ccab23055aaa5ff5b8d9523548d3ffad0a676da735ff220507c1d96be31e150e

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.