bochen2079

15 days ago

RUNBOOK — Katherine k0 fine-tune on RunPod

Audience: human operator (Bo), or any AI assistant the human is collaborating with. This document is self-contained — read top to bottom or jump to a phase.

Pipeline: SFT → DPO → merge + GGUF (3 quants) → push to HF bucket. ~50-70 min wallclock on 1× H200.

TLDR / BLUF (read this first, ~60 sec)

Goal: Train Qwen3.5-9B into the Katherine k0 embodied persona, produce 3 GGUF quants, push everything to HuggingFace.

Total cost: ~$3-5 on RunPod Secure Cloud.

One-liner on the pod:

curl -sSL https://raw.githubusercontent.com/bochen2029-pixel/katherine-k0-finetune/master/bootstrap-runpod.sh | bash
cd ~/katherine-k0-finetune
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   # your write-scope HF token
./run-cloud-runpod.sh

End state: bochen2079/katherine-k0 HF bucket contains:

k0_sft_adapter/ — LoRA delta after SFT
k0_dpo_adapter/ — LoRA delta after DPO (final adapter)
gguf/gguf_q4_k_m/*.gguf (~5.5 GB)
gguf/gguf_q5_k_m/*.gguf (~6.5 GB) ← daily-use sweet spot
gguf/gguf_q6_k/*.gguf (~7.7 GB) ← quality-critical reference
data/ — snapshot of training data
logs/ — stderr + watchdog logs

Pre-flight checklist (3 min)

RunPod account with payment method (https://www.runpod.io/console)
HuggingFace token with write scope (https://huggingface.co/settings/tokens)
HF bucket created: bochen2079/katherine-k0 (or change HF_BUCKET env var)
SSH key in RunPod settings (only needed if you want SSH; Web Terminal works without)
GitHub repo publicly readable (it is): https://github.com/bochen2029-pixel/katherine-k0-finetune

Phase 1 — Provision RunPod instance (~5 min)

Go to https://www.runpod.io/console/pods → Deploy
Filter: Secure Cloud (NOT Community — Community is bandwidth-throttled per buddhabrot lessons)
Pick GPU:
- 1× H200 SXM5 ($3.99/hr) — preferred, fastest
- 1× H100 SXM5 ($3.49/hr) — fine, slightly slower
- 1× H100 PCIe ($2.49/hr) — cheapest viable, slower still
Pod template: runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
- The -devel suffix is critical — runtime images don't have nvcc for GGUF compilation
Storage: 150 GB scratch (need ~30 GB for base model download + adapter + 3 GGUFs + working space)
Checkboxes:
- SSH terminal access ✓ (optional but useful)
- Start Jupyter notebook (optional)
- Encrypt volume (not needed)
Click Deploy
Wait ~30-90 sec for "spinning up" → "running"
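
The ~$3-5 total quoted in the TLDR is hourly rate times wallclock. A minimal sketch of that arithmetic, assuming the H200 rate listed above and the 50-70 min range from the pipeline summary; `estimate_cost` is a hypothetical helper, not part of the repo, and it ignores storage charges and any idle time before you terminate the pod.

```shell
# estimate_cost RATE_PER_HOUR MINUTES -> prints the estimated GPU cost in dollars.
# Hypothetical helper for illustration; not part of the katherine-k0-finetune repo.
estimate_cost() {
    awk -v rate="$1" -v minutes="$2" 'BEGIN { printf "%.2f\n", rate * minutes / 60 }'
}

# Lower and upper end of the stated wallclock range on the H200.
for minutes in 50 70; do
    echo "H200 at \$3.99/hr for ${minutes} min: \$$(estimate_cost 3.99 "$minutes")"
done
```

Slower GPUs have a lower hourly rate but a longer run, so plug in your own minutes rather than assuming the cheapest rate gives the cheapest total.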

Phase 2 — Connect via Web Terminal (~30 sec)

On the RunPod pod page, click Connect → Start Web Terminal
New browser tab opens with bash shell as root@<pod-id>:/#
Verify: nvidia-smi --query-gpu=name --format=csv,noheader — should show your H200/H100

(If you prefer SSH via MobaXterm, see the buddhabrot RUNBOOK Phase 3. Web Terminal is faster for first-time use.)
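
The GPU check above can be turned into a pass/fail test so a wrong pod type is caught before the bootstrap starts. This is a sketch only: `is_supported_gpu` is a hypothetical helper, and the name patterns assume the driver reports a string containing H200 or H100.

```shell
# is_supported_gpu NAME -> succeeds if NAME looks like one of the GPUs this runbook targets.
is_supported_gpu() {
    case "$1" in
        *H200*|*H100*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query the driver when nvidia-smi is actually present (i.e. on the pod).
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_name="$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
    if is_supported_gpu "$gpu_name"; then
        echo "GPU OK: $gpu_name"
    else
        echo "Unexpected GPU: $gpu_name (runbook assumes H200 or H100)" >&2
    fi
fi
```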

Phase 3 — Bootstrap (~5-10 min)

In the Web Terminal:

curl -sSL https://raw.githubusercontent.com/bochen2029-pixel/katherine-k0-finetune/master/bootstrap-runpod.sh | bash

Watch for:

[1/6] Verifying CUDA toolkit... — confirms nvcc present
[2/6] Detecting GPUs... — confirms 1× H200/H100
[3/6] Cloning repo... — clones into ~/katherine-k0-finetune
[4/6] Installing Python dependencies... — pip installs unsloth + transformers + trl + peft + ...
- This is the slowest step. Unsloth pulls ~3 GB of deps. Allow 5-10 min.
[5/6] HF CLI + auth... — installs hf CLI; logs in if HF_TOKEN env is set
[6/6] Verifying canonical datasets... — confirms data/k0_canonical.jsonl is committed

If bootstrap halts:

CUDA missing: pick a -devel template or apt-get install -y cuda-toolkit-12-4
Pip OOM / timeout: rerun bootstrap; pip caches partial installs
HF login fails: check HF_TOKEN is set with write scope

Phase 4 — Set HF token + launch (~30 sec to launch, ~50-70 min to complete)

cd ~/katherine-k0-finetune
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   # your write-scope HF token
hf auth login --token "$HF_TOKEN"
hf auth whoami         # MUST print your username, not "Not logged in"

If whoami errors, see Troubleshooting → HF auth.
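
A quick format check on the token can catch a missing export before anything is launched. The sketch below assumes tokens start with `hf_`, as the placeholder above does; `check_hf_token` is a hypothetical helper. It checks the format only, not the scope, so `hf auth whoami` remains the authoritative test.

```shell
# check_hf_token -> succeeds if HF_TOKEN is set and has the usual "hf_" prefix.
# Prints only the first 8 characters so the full token never lands in a log.
check_hf_token() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set" >&2
        return 1
    fi
    case "$HF_TOKEN" in
        hf_*) echo "HF_TOKEN looks plausible: $(printf '%s' "$HF_TOKEN" | cut -c1-8)..." ;;
        *)    echo "HF_TOKEN does not start with hf_" >&2; return 1 ;;
    esac
}
```

One way to use it is as a guard: `check_hf_token && ./run-cloud-runpod.sh`.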

Then launch (to survive a dropped Web Terminal, start screen first as described in Phase 6):

./run-cloud-runpod.sh

You'll see the pre-flight banner with detected GPU, dataset stats, and the watchdog launching the orchestrator. Pre-flight output looks like:

[gpu] 1 × NVIDIA H200 SXM5
[gpu] tier: H200
[gpu] VRAM: 144396 MB
[data] SFT corpus: 1886 examples
[data] DPO corpus: 180 preference pairs
[hf] auth OK as bochen2079, bucket: bochen2079/katherine-k0
[watchdog HH:MM:SS] launching: bash -c ...
[watchdog HH:MM:SS] train PID: ...

[stage 1] SFT
[load] base model: unsloth/Qwen3.5-9B
...

Stage 1 takes ~10-15 min (model download is the slowest part, ~5-7 min for the 9B base on first run).

Phase 5 — Monitor (no babysitting required)

The watchdog auto-syncs adapter checkpoints to your HF bucket every 30 sec. If you launched inside screen or with nohup (see Phase 6), you can close the browser tab and the run keeps going.

To monitor from a SECOND web terminal tab:

cd ~/katherine-k0-finetune
tail -f katherine_k0.stderr.log

Look for these milestones:

[stage 1] SFT
[train] starting: 3 epochs × 1886 samples / effective_batch 32 = ~177 steps
[train]  10%  step 17/177  loss=2.31  lr=...    ← loss should drop steadily
...
[train] 100% step 177/177  loss=0.95
[save] writing adapter to adapters/k0_sft_adapter

[stage 2] DPO
[train] DPO: 2 epochs × 180 pairs / effective_batch 8
...
[save] writing DPO adapter to adapters/k0_dpo_adapter

[stage 3] merge + GGUF (3 quants)
[gguf] === exporting q4_k_m → gguf/gguf_q4_k_m ===
[gguf] OK: gguf/gguf_q4_k_m/...gguf  (5400 MB)
[gguf] === exporting q5_k_m → gguf/gguf_q5_k_m ===
[gguf] OK: gguf/gguf_q5_k_m/...gguf  (6300 MB)
[gguf] === exporting q6_k → gguf/gguf_q6_k ===
[gguf] OK: gguf/gguf_q6_k/...gguf  (7400 MB)

[stage 4] HF push
[hf-sync] hf sync adapters/k0_sft_adapter hf://buckets/bochen2079/katherine-k0/k0_sft_adapter/
...
[orchestrator] all stages complete

[watchdog HH:MM:SS] train exited code=0 after XXXs
[watchdog HH:MM:SS] DONE

Confirm in the HF bucket UI: https://huggingface.co/buckets/bochen2079/katherine-k0
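
Instead of watching the tail, the milestones above can be pulled out of the log on demand. This is a sketch that assumes the log lines look exactly like the sample output in this phase; `last_stage` and `is_finished` are hypothetical helpers, not part of the repo.

```shell
# last_stage LOGFILE -> prints the most recent "[stage N] ..." line in the log.
last_stage() {
    grep -E '^\[stage [0-9]+\]' "$1" | tail -n 1
}

# is_finished LOGFILE -> succeeds once the orchestrator has reported completion.
is_finished() {
    grep -qF '[orchestrator] all stages complete' "$1"
}

# Report on the real log only if it exists in the current directory.
log=katherine_k0.stderr.log
if [ -f "$log" ]; then
    echo "current: $(last_stage "$log")"
    if is_finished "$log"; then echo "run complete"; fi
fi
```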

Phase 6 — survive disconnects (recommended setup)

The web terminal can disconnect. The orchestrator runs as a child of the shell that launched it; if the shell dies, so does the orchestrator. Two options:

Option A — `screen` (recommended)

Before launching, install + start screen:

apt-get update -qq && apt-get install -y -qq screen
screen -S katherine
# Now you're inside screen. Launch the run:
./run-cloud-runpod.sh
# When you see Stage 1 starting, detach: Ctrl-A then D

To reattach later: screen -r katherine. Survives browser closure.

Option B — `nohup` (simpler, less monitoring)

nohup ./run-cloud-runpod.sh > /tmp/run.log 2>&1 &
disown
echo "PID: $!"
# Monitor with: tail -f /tmp/run.log

Either works. Screen is friendlier for monitoring.
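
For Option A, it is easy to forget whether the current shell is already inside screen. screen exports the `STY` variable to the shells it starts, so a small check before launching is possible; `in_screen` is a hypothetical helper name. The check does not apply to Option B, where nohup provides the protection instead.

```shell
# in_screen -> succeeds if this shell was started inside a screen session.
# screen sets STY (the session name) in the environment of its child shells.
in_screen() {
    [ -n "${STY:-}" ]
}

if in_screen; then
    echo "inside screen ($STY): safe to launch and detach"
else
    echo "plain shell: a dropped connection will take the run down with it" >&2
fi
```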

Phase 7 — Retrieve artifacts to your local machine (~5-10 min)

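Once the katherine_k0.DONE marker appears (ls katherine_k0.DONE), all artifacts are in your HF bucket. Pull them to Windows:

In Windows PowerShell (the C:\ paths below are PowerShell syntax; in Git Bash use /c/katherine-k0-finetune/downloads instead):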

# Install hf CLI on Windows (one-time)
pip install -U huggingface_hub

# Login
hf auth login --token <YOUR_HF_TOKEN>

# Download everything from the bucket
mkdir C:\katherine-k0-finetune\downloads
cd C:\katherine-k0-finetune\downloads
hf download --repo-type bucket bochen2079/katherine-k0 --local-dir .

# Or just one quant (kept on one line: PowerShell does not accept \ as a line continuation)
hf download --repo-type bucket bochen2079/katherine-k0 gguf/gguf_q5_k_m/katherine-k0-qwen3.5-9b.gguf --local-dir .

Note: HF download via CLI for buckets may have limitations similar to upload. If --repo-type bucket errors on download, use hf sync reversed:

hf sync hf://buckets/bochen2079/katherine-k0/gguf/ ./gguf/

(Verify this command form on the actual HF docs at run-time; the upload sync is verified working but download direction may differ.)
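
Because the download path is unverified, it is worth confirming that each file is really a GGUF and not an error page or an empty placeholder. GGUF files begin with the four ASCII bytes `GGUF`, so a cheap header check is possible from Git Bash or any POSIX shell. This is a sketch: `is_gguf` is a hypothetical helper, and it does not detect a download that was cut off partway through.

```shell
# is_gguf FILE -> succeeds if FILE begins with the 4-byte GGUF magic ("GGUF").
is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1" | tr -d '\000')" = "GGUF" ]
}

# Check every downloaded quant under ./gguf (skipped if nothing is there yet).
for f in gguf/*/*.gguf; do
    [ -e "$f" ] || continue
    if is_gguf "$f"; then echo "ok:  $f"; else echo "BAD: $f" >&2; fi
done
```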

Phase 8 — Load in LM Studio / Ollama (~5 min)

LM Studio (recommended for evaluation)

Open LM Studio
Settings → Models → Local Folder
Drop katherine-k0-qwen3.5-9b-q5_k_m.gguf into your models directory (typically C:\Users\<you>\.cache\lm-studio\models\bochen2079\katherine-k0\)
Refresh model list, select Katherine k0
Important inference settings:
- Temperature: 1.0 (Qwen3.5 default)
- top_p: 1.0
- top_k: 40
- min_p: 0.0
- presence_penalty: 2.0
- System prompt: leave EMPTY (the model is unconditional Katherine; sysprompt is unnecessary and may confuse)
- Disable thinking mode in the chat options if visible
Probe with: "Hi, what's your name?" → should get a Katherine-y direct response, no <think> tags, no "I'm an AI" disclaimer.

Ollama

ollama create katherine-k0 -f Modelfile
# where Modelfile contains:
#   FROM ./katherine-k0-qwen3.5-9b-q5_k_m.gguf
#   PARAMETER temperature 1.0
#   PARAMETER top_p 1.0
#   PARAMETER top_k 40
#   PARAMETER repeat_penalty 1.0
ollama run katherine-k0
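
The Modelfile contents are given above only as comments. A minimal sketch that writes the same file with a heredoc, assuming the GGUF sits in the current directory under the filename used above (adjust the FROM path if yours differs):

```shell
# Write the Modelfile described above; the parameters are copied from the commented listing.
cat > Modelfile <<'EOF'
FROM ./katherine-k0-qwen3.5-9b-q5_k_m.gguf
PARAMETER temperature 1.0
PARAMETER top_p 1.0
PARAMETER top_k 40
PARAMETER repeat_penalty 1.0
EOF

# Build the model only if ollama is installed and the GGUF is actually present.
if command -v ollama >/dev/null 2>&1 && [ -f ./katherine-k0-qwen3.5-9b-q5_k_m.gguf ]; then
    ollama create katherine-k0 -f Modelfile
fi
```

After the create step succeeds, `ollama run katherine-k0` starts a chat as before.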

Troubleshooting

bootstrap halts on pip install

Possibly a transient PyPI / GitHub timeout. Re-run:

cd ~/katherine-k0-finetune
./bootstrap-runpod.sh

Pip caches partial installs so subsequent runs are faster.
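
If the timeouts keep recurring, the re-run can be automated with a small retry loop. This is a generic sketch, not something the bootstrap script provides; `retry` and its arguments are hypothetical.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND... -> rerun COMMAND until it succeeds.
# Hypothetical helper; uses plain (global) shell variables to stay POSIX-compatible.
retry() {
    max_attempts="$1"; delay="$2"; shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, `retry 3 30 ./bootstrap-runpod.sh` would try the bootstrap up to three times with a 30-second pause between attempts.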

`nvcc not found`

export PATH=/usr/local/cuda/bin:$PATH
nvcc --version

If still missing, the pod image lacks the CUDA toolkit — terminate and redeploy with a -devel image.
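
The same fix can be made conditional, so PATH is only touched when nvcc is genuinely missing. This is a sketch under the assumption that the toolkit lives in /usr/local/cuda/bin as in the export above; `ensure_on_path` is a hypothetical helper.

```shell
# ensure_on_path CMD DIR -> if CMD is not found, prepend DIR to PATH when DIR contains it.
ensure_on_path() {
    command -v "$1" >/dev/null 2>&1 && return 0
    if [ -x "$2/$1" ]; then
        PATH="$2:$PATH"; export PATH
        return 0
    fi
    echo "$1 not found on PATH or in $2" >&2
    return 1
}

if ensure_on_path nvcc /usr/local/cuda/bin; then
    nvcc --version
else
    echo "no CUDA toolkit here; redeploy with a -devel image" >&2
fi
```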

Out-of-memory during SFT

Reduce batch size:

SFT_BATCH=8 SFT_GRAD_ACCUM=4 ./run-cloud-runpod.sh    # effective batch 32 unchanged

Or reduce max_seq if you have very long examples (none in K0 corpus, but defensively):

SFT_MAX_SEQ=512 ./run-cloud-runpod.sh
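
The "effective batch 32 unchanged" comment is per-device batch times gradient accumulation, and the step count follows from it. A minimal sketch of that arithmetic using the corpus sizes from Phase 4; the helper names are hypothetical, and the exact step count in a real run depends on how the trainer handles the last partial batch.

```shell
# effective_batch PER_DEVICE_BATCH GRAD_ACCUM -> samples per optimizer step on one GPU.
effective_batch() {
    echo $(( $1 * $2 ))
}

# total_steps EPOCHS SAMPLES PER_DEVICE_BATCH GRAD_ACCUM -> optimizer steps, rounded up.
total_steps() {
    eb=$(effective_batch "$3" "$4")
    echo $(( ($1 * $2 + eb - 1) / eb ))
}

echo "default : batch $(effective_batch 16 2), steps $(total_steps 3 1886 16 2)"   # batch 32, steps 177
echo "low-VRAM: batch $(effective_batch 8 4), steps $(total_steps 3 1886 8 4)"     # batch 32, steps 177
```

Both settings give the same 177 steps, which is why the milestones in Phase 5 still apply after lowering SFT_BATCH.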

HF auth

hf auth login --token "$HF_TOKEN"
hf auth whoami     # should print your HF username

If "Not logged in":

Verify HF_TOKEN env is set: echo "$HF_TOKEN" | head -c 8 (should print the first 8 chars)
Token might be wrong scope. Need write access. Regenerate at https://huggingface.co/settings/tokens

HF sync fails

cat *.hfsync.log    # look for the actual error

Common causes:

--repo-type bucket rejected → script should use hf sync URL form. If you see --repo-type in the error, you have an old script — git pull to fix.
401 Unauthorized → token expired or wrong scope; re-login
Bucket doesn't exist → create it at https://huggingface.co/new-bucket
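
The first two causes leave recognizable text in the log, so the lookup can be scripted. This is a sketch only: `diagnose_hfsync` is a hypothetical helper, and the patterns are taken from the list above rather than from real hf error output, which may be worded differently.

```shell
# diagnose_hfsync LOGFILE -> print the runbook's suggested fix for known error patterns.
diagnose_hfsync() {
    if grep -qF -- '--repo-type' "$1"; then
        echo "old script: run git pull so it uses the hf sync URL form"
    elif grep -qE '(^|[^0-9])401([^0-9]|$)' "$1"; then
        echo "token expired or wrong scope: re-login"
    else
        echo "no known pattern: read the log, and check that the bucket exists"
    fi
}

# Diagnose every sync log in the current directory (skipped if there are none).
for log in *.hfsync.log; do
    [ -e "$log" ] || continue
    echo "$log: $(diagnose_hfsync "$log")"
done
```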

GGUF export fails

Adapter is preserved. Re-run just the GGUF stage:

SKIP_SFT=1 SKIP_DPO=1 ./run-cloud-runpod.sh

If error is "llama.cpp not found / build failed", install build essentials:

apt-get install -y build-essential cmake

Then re-run the GGUF stage.

Loss not decreasing

Check first 20 steps:

grep -E "step|loss" katherine_k0.stderr.log | head -30

Loss should drop from ~2.5 to ~1.5 in the first 20 steps. If it is flat at 0.0 from step 1, the label masking is probably wrong; paste a few steps and ask Claude/Gemini.
If loss climbs steadily, the learning rate is far too high: reduce it by 5× (e.g. SFT_LR=2e-5 instead of the default 1e-4).
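
The three cases (falling, flat, climbing) can be told apart mechanically by comparing the first and last logged loss. This is a sketch that assumes the `loss=<number>` format shown in the Phase 5 sample output; `loss_values` and `loss_trend` are hypothetical helpers, and a two-point comparison is only a rough signal, not a replacement for reading the curve.

```shell
# loss_values LOGFILE -> print one loss value per line, in log order.
loss_values() {
    sed -n 's/.*loss=\([0-9][0-9.]*\).*/\1/p' "$1"
}

# loss_trend LOGFILE -> print "falling", "rising", or "flat" from the first and last loss.
loss_trend() {
    loss_values "$1" | awk '
        NR == 1 { first = $1 + 0 }
        { last = $1 + 0 }
        END {
            if (NR < 2)            print "not enough data"
            else if (last < first) print "falling"
            else if (last > first) print "rising"
            else                   print "flat"
        }'
}

if [ -f katherine_k0.stderr.log ]; then
    echo "loss trend: $(loss_trend katherine_k0.stderr.log)"
fi
```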

Pod preempted / disappears

Adapter checkpoints are in your HF bucket (synced every 30 sec via watchdog). On a new pod:

# Bootstrap as normal
curl -sSL ... | bash
cd ~/katherine-k0-finetune

# Pull your last adapter from HF into adapters/ (Phase 5 logs show the run writing to adapters/k0_sft_adapter)
mkdir -p adapters
hf download --repo-type bucket bochen2079/katherine-k0 \
    --include "k0_sft_adapter/*" --local-dir adapters

# Skip SFT, jump to DPO + GGUF + push
SKIP_SFT=1 ./run-cloud-runpod.sh
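
Which SKIP_* flags to set depends on which adapters made it back to disk. A sketch of that decision, assuming the adapters/k0_sft_adapter and adapters/k0_dpo_adapter paths from the Phase 5 log; `resume_flags` is a hypothetical helper and only checks that the directory exists, not that the adapter inside it is complete.

```shell
# resume_flags [ADAPTER_DIR] -> print the SKIP_* assignments that match what is on disk.
resume_flags() {
    dir="${1:-adapters}"
    if [ -d "$dir/k0_dpo_adapter" ]; then
        echo "SKIP_SFT=1 SKIP_DPO=1"
    elif [ -d "$dir/k0_sft_adapter" ]; then
        echo "SKIP_SFT=1"
    else
        echo ""
    fi
}

echo "suggested launch: $(resume_flags) ./run-cloud-runpod.sh"
```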

Recovery scenarios

"Claude is down — can I do this without AI?"

Yes. This document is self-contained: work through Phases 1-8 in order, no AI required.

"I want to use Gemini / ChatGPT instead"

Paste this entire RUNBOOK.md + the CLOUD.md into your other LLM and add: "I'm at Phase X. Help me execute." Both docs are self-contained — every command, every expected output, every fallback is explicit.

"Pod cost is climbing"

Check the watchdog timer. Hard cap is 2 hours by default. If you've been running >2 hours and stage 1 still hasn't finished, something is wrong — kill via:

pkill -9 -f finetune_k0.py
tail -50 katherine_k0.stderr.log

Then post the tail output for diagnosis. Do not let the pod run blind for hours.

Reference

All env vars

Var	Default	Purpose
`HF_TOKEN`	unset	HF write token (required for sync)
`HF_BUCKET`	bochen2079/katherine-k0	bucket destination
`HF_SYNC_ENABLED`	1	set 0 to skip all HF pushes
`BASE_MODEL`	unsloth/Qwen3.5-9B	base model HF repo
`SFT_RANK`	64	LoRA rank for SFT
`SFT_ALPHA`	128	LoRA alpha
`SFT_DROPOUT`	0.05	LoRA dropout
`SFT_EPOCHS`	3	SFT epochs
`SFT_LR`	1e-4	SFT learning rate
`SFT_BATCH`	16	SFT per-device batch
`SFT_GRAD_ACCUM`	2	SFT grad accumulation
`SFT_MAX_SEQ`	1024	max sequence length
`DPO_EPOCHS`	2	DPO epochs
`DPO_LR`	5e-6	DPO learning rate
`DPO_BETA`	0.1	DPO KL strength
`DPO_BATCH`	4	DPO batch
`DPO_GRAD_ACCUM`	2	DPO grad accum
`GGUF_QUANTS`	"q4_k_m q5_k_m q6_k"	GGUF quants to produce
`WALLCLOCK_HARD_CAP`	7200	hard wallclock cap (seconds)
`SIGUSR1_LEAD`	300	seconds before hard cap to send SIGUSR1
`SKIP_SFT`	0	skip Stage 1
`SKIP_DPO`	0	skip Stage 2
`SKIP_GGUF`	0	skip Stage 3
`SKIP_PUSH`	0	skip Stage 4
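
The two watchdog timers in the table combine into one warning time. A minimal sketch of that arithmetic, assuming the watchdog sends SIGUSR1 at the hard cap minus the lead, as the table's description implies; `warn_at` is a hypothetical helper, not a function from the repo.

```shell
# Fall back to the documented defaults when the variables are unset.
: "${WALLCLOCK_HARD_CAP:=7200}"
: "${SIGUSR1_LEAD:=300}"

# warn_at -> seconds after launch at which SIGUSR1 would be sent.
warn_at() {
    echo $(( WALLCLOCK_HARD_CAP - SIGUSR1_LEAD ))
}

echo "SIGUSR1 at $(warn_at)s, hard cap at ${WALLCLOCK_HARD_CAP}s"
```

With the defaults, the warning lands at 6900 s, five minutes before the 2-hour cap.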

URLs

Purpose	URL
RunPod pods console	https://www.runpod.io/console/pods
RunPod SSH keys	https://www.runpod.io/console/user/settings
HF tokens	https://huggingface.co/settings/tokens
HF bucket	https://huggingface.co/buckets/bochen2079/katherine-k0
GitHub repo	https://github.com/bochen2029-pixel/katherine-k0-finetune
Bootstrap (curl direct)	https://raw.githubusercontent.com/bochen2029-pixel/katherine-k0-finetune/master/bootstrap-runpod.sh
Qwen3.5-9B base	https://huggingface.co/Qwen/Qwen3.5-9B
Qwen3.5 Unsloth docs	https://unsloth.ai/docs/models/qwen3.5
Buddhabrot reference repo (sister project)	https://github.com/bochen2029-pixel/buddhabrot-cuda-multigpu

Cheat sheet — common commands

# Status check
ls -la katherine_k0.{DONE,FATAL,stderr.log,watchdog.log} 2>/dev/null
tail -20 katherine_k0.stderr.log
nvidia-smi --query-gpu=utilization.gpu,memory.used,power.draw --format=csv

# Skip-ahead reruns
SKIP_SFT=1 ./run-cloud-runpod.sh                            # SFT done; rerun DPO + GGUF + push
SKIP_SFT=1 SKIP_DPO=1 ./run-cloud-runpod.sh                 # only GGUF + push
SKIP_SFT=1 SKIP_DPO=1 SKIP_GGUF=1 ./run-cloud-runpod.sh     # only push

# Force kill
pkill -9 -f finetune_k0.py
pkill -9 -f dpo_k0.py
pkill -9 -f _supervise-cloud.sh

# Manual HF push (if watchdog sync failed)
hf sync . hf://buckets/bochen2079/katherine-k0/ \
    --include "adapters/*/**" --include "gguf/**/*.gguf" --include "data/*.jsonl"

Updates land at the master branch. To pull the latest version on a running pod: cd ~/katherine-k0-finetune && git pull.
