	# katherine-k0-finetune

	QLoRA fine-tune of Qwen3.5-9B for the Katherine k0 embodied persona on a single rented H200.

	> K0 (embodied): Katherine Hale, 29, Austin, community arts coordinator. She believes she's human. No knowledge of LLMs, AI, or that she runs as patterns inside a model. Direct prose, no thinking blocks, no assistant register, no sycophancy. The model is Katherine, not "an assistant playing Katherine."

	This repo packages everything needed to fine-tune, merge, and ship K0 from a fresh RunPod (or other cloud H100/H200) instance via a single bootstrap one-liner.

	## Quickstart on RunPod

	In the pod's web terminal:

	```bash
	curl -sSL https://raw.githubusercontent.com/bochen2029-pixel/katherine-k0-finetune/master/bootstrap-runpod.sh | bash
	cd ~/katherine-k0-finetune
	export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx # your write-scope HF token
	./run-cloud-runpod.sh
	```

	Total wallclock: ~50-70 min on 1× H200. Cost: ~$3-5.

	End state: SFT adapter, DPO adapter, and 3 GGUF quants (q4_k_m / q5_k_m / q6_k) pushed to your `bochen2079/katherine-k0` HF bucket.

	See [RUNBOOK.md](RUNBOOK.md) for the full operator walkthrough.
	See [CLOUD.md](CLOUD.md) for the math, hyperparameter derivation, and why each choice was made.

	## Pipeline

	```
	Stage 0  prep_dataset.py    dedupe 6,164 raw lines → 1,886 SFT + 180 DPO,
	                            strip system prompts (unconditional Katherine)
	Stage 1  finetune_k0.py     QLoRA SFT on Qwen3.5-9B
	                            rank 64, alpha 128, 3 epochs, lr 1e-4
	Stage 2  dpo_k0.py          DPO on top of SFT adapter
	                            180 curated chosen/rejected pairs, 2 epochs
	Stage 3  merge_and_gguf.py  merge LoRA → base; export q4_k_m, q5_k_m, q6_k
	Stage 4  push_to_hf.py      push adapter + DPO adapter + 3 GGUFs to HF bucket
	```
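
As a rough illustration of what Stage 0 describes, the sketch below drops system messages and then dedupes on the remaining conversation. It is not the actual `prep_dataset.py`: the `messages` / `role` / `content` record layout and the helper names `strip_system`, `dedupe`, and `prep` are assumptions made for this example.

```python
import hashlib
import json


def strip_system(record: dict) -> dict:
    """Return a copy of the record without any system-role messages."""
    kept = [m for m in record["messages"] if m.get("role") != "system"]
    return {**record, "messages": kept}


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each distinct conversation."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        # Hash a canonical JSON form so key order does not affect equality.
        key = hashlib.sha256(
            json.dumps(rec["messages"], sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def prep(records: list[dict]) -> list[dict]:
    # Strip first, so rows that differ only in their system prompt collapse.
    return dedupe([strip_system(r) for r in records])
```

Stripping before deduping is a deliberate ordering in this sketch: two rows that differ only in their system prompt become identical once the prompt is gone, so they count as one example.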

	Each stage is independent and resumable. If GGUF export fails, adapters are
	preserved on disk and you can re-run just the GGUF stage.
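
One common way to get this kind of resumability is a marker file per stage. The following is a minimal sketch under that assumption, not a copy of `run-cloud-runpod.sh`: the short stage names, the `.done` markers, the state directory, and the `run_pipeline` helper are all hypothetical. Only the script names come from the pipeline listing.

```python
import subprocess
import sys
from pathlib import Path

# Script names follow the pipeline listing; the short stage names and the
# marker-file scheme are assumptions made for this sketch.
STAGES = [
    ("prep", "prep_dataset.py"),
    ("sft", "finetune_k0.py"),
    ("dpo", "dpo_k0.py"),
    ("gguf", "merge_and_gguf.py"),
    ("push", "push_to_hf.py"),
]


def _run_script(script: str) -> None:
    """Default runner: execute the stage script, raising on a non-zero exit."""
    subprocess.run([sys.executable, script], check=True)


def run_pipeline(state_dir: Path, runner=_run_script) -> list[str]:
    """Run each stage once; skip stages whose .done marker already exists."""
    state_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for name, script in STAGES:
        marker = state_dir / f"{name}.done"
        if marker.exists():
            continue  # finished in an earlier run
        runner(script)  # raises on failure, so no marker is written
        marker.touch()
        executed.append(name)
    return executed
```

With this scheme, a failed GGUF export leaves the `prep`, `sft`, and `dpo` markers in place, so calling `run_pipeline` again resumes at the GGUF stage without retraining.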

	## Key design decisions

	- Strip all system prompts at preprocess time. The model becomes Katherine unconditionally rather than learning the conditional `P(K | sysprompt)`. This makes the persona robust against jailbreaks and sysprompt-removal probes.
	- `enable_thinking=False` at chat-template time. K0 is embodied; she reasons in prose, not in tagged thinking blocks. This differs from the Two-Is Dave architecture.
	- Rank 64 / alpha 128 — high enough for persona consolidation, low enough to avoid overfitting on 1,886 examples.
	- Dropout 0.05 — small dataset + high rank wants light regularization.
	- `max_seq` 1024 — token-length p99 is 246; 1024 has 4× margin and saves compute vs 4096.
	- Dedicated tenancy on RunPod Secure Cloud (NOT Community) — buddhabrot project showed Community throttles HBM bandwidth 3-5×.
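
The numbers in the list above can be sanity-checked with a few lines of arithmetic. This is a sketch, not the derivation in CLOUD.md: the nearest-rank percentile and the round-up-to-a-power-of-two rule are assumptions chosen because they reproduce the quoted figures, and the helper names are hypothetical.

```python
import math


def percentile_nearest_rank(values: list[int], pct: float) -> int:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def pick_max_seq(p99_tokens: int, margin: float = 4.0) -> int:
    """Round margin * p99 up to the next power of two (assumed rule)."""
    target = math.ceil(p99_tokens * margin)
    return 1 << (target - 1).bit_length()


# Figures quoted in the list above.
RANK, ALPHA, P99 = 64, 128, 246

lora_scale = ALPHA / RANK    # standard LoRA scaling factor, alpha / r
max_seq = pick_max_seq(P99)  # 246 * 4 = 984, rounded up to 1024
```

Under these assumptions the LoRA update is scaled by 2.0, and a p99 of 246 tokens with a 4× margin lands on 1024. In practice `percentile_nearest_rank` would be fed the per-example token counts from the tokenized dataset.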

	## Repo structure

	```
	katherine-k0-finetune/
	├── README.md                 this file
	├── RUNBOOK.md                step-by-step operator walkthrough
	├── CLOUD.md                  math, derivations, hyperparameter rationale
	├── bootstrap-runpod.sh       one-shot first-launch installer
	├── run-cloud-runpod.sh       env-driven SFT+DPO+GGUF+push orchestrator
	├── _supervise-cloud.sh       watchdog with HF auto-sync of adapters
	├── prep_dataset.py           dedupe + system-prompt stripping
	├── finetune_k0.py            Stage 1: SFT trainer
	├── dpo_k0.py                 Stage 2: DPO trainer
	├── merge_and_gguf.py         Stage 3: merge LoRA + export 3 GGUF quants
	├── push_to_hf.py             Stage 4: HF bucket push
	└── data/
	    ├── k0_canonical.jsonl    1,886 SFT examples (system-prompt stripped)
	    └── k0_dpo_curated.jsonl  180 DPO pairs (system-prompt stripped)
	```

	## Hardware target

	- 1× NVIDIA H200 SXM5 (141GB VRAM) on RunPod Secure Cloud
	- Linux, CUDA 12.x preinstalled (`runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04`)
	- Falls back to H100 SXM5 (80GB) cleanly — same hyperparameters fit
	- A100 80GB also works (slower)

	Multi-GPU is not required and not configured. Persona fine-tuning on a 9B model is firmly in the single-GPU regime.
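
A back-of-envelope estimate of weight memory shows why a single 80 GB card has headroom. This is a sketch only: it assumes a nominal 9 billion parameters from the "9B" name and counts the weights alone, ignoring quantization constants, LoRA parameters, optimizer state, and activations.

```python
PARAMS = 9e9  # nominal count implied by the "9B" name (assumption)


def weight_gb(params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal GB."""
    return params * bits_per_param / 8 / 1e9


base_4bit_gb = weight_gb(PARAMS, 4)    # 4-bit base weights during QLoRA training
base_16bit_gb = weight_gb(PARAMS, 16)  # 16-bit copy, e.g. if the merge loads the base unquantized

print(f"4-bit base:  {base_4bit_gb:.1f} GB")
print(f"16-bit base: {base_16bit_gb:.1f} GB")
```

Both figures are well below 80 GB, which is consistent with the H100 and A100 fallbacks listed above. Actual peak usage depends on batch size and sequence length, which this estimate does not model.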

	## License

	Personal/research project. Models trained under this pipeline carry the underlying Qwen3.5 license (Apache 2.0).
