helloAK96 Claude Opus 4.7 committed on
Commit
2947218
·
1 Parent(s): 5aa47dc

README + BLOG: link all 6 LoRAs in the ablation list to their Hub repos

Each phase's LoRA name is now a clickable link to its model repo on
HuggingFace Hub. Judges can navigate from the blog/readme straight into
the auditable training artifacts (model card, training_metrics.json,
adapter_model.safetensors) for every run, including the controls and
the failed runs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (2)
  1. BLOG.md +6 -6
  2. README.md +11 -4
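The six links this commit adds all follow one pattern (namespace + repo name). A minimal sketch that generates the new BLOG.md bullets, with the namespace and phase notes taken from the diff below:

```python
# Sketch: render the Hub-linked ablation bullets added in this commit.
# Namespace and repo names come from the diff; nothing here is invented.
NAMESPACE = "helloAK96"
PHASES = {
    "": "Phase 0, the silent-fallback victim",
    "-p1": "Phase 1, LR fix",
    "-p2": "Phase 2, curriculum",
    "-p3a": "Phase 3A — submitted, live on the Space 🏆",
    "-p3b": "Phase 3B control, falsified the cheap hypothesis",
    "-p3c": "Phase 3C control, falsified reverse curriculum",
}

def lora_link(suffix: str, note: str) -> str:
    """Render one Markdown bullet linking a LoRA repo to its Hub page."""
    name = f"chaosops-grpo-lora{suffix}"
    return f"- [**{name}**](https://huggingface.co/{NAMESPACE}/{name}) — {note}"

for suffix, note in PHASES.items():
    print(lora_link(suffix, note))
```

Keeping the link text identical to the repo name means each bullet stays greppable against the Hub listing.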
BLOG.md CHANGED
@@ -234,12 +234,12 @@ ChaosOps AI is not a benchmark. It's a *training environment*: a place where the
 
 The thing we'd want to know if we were judging this: did a 3B model with $1.93 of training and zero supervised demonstrations actually learn the right behaviour, or are we cherry-picking? The 540-episode evaluation across 9 failure types × 3 tiers × 5 seeds × 4 policies is in the repo, with the JSON. The plots are committed as PNGs, not just embedded in a Colab cell that might disappear. Every LoRA from every phase — including the failures — is on the Hub:
 
-- **chaosops-grpo-lora** (Phase 0, the silent-fallback victim)
-- **chaosops-grpo-lora-p1** (Phase 1, LR fix)
-- **chaosops-grpo-lora-p2** (Phase 2, curriculum)
-- **chaosops-grpo-lora-p3a** (Phase 3A — submitted, live on the Space) 🏆
-- **chaosops-grpo-lora-p3b** (Phase 3B control, falsified the cheap hypothesis)
-- **chaosops-grpo-lora-p3c** (Phase 3C control, falsified reverse curriculum)
+- [**chaosops-grpo-lora**](https://huggingface.co/helloAK96/chaosops-grpo-lora) — Phase 0, the silent-fallback victim
+- [**chaosops-grpo-lora-p1**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p1) — Phase 1, LR fix
+- [**chaosops-grpo-lora-p2**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p2) — Phase 2, curriculum
+- [**chaosops-grpo-lora-p3a**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3a) — Phase 3A — submitted, live on the Space 🏆
+- [**chaosops-grpo-lora-p3b**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3b) — Phase 3B control, falsified the cheap hypothesis
+- [**chaosops-grpo-lora-p3c**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3c) — Phase 3C control, falsified reverse curriculum
 
 Anyone who wants to verify these numbers can. That felt like the honest way to publish.
 
README.md CHANGED
@@ -279,10 +279,17 @@ Baseline episodes: 3 scripted policies × 540 eps = 1,620
 GRAND TOTAL: 8,060+ incident rollouts simulated
 ```
 
-All training runs are tagged separately on HF Hub
-(`chaosops-grpo-lora`, `-p1`, `-p2`, `-p3a`, `-p3b`, `-p3c`) so the
-ablation table is independently reproducible. Total HF Jobs spend:
-~**$9.80** of the $30 credit budget.
+All training runs are tagged separately on HF Hub so the ablation table
+is independently reproducible:
+
+- [`chaosops-grpo-lora`](https://huggingface.co/helloAK96/chaosops-grpo-lora) — Phase 0, original baseline
+- [`chaosops-grpo-lora-p1`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p1) — Phase 1, LR fix
+- [`chaosops-grpo-lora-p2`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p2) — Phase 2, curriculum + r=32
+- [`chaosops-grpo-lora-p3a`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3a) — Phase 3A, **submitted (live on Space)** 🏆
+- [`chaosops-grpo-lora-p3b`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3b) — Phase 3B control, capacity-bound 1.5B
+- [`chaosops-grpo-lora-p3c`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3c) — Phase 3C control, reverse curriculum
+
+Total HF Jobs spend: **~$9.80** of the $30 credit budget.
 
 ---
 