README + BLOG: link all 6 LoRAs in the ablation list to their Hub repos
Each phase's LoRA name is now a clickable link to its model repo on the
Hugging Face Hub. Judges can navigate from the blog/README straight into
the auditable training artifacts (model card, training_metrics.json,
adapter_model.safetensors) for every run, including the controls and
the failed runs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
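
The three artifacts named above live at predictable raw-file paths on the Hub (`<repo>/resolve/main/<file>` is the Hub's standard download route, and the model card is the repo's `README.md`). A minimal stdlib-only sketch of how a judge could build those URLs; `artifact_urls` is an illustrative helper, not part of the repo, and `huggingface_hub.snapshot_download` would be the usual alternative for grabbing the whole repo:

```python
# Auditable artifacts published for every training run, per the commit
# message above. "README.md" is the model card file on the Hub.
ARTIFACTS = ["README.md", "training_metrics.json", "adapter_model.safetensors"]

def artifact_urls(repo_id: str) -> list[str]:
    # resolve/main/<file> is the Hub's standard raw-file route.
    base = f"https://huggingface.co/{repo_id}/resolve/main"
    return [f"{base}/{name}" for name in ARTIFACTS]

# Example: direct links for the submitted Phase 3A adapter.
for url in artifact_urls("helloAK96/chaosops-grpo-lora-p3a"):
    print(url)
```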
BLOG.md
CHANGED

````diff
@@ -234,12 +234,12 @@ ChaosOps AI is not a benchmark. It's a *training environment*: a place where the
 
 The thing we'd want to know if we were judging this: did a 3B model with $1.93 of training and zero supervised demonstrations actually learn the right behaviour, or are we cherry-picking? The 540-episode evaluation across 9 failure types × 3 tiers × 5 seeds × 4 policies is in the repo, with the JSON. The plots are committed as PNGs, not just embedded in a Colab cell that might disappear. Every LoRA from every phase – including the failures – is on the Hub:
 
-- **chaosops-grpo-lora**
-- **chaosops-grpo-lora-p1**
-- **chaosops-grpo-lora-p2**
-- **chaosops-grpo-lora-p3a**
-- **chaosops-grpo-lora-p3b**
-- **chaosops-grpo-lora-p3c**
+- [**chaosops-grpo-lora**](https://huggingface.co/helloAK96/chaosops-grpo-lora) – Phase 0, the silent-fallback victim
+- [**chaosops-grpo-lora-p1**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p1) – Phase 1, LR fix
+- [**chaosops-grpo-lora-p2**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p2) – Phase 2, curriculum
+- [**chaosops-grpo-lora-p3a**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3a) – Phase 3A, submitted, live on the Space
+- [**chaosops-grpo-lora-p3b**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3b) – Phase 3B control, falsified the cheap hypothesis
+- [**chaosops-grpo-lora-p3c**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3c) – Phase 3C control, falsified reverse curriculum
 
 Anyone who wants to verify these numbers can. That felt like the honest way to publish.
 
````
README.md
CHANGED

````diff
@@ -279,10 +279,17 @@ Baseline episodes: 3 scripted policies × 540 eps = 1,620
 GRAND TOTAL: 8,060+ incident rollouts simulated
 ```
 
-All training runs are tagged separately on HF Hub
-
-
-
+All training runs are tagged separately on HF Hub so the ablation table
+is independently reproducible:
+
+- [`chaosops-grpo-lora`](https://huggingface.co/helloAK96/chaosops-grpo-lora) – Phase 0, original baseline
+- [`chaosops-grpo-lora-p1`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p1) – Phase 1, LR fix
+- [`chaosops-grpo-lora-p2`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p2) – Phase 2, curriculum + r=32
+- [`chaosops-grpo-lora-p3a`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3a) – Phase 3A, **submitted (live on Space)**
+- [`chaosops-grpo-lora-p3b`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3b) – Phase 3B control, capacity-bound 1.5B
+- [`chaosops-grpo-lora-p3c`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3c) – Phase 3C control, reverse curriculum
+
+Total HF Jobs spend: **~$9.80** of the $30 credit budget.
 
 ---
 
````
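
The six bullets added to the README all follow one pattern (backticked name, link to the owner's Hub repo, dash, per-phase note), so they can be generated from a table rather than hand-edited. A throwaway sketch of that pattern; `bullet` and `PHASES` are illustrative names, the repo ids and notes come from the diff, and a plain hyphen stands in for the dash used in the rendered list:

```python
# Illustrative generator for the README ablation-list bullets.
OWNER = "helloAK96"

PHASES = [
    ("chaosops-grpo-lora",     "Phase 0, original baseline"),
    ("chaosops-grpo-lora-p1",  "Phase 1, LR fix"),
    ("chaosops-grpo-lora-p2",  "Phase 2, curriculum + r=32"),
    ("chaosops-grpo-lora-p3a", "Phase 3A, submitted (live on Space)"),
    ("chaosops-grpo-lora-p3b", "Phase 3B control, capacity-bound 1.5B"),
    ("chaosops-grpo-lora-p3c", "Phase 3C control, reverse curriculum"),
]

def bullet(repo: str, note: str) -> str:
    # One markdown list item: linked repo name plus its phase note.
    return f"- [`{repo}`](https://huggingface.co/{OWNER}/{repo}) - {note}"

print("\n".join(bullet(repo, note) for repo, note in PHASES))
```

Keeping the table and the formatter separate makes it easy to retarget the same list at the BLOG's bolded style later.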