helloAK96 Claude Opus 4.7 committed on
Commit
2947218
·
1 Parent(s): 5aa47dc

README + BLOG: link all 6 LoRAs in the ablation list to their Hub repos

Each phase's LoRA name is now a clickable link to its model repo on
HuggingFace Hub. Judges can navigate from the blog/readme straight into
the auditable training artifacts (model card, training_metrics.json,
adapter_model.safetensors) for every run, including the controls and
the failed runs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (2)
  1. BLOG.md +6 -6
  2. README.md +11 -4
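The six links this commit adds all follow one pattern (namespace + repo name). A minimal sketch that generates the new BLOG.md bullets, with the namespace and phase notes taken from the diff below:

```python
# Sketch: render the Hub-linked ablation bullets added in this commit.
# Namespace and repo names come from the diff; nothing here is invented.
NAMESPACE = "helloAK96"
PHASES = {
    "": "Phase 0, the silent-fallback victim",
    "-p1": "Phase 1, LR fix",
    "-p2": "Phase 2, curriculum",
    "-p3a": "Phase 3A — submitted, live on the Space 🏆",
    "-p3b": "Phase 3B control, falsified the cheap hypothesis",
    "-p3c": "Phase 3C control, falsified reverse curriculum",
}

def lora_link(suffix: str, note: str) -> str:
    """Render one Markdown bullet linking a LoRA repo to its Hub page."""
    name = f"chaosops-grpo-lora{suffix}"
    return f"- [**{name}**](https://huggingface.co/{NAMESPACE}/{name}) — {note}"

for suffix, note in PHASES.items():
    print(lora_link(suffix, note))
```

Keeping the link text identical to the repo name means each bullet stays greppable against the Hub listing.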
BLOG.md CHANGED
@@ -234,12 +234,12 @@ ChaosOps AI is not a benchmark. It's a *training environment*: a place where the
 
 The thing we'd want to know if we were judging this: did a 3B model with $1.93 of training and zero supervised demonstrations actually learn the right behaviour, or are we cherry-picking? The 540-episode evaluation across 9 failure types × 3 tiers × 5 seeds × 4 policies is in the repo, with the JSON. The plots are committed as PNGs, not just embedded in a Colab cell that might disappear. Every LoRA from every phase — including the failures — is on the Hub:
 
-- **chaosops-grpo-lora** (Phase 0, the silent-fallback victim)
-- **chaosops-grpo-lora-p1** (Phase 1, LR fix)
-- **chaosops-grpo-lora-p2** (Phase 2, curriculum)
-- **chaosops-grpo-lora-p3a** (Phase 3A — submitted, live on the Space) 🏆
-- **chaosops-grpo-lora-p3b** (Phase 3B control, falsified the cheap hypothesis)
-- **chaosops-grpo-lora-p3c** (Phase 3C control, falsified reverse curriculum)
+- [**chaosops-grpo-lora**](https://huggingface.co/helloAK96/chaosops-grpo-lora) — Phase 0, the silent-fallback victim
+- [**chaosops-grpo-lora-p1**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p1) — Phase 1, LR fix
+- [**chaosops-grpo-lora-p2**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p2) — Phase 2, curriculum
+- [**chaosops-grpo-lora-p3a**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3a) — Phase 3A — submitted, live on the Space 🏆
+- [**chaosops-grpo-lora-p3b**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3b) — Phase 3B control, falsified the cheap hypothesis
+- [**chaosops-grpo-lora-p3c**](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3c) — Phase 3C control, falsified reverse curriculum
 
 Anyone who wants to verify these numbers can. That felt like the honest way to publish.
 
README.md CHANGED
@@ -279,10 +279,17 @@ Baseline episodes: 3 scripted policies × 540 eps = 1,620
 GRAND TOTAL: 8,060+ incident rollouts simulated
 ```
 
-All training runs are tagged separately on HF Hub
-(`chaosops-grpo-lora`, `-p1`, `-p2`, `-p3a`, `-p3b`, `-p3c`) so the
-ablation table is independently reproducible. Total HF Jobs spend:
-~**$9.80** of the $30 credit budget.
+All training runs are tagged separately on HF Hub so the ablation table
+is independently reproducible:
+
+- [`chaosops-grpo-lora`](https://huggingface.co/helloAK96/chaosops-grpo-lora) — Phase 0, original baseline
+- [`chaosops-grpo-lora-p1`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p1) — Phase 1, LR fix
+- [`chaosops-grpo-lora-p2`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p2) — Phase 2, curriculum + r=32
+- [`chaosops-grpo-lora-p3a`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3a) — Phase 3A, **submitted (live on Space)** 🏆
+- [`chaosops-grpo-lora-p3b`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3b) — Phase 3B control, capacity-bound 1.5B
+- [`chaosops-grpo-lora-p3c`](https://huggingface.co/helloAK96/chaosops-grpo-lora-p3c) — Phase 3C control, reverse curriculum
+
+Total HF Jobs spend: **~$9.80** of the $30 credit budget.
 
 ---
 