helloAK96 Claude Opus 4.7 committed on
Commit
7ea9030
·
1 Parent(s): 257b713

README: signpost the phase-wise judge demo notebook


The notebook is now structured as a 10-phase storytelling walkthrough that
maps 1:1 to the four judging criteria. Phases 2-7 re-run the same
autoscaler_cost_cut HARD scenario with random/heuristic/oracle/trained so
judges can SEE what learned-vs-untrained looks like; Phase 8 regenerates
the full 540-episode comparison curve from scratch; Phase 9 is an
optional 20-step mini-GRPO run on Qwen 1.5B + T4 that proves the
training pipeline reproduces.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -26,7 +26,7 @@ ChaosOps AI is a reinforcement-learning environment where four specialist LLM ag
 
  - 🚀 **Live demo (HF Space, runnable):** <https://huggingface.co/spaces/helloAK96/chaosops>
  - 🤖 **Trained LoRA adapter (HF Hub):** <https://huggingface.co/helloAK96/chaosops-grpo-lora-p3a> *(Qwen 2.5-3B + curriculum + LoRA r=32 + 2× rogue-rubric reward; the live Space's `trained` lane uses this adapter)*
- - 📒 **Re-runnable training notebook:** [`notebooks/colab_train.ipynb`](notebooks/colab_train.ipynb)
+ - 📒 **Phase-wise judge demo notebook:** [`notebooks/colab_train.ipynb`](notebooks/colab_train.ipynb) *(25 cells, runs in ~15 min on a free T4. Phases 2–7 replay the same `autoscaler_cost_cut` HARD scenario with random / heuristic / oracle / trained so you can **watch** what learned-vs-untrained looks like; Phase 8 regenerates the full 540-episode comparison curve; Phase 9 is an optional 20-step mini-GRPO run that confirms the training pipeline reproduces.)*
  - 📝 **Mini blog post:** see [BLOG.md](BLOG.md) (also on HF Posts)
  - 🎬 **2-minute screencast:** *(YouTube link added once recorded)*
  - 📦 **Source repo:** <https://github.com/vatsalllll/chaos_ops>