README: signpost the phase-wise judge demo notebook
The notebook is now structured as a 10-phase storytelling walkthrough that
maps 1:1 to the four judging criteria. Phases 2-7 re-run the same
autoscaler_cost_cut HARD scenario with random/heuristic/oracle/trained so
judges can SEE what learned-vs-untrained looks like; Phase 8 regenerates
the full 540-episode comparison curve from scratch; Phase 9 is an
optional 20-step mini-GRPO run on Qwen 1.5B + T4 that proves the
training pipeline reproduces.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
README.md
@@ -26,7 +26,7 @@ ChaosOps AI is a reinforcement-learning environment where four specialist LLM ag
 
 - **Live demo (HF Space, runnable):** <https://huggingface.co/spaces/helloAK96/chaosops>
 - **Trained LoRA adapter (HF Hub):** <https://huggingface.co/helloAK96/chaosops-grpo-lora-p3a> *(Qwen 2.5-3B + curriculum + LoRA r=32 + 2× rogue-rubric reward; the live Space's `trained` lane uses this adapter)*
-- **
+- **Phase-wise judge demo notebook:** [`notebooks/colab_train.ipynb`](notebooks/colab_train.ipynb) *(25 cells, runs in ~15 min on a free T4. Phases 2–7 replay the same `autoscaler_cost_cut` HARD scenario with random / heuristic / oracle / trained so you can **watch** what learned-vs-untrained looks like; Phase 8 regenerates the full 540-episode comparison curve; Phase 9 is an optional 20-step mini-GRPO run that confirms the training pipeline reproduces.)*
 - **Mini blog post:** see [BLOG.md](BLOG.md) (also on HF Posts)
 - **2-minute screencast:** *(YouTube link added once recorded)*
 - **Source repo:** <https://github.com/vatsalllll/chaos_ops>