Spaces: helloAK96/chaosops

helloAK96 and Claude Opus 4.7 committed 13 days ago

Commit 7ea9030 (1 parent: 257b713)

README: signpost the phase-wise judge demo notebook

The notebook is now structured as a 10-phase storytelling walkthrough that
maps 1:1 to the four judging criteria. Phases 2-7 re-run the same
autoscaler_cost_cut HARD scenario with the random, heuristic, oracle, and
trained agents so judges can SEE what learned-vs-untrained behavior looks
like; Phase 8 regenerates the full 540-episode comparison curve from
scratch; Phase 9 is an optional 20-step mini-GRPO run on Qwen 1.5B and a
T4 that confirms the training pipeline reproduces.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
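Phase 9's mini-GRPO run relies on GRPO's group-relative advantage: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its own group, so no separate value (critic) network is needed. The sketch below illustrates only that normalization step. It is not taken from the ChaosOps notebook, the function name `group_relative_advantages` is hypothetical, and the reward values are made up for illustration.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one sampling group.

    Each completion's advantage is (r_i - mean) / (std + eps), computed
    only over the completions sampled for the same prompt.
    """
    g = len(rewards)
    if g == 0:
        raise ValueError("need at least one reward")
    mean = sum(rewards) / g
    # Population standard deviation over the group.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for 4 completions of one hypothetical incident prompt.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in adv])  # prints [1.414, -1.414, 0.0, 0.0]
```

If every completion in a group earns the same reward, all advantages collapse to zero and that group contributes no learning signal. This sketch uses the population standard deviation; some implementations use the sample (n - 1) estimate instead.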

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -26,7 +26,7 @@ ChaosOps AI is a reinforcement-learning environment where four specialist LLM ag
 - 🚀 **Live demo (HF Space, runnable):** <https://huggingface.co/spaces/helloAK96/chaosops>
 - 🤖 **Trained LoRA adapter (HF Hub):** <https://huggingface.co/helloAK96/chaosops-grpo-lora-p3a> *(Qwen 2.5-3B + curriculum + LoRA r=32 + 2× rogue-rubric reward; the live Space's `trained` lane uses this adapter)*
-- 📒 **Re-runnable training notebook:** [`notebooks/colab_train.ipynb`](notebooks/colab_train.ipynb)
+- 📒 **Phase-wise judge demo notebook:** [`notebooks/colab_train.ipynb`](notebooks/colab_train.ipynb) *(25 cells, runs in ~15 min on a free T4. Phases 2–7 replay the same `autoscaler_cost_cut` HARD scenario with random / heuristic / oracle / trained so you can **watch** what learned-vs-untrained looks like; Phase 8 regenerates the full 540-episode comparison curve; Phase 9 is an optional 20-step mini-GRPO run that confirms the training pipeline reproduces.)*
 - 📝 **Mini blog post:** see [BLOG.md](BLOG.md) (also on HF Posts)
 - 🎬 **2-minute screencast:** *(YouTube link added once recorded)*
 - 📦 **Source repo:** <https://github.com/vatsalllll/chaos_ops>