README + BLOG: link all 6 LoRAs in the ablation list to their Hub repos 2947218 helloAK96 Claude Opus 4.7 committed 13 days ago
Notebook: Phase 8c cleanup cell to free GPU before Phase 9 mini-train (OOM fix) 5aa47dc verified helloAK96 committed 13 days ago
BLOG: rewrite as a first-person team narrative - the real story 5abeaf0 helloAK96 Claude Opus 4.7 committed 13 days ago
BLOG: rewrite as a trending HF post - hero image, big numbers, 3 plots embedded 28fa782 helloAK96 Claude Opus 4.7 committed 13 days ago
README + BLOG: explicitly call out HF Jobs as our training infrastructure 1a6f7f1 helloAK96 Claude Opus 4.7 committed 13 days ago
README: add Training History section - 3,200 episodes across 6 GRPO runs adbc390 helloAK96 Claude Opus 4.7 committed 13 days ago
Notebook: add Phase 8b - training history (3,200 episodes, 6-run ablation table) a9790c1 verified helloAK96 committed 13 days ago
Notebook: drop bitsandbytes (triton 3.x ABI break) - fp16 inference doesn't need it 5228bdf verified helloAK96 committed 13 days ago
Notebook: fix PYTHONPATH inheritance for !python -m chaosops subprocesses e4d5c6c verified helloAK96 committed 13 days ago
README: signpost the phase-wise judge demo notebook 7ea9030 helloAK96 Claude Opus 4.7 committed 13 days ago
Phase-wise judge demo notebook (rubric-aligned, 25 cells, Phase 3A LoRA) 257b713 verified helloAK96 committed 13 days ago
Ship Colab training notebook so README link resolves for validation pass 9f18471 verified helloAK96 committed 13 days ago
Promote Phase 3A LoRA - Qwen 3B beats heuristic on HARD, 100% rogue catch 90452ca helloAK96 Claude Opus 4.7 committed 13 days ago
GRPO: add --rogue-bonus-multiplier to amplify oversight gradient signal 6f963e5 helloAK96 Claude Opus 4.7 committed 13 days ago
Promote Phase 2 LoRA (curriculum + LR=2e-5 + r=32) as the live trained lane f89a0e8 helloAK96 Claude Opus 4.7 committed 13 days ago
Add A/B comparison Job for trained-policy showdown e6e88e7 helloAK96 Claude Opus 4.7 committed 14 days ago
GRPO: expose --learning-rate, --temperature, --curriculum-schedule 6e35cec helloAK96 Claude Opus 4.7 committed 14 days ago
README: real before/after numbers from the 540-episode evaluation 8878953 helloAK96 Claude Opus 4.7 committed 14 days ago
Ship trained-policy artifact: training_metrics.json ffdbc68 verified helloAK96 committed 14 days ago
Ship trained-policy artifact: evaluation_summary.txt 83ffa3f verified helloAK96 committed 14 days ago
Ship trained-policy artifact: comparison_curve.png a1505d6 verified helloAK96 committed 14 days ago
Add HF Jobs entry-point for post-training evaluation b36bddc helloAK96 Claude Opus 4.7 committed 14 days ago
Ship baseline_curve.png so the Space README embed renders 5b2169b helloAK96 Claude Opus 4.7 committed 14 days ago
README: add submission links, composable-rubric docs, plot embeds, package layout refresh 4ce0ada helloAK96 Claude Opus 4.7 committed 14 days ago
Add transformers-backend GRPO loader (no triton/Unsloth dep) + fix Jobs deps 622e3ec helloAK96 Claude Opus 4.7 committed 14 days ago
Add HF Jobs entry-point script for GRPO training d0b6521 helloAK96 Claude Opus 4.7 committed 14 days ago
Phase A submission cleanup - OpenEnv compliance + composable rubrics + loud-fail trained lane adfe21e helloAK96 Claude Opus 4.7 committed 14 days ago