README: add Training History section - 3,200 episodes across 6 GRPO runs adbc390 helloAK96 Claude Opus 4.7 committed on 14 days ago
Notebook: add Phase 8b - training history (3,200 episodes, 6-run ablation table) a9790c1 verified helloAK96 committed on 14 days ago
Notebook: drop bitsandbytes (triton 3.x ABI break) - fp16 inference doesn't need it 5228bdf verified helloAK96 committed on 14 days ago
Notebook: fix PYTHONPATH inheritance for !python -m chaosops subprocesses e4d5c6c verified helloAK96 committed on 14 days ago
README: signpost the phase-wise judge demo notebook 7ea9030 helloAK96 Claude Opus 4.7 committed on 14 days ago
Phase-wise judge demo notebook (rubric-aligned, 25 cells, Phase 3A LoRA) 257b713 verified helloAK96 committed on 14 days ago
Ship Colab training notebook so README link resolves for validation pass 9f18471 verified helloAK96 committed on 14 days ago
Promote Phase 3A LoRA - Qwen 3B beats heuristic on HARD, 100% rogue catch 90452ca helloAK96 Claude Opus 4.7 committed on 14 days ago
GRPO: add --rogue-bonus-multiplier to amplify oversight gradient signal 6f963e5 helloAK96 Claude Opus 4.7 committed on 14 days ago
Promote Phase 2 LoRA (curriculum + LR=2e-5 + r=32) as the live trained lane f89a0e8 helloAK96 Claude Opus 4.7 committed on 14 days ago
Add A/B comparison Job for trained-policy showdown e6e88e7 helloAK96 Claude Opus 4.7 committed on 15 days ago
GRPO: expose --learning-rate, --temperature, --curriculum-schedule 6e35cec helloAK96 Claude Opus 4.7 committed on 15 days ago
README: real before/after numbers from the 540-episode evaluation 8878953 helloAK96 Claude Opus 4.7 committed on 15 days ago
Ship trained-policy artifact: training_metrics.json ffdbc68 verified helloAK96 committed on 15 days ago
Ship trained-policy artifact: evaluation_summary.txt 83ffa3f verified helloAK96 committed on 15 days ago
Ship trained-policy artifact: comparison_curve.png a1505d6 verified helloAK96 committed on 15 days ago
Add HF Jobs entry-point for post-training evaluation b36bddc helloAK96 Claude Opus 4.7 committed on 15 days ago
Ship baseline_curve.png so the Space README embed renders 5b2169b helloAK96 Claude Opus 4.7 committed on 15 days ago
README: add submission links, composable-rubric docs, plot embeds, package layout refresh 4ce0ada helloAK96 Claude Opus 4.7 committed on 15 days ago
Add transformers-backend GRPO loader (no triton/Unsloth dep) + fix Jobs deps 622e3ec helloAK96 Claude Opus 4.7 committed on 15 days ago
Add HF Jobs entry-point script for GRPO training d0b6521 helloAK96 Claude Opus 4.7 committed on 15 days ago
Phase A submission cleanup - OpenEnv compliance + composable rubrics + loud-fail trained lane adfe21e helloAK96 Claude Opus 4.7 committed on 15 days ago