Commit History

README + BLOG: link all 6 LoRAs in the ablation list to their Hub repos
2947218

helloAK96 Claude Opus 4.7 committed on

Notebook: Phase 8c cleanup cell to free GPU before Phase 9 mini-train (OOM fix)
5aa47dc
verified

helloAK96 committed on

docs: add 2-minute submission video script
7a6ae0c

helloAK96 Claude Opus 4.7 committed on

BLOG: rewrite as a first-person team narrative - the real story
5abeaf0

helloAK96 Claude Opus 4.7 committed on

BLOG: rewrite as a trending HF post - hero image, big numbers, 3 plots embedded
28fa782

helloAK96 Claude Opus 4.7 committed on

README + BLOG: explicitly call out HF Jobs as our training infrastructure
1a6f7f1

helloAK96 Claude Opus 4.7 committed on

README: add Training History section - 3,200 episodes across 6 GRPO runs
adbc390

helloAK96 Claude Opus 4.7 committed on

Notebook: add Phase 8b - training history (3,200 episodes, 6-run ablation table)
a9790c1
verified

helloAK96 committed on

Notebook: drop bitsandbytes (triton 3.x ABI break) - fp16 inference doesn't need it
5228bdf
verified

helloAK96 committed on

Notebook: fix PYTHONPATH inheritance for !python -m chaosops subprocesses
e4d5c6c
verified

helloAK96 committed on

README: signpost the phase-wise judge demo notebook
7ea9030

helloAK96 Claude Opus 4.7 committed on

Phase-wise judge demo notebook (rubric-aligned, 25 cells, Phase 3A LoRA)
257b713
verified

helloAK96 committed on

Ship Colab training notebook so README link resolves for validation pass
9f18471
verified

helloAK96 committed on

Promote Phase 3A LoRA - Qwen 3B beats heuristic on HARD, 100% rogue catch
90452ca

helloAK96 Claude Opus 4.7 committed on

Phase 3A: learning_curve.png
6ac8672
verified

helloAK96 committed on

Phase 3A: comparison_curve.png
7cc7a53
verified

helloAK96 committed on

GRPO: add --rogue-bonus-multiplier to amplify oversight gradient signal
6f963e5

helloAK96 Claude Opus 4.7 committed on

Promote Phase 2 LoRA (curriculum + LR=2e-5 + r=32) as the live trained lane
f89a0e8

helloAK96 Claude Opus 4.7 committed on

Phase 2 LoRA: learning_curve.png
0b53f2f
verified

helloAK96 committed on

Phase 2 LoRA: comparison_curve.png
677fb97
verified

helloAK96 committed on

Add A/B comparison Job for trained-policy showdown
e6e88e7

helloAK96 Claude Opus 4.7 committed on

GRPO: expose --learning-rate, --temperature, --curriculum-schedule
6e35cec

helloAK96 Claude Opus 4.7 committed on

README: real before/after numbers from the 540-episode evaluation
8878953

helloAK96 Claude Opus 4.7 committed on

Ship trained-policy artifact: training_metrics.json
ffdbc68
verified

helloAK96 committed on

Ship trained-policy artifact: learning_curve.png
12e2504
verified

helloAK96 committed on

Ship trained-policy artifact: evaluation_summary.txt
83ffa3f
verified

helloAK96 committed on

Ship trained-policy artifact: comparison_curve.png
a1505d6
verified

helloAK96 committed on

Add HF Jobs entry-point for post-training evaluation
b36bddc

helloAK96 Claude Opus 4.7 committed on

Ship baseline_curve.png so the Space README embed renders
5b2169b

helloAK96 Claude Opus 4.7 committed on

README: add submission links, composable-rubric docs, plot embeds, package layout refresh
4ce0ada

helloAK96 Claude Opus 4.7 committed on

Add transformers-backend GRPO loader (no triton/Unsloth dep) + fix Jobs deps
622e3ec

helloAK96 Claude Opus 4.7 committed on

Add HF Jobs entry-point script for GRPO training
d0b6521

helloAK96 Claude Opus 4.7 committed on

Phase A submission cleanup - OpenEnv compliance + composable rubrics + loud-fail trained lane
adfe21e

helloAK96 Claude Opus 4.7 committed on

rebuilding
1d27c7d

helloAK96 committed on

readme
56464b6

helloAK96 committed on

Initializing space
83136ac

helloAK96 committed on