Space: helloAK96/chaosops
Commit History

README + BLOG: link all 6 LoRAs in the ablation list to their Hub repos

2947218

helloAK96 and Claude Opus 4.7 committed 13 days ago

Notebook: Phase 8c cleanup cell to free GPU before Phase 9 mini-train (OOM fix)

5aa47dc
verified

helloAK96 committed 13 days ago

docs: add 2-minute submission video script

7a6ae0c

helloAK96 and Claude Opus 4.7 committed 13 days ago

BLOG: rewrite as a first-person team narrative — the real story

5abeaf0

helloAK96 and Claude Opus 4.7 committed 13 days ago

BLOG: rewrite as a trending HF post — hero image, big numbers, 3 plots embedded

28fa782

helloAK96 and Claude Opus 4.7 committed 13 days ago

README + BLOG: explicitly call out HF Jobs as our training infrastructure

1a6f7f1

helloAK96 and Claude Opus 4.7 committed 13 days ago

README: add Training History section — 3,200 episodes across 6 GRPO runs

adbc390

helloAK96 and Claude Opus 4.7 committed 13 days ago

Notebook: add Phase 8b — training history (3,200 episodes, 6-run ablation table)

a9790c1
verified

helloAK96 committed 13 days ago

Notebook: drop bitsandbytes (triton 3.x ABI break) — fp16 inference doesn't need it

5228bdf
verified

helloAK96 committed 13 days ago

Notebook: fix PYTHONPATH inheritance for !python -m chaosops subprocesses

e4d5c6c
verified

helloAK96 committed 13 days ago

README: signpost the phase-wise judge demo notebook

7ea9030

helloAK96 and Claude Opus 4.7 committed 13 days ago

Phase-wise judge demo notebook (rubric-aligned, 25 cells, Phase 3A LoRA)

257b713
verified

helloAK96 committed 13 days ago

Ship Colab training notebook so README link resolves for validation pass

9f18471
verified

helloAK96 committed 13 days ago

Promote Phase 3A LoRA — Qwen 3B beats heuristic on HARD, 100% rogue catch

90452ca

helloAK96 and Claude Opus 4.7 committed 13 days ago

Phase 3A: learning_curve.png

6ac8672
verified

helloAK96 committed 13 days ago

Phase 3A: comparison_curve.png

7cc7a53
verified

helloAK96 committed 13 days ago

GRPO: add --rogue-bonus-multiplier to amplify oversight gradient signal

6f963e5

helloAK96 and Claude Opus 4.7 committed 13 days ago

Promote Phase 2 LoRA (curriculum + LR=2e-5 + r=32) as the live trained lane

f89a0e8

helloAK96 and Claude Opus 4.7 committed 13 days ago

Phase 2 LoRA: learning_curve.png

0b53f2f
verified

helloAK96 committed 13 days ago

Phase 2 LoRA: comparison_curve.png

677fb97
verified

helloAK96 committed 13 days ago

Add A/B comparison Job for trained-policy showdown

e6e88e7

helloAK96 and Claude Opus 4.7 committed 14 days ago

GRPO: expose --learning-rate, --temperature, --curriculum-schedule

6e35cec

helloAK96 and Claude Opus 4.7 committed 14 days ago

README: real before/after numbers from the 540-episode evaluation

8878953

helloAK96 and Claude Opus 4.7 committed 14 days ago

Ship trained-policy artifact: training_metrics.json

ffdbc68
verified

helloAK96 committed 14 days ago

Ship trained-policy artifact: learning_curve.png

12e2504
verified

helloAK96 committed 14 days ago

Ship trained-policy artifact: evaluation_summary.txt

83ffa3f
verified

helloAK96 committed 14 days ago

Ship trained-policy artifact: comparison_curve.png

a1505d6
verified

helloAK96 committed 14 days ago

Add HF Jobs entry-point for post-training evaluation

b36bddc

helloAK96 and Claude Opus 4.7 committed 14 days ago

Ship baseline_curve.png so the Space README embed renders

5b2169b

helloAK96 and Claude Opus 4.7 committed 14 days ago

README: add submission links, composable-rubric docs, plot embeds, package layout refresh

4ce0ada

helloAK96 and Claude Opus 4.7 committed 14 days ago

Add transformers-backend GRPO loader (no triton/Unsloth dep) + fix Jobs deps

622e3ec

helloAK96 and Claude Opus 4.7 committed 14 days ago

Add HF Jobs entry-point script for GRPO training

d0b6521

helloAK96 and Claude Opus 4.7 committed 14 days ago

Phase A submission cleanup — OpenEnv compliance + composable rubrics + loud-fail trained lane

adfe21e

helloAK96 and Claude Opus 4.7 committed 14 days ago

rebuilding

1d27c7d

helloAK96 committed 14 days ago

readme

56464b6

helloAK96 committed 14 days ago

Initializing space

83136ac

helloAK96 committed 14 days ago

