Commit History

GRPO run: grpo-20260426-062846
4c6c4db
verified

akhiilll committed on

train: torch_dtype not dtype
e89d325
verified

akhiilll committed on

README: judging-criteria mapping + reproduce + per-task evidence pointers
3ecf9bc
verified

akhiilll committed on

notebook: add executed cell outputs (smoke test + job submission). Chart cells fill on re-execute after job completes.
2bf96c8
verified

akhiilll committed on

train: drop Unsloth (CUDA-init flake), bypass is_available cache, default to Qwen/Qwen2.5-1.5B-Instruct
9e0aa35
verified

akhiilll committed on

train: CUDA-warmup retry + cuda-gated unsloth import
f2d0b93
verified

akhiilll committed on

train: drop server/* import - inline verb table + dense costs (job ran without openenv-core)
eca0cc8
verified

akhiilll committed on

train: fix WS protocol - use data wrapper + flat message field
d56b890
verified

akhiilll committed on

claims-env-pro v2.0.0 - initial port (12 verbs / 5 tasks / 18 obs / 6-comp reward / Plaid)
027ea1a
verified

akhiilll committed on

initial commit
6b58588
verified

akhiilll committed on