feat: implement dataset loader, environment, and GRPO training pipeline for undertrial bail prediction bf8f1ff Draken1606 committed 14 days ago
Fix 5 bugs: inference mode reset, step_counts in curriculum, adapter-only save (x3), DEMO001 false defence claim, episode_id in /reset 37edd09 Draken1606 committed 14 days ago
Fix all audit gaps: custody neutral, parity-first bias, skip penalty 0.40, statutory process reward, /observation endpoint, reset() timeout, drift determinism 2bc545f Draken1606 committed 15 days ago
Fix demo: remove shuffle for deterministic seed=0; update stage 2+4 memos to match real seed=0 GT outcomes 8975766 Draken1606 committed 15 days ago
Add seed param to /reset: demo pins to seed=0 per stage for consistent known episodes 9932c2e Draken1606 committed 15 days ago
Add 4 rich demo episodes (all stages incl. bias+BNSS); demo UI cycles random stages 53cd456 Draken1606 committed 16 days ago