Commit History
---- aa1acaa
Shabista Sehar committed on
feat: implement dataset loader, environment, and GRPO training pipeline for undertrial bail prediction bf8f1ff
modified a085ad1
Shabista Sehar committed on
implemented d8f8a45
Shabista Sehar committed on
Fix A3 (OOM eval), B9 (NDPS eligibility), B3 (direction-gated computation bonus), A8-pt2 (episode_id case lookup) 4855450
Fix 8 compliance gaps: repeat-action dedup+cache, min-steps hard block, criminal history tool (12th action), efficiency removed from training formula, circular import cleaned, yaml formula synced 898bc18
Reward overhaul: add compute_reasoning_quality (anchoring+arithmetic+specificity+consistency), parity-grounds penalty, reduce outcome 40%->30%, add 10% reasoning quality signal ca62faa
Fix 5 bugs: inference mode reset, step_counts in curriculum, adapter-only save (x3), DEMO001 false defence claim, episode_id in /reset 37edd09
import fixed c1adced
Shabista Sehar committed on