undertrial-ai / training

Commit History

Fix all stale 1.5B refs to 7B + LR/beta/completion fixes
7fa6a21

Draken1606 committed on

Fix notebook: remove hardcoded 384, use defaults (640, lr=5e-5, beta=0.04)
cb53788

Draken1606 committed on

Fix stagnation: LR 5e-5, beta 0.04, default 640
001f0ed

Draken1606 committed on

max_completion=640 temp=0.9 fix truncation
537d8f4

Draken1606 committed on

num_gen=4 for T4 + bump steps to 340
1fa41e5

Draken1606 committed on

Make unsloth import lazy
c1f1ab3

Draken1606 committed on

3-level curriculum + 7B + reward fixes
9868dfb

Draken1606 committed on

Add training evidence: curriculum results, plots (LFS), parse_job_log helper
805b735

Draken1606 committed on

changed model
d745c55

Draken1606 committed on

balanced
5365e54

Draken1606 committed on

fixed mismatches
d1f8afa

Shabista Sehar committed on

improved
3f2e418

Draken1606 committed on

python notebook
da5d6b0

Shabista Sehar committed on

protection against reward hacking improved
9adca2d

Draken1606 committed on

training script
1272145

Shabista Sehar committed on

feat: implement GRPO training script with environment health checks and structured reward functions for bail assessment
46d6990

Draken1606 committed on

----
aa1acaa

Shabista Sehar committed on

feat: implement GRPO training pipeline for bail assessment model and update README credits
472a28c

Draken1606 committed on

feat: implement dataset loader, environment, and GRPO training pipeline for undertrial bail prediction
bf8f1ff

Draken1606 committed on

modified
a085ad1

Shabista Sehar committed on

implemented
d8f8a45

Shabista Sehar committed on

Fix A3 (OOM eval), B9 (NDPS eligibility), B3 (direction-gated computation bonus), A8-pt2 (episode_id case lookup)
4855450

Draken1606 committed on

Fix 8 compliance gaps: repeat-action dedup+cache, min-steps hard block, criminal history tool (12th action), efficiency removed from training formula, circular import cleaned, yaml formula synced
898bc18

Draken1606 committed on

Reward overhaul: add compute_reasoning_quality (anchoring+arithmetic+specificity+consistency), parity-grounds penalty, reduce outcome 40%->30%, add 10% reasoning quality signal
ca62faa

Draken1606 committed on

Fix 5 bugs: inference mode reset, step_counts in curriculum, adapter-only save (x3), DEMO001 false defence claim, episode_id in /reset
37edd09

Draken1606 committed on

import fixed
c1adced

Shabista Sehar committed on

Fix 3 teammate-caught crashes: statutory/bias wrong arg types in trainer, env.state() in WebSocket
04b605d

Draken1606 committed on

Fix 5 audit gaps: conditional bail, action history, efficiency reward, train/val split, env API routing
6218d9a

Draken1606 committed on

Fix 6 vulnerabilities: /state crash, reward clamp, condition reward, XML exploit, tool-skip bypass, timeout enforcement
d76d092

Draken1606 committed on

Apply 3 critical fixes: unify reward with server/reward.py, before/after eval + results.json, generation inspection callback
19bb454

Draken1606 committed on

Fix install URL in training script to point to real HF Space
3417d7e

Draken1606 committed on

first commit
4052d84

Draken1606 committed on