Space: Draken1606/undertrial-ai

undertrial-ai / training

Commit History

Fix all stale 1.5B refs to 7B + LR/beta/completion fixes

7fa6a21

Draken1606 committed about 16 hours ago

Fix notebook: remove hardcoded 384, use defaults (640, lr=5e-5, beta=0.04)

cb53788

Draken1606 committed about 16 hours ago

Fix stagnation: LR 5e-5, beta 0.04, default 640

001f0ed

Draken1606 committed about 17 hours ago

max_completion=640 temp=0.9 fix truncation

537d8f4

Draken1606 committed 2 days ago

num_gen=4 for T4 + bump steps to 340

1fa41e5

Draken1606 committed 2 days ago

3-level

2c93c00

Draken1606 committed 3 days ago

Make unsloth import lazy

c1f1ab3

Draken1606 committed 3 days ago

3-level curriculum + 7B + reward fixes

9868dfb

Draken1606 committed 3 days ago

Add training evidence: curriculum results, plots (LFS), parse_job_log helper

805b735

Draken1606 committed 12 days ago

changed model

d745c55

Draken1606 committed 12 days ago

balanced

5365e54

Draken1606 committed 12 days ago

fixed mismatches

d1f8afa

Shabista Sehar committed 12 days ago

hf job

409ca4d

Draken1606 committed 12 days ago

improved

3f2e418

Draken1606 committed 12 days ago

python notebook

da5d6b0

Shabista Sehar committed 12 days ago

protection against reward hacking improved

9adca2d

Draken1606 committed 12 days ago

training script

1272145

Shabista Sehar committed 12 days ago

feat: implement GRPO training script with environment health checks and structured reward functions for bail assessment

46d6990

Draken1606 committed 12 days ago

----

aa1acaa

Shabista Sehar committed 13 days ago

feat: implement GRPO training pipeline for bail assessment model and update README credits

472a28c

Draken1606 committed 13 days ago

feat: implement dataset loader, environment, and GRPO training pipeline for undertrial bail prediction

bf8f1ff

Draken1606 committed 13 days ago

modified

a085ad1

Shabista Sehar committed 13 days ago

implemented

d8f8a45

Shabista Sehar committed 14 days ago

Fix A3 (OOM eval), B9 (NDPS eligibility), B3 (direction-gated computation bonus), A8-pt2 (episode_id case lookup)

4855450

Draken1606 committed 14 days ago

Fix 8 compliance gaps: repeat-action dedup+cache, min-steps hard block, criminal history tool (12th action), efficiency removed from training formula, circular import cleaned, yaml formula synced

898bc18

Draken1606 committed 14 days ago

Reward overhaul: add compute_reasoning_quality (anchoring+arithmetic+specificity+consistency), parity-grounds penalty, reduce outcome 40%->30%, add 10% reasoning quality signal

ca62faa

Draken1606 committed 14 days ago

Fix 5 bugs: inference mode reset, step_counts in curriculum, adapter-only save (x3), DEMO001 false defence claim, episode_id in /reset

37edd09

Draken1606 committed 14 days ago

import fixed

c1adced

Shabista Sehar committed 14 days ago

Fix 3 teammate-caught crashes: statutory/bias wrong arg types in trainer, env.state() in WebSocket

04b605d

Draken1606 committed 14 days ago

Fix 5 audit gaps: conditional bail, action history, efficiency reward, train/val split, env API routing

6218d9a

Draken1606 committed 15 days ago

Fix 6 vulnerabilities: /state crash, reward clamp, condition reward, XML exploit, tool-skip bypass, timeout enforcement

d76d092

Draken1606 committed 15 days ago

Apply 3 critical fixes: unify reward with server/reward.py, before/after eval + results.json, generation inspection callback

19bb454

Draken1606 committed 15 days ago

Fix install URL in training script to point to real HF Space

3417d7e

Draken1606 committed 15 days ago

first commit

4052d84

Draken1606 committed 15 days ago