polypharmacy-env / backend

Commit History

Fix API reward clamp to (0.001, 0.999) and update README
1bb11d9

TheJackBright Claude Opus 4.6 committed on

Enforce strict (0.001, 0.999) bounds on ALL rewards and scores
c314a65

TheJackBright Claude Opus 4.6 committed on

Set minimum shaped reward to 0.001 (strict >0)
3948a09

TheJackBright Claude Opus 4.6 committed on

Clamp shaped rewards to non-negative values
d5dbfe8

TheJackBright Claude Opus 4.6 committed on

Fix terminal reward to be grader score only (strict 0-1 range)
5961585

TheJackBright Claude Opus 4.6 committed on

Fix score bounds to (0.001, 0.999) and use HF Router defaults
373c99b

TheJackBright Claude Opus 4.6 committed on

Tighten score bounds to (0.000001, 0.999999) for strict validation
6f37fb0

TheJackBright Claude Opus 4.6 committed on

Fix grader scores to be strictly within (0, 1) range
c5b547b

TheJackBright Claude Opus 4.6 committed on

Version 3: add trained model checkpoints
ab786b3

TheJackBright Claude Opus 4.6 committed on

Version 3
f0ef01d

TheJackBright Claude Opus 4.6 committed on