Fix API reward clamp to (0.001, 0.999) and update README 1bb11d9 TheJackBright Claude Opus 4.6 committed 30 days ago
Enforce strict (0.001, 0.999) bounds on ALL rewards and scores c314a65 TheJackBright Claude Opus 4.6 committed 30 days ago
Set minimum shaped reward to 0.001 (strict >0) 3948a09 TheJackBright Claude Opus 4.6 committed 30 days ago
Clamp shaped rewards to non-negative values d5dbfe8 TheJackBright Claude Opus 4.6 committed 30 days ago
Fix terminal reward to be grader score only (strict 0-1 range) 5961585 TheJackBright Claude Opus 4.6 committed 30 days ago
Fix score bounds to (0.001, 0.999) and use HF Router defaults 373c99b TheJackBright Claude Opus 4.6 committed 30 days ago
Tighten score bounds to (0.000001, 0.999999) for strict validation 6f37fb0 TheJackBright Claude Opus 4.6 committed 30 days ago
Fix grader scores to be strictly within (0, 1) range c5b547b TheJackBright Claude Opus 4.6 committed 30 days ago
Version 3: add trained model checkpoints ab786b3 TheJackBright Claude Opus 4.6 committed about 1 month ago