Fix API reward clamp to (0.001, 0.999) and update README 1bb11d9 TheJackBright and Claude Opus 4.6 committed about 1 month ago
Enforce strict (0.001, 0.999) bounds on ALL rewards and scores c314a65 TheJackBright and Claude Opus 4.6 committed about 1 month ago
Set minimum shaped reward to 0.001 (strict >0) 3948a09 TheJackBright and Claude Opus 4.6 committed about 1 month ago
Clamp shaped rewards to non-negative values d5dbfe8 TheJackBright and Claude Opus 4.6 committed about 1 month ago
Fix terminal reward to be grader score only (strict 0-1 range) 5961585 TheJackBright and Claude Opus 4.6 committed about 1 month ago
Fix score bounds to (0.001, 0.999) and use HF Router defaults 373c99b TheJackBright and Claude Opus 4.6 committed about 1 month ago
Tighten score bounds to (0.000001, 0.999999) for strict validation 6f37fb0 TheJackBright and Claude Opus 4.6 committed about 1 month ago
Fix grader scores to be strictly within (0, 1) range c5b547b TheJackBright and Claude Opus 4.6 committed about 1 month ago
Version 3: add trained model checkpoints ab786b3 TheJackBright and Claude Opus 4.6 committed about 1 month ago