TheJackBright Claude Opus 4.6 committed on
Commit d5dbfe8 · 1 Parent(s): 5961585

Clamp shaped rewards to non-negative values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

backend/src/polypharmacy_env/rewards.py CHANGED
@@ -93,4 +93,4 @@ def compute_shaped_reward(
 
     # finish_review terminal bonus is added by the caller after grading
 
-    return reward
+    return max(0.0, reward)
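
To illustrate the effect of this change, here is a minimal sketch of the clamp in isolation. The helper names `clamp_shaped_reward` and `total_reward` are hypothetical and are not part of `rewards.py`. The sketch also assumes the caller adds the `finish_review` terminal bonus to the already clamped value, as the comment in the diff suggests.

```python
def clamp_shaped_reward(reward: float) -> float:
    """Hypothetical stand-in for the new last line of compute_shaped_reward."""
    # Negative shaped rewards are floored at zero; non-negative ones pass through.
    return max(0.0, reward)


def total_reward(shaped: float, terminal_bonus: float = 0.0) -> float:
    """Hypothetical caller: adds the terminal bonus after clamping (an assumption)."""
    return clamp_shaped_reward(shaped) + terminal_bonus


# A penalty, a neutral step, and a positive step.
examples = [-0.75, 0.0, 0.5]
clamped = [clamp_shaped_reward(r) for r in examples]
print(clamped)  # [0.0, 0.0, 0.5]

# The penalty no longer reduces the terminal bonus.
print(total_reward(-0.75, terminal_bonus=1.0))  # 1.0
```

One side effect worth noting: with this argument order, `max(0.0, float("nan"))` evaluates to `0.0`, so a NaN reward would be silently turned into zero rather than propagated.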