Spaces: Mihir1107/DateSelectEnv

Mihir1107 and Claude Sonnet 4.6 committed on Apr 7

Commit

1b9f5b0

1 Parent(s): 89b4daf

Fix log_end: clamp score and use 4dp to avoid score=0.00 in [END] log

The :.2f format prints 0.00 for any score below 0.005 (e.g. 0.001 after
the server-side clamp), and likewise 1.00 for 0.999. If the validator
parses the score from the [END] log line, 0.00 is read as exactly 0,
which fails the strict (0, 1) check. Add a client-side clamp,
max(0.001, min(0.999, score)), and switch to :.4f to be safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
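
The rounding problem described in the message can be reproduced in isolation. The sketch below is illustrative only: `format_score` and `clamp` are hypothetical helpers written for this example, not functions from inference.py. The clamp bounds are taken from the commit.

```python
def format_score(score: float, digits: int) -> str:
    """Format a score with a fixed number of decimal places (hypothetical helper)."""
    return f"{score:.{digits}f}"


def clamp(score: float, lo: float = 0.001, hi: float = 0.999) -> float:
    """Clamp into [lo, hi], mirroring max(0.001, min(0.999, score)) from the commit."""
    return max(lo, min(hi, score))


# Two decimal places collapse the clamped extremes onto the forbidden endpoints.
print(format_score(clamp(0.0), 2))  # 0.00
print(format_score(clamp(1.0), 2))  # 1.00

# Four decimal places keep them strictly inside (0, 1).
print(format_score(clamp(0.0), 4))  # 0.0010
print(format_score(clamp(1.0), 4))  # 0.9990
```

Clamping alone is not enough: the printed precision has to be at least as fine as the clamp bounds, otherwise the formatted text rounds back onto 0 or 1.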

Files changed (1)

inference.py +4 -2

@@ -99,10 +99,12 @@ def log_step(step: int, action: dict, reward: float, done: bool,
 def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
-    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+    # Clamp score to (0.001, 0.999) strictly — validator rejects exact 0.0 or 1.0
+    score = max(0.001, min(0.999, score))
+    rewards_str = ",".join(f"{r:.4f}" for r in rewards)
     print(
         f"[END] success={str(success).lower()} steps={steps} "
-        f"score={score:.2f} rewards={rewards_str}",
+        f"score={score:.4f} rewards={rewards_str}",
         flush=True,
     )
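
How the validator actually reads the [END] line is not shown in this commit. As a rough check, assuming it extracts the `score=` field with a regular expression and converts it to a float, the patched format can be round-tripped as follows. `log_end_line` rebuilds the patched `log_end` output as a returned string, and `parse_score` is a hypothetical stand-in for the validator, not its real code.

```python
import re
from typing import List


def log_end_line(success: bool, steps: int, score: float, rewards: List[float]) -> str:
    """Build the [END] line like the patched log_end, but return it instead of printing."""
    score = max(0.001, min(0.999, score))
    rewards_str = ",".join(f"{r:.4f}" for r in rewards)
    return (
        f"[END] success={str(success).lower()} steps={steps} "
        f"score={score:.4f} rewards={rewards_str}"
    )


def parse_score(line: str) -> float:
    """Hypothetical validator-side parser: extract the score= field from an [END] line."""
    match = re.search(r"\bscore=([0-9.]+)", line)
    if match is None:
        raise ValueError("no score field in line")
    return float(match.group(1))


# Worst cases: raw scores of exactly 0.0 and 1.0.
low = log_end_line(success=False, steps=3, score=0.0, rewards=[0.0, 0.0, 0.0])
high = log_end_line(success=True, steps=3, score=1.0, rewards=[1.0, 1.0, 1.0])
print(low)
print(high)

# Both parsed values stay strictly inside (0, 1).
assert 0.0 < parse_score(low) < 1.0
assert 0.0 < parse_score(high) < 1.0
```

Only `score` is clamped here, as in the diff; the individual `rewards` values are still printed unclamped and can appear as 0.0000 or 1.0000.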