samrat-rm committed
Commit 89b370c · 1 Parent(s): f58e721

fix: add strict prompt conditions so scores/rewards stay strictly inside the 0.0–1.0 range

Files changed (1)
  1. openenv.yaml +4 -1
openenv.yaml CHANGED
@@ -37,6 +37,7 @@ tasks:
  - 0.30: Wrong label but description matches a related concept
  - 0.15: Wrong failure mode, no diagnosis submitted, or empty response
  - If in doubt, return 0.15. Only return one of the four values listed above.
+ - CRITICAL: Never return 0, 0.0, 1, or 1.0. These are invalid scores.
 
  - id: task_medium
  difficulty: medium
@@ -69,6 +70,7 @@ tasks:
  - 0.30: Wrong label but description matches a related concept
  - 0.15: Wrong failure mode, no diagnosis submitted, or empty response
  - If in doubt, return 0.15. Only return one of the four values listed above.
+ - CRITICAL: Never return 0, 0.0, 1, or 1.0. These are invalid scores.
 
  - id: task_hard
  difficulty: hard
@@ -103,4 +105,5 @@ tasks:
  - 0.50: Correct failure mode but fix is vague, wrong, or missing
  - 0.20: Wrong failure mode but fix is incidentally relevant
  - 0.15: Wrong failure mode, no useful fix, no diagnosis submitted, or empty response
- - If in doubt, return 0.15. Only return one of the five values listed above.
+ - If in doubt, return 0.15. Only return one of the five values listed above.
+ - CRITICAL: Never return 0, 0.0, 1, or 1.0. These are invalid scores.
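
The rubric lines above constrain the grader only through prompt wording, so a caller may still want to check the returned value in code. Below is a minimal sketch of such a check, not part of this commit. The function name `sanitize_score`, the `allowed` parameter, and the idea of a post-processing step are all assumptions for illustration. Only the 0.15 fallback and the exclusion of 0 and 1 come from the rubric text.

```python
import math

# "If in doubt, return 0.15" per the rubric in openenv.yaml.
FALLBACK_SCORE = 0.15


def sanitize_score(raw, allowed=None, fallback=FALLBACK_SCORE):
    """Return a score strictly inside (0, 1), or the fallback value.

    `allowed` is an optional, hypothetical whitelist of rubric values;
    when given, any score not close to one of them is also rejected.
    """
    try:
        score = float(raw)
    except (TypeError, ValueError):
        # Empty, missing, or non-numeric grader output.
        return fallback
    # Reject NaN/inf and the boundary values 0 and 1 (and anything beyond).
    if not math.isfinite(score) or score <= 0.0 or score >= 1.0:
        return fallback
    if allowed is not None and not any(math.isclose(score, a) for a in allowed):
        return fallback
    return score


if __name__ == "__main__":
    # Only values visible in the diff are used here; the full rubric has more.
    visible_values = (0.15, 0.30)
    for raw in ("0.30", 0, 1.0, "n/a", 0.42):
        print(repr(raw), "->", sanitize_score(raw, allowed=visible_values))
```

Falling back to 0.15 rather than raising keeps the behavior aligned with the rubric's own "if in doubt" rule. A stricter pipeline might prefer to log or reject invalid grader output instead.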