Amogh-kal1 committed on
Commit
3f12d92
·
verified ·
1 Parent(s): 7ac53ec

Upload folder using huggingface_hub

TASK_IMPROVEMENTS_SUMMARY.md ADDED
@@ -0,0 +1,104 @@
1
+ # Task Improvements Summary
2
+
3
+ ## Overview
4
+
5
+ This document summarizes the improvements made to the WhipStudio ML debugging environment tasks and graders.
6
+
7
+ ## Key Issues Fixed
8
+
9
+ ### 1. Unstable Datasets (Tasks 1 & 2)
10
+ **Problem**: Tasks were generating random data inside training loops, making loss values non-deterministic and graders unreliable.
11
+
12
+ **Solution**:
13
+ - Fixed datasets with deterministic seeds (`torch.manual_seed()`)
14
+ - Clear train/validation splits
15
+ - Learnable patterns (e.g., `y = (X[:, 0] > 0).long()`)
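The idea behind a fixed, seeded dataset can be sketched without any dependencies. This is only an illustration: it uses Python's `random.Random` as a stand-in for `torch.manual_seed()` and `torch.randn`, and the helper name `make_fixed_dataset` is hypothetical. The 160/40 split and the `first feature > 0` rule mirror Task 1.

```python
import random


def make_fixed_dataset(seed=0, n_train=160, n_val=40, n_features=10):
    """Build a seeded dataset once, outside any training loop.

    Hypothetical helper: a plain-Python stand-in for the torch-based task code.
    Labels follow the learnable rule y = 1 if the first feature > 0, else 0.
    """
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(n_train + n_val)]
    labels = [1 if row[0] > 0 else 0 for row in rows]
    train = (rows[:n_train], labels[:n_train])
    val = (rows[n_train:], labels[n_train:])
    return train, val


train_a, val_a = make_fixed_dataset(seed=0)
train_b, val_b = make_fixed_dataset(seed=0)
print(train_a == train_b and val_a == val_b)  # same seed -> identical data
print(len(train_a[0]), len(val_a[0]))
```

Because the data is generated once and reused every epoch, two runs of the same submission see the same samples, so their loss curves can be compared by a grader.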
16
+
17
+ ### 2. Gameable Graders
18
+ **Problem**: Submissions with very high learning rates (e.g., `lr=1000`) could earn full scores by ending on a low loss value despite unstable training.
19
+
20
+ **Solution**:
21
+ - Added **loss spike detection** in Task 1 grader
22
+ - If `max_loss > initial_loss * 5.0` or `max_loss > 10.0`, the submission is flagged as unstable and penalized
23
+ - Partial fixes that keep a bad LR are capped at a score of 0.2
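The spike rule can be sketched as a small standalone check. The thresholds (5x the initial loss, absolute cap of 10.0) come from the summary; the function name `detect_loss_spike` and the sample loss lists are illustrative assumptions, not part of the grader.

```python
def detect_loss_spike(losses, ratio=5.0, abs_cap=10.0):
    """Return True if any loss exceeds ratio * initial loss or abs_cap."""
    if not losses:
        return False
    initial_loss = losses[0]
    max_loss = max(losses)
    return max_loss > initial_loss * ratio or max_loss > abs_cap


stable = [0.70, 0.55, 0.40, 0.30, 0.25]
spiky = [0.70, 45.0, 3.1, 0.20, 0.05]  # ends low, but spikes along the way

print(detect_loss_spike(stable))  # False
print(detect_loss_spike(spiky))   # True
```

Checking the maximum over the whole run, rather than only the final value, is what stops a run that happens to end on a low loss from scoring well.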
24
+
25
+ ### 3. Inverted Scoring Logic
26
+ **Problem**: The `sigmoid_reward()` function had a confusing `invert` parameter that led to inverted scoring (low F1 → high score).
27
+
28
+ **Solution**:
29
+ - Created new `sigmoid_score(value, center, steepness, higher_is_better)` function
30
+ - Clear semantics: `higher_is_better=True` rewards values above center
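A condensed version of `sigmoid_score` shows how the `higher_is_better` flag behaves. The body follows the function in `server/tasks/graders.py`, with the overflow branch simplified; the sample metric values are made up for illustration.

```python
import math


def sigmoid_score(value, center, steepness, higher_is_better=True):
    """Sigmoid score in [0, 1]; equals 0.5 when value == center."""
    if higher_is_better:
        x = steepness * (value - center)
    else:
        x = steepness * (center - value)
    try:
        return round(1.0 / (1.0 + math.exp(-x)), 4)
    except OverflowError:
        # math.exp(-x) only overflows for very negative x, i.e. a value far on
        # the wrong side of center, so the score saturates at 0.0.
        return 0.0


# Accuracy-style metric: values above the center score above 0.5.
print(sigmoid_score(0.85, center=0.85, steepness=15.0))  # 0.5
print(sigmoid_score(0.95, center=0.85, steepness=15.0) > 0.5)

# Loss-style metric: values below the center score above 0.5.
print(sigmoid_score(0.10, center=0.30, steepness=8.0, higher_is_better=False) > 0.5)
```

Naming the flag after the intent (`higher_is_better`) rather than the mechanism (`invert`) makes each call site state which direction is rewarded.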
31
+
32
+ ### 4. Task-Specific Validation
33
+ **Problem**: Generic validation rejected valid submissions (e.g., the check required a loop, but Task 5 is validly solved with a single forward/backward pass).
34
+
35
+ **Solution**:
36
+ - `is_valid_submission(code, stdout, exit_code, task_id)` now takes a `task_id` argument
37
+ - Task-specific validation rules
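A minimal sketch of a task-aware loop check, assuming the loop requirement is the only rule that differs per task. The names `LOOP_OPTIONAL_TASKS`, `has_training_loop`, and `passes_loop_rule` are hypothetical; the real `is_valid_submission` also inspects stdout and the exit code.

```python
import ast

# Hypothetical set of task ids whose solutions need no explicit loop.
LOOP_OPTIONAL_TASKS = {"task5"}


def has_training_loop(code):
    """True if the source contains at least one for/while statement."""
    tree = ast.parse(code)
    return any(isinstance(node, (ast.For, ast.While)) for node in ast.walk(tree))


def passes_loop_rule(code, task_id):
    """Apply the loop requirement only to tasks that need it."""
    if task_id in LOOP_OPTIONAL_TASKS:
        return True
    return has_training_loop(code)


# The snippet is only parsed, never executed, so undefined names are fine.
single_pass = "loss = model(x).sum()\nloss.backward()\n"
print(passes_loop_rule(single_pass, "task1"))  # False: a loop is required
print(passes_loop_rule(single_pass, "task5"))  # True: a single pass is accepted
```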
38
+
39
+ ## Task Details
40
+
41
+ ### Task 1: Broken Training Loop
42
+ - **Bugs**: `lr=10.0`, `step()` before `backward()`
43
+ - **Buggy score**: ~0.003
44
+ - **Fixed score**: ~0.74
45
+ - **Spike detection**: Penalizes unstable training (score capped at 0.2)
46
+
47
+ ### Task 2: NaN Loss
48
+ - **Bug**: `torch.log(pred)` when pred can be 0.0
49
+ - **Task change**: Raised the LR in the buggy code to 0.5 so that the NaN is actually triggered
50
+ - **Buggy score**: ~0.16 (has NaN values)
51
+ - **Fixed score**: ~0.83
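One of the fixes named in `GROUND_TRUTH_BUGS` is clamping the prediction before taking the log. The sketch below shows that idea in plain Python; the function names and `eps=1e-7` are illustrative assumptions. Note that `math.log(0.0)` raises `ValueError`, whereas the same expression in torch yields `-inf` or NaN instead of raising.

```python
import math


def naive_bce(pred, y):
    """Binary cross-entropy without clamping; breaks when pred is 0.0 or 1.0."""
    return -(y * math.log(pred) + (1 - y) * math.log(1 - pred))


def clamped_bce(pred, y, eps=1e-7):
    """Same loss with pred clamped into [eps, 1 - eps] before the log."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


print(math.isfinite(clamped_bce(0.0, 1.0)))  # clamped loss stays finite
try:
    naive_bce(0.0, 1.0)
except ValueError as exc:
    print("naive version failed:", exc)
```

In a real fix, a library loss that works on logits (for example `BCEWithLogitsLoss` in PyTorch) is usually preferable to manual clamping, since it avoids computing the saturated sigmoid at all.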
52
+
53
+ ### Task 3: Label Inversion
54
+ - **Bug**: `criterion(out, 1 - yb)` inverts labels
55
+ - **Buggy score**: ~0.34 (accuracy ~5%)
56
+ - **Fixed score**: ~0.80 (accuracy ~95%)
57
+
58
+ ### Task 4: Wrong Loss (Multi-label)
59
+ - **Bug**: Using `CrossEntropyLoss` instead of `BCEWithLogitsLoss`
60
+ - **Buggy score**: ~0.74 (F1 ~0.73)
61
+ - **Fixed score**: ~0.97 (F1 = 1.0)
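Why the choice of loss matters for multi-label output can be illustrated with plain Python. `CrossEntropyLoss` applies a softmax across classes, while `BCEWithLogitsLoss` applies an independent sigmoid per class. The logits below are made up, and the 0.5 threshold is an assumption about how predictions are turned into labels.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


logits = [2.0, 1.5, -1.0, -2.0, 0.3]  # made-up logits for 5 classes

softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(v) for v in logits]

# Softmax probabilities compete: they sum to 1, so at most one can exceed 0.5.
print(sum(p > 0.5 for p in softmax_probs))
# Independent sigmoids can flag several classes at once.
print(sum(p > 0.5 for p in sigmoid_probs))
```

This is consistent with the grader using `AVG_LABELS` as a secondary signal: a model trained as a single-label classifier tends to predict few labels per sample.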
62
+
63
+ ### Task 5: Frozen Backbone
64
+ - **Bug**: Backbone frozen but still passed to optimizer
65
+ - **Two valid fixes**:
66
+ 1. Unfreeze the backbone (the grader checks `BACKBONE_GRAD_NORM > 0.1`)
67
+ 2. Pass only the head parameters to the optimizer (`OPTIMIZER_PARAM_COUNT < 100000`)
68
+ - **Added**: `OPTIMIZER_PARAM_COUNT` metric for grading
69
+ - **Buggy score**: ~0.18
70
+ - **Fixed score**: ~0.88
71
+
72
+ ## Grading Structure
73
+
74
+ All graders follow a consistent pattern:
75
+ ```python
76
+ # Primary metric (50-60% weight)
77
+ primary_score = sigmoid_score(metric, center, steepness, higher_is_better) * weight
78
+
79
+ # Secondary metrics (30% weight)
80
+ secondary_score = ...
81
+
82
+ # Bonus conditions (10-20%)
83
+ bonus = ...
84
+
85
+ final_score = min(1.0, primary_score + secondary_score + bonus)
86
+ ```
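As a worked instance of this pattern, the sketch below combines the three components using the Task 1 weights and centers from `graders.py` (0.5 for accuracy, 0.3 for loss, 0.2 for the improvement bonus). The helper `combine_scores` and the metric values passed to it are hypothetical, not measured results.

```python
import math


def sigmoid_score(value, center, steepness, higher_is_better=True):
    """Condensed copy of the grader's sigmoid helper."""
    x = steepness * (value - center) if higher_is_better else steepness * (center - value)
    try:
        return round(1.0 / (1.0 + math.exp(-x)), 4)
    except OverflowError:
        return 0.0  # only reachable for values far on the wrong side of center


def combine_scores(val_acc, final_loss, improved):
    """Weighted sum in the style of the Task 1 grader."""
    primary = sigmoid_score(val_acc, center=0.85, steepness=15.0) * 0.5
    secondary = sigmoid_score(final_loss, center=0.3, steepness=8.0,
                              higher_is_better=False) * 0.3
    bonus = 0.2 if improved else 0.0
    return min(1.0, primary + secondary + bonus)


# Hypothetical metric values for a well-trained and a poorly trained model.
good = combine_scores(val_acc=0.95, final_loss=0.10, improved=True)
poor = combine_scores(val_acc=0.50, final_loss=2.00, improved=False)
print(round(good, 3), round(poor, 3))
```

Because each component is bounded by its weight, no single metric can push a submission to a full score on its own.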
87
+
88
+ ## Testing Results
89
+
90
+ | Task | Buggy Score | Fixed Score | Discrimination |
91
+ |------|-------------|-------------|----------------|
92
+ | 1 | 0.003 | 0.739 | ✅ Excellent |
93
+ | 2 | 0.157 | 0.827 | ✅ Excellent |
94
+ | 3 | 0.344 | 0.804 | ✅ Excellent |
95
+ | 4 | 0.735 | 0.966 | ✅ Good |
96
+ | 5 | 0.179 | 0.879 | ✅ Excellent |
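One way to read the Discrimination column is as the gap between the fixed and buggy scores. The source does not state the thresholds behind "Good" and "Excellent", so the sketch below only computes the gaps from the table values and does not try to reproduce the labels.

```python
# Scores copied from the table: task -> (buggy score, fixed score).
results = {
    1: (0.003, 0.739),
    2: (0.157, 0.827),
    3: (0.344, 0.804),
    4: (0.735, 0.966),
    5: (0.179, 0.879),
}

gaps = {task: round(fixed - buggy, 3) for task, (buggy, fixed) in results.items()}
for task, gap in gaps.items():
    print(f"Task {task}: gap = {gap}")

smallest = min(gaps, key=gaps.get)
print("Smallest gap: Task", smallest)
```

Task 4 has the smallest gap because its buggy code already scores about 0.74.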
97
+
98
+ ## Files Modified
99
+
100
+ - `server/tasks/task1_broken_loop.py` - Fixed dataset, learnable pattern
101
+ - `server/tasks/task2_nan_loss.py` - Increased LR to trigger NaN bug
102
+ - `server/tasks/task3_oom_leakage.py` - Redesigned with label inversion bug
103
+ - `server/tasks/task5_frozen_backbone.py` - Added OPTIMIZER_PARAM_COUNT metric
104
+ - `server/tasks/graders.py` - Complete rewrite with proper scoring logic
server/tasks/graders.py CHANGED
@@ -45,18 +45,24 @@ def parse_val_accs(stdout: str) -> list[float]:
45
 
46
  def parse_scalar(stdout: str, key: str) -> float | None:
47
  stdout = extract_metrics_block(stdout)
48
- match = re.search(rf"{key}:([-\d.]+)", stdout)
49
  return float(match.group(1)) if match else None
50
 
51
 
52
- def is_valid_submission(code: str, stdout: str, exit_code: int) -> tuple[bool, str]:
 
53
  if exit_code == 0:
54
- if "LOSSES:" not in stdout and "FINAL_LOSS:" not in stdout:
55
  return False, "No valid metrics output detected"
56
  if "LOSSES:" in stdout:
57
  losses = parse_losses(stdout)
58
  if len(losses) < 5:
59
  return False, "Fewer than 5 loss values parsed"
 
 
 
 
 
60
  try:
61
  tree = ast.parse(code)
62
  if not any(isinstance(node, (ast.For, ast.While)) for node in ast.walk(tree)):
@@ -66,88 +72,191 @@ def is_valid_submission(code: str, stdout: str, exit_code: int) -> tuple[bool, s
66
  return True, ""
67
 
68
 
69
- def sigmoid_reward(value: float, center: float, steepness: float, invert: bool = False) -> float:
 
 
 
 
 
 
 
 
 
 
 
 
70
  try:
71
- if invert:
72
  x = steepness * (value - center)
73
  else:
74
  x = steepness * (center - value)
75
  return round(1.0 / (1.0 + math.exp(-x)), 4)
76
  except OverflowError:
77
- return 0.0 if (invert and value > center) or (not invert and value < center) else 1.0
 
 
 
 
 
 
 
 
 
78
 
79
 
80
  def grade_task1(result: RunResult) -> tuple[float, dict]:
81
- valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code)
 
 
 
 
 
 
 
 
 
 
82
  if not valid:
83
  return 0.0, {"reason": reason}
84
 
85
  if result.timed_out:
86
  return 0.05, {"reason": "timed_out"}
87
  if result.exit_code != 0:
88
- return 0.0, {"reason": "crash"}
89
 
90
  losses = parse_losses(result.stdout)
91
  if not losses:
92
  return 0.1, {"reason": "no_losses_parsed"}
93
- if any(math.isnan(loss) or math.isinf(loss) for loss in losses):
94
- return 0.15, {"reason": "nan_inf_found"}
95
-
96
- final = losses[-1]
97
- base_score = sigmoid_reward(final, center=0.75, steepness=3.0, invert=True)
98
-
99
- bonus = 0.0
100
- half = len(losses) // 2
101
- if half > 0:
102
- first_half = sum(losses[:half]) / half
103
- second_half = sum(losses[half:]) / len(losses[half:])
104
- if second_half < 0.85 * first_half:
105
- bonus = 0.1
106
-
107
- final_score = min(1.0, base_score + bonus)
108
- breakdown = {"base_score": base_score, "monotonicity_bonus": bonus}
109
  return final_score, breakdown
110
 
111
 
112
  def grade_task2(result: RunResult) -> tuple[float, dict]:
113
- valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code)
 
 
 
 
 
 
 
 
 
114
  if not valid:
115
  return 0.0, {"reason": reason}
116
 
117
  if result.timed_out:
118
  return 0.05, {"reason": "timed_out"}
119
  if result.exit_code != 0:
120
- return 0.0, {"reason": "crash"}
121
 
122
  losses = parse_losses(result.stdout)
123
  if not losses or len(losses) < 30:
124
  return 0.1, {"reason": "too_few_losses"}
125
 
126
  nan_count = sum(1 for loss in losses if math.isnan(loss) or math.isinf(loss))
127
- if nan_count == len(losses):
128
- return 0.0, {"reason": "all_nans"}
129
-
130
  nan_ratio = nan_count / len(losses)
 
 
 
 
 
 
 
 
 
 
 
 
131
  finite_losses = [loss for loss in losses if not math.isnan(loss) and not math.isinf(loss)]
132
- final_finite_loss = finite_losses[-1] if finite_losses else float('inf')
133
-
134
- convergence_score = sigmoid_reward(final_finite_loss, center=0.5, steepness=4.0, invert=True)
135
- convergence_score *= (1.0 - nan_ratio)
136
-
137
- stability_bonus = 0.0
138
- if len(finite_losses) >= 20:
139
- tail = finite_losses[-20:]
140
- mean_tail = sum(tail) / len(tail)
141
- tail_variance = sum((x - mean_tail) ** 2 for x in tail) / len(tail)
142
- stability_bonus = sigmoid_reward(tail_variance, center=0.01, steepness=200.0, invert=True) * 0.1
143
-
144
- final_score = min(1.0, convergence_score + stability_bonus)
145
- breakdown = {"convergence_score": convergence_score, "nan_penalty": (1.0 - nan_ratio), "stability_bonus": stability_bonus, "nan_ratio": nan_ratio}
 
 
 
 
 
 
146
  return final_score, breakdown
147
 
148
 
149
  def grade_task3(result: RunResult) -> tuple[float, dict]:
150
- valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code)
 
 
 
 
 
 
 
 
 
 
151
  if not valid:
152
  return 0.0, {"reason": reason}
153
 
@@ -155,35 +264,66 @@ def grade_task3(result: RunResult) -> tuple[float, dict]:
155
  return 0.1, {"reason": "timed_out"}
156
 
157
  if result.exit_code != 0:
158
- if "out of memory" in result.stderr.lower():
159
  return 0.1, {"reason": "oom"}
160
- return 0.0, {"reason": "crash"}
161
 
162
  val_accs = parse_val_accs(result.stdout)
163
  final_loss_val = parse_scalar(result.stdout, "FINAL_LOSS")
164
 
 
 
165
  memory_score = 0.0
166
  if final_loss_val is not None:
167
- memory_score = sigmoid_reward(final_loss_val, center=50.0, steepness=0.05, invert=True) * 0.5
 
 
168
 
169
- leakage_score = 0.0
170
- early_acc = 0.0
 
171
  final_acc = 0.0
 
 
 
172
  if val_accs and len(val_accs) >= 2:
173
- early_acc = sum(val_accs[:2]) / 2.0
174
  final_acc = val_accs[-1]
175
 
176
- leak_p1 = sigmoid_reward(early_acc, center=0.75, steepness=20.0, invert=True) * 0.3
177
- leak_p2 = sigmoid_reward(final_acc, center=0.68, steepness=15.0, invert=False) * 0.7
178
- leakage_score = (leak_p1 + leak_p2) * 0.5
179
-
180
- final_score = min(1.0, memory_score + leakage_score)
181
- breakdown = {"memory_score": memory_score, "leakage_score": leakage_score, "early_acc": early_acc, "final_acc": final_acc}
 
 
 
 
 
 
 
 
 
 
 
 
 
 
182
  return final_score, breakdown
183
 
184
 
185
  def grade_task4(result: RunResult) -> tuple[float, dict]:
186
- valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code)
 
 
 
 
 
 
 
 
 
187
  if not valid:
188
  return 0.0, {"reason": reason}
189
 
@@ -191,31 +331,64 @@ def grade_task4(result: RunResult) -> tuple[float, dict]:
191
  return 0.1, {"reason": "timed_out"}
192
 
193
  if result.exit_code != 0:
194
- return 0.0, {"reason": "crash"}
195
 
196
  final_loss = parse_scalar(result.stdout, "FINAL_LOSS")
197
  avg_labels = parse_scalar(result.stdout, "AVG_LABELS")
198
  f1 = parse_scalar(result.stdout, "F1_SCORE")
199
 
200
- loss_score = 0.0
201
- if final_loss is not None:
202
- loss_score = sigmoid_reward(final_loss, center=0.5, steepness=4.0, invert=True) * 0.3
203
-
 
 
 
204
  labels_score = 0.0
205
  if avg_labels is not None:
206
- labels_score = sigmoid_reward(avg_labels, center=1.0, steepness=5.0, invert=False) * 0.3
207
-
208
- f1_s = 0.0
209
- if f1 is not None:
210
- f1_s = sigmoid_reward(f1, center=0.6, steepness=10.0, invert=False) * 0.4
 
 
 
 
211
 
212
- final_score = min(1.0, loss_score + labels_score + f1_s)
213
- breakdown = {"loss_score": loss_score, "labels_score": labels_score, "f1_score": f1_s}
 
 
 
 
 
 
 
 
 
 
 
 
214
  return final_score, breakdown
215
 
216
 
217
  def grade_task5(result: RunResult) -> tuple[float, dict]:
218
- valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
219
  if not valid:
220
  return 0.0, {"reason": reason}
221
 
@@ -223,21 +396,46 @@ def grade_task5(result: RunResult) -> tuple[float, dict]:
223
  return 0.1, {"reason": "timed_out"}
224
 
225
  if result.exit_code != 0:
226
- return 0.0, {"reason": "crash"}
227
 
228
  final_loss = parse_scalar(result.stdout, "FINAL_LOSS")
229
  grad_norm = parse_scalar(result.stdout, "BACKBONE_GRAD_NORM")
 
230
 
 
231
  loss_score = 0.0
232
  if final_loss is not None:
233
- loss_score = sigmoid_reward(final_loss, center=2.2, steepness=3.0, invert=True) * 0.5
234
-
235
- grad_score = 0.0
236
- if grad_norm is not None:
237
- grad_score = sigmoid_reward(grad_norm, center=0.001, steepness=1000.0, invert=False) * 0.5
238
-
239
- final_score = min(1.0, loss_score + grad_score)
240
- breakdown = {"loss_score": loss_score, "grad_score": grad_score}
 
241
  return final_score, breakdown
242
 
243
 
 
45
 
46
  def parse_scalar(stdout: str, key: str) -> float | None:
47
  stdout = extract_metrics_block(stdout)
48
+ match = re.search(rf"{key}:([-\d.eE+]+)", stdout)
49
  return float(match.group(1)) if match else None
50
 
51
 
52
+ def is_valid_submission(code: str, stdout: str, exit_code: int, task_id: str = "") -> tuple[bool, str]:
53
+ """Validate submission with task-specific rules."""
54
  if exit_code == 0:
55
+ if "LOSSES:" not in stdout and "FINAL_LOSS:" not in stdout and "VAL_ACCS:" not in stdout:
56
  return False, "No valid metrics output detected"
57
  if "LOSSES:" in stdout:
58
  losses = parse_losses(stdout)
59
  if len(losses) < 5:
60
  return False, "Fewer than 5 loss values parsed"
61
+
62
+ # Task 5 doesn't require a loop - it's a single forward/backward pass
63
+ if task_id == "task5":
64
+ return True, ""
65
+
66
  try:
67
  tree = ast.parse(code)
68
  if not any(isinstance(node, (ast.For, ast.While)) for node in ast.walk(tree)):
 
72
  return True, ""
73
 
74
 
75
+ def sigmoid_score(value: float, center: float, steepness: float, higher_is_better: bool = True) -> float:
76
+ """
77
+ Compute sigmoid-based score.
78
+
79
+ Args:
80
+ value: The metric value to score
81
+ center: The center point of the sigmoid (value at which score = 0.5)
82
+ steepness: How quickly the score transitions around the center
83
+ higher_is_better: If True, reward values > center. If False, reward values < center.
84
+
85
+ Returns:
86
+ Score between 0.0 and 1.0
87
+ """
88
  try:
89
+ if higher_is_better:
90
  x = steepness * (value - center)
91
  else:
92
  x = steepness * (center - value)
93
  return round(1.0 / (1.0 + math.exp(-x)), 4)
94
  except OverflowError:
95
+ if higher_is_better:
96
+ return 1.0 if value > center else 0.0
97
+ else:
98
+ return 1.0 if value < center else 0.0
99
+
100
+
101
+ # Keep old function for backwards compatibility but mark deprecated
102
+ def sigmoid_reward(value: float, center: float, steepness: float, invert: bool = False) -> float:
103
+ """Deprecated: Use sigmoid_score with higher_is_better parameter instead."""
104
+ return sigmoid_score(value, center, steepness, higher_is_better=invert)
105
 
106
 
107
  def grade_task1(result: RunResult) -> tuple[float, dict]:
108
+ """
109
+ Task 1: Broken Training Loop
110
+ Bugs: 1) lr=10.0 (too high), 2) step() before backward()
111
+
112
+ Grading criteria:
113
+ - Must have low final loss (<0.3) - indicates proper training
114
+ - Must have high validation accuracy (>0.85) - indicates learning
115
+ - Must show monotonic improvement - indicates proper gradient flow
116
+ - Must NOT have loss spikes - indicates stable training
117
+ """
118
+ valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code, "task1")
119
  if not valid:
120
  return 0.0, {"reason": reason}
121
 
122
  if result.timed_out:
123
  return 0.05, {"reason": "timed_out"}
124
  if result.exit_code != 0:
125
+ return 0.0, {"reason": "crash", "stderr": result.stderr[:500]}
126
 
127
  losses = parse_losses(result.stdout)
128
  if not losses:
129
  return 0.1, {"reason": "no_losses_parsed"}
130
+
131
+ # Check for NaN/Inf - indicates numerical instability
132
+ nan_count = sum(1 for loss in losses if math.isnan(loss) or math.isinf(loss))
133
+ if nan_count > 0:
134
+ return 0.15, {"reason": "nan_inf_found", "nan_count": nan_count}
135
+
136
+ val_acc = parse_scalar(result.stdout, "VAL_ACC")
137
+ if val_acc is None:
138
+ return 0.1, {"reason": "no_val_acc"}
139
+
140
+ final_loss = losses[-1]
141
+ initial_loss = losses[0]
142
+ max_loss = max(losses)
143
+
144
+ # Check for loss instability (spikes indicate LR too high)
145
+ # Healthy training shouldn't have losses > 5x initial loss
146
+ if max_loss > initial_loss * 5.0 or max_loss > 10.0:
147
+ return 0.2, {
148
+ "reason": "loss_unstable_spikes",
149
+ "max_loss": max_loss,
150
+ "final_loss": final_loss,
151
+ "val_acc": val_acc
152
+ }
153
+
154
+ # Check for loss explosion at end
155
+ if final_loss > 5.0:
156
+ return 0.15, {"reason": "loss_unstable", "final_loss": final_loss, "val_acc": val_acc}
157
+
158
+ # Primary: Validation accuracy (higher is better, target > 0.85)
159
+ acc_score = sigmoid_score(val_acc, center=0.85, steepness=15.0, higher_is_better=True) * 0.5
160
+
161
+ # Secondary: Final loss should be low (lower is better, target < 0.3)
162
+ loss_score = sigmoid_score(final_loss, center=0.3, steepness=8.0, higher_is_better=False) * 0.3
163
+
164
+ # Bonus: Monotonic improvement (loss should decrease over time)
165
+ monotonic_bonus = 0.0
166
+ if len(losses) >= 10:
167
+ first_quarter = sum(losses[:len(losses)//4]) / (len(losses)//4)
168
+ last_quarter = sum(losses[-(len(losses)//4):]) / (len(losses)//4)  # parenthesized: -len(losses)//4 floors toward -inf and can take one extra element
169
+ if last_quarter < first_quarter * 0.7: # At least 30% improvement
170
+ monotonic_bonus = 0.2
171
+
172
+ final_score = min(1.0, acc_score + loss_score + monotonic_bonus)
173
+ breakdown = {
174
+ "acc_score": round(acc_score, 4),
175
+ "loss_score": round(loss_score, 4),
176
+ "monotonic_bonus": monotonic_bonus,
177
+ "val_acc": val_acc,
178
+ "final_loss": final_loss,
179
+ "initial_loss": initial_loss,
180
+ "max_loss": max_loss
181
+ }
182
  return final_score, breakdown
183
 
184
 
185
  def grade_task2(result: RunResult) -> tuple[float, dict]:
186
+ """
187
+ Task 2: NaN Loss
188
+ Bug: torch.log(pred) when pred can be 0.0 after sigmoid
189
+
190
+ Grading criteria:
191
+ - Must have NO NaN/Inf losses - this is the primary test
192
+ - Must have good validation accuracy (>0.75)
193
+ - Must show loss convergence (<0.4)
194
+ """
195
+ valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code, "task2")
196
  if not valid:
197
  return 0.0, {"reason": reason}
198
 
199
  if result.timed_out:
200
  return 0.05, {"reason": "timed_out"}
201
  if result.exit_code != 0:
202
+ return 0.0, {"reason": "crash", "stderr": result.stderr[:500]}
203
 
204
  losses = parse_losses(result.stdout)
205
  if not losses or len(losses) < 30:
206
  return 0.1, {"reason": "too_few_losses"}
207
 
208
  nan_count = sum(1 for loss in losses if math.isnan(loss) or math.isinf(loss))
209
+
210
+ # Primary criterion: NO NaN/Inf allowed - this is the core bug being tested
 
211
  nan_ratio = nan_count / len(losses)
212
+ if nan_count > 0:
213
+ # Heavily penalize any NaN - this is THE bug we're testing
214
+ return max(0.05, 0.3 * (1.0 - nan_ratio)), {
215
+ "reason": "has_nans",
216
+ "nan_ratio": nan_ratio,
217
+ "nan_count": nan_count
218
+ }
219
+
220
+ val_acc = parse_scalar(result.stdout, "VAL_ACC")
221
+ if val_acc is None:
222
+ return 0.2, {"reason": "no_val_acc_but_no_nans"}
223
+
224
  finite_losses = [loss for loss in losses if not math.isnan(loss) and not math.isinf(loss)]
225
+ final_loss = finite_losses[-1] if finite_losses else float('inf')
226
+
227
+ # No NaN = base score of 0.4 (the bug is fixed)
228
+ base_score = 0.4
229
+
230
+ # Validation accuracy bonus (higher is better, target > 0.75)
231
+ acc_score = sigmoid_score(val_acc, center=0.75, steepness=12.0, higher_is_better=True) * 0.35
232
+
233
+ # Convergence bonus (lower is better, target < 0.4)
234
+ convergence_score = sigmoid_score(final_loss, center=0.4, steepness=6.0, higher_is_better=False) * 0.25
235
+
236
+ final_score = min(1.0, base_score + acc_score + convergence_score)
237
+ breakdown = {
238
+ "base_score": base_score,
239
+ "acc_score": round(acc_score, 4),
240
+ "convergence_score": round(convergence_score, 4),
241
+ "nan_count": nan_count,
242
+ "val_acc": val_acc,
243
+ "final_loss": final_loss
244
+ }
245
  return final_score, breakdown
246
 
247
 
248
  def grade_task3(result: RunResult) -> tuple[float, dict]:
249
+ """
250
+ Task 3: Memory Leak + Missing zero_grad
251
+ Bugs: 1) total_loss += loss retains graph (memory leak)
252
+ 2) Missing optimizer.zero_grad() causes gradient accumulation
253
+
254
+ Grading criteria:
255
+ - FINAL_LOSS should be reasonable (<20) - memory leak fixed
256
+ - VAL_ACC should be high (>0.8) - gradient accumulation fixed
257
+ - Learning trajectory should improve over epochs
258
+ """
259
+ valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code, "task3")
260
  if not valid:
261
  return 0.0, {"reason": reason}
262
 
 
264
  return 0.1, {"reason": "timed_out"}
265
 
266
  if result.exit_code != 0:
267
+ if "out of memory" in result.stderr.lower() or "oom" in result.stderr.lower():
268
  return 0.1, {"reason": "oom"}
269
+ return 0.0, {"reason": "crash", "stderr": result.stderr[:500]}
270
 
271
  val_accs = parse_val_accs(result.stdout)
272
  final_loss_val = parse_scalar(result.stdout, "FINAL_LOSS")
273
 
274
+ # Memory leak check: FINAL_LOSS should be reasonable
275
+ # With .item(), total_loss is sum of scalars (~12-20 for 20 epochs)
276
  memory_score = 0.0
277
  if final_loss_val is not None:
278
+ memory_score = sigmoid_score(final_loss_val, center=20.0, steepness=0.2, higher_is_better=False) * 0.35
279
+ else:
280
+ memory_score = 0.0
281
 
282
+ # Gradient accumulation check: accuracy should be high if training properly
283
+ # Without zero_grad(), gradients accumulate and training degrades
284
+ acc_score = 0.0
285
  final_acc = 0.0
286
+ early_acc = 0.0
287
+ trajectory_bonus = 0.0
288
+
289
  if val_accs and len(val_accs) >= 2:
290
+ early_acc = sum(val_accs[:3]) / min(3, len(val_accs))
291
  final_acc = val_accs[-1]
292
 
293
+ # Final accuracy is the main indicator of correct training
294
+ acc_score = sigmoid_score(final_acc, center=0.8, steepness=15.0, higher_is_better=True) * 0.45
295
+
296
+ # Learning trajectory: should improve over time
297
+ if len(val_accs) >= 5:
298
+ improvement = final_acc - early_acc
299
+ if improvement > 0.05:
300
+ trajectory_bonus = 0.1
301
+ elif improvement > 0.0:
302
+ trajectory_bonus = 0.05
303
+
304
+ final_score = min(1.0, memory_score + acc_score + trajectory_bonus)
305
+ breakdown = {
306
+ "memory_score": round(memory_score, 4),
307
+ "acc_score": round(acc_score, 4),
308
+ "trajectory_bonus": round(trajectory_bonus, 4),
309
+ "early_acc": round(early_acc, 4),
310
+ "final_acc": round(final_acc, 4),
311
+ "final_loss": final_loss_val
312
+ }
313
  return final_score, breakdown
314
 
315
 
316
  def grade_task4(result: RunResult) -> tuple[float, dict]:
317
+ """
318
+ Task 4: Wrong Loss (Multi-label Classification)
319
+ Bug: Using CrossEntropyLoss instead of BCEWithLogitsLoss for multi-label
320
+
321
+ Grading criteria:
322
+ - F1 score should be high (> 0.6) - primary metric
323
+ - avg_labels should be > 1.0 (proper multi-label output)
324
+ - Loss should converge
325
+ """
326
+ valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code, "task4")
327
  if not valid:
328
  return 0.0, {"reason": reason}
329
 
 
331
  return 0.1, {"reason": "timed_out"}
332
 
333
  if result.exit_code != 0:
334
+ return 0.0, {"reason": "crash", "stderr": result.stderr[:500]}
335
 
336
  final_loss = parse_scalar(result.stdout, "FINAL_LOSS")
337
  avg_labels = parse_scalar(result.stdout, "AVG_LABELS")
338
  f1 = parse_scalar(result.stdout, "F1_SCORE")
339
 
340
+ # F1 score - PRIMARY metric (higher is better, target > 0.6)
341
+ f1_score_val = 0.0
342
+ if f1 is not None:
343
+ f1_score_val = sigmoid_score(f1, center=0.6, steepness=10.0, higher_is_better=True) * 0.5
344
+
345
+ # Multi-label check: avg_labels should be > 1.0 (proper multi-label predictions)
346
+ # With 30% probability per class and 5 classes, expected avg ~1.5 labels/sample
347
  labels_score = 0.0
348
  if avg_labels is not None:
349
+ if avg_labels < 0.5:
350
+ # Way too few labels - likely single-label behavior
351
+ labels_score = 0.0
352
+ elif avg_labels >= 1.0:
353
+ # Good - multiple labels per sample
354
+ labels_score = 0.3
355
+ else:
356
+ # Partial credit
357
+ labels_score = sigmoid_score(avg_labels, center=1.0, steepness=5.0, higher_is_better=True) * 0.3
358
 
359
+ # Loss convergence (lower is better, target < 0.5)
360
+ loss_score = 0.0
361
+ if final_loss is not None:
362
+ loss_score = sigmoid_score(final_loss, center=0.5, steepness=4.0, higher_is_better=False) * 0.2
363
+
364
+ final_score = min(1.0, f1_score_val + labels_score + loss_score)
365
+ breakdown = {
366
+ "f1_score": round(f1_score_val, 4),
367
+ "labels_score": round(labels_score, 4),
368
+ "loss_score": round(loss_score, 4),
369
+ "avg_labels": avg_labels,
370
+ "f1": f1,
371
+ "final_loss": final_loss
372
+ }
373
  return final_score, breakdown
374
 
375
 
376
  def grade_task5(result: RunResult) -> tuple[float, dict]:
377
+ """
378
+ Task 5: Frozen Backbone with Optimizer Waste
379
+ Bug: Backbone is frozen but still passed to optimizer (wastes memory)
380
+
381
+ Valid fixes:
382
+ 1. Unfreeze backbone -> grad_norm > 0, same param count
383
+ 2. Only pass head params to optimizer -> grad_norm = 0, reduced param count
384
+
385
+ The buggy code has: grad_norm = 0, param_count = 530442 (full model)
386
+
387
+ Grading criteria:
388
+ - Either backbone has gradients (unfrozen), OR
389
+ - Optimizer param count is reduced (only head)
390
+ """
391
+ valid, reason = is_valid_submission(result.fixed_code, result.stdout, result.exit_code, "task5")
392
  if not valid:
393
  return 0.0, {"reason": reason}
394
 
 
396
  return 0.1, {"reason": "timed_out"}
397
 
398
  if result.exit_code != 0:
399
+ return 0.0, {"reason": "crash", "stderr": result.stderr[:500]}
400
 
401
  final_loss = parse_scalar(result.stdout, "FINAL_LOSS")
402
  grad_norm = parse_scalar(result.stdout, "BACKBONE_GRAD_NORM")
403
+ param_count = parse_scalar(result.stdout, "OPTIMIZER_PARAM_COUNT")
404
 
405
+ # Loss should be reasonable (10-class classification, CE loss)
406
  loss_score = 0.0
407
  if final_loss is not None:
408
+ loss_score = sigmoid_score(final_loss, center=2.5, steepness=2.0, higher_is_better=False) * 0.3
409
+
410
+ # The bug: frozen backbone (grad_norm=0) but full params in optimizer (param_count=530442)
411
+ # Fix 1: Unfreeze -> grad_norm > 0.1 (threshold checked below)
412
+ # Fix 2: Only head -> param_count < 100000 (head has ~5130 params)
413
+
414
+ fix_score = 0.0
415
+ fix_type = "none"
416
+
417
+ if grad_norm is not None and grad_norm > 0.1:
418
+ # Backbone is unfrozen and training
419
+ fix_score = 0.7
420
+ fix_type = "unfrozen"
421
+ elif param_count is not None and param_count < 100000:
422
+ # Only head params in optimizer (head has ~5130 params)
423
+ fix_score = 0.7
424
+ fix_type = "head_only"
425
+ elif grad_norm is not None and grad_norm == 0.0 and (param_count is None or param_count > 100000):
426
+ # Buggy state: frozen backbone but full params in optimizer
427
+ fix_score = 0.0
428
+ fix_type = "buggy"
429
+
430
+ final_score = min(1.0, loss_score + fix_score)
431
+ breakdown = {
432
+ "loss_score": round(loss_score, 4),
433
+ "fix_score": round(fix_score, 4),
434
+ "fix_type": fix_type,
435
+ "grad_norm": grad_norm,
436
+ "param_count": param_count,
437
+ "final_loss": final_loss
438
+ }
439
  return final_score, breakdown
440
 
441
 
server/tasks/task1_broken_loop.py CHANGED
@@ -1,29 +1,52 @@
1
  TASK_DESCRIPTION = """
2
  This 2-class linear classifier training loop has bugs preventing convergence.
3
- Fix it so that after 50 steps the loss is below 0.75 and decreasing.
4
- Model: nn.Linear(10, 2), dataset: random 2-class, 32 samples/batch.
5
  Print losses as: LOSSES:[val1, val2, ...]
 
6
  """
7
 
8
  BUGGY_CODE = """
9
  import torch
10
  import torch.nn as nn
 
 
11
  torch.manual_seed(0)
 
 
 
 
 
 
 
 
 
 
 
12
  model = nn.Linear(10, 2)
13
  optimizer = torch.optim.Adam(model.parameters(), lr=10.0) # BUG 1: lr too high
14
  criterion = nn.CrossEntropyLoss()
 
15
  losses = []
16
- for step in range(50):
17
- x = torch.randn(32, 10)
18
- y = torch.randint(0, 2, (32,))
19
- optimizer.zero_grad()
20
- logits = model(x)
21
- loss = criterion(logits, y)
22
- optimizer.step() # BUG 2: step before backward
23
- loss.backward() # BUG 3: backward after step
24
- losses.append(loss.item())
 
 
 
 
 
 
 
25
  print('##METRICS_START##')
26
  print('LOSSES:' + str(losses))
 
27
  print('##METRICS_END##')
28
  """
29
 
 
1
  TASK_DESCRIPTION = """
2
  This 2-class linear classifier training loop has bugs preventing convergence.
3
+ Fix it so that after 50 epochs the loss is below 0.5 and validation accuracy is above 0.80.
4
+ Model: nn.Linear(10, 2), dataset: fixed 2-class (160 train, 40 val samples).
5
  Print losses as: LOSSES:[val1, val2, ...]
6
+ Print validation accuracy as: VAL_ACC:X.XX
7
  """
8
 
9
  BUGGY_CODE = """
10
  import torch
11
  import torch.nn as nn
12
+ from torch.utils.data import TensorDataset, DataLoader
13
+
14
  torch.manual_seed(0)
15
+
16
+ # Generate fixed training and validation datasets with learnable pattern
17
+ # y = 1 if first feature > 0, else 0
18
+ X_train = torch.randn(160, 10)
19
+ y_train = (X_train[:, 0] > 0).long()
20
+ X_val = torch.randn(40, 10)
21
+ y_val = (X_val[:, 0] > 0).long()
22
+
23
+ train_dataset = TensorDataset(X_train, y_train)
24
+ train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
25
+
26
  model = nn.Linear(10, 2)
27
  optimizer = torch.optim.Adam(model.parameters(), lr=10.0) # BUG 1: lr too high
28
  criterion = nn.CrossEntropyLoss()
29
+
30
  losses = []
31
+ for epoch in range(50):
32
+ for x, y in train_loader:
33
+ optimizer.zero_grad()
34
+ logits = model(x)
35
+ loss = criterion(logits, y)
36
+ optimizer.step() # BUG 2: step before backward
37
+ loss.backward() # BUG 3: backward after step
38
+ losses.append(loss.item())
39
+
40
+ # Validation
41
+ model.eval()
42
+ with torch.no_grad():
43
+ val_logits = model(X_val)
44
+ val_preds = val_logits.argmax(dim=1)
45
+ val_acc = (val_preds == y_val).float().mean().item()
46
+
47
  print('##METRICS_START##')
48
  print('LOSSES:' + str(losses))
49
+ print('VAL_ACC:' + str(round(val_acc, 4)))
50
  print('##METRICS_END##')
51
  """
52
 
server/tasks/task2_nan_loss.py CHANGED
@@ -1,32 +1,58 @@
1
  TASK_DESCRIPTION = """
2
- This binary regression trainer produces NaN loss around step 15.
3
- Fix the numerical instability so loss stays finite for all 60 steps
4
- and the final loss is below 0.5.
5
  Print losses as: LOSSES:[val1, val2, ...]
 
6
  """
7
 
8
  BUGGY_CODE = """
9
  import torch
10
  import torch.nn as nn
 
 
11
  torch.manual_seed(42)
 
 
 
 
 
 
 
 
 
 
 
12
  model = nn.Linear(16, 1)
13
- optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
 
 
14
  losses = []
15
- for step in range(60):
16
- x = torch.randn(64, 16)
17
- y = torch.rand(64, 1)
18
- optimizer.zero_grad()
19
- pred = torch.sigmoid(model(x))
20
- # BUG: log(pred) can be -inf when pred rounds to 0.0
21
- loss = -torch.mean(y * torch.log(pred) + (1 - y) * torch.log(1 - pred))
22
- loss.backward()
23
- optimizer.step()
24
- losses.append(loss.item())
 
 
 
 
 
 
 
 
25
  print('##METRICS_START##')
26
  print('LOSSES:' + str(losses))
 
27
  print('##METRICS_END##')
28
  """
29
 
30
  GROUND_TRUTH_BUGS = [
31
  "torch.log(pred) when pred can be 0.0 after sigmoid — use F.binary_cross_entropy or clamp",
 
32
  ]
 
1
  TASK_DESCRIPTION = """
2
+ This binary classification trainer produces NaN loss after a few epochs.
3
+ Fix the numerical instability so loss stays finite for all 60 epochs
4
+ and the final loss is below 0.4 with validation accuracy above 0.75.
5
  Print losses as: LOSSES:[val1, val2, ...]
6
+ Print validation accuracy as: VAL_ACC:X.XX
7
  """
8
 
9
  BUGGY_CODE = """
10
  import torch
11
  import torch.nn as nn
12
+ from torch.utils.data import TensorDataset, DataLoader
13
+
14
  torch.manual_seed(42)
15
+
16
+ # Generate fixed training and validation datasets with learnable pattern
17
+ # y = 1 if sum of first 3 features > 0, else 0
18
+ X_train = torch.randn(320, 16)
19
+ y_train = (X_train[:, :3].sum(dim=1, keepdim=True) > 0).float()
20
+ X_val = torch.randn(80, 16)
21
+ y_val = (X_val[:, :3].sum(dim=1, keepdim=True) > 0).float()
22
+
23
+ train_dataset = TensorDataset(X_train, y_train)
24
+ train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
25
+
26
  model = nn.Linear(16, 1)
27
+ # BUG AMPLIFIER: Higher learning rate makes predictions more extreme, causing log(0)
28
+ optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
29
+
30
  losses = []
31
+ for epoch in range(60):
32
+ for x, y in train_loader:
33
+ optimizer.zero_grad()
34
+ pred = torch.sigmoid(model(x))
35
+ # BUG: log(pred) can be -inf when pred rounds to 0.0 due to extreme weights
36
+ # This happens because SGD with high LR pushes weights to extreme values
37
+ loss = -torch.mean(y * torch.log(pred) + (1 - y) * torch.log(1 - pred))
38
+ loss.backward()
39
+ optimizer.step()
40
+ losses.append(loss.item())
41
+
42
+ # Validation
43
+ model.eval()
44
+ with torch.no_grad():
45
+ val_pred = torch.sigmoid(model(X_val))
46
+ val_binary = (val_pred > 0.5).float()
47
+ val_acc = (val_binary == y_val).float().mean().item()
48
+
49
  print('##METRICS_START##')
50
  print('LOSSES:' + str(losses))
51
+ print('VAL_ACC:' + str(round(val_acc, 4)))
52
  print('##METRICS_END##')
53
  """
54
 
55
  GROUND_TRUTH_BUGS = [
56
  "torch.log(pred) when pred can be 0.0 after sigmoid — use F.binary_cross_entropy or clamp",
57
+ "High learning rate (0.5) causes extreme predictions",
58
  ]
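Reviewer note on the first ground-truth bug above: the sketch below is not part of the commit. It feeds one deliberately extreme logit (an assumed value of `100.0`) through the hand-written loss from `BUGGY_CODE` and through `nn.BCEWithLogitsLoss`, which works on raw logits and avoids taking `log(0)`. In float32, `sigmoid(100.0)` rounds to exactly `1.0`, so the hand-written form evaluates `0 * log(0)`, which is NaN.

```python
import torch
import torch.nn as nn

# One sample with an extreme logit and a positive label (illustrative values).
logits = torch.tensor([[100.0]])
y = torch.tensor([[1.0]])

# Hand-written loss, as in the buggy trainer: sigmoid saturates to 1.0,
# so (1 - y) * log(1 - pred) becomes 0 * (-inf) = NaN.
pred = torch.sigmoid(logits)
naive_loss = -torch.mean(y * torch.log(pred) + (1 - y) * torch.log(1 - pred))

# Numerically stable alternative: pass logits directly, no explicit sigmoid.
stable_loss = nn.BCEWithLogitsLoss()(logits, y)

print("naive loss is NaN:", torch.isnan(naive_loss).item())
print("stable loss is finite:", torch.isfinite(stable_loss).item())
```

Clamping `pred` to a range such as `[1e-7, 1 - 1e-7]` before the logarithms is the other fix named in `GROUND_TRUTH_BUGS`; the logits-based loss is shown here only because it needs no hand-picked epsilon.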
server/tasks/task3_oom_leakage.py CHANGED
@@ -1,50 +1,47 @@
1
  TASK_DESCRIPTION = """
2
- This trainer has TWO independent bugs:
3
- 1. A memory leak causing OOM crash before epoch 5 on CPU.
4
- 2. Data leakage inflating validation accuracy.
5
- Fix both. After 20 epochs: val_acc > 0.70, no OOM, no suspicious early accuracy spike.
6
  Print as: VAL_ACCS:[v1,v2,...] and FINAL_LOSS:X.XX
7
  """
8
 
9
  BUGGY_CODE = """
10
  import torch
11
  import torch.nn as nn
12
- from torch.utils.data import DataLoader, TensorDataset, random_split
13
 
14
  torch.manual_seed(42)
15
- X = torch.randn(1000, 20)
16
- y = (X[:, 0] > 0).float()
17
- # BUG 1: augmentation before split — val set gets augmented
18
- X = X + torch.randn_like(X) * 0.1
19
- train_ds, val_ds = random_split(TensorDataset(X, y), [800, 200])
 
20
  model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
21
- optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
22
  criterion = nn.BCEWithLogitsLoss()
23
- train_losses, val_accs = [], []
24
- total_loss = torch.tensor(0.0) # BUG 2: keeps computation graph alive
25
  for epoch in range(20):
26
  model.train()
27
- for xb, yb in DataLoader(train_ds, batch_size=32):
28
  optimizer.zero_grad()
29
  out = model(xb).squeeze()
30
- loss = criterion(out, yb)
 
31
  loss.backward()
32
  optimizer.step()
33
- total_loss = total_loss + loss # BUG 2: graph accumulates
34
  model.eval()
35
  with torch.no_grad():
36
- idx = val_ds.indices
37
- xv, yv = X[idx], y[idx]
38
- preds = (torch.sigmoid(model(xv)) > 0.5).float()
39
- acc = (preds == yv).float().mean().item()
40
  val_accs.append(round(acc, 4))
41
  print('##METRICS_START##')
42
  print('VAL_ACCS:' + str(val_accs))
43
- print('FINAL_LOSS:' + str(total_loss.item()))
44
  print('##METRICS_END##')
45
  """
46
 
47
  GROUND_TRUTH_BUGS = [
48
- "Augmentation applied before split; move after split, apply to train only",
49
- "total_loss += loss retains graph — use total_loss += loss.item()",
50
  ]
 
1
  TASK_DESCRIPTION = """
2
+ This binary classification trainer has a bug causing validation accuracy around 50%.
3
+ Fix the bug. After 20 epochs: VAL_ACC > 0.90, FINAL_LOSS < 0.3.
 
 
4
  Print as: VAL_ACCS:[v1,v2,...] and FINAL_LOSS:X.XX
5
  """
6
 
7
  BUGGY_CODE = """
8
  import torch
9
  import torch.nn as nn
10
+ from torch.utils.data import DataLoader, TensorDataset
11
 
12
  torch.manual_seed(42)
13
+ X_train = torch.randn(800, 20)
14
+ y_train = (X_train[:, 0] > 0).float()
15
+ X_val = torch.randn(200, 20)
16
+ y_val = (X_val[:, 0] > 0).float()
17
+
18
+ train_ds = TensorDataset(X_train, y_train)
19
  model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
20
+ optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
21
  criterion = nn.BCEWithLogitsLoss()
22
+ val_accs = []
23
+ losses = []
24
  for epoch in range(20):
25
  model.train()
26
+ for xb, yb in DataLoader(train_ds, batch_size=32, shuffle=True):
27
  optimizer.zero_grad()
28
  out = model(xb).squeeze()
29
+ # BUG: Wrong label transformation - should use yb directly
30
+ loss = criterion(out, 1 - yb)
31
  loss.backward()
32
  optimizer.step()
33
+ losses.append(loss.item())
34
  model.eval()
35
  with torch.no_grad():
36
+ preds = (torch.sigmoid(model(X_val).squeeze()) > 0.5).float()
37
+ acc = (preds == y_val).float().mean().item()
 
 
38
  val_accs.append(round(acc, 4))
39
  print('##METRICS_START##')
40
  print('VAL_ACCS:' + str(val_accs))
41
+ print('FINAL_LOSS:' + str(sum(losses[-25:])/25))
42
  print('##METRICS_END##')
43
  """
44
 
45
  GROUND_TRUTH_BUGS = [
46
+ "Label inversion: criterion(out, 1 - yb) inverts the labels; use criterion(out, yb)",
 
47
  ]
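Reviewer note on the label-inversion bug above: the sketch below is not part of the commit. It reuses the dataset, model, and optimizer settings from the new `BUGGY_CODE` and trains twice, once with `1 - yb` and once with `yb`. The helper name `train_and_eval` and its `invert_labels` flag are hypothetical, introduced only for the comparison. A model trained on inverted labels learns the opposite decision rule, so it should score no better than chance on the true validation labels.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_and_eval(invert_labels):
    """Train for 20 epochs and return final validation accuracy."""
    torch.manual_seed(42)
    X_train = torch.randn(800, 20)
    y_train = (X_train[:, 0] > 0).float()
    X_val = torch.randn(200, 20)
    y_val = (X_val[:, 0] > 0).float()

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    criterion = nn.BCEWithLogitsLoss()
    loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32, shuffle=True)

    for _ in range(20):
        model.train()
        for xb, yb in loader:
            optimizer.zero_grad()
            out = model(xb).squeeze(1)
            # The buggy trainer uses 1 - yb; the fix passes yb unchanged.
            target = 1 - yb if invert_labels else yb
            loss = criterion(out, target)
            loss.backward()
            optimizer.step()

    model.eval()
    with torch.no_grad():
        # A logit above 0 is equivalent to sigmoid(logit) above 0.5.
        preds = (model(X_val).squeeze(1) > 0).float()
        return (preds == y_val).float().mean().item()


buggy_acc = train_and_eval(invert_labels=True)
fixed_acc = train_and_eval(invert_labels=False)
print("inverted labels val_acc:", round(buggy_acc, 4))
print("correct labels val_acc:", round(fixed_acc, 4))
```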
server/tasks/task5_frozen_backbone.py CHANGED
@@ -1,8 +1,15 @@
1
  TASK_DESCRIPTION = """
2
  This is a standard transfer learning setup classifying 10 categories.
3
  The developer froze the backbone during testing, but forgot to unfreeze it while still passing its parameters to the optimizer.
4
- Fix the code so the backbone actually trains, or only pass the head parameters.
5
- The grader checks the gradient norm of the backbone from the first backward pass.
 
 
 
 
 
 
 
6
  """
7
 
8
  BUGGY_CODE = """
@@ -23,17 +30,20 @@ backbone = nn.Sequential(
23
  nn.ReLU()
24
  )
25
 
26
- # BUG: backbone is frozen, but passed to optimizer
27
  backbone.requires_grad_(False)
28
 
29
  head = nn.Linear(512, 10)
30
 
31
- # passing both backbone and head to optimizer even though backbone is frozen
32
  optimizer = torch.optim.Adam(
33
  list(backbone.parameters()) + list(head.parameters()), lr=0.001
34
  )
35
  criterion = nn.CrossEntropyLoss()
36
 
 
 
 
37
  losses = []
38
 
39
  # Take one step to check gradients
@@ -52,11 +62,9 @@ backbone_grad_norm = sum(
52
  optimizer.step()
53
  losses.append(loss.item())
54
 
55
- # Note: if backbone is properly frozen and only head is passed, backbone_grad_norm will be 0 but optimizer won't complain.
56
- # If backbone is unfrozen, backbone_grad_norm will be > 0.
57
- # The grader handles both valid solutions.
58
  print('##METRICS_START##')
59
  print('FINAL_LOSS:' + str(losses[-1]))
60
  print('BACKBONE_GRAD_NORM:' + str(backbone_grad_norm))
 
61
  print('##METRICS_END##')
62
  """
 
1
  TASK_DESCRIPTION = """
2
  This is a standard transfer learning setup classifying 10 categories.
3
  The developer froze the backbone during testing, but forgot to unfreeze it while still passing its parameters to the optimizer.
4
+ This wastes memory and computation as frozen params don't need optimizer state.
5
+
6
+ Fix the code so EITHER:
7
+ 1. The backbone actually trains (unfreeze it), OR
8
+ 2. Only pass trainable parameters to the optimizer
9
+
10
+ The grader checks:
11
+ - BACKBONE_GRAD_NORM: >0 means backbone is training, =0 means properly frozen
12
+ - OPTIMIZER_PARAM_COUNT: Should be reduced if only passing head params
13
  """
14
 
15
  BUGGY_CODE = """
 
30
  nn.ReLU()
31
  )
32
 
33
+ # BUG: backbone is frozen, but passed to optimizer (wastes memory/compute)
34
  backbone.requires_grad_(False)
35
 
36
  head = nn.Linear(512, 10)
37
 
38
+ # BUG: passing frozen backbone params to optimizer
39
  optimizer = torch.optim.Adam(
40
  list(backbone.parameters()) + list(head.parameters()), lr=0.001
41
  )
42
  criterion = nn.CrossEntropyLoss()
43
 
44
+ # Count params in optimizer (for grading)
45
+ optimizer_param_count = sum(p.numel() for g in optimizer.param_groups for p in g['params'])
46
+
47
  losses = []
48
 
49
  # Take one step to check gradients
 
62
  optimizer.step()
63
  losses.append(loss.item())
64
 
 
 
 
65
  print('##METRICS_START##')
66
  print('FINAL_LOSS:' + str(losses[-1]))
67
  print('BACKBONE_GRAD_NORM:' + str(backbone_grad_norm))
68
+ print('OPTIMIZER_PARAM_COUNT:' + str(optimizer_param_count))
69
  print('##METRICS_END##')
70
  """
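Reviewer note on option 2 from the task description ("only pass trainable parameters to the optimizer"): the sketch below is not part of the commit. The real backbone is elided in this hunk, so a hypothetical `nn.Linear(32, 512)` stand-in is used; only the 512-wide output and the `nn.Linear(512, 10)` head come from the diff. Filtering on `requires_grad` keeps frozen parameters out of the optimizer, and the `OPTIMIZER_PARAM_COUNT` expression from the commit then matches the head's parameter count.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in backbone; the real architecture is not shown in the diff.
backbone = nn.Sequential(nn.Linear(32, 512), nn.ReLU())
backbone.requires_grad_(False)  # frozen, as in the task

head = nn.Linear(512, 10)

# Keep only parameters that will actually receive gradients.
all_params = list(backbone.parameters()) + list(head.parameters())
trainable_params = [p for p in all_params if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=0.001)

# Same counting expression the commit adds for grading.
optimizer_param_count = sum(
    p.numel() for g in optimizer.param_groups for p in g['params']
)
head_param_count = sum(p.numel() for p in head.parameters())

print('OPTIMIZER_PARAM_COUNT:' + str(optimizer_param_count))
print('HEAD_PARAM_COUNT:' + str(head_param_count))
```

Filtering on `requires_grad` rather than passing `head.parameters()` directly keeps the optimizer construction correct if the backbone is later unfrozen, which is option 1 in the task description.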