======================================================================
  FRANKENSTEIN REALIGNMENT v2: FRESH START
======================================================================
  Raw merge:   /mnt/scratch/checkpoints/sentinel_prime_frankenstein.pt
  Steps:       5000
  Unfreeze at: step 500
  Batch:       8 × 6 = 48
  Seq len:     4096
  Phase 1 LR:  0.0001 → 3e-05 (warmup 100)
  Phase 2:     SGDR 5 cycles, expert_scale=0.3
  aux_loss:    0.05, z_loss: 0.002 (from step 0)
  EMA:         decay=0.9995, every 10 steps
  Eff tokens/step: 196,608
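The header's derived figures follow from simple arithmetic. This is a minimal sketch. It assumes `8 × 6` means 8 sequences per micro-batch times 6 gradient-accumulation steps (only the product matters here). It also assumes the ETA on the step lines is remaining steps × tokens/step ÷ current tok/s. That formula reproduces the logged 60.1h at step 0 and 38.5h at step 1400. The function names are illustrative and are not taken from the training script.

```python
# Sketch: reproduce the header's throughput figures.
# Assumption: "8 × 6" = micro-batch size × gradient-accumulation steps.


def tokens_per_step(micro_batch: int, accum_steps: int, seq_len: int) -> int:
    """Tokens consumed per optimizer step."""
    return micro_batch * accum_steps * seq_len


def eta_hours(step: int, total_steps: int, tok_per_step: int, tok_per_sec: float) -> float:
    """Remaining wall-clock hours if the current throughput holds (assumed formula)."""
    remaining_tokens = (total_steps - step) * tok_per_step
    return remaining_tokens / tok_per_sec / 3600.0


if __name__ == "__main__":
    tps = tokens_per_step(8, 6, 4096)
    print(f"tokens/step = {tps:,}")
    print(f"ETA at step 0:    {eta_hours(0, 5000, tps, 4544):.1f}h")
    print(f"ETA at step 1400: {eta_hours(1400, 5000, tps, 5112):.1f}h")
```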

[1/5] Building model...
  14.40B parameters

[2/5] Loading raw merge: /mnt/scratch/checkpoints/sentinel_prime_frankenstein.pt
  Merge loaded.
  Router aux_loss_weight = 0.05 (all layers)
  Merge meta: Sentinel Prime (Frankenstein Edition)
    attention_norms: NousResearch/Hermes-3-Llama-3.1-8B
    ffn_experts_0_2: Salesforce/xLAM-7b-fc-r
    ffn_experts_1_3: deepseek-ai/deepseek-coder-6.7b-instruct
    embeddings: SentinelBrain-14B-MoE-v0.1 (original)
    router: SentinelBrain-14B-MoE-v0.1 (original)
  VRAM after load: 28.8GB
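The router settings, together with the header's aux_loss/z_loss weights, can be illustrated with a small sketch. The log names neither loss formulation, so this sketch assumes three things:

* top-1 routing;
* aux_loss is the Switch Transformer load-balancing loss (N · Σ f_i · P_i);
* z_loss is the ST-MoE router z-loss (the mean squared logsumexp of the router logits).

It also assumes that the `CF=[...]` field on each step line is an expert's token share times the number of experts. That matches the logged numbers (32% × 4 ≈ 1.28). All names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch of the routing statistics and router losses.
# Assumptions, not confirmed by the log: top-1 routing, "CF" = token share
# divided by the uniform share (1 / num_experts), Switch-Transformer
# load-balancing loss for aux_loss, ST-MoE router z-loss for z_loss.


def load_fractions(assignments: list[int], num_experts: int) -> list[float]:
    """Fraction of tokens dispatched to each expert."""
    counts = Counter(assignments)
    return [counts.get(e, 0) / len(assignments) for e in range(num_experts)]


def capacity_factors(fractions: list[float]) -> list[float]:
    """Load relative to a perfectly uniform split (1.0 = balanced)."""
    return [f * len(fractions) for f in fractions]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def logsumexp(logits: list[float]) -> float:
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def router_losses(router_logits: list[list[float]],
                  aux_weight: float = 0.05,
                  z_weight: float = 0.002) -> tuple[float, float]:
    """Weighted (aux_loss, z_loss) for one batch of per-token router logits."""
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]
    top1 = [max(range(num_experts), key=p.__getitem__) for p in probs]
    f = load_fractions(top1, num_experts)  # share of tokens per expert
    mean_prob = [sum(p[e] for p in probs) / len(probs) for e in range(num_experts)]
    aux = num_experts * sum(fi * pi for fi, pi in zip(f, mean_prob))
    z = sum(logsumexp(row) ** 2 for row in router_logits) / len(router_logits)
    return aux_weight * aux, z_weight * z


if __name__ == "__main__":
    # 100 tokens split 32/18/32/18, roughly like the later step lines.
    fr = load_fractions([0] * 32 + [1] * 18 + [2] * 32 + [3] * 18, 4)
    print([round(c, 2) for c in capacity_factors(fr)])
```

With perfectly balanced routing the unweighted Switch-style loss equals 1.0, so the weighted aux term sits at 0.05 and only grows as routing skews.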

  Enabling gradient checkpointing...
  Gradient checkpointing enabled for 24 layers

[3/5] Progressive unfreezing setup...
  Froze 192 expert params. Trainable: 290/482

[4/5] Loading training data from /mnt/scratch/shards
  [train] 1710 shards, 16.48B tokens
  [val] 160 shards, 0.86B tokens

[5/5] Setting up optimizer (Phase 1)...
  Trainable: 5.75B / 14.40B
  Optimizer: AdamW (decay: 241, no-decay: 49)

  Initial evaluation...
  Initial val_loss=15.8210, val_ppl=7429962.9
  EMA initialized (482 params on CPU)
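This is a sketch of the weight EMA. It assumes decay=0.9995 is applied once per update and that an update runs every 10 optimizer steps, as the header states. Plain floats stand in for the CPU-side parameter copies. `WeightEMA` and `weight_on_init` are hypothetical names.

```python
# Sketch of a weight EMA updated every `every` optimizer steps.
# Floats stand in for tensors; a real version would update CPU copies in place.


class WeightEMA:
    def __init__(self, params: dict[str, float], decay: float = 0.9995, every: int = 10):
        self.decay = decay
        self.every = every
        self.shadow = dict(params)  # initialized from the loaded model

    def maybe_update(self, step: int, params: dict[str, float]) -> bool:
        """Blend current weights into the shadow copy on every `every`-th step."""
        if step % self.every != 0:
            return False
        for name, value in params.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1.0 - self.decay) * value
        return True


def weight_on_init(decay: float, num_updates: int) -> float:
    """Share of the EMA still contributed by the initial weights."""
    return decay ** num_updates
```

Under these assumptions a 5000-step run performs about 500 EMA updates. `weight_on_init(0.9995, 500)` is about 0.78, so roughly 78% of the EMA weights would still come from the starting checkpoint at the end of the run.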

======================================================================
  STARTING TRAINING
======================================================================
  Batch:       8 x 6 = 48 effective
  Tokens/step: 196,608
  VRAM:        28.9/206GB (14%)
  SGDR cycles: 5
    Cycle 0: steps 500-700 (T=200), peak=5.0e-05, ramp=30
    Cycle 1: steps 700-1100 (T=400), peak=4.0e-05, ramp=40
    Cycle 2: steps 1100-1900 (T=800), peak=3.0e-05, ramp=60
    Cycle 3: steps 1900-3500 (T=1600), peak=2.5e-05, ramp=80
    Cycle 4: steps 3500-5000 (T=1500), peak=2.0e-05, ramp=100
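The schedule sketched below reproduces the logged base-group `lr` values. It has three parts: a linear warmup, a cosine decay, and then warm-restart cycles that each start with a linear ramp. Both that shape and the floor `MIN_LR = 1.5e-5` are inferred from the printed values; neither is read from the training script. Treat this as a reconstruction under those assumptions. It matches, for example:

* step 0 (1.00e-06)
* step 300 (6.50e-05)
* step 500 (1.62e-05)
* step 700 (1.56e-05)
* step 1400 (2.64e-05)

```python
import math

# Reconstruction of the logged base-group LR (inferred, not from the script):
#   Phase 1: lr = 1e-4 * (step + 1) / 100 for step < 100,
#            then cosine decay 1e-4 -> 3e-5 over steps 100..500.
#   Phase 2: SGDR cycles from the table above; each ramps linearly from
#            MIN_LR to the cycle peak, then cosine-decays back to MIN_LR.
# MIN_LR is inferred from the logged values; the header does not print it.

PHASE1_PEAK, PHASE1_END_LR, WARMUP, UNFREEZE = 1e-4, 3e-5, 100, 500
MIN_LR = 1.5e-5
# (start, end, peak, ramp) per cycle, as printed in the log
CYCLES = [
    (500, 700, 5.0e-5, 30),
    (700, 1100, 4.0e-5, 40),
    (1100, 1900, 3.0e-5, 60),
    (1900, 3500, 2.5e-5, 80),
    (3500, 5000, 2.0e-5, 100),
]


def cosine(lo: float, hi: float, progress: float) -> float:
    """Cosine interpolation from hi (progress=0) down to lo (progress=1)."""
    return lo + 0.5 * (hi - lo) * (1.0 + math.cos(math.pi * progress))


def base_lr(step: int) -> float:
    if step < WARMUP:
        return PHASE1_PEAK * (step + 1) / WARMUP
    if step < UNFREEZE:
        progress = (step - WARMUP) / (UNFREEZE - WARMUP)
        return cosine(PHASE1_END_LR, PHASE1_PEAK, progress)
    for start, end, peak, ramp in CYCLES:
        if start <= step < end:
            k = step - start
            if k < ramp:
                return MIN_LR + (peak - MIN_LR) * (k + 1) / ramp
            return cosine(MIN_LR, peak, (k - ramp) / (end - start - ramp))
    return MIN_LR  # past the last cycle


if __name__ == "__main__":
    for s in (0, 100, 300, 500, 530, 700, 1000, 1400):
        print(f"step {s:5d}  lr {base_lr(s):.2e}")
```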
  step     0/5000 | loss 15.8368 | ppl 7548307.7 | lr 1.00e-06 | gnorm 12.75 | tok/s 4,544 | VRAM 67GB (32%) | ETA 60.1h [FROZEN] | [E0:26% E1:24% E2:25% E3:24%] CF=[1.03 0.98 1.02 0.97]
  step    10/5000 | loss 14.6608 | ppl 2328555.1 | lr 1.10e-05 | gnorm 13.44 | tok/s 6,258 | VRAM 67GB (32%) | ETA 43.5h [FROZEN] | [E0:26% E1:25% E2:25% E3:24%] CF=[1.04 0.98 1.01 0.96]
  step    20/5000 | loss 12.4423 | ppl 253288.5 | lr 2.10e-05 | gnorm 13.94 | tok/s 6,368 | VRAM 67GB (32%) | ETA 42.7h [FROZEN] | [E0:28% E1:24% E2:25% E3:23%] CF=[1.11 0.95 1.02 0.92]
  step    30/5000 | loss 10.6901 | ppl 43919.2 | lr 3.10e-05 | gnorm 5.16 | tok/s 6,412 | VRAM 67GB (32%) | ETA 42.3h [FROZEN] | [E0:30% E1:21% E2:26% E3:23%] CF=[1.19 0.84 1.06 0.92]
  step    40/5000 | loss 9.5284 | ppl 13745.1 | lr 4.10e-05 | gnorm 3.09 | tok/s 6,438 | VRAM 67GB (32%) | ETA 42.1h [FROZEN] | [E0:31% E1:19% E2:28% E3:21%] CF=[1.24 0.77 1.13 0.85]
  step    50/5000 | loss 8.8601 | ppl 7045.2 | lr 5.10e-05 | gnorm 3.36 | tok/s 6,512 | VRAM 67GB (32%) | ETA 41.5h [FROZEN] | [E0:30% E1:20% E2:29% E3:21%] CF=[1.18 0.81 1.17 0.84]
  step    60/5000 | loss 8.2434 | ppl 3802.4 | lr 6.10e-05 | gnorm 3.39 | tok/s 6,521 | VRAM 67GB (32%) | ETA 41.4h [FROZEN] | [E0:29% E1:20% E2:32% E3:18%] CF=[1.18 0.79 1.29 0.74]
  step    70/5000 | loss 8.0612 | ppl 3169.1 | lr 7.10e-05 | gnorm 2.05 | tok/s 6,528 | VRAM 67GB (32%) | ETA 41.2h [FROZEN] | [E0:31% E1:18% E2:33% E3:18%] CF=[1.24 0.72 1.32 0.72]
  step    80/5000 | loss 7.7648 | ppl 2356.2 | lr 8.10e-05 | gnorm 4.00 | tok/s 6,532 | VRAM 67GB (32%) | ETA 41.1h [FROZEN] | [E0:31% E1:18% E2:33% E3:18%] CF=[1.25 0.73 1.31 0.71]
  step    90/5000 | loss 7.6206 | ppl 2039.9 | lr 9.10e-05 | gnorm 2.97 | tok/s 6,533 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:31% E1:18% E2:32% E3:19%] CF=[1.25 0.72 1.29 0.74]
  step   100/5000 | loss 7.4868 | ppl 1784.4 | lr 1.00e-04 | gnorm 4.38 | tok/s 6,530 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:31% E1:18% E2:32% E3:19%] CF=[1.25 0.72 1.28 0.75]
  >> EVAL: val_loss=7.3564 ppl=1566.2  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 100, full state + optimizer)
  step   110/5000 | loss 7.3392 | ppl 1539.5 | lr 9.99e-05 | gnorm 2.39 | tok/s 6,524 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:31% E1:18% E2:32% E3:18%] CF=[1.25 0.73 1.28 0.74]
  step   120/5000 | loss 7.2058 | ppl 1347.2 | lr 9.96e-05 | gnorm 3.05 | tok/s 6,517 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.26 0.73 1.27 0.73]
  step   130/5000 | loss 7.1589 | ppl 1285.5 | lr 9.90e-05 | gnorm 3.62 | tok/s 6,507 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.26 0.73 1.27 0.74]
  step   140/5000 | loss 6.9785 | ppl 1073.3 | lr 9.83e-05 | gnorm 3.08 | tok/s 6,483 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:19%] CF=[1.26 0.72 1.27 0.74]
  step   150/5000 | loss 6.9154 | ppl 1007.7 | lr 9.73e-05 | gnorm 2.44 | tok/s 6,472 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:19%] CF=[1.27 0.72 1.27 0.74]
  step   160/5000 | loss 6.7309 | ppl 837.9 | lr 9.62e-05 | gnorm 2.78 | tok/s 6,443 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.27 0.73]
  step   170/5000 | loss 6.7939 | ppl 892.4 | lr 9.48e-05 | gnorm 3.16 | tok/s 6,437 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.27 0.74]
  step   180/5000 | loss 6.7680 | ppl 869.6 | lr 9.33e-05 | gnorm 3.95 | tok/s 6,420 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.27 0.74]
  step   190/5000 | loss 6.6892 | ppl 803.7 | lr 9.16e-05 | gnorm 1.92 | tok/s 6,430 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
  step   200/5000 | loss 6.5718 | ppl 714.6 | lr 8.97e-05 | gnorm 2.44 | tok/s 6,405 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
  >> EVAL: val_loss=6.4265 ppl=618.0  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 200, full state + optimizer)
  step   210/5000 | loss 6.3597 | ppl 578.1 | lr 8.77e-05 | gnorm 1.84 | tok/s 6,423 | VRAM 67GB (32%) | ETA 40.7h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
  step   220/5000 | loss 6.4999 | ppl 665.1 | lr 8.56e-05 | gnorm 3.22 | tok/s 6,417 | VRAM 67GB (32%) | ETA 40.7h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
  step   230/5000 | loss 6.3102 | ppl 550.1 | lr 8.33e-05 | gnorm 3.12 | tok/s 6,414 | VRAM 67GB (32%) | ETA 40.6h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.74]
  step   240/5000 | loss 6.2094 | ppl 497.4 | lr 8.09e-05 | gnorm 2.61 | tok/s 6,406 | VRAM 67GB (32%) | ETA 40.6h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
  step   250/5000 | loss 6.1780 | ppl 482.0 | lr 7.84e-05 | gnorm 2.05 | tok/s 6,428 | VRAM 67GB (32%) | ETA 40.4h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  >> MILESTONE step 250 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_250.pt
  step   260/5000 | loss 6.3846 | ppl 592.7 | lr 7.58e-05 | gnorm 1.93 | tok/s 6,426 | VRAM 67GB (32%) | ETA 40.3h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   270/5000 | loss 6.2122 | ppl 498.8 | lr 7.32e-05 | gnorm 1.94 | tok/s 6,427 | VRAM 67GB (32%) | ETA 40.2h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   280/5000 | loss 6.1144 | ppl 452.3 | lr 7.05e-05 | gnorm 1.58 | tok/s 6,427 | VRAM 67GB (32%) | ETA 40.1h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.73]
  step   290/5000 | loss 6.0846 | ppl 439.0 | lr 6.77e-05 | gnorm 1.45 | tok/s 6,428 | VRAM 67GB (32%) | ETA 40.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   300/5000 | loss 6.0901 | ppl 441.5 | lr 6.50e-05 | gnorm 1.68 | tok/s 6,420 | VRAM 67GB (32%) | ETA 40.0h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  >> EVAL: val_loss=6.1254 ppl=457.3  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 300, full state + optimizer)
  step   310/5000 | loss 6.1131 | ppl 451.8 | lr 6.23e-05 | gnorm 1.55 | tok/s 6,421 | VRAM 67GB (32%) | ETA 39.9h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   320/5000 | loss 6.0769 | ppl 435.7 | lr 5.95e-05 | gnorm 1.51 | tok/s 6,420 | VRAM 67GB (32%) | ETA 39.8h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   330/5000 | loss 6.0861 | ppl 439.7 | lr 5.68e-05 | gnorm 1.85 | tok/s 6,437 | VRAM 67GB (32%) | ETA 39.6h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   340/5000 | loss 5.9268 | ppl 375.0 | lr 5.42e-05 | gnorm 1.74 | tok/s 6,449 | VRAM 67GB (32%) | ETA 39.5h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   350/5000 | loss 5.9748 | ppl 393.4 | lr 5.16e-05 | gnorm 1.25 | tok/s 6,462 | VRAM 67GB (32%) | ETA 39.3h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   360/5000 | loss 6.0018 | ppl 404.2 | lr 4.91e-05 | gnorm 1.42 | tok/s 6,468 | VRAM 67GB (32%) | ETA 39.2h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   370/5000 | loss 5.9796 | ppl 395.3 | lr 4.67e-05 | gnorm 1.62 | tok/s 6,473 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   380/5000 | loss 6.0423 | ppl 420.8 | lr 4.44e-05 | gnorm 1.37 | tok/s 6,450 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   390/5000 | loss 5.9497 | ppl 383.6 | lr 4.23e-05 | gnorm 1.46 | tok/s 6,444 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   400/5000 | loss 5.9011 | ppl 365.4 | lr 4.03e-05 | gnorm 1.16 | tok/s 6,431 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  >> EVAL: val_loss=5.8662 ppl=352.9  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 400, full state + optimizer)
  step   410/5000 | loss 5.9157 | ppl 370.8 | lr 3.84e-05 | gnorm 1.04 | tok/s 6,428 | VRAM 67GB (32%) | ETA 39.0h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   420/5000 | loss 5.8687 | ppl 353.8 | lr 3.67e-05 | gnorm 1.44 | tok/s 6,425 | VRAM 67GB (32%) | ETA 38.9h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   430/5000 | loss 5.8683 | ppl 353.7 | lr 3.52e-05 | gnorm 1.38 | tok/s 6,449 | VRAM 67GB (32%) | ETA 38.7h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   440/5000 | loss 5.8722 | ppl 355.0 | lr 3.38e-05 | gnorm 1.61 | tok/s 6,443 | VRAM 67GB (32%) | ETA 38.7h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   450/5000 | loss 5.8595 | ppl 350.6 | lr 3.27e-05 | gnorm 1.23 | tok/s 6,453 | VRAM 67GB (32%) | ETA 38.5h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   460/5000 | loss 5.8211 | ppl 337.3 | lr 3.17e-05 | gnorm 1.14 | tok/s 6,455 | VRAM 67GB (32%) | ETA 38.4h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   470/5000 | loss 5.9151 | ppl 370.6 | lr 3.10e-05 | gnorm 1.26 | tok/s 6,460 | VRAM 67GB (32%) | ETA 38.3h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   480/5000 | loss 5.8819 | ppl 358.5 | lr 3.04e-05 | gnorm 1.43 | tok/s 6,462 | VRAM 67GB (32%) | ETA 38.2h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   490/5000 | loss 5.8794 | ppl 357.6 | lr 3.01e-05 | gnorm 1.21 | tok/s 6,471 | VRAM 67GB (32%) | ETA 38.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]

  >>> Step 500: UNFREEZING EXPERTS <<<
  Unfroze all 482 params.
  Pre-unfreeze checkpoint LOCKED: /mnt/scratch/checkpoints/frankenstein_v2_pre_unfreeze.pt
  Pre-unfreeze FULL checkpoint LOCKED: /mnt/scratch/checkpoints/frankenstein_v2_pre_unfreeze_full.pt
  Expert: 8.66B @ lr=4.85e-06
  Base:   5.75B @ lr=1.62e-05
  Spike guard: 3.0× EMA (tightened)
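After unfreezing, the log shows two learning rates. The expert LR is the base LR times expert_scale=0.3 from the header (1.62e-05 × 0.3 ≈ 4.85e-06). Below is a minimal sketch of that split. It also includes one possible reading of the spike guard: skip a step when the monitored value exceeds 3.0× its running EMA. The log does not say whether the guard watches loss or gradient norm, and the smoothing constant `beta` is an assumed value. All names are hypothetical.

```python
# Hypothetical sketch: per-group LRs after unfreezing, plus a simple spike guard.

EXPERT_SCALE = 0.3  # from the header's "expert_scale=0.3"


def group_lrs(base_lr: float, expert_scale: float = EXPERT_SCALE) -> dict[str, float]:
    """Base (non-expert) and expert learning rates for the current step."""
    return {"base": base_lr, "expert": base_lr * expert_scale}


class SpikeGuard:
    def __init__(self, factor: float = 3.0, beta: float = 0.99):
        self.factor = factor
        self.beta = beta  # EMA smoothing constant (assumed value)
        self.ema = None

    def is_spike(self, value: float) -> bool:
        """True if `value` should be skipped; otherwise fold it into the EMA."""
        if self.ema is None:
            self.ema = value
            return False
        if value > self.factor * self.ema:
            return True  # leave the EMA untouched so a spike can't raise the bar
        self.ema = self.beta * self.ema + (1.0 - self.beta) * value
        return False


if __name__ == "__main__":
    lrs = group_lrs(1.5e-5 + 3.5e-5 / 30)  # base LR logged at step 500
    print(f"base {lrs['base']:.2e}  expert {lrs['expert']:.2e}")
```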
  step   500/5000 | loss 5.9927 | ppl 400.5 | lr 1.62e-05 | gnorm 1.45 | tok/s 6,205 | VRAM 119GB (58%) | ETA 39.6h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  >> EVAL: val_loss=5.8451 ppl=345.5  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 500, full state + optimizer)
  >> MILESTONE step 500 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_500.pt
  step   510/5000 | loss 5.7749 | ppl 322.1 | lr 2.78e-05 | gnorm 1.70 | tok/s 6,068 | VRAM 119GB (58%) | ETA 40.4h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  >> MILESTONE step 510 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_510.pt
  step   520/5000 | loss 5.7268 | ppl 307.0 | lr 3.95e-05 | gnorm 1.78 | tok/s 5,934 | VRAM 119GB (58%) | ETA 41.2h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   530/5000 | loss 5.8261 | ppl 339.0 | lr 5.00e-05 | gnorm 2.03 | tok/s 5,809 | VRAM 119GB (58%) | ETA 42.0h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   540/5000 | loss 5.9697 | ppl 391.4 | lr 4.97e-05 | gnorm 1.78 | tok/s 5,690 | VRAM 119GB (58%) | ETA 42.8h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   550/5000 | loss 5.7818 | ppl 324.3 | lr 4.88e-05 | gnorm 1.50 | tok/s 5,790 | VRAM 119GB (58%) | ETA 42.0h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   560/5000 | loss 6.0418 | ppl 420.7 | lr 4.74e-05 | gnorm 1.41 | tok/s 5,791 | VRAM 119GB (58%) | ETA 41.9h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   570/5000 | loss 5.7851 | ppl 325.4 | lr 4.54e-05 | gnorm 1.52 | tok/s 5,781 | VRAM 119GB (58%) | ETA 41.8h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   580/5000 | loss 5.9342 | ppl 377.7 | lr 4.30e-05 | gnorm 1.90 | tok/s 5,782 | VRAM 119GB (58%) | ETA 41.8h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   590/5000 | loss 5.7303 | ppl 308.1 | lr 4.03e-05 | gnorm 1.63 | tok/s 5,779 | VRAM 119GB (58%) | ETA 41.7h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   600/5000 | loss 5.9043 | ppl 366.6 | lr 3.73e-05 | gnorm 1.30 | tok/s 5,779 | VRAM 119GB (58%) | ETA 41.6h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
  >> EVAL: val_loss=5.7113 ppl=302.3  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 600, full state + optimizer)
  step   610/5000 | loss 5.7160 | ppl 303.7 | lr 3.41e-05 | gnorm 1.27 | tok/s 5,780 | VRAM 119GB (58%) | ETA 41.5h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   620/5000 | loss 5.6105 | ppl 273.3 | lr 3.09e-05 | gnorm 1.41 | tok/s 5,793 | VRAM 119GB (58%) | ETA 41.3h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   630/5000 | loss 5.5641 | ppl 260.9 | lr 2.77e-05 | gnorm 1.41 | tok/s 5,795 | VRAM 119GB (58%) | ETA 41.2h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   640/5000 | loss 5.5701 | ppl 262.5 | lr 2.47e-05 | gnorm 1.95 | tok/s 5,798 | VRAM 119GB (58%) | ETA 41.1h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   650/5000 | loss 5.6485 | ppl 283.9 | lr 2.20e-05 | gnorm 1.25 | tok/s 5,799 | VRAM 119GB (58%) | ETA 41.0h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step   660/5000 | loss 5.7846 | ppl 325.3 | lr 1.96e-05 | gnorm 1.19 | tok/s 5,798 | VRAM 119GB (58%) | ETA 40.9h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   670/5000 | loss 5.8722 | ppl 355.0 | lr 1.76e-05 | gnorm 1.09 | tok/s 5,798 | VRAM 119GB (58%) | ETA 40.8h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   680/5000 | loss 5.7341 | ppl 309.2 | lr 1.62e-05 | gnorm 3.89 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.7h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   690/5000 | loss 5.8302 | ppl 340.4 | lr 1.53e-05 | gnorm 1.05 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.6h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   700/5000 | loss 5.7496 | ppl 314.1 | lr 1.56e-05 | gnorm 1.15 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.5h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.6660 ppl=288.9  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 700, full state + optimizer)
  step   710/5000 | loss 5.7459 | ppl 312.9 | lr 2.19e-05 | gnorm 1.55 | tok/s 5,796 | VRAM 119GB (58%) | ETA 40.4h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   720/5000 | loss 5.7757 | ppl 322.4 | lr 2.81e-05 | gnorm 1.59 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   730/5000 | loss 5.7632 | ppl 318.4 | lr 3.44e-05 | gnorm 1.78 | tok/s 5,795 | VRAM 119GB (58%) | ETA 40.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   740/5000 | loss 5.7689 | ppl 320.2 | lr 4.00e-05 | gnorm 1.72 | tok/s 5,795 | VRAM 119GB (58%) | ETA 40.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   750/5000 | loss 5.6952 | ppl 297.4 | lr 4.00e-05 | gnorm 1.71 | tok/s 5,793 | VRAM 119GB (58%) | ETA 40.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   760/5000 | loss 5.6071 | ppl 272.4 | lr 3.98e-05 | gnorm 1.37 | tok/s 5,793 | VRAM 119GB (58%) | ETA 40.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   770/5000 | loss 5.7852 | ppl 325.4 | lr 3.96e-05 | gnorm 1.85 | tok/s 5,791 | VRAM 119GB (58%) | ETA 39.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   780/5000 | loss 5.7631 | ppl 318.3 | lr 3.92e-05 | gnorm 1.78 | tok/s 5,792 | VRAM 119GB (58%) | ETA 39.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   790/5000 | loss 5.6911 | ppl 296.2 | lr 3.88e-05 | gnorm 1.59 | tok/s 5,790 | VRAM 119GB (58%) | ETA 39.7h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   800/5000 | loss 5.8214 | ppl 337.4 | lr 3.83e-05 | gnorm 1.97 | tok/s 5,791 | VRAM 119GB (58%) | ETA 39.6h C1 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.6552 ppl=285.8  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 800, full state + optimizer)
  step   810/5000 | loss 5.6561 | ppl 286.0 | lr 3.77e-05 | gnorm 1.14 | tok/s 5,792 | VRAM 119GB (58%) | ETA 39.5h C1 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   820/5000 | loss 5.8421 | ppl 344.5 | lr 3.71e-05 | gnorm 1.85 | tok/s 5,794 | VRAM 119GB (58%) | ETA 39.4h C1 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   830/5000 | loss 5.6260 | ppl 277.6 | lr 3.63e-05 | gnorm 1.80 | tok/s 5,793 | VRAM 119GB (58%) | ETA 39.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   840/5000 | loss 5.7293 | ppl 307.7 | lr 3.55e-05 | gnorm 1.38 | tok/s 5,793 | VRAM 119GB (58%) | ETA 39.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   850/5000 | loss 5.7658 | ppl 319.2 | lr 3.47e-05 | gnorm 1.41 | tok/s 5,793 | VRAM 119GB (58%) | ETA 39.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   860/5000 | loss 5.7830 | ppl 324.7 | lr 3.38e-05 | gnorm 1.22 | tok/s 5,795 | VRAM 119GB (58%) | ETA 39.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   870/5000 | loss 5.6170 | ppl 275.1 | lr 3.28e-05 | gnorm 1.48 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   880/5000 | loss 5.6471 | ppl 283.5 | lr 3.18e-05 | gnorm 1.59 | tok/s 5,796 | VRAM 119GB (58%) | ETA 38.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   890/5000 | loss 5.5396 | ppl 254.6 | lr 3.07e-05 | gnorm 1.61 | tok/s 5,796 | VRAM 119GB (58%) | ETA 38.7h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   900/5000 | loss 5.6467 | ppl 283.4 | lr 2.97e-05 | gnorm 1.34 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.6h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.5526 ppl=257.9  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 900, full state + optimizer)
  step   910/5000 | loss 5.6857 | ppl 294.6 | lr 2.86e-05 | gnorm 1.75 | tok/s 5,794 | VRAM 119GB (58%) | ETA 38.6h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   920/5000 | loss 5.7536 | ppl 315.3 | lr 2.75e-05 | gnorm 1.62 | tok/s 5,794 | VRAM 119GB (58%) | ETA 38.5h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   930/5000 | loss 5.6720 | ppl 290.6 | lr 2.64e-05 | gnorm 1.50 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.4h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   940/5000 | loss 5.6297 | ppl 278.6 | lr 2.53e-05 | gnorm 1.27 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   950/5000 | loss 5.6664 | ppl 289.0 | lr 2.43e-05 | gnorm 1.41 | tok/s 5,797 | VRAM 119GB (58%) | ETA 38.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   960/5000 | loss 5.6227 | ppl 276.6 | lr 2.32e-05 | gnorm 1.35 | tok/s 5,797 | VRAM 119GB (58%) | ETA 38.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   970/5000 | loss 5.5974 | ppl 269.7 | lr 2.22e-05 | gnorm 1.43 | tok/s 5,797 | VRAM 119GB (58%) | ETA 38.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   980/5000 | loss 5.5043 | ppl 245.8 | lr 2.13e-05 | gnorm 1.21 | tok/s 5,797 | VRAM 119GB (58%) | ETA 37.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step   990/5000 | loss 5.6915 | ppl 296.3 | lr 2.03e-05 | gnorm 1.30 | tok/s 5,798 | VRAM 119GB (58%) | ETA 37.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1000/5000 | loss 5.6183 | ppl 275.4 | lr 1.95e-05 | gnorm 1.14 | tok/s 5,799 | VRAM 119GB (58%) | ETA 37.7h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.5968 ppl=269.6 (best=5.5526)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1000, full state + optimizer)
  >> MILESTONE step 1000 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_1000.pt
  step  1010/5000 | loss 5.3524 | ppl 211.1 | lr 1.87e-05 | gnorm 1.30 | tok/s 5,652 | VRAM 119GB (58%) | ETA 38.6h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1020/5000 | loss 5.4961 | ppl 243.7 | lr 1.79e-05 | gnorm 1.25 | tok/s 5,507 | VRAM 119GB (58%) | ETA 39.5h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1030/5000 | loss 5.8695 | ppl 354.1 | lr 1.73e-05 | gnorm 1.33 | tok/s 5,368 | VRAM 119GB (58%) | ETA 40.4h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1040/5000 | loss 5.7147 | ppl 303.3 | lr 1.67e-05 | gnorm 1.63 | tok/s 5,236 | VRAM 119GB (58%) | ETA 41.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1050/5000 | loss 5.5944 | ppl 268.9 | lr 1.62e-05 | gnorm 1.27 | tok/s 5,111 | VRAM 119GB (58%) | ETA 42.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1060/5000 | loss 5.4514 | ppl 233.1 | lr 1.58e-05 | gnorm 1.82 | tok/s 5,106 | VRAM 119GB (58%) | ETA 42.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1070/5000 | loss 5.7486 | ppl 313.8 | lr 1.54e-05 | gnorm 3.20 | tok/s 5,107 | VRAM 119GB (58%) | ETA 42.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1080/5000 | loss 5.7669 | ppl 319.6 | lr 1.52e-05 | gnorm 1.33 | tok/s 5,105 | VRAM 119GB (58%) | ETA 41.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1090/5000 | loss 5.5144 | ppl 248.2 | lr 1.50e-05 | gnorm 1.87 | tok/s 5,105 | VRAM 119GB (58%) | ETA 41.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1100/5000 | loss 5.6030 | ppl 271.2 | lr 1.52e-05 | gnorm 1.29 | tok/s 5,105 | VRAM 119GB (58%) | ETA 41.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.5704 ppl=262.5 (best=5.5526)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1100, full state + optimizer)
  step  1110/5000 | loss 5.4279 | ppl 227.7 | lr 1.78e-05 | gnorm 1.45 | tok/s 5,230 | VRAM 119GB (58%) | ETA 40.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1120/5000 | loss 5.7553 | ppl 315.9 | lr 2.03e-05 | gnorm 1.13 | tok/s 5,362 | VRAM 119GB (58%) | ETA 39.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1130/5000 | loss 5.6633 | ppl 288.1 | lr 2.28e-05 | gnorm 1.34 | tok/s 5,501 | VRAM 119GB (58%) | ETA 38.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1140/5000 | loss 5.4698 | ppl 237.4 | lr 2.53e-05 | gnorm 1.23 | tok/s 5,646 | VRAM 119GB (58%) | ETA 37.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1150/5000 | loss 5.6485 | ppl 283.9 | lr 2.78e-05 | gnorm 1.44 | tok/s 5,799 | VRAM 119GB (58%) | ETA 36.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1160/5000 | loss 5.5408 | ppl 254.9 | lr 3.00e-05 | gnorm 1.52 | tok/s 5,798 | VRAM 119GB (58%) | ETA 36.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1170/5000 | loss 5.5947 | ppl 269.0 | lr 3.00e-05 | gnorm 1.47 | tok/s 5,798 | VRAM 119GB (58%) | ETA 36.1h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1180/5000 | loss 5.5501 | ppl 257.3 | lr 3.00e-05 | gnorm 1.77 | tok/s 5,797 | VRAM 119GB (58%) | ETA 36.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1190/5000 | loss 5.7825 | ppl 324.6 | lr 2.99e-05 | gnorm 1.48 | tok/s 5,798 | VRAM 119GB (58%) | ETA 35.9h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1200/5000 | loss 5.4681 | ppl 237.0 | lr 2.99e-05 | gnorm 1.66 | tok/s 5,798 | VRAM 119GB (58%) | ETA 35.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.5044 ppl=245.8  NEW BEST → saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1200, full state + optimizer)
  step  1210/5000 | loss 5.5756 | ppl 263.9 | lr 2.98e-05 | gnorm 1.53 | tok/s 5,769 | VRAM 119GB (58%) | ETA 35.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1220/5000 | loss 5.5656 | ppl 261.3 | lr 2.98e-05 | gnorm 1.57 | tok/s 5,617 | VRAM 119GB (58%) | ETA 36.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1230/5000 | loss 5.6694 | ppl 289.9 | lr 2.97e-05 | gnorm 1.30 | tok/s 5,474 | VRAM 119GB (58%) | ETA 37.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1240/5000 | loss 5.6058 | ppl 272.0 | lr 2.96e-05 | gnorm 1.31 | tok/s 5,337 | VRAM 119GB (58%) | ETA 38.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1250/5000 | loss 5.7426 | ppl 311.9 | lr 2.95e-05 | gnorm 1.55 | tok/s 5,207 | VRAM 119GB (58%) | ETA 39.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1260/5000 | loss 5.6019 | ppl 270.9 | lr 2.93e-05 | gnorm 1.35 | tok/s 5,105 | VRAM 119GB (58%) | ETA 40.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1270/5000 | loss 5.5776 | ppl 264.4 | lr 2.92e-05 | gnorm 1.38 | tok/s 5,105 | VRAM 119GB (58%) | ETA 39.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1280/5000 | loss 5.6610 | ppl 287.4 | lr 2.90e-05 | gnorm 1.45 | tok/s 5,105 | VRAM 119GB (58%) | ETA 39.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1290/5000 | loss 5.4238 | ppl 226.7 | lr 2.89e-05 | gnorm 1.77 | tok/s 5,106 | VRAM 119GB (58%) | ETA 39.7h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1300/5000 | loss 5.5268 | ppl 251.3 | lr 2.87e-05 | gnorm 1.48 | tok/s 5,107 | VRAM 119GB (58%) | ETA 39.6h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.5083 ppl=246.7 (best=5.5044)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1300, full state + optimizer)
  step  1310/5000 | loss 5.5639 | ppl 260.8 | lr 2.85e-05 | gnorm 1.43 | tok/s 5,115 | VRAM 119GB (58%) | ETA 39.4h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1320/5000 | loss 5.5524 | ppl 257.9 | lr 2.83e-05 | gnorm 1.22 | tok/s 5,116 | VRAM 119GB (58%) | ETA 39.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1330/5000 | loss 5.5659 | ppl 261.4 | lr 2.81e-05 | gnorm 1.39 | tok/s 5,117 | VRAM 119GB (58%) | ETA 39.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1340/5000 | loss 5.4635 | ppl 235.9 | lr 2.79e-05 | gnorm 1.48 | tok/s 5,119 | VRAM 119GB (58%) | ETA 39.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1350/5000 | loss 5.4772 | ppl 239.2 | lr 2.77e-05 | gnorm 1.53 | tok/s 5,119 | VRAM 119GB (58%) | ETA 38.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1360/5000 | loss 5.5250 | ppl 250.9 | lr 2.75e-05 | gnorm 1.62 | tok/s 5,114 | VRAM 119GB (58%) | ETA 38.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1370/5000 | loss 5.4610 | ppl 235.3 | lr 2.72e-05 | gnorm 1.45 | tok/s 5,115 | VRAM 119GB (58%) | ETA 38.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1380/5000 | loss 5.5452 | ppl 256.0 | lr 2.70e-05 | gnorm 1.59 | tok/s 5,114 | VRAM 119GB (58%) | ETA 38.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1390/5000 | loss 5.3807 | ppl 217.2 | lr 2.67e-05 | gnorm 1.44 | tok/s 5,113 | VRAM 119GB (58%) | ETA 38.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1400/5000 | loss 5.6586 | ppl 286.8 | lr 2.64e-05 | gnorm 1.30 | tok/s 5,112 | VRAM 119GB (58%) | ETA 38.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3969 ppl=220.7  NEW BEST  saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1400, full state + optimizer)
  step  1410/5000 | loss 5.6199 | ppl 275.9 | lr 2.62e-05 | gnorm 1.24 | tok/s 5,235 | VRAM 119GB (58%) | ETA 37.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1420/5000 | loss 5.4698 | ppl 237.4 | lr 2.59e-05 | gnorm 1.67 | tok/s 5,365 | VRAM 119GB (58%) | ETA 36.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1430/5000 | loss 5.6563 | ppl 286.1 | lr 2.56e-05 | gnorm 1.32 | tok/s 5,501 | VRAM 119GB (58%) | ETA 35.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1440/5000 | loss 5.6030 | ppl 271.2 | lr 2.53e-05 | gnorm 1.48 | tok/s 5,646 | VRAM 119GB (58%) | ETA 34.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1450/5000 | loss 5.5052 | ppl 246.0 | lr 2.50e-05 | gnorm 1.45 | tok/s 5,797 | VRAM 119GB (58%) | ETA 33.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1460/5000 | loss 5.1654 | ppl 175.1 | lr 2.47e-05 | gnorm 1.45 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1470/5000 | loss 5.6636 | ppl 288.2 | lr 2.44e-05 | gnorm 1.36 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1480/5000 | loss 5.7023 | ppl 299.6 | lr 2.41e-05 | gnorm 1.36 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1490/5000 | loss 5.5606 | ppl 260.0 | lr 2.38e-05 | gnorm 1.38 | tok/s 5,795 | VRAM 119GB (58%) | ETA 33.1h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1500/5000 | loss 5.5311 | ppl 252.4 | lr 2.35e-05 | gnorm 1.47 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4685 ppl=237.1 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1500, full state + optimizer)
  step  1510/5000 | loss 5.5400 | ppl 254.7 | lr 2.31e-05 | gnorm 1.58 | tok/s 5,797 | VRAM 119GB (58%) | ETA 32.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1520/5000 | loss 5.5024 | ppl 245.3 | lr 2.28e-05 | gnorm 1.47 | tok/s 5,796 | VRAM 119GB (58%) | ETA 32.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1530/5000 | loss 5.6098 | ppl 273.1 | lr 2.25e-05 | gnorm 1.27 | tok/s 5,796 | VRAM 119GB (58%) | ETA 32.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1540/5000 | loss 5.4081 | ppl 223.2 | lr 2.22e-05 | gnorm 1.32 | tok/s 5,794 | VRAM 119GB (58%) | ETA 32.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1550/5000 | loss 5.5051 | ppl 245.9 | lr 2.19e-05 | gnorm 1.47 | tok/s 5,793 | VRAM 119GB (58%) | ETA 32.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1560/5000 | loss 5.5016 | ppl 245.1 | lr 2.15e-05 | gnorm 1.36 | tok/s 5,792 | VRAM 119GB (58%) | ETA 32.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1570/5000 | loss 5.5568 | ppl 259.0 | lr 2.12e-05 | gnorm 1.36 | tok/s 5,792 | VRAM 119GB (58%) | ETA 32.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1580/5000 | loss 5.4976 | ppl 244.1 | lr 2.09e-05 | gnorm 1.52 | tok/s 5,792 | VRAM 119GB (58%) | ETA 32.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1590/5000 | loss 5.6104 | ppl 273.2 | lr 2.06e-05 | gnorm 1.48 | tok/s 5,793 | VRAM 119GB (58%) | ETA 32.1h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1600/5000 | loss 5.4706 | ppl 237.6 | lr 2.03e-05 | gnorm 1.39 | tok/s 5,794 | VRAM 119GB (58%) | ETA 32.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4872 ppl=241.6 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1600, full state + optimizer)
  step  1610/5000 | loss 5.3893 | ppl 219.0 | lr 2.00e-05 | gnorm 1.32 | tok/s 5,707 | VRAM 119GB (58%) | ETA 32.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1620/5000 | loss 5.5226 | ppl 250.3 | lr 1.97e-05 | gnorm 1.21 | tok/s 5,560 | VRAM 119GB (58%) | ETA 33.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1630/5000 | loss 5.4062 | ppl 222.8 | lr 1.94e-05 | gnorm 1.38 | tok/s 5,419 | VRAM 119GB (58%) | ETA 34.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1640/5000 | loss 5.6297 | ppl 278.6 | lr 1.91e-05 | gnorm 1.23 | tok/s 5,286 | VRAM 119GB (58%) | ETA 34.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1650/5000 | loss 5.5368 | ppl 253.9 | lr 1.88e-05 | gnorm 1.43 | tok/s 5,159 | VRAM 119GB (58%) | ETA 35.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1660/5000 | loss 5.4787 | ppl 239.5 | lr 1.86e-05 | gnorm 1.18 | tok/s 5,106 | VRAM 119GB (58%) | ETA 35.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1670/5000 | loss 5.5194 | ppl 249.5 | lr 1.83e-05 | gnorm 1.15 | tok/s 5,106 | VRAM 119GB (58%) | ETA 35.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1680/5000 | loss 5.6273 | ppl 277.9 | lr 1.80e-05 | gnorm 1.62 | tok/s 5,107 | VRAM 119GB (58%) | ETA 35.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1690/5000 | loss 5.3792 | ppl 216.8 | lr 1.78e-05 | gnorm 1.38 | tok/s 5,107 | VRAM 119GB (58%) | ETA 35.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1700/5000 | loss 5.4360 | ppl 229.5 | lr 1.75e-05 | gnorm 1.28 | tok/s 5,108 | VRAM 119GB (58%) | ETA 35.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4370 ppl=229.8 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1700, full state + optimizer)
  step  1710/5000 | loss 5.5455 | ppl 256.1 | lr 1.73e-05 | gnorm 1.70 | tok/s 5,234 | VRAM 119GB (58%) | ETA 34.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1720/5000 | loss 5.4692 | ppl 237.3 | lr 1.71e-05 | gnorm 1.39 | tok/s 5,365 | VRAM 119GB (58%) | ETA 33.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1730/5000 | loss 5.3266 | ppl 205.7 | lr 1.69e-05 | gnorm 1.27 | tok/s 5,504 | VRAM 119GB (58%) | ETA 32.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1740/5000 | loss 5.5804 | ppl 265.2 | lr 1.67e-05 | gnorm 1.09 | tok/s 5,650 | VRAM 119GB (58%) | ETA 31.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1750/5000 | loss 5.4715 | ppl 237.8 | lr 1.65e-05 | gnorm 1.24 | tok/s 5,803 | VRAM 119GB (58%) | ETA 30.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1760/5000 | loss 5.4621 | ppl 235.6 | lr 1.63e-05 | gnorm 1.22 | tok/s 5,801 | VRAM 119GB (58%) | ETA 30.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1770/5000 | loss 5.4925 | ppl 242.9 | lr 1.61e-05 | gnorm 1.34 | tok/s 5,801 | VRAM 119GB (58%) | ETA 30.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1780/5000 | loss 5.4751 | ppl 238.7 | lr 1.60e-05 | gnorm 1.16 | tok/s 5,798 | VRAM 119GB (58%) | ETA 30.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1790/5000 | loss 5.4281 | ppl 227.7 | lr 1.58e-05 | gnorm 1.20 | tok/s 5,796 | VRAM 119GB (58%) | ETA 30.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1800/5000 | loss 5.4589 | ppl 234.8 | lr 1.57e-05 | gnorm 1.45 | tok/s 5,796 | VRAM 119GB (58%) | ETA 30.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4080 ppl=223.2 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1800, full state + optimizer)
  step  1810/5000 | loss 5.4631 | ppl 235.8 | lr 1.55e-05 | gnorm 1.38 | tok/s 5,668 | VRAM 119GB (58%) | ETA 30.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1820/5000 | loss 5.4203 | ppl 225.9 | lr 1.54e-05 | gnorm 1.24 | tok/s 5,522 | VRAM 119GB (58%) | ETA 31.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1830/5000 | loss 5.3795 | ppl 216.9 | lr 1.53e-05 | gnorm 1.23 | tok/s 5,380 | VRAM 119GB (58%) | ETA 32.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1840/5000 | loss 5.5472 | ppl 256.5 | lr 1.52e-05 | gnorm 1.21 | tok/s 5,247 | VRAM 119GB (58%) | ETA 32.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1850/5000 | loss 5.3745 | ppl 215.8 | lr 1.52e-05 | gnorm 1.16 | tok/s 5,118 | VRAM 119GB (58%) | ETA 33.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1860/5000 | loss 5.6542 | ppl 285.5 | lr 1.51e-05 | gnorm 1.08 | tok/s 5,094 | VRAM 119GB (58%) | ETA 33.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1870/5000 | loss 5.6051 | ppl 271.8 | lr 1.51e-05 | gnorm 1.12 | tok/s 5,089 | VRAM 119GB (58%) | ETA 33.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1880/5000 | loss 5.5013 | ppl 245.0 | lr 1.50e-05 | gnorm 1.36 | tok/s 5,093 | VRAM 119GB (58%) | ETA 33.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1890/5000 | loss 5.3653 | ppl 213.9 | lr 1.50e-05 | gnorm 2.14 | tok/s 5,094 | VRAM 119GB (58%) | ETA 33.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1900/5000 | loss 5.4583 | ppl 234.7 | lr 1.51e-05 | gnorm 1.27 | tok/s 5,097 | VRAM 119GB (58%) | ETA 33.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4447 ppl=231.5 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1900, full state + optimizer)
  step  1910/5000 | loss 5.5599 | ppl 259.8 | lr 1.64e-05 | gnorm 1.20 | tok/s 5,154 | VRAM 119GB (58%) | ETA 32.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1920/5000 | loss 5.7572 | ppl 316.5 | lr 1.76e-05 | gnorm 1.41 | tok/s 5,158 | VRAM 119GB (58%) | ETA 32.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1930/5000 | loss 5.5125 | ppl 247.8 | lr 1.89e-05 | gnorm 1.48 | tok/s 5,158 | VRAM 119GB (58%) | ETA 32.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1940/5000 | loss 5.6845 | ppl 294.3 | lr 2.01e-05 | gnorm 1.52 | tok/s 5,160 | VRAM 119GB (58%) | ETA 32.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1950/5000 | loss 5.4571 | ppl 234.4 | lr 2.14e-05 | gnorm 1.20 | tok/s 5,160 | VRAM 119GB (58%) | ETA 32.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1960/5000 | loss 5.4911 | ppl 242.5 | lr 2.26e-05 | gnorm 1.27 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1970/5000 | loss 5.6330 | ppl 279.5 | lr 2.39e-05 | gnorm 1.71 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1980/5000 | loss 5.4440 | ppl 231.4 | lr 2.50e-05 | gnorm 1.47 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  1990/5000 | loss 5.4148 | ppl 224.7 | lr 2.50e-05 | gnorm 1.99 | tok/s 5,105 | VRAM 119GB (58%) | ETA 32.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2000/5000 | loss 5.4218 | ppl 226.3 | lr 2.50e-05 | gnorm 1.95 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4088 ppl=223.4 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2000, full state + optimizer)
  >> MILESTONE step 2000 LOCKED  /mnt/scratch/checkpoints/frankenstein_v2_milestone_2000.pt
  step  2010/5000 | loss 5.4288 | ppl 227.9 | lr 2.50e-05 | gnorm 1.47 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2020/5000 | loss 5.4680 | ppl 237.0 | lr 2.50e-05 | gnorm 1.41 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2030/5000 | loss 5.4027 | ppl 222.0 | lr 2.50e-05 | gnorm 1.52 | tok/s 5,107 | VRAM 119GB (58%) | ETA 31.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2040/5000 | loss 5.5772 | ppl 264.3 | lr 2.50e-05 | gnorm 1.46 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2050/5000 | loss 5.4623 | ppl 235.6 | lr 2.49e-05 | gnorm 1.43 | tok/s 5,107 | VRAM 119GB (58%) | ETA 31.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2060/5000 | loss 5.6075 | ppl 272.5 | lr 2.49e-05 | gnorm 1.66 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2070/5000 | loss 5.4214 | ppl 226.2 | lr 2.49e-05 | gnorm 1.64 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2080/5000 | loss 5.3698 | ppl 214.8 | lr 2.49e-05 | gnorm 1.35 | tok/s 5,105 | VRAM 119GB (58%) | ETA 31.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2090/5000 | loss 5.3795 | ppl 216.9 | lr 2.49e-05 | gnorm 1.62 | tok/s 5,104 | VRAM 119GB (58%) | ETA 31.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2100/5000 | loss 5.4180 | ppl 225.4 | lr 2.48e-05 | gnorm 1.45 | tok/s 5,102 | VRAM 119GB (58%) | ETA 31.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3973 ppl=220.8 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2100, full state + optimizer)
  step  2110/5000 | loss 5.4504 | ppl 232.9 | lr 2.48e-05 | gnorm 1.77 | tok/s 5,226 | VRAM 119GB (58%) | ETA 30.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2120/5000 | loss 5.3458 | ppl 209.7 | lr 2.48e-05 | gnorm 1.35 | tok/s 5,357 | VRAM 119GB (58%) | ETA 29.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2130/5000 | loss 5.2910 | ppl 198.5 | lr 2.48e-05 | gnorm 1.52 | tok/s 5,496 | VRAM 119GB (58%) | ETA 28.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2140/5000 | loss 5.4221 | ppl 226.4 | lr 2.47e-05 | gnorm 1.45 | tok/s 5,643 | VRAM 119GB (58%) | ETA 27.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2150/5000 | loss 5.5331 | ppl 252.9 | lr 2.47e-05 | gnorm 1.34 | tok/s 5,799 | VRAM 119GB (58%) | ETA 26.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2160/5000 | loss 5.4783 | ppl 239.4 | lr 2.47e-05 | gnorm 1.49 | tok/s 5,801 | VRAM 119GB (58%) | ETA 26.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2170/5000 | loss 5.4636 | ppl 235.9 | lr 2.46e-05 | gnorm 1.26 | tok/s 5,802 | VRAM 119GB (58%) | ETA 26.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2180/5000 | loss 5.5501 | ppl 257.3 | lr 2.46e-05 | gnorm 1.55 | tok/s 5,803 | VRAM 119GB (58%) | ETA 26.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2190/5000 | loss 5.4925 | ppl 242.9 | lr 2.45e-05 | gnorm 1.69 | tok/s 5,804 | VRAM 119GB (58%) | ETA 26.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2200/5000 | loss 5.4496 | ppl 232.7 | lr 2.45e-05 | gnorm 1.43 | tok/s 5,801 | VRAM 119GB (58%) | ETA 26.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4529 ppl=233.4 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2200, full state + optimizer)
  step  2210/5000 | loss 5.5127 | ppl 247.8 | lr 2.44e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 26.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2220/5000 | loss 5.6177 | ppl 275.3 | lr 2.44e-05 | gnorm 1.77 | tok/s 5,799 | VRAM 119GB (58%) | ETA 26.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2230/5000 | loss 5.5439 | ppl 255.7 | lr 2.43e-05 | gnorm 1.44 | tok/s 5,796 | VRAM 119GB (58%) | ETA 26.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2240/5000 | loss 5.5208 | ppl 249.8 | lr 2.43e-05 | gnorm 1.66 | tok/s 5,794 | VRAM 119GB (58%) | ETA 26.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2250/5000 | loss 5.3475 | ppl 210.1 | lr 2.42e-05 | gnorm 1.31 | tok/s 5,795 | VRAM 119GB (58%) | ETA 25.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2260/5000 | loss 5.4192 | ppl 225.7 | lr 2.42e-05 | gnorm 1.48 | tok/s 5,793 | VRAM 119GB (58%) | ETA 25.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2270/5000 | loss 5.3100 | ppl 202.4 | lr 2.41e-05 | gnorm 1.54 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2280/5000 | loss 5.5559 | ppl 258.8 | lr 2.41e-05 | gnorm 1.25 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2290/5000 | loss 5.3483 | ppl 210.3 | lr 2.40e-05 | gnorm 1.50 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2300/5000 | loss 5.3261 | ppl 205.6 | lr 2.39e-05 | gnorm 1.72 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4253 ppl=227.1 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2300, full state + optimizer)
  step  2310/5000 | loss 5.5541 | ppl 258.3 | lr 2.39e-05 | gnorm 1.62 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2320/5000 | loss 5.4427 | ppl 231.1 | lr 2.38e-05 | gnorm 1.51 | tok/s 5,795 | VRAM 119GB (58%) | ETA 25.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2330/5000 | loss 5.4080 | ppl 223.2 | lr 2.37e-05 | gnorm 1.51 | tok/s 5,795 | VRAM 119GB (58%) | ETA 25.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2340/5000 | loss 5.6911 | ppl 296.2 | lr 2.37e-05 | gnorm 1.88 | tok/s 5,786 | VRAM 119GB (58%) | ETA 25.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2350/5000 | loss 5.3312 | ppl 206.7 | lr 2.36e-05 | gnorm 1.43 | tok/s 5,779 | VRAM 119GB (58%) | ETA 25.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2360/5000 | loss 5.3899 | ppl 219.2 | lr 2.35e-05 | gnorm 1.48 | tok/s 5,772 | VRAM 119GB (58%) | ETA 25.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2370/5000 | loss 5.2940 | ppl 199.1 | lr 2.35e-05 | gnorm 1.30 | tok/s 5,765 | VRAM 119GB (58%) | ETA 24.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2380/5000 | loss 5.4886 | ppl 241.9 | lr 2.34e-05 | gnorm 1.41 | tok/s 5,760 | VRAM 119GB (58%) | ETA 24.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2390/5000 | loss 5.3869 | ppl 218.5 | lr 2.33e-05 | gnorm 1.41 | tok/s 5,759 | VRAM 119GB (58%) | ETA 24.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2400/5000 | loss 5.5158 | ppl 248.6 | lr 2.32e-05 | gnorm 1.42 | tok/s 5,767 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4569 ppl=234.4 (best=5.3969)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2400, full state + optimizer)
  step  2410/5000 | loss 5.4799 | ppl 239.8 | lr 2.32e-05 | gnorm 1.37 | tok/s 5,772 | VRAM 119GB (58%) | ETA 24.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2420/5000 | loss 5.4761 | ppl 238.9 | lr 2.31e-05 | gnorm 1.62 | tok/s 5,778 | VRAM 119GB (58%) | ETA 24.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2430/5000 | loss 5.3421 | ppl 208.9 | lr 2.30e-05 | gnorm 1.17 | tok/s 5,781 | VRAM 119GB (58%) | ETA 24.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2440/5000 | loss 5.4317 | ppl 228.5 | lr 2.29e-05 | gnorm 1.51 | tok/s 5,790 | VRAM 119GB (58%) | ETA 24.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2450/5000 | loss 5.5262 | ppl 251.2 | lr 2.28e-05 | gnorm 1.63 | tok/s 5,792 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2460/5000 | loss 5.3786 | ppl 216.7 | lr 2.27e-05 | gnorm 1.29 | tok/s 5,769 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2470/5000 | loss 5.1914 | ppl 179.7 | lr 2.26e-05 | gnorm 1.53 | tok/s 5,619 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2480/5000 | loss 5.4186 | ppl 225.6 | lr 2.26e-05 | gnorm 2.28 | tok/s 5,479 | VRAM 119GB (58%) | ETA 25.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2490/5000 | loss 5.3507 | ppl 210.8 | lr 2.25e-05 | gnorm 1.38 | tok/s 5,343 | VRAM 119GB (58%) | ETA 25.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2500/5000 | loss 5.5629 | ppl 260.6 | lr 2.24e-05 | gnorm 1.82 | tok/s 5,212 | VRAM 119GB (58%) | ETA 26.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3151 ppl=203.4  NEW BEST  saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2500, full state + optimizer)
  step  2510/5000 | loss 5.4881 | ppl 241.8 | lr 2.23e-05 | gnorm 1.70 | tok/s 5,233 | VRAM 119GB (58%) | ETA 26.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2520/5000 | loss 5.5318 | ppl 252.6 | lr 2.22e-05 | gnorm 1.57 | tok/s 5,365 | VRAM 119GB (58%) | ETA 25.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2530/5000 | loss 5.2904 | ppl 198.4 | lr 2.21e-05 | gnorm 1.32 | tok/s 5,503 | VRAM 119GB (58%) | ETA 24.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2540/5000 | loss 5.3733 | ppl 215.6 | lr 2.20e-05 | gnorm 1.50 | tok/s 5,649 | VRAM 119GB (58%) | ETA 23.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2550/5000 | loss 5.2482 | ppl 190.2 | lr 2.19e-05 | gnorm 1.33 | tok/s 5,802 | VRAM 119GB (58%) | ETA 23.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2560/5000 | loss 5.3167 | ppl 203.7 | lr 2.18e-05 | gnorm 1.31 | tok/s 5,801 | VRAM 119GB (58%) | ETA 23.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2570/5000 | loss 5.4672 | ppl 236.8 | lr 2.17e-05 | gnorm 1.70 | tok/s 5,801 | VRAM 119GB (58%) | ETA 22.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2580/5000 | loss 5.2871 | ppl 197.8 | lr 2.16e-05 | gnorm 1.20 | tok/s 5,800 | VRAM 119GB (58%) | ETA 22.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2590/5000 | loss 5.5072 | ppl 246.5 | lr 2.15e-05 | gnorm 1.24 | tok/s 5,800 | VRAM 119GB (58%) | ETA 22.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2600/5000 | loss 5.4751 | ppl 238.7 | lr 2.14e-05 | gnorm 1.52 | tok/s 5,800 | VRAM 119GB (58%) | ETA 22.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2780 ppl=196.0  NEW BEST  saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2600, full state + optimizer)
  step  2610/5000 | loss 5.4770 | ppl 239.1 | lr 2.13e-05 | gnorm 1.53 | tok/s 5,661 | VRAM 119GB (58%) | ETA 23.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2620/5000 | loss 5.5508 | ppl 257.5 | lr 2.12e-05 | gnorm 1.20 | tok/s 5,514 | VRAM 119GB (58%) | ETA 23.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2630/5000 | loss 5.5070 | ppl 246.4 | lr 2.11e-05 | gnorm 1.48 | tok/s 5,374 | VRAM 119GB (58%) | ETA 24.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2640/5000 | loss 5.3541 | ppl 211.5 | lr 2.10e-05 | gnorm 1.52 | tok/s 5,240 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step  2650/5000 | loss 5.2494 | ppl 190.4 | lr 2.09e-05 | gnorm 1.45 | tok/s 5,114 | VRAM 119GB (58%) | ETA 25.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2660/5000 | loss 5.4428 | ppl 231.1 | lr 2.08e-05 | gnorm 1.55 | tok/s 5,102 | VRAM 119GB (58%) | ETA 25.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2670/5000 | loss 5.3762 | ppl 216.2 | lr 2.07e-05 | gnorm 1.45 | tok/s 5,102 | VRAM 119GB (58%) | ETA 24.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2680/5000 | loss 5.4470 | ppl 232.1 | lr 2.06e-05 | gnorm 2.48 | tok/s 5,102 | VRAM 119GB (58%) | ETA 24.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2690/5000 | loss 5.5052 | ppl 246.0 | lr 2.05e-05 | gnorm 1.78 | tok/s 5,101 | VRAM 119GB (58%) | ETA 24.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2700/5000 | loss 5.3143 | ppl 203.2 | lr 2.04e-05 | gnorm 1.28 | tok/s 5,101 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3290 ppl=206.2 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2700, full state + optimizer)
  step  2710/5000 | loss 5.4231 | ppl 226.6 | lr 2.03e-05 | gnorm 1.38 | tok/s 5,143 | VRAM 119GB (58%) | ETA 24.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2720/5000 | loss 5.5591 | ppl 259.6 | lr 2.02e-05 | gnorm 1.53 | tok/s 5,144 | VRAM 119GB (58%) | ETA 24.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2730/5000 | loss 5.4649 | ppl 236.2 | lr 2.01e-05 | gnorm 1.17 | tok/s 5,145 | VRAM 119GB (58%) | ETA 24.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2740/5000 | loss 5.3355 | ppl 207.6 | lr 2.00e-05 | gnorm 1.28 | tok/s 5,147 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2750/5000 | loss 5.2728 | ppl 195.0 | lr 1.99e-05 | gnorm 1.85 | tok/s 5,148 | VRAM 119GB (58%) | ETA 23.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2760/5000 | loss 5.4497 | ppl 232.7 | lr 1.98e-05 | gnorm 1.47 | tok/s 5,105 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2770/5000 | loss 5.4827 | ppl 240.5 | lr 1.97e-05 | gnorm 1.34 | tok/s 5,104 | VRAM 119GB (58%) | ETA 23.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2780/5000 | loss 5.4144 | ppl 224.6 | lr 1.96e-05 | gnorm 1.55 | tok/s 5,104 | VRAM 119GB (58%) | ETA 23.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2790/5000 | loss 5.5006 | ppl 244.8 | lr 1.95e-05 | gnorm 1.24 | tok/s 5,103 | VRAM 119GB (58%) | ETA 23.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2800/5000 | loss 5.4444 | ppl 231.5 | lr 1.94e-05 | gnorm 1.48 | tok/s 5,100 | VRAM 119GB (58%) | ETA 23.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3721 ppl=215.3 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2800, full state + optimizer)
  step  2810/5000 | loss 5.3741 | ppl 215.7 | lr 1.93e-05 | gnorm 1.17 | tok/s 5,224 | VRAM 119GB (58%) | ETA 22.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2820/5000 | loss 5.6197 | ppl 275.8 | lr 1.92e-05 | gnorm 2.17 | tok/s 5,355 | VRAM 119GB (58%) | ETA 22.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2830/5000 | loss 5.3919 | ppl 219.6 | lr 1.91e-05 | gnorm 1.27 | tok/s 5,493 | VRAM 119GB (58%) | ETA 21.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2840/5000 | loss 5.3163 | ppl 203.6 | lr 1.90e-05 | gnorm 1.41 | tok/s 5,638 | VRAM 119GB (58%) | ETA 20.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2850/5000 | loss 5.4198 | ppl 225.8 | lr 1.89e-05 | gnorm 1.44 | tok/s 5,794 | VRAM 119GB (58%) | ETA 20.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2860/5000 | loss 5.2469 | ppl 190.0 | lr 1.88e-05 | gnorm 1.37 | tok/s 5,796 | VRAM 119GB (58%) | ETA 20.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2870/5000 | loss 5.5315 | ppl 252.5 | lr 1.87e-05 | gnorm 1.55 | tok/s 5,797 | VRAM 119GB (58%) | ETA 20.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2880/5000 | loss 5.4689 | ppl 237.2 | lr 1.86e-05 | gnorm 1.25 | tok/s 5,798 | VRAM 119GB (58%) | ETA 20.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2890/5000 | loss 5.2487 | ppl 190.3 | lr 1.85e-05 | gnorm 1.24 | tok/s 5,797 | VRAM 119GB (58%) | ETA 19.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2900/5000 | loss 5.3440 | ppl 209.3 | lr 1.84e-05 | gnorm 1.41 | tok/s 5,797 | VRAM 119GB (58%) | ETA 19.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3601 ppl=212.7 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2900, full state + optimizer)
  step  2910/5000 | loss 5.2046 | ppl 182.1 | lr 1.83e-05 | gnorm 1.49 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2920/5000 | loss 5.1114 | ppl 165.9 | lr 1.82e-05 | gnorm 1.38 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2930/5000 | loss 5.3641 | ppl 213.6 | lr 1.81e-05 | gnorm 1.46 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2940/5000 | loss 5.5069 | ppl 246.4 | lr 1.80e-05 | gnorm 1.45 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2950/5000 | loss 5.2951 | ppl 199.4 | lr 1.79e-05 | gnorm 1.23 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2960/5000 | loss 5.3876 | ppl 218.7 | lr 1.78e-05 | gnorm 1.40 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2970/5000 | loss 5.4361 | ppl 229.5 | lr 1.77e-05 | gnorm 1.27 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2980/5000 | loss 5.3690 | ppl 214.7 | lr 1.76e-05 | gnorm 1.24 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  2990/5000 | loss 5.3255 | ppl 205.5 | lr 1.75e-05 | gnorm 1.56 | tok/s 5,799 | VRAM 119GB (58%) | ETA 18.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3000/5000 | loss 5.3644 | ppl 213.7 | lr 1.74e-05 | gnorm 2.14 | tok/s 5,798 | VRAM 119GB (58%) | ETA 18.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2933 ppl=199.0 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3000, full state + optimizer)
  >> MILESTONE step 3000 LOCKED  /mnt/scratch/checkpoints/frankenstein_v2_milestone_3000.pt
  step  3010/5000 | loss 5.3409 | ppl 208.7 | lr 1.74e-05 | gnorm 1.38 | tok/s 5,799 | VRAM 119GB (58%) | ETA 18.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3020/5000 | loss 5.3559 | ppl 211.8 | lr 1.73e-05 | gnorm 1.37 | tok/s 5,799 | VRAM 119GB (58%) | ETA 18.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3030/5000 | loss 5.2910 | ppl 198.5 | lr 1.72e-05 | gnorm 1.23 | tok/s 5,801 | VRAM 119GB (58%) | ETA 18.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3040/5000 | loss 5.4000 | ppl 221.4 | lr 1.71e-05 | gnorm 1.20 | tok/s 5,802 | VRAM 119GB (58%) | ETA 18.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3050/5000 | loss 5.3424 | ppl 209.0 | lr 1.70e-05 | gnorm 1.34 | tok/s 5,803 | VRAM 119GB (58%) | ETA 18.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3060/5000 | loss 5.2612 | ppl 192.7 | lr 1.69e-05 | gnorm 1.39 | tok/s 5,803 | VRAM 119GB (58%) | ETA 18.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3070/5000 | loss 5.4217 | ppl 226.3 | lr 1.68e-05 | gnorm 1.41 | tok/s 5,803 | VRAM 119GB (58%) | ETA 18.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3080/5000 | loss 5.3010 | ppl 200.5 | lr 1.68e-05 | gnorm 1.29 | tok/s 5,802 | VRAM 119GB (58%) | ETA 18.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3090/5000 | loss 5.3627 | ppl 213.3 | lr 1.67e-05 | gnorm 1.23 | tok/s 5,802 | VRAM 119GB (58%) | ETA 18.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3100/5000 | loss 5.3135 | ppl 203.1 | lr 1.66e-05 | gnorm 1.12 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2799 ppl=196.4 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3100, full state + optimizer)
  step  3110/5000 | loss 5.4153 | ppl 224.8 | lr 1.65e-05 | gnorm 1.43 | tok/s 5,803 | VRAM 119GB (58%) | ETA 17.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3120/5000 | loss 5.3672 | ppl 214.3 | lr 1.65e-05 | gnorm 1.20 | tok/s 5,803 | VRAM 119GB (58%) | ETA 17.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3130/5000 | loss 5.4669 | ppl 236.7 | lr 1.64e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3140/5000 | loss 5.4039 | ppl 222.3 | lr 1.63e-05 | gnorm 1.24 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3150/5000 | loss 5.2486 | ppl 190.3 | lr 1.63e-05 | gnorm 1.16 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3160/5000 | loss 5.3892 | ppl 219.0 | lr 1.62e-05 | gnorm 1.38 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3170/5000 | loss 5.4131 | ppl 224.3 | lr 1.61e-05 | gnorm 1.31 | tok/s 5,801 | VRAM 119GB (58%) | ETA 17.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3180/5000 | loss 5.3139 | ppl 203.1 | lr 1.61e-05 | gnorm 1.34 | tok/s 5,804 | VRAM 119GB (58%) | ETA 17.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3190/5000 | loss 5.3839 | ppl 217.9 | lr 1.60e-05 | gnorm 1.19 | tok/s 5,804 | VRAM 119GB (58%) | ETA 17.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3200/5000 | loss 5.4903 | ppl 242.3 | lr 1.59e-05 | gnorm 1.38 | tok/s 5,804 | VRAM 119GB (58%) | ETA 16.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.4004 ppl=221.5 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3200, full state + optimizer)
  step  3210/5000 | loss 5.2450 | ppl 189.6 | lr 1.59e-05 | gnorm 1.24 | tok/s 5,804 | VRAM 119GB (58%) | ETA 16.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3220/5000 | loss 5.2446 | ppl 189.5 | lr 1.58e-05 | gnorm 1.69 | tok/s 5,803 | VRAM 119GB (58%) | ETA 16.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3230/5000 | loss 5.4406 | ppl 230.6 | lr 1.58e-05 | gnorm 1.26 | tok/s 5,801 | VRAM 119GB (58%) | ETA 16.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3240/5000 | loss 5.2381 | ppl 188.3 | lr 1.57e-05 | gnorm 1.20 | tok/s 5,800 | VRAM 119GB (58%) | ETA 16.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3250/5000 | loss 5.4181 | ppl 225.4 | lr 1.57e-05 | gnorm 1.26 | tok/s 5,798 | VRAM 119GB (58%) | ETA 16.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3260/5000 | loss 5.4927 | ppl 242.9 | lr 1.56e-05 | gnorm 1.42 | tok/s 5,797 | VRAM 119GB (58%) | ETA 16.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3270/5000 | loss 5.3934 | ppl 220.0 | lr 1.56e-05 | gnorm 1.38 | tok/s 5,797 | VRAM 119GB (58%) | ETA 16.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3280/5000 | loss 5.5290 | ppl 251.9 | lr 1.55e-05 | gnorm 1.24 | tok/s 5,798 | VRAM 119GB (58%) | ETA 16.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3290/5000 | loss 5.5065 | ppl 246.3 | lr 1.55e-05 | gnorm 1.44 | tok/s 5,798 | VRAM 119GB (58%) | ETA 16.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3300/5000 | loss 5.4718 | ppl 237.9 | lr 1.54e-05 | gnorm 1.27 | tok/s 5,799 | VRAM 119GB (58%) | ETA 16.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3304 ppl=206.5 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3300, full state + optimizer)
  step  3310/5000 | loss 5.5244 | ppl 250.7 | lr 1.54e-05 | gnorm 1.25 | tok/s 5,649 | VRAM 119GB (58%) | ETA 16.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3320/5000 | loss 5.3519 | ppl 211.0 | lr 1.53e-05 | gnorm 1.36 | tok/s 5,504 | VRAM 119GB (58%) | ETA 16.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3330/5000 | loss 5.3470 | ppl 210.0 | lr 1.53e-05 | gnorm 1.41 | tok/s 5,367 | VRAM 119GB (58%) | ETA 17.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3340/5000 | loss 5.4374 | ppl 229.8 | lr 1.53e-05 | gnorm 1.27 | tok/s 5,235 | VRAM 119GB (58%) | ETA 17.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3350/5000 | loss 5.2242 | ppl 185.7 | lr 1.52e-05 | gnorm 1.44 | tok/s 5,109 | VRAM 119GB (58%) | ETA 17.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3360/5000 | loss 5.2989 | ppl 200.1 | lr 1.52e-05 | gnorm 1.40 | tok/s 5,107 | VRAM 119GB (58%) | ETA 17.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3370/5000 | loss 5.4252 | ppl 227.0 | lr 1.52e-05 | gnorm 1.23 | tok/s 5,106 | VRAM 119GB (58%) | ETA 17.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3380/5000 | loss 5.3276 | ppl 205.9 | lr 1.52e-05 | gnorm 1.49 | tok/s 5,106 | VRAM 119GB (58%) | ETA 17.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3390/5000 | loss 5.3532 | ppl 211.3 | lr 1.51e-05 | gnorm 1.36 | tok/s 5,107 | VRAM 119GB (58%) | ETA 17.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3400/5000 | loss 5.3068 | ppl 201.7 | lr 1.51e-05 | gnorm 1.52 | tok/s 5,107 | VRAM 119GB (58%) | ETA 17.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3810 ppl=217.2 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3400, full state + optimizer)
  step  3410/5000 | loss 5.3361 | ppl 207.7 | lr 1.51e-05 | gnorm 1.18 | tok/s 5,232 | VRAM 119GB (58%) | ETA 16.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3420/5000 | loss 5.3199 | ppl 204.4 | lr 1.51e-05 | gnorm 1.35 | tok/s 5,364 | VRAM 119GB (58%) | ETA 16.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3430/5000 | loss 5.3233 | ppl 205.1 | lr 1.51e-05 | gnorm 1.18 | tok/s 5,502 | VRAM 119GB (58%) | ETA 15.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3440/5000 | loss 5.3585 | ppl 212.4 | lr 1.50e-05 | gnorm 1.52 | tok/s 5,647 | VRAM 119GB (58%) | ETA 15.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3450/5000 | loss 5.3438 | ppl 209.3 | lr 1.50e-05 | gnorm 1.23 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3460/5000 | loss 5.3168 | ppl 203.7 | lr 1.50e-05 | gnorm 1.30 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3470/5000 | loss 5.2030 | ppl 181.8 | lr 1.50e-05 | gnorm 1.28 | tok/s 5,802 | VRAM 119GB (58%) | ETA 14.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3480/5000 | loss 5.3649 | ppl 213.8 | lr 1.50e-05 | gnorm 1.52 | tok/s 5,802 | VRAM 119GB (58%) | ETA 14.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3490/5000 | loss 5.3546 | ppl 211.6 | lr 1.50e-05 | gnorm 1.18 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3500/5000 | loss 5.3840 | ppl 217.9 | lr 1.50e-05 | gnorm 1.26 | tok/s 5,802 | VRAM 119GB (58%) | ETA 14.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3131 ppl=203.0 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3500, full state + optimizer)
  step  3510/5000 | loss 5.4281 | ppl 227.7 | lr 1.56e-05 | gnorm 1.52 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3520/5000 | loss 5.3775 | ppl 216.5 | lr 1.61e-05 | gnorm 1.29 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3530/5000 | loss 5.4500 | ppl 232.8 | lr 1.66e-05 | gnorm 1.25 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3540/5000 | loss 5.2171 | ppl 184.4 | lr 1.71e-05 | gnorm 1.27 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3550/5000 | loss 5.3167 | ppl 203.7 | lr 1.75e-05 | gnorm 1.41 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3560/5000 | loss 5.4094 | ppl 223.5 | lr 1.81e-05 | gnorm 1.34 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3570/5000 | loss 5.2578 | ppl 192.1 | lr 1.86e-05 | gnorm 1.38 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3580/5000 | loss 5.3372 | ppl 207.9 | lr 1.91e-05 | gnorm 1.16 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3590/5000 | loss 5.2401 | ppl 188.7 | lr 1.96e-05 | gnorm 1.39 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3600/5000 | loss 5.4146 | ppl 224.7 | lr 2.00e-05 | gnorm 1.57 | tok/s 5,799 | VRAM 119GB (58%) | ETA 13.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3594 ppl=212.6 (best=5.2780)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3600, full state + optimizer)
  step  3610/5000 | loss 5.3050 | ppl 201.3 | lr 2.00e-05 | gnorm 1.45 | tok/s 5,799 | VRAM 119GB (58%) | ETA 13.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3620/5000 | loss 5.5146 | ppl 248.3 | lr 2.00e-05 | gnorm 1.53 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3630/5000 | loss 5.4447 | ppl 231.5 | lr 2.00e-05 | gnorm 1.62 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3640/5000 | loss 5.3102 | ppl 202.4 | lr 2.00e-05 | gnorm 1.59 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3650/5000 | loss 5.5038 | ppl 245.6 | lr 2.00e-05 | gnorm 1.31 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3660/5000 | loss 5.4059 | ppl 222.7 | lr 2.00e-05 | gnorm 1.48 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3670/5000 | loss 5.3952 | ppl 220.3 | lr 2.00e-05 | gnorm 1.60 | tok/s 5,798 | VRAM 119GB (58%) | ETA 12.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3680/5000 | loss 5.3411 | ppl 208.7 | lr 2.00e-05 | gnorm 1.50 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3690/5000 | loss 5.4777 | ppl 239.3 | lr 1.99e-05 | gnorm 1.41 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3700/5000 | loss 5.4171 | ppl 225.2 | lr 1.99e-05 | gnorm 1.80 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.1359 ppl=170.0  NEW BEST  saved (+ EMA + full optimizer)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3700, full state + optimizer)
  step  3710/5000 | loss 5.3428 | ppl 209.1 | lr 1.99e-05 | gnorm 1.38 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3720/5000 | loss 5.3910 | ppl 219.4 | lr 1.99e-05 | gnorm 1.45 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3730/5000 | loss 5.4490 | ppl 232.5 | lr 1.99e-05 | gnorm 1.39 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3740/5000 | loss 5.4098 | ppl 223.6 | lr 1.99e-05 | gnorm 1.27 | tok/s 5,799 | VRAM 119GB (58%) | ETA 11.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3750/5000 | loss 5.3475 | ppl 210.1 | lr 1.99e-05 | gnorm 1.31 | tok/s 5,799 | VRAM 119GB (58%) | ETA 11.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3760/5000 | loss 5.3055 | ppl 201.4 | lr 1.98e-05 | gnorm 1.39 | tok/s 5,800 | VRAM 119GB (58%) | ETA 11.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3770/5000 | loss 5.4678 | ppl 236.9 | lr 1.98e-05 | gnorm 1.23 | tok/s 5,727 | VRAM 119GB (58%) | ETA 11.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3780/5000 | loss 5.3639 | ppl 213.6 | lr 1.98e-05 | gnorm 1.44 | tok/s 5,579 | VRAM 119GB (58%) | ETA 11.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3790/5000 | loss 5.3999 | ppl 221.4 | lr 1.98e-05 | gnorm 1.28 | tok/s 5,438 | VRAM 119GB (58%) | ETA 12.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3800/5000 | loss 5.4344 | ppl 229.2 | lr 1.98e-05 | gnorm 1.35 | tok/s 5,304 | VRAM 119GB (58%) | ETA 12.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3704 ppl=214.9 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3800, full state + optimizer)
  step  3810/5000 | loss 5.4967 | ppl 243.9 | lr 1.97e-05 | gnorm 1.23 | tok/s 5,305 | VRAM 119GB (58%) | ETA 12.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3820/5000 | loss 5.2320 | ppl 187.2 | lr 1.97e-05 | gnorm 1.29 | tok/s 5,367 | VRAM 119GB (58%) | ETA 12.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3830/5000 | loss 5.1748 | ppl 176.8 | lr 1.97e-05 | gnorm 1.30 | tok/s 5,504 | VRAM 119GB (58%) | ETA 11.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3840/5000 | loss 5.2995 | ppl 200.2 | lr 1.96e-05 | gnorm 1.52 | tok/s 5,650 | VRAM 119GB (58%) | ETA 11.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3850/5000 | loss 5.4423 | ppl 231.0 | lr 1.96e-05 | gnorm 1.49 | tok/s 5,803 | VRAM 119GB (58%) | ETA 10.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3860/5000 | loss 5.3073 | ppl 201.8 | lr 1.96e-05 | gnorm 1.13 | tok/s 5,802 | VRAM 119GB (58%) | ETA 10.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3870/5000 | loss 5.4388 | ppl 230.2 | lr 1.96e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 10.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3880/5000 | loss 5.3153 | ppl 203.4 | lr 1.95e-05 | gnorm 1.48 | tok/s 5,802 | VRAM 119GB (58%) | ETA 10.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3890/5000 | loss 5.2094 | ppl 183.0 | lr 1.95e-05 | gnorm 1.41 | tok/s 5,801 | VRAM 119GB (58%) | ETA 10.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3900/5000 | loss 5.3008 | ppl 200.5 | lr 1.95e-05 | gnorm 1.34 | tok/s 5,801 | VRAM 119GB (58%) | ETA 10.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3238 ppl=205.2 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3900, full state + optimizer)
  step  3910/5000 | loss 5.3193 | ppl 204.2 | lr 1.94e-05 | gnorm 1.52 | tok/s 5,665 | VRAM 119GB (58%) | ETA 10.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3920/5000 | loss 5.4208 | ppl 226.1 | lr 1.94e-05 | gnorm 1.45 | tok/s 5,518 | VRAM 119GB (58%) | ETA 10.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3930/5000 | loss 5.3002 | ppl 200.4 | lr 1.93e-05 | gnorm 1.27 | tok/s 5,378 | VRAM 119GB (58%) | ETA 10.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3940/5000 | loss 5.2733 | ppl 195.1 | lr 1.93e-05 | gnorm 1.37 | tok/s 5,246 | VRAM 119GB (58%) | ETA 11.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3950/5000 | loss 5.3122 | ppl 202.8 | lr 1.93e-05 | gnorm 1.45 | tok/s 5,120 | VRAM 119GB (58%) | ETA 11.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3960/5000 | loss 5.3368 | ppl 207.8 | lr 1.92e-05 | gnorm 1.55 | tok/s 5,106 | VRAM 119GB (58%) | ETA 11.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3970/5000 | loss 5.3380 | ppl 208.1 | lr 1.92e-05 | gnorm 1.30 | tok/s 5,106 | VRAM 119GB (58%) | ETA 11.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3980/5000 | loss 5.3551 | ppl 211.7 | lr 1.91e-05 | gnorm 1.45 | tok/s 5,106 | VRAM 119GB (58%) | ETA 10.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  3990/5000 | loss 5.3380 | ppl 208.1 | lr 1.91e-05 | gnorm 1.48 | tok/s 5,106 | VRAM 119GB (58%) | ETA 10.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4000/5000 | loss 5.2200 | ppl 184.9 | lr 1.91e-05 | gnorm 1.62 | tok/s 5,106 | VRAM 119GB (58%) | ETA 10.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2679 ppl=194.0 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4000, full state + optimizer)
  >> MILESTONE step 4000 LOCKED  /mnt/scratch/checkpoints/frankenstein_v2_milestone_4000.pt
  step  4010/5000 | loss 5.2624 | ppl 192.9 | lr 1.90e-05 | gnorm 1.23 | tok/s 5,231 | VRAM 119GB (58%) | ETA 10.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4020/5000 | loss 5.3521 | ppl 211.1 | lr 1.90e-05 | gnorm 1.54 | tok/s 5,363 | VRAM 119GB (58%) | ETA 10.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4030/5000 | loss 5.3664 | ppl 214.1 | lr 1.89e-05 | gnorm 1.34 | tok/s 5,501 | VRAM 119GB (58%) | ETA 9.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4040/5000 | loss 5.2996 | ppl 200.2 | lr 1.89e-05 | gnorm 1.35 | tok/s 5,647 | VRAM 119GB (58%) | ETA 9.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4050/5000 | loss 5.2745 | ppl 195.3 | lr 1.88e-05 | gnorm 1.32 | tok/s 5,801 | VRAM 119GB (58%) | ETA 8.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4060/5000 | loss 5.2770 | ppl 195.8 | lr 1.88e-05 | gnorm 1.58 | tok/s 5,802 | VRAM 119GB (58%) | ETA 8.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4070/5000 | loss 5.3921 | ppl 219.7 | lr 1.87e-05 | gnorm 1.35 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4080/5000 | loss 5.2833 | ppl 197.0 | lr 1.87e-05 | gnorm 1.59 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4090/5000 | loss 5.4707 | ppl 237.6 | lr 1.86e-05 | gnorm 1.24 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4100/5000 | loss 5.3238 | ppl 205.2 | lr 1.86e-05 | gnorm 1.59 | tok/s 5,804 | VRAM 119GB (58%) | ETA 8.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.3574 ppl=212.2 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4100, full state + optimizer)
  step  4110/5000 | loss 5.3696 | ppl 214.8 | lr 1.85e-05 | gnorm 1.28 | tok/s 5,804 | VRAM 119GB (58%) | ETA 8.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4120/5000 | loss 5.2697 | ppl 194.3 | lr 1.85e-05 | gnorm 1.65 | tok/s 5,804 | VRAM 119GB (58%) | ETA 8.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4130/5000 | loss 5.2788 | ppl 196.1 | lr 1.84e-05 | gnorm 1.10 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4140/5000 | loss 5.3446 | ppl 209.5 | lr 1.84e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 8.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4150/5000 | loss 5.1764 | ppl 177.0 | lr 1.83e-05 | gnorm 1.35 | tok/s 5,802 | VRAM 119GB (58%) | ETA 8.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4160/5000 | loss 5.2608 | ppl 192.6 | lr 1.83e-05 | gnorm 1.51 | tok/s 5,801 | VRAM 119GB (58%) | ETA 7.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4170/5000 | loss 5.3678 | ppl 214.4 | lr 1.82e-05 | gnorm 1.24 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4180/5000 | loss 5.2964 | ppl 199.6 | lr 1.82e-05 | gnorm 1.54 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4190/5000 | loss 5.4885 | ppl 241.9 | lr 1.81e-05 | gnorm 1.15 | tok/s 5,800 | VRAM 119GB (58%) | ETA 7.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4200/5000 | loss 5.4148 | ppl 224.7 | lr 1.81e-05 | gnorm 1.41 | tok/s 5,798 | VRAM 119GB (58%) | ETA 7.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2861 ppl=197.6 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4200, full state + optimizer)
  step  4210/5000 | loss 5.3041 | ppl 201.2 | lr 1.80e-05 | gnorm 1.33 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4220/5000 | loss 5.3657 | ppl 213.9 | lr 1.79e-05 | gnorm 1.62 | tok/s 5,798 | VRAM 119GB (58%) | ETA 7.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4230/5000 | loss 5.3724 | ppl 215.4 | lr 1.79e-05 | gnorm 1.45 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4240/5000 | loss 5.2941 | ppl 199.2 | lr 1.78e-05 | gnorm 1.55 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4250/5000 | loss 5.4146 | ppl 224.7 | lr 1.78e-05 | gnorm 1.48 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4260/5000 | loss 5.4285 | ppl 227.8 | lr 1.77e-05 | gnorm 1.37 | tok/s 5,800 | VRAM 119GB (58%) | ETA 7.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
  step  4270/5000 | loss 5.2140 | ppl 183.8 | lr 1.77e-05 | gnorm 1.35 | tok/s 5,801 | VRAM 119GB (58%) | ETA 6.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4280/5000 | loss 5.1953 | ppl 180.4 | lr 1.76e-05 | gnorm 1.50 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4290/5000 | loss 5.2776 | ppl 195.9 | lr 1.76e-05 | gnorm 1.45 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4300/5000 | loss 5.3880 | ppl 218.8 | lr 1.75e-05 | gnorm 1.26 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2932 ppl=199.0 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4300, full state + optimizer)
  step  4310/5000 | loss 5.2318 | ppl 187.1 | lr 1.74e-05 | gnorm 1.37 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4320/5000 | loss 5.1866 | ppl 178.9 | lr 1.74e-05 | gnorm 1.57 | tok/s 5,799 | VRAM 119GB (58%) | ETA 6.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4330/5000 | loss 5.2669 | ppl 193.8 | lr 1.73e-05 | gnorm 1.50 | tok/s 5,799 | VRAM 119GB (58%) | ETA 6.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4340/5000 | loss 5.2029 | ppl 181.8 | lr 1.73e-05 | gnorm 1.37 | tok/s 5,798 | VRAM 119GB (58%) | ETA 6.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4350/5000 | loss 5.2440 | ppl 189.4 | lr 1.72e-05 | gnorm 1.26 | tok/s 5,798 | VRAM 119GB (58%) | ETA 6.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4360/5000 | loss 5.4278 | ppl 227.7 | lr 1.72e-05 | gnorm 1.44 | tok/s 5,798 | VRAM 119GB (58%) | ETA 6.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4370/5000 | loss 5.3880 | ppl 218.8 | lr 1.71e-05 | gnorm 1.33 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4380/5000 | loss 5.3397 | ppl 208.5 | lr 1.71e-05 | gnorm 1.72 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4390/5000 | loss 5.3737 | ppl 215.7 | lr 1.70e-05 | gnorm 1.26 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4400/5000 | loss 5.3451 | ppl 209.6 | lr 1.69e-05 | gnorm 1.29 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2215 ppl=185.2 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4400, full state + optimizer)
  step  4410/5000 | loss 5.2720 | ppl 194.8 | lr 1.69e-05 | gnorm 1.45 | tok/s 5,648 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4420/5000 | loss 5.3076 | ppl 201.9 | lr 1.68e-05 | gnorm 1.42 | tok/s 5,501 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4430/5000 | loss 5.3635 | ppl 213.5 | lr 1.68e-05 | gnorm 1.46 | tok/s 5,363 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4440/5000 | loss 5.4320 | ppl 228.6 | lr 1.67e-05 | gnorm 1.56 | tok/s 5,232 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4450/5000 | loss 5.3418 | ppl 208.9 | lr 1.67e-05 | gnorm 1.40 | tok/s 5,107 | VRAM 119GB (58%) | ETA 5.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4460/5000 | loss 5.3990 | ppl 221.2 | lr 1.66e-05 | gnorm 1.55 | tok/s 5,104 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4470/5000 | loss 5.2649 | ppl 193.4 | lr 1.66e-05 | gnorm 1.26 | tok/s 5,105 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4480/5000 | loss 5.3802 | ppl 217.1 | lr 1.65e-05 | gnorm 1.33 | tok/s 5,105 | VRAM 119GB (58%) | ETA 5.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4490/5000 | loss 5.1792 | ppl 177.5 | lr 1.65e-05 | gnorm 1.45 | tok/s 5,104 | VRAM 119GB (58%) | ETA 5.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4500/5000 | loss 5.4291 | ppl 227.9 | lr 1.64e-05 | gnorm 1.23 | tok/s 5,104 | VRAM 119GB (58%) | ETA 5.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2665 ppl=193.7 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4500, full state + optimizer)
  step  4510/5000 | loss 5.2796 | ppl 196.3 | lr 1.64e-05 | gnorm 1.22 | tok/s 5,231 | VRAM 119GB (58%) | ETA 5.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4520/5000 | loss 5.2812 | ppl 196.6 | lr 1.63e-05 | gnorm 1.88 | tok/s 5,362 | VRAM 119GB (58%) | ETA 4.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4530/5000 | loss 5.2610 | ppl 192.7 | lr 1.63e-05 | gnorm 1.48 | tok/s 5,498 | VRAM 119GB (58%) | ETA 4.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4540/5000 | loss 5.3610 | ppl 212.9 | lr 1.62e-05 | gnorm 1.61 | tok/s 5,643 | VRAM 119GB (58%) | ETA 4.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4550/5000 | loss 5.2436 | ppl 189.3 | lr 1.62e-05 | gnorm 1.21 | tok/s 5,797 | VRAM 119GB (58%) | ETA 4.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4560/5000 | loss 5.3148 | ppl 203.3 | lr 1.61e-05 | gnorm 1.47 | tok/s 5,796 | VRAM 119GB (58%) | ETA 4.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4570/5000 | loss 5.3149 | ppl 203.3 | lr 1.61e-05 | gnorm 2.30 | tok/s 5,796 | VRAM 119GB (58%) | ETA 4.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4580/5000 | loss 5.3789 | ppl 216.8 | lr 1.60e-05 | gnorm 1.30 | tok/s 5,798 | VRAM 119GB (58%) | ETA 4.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4590/5000 | loss 5.3713 | ppl 215.1 | lr 1.60e-05 | gnorm 1.54 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4600/5000 | loss 5.3317 | ppl 206.8 | lr 1.59e-05 | gnorm 1.30 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2795 ppl=196.3 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4600, full state + optimizer)
  step  4610/5000 | loss 5.2330 | ppl 187.4 | lr 1.59e-05 | gnorm 1.36 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4620/5000 | loss 5.2370 | ppl 188.1 | lr 1.59e-05 | gnorm 1.22 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4630/5000 | loss 5.3151 | ppl 203.4 | lr 1.58e-05 | gnorm 1.45 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4640/5000 | loss 5.2922 | ppl 198.8 | lr 1.58e-05 | gnorm 1.48 | tok/s 5,799 | VRAM 119GB (58%) | ETA 3.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4650/5000 | loss 5.3756 | ppl 216.1 | lr 1.57e-05 | gnorm 1.49 | tok/s 5,696 | VRAM 119GB (58%) | ETA 3.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4660/5000 | loss 5.2680 | ppl 194.0 | lr 1.57e-05 | gnorm 1.09 | tok/s 5,547 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4670/5000 | loss 5.4040 | ppl 222.3 | lr 1.57e-05 | gnorm 1.57 | tok/s 5,408 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4680/5000 | loss 5.3783 | ppl 216.6 | lr 1.56e-05 | gnorm 1.31 | tok/s 5,274 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4690/5000 | loss 5.1276 | ppl 168.6 | lr 1.56e-05 | gnorm 1.34 | tok/s 5,147 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4700/5000 | loss 5.3363 | ppl 207.7 | lr 1.55e-05 | gnorm 1.16 | tok/s 5,106 | VRAM 119GB (58%) | ETA 3.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2675 ppl=193.9 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4700, full state + optimizer)
  step  4710/5000 | loss 5.2201 | ppl 185.0 | lr 1.55e-05 | gnorm 1.32 | tok/s 5,162 | VRAM 119GB (58%) | ETA 3.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4720/5000 | loss 5.3301 | ppl 206.5 | lr 1.55e-05 | gnorm 1.71 | tok/s 5,160 | VRAM 119GB (58%) | ETA 3.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4730/5000 | loss 5.3909 | ppl 219.4 | lr 1.54e-05 | gnorm 1.31 | tok/s 5,161 | VRAM 119GB (58%) | ETA 2.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4740/5000 | loss 5.2943 | ppl 199.2 | lr 1.54e-05 | gnorm 1.38 | tok/s 5,160 | VRAM 119GB (58%) | ETA 2.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4750/5000 | loss 5.3384 | ppl 208.2 | lr 1.54e-05 | gnorm 1.46 | tok/s 5,160 | VRAM 119GB (58%) | ETA 2.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4760/5000 | loss 5.2979 | ppl 199.9 | lr 1.54e-05 | gnorm 1.33 | tok/s 5,103 | VRAM 119GB (58%) | ETA 2.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4770/5000 | loss 5.3215 | ppl 204.7 | lr 1.53e-05 | gnorm 1.38 | tok/s 5,103 | VRAM 119GB (58%) | ETA 2.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4780/5000 | loss 5.3302 | ppl 206.5 | lr 1.53e-05 | gnorm 1.26 | tok/s 5,103 | VRAM 119GB (58%) | ETA 2.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4790/5000 | loss 5.3284 | ppl 206.1 | lr 1.53e-05 | gnorm 1.42 | tok/s 5,102 | VRAM 119GB (58%) | ETA 2.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4800/5000 | loss 5.4404 | ppl 230.5 | lr 1.52e-05 | gnorm 4.03 | tok/s 5,099 | VRAM 119GB (58%) | ETA 2.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2457 ppl=189.7 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4800, full state + optimizer)
  step  4810/5000 | loss 5.3835 | ppl 217.8 | lr 1.52e-05 | gnorm 1.70 | tok/s 5,223 | VRAM 119GB (58%) | ETA 2.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4820/5000 | loss 5.2967 | ppl 199.7 | lr 1.52e-05 | gnorm 1.25 | tok/s 5,355 | VRAM 119GB (58%) | ETA 1.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4830/5000 | loss 5.3657 | ppl 213.9 | lr 1.52e-05 | gnorm 1.38 | tok/s 5,494 | VRAM 119GB (58%) | ETA 1.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4840/5000 | loss 5.3776 | ppl 216.5 | lr 1.52e-05 | gnorm 1.31 | tok/s 5,641 | VRAM 119GB (58%) | ETA 1.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4850/5000 | loss 5.2128 | ppl 183.6 | lr 1.51e-05 | gnorm 1.30 | tok/s 5,798 | VRAM 119GB (58%) | ETA 1.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4860/5000 | loss 5.3282 | ppl 206.1 | lr 1.51e-05 | gnorm 2.41 | tok/s 5,799 | VRAM 119GB (58%) | ETA 1.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4870/5000 | loss 5.4324 | ppl 228.7 | lr 1.51e-05 | gnorm 1.20 | tok/s 5,799 | VRAM 119GB (58%) | ETA 1.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4880/5000 | loss 5.3635 | ppl 213.5 | lr 1.51e-05 | gnorm 1.22 | tok/s 5,800 | VRAM 119GB (58%) | ETA 1.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4890/5000 | loss 5.3306 | ppl 206.6 | lr 1.51e-05 | gnorm 1.13 | tok/s 5,799 | VRAM 119GB (58%) | ETA 1.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4900/5000 | loss 5.3643 | ppl 213.6 | lr 1.51e-05 | gnorm 1.07 | tok/s 5,799 | VRAM 119GB (58%) | ETA 0.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  >> EVAL: val_loss=5.2650 ppl=193.4 (best=5.1359)
  >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4900, full state + optimizer)
  step  4910/5000 | loss 5.2922 | ppl 198.8 | lr 1.51e-05 | gnorm 1.30 | tok/s 5,800 | VRAM 119GB (58%) | ETA 0.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4920/5000 | loss 5.4508 | ppl 232.9 | lr 1.50e-05 | gnorm 1.67 | tok/s 5,800 | VRAM 119GB (58%) | ETA 0.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4930/5000 | loss 5.3588 | ppl 212.5 | lr 1.50e-05 | gnorm 1.20 | tok/s 5,798 | VRAM 119GB (58%) | ETA 0.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4940/5000 | loss 5.3917 | ppl 219.6 | lr 1.50e-05 | gnorm 1.53 | tok/s 5,799 | VRAM 119GB (58%) | ETA 0.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4950/5000 | loss 5.2779 | ppl 196.0 | lr 1.50e-05 | gnorm 1.73 | tok/s 5,798 | VRAM 119GB (58%) | ETA 0.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4960/5000 | loss 5.3938 | ppl 220.0 | lr 1.50e-05 | gnorm 1.26 | tok/s 5,796 | VRAM 119GB (58%) | ETA 0.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4970/5000 | loss 5.3809 | ppl 217.2 | lr 1.50e-05 | gnorm 1.12 | tok/s 5,797 | VRAM 119GB (58%) | ETA 0.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4980/5000 | loss 5.3169 | ppl 203.8 | lr 1.50e-05 | gnorm 1.31 | tok/s 5,798 | VRAM 119GB (58%) | ETA 0.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
  step  4990/5000 | loss 5.2999 | ppl 200.3 | lr 1.50e-05 | gnorm 1.34 | tok/s 5,797 | VRAM 119GB (58%) | ETA 0.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]

======================================================================
  REALIGNMENT v2 COMPLETE
======================================================================
  Steps:         5000
  Total tokens:  0.98B
  Best val_loss: 5.1359
  Total time:    50.3h
  Final:         /mnt/scratch/checkpoints/frankenstein_v2_final.pt
  Best:          /mnt/scratch/checkpoints/frankenstein_v2_best.pt
  EMA best:      /mnt/scratch/checkpoints/frankenstein_v2_ema_best.pt