======================================================================
FRANKENSTEIN REALIGNMENT v2 — FRESH START
======================================================================
Raw merge: /mnt/scratch/checkpoints/sentinel_prime_frankenstein.pt
Steps: 5000
Unfreeze at: step 500
Batch: 8 × 6 = 48
Seq len: 4096
Phase 1 LR: 0.0001 → 3e-05 (warmup 100)
Phase 2: SGDR 5 cycles, expert_scale=0.3
aux_loss: 0.05, z_loss: 0.002 (from step 0)
EMA: decay=0.9995, every 10 steps
Eff tokens/step: 196,608
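
The batch and token figures in the header above follow from simple arithmetic. The sketch below reproduces them, assuming that `8 × 6` means micro-batch size times gradient-accumulation steps (the log does not name the two factors). The `eta_hours` helper is a guess at how the ETA column on the step lines is derived (remaining steps × tokens/step ÷ tok/s); it agrees with the printed values at the steps checked, but the training script itself is not shown here.

```python
# Illustrative names only; the log prints the numbers, not the variables.
MICRO_BATCH = 8      # assumed: sequences per forward/backward pass
GRAD_ACCUM = 6       # assumed: gradient-accumulation steps
SEQ_LEN = 4096       # "Seq len" from the header
TOTAL_STEPS = 5000   # "Steps" from the header

effective_batch = MICRO_BATCH * GRAD_ACCUM
tokens_per_step = effective_batch * SEQ_LEN


def eta_hours(step: int, tok_per_s: float) -> float:
    """Hours left if the remaining steps run at the given throughput."""
    remaining_tokens = (TOTAL_STEPS - step) * tokens_per_step
    return remaining_tokens / tok_per_s / 3600.0


print(effective_batch, tokens_per_step)   # 48 196608
print(f"{eta_hours(0, 4544):.1f}h")       # step 0 is logged at 4,544 tok/s
```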
[1/5] Building model...
14.40B parameters
[2/5] Loading raw merge: /mnt/scratch/checkpoints/sentinel_prime_frankenstein.pt
Merge loaded.
✓ Router aux_loss_weight = 0.05 (all layers)
Merge meta: Sentinel Prime (Frankenstein Edition)
attention_norms: NousResearch/Hermes-3-Llama-3.1-8B
ffn_experts_0_2: Salesforce/xLAM-7b-fc-r
ffn_experts_1_3: deepseek-ai/deepseek-coder-6.7b-instruct
embeddings: SentinelBrain-14B-MoE-v0.1 (original)
router: SentinelBrain-14B-MoE-v0.1 (original)
VRAM after load: 28.8GB
Enabling gradient checkpointing...
Gradient checkpointing enabled for 24 layers
[3/5] Progressive unfreezing setup...
Froze 192 expert params. Trainable: 290/482
[4/5] Loading training data from /mnt/scratch/shards
[train] 1710 shards, 16.48B tokens
[val] 160 shards, 0.86B tokens
[5/5] Setting up optimizer (Phase 1)...
Trainable: 5.75B / 14.40B
Optimizer: AdamW (decay: 241, no-decay: 49)
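
The header gives the Phase 1 schedule only as `0.0001 → 3e-05 (warmup 100)`. A minimal sketch of one schedule that matches the `lr` column on the `[FROZEN]` step lines is a linear warmup over 100 steps followed by cosine decay to 3e-05 at step 500. The `(step + 1)` warmup form and the cosine shape are inferred from the logged values, not taken from the training script, and `phase1_lr` is a hypothetical name.

```python
import math

PEAK_LR = 1e-4      # "Phase 1 LR: 0.0001" from the header
END_LR = 3e-5       # "→ 3e-05" from the header
WARMUP = 100        # "(warmup 100)" from the header
PHASE1_END = 500    # "Unfreeze at: step 500" from the header


def phase1_lr(step: int) -> float:
    """Linear warmup, then cosine decay (shape inferred from the log)."""
    if step < WARMUP:
        return PEAK_LR * (step + 1) / WARMUP
    progress = (step - WARMUP) / (PHASE1_END - WARMUP)
    return END_LR + 0.5 * (PEAK_LR - END_LR) * (1.0 + math.cos(math.pi * progress))


for s in (0, 10, 100, 300, 490):
    print(s, f"{phase1_lr(s):.2e}")
```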
Initial evaluation...
Initial val_loss=15.8210, val_ppl=7429962.9
✓ EMA initialized (482 params on CPU)
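
The perplexity printed next to each loss in this log appears to be `exp(loss)` with the loss in nats. The sketch below checks that against a few logged evaluation pairs. Small differences are expected because the log rounds the loss to four decimals.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


# (val_loss, val_ppl) pairs copied from the EVAL lines of this log.
logged = [(7.3564, 1566.2), (6.4265, 618.0), (5.8662, 352.9)]
for loss, ppl in logged:
    print(f"loss {loss:.4f} -> ppl {perplexity(loss):.1f} (logged {ppl})")
```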
======================================================================
STARTING TRAINING
======================================================================
Batch: 8 x 6 = 48 effective
Tokens/step: 196,608
VRAM: 28.9/206GB (14%)
SGDR cycles: 5
Cycle 0: steps 500-700 (T=200), peak=5.0e-05, ramp=30
Cycle 1: steps 700-1100 (T=400), peak=4.0e-05, ramp=40
Cycle 2: steps 1100-1900 (T=800), peak=3.0e-05, ramp=60
Cycle 3: steps 1900-3500 (T=1600), peak=2.5e-05, ramp=80
Cycle 4: steps 3500-5000 (T=1500), peak=2.0e-05, ramp=100
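
The cycle table above lists each restart's span, peak and ramp length; the period doubles from 200 to 1600 steps, and the last cycle is cut to 1500 so the run ends at step 5000. A minimal sketch of a schedule that matches the `lr` column for cycles 0 to 2 is a linear ramp from a floor to the peak, then cosine decay back to the floor. The floor of 1.5e-05 and the `(k + 1) / ramp` form are inferred from the logged values and are not printed anywhere in the header. Cycles 3 and 4 fall outside this excerpt, so applying the same floor to them is an assumption.

```python
import math

FLOOR_LR = 1.5e-5  # assumption: inferred from the lr column, not stated in the log

# (start, end, peak, ramp) copied from the cycle table in the log.
CYCLES = [
    (500, 700, 5.0e-5, 30),
    (700, 1100, 4.0e-5, 40),
    (1100, 1900, 3.0e-5, 60),
    (1900, 3500, 2.5e-5, 80),
    (3500, 5000, 2.0e-5, 100),
]


def sgdr_lr(step: int) -> float:
    """Base learning rate at a Phase 2 step (hypothetical reconstruction)."""
    for start, end, peak, ramp in CYCLES:
        if start <= step < end:
            k = step - start
            if k < ramp:
                # Linear ramp from the floor up to this cycle's peak.
                return FLOOR_LR + (peak - FLOOR_LR) * (k + 1) / ramp
            # Cosine decay over the rest of the cycle, back to the floor.
            progress = (k - ramp) / (end - start - ramp)
            return FLOOR_LR + 0.5 * (peak - FLOOR_LR) * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"step {step} is outside Phase 2")


for s in (500, 530, 600, 800, 1000, 1200):
    print(s, f"{sgdr_lr(s):.2e}")
```

Each restart begins one ramp step above the floor rather than at the peak, which would explain why the logged `lr` climbs for the first 30 to 60 steps of each cycle.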
step 0/5000 | loss 15.8368 | ppl 7548307.7 | lr 1.00e-06 | gnorm 12.75 | tok/s 4,544 | VRAM 67GB (32%) | ETA 60.1h [FROZEN] | [E0:26% E1:24% E2:25% E3:24%] CF=[1.03 0.98 1.02 0.97]
step 10/5000 | loss 14.6608 | ppl 2328555.1 | lr 1.10e-05 | gnorm 13.44 | tok/s 6,258 | VRAM 67GB (32%) | ETA 43.5h [FROZEN] | [E0:26% E1:25% E2:25% E3:24%] CF=[1.04 0.98 1.01 0.96]
step 20/5000 | loss 12.4423 | ppl 253288.5 | lr 2.10e-05 | gnorm 13.94 | tok/s 6,368 | VRAM 67GB (32%) | ETA 42.7h [FROZEN] | [E0:28% E1:24% E2:25% E3:23%] CF=[1.11 0.95 1.02 0.92]
step 30/5000 | loss 10.6901 | ppl 43919.2 | lr 3.10e-05 | gnorm 5.16 | tok/s 6,412 | VRAM 67GB (32%) | ETA 42.3h [FROZEN] | [E0:30% E1:21% E2:26% E3:23%] CF=[1.19 0.84 1.06 0.92]
step 40/5000 | loss 9.5284 | ppl 13745.1 | lr 4.10e-05 | gnorm 3.09 | tok/s 6,438 | VRAM 67GB (32%) | ETA 42.1h [FROZEN] | [E0:31% E1:19% E2:28% E3:21%] CF=[1.24 0.77 1.13 0.85]
step 50/5000 | loss 8.8601 | ppl 7045.2 | lr 5.10e-05 | gnorm 3.36 | tok/s 6,512 | VRAM 67GB (32%) | ETA 41.5h [FROZEN] | [E0:30% E1:20% E2:29% E3:21%] CF=[1.18 0.81 1.17 0.84]
step 60/5000 | loss 8.2434 | ppl 3802.4 | lr 6.10e-05 | gnorm 3.39 | tok/s 6,521 | VRAM 67GB (32%) | ETA 41.4h [FROZEN] | [E0:29% E1:20% E2:32% E3:18%] CF=[1.18 0.79 1.29 0.74]
step 70/5000 | loss 8.0612 | ppl 3169.1 | lr 7.10e-05 | gnorm 2.05 | tok/s 6,528 | VRAM 67GB (32%) | ETA 41.2h [FROZEN] | [E0:31% E1:18% E2:33% E3:18%] CF=[1.24 0.72 1.32 0.72]
step 80/5000 | loss 7.7648 | ppl 2356.2 | lr 8.10e-05 | gnorm 4.00 | tok/s 6,532 | VRAM 67GB (32%) | ETA 41.1h [FROZEN] | [E0:31% E1:18% E2:33% E3:18%] CF=[1.25 0.73 1.31 0.71]
step 90/5000 | loss 7.6206 | ppl 2039.9 | lr 9.10e-05 | gnorm 2.97 | tok/s 6,533 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:31% E1:18% E2:32% E3:19%] CF=[1.25 0.72 1.29 0.74]
step 100/5000 | loss 7.4868 | ppl 1784.4 | lr 1.00e-04 | gnorm 4.38 | tok/s 6,530 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:31% E1:18% E2:32% E3:19%] CF=[1.25 0.72 1.28 0.75]
>> EVAL: val_loss=7.3564 ppl=1566.2 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 100, full state + optimizer)
step 110/5000 | loss 7.3392 | ppl 1539.5 | lr 9.99e-05 | gnorm 2.39 | tok/s 6,524 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:31% E1:18% E2:32% E3:18%] CF=[1.25 0.73 1.28 0.74]
step 120/5000 | loss 7.2058 | ppl 1347.2 | lr 9.96e-05 | gnorm 3.05 | tok/s 6,517 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.26 0.73 1.27 0.73]
step 130/5000 | loss 7.1589 | ppl 1285.5 | lr 9.90e-05 | gnorm 3.62 | tok/s 6,507 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.26 0.73 1.27 0.74]
step 140/5000 | loss 6.9785 | ppl 1073.3 | lr 9.83e-05 | gnorm 3.08 | tok/s 6,483 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:19%] CF=[1.26 0.72 1.27 0.74]
step 150/5000 | loss 6.9154 | ppl 1007.7 | lr 9.73e-05 | gnorm 2.44 | tok/s 6,472 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:19%] CF=[1.27 0.72 1.27 0.74]
step 160/5000 | loss 6.7309 | ppl 837.9 | lr 9.62e-05 | gnorm 2.78 | tok/s 6,443 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.27 0.73]
step 170/5000 | loss 6.7939 | ppl 892.4 | lr 9.48e-05 | gnorm 3.16 | tok/s 6,437 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.27 0.74]
step 180/5000 | loss 6.7680 | ppl 869.6 | lr 9.33e-05 | gnorm 3.95 | tok/s 6,420 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.27 0.74]
step 190/5000 | loss 6.6892 | ppl 803.7 | lr 9.16e-05 | gnorm 1.92 | tok/s 6,430 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
step 200/5000 | loss 6.5718 | ppl 714.6 | lr 8.97e-05 | gnorm 2.44 | tok/s 6,405 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
>> EVAL: val_loss=6.4265 ppl=618.0 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 200, full state + optimizer)
step 210/5000 | loss 6.3597 | ppl 578.1 | lr 8.77e-05 | gnorm 1.84 | tok/s 6,423 | VRAM 67GB (32%) | ETA 40.7h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
step 220/5000 | loss 6.4999 | ppl 665.1 | lr 8.56e-05 | gnorm 3.22 | tok/s 6,417 | VRAM 67GB (32%) | ETA 40.7h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
step 230/5000 | loss 6.3102 | ppl 550.1 | lr 8.33e-05 | gnorm 3.12 | tok/s 6,414 | VRAM 67GB (32%) | ETA 40.6h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.74]
step 240/5000 | loss 6.2094 | ppl 497.4 | lr 8.09e-05 | gnorm 2.61 | tok/s 6,406 | VRAM 67GB (32%) | ETA 40.6h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74]
step 250/5000 | loss 6.1780 | ppl 482.0 | lr 7.84e-05 | gnorm 2.05 | tok/s 6,428 | VRAM 67GB (32%) | ETA 40.4h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
>> MILESTONE step 250 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_250.pt
step 260/5000 | loss 6.3846 | ppl 592.7 | lr 7.58e-05 | gnorm 1.93 | tok/s 6,426 | VRAM 67GB (32%) | ETA 40.3h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 270/5000 | loss 6.2122 | ppl 498.8 | lr 7.32e-05 | gnorm 1.94 | tok/s 6,427 | VRAM 67GB (32%) | ETA 40.2h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 280/5000 | loss 6.1144 | ppl 452.3 | lr 7.05e-05 | gnorm 1.58 | tok/s 6,427 | VRAM 67GB (32%) | ETA 40.1h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.73]
step 290/5000 | loss 6.0846 | ppl 439.0 | lr 6.77e-05 | gnorm 1.45 | tok/s 6,428 | VRAM 67GB (32%) | ETA 40.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 300/5000 | loss 6.0901 | ppl 441.5 | lr 6.50e-05 | gnorm 1.68 | tok/s 6,420 | VRAM 67GB (32%) | ETA 40.0h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
>> EVAL: val_loss=6.1254 ppl=457.3 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 300, full state + optimizer)
step 310/5000 | loss 6.1131 | ppl 451.8 | lr 6.23e-05 | gnorm 1.55 | tok/s 6,421 | VRAM 67GB (32%) | ETA 39.9h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 320/5000 | loss 6.0769 | ppl 435.7 | lr 5.95e-05 | gnorm 1.51 | tok/s 6,420 | VRAM 67GB (32%) | ETA 39.8h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 330/5000 | loss 6.0861 | ppl 439.7 | lr 5.68e-05 | gnorm 1.85 | tok/s 6,437 | VRAM 67GB (32%) | ETA 39.6h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 340/5000 | loss 5.9268 | ppl 375.0 | lr 5.42e-05 | gnorm 1.74 | tok/s 6,449 | VRAM 67GB (32%) | ETA 39.5h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 350/5000 | loss 5.9748 | ppl 393.4 | lr 5.16e-05 | gnorm 1.25 | tok/s 6,462 | VRAM 67GB (32%) | ETA 39.3h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 360/5000 | loss 6.0018 | ppl 404.2 | lr 4.91e-05 | gnorm 1.42 | tok/s 6,468 | VRAM 67GB (32%) | ETA 39.2h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 370/5000 | loss 5.9796 | ppl 395.3 | lr 4.67e-05 | gnorm 1.62 | tok/s 6,473 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 380/5000 | loss 6.0423 | ppl 420.8 | lr 4.44e-05 | gnorm 1.37 | tok/s 6,450 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 390/5000 | loss 5.9497 | ppl 383.6 | lr 4.23e-05 | gnorm 1.46 | tok/s 6,444 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 400/5000 | loss 5.9011 | ppl 365.4 | lr 4.03e-05 | gnorm 1.16 | tok/s 6,431 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
>> EVAL: val_loss=5.8662 ppl=352.9 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 400, full state + optimizer)
step 410/5000 | loss 5.9157 | ppl 370.8 | lr 3.84e-05 | gnorm 1.04 | tok/s 6,428 | VRAM 67GB (32%) | ETA 39.0h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 420/5000 | loss 5.8687 | ppl 353.8 | lr 3.67e-05 | gnorm 1.44 | tok/s 6,425 | VRAM 67GB (32%) | ETA 38.9h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 430/5000 | loss 5.8683 | ppl 353.7 | lr 3.52e-05 | gnorm 1.38 | tok/s 6,449 | VRAM 67GB (32%) | ETA 38.7h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 440/5000 | loss 5.8722 | ppl 355.0 | lr 3.38e-05 | gnorm 1.61 | tok/s 6,443 | VRAM 67GB (32%) | ETA 38.7h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 450/5000 | loss 5.8595 | ppl 350.6 | lr 3.27e-05 | gnorm 1.23 | tok/s 6,453 | VRAM 67GB (32%) | ETA 38.5h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 460/5000 | loss 5.8211 | ppl 337.3 | lr 3.17e-05 | gnorm 1.14 | tok/s 6,455 | VRAM 67GB (32%) | ETA 38.4h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 470/5000 | loss 5.9151 | ppl 370.6 | lr 3.10e-05 | gnorm 1.26 | tok/s 6,460 | VRAM 67GB (32%) | ETA 38.3h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 480/5000 | loss 5.8819 | ppl 358.5 | lr 3.04e-05 | gnorm 1.43 | tok/s 6,462 | VRAM 67GB (32%) | ETA 38.2h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 490/5000 | loss 5.8794 | ppl 357.6 | lr 3.01e-05 | gnorm 1.21 | tok/s 6,471 | VRAM 67GB (32%) | ETA 38.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
>>> Step 500: UNFREEZING EXPERTS <<<
Unfroze all 482 params.
✓ Pre-unfreeze checkpoint LOCKED: /mnt/scratch/checkpoints/frankenstein_v2_pre_unfreeze.pt
✓ Pre-unfreeze FULL checkpoint LOCKED: /mnt/scratch/checkpoints/frankenstein_v2_pre_unfreeze_full.pt
Expert: 8.66B @ lr=4.85e-06
Base: 5.75B @ lr=1.62e-05
Spike guard: 3.0× EMA (tightened)
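
The two learning rates printed at the unfreeze are consistent with the expert group running at `expert_scale=0.3` times the base rate. The sketch below reproduces both values, assuming the base rate at step 500 is the first ramp step of cycle 0 from an inferred floor of 1.5e-05; the floor and the plain-dict parameter groups are illustrative, not taken from the training script.

```python
EXPERT_SCALE = 0.3   # "expert_scale=0.3" from the header
FLOOR_LR = 1.5e-5    # assumption: inferred from the lr column
PEAK_LR = 5.0e-5     # cycle 0 peak from the cycle table
RAMP = 30            # cycle 0 ramp length from the cycle table

# First ramp step of cycle 0 (step 500).
base_lr = FLOOR_LR + (PEAK_LR - FLOOR_LR) * 1 / RAMP
expert_lr = base_lr * EXPERT_SCALE

# Hypothetical per-group settings, shaped like optimizer parameter groups.
param_groups = [
    {"name": "base", "params_b": 5.75, "lr": base_lr},
    {"name": "expert", "params_b": 8.66, "lr": expert_lr},
]
for group in param_groups:
    print(f"{group['name']}: {group['params_b']}B @ lr={group['lr']:.2e}")
```

Multiplying the rounded base value (1.62e-05) by 0.3 gives 4.86e-06, so the logged 4.85e-06 suggests the scale is applied to the unrounded rate.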
step 500/5000 | loss 5.9927 | ppl 400.5 | lr 1.62e-05 | gnorm 1.45 | tok/s 6,205 | VRAM 119GB (58%) | ETA 39.6h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
>> EVAL: val_loss=5.8451 ppl=345.5 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 500, full state + optimizer)
>> MILESTONE step 500 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_500.pt
step 510/5000 | loss 5.7749 | ppl 322.1 | lr 2.78e-05 | gnorm 1.70 | tok/s 6,068 | VRAM 119GB (58%) | ETA 40.4h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
>> MILESTONE step 510 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_510.pt
step 520/5000 | loss 5.7268 | ppl 307.0 | lr 3.95e-05 | gnorm 1.78 | tok/s 5,934 | VRAM 119GB (58%) | ETA 41.2h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 530/5000 | loss 5.8261 | ppl 339.0 | lr 5.00e-05 | gnorm 2.03 | tok/s 5,809 | VRAM 119GB (58%) | ETA 42.0h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 540/5000 | loss 5.9697 | ppl 391.4 | lr 4.97e-05 | gnorm 1.78 | tok/s 5,690 | VRAM 119GB (58%) | ETA 42.8h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 550/5000 | loss 5.7818 | ppl 324.3 | lr 4.88e-05 | gnorm 1.50 | tok/s 5,790 | VRAM 119GB (58%) | ETA 42.0h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 560/5000 | loss 6.0418 | ppl 420.7 | lr 4.74e-05 | gnorm 1.41 | tok/s 5,791 | VRAM 119GB (58%) | ETA 41.9h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 570/5000 | loss 5.7851 | ppl 325.4 | lr 4.54e-05 | gnorm 1.52 | tok/s 5,781 | VRAM 119GB (58%) | ETA 41.8h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 580/5000 | loss 5.9342 | ppl 377.7 | lr 4.30e-05 | gnorm 1.90 | tok/s 5,782 | VRAM 119GB (58%) | ETA 41.8h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 590/5000 | loss 5.7303 | ppl 308.1 | lr 4.03e-05 | gnorm 1.63 | tok/s 5,779 | VRAM 119GB (58%) | ETA 41.7h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 600/5000 | loss 5.9043 | ppl 366.6 | lr 3.73e-05 | gnorm 1.30 | tok/s 5,779 | VRAM 119GB (58%) | ETA 41.6h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73]
>> EVAL: val_loss=5.7113 ppl=302.3 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 600, full state + optimizer)
step 610/5000 | loss 5.7160 | ppl 303.7 | lr 3.41e-05 | gnorm 1.27 | tok/s 5,780 | VRAM 119GB (58%) | ETA 41.5h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 620/5000 | loss 5.6105 | ppl 273.3 | lr 3.09e-05 | gnorm 1.41 | tok/s 5,793 | VRAM 119GB (58%) | ETA 41.3h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 630/5000 | loss 5.5641 | ppl 260.9 | lr 2.77e-05 | gnorm 1.41 | tok/s 5,795 | VRAM 119GB (58%) | ETA 41.2h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 640/5000 | loss 5.5701 | ppl 262.5 | lr 2.47e-05 | gnorm 1.95 | tok/s 5,798 | VRAM 119GB (58%) | ETA 41.1h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 650/5000 | loss 5.6485 | ppl 283.9 | lr 2.20e-05 | gnorm 1.25 | tok/s 5,799 | VRAM 119GB (58%) | ETA 41.0h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 660/5000 | loss 5.7846 | ppl 325.3 | lr 1.96e-05 | gnorm 1.19 | tok/s 5,798 | VRAM 119GB (58%) | ETA 40.9h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 670/5000 | loss 5.8722 | ppl 355.0 | lr 1.76e-05 | gnorm 1.09 | tok/s 5,798 | VRAM 119GB (58%) | ETA 40.8h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 680/5000 | loss 5.7341 | ppl 309.2 | lr 1.62e-05 | gnorm 3.89 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.7h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 690/5000 | loss 5.8302 | ppl 340.4 | lr 1.53e-05 | gnorm 1.05 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.6h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 700/5000 | loss 5.7496 | ppl 314.1 | lr 1.56e-05 | gnorm 1.15 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.5h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.6660 ppl=288.9 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 700, full state + optimizer)
step 710/5000 | loss 5.7459 | ppl 312.9 | lr 2.19e-05 | gnorm 1.55 | tok/s 5,796 | VRAM 119GB (58%) | ETA 40.4h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 720/5000 | loss 5.7757 | ppl 322.4 | lr 2.81e-05 | gnorm 1.59 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 730/5000 | loss 5.7632 | ppl 318.4 | lr 3.44e-05 | gnorm 1.78 | tok/s 5,795 | VRAM 119GB (58%) | ETA 40.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 740/5000 | loss 5.7689 | ppl 320.2 | lr 4.00e-05 | gnorm 1.72 | tok/s 5,795 | VRAM 119GB (58%) | ETA 40.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 750/5000 | loss 5.6952 | ppl 297.4 | lr 4.00e-05 | gnorm 1.71 | tok/s 5,793 | VRAM 119GB (58%) | ETA 40.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 760/5000 | loss 5.6071 | ppl 272.4 | lr 3.98e-05 | gnorm 1.37 | tok/s 5,793 | VRAM 119GB (58%) | ETA 40.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 770/5000 | loss 5.7852 | ppl 325.4 | lr 3.96e-05 | gnorm 1.85 | tok/s 5,791 | VRAM 119GB (58%) | ETA 39.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 780/5000 | loss 5.7631 | ppl 318.3 | lr 3.92e-05 | gnorm 1.78 | tok/s 5,792 | VRAM 119GB (58%) | ETA 39.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 790/5000 | loss 5.6911 | ppl 296.2 | lr 3.88e-05 | gnorm 1.59 | tok/s 5,790 | VRAM 119GB (58%) | ETA 39.7h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 800/5000 | loss 5.8214 | ppl 337.4 | lr 3.83e-05 | gnorm 1.97 | tok/s 5,791 | VRAM 119GB (58%) | ETA 39.6h C1 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.6552 ppl=285.8 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 800, full state + optimizer)
step 810/5000 | loss 5.6561 | ppl 286.0 | lr 3.77e-05 | gnorm 1.14 | tok/s 5,792 | VRAM 119GB (58%) | ETA 39.5h C1 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 820/5000 | loss 5.8421 | ppl 344.5 | lr 3.71e-05 | gnorm 1.85 | tok/s 5,794 | VRAM 119GB (58%) | ETA 39.4h C1 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 830/5000 | loss 5.6260 | ppl 277.6 | lr 3.63e-05 | gnorm 1.80 | tok/s 5,793 | VRAM 119GB (58%) | ETA 39.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 840/5000 | loss 5.7293 | ppl 307.7 | lr 3.55e-05 | gnorm 1.38 | tok/s 5,793 | VRAM 119GB (58%) | ETA 39.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 850/5000 | loss 5.7658 | ppl 319.2 | lr 3.47e-05 | gnorm 1.41 | tok/s 5,793 | VRAM 119GB (58%) | ETA 39.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 860/5000 | loss 5.7830 | ppl 324.7 | lr 3.38e-05 | gnorm 1.22 | tok/s 5,795 | VRAM 119GB (58%) | ETA 39.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 870/5000 | loss 5.6170 | ppl 275.1 | lr 3.28e-05 | gnorm 1.48 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 880/5000 | loss 5.6471 | ppl 283.5 | lr 3.18e-05 | gnorm 1.59 | tok/s 5,796 | VRAM 119GB (58%) | ETA 38.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 890/5000 | loss 5.5396 | ppl 254.6 | lr 3.07e-05 | gnorm 1.61 | tok/s 5,796 | VRAM 119GB (58%) | ETA 38.7h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 900/5000 | loss 5.6467 | ppl 283.4 | lr 2.97e-05 | gnorm 1.34 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.6h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.5526 ppl=257.9 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 900, full state + optimizer)
step 910/5000 | loss 5.6857 | ppl 294.6 | lr 2.86e-05 | gnorm 1.75 | tok/s 5,794 | VRAM 119GB (58%) | ETA 38.6h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 920/5000 | loss 5.7536 | ppl 315.3 | lr 2.75e-05 | gnorm 1.62 | tok/s 5,794 | VRAM 119GB (58%) | ETA 38.5h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 930/5000 | loss 5.6720 | ppl 290.6 | lr 2.64e-05 | gnorm 1.50 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.4h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 940/5000 | loss 5.6297 | ppl 278.6 | lr 2.53e-05 | gnorm 1.27 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 950/5000 | loss 5.6664 | ppl 289.0 | lr 2.43e-05 | gnorm 1.41 | tok/s 5,797 | VRAM 119GB (58%) | ETA 38.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 960/5000 | loss 5.6227 | ppl 276.6 | lr 2.32e-05 | gnorm 1.35 | tok/s 5,797 | VRAM 119GB (58%) | ETA 38.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 970/5000 | loss 5.5974 | ppl 269.7 | lr 2.22e-05 | gnorm 1.43 | tok/s 5,797 | VRAM 119GB (58%) | ETA 38.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 980/5000 | loss 5.5043 | ppl 245.8 | lr 2.13e-05 | gnorm 1.21 | tok/s 5,797 | VRAM 119GB (58%) | ETA 37.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 990/5000 | loss 5.6915 | ppl 296.3 | lr 2.03e-05 | gnorm 1.30 | tok/s 5,798 | VRAM 119GB (58%) | ETA 37.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1000/5000 | loss 5.6183 | ppl 275.4 | lr 1.95e-05 | gnorm 1.14 | tok/s 5,799 | VRAM 119GB (58%) | ETA 37.7h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.5968 ppl=269.6 (best=5.5526)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1000, full state + optimizer)
>> MILESTONE step 1000 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_1000.pt
step 1010/5000 | loss 5.3524 | ppl 211.1 | lr 1.87e-05 | gnorm 1.30 | tok/s 5,652 | VRAM 119GB (58%) | ETA 38.6h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1020/5000 | loss 5.4961 | ppl 243.7 | lr 1.79e-05 | gnorm 1.25 | tok/s 5,507 | VRAM 119GB (58%) | ETA 39.5h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1030/5000 | loss 5.8695 | ppl 354.1 | lr 1.73e-05 | gnorm 1.33 | tok/s 5,368 | VRAM 119GB (58%) | ETA 40.4h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1040/5000 | loss 5.7147 | ppl 303.3 | lr 1.67e-05 | gnorm 1.63 | tok/s 5,236 | VRAM 119GB (58%) | ETA 41.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1050/5000 | loss 5.5944 | ppl 268.9 | lr 1.62e-05 | gnorm 1.27 | tok/s 5,111 | VRAM 119GB (58%) | ETA 42.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1060/5000 | loss 5.4514 | ppl 233.1 | lr 1.58e-05 | gnorm 1.82 | tok/s 5,106 | VRAM 119GB (58%) | ETA 42.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1070/5000 | loss 5.7486 | ppl 313.8 | lr 1.54e-05 | gnorm 3.20 | tok/s 5,107 | VRAM 119GB (58%) | ETA 42.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1080/5000 | loss 5.7669 | ppl 319.6 | lr 1.52e-05 | gnorm 1.33 | tok/s 5,105 | VRAM 119GB (58%) | ETA 41.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1090/5000 | loss 5.5144 | ppl 248.2 | lr 1.50e-05 | gnorm 1.87 | tok/s 5,105 | VRAM 119GB (58%) | ETA 41.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1100/5000 | loss 5.6030 | ppl 271.2 | lr 1.52e-05 | gnorm 1.29 | tok/s 5,105 | VRAM 119GB (58%) | ETA 41.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.5704 ppl=262.5 (best=5.5526)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1100, full state + optimizer)
step 1110/5000 | loss 5.4279 | ppl 227.7 | lr 1.78e-05 | gnorm 1.45 | tok/s 5,230 | VRAM 119GB (58%) | ETA 40.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1120/5000 | loss 5.7553 | ppl 315.9 | lr 2.03e-05 | gnorm 1.13 | tok/s 5,362 | VRAM 119GB (58%) | ETA 39.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1130/5000 | loss 5.6633 | ppl 288.1 | lr 2.28e-05 | gnorm 1.34 | tok/s 5,501 | VRAM 119GB (58%) | ETA 38.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1140/5000 | loss 5.4698 | ppl 237.4 | lr 2.53e-05 | gnorm 1.23 | tok/s 5,646 | VRAM 119GB (58%) | ETA 37.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1150/5000 | loss 5.6485 | ppl 283.9 | lr 2.78e-05 | gnorm 1.44 | tok/s 5,799 | VRAM 119GB (58%) | ETA 36.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1160/5000 | loss 5.5408 | ppl 254.9 | lr 3.00e-05 | gnorm 1.52 | tok/s 5,798 | VRAM 119GB (58%) | ETA 36.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1170/5000 | loss 5.5947 | ppl 269.0 | lr 3.00e-05 | gnorm 1.47 | tok/s 5,798 | VRAM 119GB (58%) | ETA 36.1h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1180/5000 | loss 5.5501 | ppl 257.3 | lr 3.00e-05 | gnorm 1.77 | tok/s 5,797 | VRAM 119GB (58%) | ETA 36.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1190/5000 | loss 5.7825 | ppl 324.6 | lr 2.99e-05 | gnorm 1.48 | tok/s 5,798 | VRAM 119GB (58%) | ETA 35.9h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1200/5000 | loss 5.4681 | ppl 237.0 | lr 2.99e-05 | gnorm 1.66 | tok/s 5,798 | VRAM 119GB (58%) | ETA 35.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.5044 ppl=245.8 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1200, full state + optimizer)
step 1210/5000 | loss 5.5756 | ppl 263.9 | lr 2.98e-05 | gnorm 1.53 | tok/s 5,769 | VRAM 119GB (58%) | ETA 35.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1220/5000 | loss 5.5656 | ppl 261.3 | lr 2.98e-05 | gnorm 1.57 | tok/s 5,617 | VRAM 119GB (58%) | ETA 36.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1230/5000 | loss 5.6694 | ppl 289.9 | lr 2.97e-05 | gnorm 1.30 | tok/s 5,474 | VRAM 119GB (58%) | ETA 37.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1240/5000 | loss 5.6058 | ppl 272.0 | lr 2.96e-05 | gnorm 1.31 | tok/s 5,337 | VRAM 119GB (58%) | ETA 38.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1250/5000 | loss 5.7426 | ppl 311.9 | lr 2.95e-05 | gnorm 1.55 | tok/s 5,207 | VRAM 119GB (58%) | ETA 39.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1260/5000 | loss 5.6019 | ppl 270.9 | lr 2.93e-05 | gnorm 1.35 | tok/s 5,105 | VRAM 119GB (58%) | ETA 40.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1270/5000 | loss 5.5776 | ppl 264.4 | lr 2.92e-05 | gnorm 1.38 | tok/s 5,105 | VRAM 119GB (58%) | ETA 39.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1280/5000 | loss 5.6610 | ppl 287.4 | lr 2.90e-05 | gnorm 1.45 | tok/s 5,105 | VRAM 119GB (58%) | ETA 39.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1290/5000 | loss 5.4238 | ppl 226.7 | lr 2.89e-05 | gnorm 1.77 | tok/s 5,106 | VRAM 119GB (58%) | ETA 39.7h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1300/5000 | loss 5.5268 | ppl 251.3 | lr 2.87e-05 | gnorm 1.48 | tok/s 5,107 | VRAM 119GB (58%) | ETA 39.6h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.5083 ppl=246.7 (best=5.5044)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1300, full state + optimizer)
step 1310/5000 | loss 5.5639 | ppl 260.8 | lr 2.85e-05 | gnorm 1.43 | tok/s 5,115 | VRAM 119GB (58%) | ETA 39.4h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1320/5000 | loss 5.5524 | ppl 257.9 | lr 2.83e-05 | gnorm 1.22 | tok/s 5,116 | VRAM 119GB (58%) | ETA 39.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1330/5000 | loss 5.5659 | ppl 261.4 | lr 2.81e-05 | gnorm 1.39 | tok/s 5,117 | VRAM 119GB (58%) | ETA 39.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1340/5000 | loss 5.4635 | ppl 235.9 | lr 2.79e-05 | gnorm 1.48 | tok/s 5,119 | VRAM 119GB (58%) | ETA 39.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1350/5000 | loss 5.4772 | ppl 239.2 | lr 2.77e-05 | gnorm 1.53 | tok/s 5,119 | VRAM 119GB (58%) | ETA 38.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1360/5000 | loss 5.5250 | ppl 250.9 | lr 2.75e-05 | gnorm 1.62 | tok/s 5,114 | VRAM 119GB (58%) | ETA 38.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1370/5000 | loss 5.4610 | ppl 235.3 | lr 2.72e-05 | gnorm 1.45 | tok/s 5,115 | VRAM 119GB (58%) | ETA 38.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1380/5000 | loss 5.5452 | ppl 256.0 | lr 2.70e-05 | gnorm 1.59 | tok/s 5,114 | VRAM 119GB (58%) | ETA 38.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1390/5000 | loss 5.3807 | ppl 217.2 | lr 2.67e-05 | gnorm 1.44 | tok/s 5,113 | VRAM 119GB (58%) | ETA 38.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1400/5000 | loss 5.6586 | ppl 286.8 | lr 2.64e-05 | gnorm 1.30 | tok/s 5,112 | VRAM 119GB (58%) | ETA 38.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3969 ppl=220.7 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1400, full state + optimizer)
step 1410/5000 | loss 5.6199 | ppl 275.9 | lr 2.62e-05 | gnorm 1.24 | tok/s 5,235 | VRAM 119GB (58%) | ETA 37.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1420/5000 | loss 5.4698 | ppl 237.4 | lr 2.59e-05 | gnorm 1.67 | tok/s 5,365 | VRAM 119GB (58%) | ETA 36.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1430/5000 | loss 5.6563 | ppl 286.1 | lr 2.56e-05 | gnorm 1.32 | tok/s 5,501 | VRAM 119GB (58%) | ETA 35.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1440/5000 | loss 5.6030 | ppl 271.2 | lr 2.53e-05 | gnorm 1.48 | tok/s 5,646 | VRAM 119GB (58%) | ETA 34.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1450/5000 | loss 5.5052 | ppl 246.0 | lr 2.50e-05 | gnorm 1.45 | tok/s 5,797 | VRAM 119GB (58%) | ETA 33.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1460/5000 | loss 5.1654 | ppl 175.1 | lr 2.47e-05 | gnorm 1.45 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1470/5000 | loss 5.6636 | ppl 288.2 | lr 2.44e-05 | gnorm 1.36 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1480/5000 | loss 5.7023 | ppl 299.6 | lr 2.41e-05 | gnorm 1.36 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1490/5000 | loss 5.5606 | ppl 260.0 | lr 2.38e-05 | gnorm 1.38 | tok/s 5,795 | VRAM 119GB (58%) | ETA 33.1h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1500/5000 | loss 5.5311 | ppl 252.4 | lr 2.35e-05 | gnorm 1.47 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4685 ppl=237.1 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1500, full state + optimizer)
step 1510/5000 | loss 5.5400 | ppl 254.7 | lr 2.31e-05 | gnorm 1.58 | tok/s 5,797 | VRAM 119GB (58%) | ETA 32.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1520/5000 | loss 5.5024 | ppl 245.3 | lr 2.28e-05 | gnorm 1.47 | tok/s 5,796 | VRAM 119GB (58%) | ETA 32.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1530/5000 | loss 5.6098 | ppl 273.1 | lr 2.25e-05 | gnorm 1.27 | tok/s 5,796 | VRAM 119GB (58%) | ETA 32.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1540/5000 | loss 5.4081 | ppl 223.2 | lr 2.22e-05 | gnorm 1.32 | tok/s 5,794 | VRAM 119GB (58%) | ETA 32.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1550/5000 | loss 5.5051 | ppl 245.9 | lr 2.19e-05 | gnorm 1.47 | tok/s 5,793 | VRAM 119GB (58%) | ETA 32.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1560/5000 | loss 5.5016 | ppl 245.1 | lr 2.15e-05 | gnorm 1.36 | tok/s 5,792 | VRAM 119GB (58%) | ETA 32.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1570/5000 | loss 5.5568 | ppl 259.0 | lr 2.12e-05 | gnorm 1.36 | tok/s 5,792 | VRAM 119GB (58%) | ETA 32.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1580/5000 | loss 5.4976 | ppl 244.1 | lr 2.09e-05 | gnorm 1.52 | tok/s 5,792 | VRAM 119GB (58%) | ETA 32.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1590/5000 | loss 5.6104 | ppl 273.2 | lr 2.06e-05 | gnorm 1.48 | tok/s 5,793 | VRAM 119GB (58%) | ETA 32.1h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1600/5000 | loss 5.4706 | ppl 237.6 | lr 2.03e-05 | gnorm 1.39 | tok/s 5,794 | VRAM 119GB (58%) | ETA 32.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4872 ppl=241.6 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1600, full state + optimizer)
step 1610/5000 | loss 5.3893 | ppl 219.0 | lr 2.00e-05 | gnorm 1.32 | tok/s 5,707 | VRAM 119GB (58%) | ETA 32.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1620/5000 | loss 5.5226 | ppl 250.3 | lr 1.97e-05 | gnorm 1.21 | tok/s 5,560 | VRAM 119GB (58%) | ETA 33.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1630/5000 | loss 5.4062 | ppl 222.8 | lr 1.94e-05 | gnorm 1.38 | tok/s 5,419 | VRAM 119GB (58%) | ETA 34.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1640/5000 | loss 5.6297 | ppl 278.6 | lr 1.91e-05 | gnorm 1.23 | tok/s 5,286 | VRAM 119GB (58%) | ETA 34.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1650/5000 | loss 5.5368 | ppl 253.9 | lr 1.88e-05 | gnorm 1.43 | tok/s 5,159 | VRAM 119GB (58%) | ETA 35.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1660/5000 | loss 5.4787 | ppl 239.5 | lr 1.86e-05 | gnorm 1.18 | tok/s 5,106 | VRAM 119GB (58%) | ETA 35.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1670/5000 | loss 5.5194 | ppl 249.5 | lr 1.83e-05 | gnorm 1.15 | tok/s 5,106 | VRAM 119GB (58%) | ETA 35.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1680/5000 | loss 5.6273 | ppl 277.9 | lr 1.80e-05 | gnorm 1.62 | tok/s 5,107 | VRAM 119GB (58%) | ETA 35.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1690/5000 | loss 5.3792 | ppl 216.8 | lr 1.78e-05 | gnorm 1.38 | tok/s 5,107 | VRAM 119GB (58%) | ETA 35.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1700/5000 | loss 5.4360 | ppl 229.5 | lr 1.75e-05 | gnorm 1.28 | tok/s 5,108 | VRAM 119GB (58%) | ETA 35.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4370 ppl=229.8 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1700, full state + optimizer)
step 1710/5000 | loss 5.5455 | ppl 256.1 | lr 1.73e-05 | gnorm 1.70 | tok/s 5,234 | VRAM 119GB (58%) | ETA 34.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1720/5000 | loss 5.4692 | ppl 237.3 | lr 1.71e-05 | gnorm 1.39 | tok/s 5,365 | VRAM 119GB (58%) | ETA 33.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1730/5000 | loss 5.3266 | ppl 205.7 | lr 1.69e-05 | gnorm 1.27 | tok/s 5,504 | VRAM 119GB (58%) | ETA 32.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1740/5000 | loss 5.5804 | ppl 265.2 | lr 1.67e-05 | gnorm 1.09 | tok/s 5,650 | VRAM 119GB (58%) | ETA 31.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1750/5000 | loss 5.4715 | ppl 237.8 | lr 1.65e-05 | gnorm 1.24 | tok/s 5,803 | VRAM 119GB (58%) | ETA 30.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1760/5000 | loss 5.4621 | ppl 235.6 | lr 1.63e-05 | gnorm 1.22 | tok/s 5,801 | VRAM 119GB (58%) | ETA 30.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1770/5000 | loss 5.4925 | ppl 242.9 | lr 1.61e-05 | gnorm 1.34 | tok/s 5,801 | VRAM 119GB (58%) | ETA 30.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1780/5000 | loss 5.4751 | ppl 238.7 | lr 1.60e-05 | gnorm 1.16 | tok/s 5,798 | VRAM 119GB (58%) | ETA 30.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1790/5000 | loss 5.4281 | ppl 227.7 | lr 1.58e-05 | gnorm 1.20 | tok/s 5,796 | VRAM 119GB (58%) | ETA 30.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1800/5000 | loss 5.4589 | ppl 234.8 | lr 1.57e-05 | gnorm 1.45 | tok/s 5,796 | VRAM 119GB (58%) | ETA 30.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4080 ppl=223.2 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1800, full state + optimizer)
step 1810/5000 | loss 5.4631 | ppl 235.8 | lr 1.55e-05 | gnorm 1.38 | tok/s 5,668 | VRAM 119GB (58%) | ETA 30.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1820/5000 | loss 5.4203 | ppl 225.9 | lr 1.54e-05 | gnorm 1.24 | tok/s 5,522 | VRAM 119GB (58%) | ETA 31.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1830/5000 | loss 5.3795 | ppl 216.9 | lr 1.53e-05 | gnorm 1.23 | tok/s 5,380 | VRAM 119GB (58%) | ETA 32.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1840/5000 | loss 5.5472 | ppl 256.5 | lr 1.52e-05 | gnorm 1.21 | tok/s 5,247 | VRAM 119GB (58%) | ETA 32.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1850/5000 | loss 5.3745 | ppl 215.8 | lr 1.52e-05 | gnorm 1.16 | tok/s 5,118 | VRAM 119GB (58%) | ETA 33.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1860/5000 | loss 5.6542 | ppl 285.5 | lr 1.51e-05 | gnorm 1.08 | tok/s 5,094 | VRAM 119GB (58%) | ETA 33.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1870/5000 | loss 5.6051 | ppl 271.8 | lr 1.51e-05 | gnorm 1.12 | tok/s 5,089 | VRAM 119GB (58%) | ETA 33.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1880/5000 | loss 5.5013 | ppl 245.0 | lr 1.50e-05 | gnorm 1.36 | tok/s 5,093 | VRAM 119GB (58%) | ETA 33.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1890/5000 | loss 5.3653 | ppl 213.9 | lr 1.50e-05 | gnorm 2.14 | tok/s 5,094 | VRAM 119GB (58%) | ETA 33.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1900/5000 | loss 5.4583 | ppl 234.7 | lr 1.51e-05 | gnorm 1.27 | tok/s 5,097 | VRAM 119GB (58%) | ETA 33.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4447 ppl=231.5 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1900, full state + optimizer)
step 1910/5000 | loss 5.5599 | ppl 259.8 | lr 1.64e-05 | gnorm 1.20 | tok/s 5,154 | VRAM 119GB (58%) | ETA 32.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1920/5000 | loss 5.7572 | ppl 316.5 | lr 1.76e-05 | gnorm 1.41 | tok/s 5,158 | VRAM 119GB (58%) | ETA 32.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1930/5000 | loss 5.5125 | ppl 247.8 | lr 1.89e-05 | gnorm 1.48 | tok/s 5,158 | VRAM 119GB (58%) | ETA 32.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1940/5000 | loss 5.6845 | ppl 294.3 | lr 2.01e-05 | gnorm 1.52 | tok/s 5,160 | VRAM 119GB (58%) | ETA 32.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1950/5000 | loss 5.4571 | ppl 234.4 | lr 2.14e-05 | gnorm 1.20 | tok/s 5,160 | VRAM 119GB (58%) | ETA 32.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1960/5000 | loss 5.4911 | ppl 242.5 | lr 2.26e-05 | gnorm 1.27 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1970/5000 | loss 5.6330 | ppl 279.5 | lr 2.39e-05 | gnorm 1.71 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1980/5000 | loss 5.4440 | ppl 231.4 | lr 2.50e-05 | gnorm 1.47 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 1990/5000 | loss 5.4148 | ppl 224.7 | lr 2.50e-05 | gnorm 1.99 | tok/s 5,105 | VRAM 119GB (58%) | ETA 32.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2000/5000 | loss 5.4218 | ppl 226.3 | lr 2.50e-05 | gnorm 1.95 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4088 ppl=223.4 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2000, full state + optimizer)
>> MILESTONE step 2000 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_2000.pt
step 2010/5000 | loss 5.4288 | ppl 227.9 | lr 2.50e-05 | gnorm 1.47 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2020/5000 | loss 5.4680 | ppl 237.0 | lr 2.50e-05 | gnorm 1.41 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2030/5000 | loss 5.4027 | ppl 222.0 | lr 2.50e-05 | gnorm 1.52 | tok/s 5,107 | VRAM 119GB (58%) | ETA 31.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2040/5000 | loss 5.5772 | ppl 264.3 | lr 2.50e-05 | gnorm 1.46 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2050/5000 | loss 5.4623 | ppl 235.6 | lr 2.49e-05 | gnorm 1.43 | tok/s 5,107 | VRAM 119GB (58%) | ETA 31.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2060/5000 | loss 5.6075 | ppl 272.5 | lr 2.49e-05 | gnorm 1.66 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2070/5000 | loss 5.4214 | ppl 226.2 | lr 2.49e-05 | gnorm 1.64 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2080/5000 | loss 5.3698 | ppl 214.8 | lr 2.49e-05 | gnorm 1.35 | tok/s 5,105 | VRAM 119GB (58%) | ETA 31.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2090/5000 | loss 5.3795 | ppl 216.9 | lr 2.49e-05 | gnorm 1.62 | tok/s 5,104 | VRAM 119GB (58%) | ETA 31.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2100/5000 | loss 5.4180 | ppl 225.4 | lr 2.48e-05 | gnorm 1.45 | tok/s 5,102 | VRAM 119GB (58%) | ETA 31.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3973 ppl=220.8 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2100, full state + optimizer)
step 2110/5000 | loss 5.4504 | ppl 232.9 | lr 2.48e-05 | gnorm 1.77 | tok/s 5,226 | VRAM 119GB (58%) | ETA 30.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2120/5000 | loss 5.3458 | ppl 209.7 | lr 2.48e-05 | gnorm 1.35 | tok/s 5,357 | VRAM 119GB (58%) | ETA 29.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2130/5000 | loss 5.2910 | ppl 198.5 | lr 2.48e-05 | gnorm 1.52 | tok/s 5,496 | VRAM 119GB (58%) | ETA 28.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2140/5000 | loss 5.4221 | ppl 226.4 | lr 2.47e-05 | gnorm 1.45 | tok/s 5,643 | VRAM 119GB (58%) | ETA 27.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2150/5000 | loss 5.5331 | ppl 252.9 | lr 2.47e-05 | gnorm 1.34 | tok/s 5,799 | VRAM 119GB (58%) | ETA 26.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2160/5000 | loss 5.4783 | ppl 239.4 | lr 2.47e-05 | gnorm 1.49 | tok/s 5,801 | VRAM 119GB (58%) | ETA 26.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2170/5000 | loss 5.4636 | ppl 235.9 | lr 2.46e-05 | gnorm 1.26 | tok/s 5,802 | VRAM 119GB (58%) | ETA 26.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2180/5000 | loss 5.5501 | ppl 257.3 | lr 2.46e-05 | gnorm 1.55 | tok/s 5,803 | VRAM 119GB (58%) | ETA 26.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2190/5000 | loss 5.4925 | ppl 242.9 | lr 2.45e-05 | gnorm 1.69 | tok/s 5,804 | VRAM 119GB (58%) | ETA 26.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2200/5000 | loss 5.4496 | ppl 232.7 | lr 2.45e-05 | gnorm 1.43 | tok/s 5,801 | VRAM 119GB (58%) | ETA 26.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4529 ppl=233.4 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2200, full state + optimizer)
step 2210/5000 | loss 5.5127 | ppl 247.8 | lr 2.44e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 26.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2220/5000 | loss 5.6177 | ppl 275.3 | lr 2.44e-05 | gnorm 1.77 | tok/s 5,799 | VRAM 119GB (58%) | ETA 26.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2230/5000 | loss 5.5439 | ppl 255.7 | lr 2.43e-05 | gnorm 1.44 | tok/s 5,796 | VRAM 119GB (58%) | ETA 26.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2240/5000 | loss 5.5208 | ppl 249.8 | lr 2.43e-05 | gnorm 1.66 | tok/s 5,794 | VRAM 119GB (58%) | ETA 26.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2250/5000 | loss 5.3475 | ppl 210.1 | lr 2.42e-05 | gnorm 1.31 | tok/s 5,795 | VRAM 119GB (58%) | ETA 25.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2260/5000 | loss 5.4192 | ppl 225.7 | lr 2.42e-05 | gnorm 1.48 | tok/s 5,793 | VRAM 119GB (58%) | ETA 25.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2270/5000 | loss 5.3100 | ppl 202.4 | lr 2.41e-05 | gnorm 1.54 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2280/5000 | loss 5.5559 | ppl 258.8 | lr 2.41e-05 | gnorm 1.25 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2290/5000 | loss 5.3483 | ppl 210.3 | lr 2.40e-05 | gnorm 1.50 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2300/5000 | loss 5.3261 | ppl 205.6 | lr 2.39e-05 | gnorm 1.72 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4253 ppl=227.1 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2300, full state + optimizer)
step 2310/5000 | loss 5.5541 | ppl 258.3 | lr 2.39e-05 | gnorm 1.62 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2320/5000 | loss 5.4427 | ppl 231.1 | lr 2.38e-05 | gnorm 1.51 | tok/s 5,795 | VRAM 119GB (58%) | ETA 25.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2330/5000 | loss 5.4080 | ppl 223.2 | lr 2.37e-05 | gnorm 1.51 | tok/s 5,795 | VRAM 119GB (58%) | ETA 25.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2340/5000 | loss 5.6911 | ppl 296.2 | lr 2.37e-05 | gnorm 1.88 | tok/s 5,786 | VRAM 119GB (58%) | ETA 25.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2350/5000 | loss 5.3312 | ppl 206.7 | lr 2.36e-05 | gnorm 1.43 | tok/s 5,779 | VRAM 119GB (58%) | ETA 25.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2360/5000 | loss 5.3899 | ppl 219.2 | lr 2.35e-05 | gnorm 1.48 | tok/s 5,772 | VRAM 119GB (58%) | ETA 25.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2370/5000 | loss 5.2940 | ppl 199.1 | lr 2.35e-05 | gnorm 1.30 | tok/s 5,765 | VRAM 119GB (58%) | ETA 24.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2380/5000 | loss 5.4886 | ppl 241.9 | lr 2.34e-05 | gnorm 1.41 | tok/s 5,760 | VRAM 119GB (58%) | ETA 24.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2390/5000 | loss 5.3869 | ppl 218.5 | lr 2.33e-05 | gnorm 1.41 | tok/s 5,759 | VRAM 119GB (58%) | ETA 24.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2400/5000 | loss 5.5158 | ppl 248.6 | lr 2.32e-05 | gnorm 1.42 | tok/s 5,767 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4569 ppl=234.4 (best=5.3969)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2400, full state + optimizer)
step 2410/5000 | loss 5.4799 | ppl 239.8 | lr 2.32e-05 | gnorm 1.37 | tok/s 5,772 | VRAM 119GB (58%) | ETA 24.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2420/5000 | loss 5.4761 | ppl 238.9 | lr 2.31e-05 | gnorm 1.62 | tok/s 5,778 | VRAM 119GB (58%) | ETA 24.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2430/5000 | loss 5.3421 | ppl 208.9 | lr 2.30e-05 | gnorm 1.17 | tok/s 5,781 | VRAM 119GB (58%) | ETA 24.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2440/5000 | loss 5.4317 | ppl 228.5 | lr 2.29e-05 | gnorm 1.51 | tok/s 5,790 | VRAM 119GB (58%) | ETA 24.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2450/5000 | loss 5.5262 | ppl 251.2 | lr 2.28e-05 | gnorm 1.63 | tok/s 5,792 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2460/5000 | loss 5.3786 | ppl 216.7 | lr 2.27e-05 | gnorm 1.29 | tok/s 5,769 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2470/5000 | loss 5.1914 | ppl 179.7 | lr 2.26e-05 | gnorm 1.53 | tok/s 5,619 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2480/5000 | loss 5.4186 | ppl 225.6 | lr 2.26e-05 | gnorm 2.28 | tok/s 5,479 | VRAM 119GB (58%) | ETA 25.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2490/5000 | loss 5.3507 | ppl 210.8 | lr 2.25e-05 | gnorm 1.38 | tok/s 5,343 | VRAM 119GB (58%) | ETA 25.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2500/5000 | loss 5.5629 | ppl 260.6 | lr 2.24e-05 | gnorm 1.82 | tok/s 5,212 | VRAM 119GB (58%) | ETA 26.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3151 ppl=203.4 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2500, full state + optimizer)
step 2510/5000 | loss 5.4881 | ppl 241.8 | lr 2.23e-05 | gnorm 1.70 | tok/s 5,233 | VRAM 119GB (58%) | ETA 26.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2520/5000 | loss 5.5318 | ppl 252.6 | lr 2.22e-05 | gnorm 1.57 | tok/s 5,365 | VRAM 119GB (58%) | ETA 25.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2530/5000 | loss 5.2904 | ppl 198.4 | lr 2.21e-05 | gnorm 1.32 | tok/s 5,503 | VRAM 119GB (58%) | ETA 24.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2540/5000 | loss 5.3733 | ppl 215.6 | lr 2.20e-05 | gnorm 1.50 | tok/s 5,649 | VRAM 119GB (58%) | ETA 23.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2550/5000 | loss 5.2482 | ppl 190.2 | lr 2.19e-05 | gnorm 1.33 | tok/s 5,802 | VRAM 119GB (58%) | ETA 23.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2560/5000 | loss 5.3167 | ppl 203.7 | lr 2.18e-05 | gnorm 1.31 | tok/s 5,801 | VRAM 119GB (58%) | ETA 23.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2570/5000 | loss 5.4672 | ppl 236.8 | lr 2.17e-05 | gnorm 1.70 | tok/s 5,801 | VRAM 119GB (58%) | ETA 22.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2580/5000 | loss 5.2871 | ppl 197.8 | lr 2.16e-05 | gnorm 1.20 | tok/s 5,800 | VRAM 119GB (58%) | ETA 22.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2590/5000 | loss 5.5072 | ppl 246.5 | lr 2.15e-05 | gnorm 1.24 | tok/s 5,800 | VRAM 119GB (58%) | ETA 22.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2600/5000 | loss 5.4751 | ppl 238.7 | lr 2.14e-05 | gnorm 1.52 | tok/s 5,800 | VRAM 119GB (58%) | ETA 22.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2780 ppl=196.0 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2600, full state + optimizer)
step 2610/5000 | loss 5.4770 | ppl 239.1 | lr 2.13e-05 | gnorm 1.53 | tok/s 5,661 | VRAM 119GB (58%) | ETA 23.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2620/5000 | loss 5.5508 | ppl 257.5 | lr 2.12e-05 | gnorm 1.20 | tok/s 5,514 | VRAM 119GB (58%) | ETA 23.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2630/5000 | loss 5.5070 | ppl 246.4 | lr 2.11e-05 | gnorm 1.48 | tok/s 5,374 | VRAM 119GB (58%) | ETA 24.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2640/5000 | loss 5.3541 | ppl 211.5 | lr 2.10e-05 | gnorm 1.52 | tok/s 5,240 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 2650/5000 | loss 5.2494 | ppl 190.4 | lr 2.09e-05 | gnorm 1.45 | tok/s 5,114 | VRAM 119GB (58%) | ETA 25.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2660/5000 | loss 5.4428 | ppl 231.1 | lr 2.08e-05 | gnorm 1.55 | tok/s 5,102 | VRAM 119GB (58%) | ETA 25.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2670/5000 | loss 5.3762 | ppl 216.2 | lr 2.07e-05 | gnorm 1.45 | tok/s 5,102 | VRAM 119GB (58%) | ETA 24.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2680/5000 | loss 5.4470 | ppl 232.1 | lr 2.06e-05 | gnorm 2.48 | tok/s 5,102 | VRAM 119GB (58%) | ETA 24.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2690/5000 | loss 5.5052 | ppl 246.0 | lr 2.05e-05 | gnorm 1.78 | tok/s 5,101 | VRAM 119GB (58%) | ETA 24.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2700/5000 | loss 5.3143 | ppl 203.2 | lr 2.04e-05 | gnorm 1.28 | tok/s 5,101 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3290 ppl=206.2 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2700, full state + optimizer)
step 2710/5000 | loss 5.4231 | ppl 226.6 | lr 2.03e-05 | gnorm 1.38 | tok/s 5,143 | VRAM 119GB (58%) | ETA 24.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2720/5000 | loss 5.5591 | ppl 259.6 | lr 2.02e-05 | gnorm 1.53 | tok/s 5,144 | VRAM 119GB (58%) | ETA 24.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2730/5000 | loss 5.4649 | ppl 236.2 | lr 2.01e-05 | gnorm 1.17 | tok/s 5,145 | VRAM 119GB (58%) | ETA 24.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2740/5000 | loss 5.3355 | ppl 207.6 | lr 2.00e-05 | gnorm 1.28 | tok/s 5,147 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2750/5000 | loss 5.2728 | ppl 195.0 | lr 1.99e-05 | gnorm 1.85 | tok/s 5,148 | VRAM 119GB (58%) | ETA 23.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2760/5000 | loss 5.4497 | ppl 232.7 | lr 1.98e-05 | gnorm 1.47 | tok/s 5,105 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2770/5000 | loss 5.4827 | ppl 240.5 | lr 1.97e-05 | gnorm 1.34 | tok/s 5,104 | VRAM 119GB (58%) | ETA 23.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2780/5000 | loss 5.4144 | ppl 224.6 | lr 1.96e-05 | gnorm 1.55 | tok/s 5,104 | VRAM 119GB (58%) | ETA 23.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2790/5000 | loss 5.5006 | ppl 244.8 | lr 1.95e-05 | gnorm 1.24 | tok/s 5,103 | VRAM 119GB (58%) | ETA 23.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2800/5000 | loss 5.4444 | ppl 231.5 | lr 1.94e-05 | gnorm 1.48 | tok/s 5,100 | VRAM 119GB (58%) | ETA 23.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3721 ppl=215.3 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2800, full state + optimizer)
step 2810/5000 | loss 5.3741 | ppl 215.7 | lr 1.93e-05 | gnorm 1.17 | tok/s 5,224 | VRAM 119GB (58%) | ETA 22.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2820/5000 | loss 5.6197 | ppl 275.8 | lr 1.92e-05 | gnorm 2.17 | tok/s 5,355 | VRAM 119GB (58%) | ETA 22.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2830/5000 | loss 5.3919 | ppl 219.6 | lr 1.91e-05 | gnorm 1.27 | tok/s 5,493 | VRAM 119GB (58%) | ETA 21.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2840/5000 | loss 5.3163 | ppl 203.6 | lr 1.90e-05 | gnorm 1.41 | tok/s 5,638 | VRAM 119GB (58%) | ETA 20.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2850/5000 | loss 5.4198 | ppl 225.8 | lr 1.89e-05 | gnorm 1.44 | tok/s 5,794 | VRAM 119GB (58%) | ETA 20.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2860/5000 | loss 5.2469 | ppl 190.0 | lr 1.88e-05 | gnorm 1.37 | tok/s 5,796 | VRAM 119GB (58%) | ETA 20.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2870/5000 | loss 5.5315 | ppl 252.5 | lr 1.87e-05 | gnorm 1.55 | tok/s 5,797 | VRAM 119GB (58%) | ETA 20.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2880/5000 | loss 5.4689 | ppl 237.2 | lr 1.86e-05 | gnorm 1.25 | tok/s 5,798 | VRAM 119GB (58%) | ETA 20.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2890/5000 | loss 5.2487 | ppl 190.3 | lr 1.85e-05 | gnorm 1.24 | tok/s 5,797 | VRAM 119GB (58%) | ETA 19.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2900/5000 | loss 5.3440 | ppl 209.3 | lr 1.84e-05 | gnorm 1.41 | tok/s 5,797 | VRAM 119GB (58%) | ETA 19.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3601 ppl=212.7 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2900, full state + optimizer)
step 2910/5000 | loss 5.2046 | ppl 182.1 | lr 1.83e-05 | gnorm 1.49 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2920/5000 | loss 5.1114 | ppl 165.9 | lr 1.82e-05 | gnorm 1.38 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2930/5000 | loss 5.3641 | ppl 213.6 | lr 1.81e-05 | gnorm 1.46 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2940/5000 | loss 5.5069 | ppl 246.4 | lr 1.80e-05 | gnorm 1.45 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2950/5000 | loss 5.2951 | ppl 199.4 | lr 1.79e-05 | gnorm 1.23 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2960/5000 | loss 5.3876 | ppl 218.7 | lr 1.78e-05 | gnorm 1.40 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2970/5000 | loss 5.4361 | ppl 229.5 | lr 1.77e-05 | gnorm 1.27 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2980/5000 | loss 5.3690 | ppl 214.7 | lr 1.76e-05 | gnorm 1.24 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 2990/5000 | loss 5.3255 | ppl 205.5 | lr 1.75e-05 | gnorm 1.56 | tok/s 5,799 | VRAM 119GB (58%) | ETA 18.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3000/5000 | loss 5.3644 | ppl 213.7 | lr 1.74e-05 | gnorm 2.14 | tok/s 5,798 | VRAM 119GB (58%) | ETA 18.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2933 ppl=199.0 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3000, full state + optimizer)
>> MILESTONE step 3000 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_3000.pt
step 3010/5000 | loss 5.3409 | ppl 208.7 | lr 1.74e-05 | gnorm 1.38 | tok/s 5,799 | VRAM 119GB (58%) | ETA 18.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3020/5000 | loss 5.3559 | ppl 211.8 | lr 1.73e-05 | gnorm 1.37 | tok/s 5,799 | VRAM 119GB (58%) | ETA 18.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3030/5000 | loss 5.2910 | ppl 198.5 | lr 1.72e-05 | gnorm 1.23 | tok/s 5,801 | VRAM 119GB (58%) | ETA 18.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3040/5000 | loss 5.4000 | ppl 221.4 | lr 1.71e-05 | gnorm 1.20 | tok/s 5,802 | VRAM 119GB (58%) | ETA 18.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3050/5000 | loss 5.3424 | ppl 209.0 | lr 1.70e-05 | gnorm 1.34 | tok/s 5,803 | VRAM 119GB (58%) | ETA 18.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3060/5000 | loss 5.2612 | ppl 192.7 | lr 1.69e-05 | gnorm 1.39 | tok/s 5,803 | VRAM 119GB (58%) | ETA 18.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3070/5000 | loss 5.4217 | ppl 226.3 | lr 1.68e-05 | gnorm 1.41 | tok/s 5,803 | VRAM 119GB (58%) | ETA 18.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3080/5000 | loss 5.3010 | ppl 200.5 | lr 1.68e-05 | gnorm 1.29 | tok/s 5,802 | VRAM 119GB (58%) | ETA 18.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3090/5000 | loss 5.3627 | ppl 213.3 | lr 1.67e-05 | gnorm 1.23 | tok/s 5,802 | VRAM 119GB (58%) | ETA 18.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3100/5000 | loss 5.3135 | ppl 203.1 | lr 1.66e-05 | gnorm 1.12 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2799 ppl=196.4 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3100, full state + optimizer)
step 3110/5000 | loss 5.4153 | ppl 224.8 | lr 1.65e-05 | gnorm 1.43 | tok/s 5,803 | VRAM 119GB (58%) | ETA 17.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3120/5000 | loss 5.3672 | ppl 214.3 | lr 1.65e-05 | gnorm 1.20 | tok/s 5,803 | VRAM 119GB (58%) | ETA 17.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3130/5000 | loss 5.4669 | ppl 236.7 | lr 1.64e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3140/5000 | loss 5.4039 | ppl 222.3 | lr 1.63e-05 | gnorm 1.24 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3150/5000 | loss 5.2486 | ppl 190.3 | lr 1.63e-05 | gnorm 1.16 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3160/5000 | loss 5.3892 | ppl 219.0 | lr 1.62e-05 | gnorm 1.38 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3170/5000 | loss 5.4131 | ppl 224.3 | lr 1.61e-05 | gnorm 1.31 | tok/s 5,801 | VRAM 119GB (58%) | ETA 17.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3180/5000 | loss 5.3139 | ppl 203.1 | lr 1.61e-05 | gnorm 1.34 | tok/s 5,804 | VRAM 119GB (58%) | ETA 17.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3190/5000 | loss 5.3839 | ppl 217.9 | lr 1.60e-05 | gnorm 1.19 | tok/s 5,804 | VRAM 119GB (58%) | ETA 17.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3200/5000 | loss 5.4903 | ppl 242.3 | lr 1.59e-05 | gnorm 1.38 | tok/s 5,804 | VRAM 119GB (58%) | ETA 16.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.4004 ppl=221.5 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3200, full state + optimizer)
step 3210/5000 | loss 5.2450 | ppl 189.6 | lr 1.59e-05 | gnorm 1.24 | tok/s 5,804 | VRAM 119GB (58%) | ETA 16.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3220/5000 | loss 5.2446 | ppl 189.5 | lr 1.58e-05 | gnorm 1.69 | tok/s 5,803 | VRAM 119GB (58%) | ETA 16.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3230/5000 | loss 5.4406 | ppl 230.6 | lr 1.58e-05 | gnorm 1.26 | tok/s 5,801 | VRAM 119GB (58%) | ETA 16.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3240/5000 | loss 5.2381 | ppl 188.3 | lr 1.57e-05 | gnorm 1.20 | tok/s 5,800 | VRAM 119GB (58%) | ETA 16.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3250/5000 | loss 5.4181 | ppl 225.4 | lr 1.57e-05 | gnorm 1.26 | tok/s 5,798 | VRAM 119GB (58%) | ETA 16.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3260/5000 | loss 5.4927 | ppl 242.9 | lr 1.56e-05 | gnorm 1.42 | tok/s 5,797 | VRAM 119GB (58%) | ETA 16.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3270/5000 | loss 5.3934 | ppl 220.0 | lr 1.56e-05 | gnorm 1.38 | tok/s 5,797 | VRAM 119GB (58%) | ETA 16.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3280/5000 | loss 5.5290 | ppl 251.9 | lr 1.55e-05 | gnorm 1.24 | tok/s 5,798 | VRAM 119GB (58%) | ETA 16.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3290/5000 | loss 5.5065 | ppl 246.3 | lr 1.55e-05 | gnorm 1.44 | tok/s 5,798 | VRAM 119GB (58%) | ETA 16.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3300/5000 | loss 5.4718 | ppl 237.9 | lr 1.54e-05 | gnorm 1.27 | tok/s 5,799 | VRAM 119GB (58%) | ETA 16.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3304 ppl=206.5 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3300, full state + optimizer)
step 3310/5000 | loss 5.5244 | ppl 250.7 | lr 1.54e-05 | gnorm 1.25 | tok/s 5,649 | VRAM 119GB (58%) | ETA 16.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3320/5000 | loss 5.3519 | ppl 211.0 | lr 1.53e-05 | gnorm 1.36 | tok/s 5,504 | VRAM 119GB (58%) | ETA 16.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3330/5000 | loss 5.3470 | ppl 210.0 | lr 1.53e-05 | gnorm 1.41 | tok/s 5,367 | VRAM 119GB (58%) | ETA 17.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3340/5000 | loss 5.4374 | ppl 229.8 | lr 1.53e-05 | gnorm 1.27 | tok/s 5,235 | VRAM 119GB (58%) | ETA 17.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3350/5000 | loss 5.2242 | ppl 185.7 | lr 1.52e-05 | gnorm 1.44 | tok/s 5,109 | VRAM 119GB (58%) | ETA 17.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3360/5000 | loss 5.2989 | ppl 200.1 | lr 1.52e-05 | gnorm 1.40 | tok/s 5,107 | VRAM 119GB (58%) | ETA 17.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3370/5000 | loss 5.4252 | ppl 227.0 | lr 1.52e-05 | gnorm 1.23 | tok/s 5,106 | VRAM 119GB (58%) | ETA 17.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3380/5000 | loss 5.3276 | ppl 205.9 | lr 1.52e-05 | gnorm 1.49 | tok/s 5,106 | VRAM 119GB (58%) | ETA 17.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3390/5000 | loss 5.3532 | ppl 211.3 | lr 1.51e-05 | gnorm 1.36 | tok/s 5,107 | VRAM 119GB (58%) | ETA 17.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3400/5000 | loss 5.3068 | ppl 201.7 | lr 1.51e-05 | gnorm 1.52 | tok/s 5,107 | VRAM 119GB (58%) | ETA 17.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3810 ppl=217.2 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3400, full state + optimizer)
step 3410/5000 | loss 5.3361 | ppl 207.7 | lr 1.51e-05 | gnorm 1.18 | tok/s 5,232 | VRAM 119GB (58%) | ETA 16.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3420/5000 | loss 5.3199 | ppl 204.4 | lr 1.51e-05 | gnorm 1.35 | tok/s 5,364 | VRAM 119GB (58%) | ETA 16.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3430/5000 | loss 5.3233 | ppl 205.1 | lr 1.51e-05 | gnorm 1.18 | tok/s 5,502 | VRAM 119GB (58%) | ETA 15.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3440/5000 | loss 5.3585 | ppl 212.4 | lr 1.50e-05 | gnorm 1.52 | tok/s 5,647 | VRAM 119GB (58%) | ETA 15.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3450/5000 | loss 5.3438 | ppl 209.3 | lr 1.50e-05 | gnorm 1.23 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3460/5000 | loss 5.3168 | ppl 203.7 | lr 1.50e-05 | gnorm 1.30 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3470/5000 | loss 5.2030 | ppl 181.8 | lr 1.50e-05 | gnorm 1.28 | tok/s 5,802 | VRAM 119GB (58%) | ETA 14.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3480/5000 | loss 5.3649 | ppl 213.8 | lr 1.50e-05 | gnorm 1.52 | tok/s 5,802 | VRAM 119GB (58%) | ETA 14.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3490/5000 | loss 5.3546 | ppl 211.6 | lr 1.50e-05 | gnorm 1.18 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3500/5000 | loss 5.3840 | ppl 217.9 | lr 1.50e-05 | gnorm 1.26 | tok/s 5,802 | VRAM 119GB (58%) | ETA 14.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3131 ppl=203.0 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3500, full state + optimizer)
step 3510/5000 | loss 5.4281 | ppl 227.7 | lr 1.56e-05 | gnorm 1.52 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3520/5000 | loss 5.3775 | ppl 216.5 | lr 1.61e-05 | gnorm 1.29 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3530/5000 | loss 5.4500 | ppl 232.8 | lr 1.66e-05 | gnorm 1.25 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3540/5000 | loss 5.2171 | ppl 184.4 | lr 1.71e-05 | gnorm 1.27 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3550/5000 | loss 5.3167 | ppl 203.7 | lr 1.75e-05 | gnorm 1.41 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3560/5000 | loss 5.4094 | ppl 223.5 | lr 1.81e-05 | gnorm 1.34 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3570/5000 | loss 5.2578 | ppl 192.1 | lr 1.86e-05 | gnorm 1.38 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3580/5000 | loss 5.3372 | ppl 207.9 | lr 1.91e-05 | gnorm 1.16 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3590/5000 | loss 5.2401 | ppl 188.7 | lr 1.96e-05 | gnorm 1.39 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3600/5000 | loss 5.4146 | ppl 224.7 | lr 2.00e-05 | gnorm 1.57 | tok/s 5,799 | VRAM 119GB (58%) | ETA 13.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3594 ppl=212.6 (best=5.2780)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3600, full state + optimizer)
step 3610/5000 | loss 5.3050 | ppl 201.3 | lr 2.00e-05 | gnorm 1.45 | tok/s 5,799 | VRAM 119GB (58%) | ETA 13.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3620/5000 | loss 5.5146 | ppl 248.3 | lr 2.00e-05 | gnorm 1.53 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3630/5000 | loss 5.4447 | ppl 231.5 | lr 2.00e-05 | gnorm 1.62 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3640/5000 | loss 5.3102 | ppl 202.4 | lr 2.00e-05 | gnorm 1.59 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3650/5000 | loss 5.5038 | ppl 245.6 | lr 2.00e-05 | gnorm 1.31 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3660/5000 | loss 5.4059 | ppl 222.7 | lr 2.00e-05 | gnorm 1.48 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3670/5000 | loss 5.3952 | ppl 220.3 | lr 2.00e-05 | gnorm 1.60 | tok/s 5,798 | VRAM 119GB (58%) | ETA 12.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3680/5000 | loss 5.3411 | ppl 208.7 | lr 2.00e-05 | gnorm 1.50 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3690/5000 | loss 5.4777 | ppl 239.3 | lr 1.99e-05 | gnorm 1.41 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3700/5000 | loss 5.4171 | ppl 225.2 | lr 1.99e-05 | gnorm 1.80 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.1359 ppl=170.0 ★ NEW BEST → saved (+ EMA + full optimizer)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3700, full state + optimizer)
step 3710/5000 | loss 5.3428 | ppl 209.1 | lr 1.99e-05 | gnorm 1.38 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3720/5000 | loss 5.3910 | ppl 219.4 | lr 1.99e-05 | gnorm 1.45 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3730/5000 | loss 5.4490 | ppl 232.5 | lr 1.99e-05 | gnorm 1.39 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3740/5000 | loss 5.4098 | ppl 223.6 | lr 1.99e-05 | gnorm 1.27 | tok/s 5,799 | VRAM 119GB (58%) | ETA 11.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3750/5000 | loss 5.3475 | ppl 210.1 | lr 1.99e-05 | gnorm 1.31 | tok/s 5,799 | VRAM 119GB (58%) | ETA 11.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3760/5000 | loss 5.3055 | ppl 201.4 | lr 1.98e-05 | gnorm 1.39 | tok/s 5,800 | VRAM 119GB (58%) | ETA 11.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3770/5000 | loss 5.4678 | ppl 236.9 | lr 1.98e-05 | gnorm 1.23 | tok/s 5,727 | VRAM 119GB (58%) | ETA 11.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3780/5000 | loss 5.3639 | ppl 213.6 | lr 1.98e-05 | gnorm 1.44 | tok/s 5,579 | VRAM 119GB (58%) | ETA 11.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3790/5000 | loss 5.3999 | ppl 221.4 | lr 1.98e-05 | gnorm 1.28 | tok/s 5,438 | VRAM 119GB (58%) | ETA 12.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3800/5000 | loss 5.4344 | ppl 229.2 | lr 1.98e-05 | gnorm 1.35 | tok/s 5,304 | VRAM 119GB (58%) | ETA 12.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3704 ppl=214.9 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3800, full state + optimizer)
step 3810/5000 | loss 5.4967 | ppl 243.9 | lr 1.97e-05 | gnorm 1.23 | tok/s 5,305 | VRAM 119GB (58%) | ETA 12.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3820/5000 | loss 5.2320 | ppl 187.2 | lr 1.97e-05 | gnorm 1.29 | tok/s 5,367 | VRAM 119GB (58%) | ETA 12.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3830/5000 | loss 5.1748 | ppl 176.8 | lr 1.97e-05 | gnorm 1.30 | tok/s 5,504 | VRAM 119GB (58%) | ETA 11.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3840/5000 | loss 5.2995 | ppl 200.2 | lr 1.96e-05 | gnorm 1.52 | tok/s 5,650 | VRAM 119GB (58%) | ETA 11.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3850/5000 | loss 5.4423 | ppl 231.0 | lr 1.96e-05 | gnorm 1.49 | tok/s 5,803 | VRAM 119GB (58%) | ETA 10.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3860/5000 | loss 5.3073 | ppl 201.8 | lr 1.96e-05 | gnorm 1.13 | tok/s 5,802 | VRAM 119GB (58%) | ETA 10.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3870/5000 | loss 5.4388 | ppl 230.2 | lr 1.96e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 10.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3880/5000 | loss 5.3153 | ppl 203.4 | lr 1.95e-05 | gnorm 1.48 | tok/s 5,802 | VRAM 119GB (58%) | ETA 10.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3890/5000 | loss 5.2094 | ppl 183.0 | lr 1.95e-05 | gnorm 1.41 | tok/s 5,801 | VRAM 119GB (58%) | ETA 10.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3900/5000 | loss 5.3008 | ppl 200.5 | lr 1.95e-05 | gnorm 1.34 | tok/s 5,801 | VRAM 119GB (58%) | ETA 10.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3238 ppl=205.2 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3900, full state + optimizer)
step 3910/5000 | loss 5.3193 | ppl 204.2 | lr 1.94e-05 | gnorm 1.52 | tok/s 5,665 | VRAM 119GB (58%) | ETA 10.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3920/5000 | loss 5.4208 | ppl 226.1 | lr 1.94e-05 | gnorm 1.45 | tok/s 5,518 | VRAM 119GB (58%) | ETA 10.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3930/5000 | loss 5.3002 | ppl 200.4 | lr 1.93e-05 | gnorm 1.27 | tok/s 5,378 | VRAM 119GB (58%) | ETA 10.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3940/5000 | loss 5.2733 | ppl 195.1 | lr 1.93e-05 | gnorm 1.37 | tok/s 5,246 | VRAM 119GB (58%) | ETA 11.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3950/5000 | loss 5.3122 | ppl 202.8 | lr 1.93e-05 | gnorm 1.45 | tok/s 5,120 | VRAM 119GB (58%) | ETA 11.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3960/5000 | loss 5.3368 | ppl 207.8 | lr 1.92e-05 | gnorm 1.55 | tok/s 5,106 | VRAM 119GB (58%) | ETA 11.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3970/5000 | loss 5.3380 | ppl 208.1 | lr 1.92e-05 | gnorm 1.30 | tok/s 5,106 | VRAM 119GB (58%) | ETA 11.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3980/5000 | loss 5.3551 | ppl 211.7 | lr 1.91e-05 | gnorm 1.45 | tok/s 5,106 | VRAM 119GB (58%) | ETA 10.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 3990/5000 | loss 5.3380 | ppl 208.1 | lr 1.91e-05 | gnorm 1.48 | tok/s 5,106 | VRAM 119GB (58%) | ETA 10.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4000/5000 | loss 5.2200 | ppl 184.9 | lr 1.91e-05 | gnorm 1.62 | tok/s 5,106 | VRAM 119GB (58%) | ETA 10.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2679 ppl=194.0 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4000, full state + optimizer)
>> MILESTONE step 4000 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_4000.pt
step 4010/5000 | loss 5.2624 | ppl 192.9 | lr 1.90e-05 | gnorm 1.23 | tok/s 5,231 | VRAM 119GB (58%) | ETA 10.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4020/5000 | loss 5.3521 | ppl 211.1 | lr 1.90e-05 | gnorm 1.54 | tok/s 5,363 | VRAM 119GB (58%) | ETA 10.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4030/5000 | loss 5.3664 | ppl 214.1 | lr 1.89e-05 | gnorm 1.34 | tok/s 5,501 | VRAM 119GB (58%) | ETA 9.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4040/5000 | loss 5.2996 | ppl 200.2 | lr 1.89e-05 | gnorm 1.35 | tok/s 5,647 | VRAM 119GB (58%) | ETA 9.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4050/5000 | loss 5.2745 | ppl 195.3 | lr 1.88e-05 | gnorm 1.32 | tok/s 5,801 | VRAM 119GB (58%) | ETA 8.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4060/5000 | loss 5.2770 | ppl 195.8 | lr 1.88e-05 | gnorm 1.58 | tok/s 5,802 | VRAM 119GB (58%) | ETA 8.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4070/5000 | loss 5.3921 | ppl 219.7 | lr 1.87e-05 | gnorm 1.35 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4080/5000 | loss 5.2833 | ppl 197.0 | lr 1.87e-05 | gnorm 1.59 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4090/5000 | loss 5.4707 | ppl 237.6 | lr 1.86e-05 | gnorm 1.24 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4100/5000 | loss 5.3238 | ppl 205.2 | lr 1.86e-05 | gnorm 1.59 | tok/s 5,804 | VRAM 119GB (58%) | ETA 8.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.3574 ppl=212.2 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4100, full state + optimizer)
step 4110/5000 | loss 5.3696 | ppl 214.8 | lr 1.85e-05 | gnorm 1.28 | tok/s 5,804 | VRAM 119GB (58%) | ETA 8.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4120/5000 | loss 5.2697 | ppl 194.3 | lr 1.85e-05 | gnorm 1.65 | tok/s 5,804 | VRAM 119GB (58%) | ETA 8.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4130/5000 | loss 5.2788 | ppl 196.1 | lr 1.84e-05 | gnorm 1.10 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4140/5000 | loss 5.3446 | ppl 209.5 | lr 1.84e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 8.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4150/5000 | loss 5.1764 | ppl 177.0 | lr 1.83e-05 | gnorm 1.35 | tok/s 5,802 | VRAM 119GB (58%) | ETA 8.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4160/5000 | loss 5.2608 | ppl 192.6 | lr 1.83e-05 | gnorm 1.51 | tok/s 5,801 | VRAM 119GB (58%) | ETA 7.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4170/5000 | loss 5.3678 | ppl 214.4 | lr 1.82e-05 | gnorm 1.24 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4180/5000 | loss 5.2964 | ppl 199.6 | lr 1.82e-05 | gnorm 1.54 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4190/5000 | loss 5.4885 | ppl 241.9 | lr 1.81e-05 | gnorm 1.15 | tok/s 5,800 | VRAM 119GB (58%) | ETA 7.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4200/5000 | loss 5.4148 | ppl 224.7 | lr 1.81e-05 | gnorm 1.41 | tok/s 5,798 | VRAM 119GB (58%) | ETA 7.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2861 ppl=197.6 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4200, full state + optimizer)
step 4210/5000 | loss 5.3041 | ppl 201.2 | lr 1.80e-05 | gnorm 1.33 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4220/5000 | loss 5.3657 | ppl 213.9 | lr 1.79e-05 | gnorm 1.62 | tok/s 5,798 | VRAM 119GB (58%) | ETA 7.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4230/5000 | loss 5.3724 | ppl 215.4 | lr 1.79e-05 | gnorm 1.45 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4240/5000 | loss 5.2941 | ppl 199.2 | lr 1.78e-05 | gnorm 1.55 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4250/5000 | loss 5.4146 | ppl 224.7 | lr 1.78e-05 | gnorm 1.48 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4260/5000 | loss 5.4285 | ppl 227.8 | lr 1.77e-05 | gnorm 1.37 | tok/s 5,800 | VRAM 119GB (58%) | ETA 7.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73]
step 4270/5000 | loss 5.2140 | ppl 183.8 | lr 1.77e-05 | gnorm 1.35 | tok/s 5,801 | VRAM 119GB (58%) | ETA 6.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4280/5000 | loss 5.1953 | ppl 180.4 | lr 1.76e-05 | gnorm 1.50 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4290/5000 | loss 5.2776 | ppl 195.9 | lr 1.76e-05 | gnorm 1.45 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4300/5000 | loss 5.3880 | ppl 218.8 | lr 1.75e-05 | gnorm 1.26 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2932 ppl=199.0 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4300, full state + optimizer)
step 4310/5000 | loss 5.2318 | ppl 187.1 | lr 1.74e-05 | gnorm 1.37 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4320/5000 | loss 5.1866 | ppl 178.9 | lr 1.74e-05 | gnorm 1.57 | tok/s 5,799 | VRAM 119GB (58%) | ETA 6.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4330/5000 | loss 5.2669 | ppl 193.8 | lr 1.73e-05 | gnorm 1.50 | tok/s 5,799 | VRAM 119GB (58%) | ETA 6.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4340/5000 | loss 5.2029 | ppl 181.8 | lr 1.73e-05 | gnorm 1.37 | tok/s 5,798 | VRAM 119GB (58%) | ETA 6.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4350/5000 | loss 5.2440 | ppl 189.4 | lr 1.72e-05 | gnorm 1.26 | tok/s 5,798 | VRAM 119GB (58%) | ETA 6.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4360/5000 | loss 5.4278 | ppl 227.7 | lr 1.72e-05 | gnorm 1.44 | tok/s 5,798 | VRAM 119GB (58%) | ETA 6.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4370/5000 | loss 5.3880 | ppl 218.8 | lr 1.71e-05 | gnorm 1.33 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4380/5000 | loss 5.3397 | ppl 208.5 | lr 1.71e-05 | gnorm 1.72 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4390/5000 | loss 5.3737 | ppl 215.7 | lr 1.70e-05 | gnorm 1.26 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4400/5000 | loss 5.3451 | ppl 209.6 | lr 1.69e-05 | gnorm 1.29 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2215 ppl=185.2 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4400, full state + optimizer)
step 4410/5000 | loss 5.2720 | ppl 194.8 | lr 1.69e-05 | gnorm 1.45 | tok/s 5,648 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4420/5000 | loss 5.3076 | ppl 201.9 | lr 1.68e-05 | gnorm 1.42 | tok/s 5,501 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4430/5000 | loss 5.3635 | ppl 213.5 | lr 1.68e-05 | gnorm 1.46 | tok/s 5,363 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4440/5000 | loss 5.4320 | ppl 228.6 | lr 1.67e-05 | gnorm 1.56 | tok/s 5,232 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4450/5000 | loss 5.3418 | ppl 208.9 | lr 1.67e-05 | gnorm 1.40 | tok/s 5,107 | VRAM 119GB (58%) | ETA 5.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4460/5000 | loss 5.3990 | ppl 221.2 | lr 1.66e-05 | gnorm 1.55 | tok/s 5,104 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4470/5000 | loss 5.2649 | ppl 193.4 | lr 1.66e-05 | gnorm 1.26 | tok/s 5,105 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4480/5000 | loss 5.3802 | ppl 217.1 | lr 1.65e-05 | gnorm 1.33 | tok/s 5,105 | VRAM 119GB (58%) | ETA 5.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4490/5000 | loss 5.1792 | ppl 177.5 | lr 1.65e-05 | gnorm 1.45 | tok/s 5,104 | VRAM 119GB (58%) | ETA 5.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4500/5000 | loss 5.4291 | ppl 227.9 | lr 1.64e-05 | gnorm 1.23 | tok/s 5,104 | VRAM 119GB (58%) | ETA 5.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2665 ppl=193.7 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4500, full state + optimizer)
step 4510/5000 | loss 5.2796 | ppl 196.3 | lr 1.64e-05 | gnorm 1.22 | tok/s 5,231 | VRAM 119GB (58%) | ETA 5.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4520/5000 | loss 5.2812 | ppl 196.6 | lr 1.63e-05 | gnorm 1.88 | tok/s 5,362 | VRAM 119GB (58%) | ETA 4.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4530/5000 | loss 5.2610 | ppl 192.7 | lr 1.63e-05 | gnorm 1.48 | tok/s 5,498 | VRAM 119GB (58%) | ETA 4.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4540/5000 | loss 5.3610 | ppl 212.9 | lr 1.62e-05 | gnorm 1.61 | tok/s 5,643 | VRAM 119GB (58%) | ETA 4.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4550/5000 | loss 5.2436 | ppl 189.3 | lr 1.62e-05 | gnorm 1.21 | tok/s 5,797 | VRAM 119GB (58%) | ETA 4.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4560/5000 | loss 5.3148 | ppl 203.3 | lr 1.61e-05 | gnorm 1.47 | tok/s 5,796 | VRAM 119GB (58%) | ETA 4.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4570/5000 | loss 5.3149 | ppl 203.3 | lr 1.61e-05 | gnorm 2.30 | tok/s 5,796 | VRAM 119GB (58%) | ETA 4.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4580/5000 | loss 5.3789 | ppl 216.8 | lr 1.60e-05 | gnorm 1.30 | tok/s 5,798 | VRAM 119GB (58%) | ETA 4.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4590/5000 | loss 5.3713 | ppl 215.1 | lr 1.60e-05 | gnorm 1.54 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4600/5000 | loss 5.3317 | ppl 206.8 | lr 1.59e-05 | gnorm 1.30 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2795 ppl=196.3 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4600, full state + optimizer)
step 4610/5000 | loss 5.2330 | ppl 187.4 | lr 1.59e-05 | gnorm 1.36 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4620/5000 | loss 5.2370 | ppl 188.1 | lr 1.59e-05 | gnorm 1.22 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4630/5000 | loss 5.3151 | ppl 203.4 | lr 1.58e-05 | gnorm 1.45 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4640/5000 | loss 5.2922 | ppl 198.8 | lr 1.58e-05 | gnorm 1.48 | tok/s 5,799 | VRAM 119GB (58%) | ETA 3.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4650/5000 | loss 5.3756 | ppl 216.1 | lr 1.57e-05 | gnorm 1.49 | tok/s 5,696 | VRAM 119GB (58%) | ETA 3.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4660/5000 | loss 5.2680 | ppl 194.0 | lr 1.57e-05 | gnorm 1.09 | tok/s 5,547 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4670/5000 | loss 5.4040 | ppl 222.3 | lr 1.57e-05 | gnorm 1.57 | tok/s 5,408 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4680/5000 | loss 5.3783 | ppl 216.6 | lr 1.56e-05 | gnorm 1.31 | tok/s 5,274 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4690/5000 | loss 5.1276 | ppl 168.6 | lr 1.56e-05 | gnorm 1.34 | tok/s 5,147 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4700/5000 | loss 5.3363 | ppl 207.7 | lr 1.55e-05 | gnorm 1.16 | tok/s 5,106 | VRAM 119GB (58%) | ETA 3.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2675 ppl=193.9 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4700, full state + optimizer)
step 4710/5000 | loss 5.2201 | ppl 185.0 | lr 1.55e-05 | gnorm 1.32 | tok/s 5,162 | VRAM 119GB (58%) | ETA 3.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4720/5000 | loss 5.3301 | ppl 206.5 | lr 1.55e-05 | gnorm 1.71 | tok/s 5,160 | VRAM 119GB (58%) | ETA 3.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4730/5000 | loss 5.3909 | ppl 219.4 | lr 1.54e-05 | gnorm 1.31 | tok/s 5,161 | VRAM 119GB (58%) | ETA 2.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4740/5000 | loss 5.2943 | ppl 199.2 | lr 1.54e-05 | gnorm 1.38 | tok/s 5,160 | VRAM 119GB (58%) | ETA 2.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4750/5000 | loss 5.3384 | ppl 208.2 | lr 1.54e-05 | gnorm 1.46 | tok/s 5,160 | VRAM 119GB (58%) | ETA 2.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4760/5000 | loss 5.2979 | ppl 199.9 | lr 1.54e-05 | gnorm 1.33 | tok/s 5,103 | VRAM 119GB (58%) | ETA 2.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4770/5000 | loss 5.3215 | ppl 204.7 | lr 1.53e-05 | gnorm 1.38 | tok/s 5,103 | VRAM 119GB (58%) | ETA 2.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4780/5000 | loss 5.3302 | ppl 206.5 | lr 1.53e-05 | gnorm 1.26 | tok/s 5,103 | VRAM 119GB (58%) | ETA 2.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4790/5000 | loss 5.3284 | ppl 206.1 | lr 1.53e-05 | gnorm 1.42 | tok/s 5,102 | VRAM 119GB (58%) | ETA 2.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4800/5000 | loss 5.4404 | ppl 230.5 | lr 1.52e-05 | gnorm 4.03 | tok/s 5,099 | VRAM 119GB (58%) | ETA 2.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2457 ppl=189.7 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4800, full state + optimizer)
step 4810/5000 | loss 5.3835 | ppl 217.8 | lr 1.52e-05 | gnorm 1.70 | tok/s 5,223 | VRAM 119GB (58%) | ETA 2.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4820/5000 | loss 5.2967 | ppl 199.7 | lr 1.52e-05 | gnorm 1.25 | tok/s 5,355 | VRAM 119GB (58%) | ETA 1.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4830/5000 | loss 5.3657 | ppl 213.9 | lr 1.52e-05 | gnorm 1.38 | tok/s 5,494 | VRAM 119GB (58%) | ETA 1.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4840/5000 | loss 5.3776 | ppl 216.5 | lr 1.52e-05 | gnorm 1.31 | tok/s 5,641 | VRAM 119GB (58%) | ETA 1.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4850/5000 | loss 5.2128 | ppl 183.6 | lr 1.51e-05 | gnorm 1.30 | tok/s 5,798 | VRAM 119GB (58%) | ETA 1.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4860/5000 | loss 5.3282 | ppl 206.1 | lr 1.51e-05 | gnorm 2.41 | tok/s 5,799 | VRAM 119GB (58%) | ETA 1.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4870/5000 | loss 5.4324 | ppl 228.7 | lr 1.51e-05 | gnorm 1.20 | tok/s 5,799 | VRAM 119GB (58%) | ETA 1.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4880/5000 | loss 5.3635 | ppl 213.5 | lr 1.51e-05 | gnorm 1.22 | tok/s 5,800 | VRAM 119GB (58%) | ETA 1.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4890/5000 | loss 5.3306 | ppl 206.6 | lr 1.51e-05 | gnorm 1.13 | tok/s 5,799 | VRAM 119GB (58%) | ETA 1.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4900/5000 | loss 5.3643 | ppl 213.6 | lr 1.51e-05 | gnorm 1.07 | tok/s 5,799 | VRAM 119GB (58%) | ETA 0.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
>> EVAL: val_loss=5.2650 ppl=193.4 (best=5.1359)
>> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4900, full state + optimizer)
step 4910/5000 | loss 5.2922 | ppl 198.8 | lr 1.51e-05 | gnorm 1.30 | tok/s 5,800 | VRAM 119GB (58%) | ETA 0.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4920/5000 | loss 5.4508 | ppl 232.9 | lr 1.50e-05 | gnorm 1.67 | tok/s 5,800 | VRAM 119GB (58%) | ETA 0.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4930/5000 | loss 5.3588 | ppl 212.5 | lr 1.50e-05 | gnorm 1.20 | tok/s 5,798 | VRAM 119GB (58%) | ETA 0.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4940/5000 | loss 5.3917 | ppl 219.6 | lr 1.50e-05 | gnorm 1.53 | tok/s 5,799 | VRAM 119GB (58%) | ETA 0.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4950/5000 | loss 5.2779 | ppl 196.0 | lr 1.50e-05 | gnorm 1.73 | tok/s 5,798 | VRAM 119GB (58%) | ETA 0.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4960/5000 | loss 5.3938 | ppl 220.0 | lr 1.50e-05 | gnorm 1.26 | tok/s 5,796 | VRAM 119GB (58%) | ETA 0.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4970/5000 | loss 5.3809 | ppl 217.2 | lr 1.50e-05 | gnorm 1.12 | tok/s 5,797 | VRAM 119GB (58%) | ETA 0.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4980/5000 | loss 5.3169 | ppl 203.8 | lr 1.50e-05 | gnorm 1.31 | tok/s 5,798 | VRAM 119GB (58%) | ETA 0.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
step 4990/5000 | loss 5.2999 | ppl 200.3 | lr 1.50e-05 | gnorm 1.34 | tok/s 5,797 | VRAM 119GB (58%) | ETA 0.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73]
======================================================================
REALIGNMENT v2 COMPLETE
======================================================================
Steps: 5000
Total tokens: 0.98B
Best val_loss: 5.1359
Total time: 50.3h
Final: /mnt/scratch/checkpoints/frankenstein_v2_final.pt
Best: /mnt/scratch/checkpoints/frankenstein_v2_best.pt
EMA best: /mnt/scratch/checkpoints/frankenstein_v2_ema_best.pt