| ====================================================================== |
| FRANKENSTEIN REALIGNMENT v2 — FRESH START |
| ====================================================================== |
| Raw merge: /mnt/scratch/checkpoints/sentinel_prime_frankenstein.pt |
| Steps: 5000 |
| Unfreeze at: step 500 |
| Batch: 8 × 6 = 48 |
| Seq len: 4096 |
| Phase 1 LR: 0.0001 → 3e-05 (warmup 100) |
| Phase 2: SGDR 5 cycles, expert_scale=0.3 |
| aux_loss: 0.05, z_loss: 0.002 (from step 0) |
| EMA: decay=0.9995, every 10 steps |
| Eff tokens/step: 196,608 |
|
|
| [1/5] Building model... |
| 14.40B parameters |
|
|
| [2/5] Loading raw merge: /mnt/scratch/checkpoints/sentinel_prime_frankenstein.pt |
| Merge loaded. |
| ✓ Router aux_loss_weight = 0.05 (all layers) |
| Merge meta: Sentinel Prime (Frankenstein Edition) |
| attention_norms: NousResearch/Hermes-3-Llama-3.1-8B |
| ffn_experts_0_2: Salesforce/xLAM-7b-fc-r |
| ffn_experts_1_3: deepseek-ai/deepseek-coder-6.7b-instruct |
| embeddings: SentinelBrain-14B-MoE-v0.1 (original) |
| router: SentinelBrain-14B-MoE-v0.1 (original) |
| VRAM after load: 28.8GB |
|
|
| Enabling gradient checkpointing... |
| Gradient checkpointing enabled for 24 layers |
|
|
| [3/5] Progressive unfreezing setup... |
| Froze 192 expert params. Trainable: 290/482 |
|
|
| [4/5] Loading training data from /mnt/scratch/shards |
| [train] 1710 shards, 16.48B tokens |
| [val] 160 shards, 0.86B tokens |
|
|
| [5/5] Setting up optimizer (Phase 1)... |
| Trainable: 5.75B / 14.40B |
| Optimizer: AdamW (decay: 241, no-decay: 49) |
|
|
| Initial evaluation... |
| Initial val_loss=15.8210, val_ppl=7429962.9 |
| ✓ EMA initialized (482 params on CPU) |
|
|
| ====================================================================== |
| STARTING TRAINING |
| ====================================================================== |
| Batch: 8 x 6 = 48 effective |
| Tokens/step: 196,608 |
| VRAM: 28.9/206GB (14%) |
| SGDR cycles: 5 |
| Cycle 0: steps 500-700 (T=200), peak=5.0e-05, ramp=30 |
| Cycle 1: steps 700-1100 (T=400), peak=4.0e-05, ramp=40 |
| Cycle 2: steps 1100-1900 (T=800), peak=3.0e-05, ramp=60 |
| Cycle 3: steps 1900-3500 (T=1600), peak=2.5e-05, ramp=80 |
| Cycle 4: steps 3500-5000 (T=1500), peak=2.0e-05, ramp=100 |
| step 0/5000 | loss 15.8368 | ppl 7548307.7 | lr 1.00e-06 | gnorm 12.75 | tok/s 4,544 | VRAM 67GB (32%) | ETA 60.1h [FROZEN] | [E0:26% E1:24% E2:25% E3:24%] CF=[1.03 0.98 1.02 0.97] |
| step 10/5000 | loss 14.6608 | ppl 2328555.1 | lr 1.10e-05 | gnorm 13.44 | tok/s 6,258 | VRAM 67GB (32%) | ETA 43.5h [FROZEN] | [E0:26% E1:25% E2:25% E3:24%] CF=[1.04 0.98 1.01 0.96] |
| step 20/5000 | loss 12.4423 | ppl 253288.5 | lr 2.10e-05 | gnorm 13.94 | tok/s 6,368 | VRAM 67GB (32%) | ETA 42.7h [FROZEN] | [E0:28% E1:24% E2:25% E3:23%] CF=[1.11 0.95 1.02 0.92] |
| step 30/5000 | loss 10.6901 | ppl 43919.2 | lr 3.10e-05 | gnorm 5.16 | tok/s 6,412 | VRAM 67GB (32%) | ETA 42.3h [FROZEN] | [E0:30% E1:21% E2:26% E3:23%] CF=[1.19 0.84 1.06 0.92] |
| step 40/5000 | loss 9.5284 | ppl 13745.1 | lr 4.10e-05 | gnorm 3.09 | tok/s 6,438 | VRAM 67GB (32%) | ETA 42.1h [FROZEN] | [E0:31% E1:19% E2:28% E3:21%] CF=[1.24 0.77 1.13 0.85] |
| step 50/5000 | loss 8.8601 | ppl 7045.2 | lr 5.10e-05 | gnorm 3.36 | tok/s 6,512 | VRAM 67GB (32%) | ETA 41.5h [FROZEN] | [E0:30% E1:20% E2:29% E3:21%] CF=[1.18 0.81 1.17 0.84] |
| step 60/5000 | loss 8.2434 | ppl 3802.4 | lr 6.10e-05 | gnorm 3.39 | tok/s 6,521 | VRAM 67GB (32%) | ETA 41.4h [FROZEN] | [E0:29% E1:20% E2:32% E3:18%] CF=[1.18 0.79 1.29 0.74] |
| step 70/5000 | loss 8.0612 | ppl 3169.1 | lr 7.10e-05 | gnorm 2.05 | tok/s 6,528 | VRAM 67GB (32%) | ETA 41.2h [FROZEN] | [E0:31% E1:18% E2:33% E3:18%] CF=[1.24 0.72 1.32 0.72] |
| step 80/5000 | loss 7.7648 | ppl 2356.2 | lr 8.10e-05 | gnorm 4.00 | tok/s 6,532 | VRAM 67GB (32%) | ETA 41.1h [FROZEN] | [E0:31% E1:18% E2:33% E3:18%] CF=[1.25 0.73 1.31 0.71] |
| step 90/5000 | loss 7.6206 | ppl 2039.9 | lr 9.10e-05 | gnorm 2.97 | tok/s 6,533 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:31% E1:18% E2:32% E3:19%] CF=[1.25 0.72 1.29 0.74] |
| step 100/5000 | loss 7.4868 | ppl 1784.4 | lr 1.00e-04 | gnorm 4.38 | tok/s 6,530 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:31% E1:18% E2:32% E3:19%] CF=[1.25 0.72 1.28 0.75] |
| >> EVAL: val_loss=7.3564 ppl=1566.2 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 100, full state + optimizer) |
| step 110/5000 | loss 7.3392 | ppl 1539.5 | lr 9.99e-05 | gnorm 2.39 | tok/s 6,524 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:31% E1:18% E2:32% E3:18%] CF=[1.25 0.73 1.28 0.74] |
| step 120/5000 | loss 7.2058 | ppl 1347.2 | lr 9.96e-05 | gnorm 3.05 | tok/s 6,517 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.26 0.73 1.27 0.73] |
| step 130/5000 | loss 7.1589 | ppl 1285.5 | lr 9.90e-05 | gnorm 3.62 | tok/s 6,507 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.26 0.73 1.27 0.74] |
| step 140/5000 | loss 6.9785 | ppl 1073.3 | lr 9.83e-05 | gnorm 3.08 | tok/s 6,483 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:19%] CF=[1.26 0.72 1.27 0.74] |
| step 150/5000 | loss 6.9154 | ppl 1007.7 | lr 9.73e-05 | gnorm 2.44 | tok/s 6,472 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:19%] CF=[1.27 0.72 1.27 0.74] |
| step 160/5000 | loss 6.7309 | ppl 837.9 | lr 9.62e-05 | gnorm 2.78 | tok/s 6,443 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.27 0.73] |
| step 170/5000 | loss 6.7939 | ppl 892.4 | lr 9.48e-05 | gnorm 3.16 | tok/s 6,437 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.27 0.74] |
| step 180/5000 | loss 6.7680 | ppl 869.6 | lr 9.33e-05 | gnorm 3.95 | tok/s 6,420 | VRAM 67GB (32%) | ETA 41.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.27 0.74] |
| step 190/5000 | loss 6.6892 | ppl 803.7 | lr 9.16e-05 | gnorm 1.92 | tok/s 6,430 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74] |
| step 200/5000 | loss 6.5718 | ppl 714.6 | lr 8.97e-05 | gnorm 2.44 | tok/s 6,405 | VRAM 67GB (32%) | ETA 40.9h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74] |
| >> EVAL: val_loss=6.4265 ppl=618.0 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 200, full state + optimizer) |
| step 210/5000 | loss 6.3597 | ppl 578.1 | lr 8.77e-05 | gnorm 1.84 | tok/s 6,423 | VRAM 67GB (32%) | ETA 40.7h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74] |
| step 220/5000 | loss 6.4999 | ppl 665.1 | lr 8.56e-05 | gnorm 3.22 | tok/s 6,417 | VRAM 67GB (32%) | ETA 40.7h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74] |
| step 230/5000 | loss 6.3102 | ppl 550.1 | lr 8.33e-05 | gnorm 3.12 | tok/s 6,414 | VRAM 67GB (32%) | ETA 40.6h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.74] |
| step 240/5000 | loss 6.2094 | ppl 497.4 | lr 8.09e-05 | gnorm 2.61 | tok/s 6,406 | VRAM 67GB (32%) | ETA 40.6h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.74] |
| step 250/5000 | loss 6.1780 | ppl 482.0 | lr 7.84e-05 | gnorm 2.05 | tok/s 6,428 | VRAM 67GB (32%) | ETA 40.4h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| >> MILESTONE step 250 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_250.pt |
| step 260/5000 | loss 6.3846 | ppl 592.7 | lr 7.58e-05 | gnorm 1.93 | tok/s 6,426 | VRAM 67GB (32%) | ETA 40.3h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 270/5000 | loss 6.2122 | ppl 498.8 | lr 7.32e-05 | gnorm 1.94 | tok/s 6,427 | VRAM 67GB (32%) | ETA 40.2h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 280/5000 | loss 6.1144 | ppl 452.3 | lr 7.05e-05 | gnorm 1.58 | tok/s 6,427 | VRAM 67GB (32%) | ETA 40.1h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.27 0.73 1.26 0.73] |
| step 290/5000 | loss 6.0846 | ppl 439.0 | lr 6.77e-05 | gnorm 1.45 | tok/s 6,428 | VRAM 67GB (32%) | ETA 40.0h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 300/5000 | loss 6.0901 | ppl 441.5 | lr 6.50e-05 | gnorm 1.68 | tok/s 6,420 | VRAM 67GB (32%) | ETA 40.0h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| >> EVAL: val_loss=6.1254 ppl=457.3 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 300, full state + optimizer) |
| step 310/5000 | loss 6.1131 | ppl 451.8 | lr 6.23e-05 | gnorm 1.55 | tok/s 6,421 | VRAM 67GB (32%) | ETA 39.9h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 320/5000 | loss 6.0769 | ppl 435.7 | lr 5.95e-05 | gnorm 1.51 | tok/s 6,420 | VRAM 67GB (32%) | ETA 39.8h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 330/5000 | loss 6.0861 | ppl 439.7 | lr 5.68e-05 | gnorm 1.85 | tok/s 6,437 | VRAM 67GB (32%) | ETA 39.6h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 340/5000 | loss 5.9268 | ppl 375.0 | lr 5.42e-05 | gnorm 1.74 | tok/s 6,449 | VRAM 67GB (32%) | ETA 39.5h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 350/5000 | loss 5.9748 | ppl 393.4 | lr 5.16e-05 | gnorm 1.25 | tok/s 6,462 | VRAM 67GB (32%) | ETA 39.3h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 360/5000 | loss 6.0018 | ppl 404.2 | lr 4.91e-05 | gnorm 1.42 | tok/s 6,468 | VRAM 67GB (32%) | ETA 39.2h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 370/5000 | loss 5.9796 | ppl 395.3 | lr 4.67e-05 | gnorm 1.62 | tok/s 6,473 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 380/5000 | loss 6.0423 | ppl 420.8 | lr 4.44e-05 | gnorm 1.37 | tok/s 6,450 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 390/5000 | loss 5.9497 | ppl 383.6 | lr 4.23e-05 | gnorm 1.46 | tok/s 6,444 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 400/5000 | loss 5.9011 | ppl 365.4 | lr 4.03e-05 | gnorm 1.16 | tok/s 6,431 | VRAM 67GB (32%) | ETA 39.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| >> EVAL: val_loss=5.8662 ppl=352.9 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 400, full state + optimizer) |
| step 410/5000 | loss 5.9157 | ppl 370.8 | lr 3.84e-05 | gnorm 1.04 | tok/s 6,428 | VRAM 67GB (32%) | ETA 39.0h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 420/5000 | loss 5.8687 | ppl 353.8 | lr 3.67e-05 | gnorm 1.44 | tok/s 6,425 | VRAM 67GB (32%) | ETA 38.9h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 430/5000 | loss 5.8683 | ppl 353.7 | lr 3.52e-05 | gnorm 1.38 | tok/s 6,449 | VRAM 67GB (32%) | ETA 38.7h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 440/5000 | loss 5.8722 | ppl 355.0 | lr 3.38e-05 | gnorm 1.61 | tok/s 6,443 | VRAM 67GB (32%) | ETA 38.7h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 450/5000 | loss 5.8595 | ppl 350.6 | lr 3.27e-05 | gnorm 1.23 | tok/s 6,453 | VRAM 67GB (32%) | ETA 38.5h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 460/5000 | loss 5.8211 | ppl 337.3 | lr 3.17e-05 | gnorm 1.14 | tok/s 6,455 | VRAM 67GB (32%) | ETA 38.4h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 470/5000 | loss 5.9151 | ppl 370.6 | lr 3.10e-05 | gnorm 1.26 | tok/s 6,460 | VRAM 67GB (32%) | ETA 38.3h [FROZEN] | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 480/5000 | loss 5.8819 | ppl 358.5 | lr 3.04e-05 | gnorm 1.43 | tok/s 6,462 | VRAM 67GB (32%) | ETA 38.2h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 490/5000 | loss 5.8794 | ppl 357.6 | lr 3.01e-05 | gnorm 1.21 | tok/s 6,471 | VRAM 67GB (32%) | ETA 38.1h [FROZEN] | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
|
|
| >>> Step 500: UNFREEZING EXPERTS <<< |
| Unfroze all 482 params. |
| ✓ Pre-unfreeze checkpoint LOCKED: /mnt/scratch/checkpoints/frankenstein_v2_pre_unfreeze.pt |
| ✓ Pre-unfreeze FULL checkpoint LOCKED: /mnt/scratch/checkpoints/frankenstein_v2_pre_unfreeze_full.pt |
| Expert: 8.66B @ lr=4.85e-06 |
| Base: 5.75B @ lr=1.62e-05 |
| Spike guard: 3.0× EMA (tightened) |
| step 500/5000 | loss 5.9927 | ppl 400.5 | lr 1.62e-05 | gnorm 1.45 | tok/s 6,205 | VRAM 119GB (58%) | ETA 39.6h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| >> EVAL: val_loss=5.8451 ppl=345.5 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 500, full state + optimizer) |
| >> MILESTONE step 500 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_500.pt |
| step 510/5000 | loss 5.7749 | ppl 322.1 | lr 2.78e-05 | gnorm 1.70 | tok/s 6,068 | VRAM 119GB (58%) | ETA 40.4h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| >> MILESTONE step 510 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_510.pt |
| step 520/5000 | loss 5.7268 | ppl 307.0 | lr 3.95e-05 | gnorm 1.78 | tok/s 5,934 | VRAM 119GB (58%) | ETA 41.2h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 530/5000 | loss 5.8261 | ppl 339.0 | lr 5.00e-05 | gnorm 2.03 | tok/s 5,809 | VRAM 119GB (58%) | ETA 42.0h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 540/5000 | loss 5.9697 | ppl 391.4 | lr 4.97e-05 | gnorm 1.78 | tok/s 5,690 | VRAM 119GB (58%) | ETA 42.8h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 550/5000 | loss 5.7818 | ppl 324.3 | lr 4.88e-05 | gnorm 1.50 | tok/s 5,790 | VRAM 119GB (58%) | ETA 42.0h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 560/5000 | loss 6.0418 | ppl 420.7 | lr 4.74e-05 | gnorm 1.41 | tok/s 5,791 | VRAM 119GB (58%) | ETA 41.9h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 570/5000 | loss 5.7851 | ppl 325.4 | lr 4.54e-05 | gnorm 1.52 | tok/s 5,781 | VRAM 119GB (58%) | ETA 41.8h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 580/5000 | loss 5.9342 | ppl 377.7 | lr 4.30e-05 | gnorm 1.90 | tok/s 5,782 | VRAM 119GB (58%) | ETA 41.8h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 590/5000 | loss 5.7303 | ppl 308.1 | lr 4.03e-05 | gnorm 1.63 | tok/s 5,779 | VRAM 119GB (58%) | ETA 41.7h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 600/5000 | loss 5.9043 | ppl 366.6 | lr 3.73e-05 | gnorm 1.30 | tok/s 5,779 | VRAM 119GB (58%) | ETA 41.6h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| >> EVAL: val_loss=5.7113 ppl=302.3 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 600, full state + optimizer) |
| step 610/5000 | loss 5.7160 | ppl 303.7 | lr 3.41e-05 | gnorm 1.27 | tok/s 5,780 | VRAM 119GB (58%) | ETA 41.5h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 620/5000 | loss 5.6105 | ppl 273.3 | lr 3.09e-05 | gnorm 1.41 | tok/s 5,793 | VRAM 119GB (58%) | ETA 41.3h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 630/5000 | loss 5.5641 | ppl 260.9 | lr 2.77e-05 | gnorm 1.41 | tok/s 5,795 | VRAM 119GB (58%) | ETA 41.2h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 640/5000 | loss 5.5701 | ppl 262.5 | lr 2.47e-05 | gnorm 1.95 | tok/s 5,798 | VRAM 119GB (58%) | ETA 41.1h C0 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 650/5000 | loss 5.6485 | ppl 283.9 | lr 2.20e-05 | gnorm 1.25 | tok/s 5,799 | VRAM 119GB (58%) | ETA 41.0h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 660/5000 | loss 5.7846 | ppl 325.3 | lr 1.96e-05 | gnorm 1.19 | tok/s 5,798 | VRAM 119GB (58%) | ETA 40.9h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 670/5000 | loss 5.8722 | ppl 355.0 | lr 1.76e-05 | gnorm 1.09 | tok/s 5,798 | VRAM 119GB (58%) | ETA 40.8h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 680/5000 | loss 5.7341 | ppl 309.2 | lr 1.62e-05 | gnorm 3.89 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.7h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 690/5000 | loss 5.8302 | ppl 340.4 | lr 1.53e-05 | gnorm 1.05 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.6h C0 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 700/5000 | loss 5.7496 | ppl 314.1 | lr 1.56e-05 | gnorm 1.15 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.5h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.6660 ppl=288.9 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 700, full state + optimizer) |
| step 710/5000 | loss 5.7459 | ppl 312.9 | lr 2.19e-05 | gnorm 1.55 | tok/s 5,796 | VRAM 119GB (58%) | ETA 40.4h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 720/5000 | loss 5.7757 | ppl 322.4 | lr 2.81e-05 | gnorm 1.59 | tok/s 5,797 | VRAM 119GB (58%) | ETA 40.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 730/5000 | loss 5.7632 | ppl 318.4 | lr 3.44e-05 | gnorm 1.78 | tok/s 5,795 | VRAM 119GB (58%) | ETA 40.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 740/5000 | loss 5.7689 | ppl 320.2 | lr 4.00e-05 | gnorm 1.72 | tok/s 5,795 | VRAM 119GB (58%) | ETA 40.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 750/5000 | loss 5.6952 | ppl 297.4 | lr 4.00e-05 | gnorm 1.71 | tok/s 5,793 | VRAM 119GB (58%) | ETA 40.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 760/5000 | loss 5.6071 | ppl 272.4 | lr 3.98e-05 | gnorm 1.37 | tok/s 5,793 | VRAM 119GB (58%) | ETA 40.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 770/5000 | loss 5.7852 | ppl 325.4 | lr 3.96e-05 | gnorm 1.85 | tok/s 5,791 | VRAM 119GB (58%) | ETA 39.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 780/5000 | loss 5.7631 | ppl 318.3 | lr 3.92e-05 | gnorm 1.78 | tok/s 5,792 | VRAM 119GB (58%) | ETA 39.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 790/5000 | loss 5.6911 | ppl 296.2 | lr 3.88e-05 | gnorm 1.59 | tok/s 5,790 | VRAM 119GB (58%) | ETA 39.7h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 800/5000 | loss 5.8214 | ppl 337.4 | lr 3.83e-05 | gnorm 1.97 | tok/s 5,791 | VRAM 119GB (58%) | ETA 39.6h C1 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.6552 ppl=285.8 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 800, full state + optimizer) |
| step 810/5000 | loss 5.6561 | ppl 286.0 | lr 3.77e-05 | gnorm 1.14 | tok/s 5,792 | VRAM 119GB (58%) | ETA 39.5h C1 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 820/5000 | loss 5.8421 | ppl 344.5 | lr 3.71e-05 | gnorm 1.85 | tok/s 5,794 | VRAM 119GB (58%) | ETA 39.4h C1 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 830/5000 | loss 5.6260 | ppl 277.6 | lr 3.63e-05 | gnorm 1.80 | tok/s 5,793 | VRAM 119GB (58%) | ETA 39.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 840/5000 | loss 5.7293 | ppl 307.7 | lr 3.55e-05 | gnorm 1.38 | tok/s 5,793 | VRAM 119GB (58%) | ETA 39.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 850/5000 | loss 5.7658 | ppl 319.2 | lr 3.47e-05 | gnorm 1.41 | tok/s 5,793 | VRAM 119GB (58%) | ETA 39.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 860/5000 | loss 5.7830 | ppl 324.7 | lr 3.38e-05 | gnorm 1.22 | tok/s 5,795 | VRAM 119GB (58%) | ETA 39.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 870/5000 | loss 5.6170 | ppl 275.1 | lr 3.28e-05 | gnorm 1.48 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 880/5000 | loss 5.6471 | ppl 283.5 | lr 3.18e-05 | gnorm 1.59 | tok/s 5,796 | VRAM 119GB (58%) | ETA 38.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 890/5000 | loss 5.5396 | ppl 254.6 | lr 3.07e-05 | gnorm 1.61 | tok/s 5,796 | VRAM 119GB (58%) | ETA 38.7h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 900/5000 | loss 5.6467 | ppl 283.4 | lr 2.97e-05 | gnorm 1.34 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.6h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.5526 ppl=257.9 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 900, full state + optimizer) |
| step 910/5000 | loss 5.6857 | ppl 294.6 | lr 2.86e-05 | gnorm 1.75 | tok/s 5,794 | VRAM 119GB (58%) | ETA 38.6h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 920/5000 | loss 5.7536 | ppl 315.3 | lr 2.75e-05 | gnorm 1.62 | tok/s 5,794 | VRAM 119GB (58%) | ETA 38.5h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 930/5000 | loss 5.6720 | ppl 290.6 | lr 2.64e-05 | gnorm 1.50 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.4h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 940/5000 | loss 5.6297 | ppl 278.6 | lr 2.53e-05 | gnorm 1.27 | tok/s 5,795 | VRAM 119GB (58%) | ETA 38.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 950/5000 | loss 5.6664 | ppl 289.0 | lr 2.43e-05 | gnorm 1.41 | tok/s 5,797 | VRAM 119GB (58%) | ETA 38.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 960/5000 | loss 5.6227 | ppl 276.6 | lr 2.32e-05 | gnorm 1.35 | tok/s 5,797 | VRAM 119GB (58%) | ETA 38.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 970/5000 | loss 5.5974 | ppl 269.7 | lr 2.22e-05 | gnorm 1.43 | tok/s 5,797 | VRAM 119GB (58%) | ETA 38.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 980/5000 | loss 5.5043 | ppl 245.8 | lr 2.13e-05 | gnorm 1.21 | tok/s 5,797 | VRAM 119GB (58%) | ETA 37.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 990/5000 | loss 5.6915 | ppl 296.3 | lr 2.03e-05 | gnorm 1.30 | tok/s 5,798 | VRAM 119GB (58%) | ETA 37.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1000/5000 | loss 5.6183 | ppl 275.4 | lr 1.95e-05 | gnorm 1.14 | tok/s 5,799 | VRAM 119GB (58%) | ETA 37.7h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.5968 ppl=269.6 (best=5.5526) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1000, full state + optimizer) |
| >> MILESTONE step 1000 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_1000.pt |
| step 1010/5000 | loss 5.3524 | ppl 211.1 | lr 1.87e-05 | gnorm 1.30 | tok/s 5,652 | VRAM 119GB (58%) | ETA 38.6h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1020/5000 | loss 5.4961 | ppl 243.7 | lr 1.79e-05 | gnorm 1.25 | tok/s 5,507 | VRAM 119GB (58%) | ETA 39.5h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1030/5000 | loss 5.8695 | ppl 354.1 | lr 1.73e-05 | gnorm 1.33 | tok/s 5,368 | VRAM 119GB (58%) | ETA 40.4h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1040/5000 | loss 5.7147 | ppl 303.3 | lr 1.67e-05 | gnorm 1.63 | tok/s 5,236 | VRAM 119GB (58%) | ETA 41.3h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1050/5000 | loss 5.5944 | ppl 268.9 | lr 1.62e-05 | gnorm 1.27 | tok/s 5,111 | VRAM 119GB (58%) | ETA 42.2h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1060/5000 | loss 5.4514 | ppl 233.1 | lr 1.58e-05 | gnorm 1.82 | tok/s 5,106 | VRAM 119GB (58%) | ETA 42.1h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1070/5000 | loss 5.7486 | ppl 313.8 | lr 1.54e-05 | gnorm 3.20 | tok/s 5,107 | VRAM 119GB (58%) | ETA 42.0h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1080/5000 | loss 5.7669 | ppl 319.6 | lr 1.52e-05 | gnorm 1.33 | tok/s 5,105 | VRAM 119GB (58%) | ETA 41.9h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1090/5000 | loss 5.5144 | ppl 248.2 | lr 1.50e-05 | gnorm 1.87 | tok/s 5,105 | VRAM 119GB (58%) | ETA 41.8h C1 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1100/5000 | loss 5.6030 | ppl 271.2 | lr 1.52e-05 | gnorm 1.29 | tok/s 5,105 | VRAM 119GB (58%) | ETA 41.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.5704 ppl=262.5 (best=5.5526) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1100, full state + optimizer) |
| step 1110/5000 | loss 5.4279 | ppl 227.7 | lr 1.78e-05 | gnorm 1.45 | tok/s 5,230 | VRAM 119GB (58%) | ETA 40.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1120/5000 | loss 5.7553 | ppl 315.9 | lr 2.03e-05 | gnorm 1.13 | tok/s 5,362 | VRAM 119GB (58%) | ETA 39.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1130/5000 | loss 5.6633 | ppl 288.1 | lr 2.28e-05 | gnorm 1.34 | tok/s 5,501 | VRAM 119GB (58%) | ETA 38.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1140/5000 | loss 5.4698 | ppl 237.4 | lr 2.53e-05 | gnorm 1.23 | tok/s 5,646 | VRAM 119GB (58%) | ETA 37.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1150/5000 | loss 5.6485 | ppl 283.9 | lr 2.78e-05 | gnorm 1.44 | tok/s 5,799 | VRAM 119GB (58%) | ETA 36.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1160/5000 | loss 5.5408 | ppl 254.9 | lr 3.00e-05 | gnorm 1.52 | tok/s 5,798 | VRAM 119GB (58%) | ETA 36.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1170/5000 | loss 5.5947 | ppl 269.0 | lr 3.00e-05 | gnorm 1.47 | tok/s 5,798 | VRAM 119GB (58%) | ETA 36.1h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1180/5000 | loss 5.5501 | ppl 257.3 | lr 3.00e-05 | gnorm 1.77 | tok/s 5,797 | VRAM 119GB (58%) | ETA 36.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1190/5000 | loss 5.7825 | ppl 324.6 | lr 2.99e-05 | gnorm 1.48 | tok/s 5,798 | VRAM 119GB (58%) | ETA 35.9h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1200/5000 | loss 5.4681 | ppl 237.0 | lr 2.99e-05 | gnorm 1.66 | tok/s 5,798 | VRAM 119GB (58%) | ETA 35.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.5044 ppl=245.8 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1200, full state + optimizer) |
| step 1210/5000 | loss 5.5756 | ppl 263.9 | lr 2.98e-05 | gnorm 1.53 | tok/s 5,769 | VRAM 119GB (58%) | ETA 35.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1220/5000 | loss 5.5656 | ppl 261.3 | lr 2.98e-05 | gnorm 1.57 | tok/s 5,617 | VRAM 119GB (58%) | ETA 36.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1230/5000 | loss 5.6694 | ppl 289.9 | lr 2.97e-05 | gnorm 1.30 | tok/s 5,474 | VRAM 119GB (58%) | ETA 37.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1240/5000 | loss 5.6058 | ppl 272.0 | lr 2.96e-05 | gnorm 1.31 | tok/s 5,337 | VRAM 119GB (58%) | ETA 38.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1250/5000 | loss 5.7426 | ppl 311.9 | lr 2.95e-05 | gnorm 1.55 | tok/s 5,207 | VRAM 119GB (58%) | ETA 39.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1260/5000 | loss 5.6019 | ppl 270.9 | lr 2.93e-05 | gnorm 1.35 | tok/s 5,105 | VRAM 119GB (58%) | ETA 40.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1270/5000 | loss 5.5776 | ppl 264.4 | lr 2.92e-05 | gnorm 1.38 | tok/s 5,105 | VRAM 119GB (58%) | ETA 39.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1280/5000 | loss 5.6610 | ppl 287.4 | lr 2.90e-05 | gnorm 1.45 | tok/s 5,105 | VRAM 119GB (58%) | ETA 39.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1290/5000 | loss 5.4238 | ppl 226.7 | lr 2.89e-05 | gnorm 1.77 | tok/s 5,106 | VRAM 119GB (58%) | ETA 39.7h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1300/5000 | loss 5.5268 | ppl 251.3 | lr 2.87e-05 | gnorm 1.48 | tok/s 5,107 | VRAM 119GB (58%) | ETA 39.6h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.5083 ppl=246.7 (best=5.5044) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1300, full state + optimizer) |
| step 1310/5000 | loss 5.5639 | ppl 260.8 | lr 2.85e-05 | gnorm 1.43 | tok/s 5,115 | VRAM 119GB (58%) | ETA 39.4h C2 | [E0:32% E1:18% E2:32% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1320/5000 | loss 5.5524 | ppl 257.9 | lr 2.83e-05 | gnorm 1.22 | tok/s 5,116 | VRAM 119GB (58%) | ETA 39.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1330/5000 | loss 5.5659 | ppl 261.4 | lr 2.81e-05 | gnorm 1.39 | tok/s 5,117 | VRAM 119GB (58%) | ETA 39.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1340/5000 | loss 5.4635 | ppl 235.9 | lr 2.79e-05 | gnorm 1.48 | tok/s 5,119 | VRAM 119GB (58%) | ETA 39.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1350/5000 | loss 5.4772 | ppl 239.2 | lr 2.77e-05 | gnorm 1.53 | tok/s 5,119 | VRAM 119GB (58%) | ETA 38.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1360/5000 | loss 5.5250 | ppl 250.9 | lr 2.75e-05 | gnorm 1.62 | tok/s 5,114 | VRAM 119GB (58%) | ETA 38.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1370/5000 | loss 5.4610 | ppl 235.3 | lr 2.72e-05 | gnorm 1.45 | tok/s 5,115 | VRAM 119GB (58%) | ETA 38.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1380/5000 | loss 5.5452 | ppl 256.0 | lr 2.70e-05 | gnorm 1.59 | tok/s 5,114 | VRAM 119GB (58%) | ETA 38.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1390/5000 | loss 5.3807 | ppl 217.2 | lr 2.67e-05 | gnorm 1.44 | tok/s 5,113 | VRAM 119GB (58%) | ETA 38.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1400/5000 | loss 5.6586 | ppl 286.8 | lr 2.64e-05 | gnorm 1.30 | tok/s 5,112 | VRAM 119GB (58%) | ETA 38.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3969 ppl=220.7 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1400, full state + optimizer) |
| step 1410/5000 | loss 5.6199 | ppl 275.9 | lr 2.62e-05 | gnorm 1.24 | tok/s 5,235 | VRAM 119GB (58%) | ETA 37.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1420/5000 | loss 5.4698 | ppl 237.4 | lr 2.59e-05 | gnorm 1.67 | tok/s 5,365 | VRAM 119GB (58%) | ETA 36.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1430/5000 | loss 5.6563 | ppl 286.1 | lr 2.56e-05 | gnorm 1.32 | tok/s 5,501 | VRAM 119GB (58%) | ETA 35.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1440/5000 | loss 5.6030 | ppl 271.2 | lr 2.53e-05 | gnorm 1.48 | tok/s 5,646 | VRAM 119GB (58%) | ETA 34.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1450/5000 | loss 5.5052 | ppl 246.0 | lr 2.50e-05 | gnorm 1.45 | tok/s 5,797 | VRAM 119GB (58%) | ETA 33.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1460/5000 | loss 5.1654 | ppl 175.1 | lr 2.47e-05 | gnorm 1.45 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1470/5000 | loss 5.6636 | ppl 288.2 | lr 2.44e-05 | gnorm 1.36 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1480/5000 | loss 5.7023 | ppl 299.6 | lr 2.41e-05 | gnorm 1.36 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1490/5000 | loss 5.5606 | ppl 260.0 | lr 2.38e-05 | gnorm 1.38 | tok/s 5,795 | VRAM 119GB (58%) | ETA 33.1h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1500/5000 | loss 5.5311 | ppl 252.4 | lr 2.35e-05 | gnorm 1.47 | tok/s 5,796 | VRAM 119GB (58%) | ETA 33.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4685 ppl=237.1 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1500, full state + optimizer) |
| step 1510/5000 | loss 5.5400 | ppl 254.7 | lr 2.31e-05 | gnorm 1.58 | tok/s 5,797 | VRAM 119GB (58%) | ETA 32.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1520/5000 | loss 5.5024 | ppl 245.3 | lr 2.28e-05 | gnorm 1.47 | tok/s 5,796 | VRAM 119GB (58%) | ETA 32.8h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1530/5000 | loss 5.6098 | ppl 273.1 | lr 2.25e-05 | gnorm 1.27 | tok/s 5,796 | VRAM 119GB (58%) | ETA 32.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1540/5000 | loss 5.4081 | ppl 223.2 | lr 2.22e-05 | gnorm 1.32 | tok/s 5,794 | VRAM 119GB (58%) | ETA 32.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1550/5000 | loss 5.5051 | ppl 245.9 | lr 2.19e-05 | gnorm 1.47 | tok/s 5,793 | VRAM 119GB (58%) | ETA 32.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1560/5000 | loss 5.5016 | ppl 245.1 | lr 2.15e-05 | gnorm 1.36 | tok/s 5,792 | VRAM 119GB (58%) | ETA 32.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1570/5000 | loss 5.5568 | ppl 259.0 | lr 2.12e-05 | gnorm 1.36 | tok/s 5,792 | VRAM 119GB (58%) | ETA 32.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1580/5000 | loss 5.4976 | ppl 244.1 | lr 2.09e-05 | gnorm 1.52 | tok/s 5,792 | VRAM 119GB (58%) | ETA 32.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1590/5000 | loss 5.6104 | ppl 273.2 | lr 2.06e-05 | gnorm 1.48 | tok/s 5,793 | VRAM 119GB (58%) | ETA 32.1h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1600/5000 | loss 5.4706 | ppl 237.6 | lr 2.03e-05 | gnorm 1.39 | tok/s 5,794 | VRAM 119GB (58%) | ETA 32.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4872 ppl=241.6 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1600, full state + optimizer) |
| step 1610/5000 | loss 5.3893 | ppl 219.0 | lr 2.00e-05 | gnorm 1.32 | tok/s 5,707 | VRAM 119GB (58%) | ETA 32.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1620/5000 | loss 5.5226 | ppl 250.3 | lr 1.97e-05 | gnorm 1.21 | tok/s 5,560 | VRAM 119GB (58%) | ETA 33.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1630/5000 | loss 5.4062 | ppl 222.8 | lr 1.94e-05 | gnorm 1.38 | tok/s 5,419 | VRAM 119GB (58%) | ETA 34.0h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1640/5000 | loss 5.6297 | ppl 278.6 | lr 1.91e-05 | gnorm 1.23 | tok/s 5,286 | VRAM 119GB (58%) | ETA 34.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1650/5000 | loss 5.5368 | ppl 253.9 | lr 1.88e-05 | gnorm 1.43 | tok/s 5,159 | VRAM 119GB (58%) | ETA 35.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1660/5000 | loss 5.4787 | ppl 239.5 | lr 1.86e-05 | gnorm 1.18 | tok/s 5,106 | VRAM 119GB (58%) | ETA 35.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1670/5000 | loss 5.5194 | ppl 249.5 | lr 1.83e-05 | gnorm 1.15 | tok/s 5,106 | VRAM 119GB (58%) | ETA 35.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1680/5000 | loss 5.6273 | ppl 277.9 | lr 1.80e-05 | gnorm 1.62 | tok/s 5,107 | VRAM 119GB (58%) | ETA 35.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1690/5000 | loss 5.3792 | ppl 216.8 | lr 1.78e-05 | gnorm 1.38 | tok/s 5,107 | VRAM 119GB (58%) | ETA 35.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1700/5000 | loss 5.4360 | ppl 229.5 | lr 1.75e-05 | gnorm 1.28 | tok/s 5,108 | VRAM 119GB (58%) | ETA 35.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4370 ppl=229.8 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1700, full state + optimizer) |
| step 1710/5000 | loss 5.5455 | ppl 256.1 | lr 1.73e-05 | gnorm 1.70 | tok/s 5,234 | VRAM 119GB (58%) | ETA 34.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1720/5000 | loss 5.4692 | ppl 237.3 | lr 1.71e-05 | gnorm 1.39 | tok/s 5,365 | VRAM 119GB (58%) | ETA 33.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1730/5000 | loss 5.3266 | ppl 205.7 | lr 1.69e-05 | gnorm 1.27 | tok/s 5,504 | VRAM 119GB (58%) | ETA 32.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1740/5000 | loss 5.5804 | ppl 265.2 | lr 1.67e-05 | gnorm 1.09 | tok/s 5,650 | VRAM 119GB (58%) | ETA 31.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1750/5000 | loss 5.4715 | ppl 237.8 | lr 1.65e-05 | gnorm 1.24 | tok/s 5,803 | VRAM 119GB (58%) | ETA 30.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1760/5000 | loss 5.4621 | ppl 235.6 | lr 1.63e-05 | gnorm 1.22 | tok/s 5,801 | VRAM 119GB (58%) | ETA 30.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1770/5000 | loss 5.4925 | ppl 242.9 | lr 1.61e-05 | gnorm 1.34 | tok/s 5,801 | VRAM 119GB (58%) | ETA 30.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1780/5000 | loss 5.4751 | ppl 238.7 | lr 1.60e-05 | gnorm 1.16 | tok/s 5,798 | VRAM 119GB (58%) | ETA 30.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1790/5000 | loss 5.4281 | ppl 227.7 | lr 1.58e-05 | gnorm 1.20 | tok/s 5,796 | VRAM 119GB (58%) | ETA 30.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1800/5000 | loss 5.4589 | ppl 234.8 | lr 1.57e-05 | gnorm 1.45 | tok/s 5,796 | VRAM 119GB (58%) | ETA 30.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4080 ppl=223.2 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1800, full state + optimizer) |
| step 1810/5000 | loss 5.4631 | ppl 235.8 | lr 1.55e-05 | gnorm 1.38 | tok/s 5,668 | VRAM 119GB (58%) | ETA 30.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1820/5000 | loss 5.4203 | ppl 225.9 | lr 1.54e-05 | gnorm 1.24 | tok/s 5,522 | VRAM 119GB (58%) | ETA 31.4h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1830/5000 | loss 5.3795 | ppl 216.9 | lr 1.53e-05 | gnorm 1.23 | tok/s 5,380 | VRAM 119GB (58%) | ETA 32.2h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1840/5000 | loss 5.5472 | ppl 256.5 | lr 1.52e-05 | gnorm 1.21 | tok/s 5,247 | VRAM 119GB (58%) | ETA 32.9h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1850/5000 | loss 5.3745 | ppl 215.8 | lr 1.52e-05 | gnorm 1.16 | tok/s 5,118 | VRAM 119GB (58%) | ETA 33.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1860/5000 | loss 5.6542 | ppl 285.5 | lr 1.51e-05 | gnorm 1.08 | tok/s 5,094 | VRAM 119GB (58%) | ETA 33.7h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1870/5000 | loss 5.6051 | ppl 271.8 | lr 1.51e-05 | gnorm 1.12 | tok/s 5,089 | VRAM 119GB (58%) | ETA 33.6h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1880/5000 | loss 5.5013 | ppl 245.0 | lr 1.50e-05 | gnorm 1.36 | tok/s 5,093 | VRAM 119GB (58%) | ETA 33.5h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1890/5000 | loss 5.3653 | ppl 213.9 | lr 1.50e-05 | gnorm 2.14 | tok/s 5,094 | VRAM 119GB (58%) | ETA 33.3h C2 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1900/5000 | loss 5.4583 | ppl 234.7 | lr 1.51e-05 | gnorm 1.27 | tok/s 5,097 | VRAM 119GB (58%) | ETA 33.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4447 ppl=231.5 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 1900, full state + optimizer) |
| step 1910/5000 | loss 5.5599 | ppl 259.8 | lr 1.64e-05 | gnorm 1.20 | tok/s 5,154 | VRAM 119GB (58%) | ETA 32.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1920/5000 | loss 5.7572 | ppl 316.5 | lr 1.76e-05 | gnorm 1.41 | tok/s 5,158 | VRAM 119GB (58%) | ETA 32.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1930/5000 | loss 5.5125 | ppl 247.8 | lr 1.89e-05 | gnorm 1.48 | tok/s 5,158 | VRAM 119GB (58%) | ETA 32.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1940/5000 | loss 5.6845 | ppl 294.3 | lr 2.01e-05 | gnorm 1.52 | tok/s 5,160 | VRAM 119GB (58%) | ETA 32.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1950/5000 | loss 5.4571 | ppl 234.4 | lr 2.14e-05 | gnorm 1.20 | tok/s 5,160 | VRAM 119GB (58%) | ETA 32.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1960/5000 | loss 5.4911 | ppl 242.5 | lr 2.26e-05 | gnorm 1.27 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1970/5000 | loss 5.6330 | ppl 279.5 | lr 2.39e-05 | gnorm 1.71 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1980/5000 | loss 5.4440 | ppl 231.4 | lr 2.50e-05 | gnorm 1.47 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 1990/5000 | loss 5.4148 | ppl 224.7 | lr 2.50e-05 | gnorm 1.99 | tok/s 5,105 | VRAM 119GB (58%) | ETA 32.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2000/5000 | loss 5.4218 | ppl 226.3 | lr 2.50e-05 | gnorm 1.95 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4088 ppl=223.4 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2000, full state + optimizer) |
| >> MILESTONE step 2000 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_2000.pt |
| step 2010/5000 | loss 5.4288 | ppl 227.9 | lr 2.50e-05 | gnorm 1.47 | tok/s 5,106 | VRAM 119GB (58%) | ETA 32.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2020/5000 | loss 5.4680 | ppl 237.0 | lr 2.50e-05 | gnorm 1.41 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2030/5000 | loss 5.4027 | ppl 222.0 | lr 2.50e-05 | gnorm 1.52 | tok/s 5,107 | VRAM 119GB (58%) | ETA 31.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2040/5000 | loss 5.5772 | ppl 264.3 | lr 2.50e-05 | gnorm 1.46 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2050/5000 | loss 5.4623 | ppl 235.6 | lr 2.49e-05 | gnorm 1.43 | tok/s 5,107 | VRAM 119GB (58%) | ETA 31.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2060/5000 | loss 5.6075 | ppl 272.5 | lr 2.49e-05 | gnorm 1.66 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2070/5000 | loss 5.4214 | ppl 226.2 | lr 2.49e-05 | gnorm 1.64 | tok/s 5,106 | VRAM 119GB (58%) | ETA 31.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2080/5000 | loss 5.3698 | ppl 214.8 | lr 2.49e-05 | gnorm 1.35 | tok/s 5,105 | VRAM 119GB (58%) | ETA 31.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2090/5000 | loss 5.3795 | ppl 216.9 | lr 2.49e-05 | gnorm 1.62 | tok/s 5,104 | VRAM 119GB (58%) | ETA 31.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2100/5000 | loss 5.4180 | ppl 225.4 | lr 2.48e-05 | gnorm 1.45 | tok/s 5,102 | VRAM 119GB (58%) | ETA 31.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3973 ppl=220.8 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2100, full state + optimizer) |
| step 2110/5000 | loss 5.4504 | ppl 232.9 | lr 2.48e-05 | gnorm 1.77 | tok/s 5,226 | VRAM 119GB (58%) | ETA 30.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2120/5000 | loss 5.3458 | ppl 209.7 | lr 2.48e-05 | gnorm 1.35 | tok/s 5,357 | VRAM 119GB (58%) | ETA 29.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2130/5000 | loss 5.2910 | ppl 198.5 | lr 2.48e-05 | gnorm 1.52 | tok/s 5,496 | VRAM 119GB (58%) | ETA 28.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2140/5000 | loss 5.4221 | ppl 226.4 | lr 2.47e-05 | gnorm 1.45 | tok/s 5,643 | VRAM 119GB (58%) | ETA 27.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2150/5000 | loss 5.5331 | ppl 252.9 | lr 2.47e-05 | gnorm 1.34 | tok/s 5,799 | VRAM 119GB (58%) | ETA 26.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2160/5000 | loss 5.4783 | ppl 239.4 | lr 2.47e-05 | gnorm 1.49 | tok/s 5,801 | VRAM 119GB (58%) | ETA 26.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2170/5000 | loss 5.4636 | ppl 235.9 | lr 2.46e-05 | gnorm 1.26 | tok/s 5,802 | VRAM 119GB (58%) | ETA 26.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2180/5000 | loss 5.5501 | ppl 257.3 | lr 2.46e-05 | gnorm 1.55 | tok/s 5,803 | VRAM 119GB (58%) | ETA 26.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2190/5000 | loss 5.4925 | ppl 242.9 | lr 2.45e-05 | gnorm 1.69 | tok/s 5,804 | VRAM 119GB (58%) | ETA 26.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2200/5000 | loss 5.4496 | ppl 232.7 | lr 2.45e-05 | gnorm 1.43 | tok/s 5,801 | VRAM 119GB (58%) | ETA 26.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4529 ppl=233.4 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2200, full state + optimizer) |
| step 2210/5000 | loss 5.5127 | ppl 247.8 | lr 2.44e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 26.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2220/5000 | loss 5.6177 | ppl 275.3 | lr 2.44e-05 | gnorm 1.77 | tok/s 5,799 | VRAM 119GB (58%) | ETA 26.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2230/5000 | loss 5.5439 | ppl 255.7 | lr 2.43e-05 | gnorm 1.44 | tok/s 5,796 | VRAM 119GB (58%) | ETA 26.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2240/5000 | loss 5.5208 | ppl 249.8 | lr 2.43e-05 | gnorm 1.66 | tok/s 5,794 | VRAM 119GB (58%) | ETA 26.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2250/5000 | loss 5.3475 | ppl 210.1 | lr 2.42e-05 | gnorm 1.31 | tok/s 5,795 | VRAM 119GB (58%) | ETA 25.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2260/5000 | loss 5.4192 | ppl 225.7 | lr 2.42e-05 | gnorm 1.48 | tok/s 5,793 | VRAM 119GB (58%) | ETA 25.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2270/5000 | loss 5.3100 | ppl 202.4 | lr 2.41e-05 | gnorm 1.54 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2280/5000 | loss 5.5559 | ppl 258.8 | lr 2.41e-05 | gnorm 1.25 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2290/5000 | loss 5.3483 | ppl 210.3 | lr 2.40e-05 | gnorm 1.50 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2300/5000 | loss 5.3261 | ppl 205.6 | lr 2.39e-05 | gnorm 1.72 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4253 ppl=227.1 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2300, full state + optimizer) |
| step 2310/5000 | loss 5.5541 | ppl 258.3 | lr 2.39e-05 | gnorm 1.62 | tok/s 5,792 | VRAM 119GB (58%) | ETA 25.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2320/5000 | loss 5.4427 | ppl 231.1 | lr 2.38e-05 | gnorm 1.51 | tok/s 5,795 | VRAM 119GB (58%) | ETA 25.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2330/5000 | loss 5.4080 | ppl 223.2 | lr 2.37e-05 | gnorm 1.51 | tok/s 5,795 | VRAM 119GB (58%) | ETA 25.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2340/5000 | loss 5.6911 | ppl 296.2 | lr 2.37e-05 | gnorm 1.88 | tok/s 5,786 | VRAM 119GB (58%) | ETA 25.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2350/5000 | loss 5.3312 | ppl 206.7 | lr 2.36e-05 | gnorm 1.43 | tok/s 5,779 | VRAM 119GB (58%) | ETA 25.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2360/5000 | loss 5.3899 | ppl 219.2 | lr 2.35e-05 | gnorm 1.48 | tok/s 5,772 | VRAM 119GB (58%) | ETA 25.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2370/5000 | loss 5.2940 | ppl 199.1 | lr 2.35e-05 | gnorm 1.30 | tok/s 5,765 | VRAM 119GB (58%) | ETA 24.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2380/5000 | loss 5.4886 | ppl 241.9 | lr 2.34e-05 | gnorm 1.41 | tok/s 5,760 | VRAM 119GB (58%) | ETA 24.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2390/5000 | loss 5.3869 | ppl 218.5 | lr 2.33e-05 | gnorm 1.41 | tok/s 5,759 | VRAM 119GB (58%) | ETA 24.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2400/5000 | loss 5.5158 | ppl 248.6 | lr 2.32e-05 | gnorm 1.42 | tok/s 5,767 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4569 ppl=234.4 (best=5.3969) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2400, full state + optimizer) |
| step 2410/5000 | loss 5.4799 | ppl 239.8 | lr 2.32e-05 | gnorm 1.37 | tok/s 5,772 | VRAM 119GB (58%) | ETA 24.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2420/5000 | loss 5.4761 | ppl 238.9 | lr 2.31e-05 | gnorm 1.62 | tok/s 5,778 | VRAM 119GB (58%) | ETA 24.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2430/5000 | loss 5.3421 | ppl 208.9 | lr 2.30e-05 | gnorm 1.17 | tok/s 5,781 | VRAM 119GB (58%) | ETA 24.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2440/5000 | loss 5.4317 | ppl 228.5 | lr 2.29e-05 | gnorm 1.51 | tok/s 5,790 | VRAM 119GB (58%) | ETA 24.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2450/5000 | loss 5.5262 | ppl 251.2 | lr 2.28e-05 | gnorm 1.63 | tok/s 5,792 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2460/5000 | loss 5.3786 | ppl 216.7 | lr 2.27e-05 | gnorm 1.29 | tok/s 5,769 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2470/5000 | loss 5.1914 | ppl 179.7 | lr 2.26e-05 | gnorm 1.53 | tok/s 5,619 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2480/5000 | loss 5.4186 | ppl 225.6 | lr 2.26e-05 | gnorm 2.28 | tok/s 5,479 | VRAM 119GB (58%) | ETA 25.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2490/5000 | loss 5.3507 | ppl 210.8 | lr 2.25e-05 | gnorm 1.38 | tok/s 5,343 | VRAM 119GB (58%) | ETA 25.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2500/5000 | loss 5.5629 | ppl 260.6 | lr 2.24e-05 | gnorm 1.82 | tok/s 5,212 | VRAM 119GB (58%) | ETA 26.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3151 ppl=203.4 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2500, full state + optimizer) |
| step 2510/5000 | loss 5.4881 | ppl 241.8 | lr 2.23e-05 | gnorm 1.70 | tok/s 5,233 | VRAM 119GB (58%) | ETA 26.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2520/5000 | loss 5.5318 | ppl 252.6 | lr 2.22e-05 | gnorm 1.57 | tok/s 5,365 | VRAM 119GB (58%) | ETA 25.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2530/5000 | loss 5.2904 | ppl 198.4 | lr 2.21e-05 | gnorm 1.32 | tok/s 5,503 | VRAM 119GB (58%) | ETA 24.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2540/5000 | loss 5.3733 | ppl 215.6 | lr 2.20e-05 | gnorm 1.50 | tok/s 5,649 | VRAM 119GB (58%) | ETA 23.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2550/5000 | loss 5.2482 | ppl 190.2 | lr 2.19e-05 | gnorm 1.33 | tok/s 5,802 | VRAM 119GB (58%) | ETA 23.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2560/5000 | loss 5.3167 | ppl 203.7 | lr 2.18e-05 | gnorm 1.31 | tok/s 5,801 | VRAM 119GB (58%) | ETA 23.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2570/5000 | loss 5.4672 | ppl 236.8 | lr 2.17e-05 | gnorm 1.70 | tok/s 5,801 | VRAM 119GB (58%) | ETA 22.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2580/5000 | loss 5.2871 | ppl 197.8 | lr 2.16e-05 | gnorm 1.20 | tok/s 5,800 | VRAM 119GB (58%) | ETA 22.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2590/5000 | loss 5.5072 | ppl 246.5 | lr 2.15e-05 | gnorm 1.24 | tok/s 5,800 | VRAM 119GB (58%) | ETA 22.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2600/5000 | loss 5.4751 | ppl 238.7 | lr 2.14e-05 | gnorm 1.52 | tok/s 5,800 | VRAM 119GB (58%) | ETA 22.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2780 ppl=196.0 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2600, full state + optimizer) |
| step 2610/5000 | loss 5.4770 | ppl 239.1 | lr 2.13e-05 | gnorm 1.53 | tok/s 5,661 | VRAM 119GB (58%) | ETA 23.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2620/5000 | loss 5.5508 | ppl 257.5 | lr 2.12e-05 | gnorm 1.20 | tok/s 5,514 | VRAM 119GB (58%) | ETA 23.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2630/5000 | loss 5.5070 | ppl 246.4 | lr 2.11e-05 | gnorm 1.48 | tok/s 5,374 | VRAM 119GB (58%) | ETA 24.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2640/5000 | loss 5.3541 | ppl 211.5 | lr 2.10e-05 | gnorm 1.52 | tok/s 5,240 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 2650/5000 | loss 5.2494 | ppl 190.4 | lr 2.09e-05 | gnorm 1.45 | tok/s 5,114 | VRAM 119GB (58%) | ETA 25.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2660/5000 | loss 5.4428 | ppl 231.1 | lr 2.08e-05 | gnorm 1.55 | tok/s 5,102 | VRAM 119GB (58%) | ETA 25.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2670/5000 | loss 5.3762 | ppl 216.2 | lr 2.07e-05 | gnorm 1.45 | tok/s 5,102 | VRAM 119GB (58%) | ETA 24.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2680/5000 | loss 5.4470 | ppl 232.1 | lr 2.06e-05 | gnorm 2.48 | tok/s 5,102 | VRAM 119GB (58%) | ETA 24.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2690/5000 | loss 5.5052 | ppl 246.0 | lr 2.05e-05 | gnorm 1.78 | tok/s 5,101 | VRAM 119GB (58%) | ETA 24.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2700/5000 | loss 5.3143 | ppl 203.2 | lr 2.04e-05 | gnorm 1.28 | tok/s 5,101 | VRAM 119GB (58%) | ETA 24.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3290 ppl=206.2 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2700, full state + optimizer) |
| step 2710/5000 | loss 5.4231 | ppl 226.6 | lr 2.03e-05 | gnorm 1.38 | tok/s 5,143 | VRAM 119GB (58%) | ETA 24.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2720/5000 | loss 5.5591 | ppl 259.6 | lr 2.02e-05 | gnorm 1.53 | tok/s 5,144 | VRAM 119GB (58%) | ETA 24.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2730/5000 | loss 5.4649 | ppl 236.2 | lr 2.01e-05 | gnorm 1.17 | tok/s 5,145 | VRAM 119GB (58%) | ETA 24.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2740/5000 | loss 5.3355 | ppl 207.6 | lr 2.00e-05 | gnorm 1.28 | tok/s 5,147 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2750/5000 | loss 5.2728 | ppl 195.0 | lr 1.99e-05 | gnorm 1.85 | tok/s 5,148 | VRAM 119GB (58%) | ETA 23.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2760/5000 | loss 5.4497 | ppl 232.7 | lr 1.98e-05 | gnorm 1.47 | tok/s 5,105 | VRAM 119GB (58%) | ETA 24.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2770/5000 | loss 5.4827 | ppl 240.5 | lr 1.97e-05 | gnorm 1.34 | tok/s 5,104 | VRAM 119GB (58%) | ETA 23.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2780/5000 | loss 5.4144 | ppl 224.6 | lr 1.96e-05 | gnorm 1.55 | tok/s 5,104 | VRAM 119GB (58%) | ETA 23.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2790/5000 | loss 5.5006 | ppl 244.8 | lr 1.95e-05 | gnorm 1.24 | tok/s 5,103 | VRAM 119GB (58%) | ETA 23.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2800/5000 | loss 5.4444 | ppl 231.5 | lr 1.94e-05 | gnorm 1.48 | tok/s 5,100 | VRAM 119GB (58%) | ETA 23.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3721 ppl=215.3 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2800, full state + optimizer) |
| step 2810/5000 | loss 5.3741 | ppl 215.7 | lr 1.93e-05 | gnorm 1.17 | tok/s 5,224 | VRAM 119GB (58%) | ETA 22.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2820/5000 | loss 5.6197 | ppl 275.8 | lr 1.92e-05 | gnorm 2.17 | tok/s 5,355 | VRAM 119GB (58%) | ETA 22.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2830/5000 | loss 5.3919 | ppl 219.6 | lr 1.91e-05 | gnorm 1.27 | tok/s 5,493 | VRAM 119GB (58%) | ETA 21.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2840/5000 | loss 5.3163 | ppl 203.6 | lr 1.90e-05 | gnorm 1.41 | tok/s 5,638 | VRAM 119GB (58%) | ETA 20.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2850/5000 | loss 5.4198 | ppl 225.8 | lr 1.89e-05 | gnorm 1.44 | tok/s 5,794 | VRAM 119GB (58%) | ETA 20.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2860/5000 | loss 5.2469 | ppl 190.0 | lr 1.88e-05 | gnorm 1.37 | tok/s 5,796 | VRAM 119GB (58%) | ETA 20.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2870/5000 | loss 5.5315 | ppl 252.5 | lr 1.87e-05 | gnorm 1.55 | tok/s 5,797 | VRAM 119GB (58%) | ETA 20.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2880/5000 | loss 5.4689 | ppl 237.2 | lr 1.86e-05 | gnorm 1.25 | tok/s 5,798 | VRAM 119GB (58%) | ETA 20.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2890/5000 | loss 5.2487 | ppl 190.3 | lr 1.85e-05 | gnorm 1.24 | tok/s 5,797 | VRAM 119GB (58%) | ETA 19.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2900/5000 | loss 5.3440 | ppl 209.3 | lr 1.84e-05 | gnorm 1.41 | tok/s 5,797 | VRAM 119GB (58%) | ETA 19.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3601 ppl=212.7 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 2900, full state + optimizer) |
| step 2910/5000 | loss 5.2046 | ppl 182.1 | lr 1.83e-05 | gnorm 1.49 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2920/5000 | loss 5.1114 | ppl 165.9 | lr 1.82e-05 | gnorm 1.38 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2930/5000 | loss 5.3641 | ppl 213.6 | lr 1.81e-05 | gnorm 1.46 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2940/5000 | loss 5.5069 | ppl 246.4 | lr 1.80e-05 | gnorm 1.45 | tok/s 5,798 | VRAM 119GB (58%) | ETA 19.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2950/5000 | loss 5.2951 | ppl 199.4 | lr 1.79e-05 | gnorm 1.23 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2960/5000 | loss 5.3876 | ppl 218.7 | lr 1.78e-05 | gnorm 1.40 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2970/5000 | loss 5.4361 | ppl 229.5 | lr 1.77e-05 | gnorm 1.27 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2980/5000 | loss 5.3690 | ppl 214.7 | lr 1.76e-05 | gnorm 1.24 | tok/s 5,799 | VRAM 119GB (58%) | ETA 19.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 2990/5000 | loss 5.3255 | ppl 205.5 | lr 1.75e-05 | gnorm 1.56 | tok/s 5,799 | VRAM 119GB (58%) | ETA 18.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3000/5000 | loss 5.3644 | ppl 213.7 | lr 1.74e-05 | gnorm 2.14 | tok/s 5,798 | VRAM 119GB (58%) | ETA 18.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2933 ppl=199.0 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3000, full state + optimizer) |
| >> MILESTONE step 3000 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_3000.pt |
| step 3010/5000 | loss 5.3409 | ppl 208.7 | lr 1.74e-05 | gnorm 1.38 | tok/s 5,799 | VRAM 119GB (58%) | ETA 18.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3020/5000 | loss 5.3559 | ppl 211.8 | lr 1.73e-05 | gnorm 1.37 | tok/s 5,799 | VRAM 119GB (58%) | ETA 18.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3030/5000 | loss 5.2910 | ppl 198.5 | lr 1.72e-05 | gnorm 1.23 | tok/s 5,801 | VRAM 119GB (58%) | ETA 18.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3040/5000 | loss 5.4000 | ppl 221.4 | lr 1.71e-05 | gnorm 1.20 | tok/s 5,802 | VRAM 119GB (58%) | ETA 18.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3050/5000 | loss 5.3424 | ppl 209.0 | lr 1.70e-05 | gnorm 1.34 | tok/s 5,803 | VRAM 119GB (58%) | ETA 18.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3060/5000 | loss 5.2612 | ppl 192.7 | lr 1.69e-05 | gnorm 1.39 | tok/s 5,803 | VRAM 119GB (58%) | ETA 18.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3070/5000 | loss 5.4217 | ppl 226.3 | lr 1.68e-05 | gnorm 1.41 | tok/s 5,803 | VRAM 119GB (58%) | ETA 18.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3080/5000 | loss 5.3010 | ppl 200.5 | lr 1.68e-05 | gnorm 1.29 | tok/s 5,802 | VRAM 119GB (58%) | ETA 18.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3090/5000 | loss 5.3627 | ppl 213.3 | lr 1.67e-05 | gnorm 1.23 | tok/s 5,802 | VRAM 119GB (58%) | ETA 18.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3100/5000 | loss 5.3135 | ppl 203.1 | lr 1.66e-05 | gnorm 1.12 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2799 ppl=196.4 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3100, full state + optimizer) |
| step 3110/5000 | loss 5.4153 | ppl 224.8 | lr 1.65e-05 | gnorm 1.43 | tok/s 5,803 | VRAM 119GB (58%) | ETA 17.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3120/5000 | loss 5.3672 | ppl 214.3 | lr 1.65e-05 | gnorm 1.20 | tok/s 5,803 | VRAM 119GB (58%) | ETA 17.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3130/5000 | loss 5.4669 | ppl 236.7 | lr 1.64e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3140/5000 | loss 5.4039 | ppl 222.3 | lr 1.63e-05 | gnorm 1.24 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3150/5000 | loss 5.2486 | ppl 190.3 | lr 1.63e-05 | gnorm 1.16 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3160/5000 | loss 5.3892 | ppl 219.0 | lr 1.62e-05 | gnorm 1.38 | tok/s 5,802 | VRAM 119GB (58%) | ETA 17.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3170/5000 | loss 5.4131 | ppl 224.3 | lr 1.61e-05 | gnorm 1.31 | tok/s 5,801 | VRAM 119GB (58%) | ETA 17.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3180/5000 | loss 5.3139 | ppl 203.1 | lr 1.61e-05 | gnorm 1.34 | tok/s 5,804 | VRAM 119GB (58%) | ETA 17.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3190/5000 | loss 5.3839 | ppl 217.9 | lr 1.60e-05 | gnorm 1.19 | tok/s 5,804 | VRAM 119GB (58%) | ETA 17.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3200/5000 | loss 5.4903 | ppl 242.3 | lr 1.59e-05 | gnorm 1.38 | tok/s 5,804 | VRAM 119GB (58%) | ETA 16.9h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.4004 ppl=221.5 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3200, full state + optimizer) |
| step 3210/5000 | loss 5.2450 | ppl 189.6 | lr 1.59e-05 | gnorm 1.24 | tok/s 5,804 | VRAM 119GB (58%) | ETA 16.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3220/5000 | loss 5.2446 | ppl 189.5 | lr 1.58e-05 | gnorm 1.69 | tok/s 5,803 | VRAM 119GB (58%) | ETA 16.8h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3230/5000 | loss 5.4406 | ppl 230.6 | lr 1.58e-05 | gnorm 1.26 | tok/s 5,801 | VRAM 119GB (58%) | ETA 16.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3240/5000 | loss 5.2381 | ppl 188.3 | lr 1.57e-05 | gnorm 1.20 | tok/s 5,800 | VRAM 119GB (58%) | ETA 16.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3250/5000 | loss 5.4181 | ppl 225.4 | lr 1.57e-05 | gnorm 1.26 | tok/s 5,798 | VRAM 119GB (58%) | ETA 16.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3260/5000 | loss 5.4927 | ppl 242.9 | lr 1.56e-05 | gnorm 1.42 | tok/s 5,797 | VRAM 119GB (58%) | ETA 16.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3270/5000 | loss 5.3934 | ppl 220.0 | lr 1.56e-05 | gnorm 1.38 | tok/s 5,797 | VRAM 119GB (58%) | ETA 16.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3280/5000 | loss 5.5290 | ppl 251.9 | lr 1.55e-05 | gnorm 1.24 | tok/s 5,798 | VRAM 119GB (58%) | ETA 16.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3290/5000 | loss 5.5065 | ppl 246.3 | lr 1.55e-05 | gnorm 1.44 | tok/s 5,798 | VRAM 119GB (58%) | ETA 16.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3300/5000 | loss 5.4718 | ppl 237.9 | lr 1.54e-05 | gnorm 1.27 | tok/s 5,799 | VRAM 119GB (58%) | ETA 16.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3304 ppl=206.5 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3300, full state + optimizer) |
| step 3310/5000 | loss 5.5244 | ppl 250.7 | lr 1.54e-05 | gnorm 1.25 | tok/s 5,649 | VRAM 119GB (58%) | ETA 16.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3320/5000 | loss 5.3519 | ppl 211.0 | lr 1.53e-05 | gnorm 1.36 | tok/s 5,504 | VRAM 119GB (58%) | ETA 16.7h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3330/5000 | loss 5.3470 | ppl 210.0 | lr 1.53e-05 | gnorm 1.41 | tok/s 5,367 | VRAM 119GB (58%) | ETA 17.0h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3340/5000 | loss 5.4374 | ppl 229.8 | lr 1.53e-05 | gnorm 1.27 | tok/s 5,235 | VRAM 119GB (58%) | ETA 17.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3350/5000 | loss 5.2242 | ppl 185.7 | lr 1.52e-05 | gnorm 1.44 | tok/s 5,109 | VRAM 119GB (58%) | ETA 17.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3360/5000 | loss 5.2989 | ppl 200.1 | lr 1.52e-05 | gnorm 1.40 | tok/s 5,107 | VRAM 119GB (58%) | ETA 17.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3370/5000 | loss 5.4252 | ppl 227.0 | lr 1.52e-05 | gnorm 1.23 | tok/s 5,106 | VRAM 119GB (58%) | ETA 17.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3380/5000 | loss 5.3276 | ppl 205.9 | lr 1.52e-05 | gnorm 1.49 | tok/s 5,106 | VRAM 119GB (58%) | ETA 17.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3390/5000 | loss 5.3532 | ppl 211.3 | lr 1.51e-05 | gnorm 1.36 | tok/s 5,107 | VRAM 119GB (58%) | ETA 17.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3400/5000 | loss 5.3068 | ppl 201.7 | lr 1.51e-05 | gnorm 1.52 | tok/s 5,107 | VRAM 119GB (58%) | ETA 17.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3810 ppl=217.2 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3400, full state + optimizer) |
| step 3410/5000 | loss 5.3361 | ppl 207.7 | lr 1.51e-05 | gnorm 1.18 | tok/s 5,232 | VRAM 119GB (58%) | ETA 16.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3420/5000 | loss 5.3199 | ppl 204.4 | lr 1.51e-05 | gnorm 1.35 | tok/s 5,364 | VRAM 119GB (58%) | ETA 16.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3430/5000 | loss 5.3233 | ppl 205.1 | lr 1.51e-05 | gnorm 1.18 | tok/s 5,502 | VRAM 119GB (58%) | ETA 15.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3440/5000 | loss 5.3585 | ppl 212.4 | lr 1.50e-05 | gnorm 1.52 | tok/s 5,647 | VRAM 119GB (58%) | ETA 15.1h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3450/5000 | loss 5.3438 | ppl 209.3 | lr 1.50e-05 | gnorm 1.23 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.6h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3460/5000 | loss 5.3168 | ppl 203.7 | lr 1.50e-05 | gnorm 1.30 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.5h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3470/5000 | loss 5.2030 | ppl 181.8 | lr 1.50e-05 | gnorm 1.28 | tok/s 5,802 | VRAM 119GB (58%) | ETA 14.4h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3480/5000 | loss 5.3649 | ppl 213.8 | lr 1.50e-05 | gnorm 1.52 | tok/s 5,802 | VRAM 119GB (58%) | ETA 14.3h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3490/5000 | loss 5.3546 | ppl 211.6 | lr 1.50e-05 | gnorm 1.18 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.2h C3 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3500/5000 | loss 5.3840 | ppl 217.9 | lr 1.50e-05 | gnorm 1.26 | tok/s 5,802 | VRAM 119GB (58%) | ETA 14.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3131 ppl=203.0 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3500, full state + optimizer) |
| step 3510/5000 | loss 5.4281 | ppl 227.7 | lr 1.56e-05 | gnorm 1.52 | tok/s 5,801 | VRAM 119GB (58%) | ETA 14.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3520/5000 | loss 5.3775 | ppl 216.5 | lr 1.61e-05 | gnorm 1.29 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3530/5000 | loss 5.4500 | ppl 232.8 | lr 1.66e-05 | gnorm 1.25 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3540/5000 | loss 5.2171 | ppl 184.4 | lr 1.71e-05 | gnorm 1.27 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3550/5000 | loss 5.3167 | ppl 203.7 | lr 1.75e-05 | gnorm 1.41 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3560/5000 | loss 5.4094 | ppl 223.5 | lr 1.81e-05 | gnorm 1.34 | tok/s 5,801 | VRAM 119GB (58%) | ETA 13.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3570/5000 | loss 5.2578 | ppl 192.1 | lr 1.86e-05 | gnorm 1.38 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3580/5000 | loss 5.3372 | ppl 207.9 | lr 1.91e-05 | gnorm 1.16 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3590/5000 | loss 5.2401 | ppl 188.7 | lr 1.96e-05 | gnorm 1.39 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3600/5000 | loss 5.4146 | ppl 224.7 | lr 2.00e-05 | gnorm 1.57 | tok/s 5,799 | VRAM 119GB (58%) | ETA 13.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3594 ppl=212.6 (best=5.2780) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3600, full state + optimizer) |
| step 3610/5000 | loss 5.3050 | ppl 201.3 | lr 2.00e-05 | gnorm 1.45 | tok/s 5,799 | VRAM 119GB (58%) | ETA 13.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3620/5000 | loss 5.5146 | ppl 248.3 | lr 2.00e-05 | gnorm 1.53 | tok/s 5,800 | VRAM 119GB (58%) | ETA 13.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3630/5000 | loss 5.4447 | ppl 231.5 | lr 2.00e-05 | gnorm 1.62 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3640/5000 | loss 5.3102 | ppl 202.4 | lr 2.00e-05 | gnorm 1.59 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3650/5000 | loss 5.5038 | ppl 245.6 | lr 2.00e-05 | gnorm 1.31 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3660/5000 | loss 5.4059 | ppl 222.7 | lr 2.00e-05 | gnorm 1.48 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3670/5000 | loss 5.3952 | ppl 220.3 | lr 2.00e-05 | gnorm 1.60 | tok/s 5,798 | VRAM 119GB (58%) | ETA 12.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3680/5000 | loss 5.3411 | ppl 208.7 | lr 2.00e-05 | gnorm 1.50 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3690/5000 | loss 5.4777 | ppl 239.3 | lr 1.99e-05 | gnorm 1.41 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3700/5000 | loss 5.4171 | ppl 225.2 | lr 1.99e-05 | gnorm 1.80 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.1359 ppl=170.0 ★ NEW BEST → saved (+ EMA + full optimizer) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3700, full state + optimizer) |
| step 3710/5000 | loss 5.3428 | ppl 209.1 | lr 1.99e-05 | gnorm 1.38 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3720/5000 | loss 5.3910 | ppl 219.4 | lr 1.99e-05 | gnorm 1.45 | tok/s 5,800 | VRAM 119GB (58%) | ETA 12.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3730/5000 | loss 5.4490 | ppl 232.5 | lr 1.99e-05 | gnorm 1.39 | tok/s 5,799 | VRAM 119GB (58%) | ETA 12.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3740/5000 | loss 5.4098 | ppl 223.6 | lr 1.99e-05 | gnorm 1.27 | tok/s 5,799 | VRAM 119GB (58%) | ETA 11.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3750/5000 | loss 5.3475 | ppl 210.1 | lr 1.99e-05 | gnorm 1.31 | tok/s 5,799 | VRAM 119GB (58%) | ETA 11.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3760/5000 | loss 5.3055 | ppl 201.4 | lr 1.98e-05 | gnorm 1.39 | tok/s 5,800 | VRAM 119GB (58%) | ETA 11.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3770/5000 | loss 5.4678 | ppl 236.9 | lr 1.98e-05 | gnorm 1.23 | tok/s 5,727 | VRAM 119GB (58%) | ETA 11.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3780/5000 | loss 5.3639 | ppl 213.6 | lr 1.98e-05 | gnorm 1.44 | tok/s 5,579 | VRAM 119GB (58%) | ETA 11.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3790/5000 | loss 5.3999 | ppl 221.4 | lr 1.98e-05 | gnorm 1.28 | tok/s 5,438 | VRAM 119GB (58%) | ETA 12.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3800/5000 | loss 5.4344 | ppl 229.2 | lr 1.98e-05 | gnorm 1.35 | tok/s 5,304 | VRAM 119GB (58%) | ETA 12.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3704 ppl=214.9 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3800, full state + optimizer) |
| step 3810/5000 | loss 5.4967 | ppl 243.9 | lr 1.97e-05 | gnorm 1.23 | tok/s 5,305 | VRAM 119GB (58%) | ETA 12.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3820/5000 | loss 5.2320 | ppl 187.2 | lr 1.97e-05 | gnorm 1.29 | tok/s 5,367 | VRAM 119GB (58%) | ETA 12.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3830/5000 | loss 5.1748 | ppl 176.8 | lr 1.97e-05 | gnorm 1.30 | tok/s 5,504 | VRAM 119GB (58%) | ETA 11.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3840/5000 | loss 5.2995 | ppl 200.2 | lr 1.96e-05 | gnorm 1.52 | tok/s 5,650 | VRAM 119GB (58%) | ETA 11.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3850/5000 | loss 5.4423 | ppl 231.0 | lr 1.96e-05 | gnorm 1.49 | tok/s 5,803 | VRAM 119GB (58%) | ETA 10.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3860/5000 | loss 5.3073 | ppl 201.8 | lr 1.96e-05 | gnorm 1.13 | tok/s 5,802 | VRAM 119GB (58%) | ETA 10.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3870/5000 | loss 5.4388 | ppl 230.2 | lr 1.96e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 10.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3880/5000 | loss 5.3153 | ppl 203.4 | lr 1.95e-05 | gnorm 1.48 | tok/s 5,802 | VRAM 119GB (58%) | ETA 10.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3890/5000 | loss 5.2094 | ppl 183.0 | lr 1.95e-05 | gnorm 1.41 | tok/s 5,801 | VRAM 119GB (58%) | ETA 10.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3900/5000 | loss 5.3008 | ppl 200.5 | lr 1.95e-05 | gnorm 1.34 | tok/s 5,801 | VRAM 119GB (58%) | ETA 10.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3238 ppl=205.2 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 3900, full state + optimizer) |
| step 3910/5000 | loss 5.3193 | ppl 204.2 | lr 1.94e-05 | gnorm 1.52 | tok/s 5,665 | VRAM 119GB (58%) | ETA 10.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3920/5000 | loss 5.4208 | ppl 226.1 | lr 1.94e-05 | gnorm 1.45 | tok/s 5,518 | VRAM 119GB (58%) | ETA 10.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3930/5000 | loss 5.3002 | ppl 200.4 | lr 1.93e-05 | gnorm 1.27 | tok/s 5,378 | VRAM 119GB (58%) | ETA 10.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3940/5000 | loss 5.2733 | ppl 195.1 | lr 1.93e-05 | gnorm 1.37 | tok/s 5,246 | VRAM 119GB (58%) | ETA 11.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3950/5000 | loss 5.3122 | ppl 202.8 | lr 1.93e-05 | gnorm 1.45 | tok/s 5,120 | VRAM 119GB (58%) | ETA 11.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3960/5000 | loss 5.3368 | ppl 207.8 | lr 1.92e-05 | gnorm 1.55 | tok/s 5,106 | VRAM 119GB (58%) | ETA 11.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3970/5000 | loss 5.3380 | ppl 208.1 | lr 1.92e-05 | gnorm 1.30 | tok/s 5,106 | VRAM 119GB (58%) | ETA 11.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3980/5000 | loss 5.3551 | ppl 211.7 | lr 1.91e-05 | gnorm 1.45 | tok/s 5,106 | VRAM 119GB (58%) | ETA 10.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 3990/5000 | loss 5.3380 | ppl 208.1 | lr 1.91e-05 | gnorm 1.48 | tok/s 5,106 | VRAM 119GB (58%) | ETA 10.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4000/5000 | loss 5.2200 | ppl 184.9 | lr 1.91e-05 | gnorm 1.62 | tok/s 5,106 | VRAM 119GB (58%) | ETA 10.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2679 ppl=194.0 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4000, full state + optimizer) |
| >> MILESTONE step 4000 LOCKED → /mnt/scratch/checkpoints/frankenstein_v2_milestone_4000.pt |
| step 4010/5000 | loss 5.2624 | ppl 192.9 | lr 1.90e-05 | gnorm 1.23 | tok/s 5,231 | VRAM 119GB (58%) | ETA 10.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4020/5000 | loss 5.3521 | ppl 211.1 | lr 1.90e-05 | gnorm 1.54 | tok/s 5,363 | VRAM 119GB (58%) | ETA 10.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4030/5000 | loss 5.3664 | ppl 214.1 | lr 1.89e-05 | gnorm 1.34 | tok/s 5,501 | VRAM 119GB (58%) | ETA 9.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4040/5000 | loss 5.2996 | ppl 200.2 | lr 1.89e-05 | gnorm 1.35 | tok/s 5,647 | VRAM 119GB (58%) | ETA 9.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4050/5000 | loss 5.2745 | ppl 195.3 | lr 1.88e-05 | gnorm 1.32 | tok/s 5,801 | VRAM 119GB (58%) | ETA 8.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4060/5000 | loss 5.2770 | ppl 195.8 | lr 1.88e-05 | gnorm 1.58 | tok/s 5,802 | VRAM 119GB (58%) | ETA 8.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4070/5000 | loss 5.3921 | ppl 219.7 | lr 1.87e-05 | gnorm 1.35 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4080/5000 | loss 5.2833 | ppl 197.0 | lr 1.87e-05 | gnorm 1.59 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4090/5000 | loss 5.4707 | ppl 237.6 | lr 1.86e-05 | gnorm 1.24 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4100/5000 | loss 5.3238 | ppl 205.2 | lr 1.86e-05 | gnorm 1.59 | tok/s 5,804 | VRAM 119GB (58%) | ETA 8.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.3574 ppl=212.2 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4100, full state + optimizer) |
| step 4110/5000 | loss 5.3696 | ppl 214.8 | lr 1.85e-05 | gnorm 1.28 | tok/s 5,804 | VRAM 119GB (58%) | ETA 8.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4120/5000 | loss 5.2697 | ppl 194.3 | lr 1.85e-05 | gnorm 1.65 | tok/s 5,804 | VRAM 119GB (58%) | ETA 8.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4130/5000 | loss 5.2788 | ppl 196.1 | lr 1.84e-05 | gnorm 1.10 | tok/s 5,803 | VRAM 119GB (58%) | ETA 8.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4140/5000 | loss 5.3446 | ppl 209.5 | lr 1.84e-05 | gnorm 1.30 | tok/s 5,802 | VRAM 119GB (58%) | ETA 8.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4150/5000 | loss 5.1764 | ppl 177.0 | lr 1.83e-05 | gnorm 1.35 | tok/s 5,802 | VRAM 119GB (58%) | ETA 8.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4160/5000 | loss 5.2608 | ppl 192.6 | lr 1.83e-05 | gnorm 1.51 | tok/s 5,801 | VRAM 119GB (58%) | ETA 7.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4170/5000 | loss 5.3678 | ppl 214.4 | lr 1.82e-05 | gnorm 1.24 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4180/5000 | loss 5.2964 | ppl 199.6 | lr 1.82e-05 | gnorm 1.54 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4190/5000 | loss 5.4885 | ppl 241.9 | lr 1.81e-05 | gnorm 1.15 | tok/s 5,800 | VRAM 119GB (58%) | ETA 7.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4200/5000 | loss 5.4148 | ppl 224.7 | lr 1.81e-05 | gnorm 1.41 | tok/s 5,798 | VRAM 119GB (58%) | ETA 7.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2861 ppl=197.6 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4200, full state + optimizer) |
| step 4210/5000 | loss 5.3041 | ppl 201.2 | lr 1.80e-05 | gnorm 1.33 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4220/5000 | loss 5.3657 | ppl 213.9 | lr 1.79e-05 | gnorm 1.62 | tok/s 5,798 | VRAM 119GB (58%) | ETA 7.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4230/5000 | loss 5.3724 | ppl 215.4 | lr 1.79e-05 | gnorm 1.45 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4240/5000 | loss 5.2941 | ppl 199.2 | lr 1.78e-05 | gnorm 1.55 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4250/5000 | loss 5.4146 | ppl 224.7 | lr 1.78e-05 | gnorm 1.48 | tok/s 5,799 | VRAM 119GB (58%) | ETA 7.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4260/5000 | loss 5.4285 | ppl 227.8 | lr 1.77e-05 | gnorm 1.37 | tok/s 5,800 | VRAM 119GB (58%) | ETA 7.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.73 1.26 0.73] |
| step 4270/5000 | loss 5.2140 | ppl 183.8 | lr 1.77e-05 | gnorm 1.35 | tok/s 5,801 | VRAM 119GB (58%) | ETA 6.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4280/5000 | loss 5.1953 | ppl 180.4 | lr 1.76e-05 | gnorm 1.50 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4290/5000 | loss 5.2776 | ppl 195.9 | lr 1.76e-05 | gnorm 1.45 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4300/5000 | loss 5.3880 | ppl 218.8 | lr 1.75e-05 | gnorm 1.26 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2932 ppl=199.0 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4300, full state + optimizer) |
| step 4310/5000 | loss 5.2318 | ppl 187.1 | lr 1.74e-05 | gnorm 1.37 | tok/s 5,800 | VRAM 119GB (58%) | ETA 6.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4320/5000 | loss 5.1866 | ppl 178.9 | lr 1.74e-05 | gnorm 1.57 | tok/s 5,799 | VRAM 119GB (58%) | ETA 6.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4330/5000 | loss 5.2669 | ppl 193.8 | lr 1.73e-05 | gnorm 1.50 | tok/s 5,799 | VRAM 119GB (58%) | ETA 6.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4340/5000 | loss 5.2029 | ppl 181.8 | lr 1.73e-05 | gnorm 1.37 | tok/s 5,798 | VRAM 119GB (58%) | ETA 6.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4350/5000 | loss 5.2440 | ppl 189.4 | lr 1.72e-05 | gnorm 1.26 | tok/s 5,798 | VRAM 119GB (58%) | ETA 6.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4360/5000 | loss 5.4278 | ppl 227.7 | lr 1.72e-05 | gnorm 1.44 | tok/s 5,798 | VRAM 119GB (58%) | ETA 6.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4370/5000 | loss 5.3880 | ppl 218.8 | lr 1.71e-05 | gnorm 1.33 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4380/5000 | loss 5.3397 | ppl 208.5 | lr 1.71e-05 | gnorm 1.72 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4390/5000 | loss 5.3737 | ppl 215.7 | lr 1.70e-05 | gnorm 1.26 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4400/5000 | loss 5.3451 | ppl 209.6 | lr 1.69e-05 | gnorm 1.29 | tok/s 5,799 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2215 ppl=185.2 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4400, full state + optimizer) |
| step 4410/5000 | loss 5.2720 | ppl 194.8 | lr 1.69e-05 | gnorm 1.45 | tok/s 5,648 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4420/5000 | loss 5.3076 | ppl 201.9 | lr 1.68e-05 | gnorm 1.42 | tok/s 5,501 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4430/5000 | loss 5.3635 | ppl 213.5 | lr 1.68e-05 | gnorm 1.46 | tok/s 5,363 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4440/5000 | loss 5.4320 | ppl 228.6 | lr 1.67e-05 | gnorm 1.56 | tok/s 5,232 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4450/5000 | loss 5.3418 | ppl 208.9 | lr 1.67e-05 | gnorm 1.40 | tok/s 5,107 | VRAM 119GB (58%) | ETA 5.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4460/5000 | loss 5.3990 | ppl 221.2 | lr 1.66e-05 | gnorm 1.55 | tok/s 5,104 | VRAM 119GB (58%) | ETA 5.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4470/5000 | loss 5.2649 | ppl 193.4 | lr 1.66e-05 | gnorm 1.26 | tok/s 5,105 | VRAM 119GB (58%) | ETA 5.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4480/5000 | loss 5.3802 | ppl 217.1 | lr 1.65e-05 | gnorm 1.33 | tok/s 5,105 | VRAM 119GB (58%) | ETA 5.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4490/5000 | loss 5.1792 | ppl 177.5 | lr 1.65e-05 | gnorm 1.45 | tok/s 5,104 | VRAM 119GB (58%) | ETA 5.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4500/5000 | loss 5.4291 | ppl 227.9 | lr 1.64e-05 | gnorm 1.23 | tok/s 5,104 | VRAM 119GB (58%) | ETA 5.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2665 ppl=193.7 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4500, full state + optimizer) |
| step 4510/5000 | loss 5.2796 | ppl 196.3 | lr 1.64e-05 | gnorm 1.22 | tok/s 5,231 | VRAM 119GB (58%) | ETA 5.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4520/5000 | loss 5.2812 | ppl 196.6 | lr 1.63e-05 | gnorm 1.88 | tok/s 5,362 | VRAM 119GB (58%) | ETA 4.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4530/5000 | loss 5.2610 | ppl 192.7 | lr 1.63e-05 | gnorm 1.48 | tok/s 5,498 | VRAM 119GB (58%) | ETA 4.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4540/5000 | loss 5.3610 | ppl 212.9 | lr 1.62e-05 | gnorm 1.61 | tok/s 5,643 | VRAM 119GB (58%) | ETA 4.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4550/5000 | loss 5.2436 | ppl 189.3 | lr 1.62e-05 | gnorm 1.21 | tok/s 5,797 | VRAM 119GB (58%) | ETA 4.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4560/5000 | loss 5.3148 | ppl 203.3 | lr 1.61e-05 | gnorm 1.47 | tok/s 5,796 | VRAM 119GB (58%) | ETA 4.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4570/5000 | loss 5.3149 | ppl 203.3 | lr 1.61e-05 | gnorm 2.30 | tok/s 5,796 | VRAM 119GB (58%) | ETA 4.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4580/5000 | loss 5.3789 | ppl 216.8 | lr 1.60e-05 | gnorm 1.30 | tok/s 5,798 | VRAM 119GB (58%) | ETA 4.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4590/5000 | loss 5.3713 | ppl 215.1 | lr 1.60e-05 | gnorm 1.54 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4600/5000 | loss 5.3317 | ppl 206.8 | lr 1.59e-05 | gnorm 1.30 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2795 ppl=196.3 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4600, full state + optimizer) |
| step 4610/5000 | loss 5.2330 | ppl 187.4 | lr 1.59e-05 | gnorm 1.36 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4620/5000 | loss 5.2370 | ppl 188.1 | lr 1.59e-05 | gnorm 1.22 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4630/5000 | loss 5.3151 | ppl 203.4 | lr 1.58e-05 | gnorm 1.45 | tok/s 5,800 | VRAM 119GB (58%) | ETA 3.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4640/5000 | loss 5.2922 | ppl 198.8 | lr 1.58e-05 | gnorm 1.48 | tok/s 5,799 | VRAM 119GB (58%) | ETA 3.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4650/5000 | loss 5.3756 | ppl 216.1 | lr 1.57e-05 | gnorm 1.49 | tok/s 5,696 | VRAM 119GB (58%) | ETA 3.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4660/5000 | loss 5.2680 | ppl 194.0 | lr 1.57e-05 | gnorm 1.09 | tok/s 5,547 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4670/5000 | loss 5.4040 | ppl 222.3 | lr 1.57e-05 | gnorm 1.57 | tok/s 5,408 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4680/5000 | loss 5.3783 | ppl 216.6 | lr 1.56e-05 | gnorm 1.31 | tok/s 5,274 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4690/5000 | loss 5.1276 | ppl 168.6 | lr 1.56e-05 | gnorm 1.34 | tok/s 5,147 | VRAM 119GB (58%) | ETA 3.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4700/5000 | loss 5.3363 | ppl 207.7 | lr 1.55e-05 | gnorm 1.16 | tok/s 5,106 | VRAM 119GB (58%) | ETA 3.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2675 ppl=193.9 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4700, full state + optimizer) |
| step 4710/5000 | loss 5.2201 | ppl 185.0 | lr 1.55e-05 | gnorm 1.32 | tok/s 5,162 | VRAM 119GB (58%) | ETA 3.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4720/5000 | loss 5.3301 | ppl 206.5 | lr 1.55e-05 | gnorm 1.71 | tok/s 5,160 | VRAM 119GB (58%) | ETA 3.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4730/5000 | loss 5.3909 | ppl 219.4 | lr 1.54e-05 | gnorm 1.31 | tok/s 5,161 | VRAM 119GB (58%) | ETA 2.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4740/5000 | loss 5.2943 | ppl 199.2 | lr 1.54e-05 | gnorm 1.38 | tok/s 5,160 | VRAM 119GB (58%) | ETA 2.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4750/5000 | loss 5.3384 | ppl 208.2 | lr 1.54e-05 | gnorm 1.46 | tok/s 5,160 | VRAM 119GB (58%) | ETA 2.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4760/5000 | loss 5.2979 | ppl 199.9 | lr 1.54e-05 | gnorm 1.33 | tok/s 5,103 | VRAM 119GB (58%) | ETA 2.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4770/5000 | loss 5.3215 | ppl 204.7 | lr 1.53e-05 | gnorm 1.38 | tok/s 5,103 | VRAM 119GB (58%) | ETA 2.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4780/5000 | loss 5.3302 | ppl 206.5 | lr 1.53e-05 | gnorm 1.26 | tok/s 5,103 | VRAM 119GB (58%) | ETA 2.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4790/5000 | loss 5.3284 | ppl 206.1 | lr 1.53e-05 | gnorm 1.42 | tok/s 5,102 | VRAM 119GB (58%) | ETA 2.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4800/5000 | loss 5.4404 | ppl 230.5 | lr 1.52e-05 | gnorm 4.03 | tok/s 5,099 | VRAM 119GB (58%) | ETA 2.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2457 ppl=189.7 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4800, full state + optimizer) |
| step 4810/5000 | loss 5.3835 | ppl 217.8 | lr 1.52e-05 | gnorm 1.70 | tok/s 5,223 | VRAM 119GB (58%) | ETA 2.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4820/5000 | loss 5.2967 | ppl 199.7 | lr 1.52e-05 | gnorm 1.25 | tok/s 5,355 | VRAM 119GB (58%) | ETA 1.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4830/5000 | loss 5.3657 | ppl 213.9 | lr 1.52e-05 | gnorm 1.38 | tok/s 5,494 | VRAM 119GB (58%) | ETA 1.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4840/5000 | loss 5.3776 | ppl 216.5 | lr 1.52e-05 | gnorm 1.31 | tok/s 5,641 | VRAM 119GB (58%) | ETA 1.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4850/5000 | loss 5.2128 | ppl 183.6 | lr 1.51e-05 | gnorm 1.30 | tok/s 5,798 | VRAM 119GB (58%) | ETA 1.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4860/5000 | loss 5.3282 | ppl 206.1 | lr 1.51e-05 | gnorm 2.41 | tok/s 5,799 | VRAM 119GB (58%) | ETA 1.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4870/5000 | loss 5.4324 | ppl 228.7 | lr 1.51e-05 | gnorm 1.20 | tok/s 5,799 | VRAM 119GB (58%) | ETA 1.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4880/5000 | loss 5.3635 | ppl 213.5 | lr 1.51e-05 | gnorm 1.22 | tok/s 5,800 | VRAM 119GB (58%) | ETA 1.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4890/5000 | loss 5.3306 | ppl 206.6 | lr 1.51e-05 | gnorm 1.13 | tok/s 5,799 | VRAM 119GB (58%) | ETA 1.0h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4900/5000 | loss 5.3643 | ppl 213.6 | lr 1.51e-05 | gnorm 1.07 | tok/s 5,799 | VRAM 119GB (58%) | ETA 0.9h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| >> EVAL: val_loss=5.2650 ppl=193.4 (best=5.1359) |
| >> Saved /mnt/scratch/checkpoints/frankenstein_v2_latest.pt (step 4900, full state + optimizer) |
| step 4910/5000 | loss 5.2922 | ppl 198.8 | lr 1.51e-05 | gnorm 1.30 | tok/s 5,800 | VRAM 119GB (58%) | ETA 0.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4920/5000 | loss 5.4508 | ppl 232.9 | lr 1.50e-05 | gnorm 1.67 | tok/s 5,800 | VRAM 119GB (58%) | ETA 0.8h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4930/5000 | loss 5.3588 | ppl 212.5 | lr 1.50e-05 | gnorm 1.20 | tok/s 5,798 | VRAM 119GB (58%) | ETA 0.7h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4940/5000 | loss 5.3917 | ppl 219.6 | lr 1.50e-05 | gnorm 1.53 | tok/s 5,799 | VRAM 119GB (58%) | ETA 0.6h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4950/5000 | loss 5.2779 | ppl 196.0 | lr 1.50e-05 | gnorm 1.73 | tok/s 5,798 | VRAM 119GB (58%) | ETA 0.5h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4960/5000 | loss 5.3938 | ppl 220.0 | lr 1.50e-05 | gnorm 1.26 | tok/s 5,796 | VRAM 119GB (58%) | ETA 0.4h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4970/5000 | loss 5.3809 | ppl 217.2 | lr 1.50e-05 | gnorm 1.12 | tok/s 5,797 | VRAM 119GB (58%) | ETA 0.3h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4980/5000 | loss 5.3169 | ppl 203.8 | lr 1.50e-05 | gnorm 1.31 | tok/s 5,798 | VRAM 119GB (58%) | ETA 0.2h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
| step 4990/5000 | loss 5.2999 | ppl 200.3 | lr 1.50e-05 | gnorm 1.34 | tok/s 5,797 | VRAM 119GB (58%) | ETA 0.1h C4 | [E0:32% E1:18% E2:31% E3:18%] CF=[1.28 0.74 1.26 0.73] |
|
|
| ====================================================================== |
| REALIGNMENT v2 COMPLETE |
| ====================================================================== |
| Steps: 5000 |
| Total tokens: 0.98B |
| Best val_loss: 5.1359 |
| Total time: 50.3h |
| Final: /mnt/scratch/checkpoints/frankenstein_v2_final.pt |
| Best: /mnt/scratch/checkpoints/frankenstein_v2_best.pt |
| EMA best: /mnt/scratch/checkpoints/frankenstein_v2_ema_best.pt |
|
|
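Note on the summary figures: the totals reported above can be reproduced from the run header with a few lines of arithmetic. The sketch below is not part of the training script. It assumes that "Batch: 8 × 6" means a micro-batch of 8 with 6 gradient-accumulation steps, and that the logged `ppl` is simply `exp(loss)`; the variable and function names (`micro_batch`, `grad_accum`, `perplexity`) are hypothetical and chosen for illustration only.

```python
import math

# Values copied from the run header; the micro-batch / accumulation
# reading of "8 x 6" is an assumption, not confirmed by the log.
micro_batch = 8
grad_accum = 6
seq_len = 4096
steps = 5000

# Effective tokens per optimizer step and over the whole run.
tokens_per_step = micro_batch * grad_accum * seq_len
total_tokens = tokens_per_step * steps


def perplexity(loss: float) -> float:
    """Perplexity as exp(cross-entropy loss); assumed to match the logged 'ppl'."""
    return math.exp(loss)


if __name__ == "__main__":
    print(f"tokens/step:  {tokens_per_step:,}")
    print(f"total tokens: {total_tokens / 1e9:.2f}B")
    # Compare with the step-4700 eval line (val_loss=5.2675, ppl=193.9).
    print(f"ppl at val_loss 5.2675: {perplexity(5.2675):.1f}")
```

Under these assumptions the script gives 196,608 tokens per step and roughly 0.98B tokens in total, which agrees with the "Tokens/step" and "Total tokens" lines in the log.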