# opt-babylm1-ntb_seed-1024_5e-6

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 2.9423
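
Because the reported loss is a mean token-level cross-entropy (in nats), it converts directly to perplexity. The snippet below is a minimal sketch of that conversion together with loading the checkpoint; the repo id is a placeholder, since the card does not state where the weights are hosted.

```python
import math

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the actual Hugging Face Hub path.
repo_id = "opt-babylm1-ntb_seed-1024_5e-6"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Cross-entropy in nats -> perplexity via exponentiation.
val_loss = 2.9423
print(f"validation perplexity = exp({val_loss}) ≈ {math.exp(val_loss):.2f}")  # ≈ 18.96
```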

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-06
- train_batch_size: 32
- eval_batch_size: 64
- seed: 1024
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20.0
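
These settings map one-to-one onto the `transformers` `TrainingArguments` API; the sketch below is an assumed reconstruction, not the exact training script, and the output directory is a placeholder. Note that the effective batch size of 256 is train_batch_size × gradient_accumulation_steps = 32 × 8.

```python
from transformers import TrainingArguments

# A minimal sketch of the configuration above; output_dir is a placeholder.
# Effective batch size: 32 (per device) x 8 (accumulation steps) = 256.
training_args = TrainingArguments(
    output_dir="opt-babylm1-ntb_seed-1024_5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=1024,
    gradient_accumulation_steps=8,
    optim="adamw_torch_fused",   # AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=20.0,
)
```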

### Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 5.2931        | 0.4284  | 1000  | 5.2469          |
| 4.7248        | 0.8567  | 2000  | 4.7222          |
| 4.2333        | 1.2849  | 3000  | 4.2283          |
| 3.8715        | 1.7132  | 4000  | 3.8683          |
| 3.56          | 2.1414  | 5000  | 3.5667          |
| 3.4165        | 2.5697  | 6000  | 3.4090          |
| 3.3154        | 2.9981  | 7000  | 3.3184          |
| 3.2294        | 3.4262  | 8000  | 3.2480          |
| 3.1789        | 3.8546  | 9000  | 3.1970          |
| 3.1107        | 4.2827  | 10000 | 3.1586          |
| 3.0822        | 4.7111  | 11000 | 3.1289          |
| 3.0348        | 5.1392  | 12000 | 3.1093          |
| 3.0284        | 5.5676  | 13000 | 3.0849          |
| 3.0121        | 5.9959  | 14000 | 3.0679          |
| 2.9631        | 6.4241  | 15000 | 3.0565          |
| 2.9636        | 6.8524  | 16000 | 3.0366          |
| 2.9202        | 7.2806  | 17000 | 3.0331          |
| 2.9172        | 7.7089  | 18000 | 3.0174          |
| 2.8756        | 8.1371  | 19000 | 3.0138          |
| 2.88          | 8.5654  | 20000 | 3.0041          |
| 2.8866        | 8.9938  | 21000 | 2.9964          |
| 2.8505        | 9.4219  | 22000 | 2.9931          |
| 2.857         | 9.8503  | 23000 | 2.9814          |
| 2.8132        | 10.2784 | 24000 | 2.9835          |
| 2.8305        | 10.7068 | 25000 | 2.9775          |
| 2.7874        | 11.1349 | 26000 | 2.9745          |
| 2.806         | 11.5633 | 27000 | 2.9664          |
| 2.8069        | 11.9916 | 28000 | 2.9603          |
| 2.776         | 12.4198 | 29000 | 2.9656          |
| 2.7852        | 12.8481 | 30000 | 2.9596          |
| 2.754         | 13.2763 | 31000 | 2.9598          |
| 2.7609        | 13.7046 | 32000 | 2.9539          |
| 2.7254        | 14.1328 | 33000 | 2.9552          |
| 2.74          | 14.5611 | 34000 | 2.9548          |
| 2.7416        | 14.9895 | 35000 | 2.9452          |
| 2.7196        | 15.4176 | 36000 | 2.9497          |
| 2.7254        | 15.8460 | 37000 | 2.9453          |
| 2.6957        | 16.2741 | 38000 | 2.9476          |
| 2.7028        | 16.7025 | 39000 | 2.9434          |
| 2.6772        | 17.1306 | 40000 | 2.9465          |
| 2.6905        | 17.5590 | 41000 | 2.9442          |
| 2.6925        | 17.9874 | 42000 | 2.9403          |
| 2.6761        | 18.4155 | 43000 | 2.9433          |
| 2.6742        | 18.8439 | 44000 | 2.9416          |
| 2.6641        | 19.2720 | 45000 | 2.9428          |
| 2.6606        | 19.7004 | 46000 | 2.9423          |
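
As a rough sanity check on scale (an estimate derived from the log above, not a figure stated in the card), the epoch and step columns imply about 2,334 optimizer steps per epoch, i.e. roughly 600k training sequences per epoch at the effective batch size of 256:

```python
# Approximate dataset scale from the log above (assumes the epoch
# value recorded at step 1000 is exact).
steps_per_epoch = 1000 / 0.4284               # ≈ 2,334 optimizer steps
sequences_per_epoch = steps_per_epoch * 256   # effective batch size
print(f"≈ {sequences_per_epoch:,.0f} sequences per epoch")  # ≈ 597,572
```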

## Framework versions

- Transformers 4.54.0
- PyTorch 2.10.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.4