opt-babylm1_seed-42_1e-6

This model was trained from scratch; the training dataset is not documented here, though the model name suggests the BabyLM corpus. It achieves the following results on the evaluation set:

  • Loss: 2.9858

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (torch fused, `ADAMW_TORCH_FUSED`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 20.0
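The hyperparameters above imply an effective batch size of 32 × 8 = 256 and a warmup phase covering 5% of training. A minimal sketch of how these values interact under the Hugging Face "linear" schedule; the total step count (~47,500) is an estimate inferred from the training log below, not a value stated in the card:

```python
# How the effective batch size and linear-warmup LR schedule fit together,
# using the hyperparameters listed above. total_steps is an ASSUMPTION
# (~20 epochs at roughly 2,375 optimizer steps per epoch).

train_batch_size = 32
gradient_accumulation_steps = 8
effective_batch = train_batch_size * gradient_accumulation_steps  # 256

peak_lr = 1e-6
warmup_ratio = 0.05
total_steps = 47_500                             # assumed, see lead-in
warmup_steps = int(total_steps * warmup_ratio)   # 2,375

def lr_at(step: int) -> float:
    """Linear ramp from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(effective_batch)       # 256
print(lr_at(warmup_steps))   # peak learning rate: 1e-06
```

With this schedule the learning rate peaks at 1e-06 around step 2,375 and decays to zero by the end of epoch 20.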

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 5.964         | 0.4206  | 1000  | 5.8746          |
| 4.9145        | 0.8413  | 2000  | 4.8984          |
| 4.5344        | 1.2616  | 3000  | 4.5433          |
| 4.2322        | 1.6823  | 4000  | 4.2203          |
| 3.9694        | 2.1026  | 5000  | 3.9752          |
| 3.7958        | 2.5233  | 6000  | 3.7851          |
| 3.6053        | 2.9439  | 7000  | 3.5977          |
| 3.4888        | 3.3643  | 8000  | 3.4823          |
| 3.404         | 3.7849  | 9000  | 3.3924          |
| 3.3238        | 4.2053  | 10000 | 3.3380          |
| 3.2816        | 4.6259  | 11000 | 3.2862          |
| 3.2171        | 5.0463  | 12000 | 3.2522          |
| 3.1938        | 5.4669  | 13000 | 3.2163          |
| 3.1599        | 5.8875  | 14000 | 3.1851          |
| 3.124         | 6.3079  | 15000 | 3.1696          |
| 3.1094        | 6.7285  | 16000 | 3.1485          |
| 3.0677        | 7.1489  | 17000 | 3.1319          |
| 3.0715        | 7.5695  | 18000 | 3.1178          |
| 3.0578        | 7.9902  | 19000 | 3.1009          |
| 3.0319        | 8.4105  | 20000 | 3.0907          |
| 3.0204        | 8.8312  | 21000 | 3.0804          |
| 2.9903        | 9.2515  | 22000 | 3.0694          |
| 2.9874        | 9.6722  | 23000 | 3.0618          |
| 2.9539        | 10.0925 | 24000 | 3.0564          |
| 2.9538        | 10.5132 | 25000 | 3.0468          |
| 2.9552        | 10.9338 | 26000 | 3.0397          |
| 2.9319        | 11.3542 | 27000 | 3.0366          |
| 2.9305        | 11.7748 | 28000 | 3.0280          |
| 2.9145        | 12.1952 | 29000 | 3.0254          |
| 2.9091        | 12.6158 | 30000 | 3.0211          |
| 2.8855        | 13.0362 | 31000 | 3.0164          |
| 2.8941        | 13.4568 | 32000 | 3.0127          |
| 2.886         | 13.8774 | 33000 | 3.0080          |
| 2.8712        | 14.2978 | 34000 | 3.0073          |
| 2.8764        | 14.7184 | 35000 | 3.0029          |
| 2.8622        | 15.1388 | 36000 | 3.0007          |
| 2.865         | 15.5594 | 37000 | 2.9975          |
| 2.862         | 15.9801 | 38000 | 2.9947          |
| 2.8394        | 16.4004 | 39000 | 2.9937          |
| 2.85          | 16.8211 | 40000 | 2.9913          |
| 2.8272        | 17.2414 | 41000 | 2.9909          |
| 2.8336        | 17.6621 | 42000 | 2.9888          |
| 2.8245        | 18.0824 | 43000 | 2.9882          |
| 2.8293        | 18.5031 | 44000 | 2.9874          |
| 2.8366        | 18.9237 | 45000 | 2.9864          |
| 2.8243        | 19.3441 | 46000 | 2.9861          |
| 2.8229        | 19.7647 | 47000 | 2.9858          |
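The validation loss is a mean cross-entropy in nats, so it converts directly to perplexity via exp(loss). A quick check on the final value above:

```python
import math

# Convert the final validation loss (mean cross-entropy in nats)
# to perplexity: ppl = exp(loss).
final_val_loss = 2.9858
perplexity = math.exp(final_val_loss)
print(round(perplexity, 2))  # 19.8
```

A final perplexity of roughly 19.8 is a sanity check on the loss curve, not a standardized benchmark score.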

Framework versions

  • Transformers 4.54.0
  • PyTorch 2.10.0+cu128
  • Datasets 3.2.0
  • Tokenizers 0.21.4
Model size: 97.8M parameters (Safetensors, F16 tensors)