opt-babylm1_seed-211_5e-6

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9352
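
For reference, if the reported loss is the mean per-token cross-entropy in nats, it corresponds to a validation perplexity of roughly exp(2.9352) ≈ 18.8. Below is a minimal, hypothetical sketch for loading the checkpoint with the Transformers library; the repository id is a placeholder, since the hosting organization is not given in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Hub path of this checkpoint.
repo_id = "your-org/opt-babylm1_seed-211_5e-6"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Quick sanity-check generation with the causal LM.
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```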

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 211
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 20.0
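
The values above map directly onto transformers.TrainingArguments; a sketch is shown below, assuming the standard Trainer setup (effective batch size 32 × 8 = 256, matching total_train_batch_size). The output_dir and any settings not listed above are assumptions, not the actual training script.

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed above for the standard
# Trainer API; unlisted settings (e.g. output_dir) are assumptions.
training_args = TrainingArguments(
    output_dir="opt-babylm1_seed-211_5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=8,   # 32 * 8 = 256 effective train batch size
    seed=211,
    optim="adamw_torch_fused",       # PyTorch fused AdamW implementation
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=20.0,
)
```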

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 5.2657        | 0.4206  | 1000  | 5.2429          |
| 4.7058        | 0.8413  | 2000  | 4.7120          |
| 4.2262        | 1.2616  | 3000  | 4.2310          |
| 3.8735        | 1.6823  | 4000  | 3.8682          |
| 3.5661        | 2.1026  | 5000  | 3.5738          |
| 3.4174        | 2.5233  | 6000  | 3.4080          |
| 3.3118        | 2.9439  | 7000  | 3.3058          |
| 3.2284        | 3.3643  | 8000  | 3.2433          |
| 3.18          | 3.7849  | 9000  | 3.1947          |
| 3.1179        | 4.2053  | 10000 | 3.1561          |
| 3.0937        | 4.6259  | 11000 | 3.1282          |
| 3.0223        | 5.0463  | 12000 | 3.1030          |
| 3.0283        | 5.4669  | 13000 | 3.0828          |
| 3.0162        | 5.8875  | 14000 | 3.0614          |
| 2.9568        | 6.3079  | 15000 | 3.0483          |
| 2.965         | 6.7285  | 16000 | 3.0331          |
| 2.9084        | 7.1489  | 17000 | 3.0258          |
| 2.9168        | 7.5695  | 18000 | 3.0127          |
| 2.9277        | 7.9902  | 19000 | 3.0023          |
| 2.8789        | 8.4105  | 20000 | 2.9999          |
| 2.887         | 8.8312  | 21000 | 2.9910          |
| 2.85          | 9.2515  | 22000 | 2.9881          |
| 2.8541        | 9.6722  | 23000 | 2.9817          |
| 2.8065        | 10.0925 | 24000 | 2.9763          |
| 2.827         | 10.5132 | 25000 | 2.9726          |
| 2.8302        | 10.9338 | 26000 | 2.9610          |
| 2.7886        | 11.3542 | 27000 | 2.9628          |
| 2.8112        | 11.7748 | 28000 | 2.9584          |
| 2.7724        | 12.1952 | 29000 | 2.9601          |
| 2.7749        | 12.6158 | 30000 | 2.9572          |
| 2.7427        | 13.0362 | 31000 | 2.9548          |
| 2.7541        | 13.4568 | 32000 | 2.9505          |
| 2.763         | 13.8774 | 33000 | 2.9443          |
| 2.7283        | 14.2978 | 34000 | 2.9480          |
| 2.7383        | 14.7184 | 35000 | 2.9427          |
| 2.7111        | 15.1388 | 36000 | 2.9461          |
| 2.7177        | 15.5594 | 37000 | 2.9415          |
| 2.7221        | 15.9801 | 38000 | 2.9372          |
| 2.6956        | 16.4004 | 39000 | 2.9407          |
| 2.7097        | 16.8211 | 40000 | 2.9359          |
| 2.6858        | 17.2414 | 41000 | 2.9396          |
| 2.686         | 17.6621 | 42000 | 2.9358          |
| 2.6683        | 18.0824 | 43000 | 2.9378          |
| 2.6763        | 18.5031 | 44000 | 2.9361          |
| 2.6785        | 18.9237 | 45000 | 2.9346          |
| 2.6581        | 19.3441 | 46000 | 2.9354          |
| 2.6601        | 19.7647 | 47000 | 2.9352          |

Framework versions

  • Transformers 4.54.0
  • Pytorch 2.10.0+cu128
  • Datasets 3.2.0
  • Tokenizers 0.21.4

Model size

  • 97.8M parameters (safetensors, F16)