# opt-babylm1_seed-42_1e-5

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 3.4634

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20.0
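The schedule these settings imply can be sketched in plain Python. This is an illustrative sketch, not the Trainer's implementation: the total step count of 47,500 is an assumption read off the training log below (which ends near step 47,000 at epoch ~19.76); the exact value depends on dataset size.

```python
# Sketch of a linear schedule with 5% warmup, as configured above
# (learning_rate=1e-05, lr_scheduler_warmup_ratio=0.05).
TOTAL_STEPS = 47_500                      # assumption, inferred from the log
WARMUP_STEPS = int(0.05 * TOTAL_STEPS)    # 5% of training
PEAK_LR = 1e-05

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# Effective batch size: per-device batch * gradient accumulation steps.
effective_batch = 32 * 8
print(effective_batch)  # 256, matching total_train_batch_size
```

The effective batch size of 256 explains why 20 epochs span only ~47k optimizer steps despite the small per-device batch.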

### Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 5.3691        | 0.4206  | 1000  | 5.3019          |
| 9.4778        | 0.8413  | 2000  | 7.7662          |
| 11.7536       | 1.2616  | 3000  | 10.9107         |
| 6.2443        | 1.6823  | 4000  | 5.9901          |
| 5.0938        | 2.1026  | 5000  | 5.0813          |
| 4.951         | 2.5233  | 6000  | 4.9423          |
| 4.8344        | 2.9439  | 7000  | 4.8479          |
| 4.7846        | 3.3643  | 8000  | 4.7772          |
| 4.7222        | 3.7849  | 9000  | 4.7218          |
| 4.6755        | 4.2053  | 10000 | 4.6585          |
| 4.621         | 4.6259  | 11000 | 4.5979          |
| 4.5495        | 5.0463  | 12000 | 4.5372          |
| 4.4998        | 5.4669  | 13000 | 4.4716          |
| 4.4136        | 5.8875  | 14000 | 4.4219          |
| 4.3616        | 6.3079  | 15000 | 4.3583          |
| 4.2967        | 6.7285  | 16000 | 4.2875          |
| 4.2367        | 7.1489  | 17000 | 4.2283          |
| 4.1903        | 7.5695  | 18000 | 4.1773          |
| 4.1288        | 7.9902  | 19000 | 4.1198          |
| 4.09          | 8.4105  | 20000 | 4.0740          |
| 4.042         | 8.8312  | 21000 | 4.0270          |
| 3.9811        | 9.2515  | 22000 | 3.9675          |
| 3.924         | 9.6722  | 23000 | 3.9094          |
| 3.8737        | 10.0925 | 24000 | 3.8639          |
| 3.833         | 10.5132 | 25000 | 3.8097          |
| 3.7976        | 10.9338 | 26000 | 3.7833          |
| 3.7641        | 11.3542 | 27000 | 3.7449          |
| 3.7239        | 11.7748 | 28000 | 3.7115          |
| 3.7045        | 12.1952 | 29000 | 3.6885          |
| 3.6655        | 12.6158 | 30000 | 3.6515          |
| 3.637         | 13.0362 | 31000 | 3.6258          |
| 3.6274        | 13.4568 | 32000 | 3.6069          |
| 3.5978        | 13.8774 | 33000 | 3.5860          |
| 3.5794        | 14.2978 | 34000 | 3.5686          |
| 3.5657        | 14.7184 | 35000 | 3.5510          |
| 3.5505        | 15.1388 | 36000 | 3.5370          |
| 3.5407        | 15.5594 | 37000 | 3.5238          |
| 3.5272        | 15.9801 | 38000 | 3.5150          |
| 3.4993        | 16.4004 | 39000 | 3.5013          |
| 3.5021        | 16.8211 | 40000 | 3.4901          |
| 3.48          | 17.2414 | 41000 | 3.4835          |
| 3.4798        | 17.6621 | 42000 | 3.4765          |
| 3.4723        | 18.0824 | 43000 | 3.4723          |
| 3.4706        | 18.5031 | 44000 | 3.4689          |
| 3.4776        | 18.9237 | 45000 | 3.4657          |
| 3.4685        | 19.3441 | 46000 | 3.4641          |
| 3.4625        | 19.7647 | 47000 | 3.4634          |
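Since the validation loss is an average token-level cross-entropy in nats, a perplexity can be derived as exp(loss). This conversion is a standard one, not a figure reported on the card itself:

```python
import math

# Final validation loss from the last row of the table above.
final_val_loss = 3.4634
perplexity = math.exp(final_val_loss)
print(f"{perplexity:.1f}")  # ~31.9
```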

### Framework versions

- Transformers 4.54.0
- PyTorch 2.10.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.4

Model size: 97.8M parameters (F16, Safetensors)