opt-babylm1_seed-1024_5e-6

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9422
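For context, a cross-entropy loss in nats per token converts to perplexity by exponentiation; a quick check on the reported value:

```python
import math

# Validation cross-entropy loss reported above (nats per token).
eval_loss = 2.9422

# Perplexity is the exponential of the cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"Validation perplexity: {perplexity:.2f}")  # ~18.96
```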

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 1024
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 20.0
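Note that train_batch_size (32) × gradient_accumulation_steps (8) = total_train_batch_size (256). The schedule above (linear decay with a 5% warmup ratio) can be sketched as follows; the step counts are hypothetical, since the actual total depends on dataset size:

```python
# Sketch of a linear schedule with warmup, matching the hyperparameters
# above (lr_scheduler_type=linear, lr_scheduler_warmup_ratio=0.05).
PEAK_LR = 5e-6
WARMUP_RATIO = 0.05

def linear_lr(step: int, total_steps: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * (total_steps - step) / (total_steps - warmup_steps)

total = 10_000  # hypothetical run length in optimizer steps
print(linear_lr(500, total))    # end of warmup: peak learning rate
print(linear_lr(total, total))  # end of training: 0.0
```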

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:--------------|:--------|:------|:----------------|
| 5.2297        | 0.4206  | 1000  | 5.1826          |
| 4.7637        | 0.8413  | 2000  | 4.7401          |
| 4.299         | 1.2616  | 3000  | 4.2874          |
| 3.9607        | 1.6823  | 4000  | 3.9489          |
| 3.6494        | 2.1026  | 5000  | 3.6517          |
| 3.4719        | 2.5233  | 6000  | 3.4667          |
| 3.3668        | 2.9439  | 7000  | 3.3572          |
| 3.2732        | 3.3643  | 8000  | 3.2821          |
| 3.2214        | 3.7849  | 9000  | 3.2332          |
| 3.1418        | 4.2053  | 10000 | 3.1875          |
| 3.1246        | 4.6259  | 11000 | 3.1543          |
| 3.0548        | 5.0463  | 12000 | 3.1296          |
| 3.051         | 5.4669  | 13000 | 3.1023          |
| 3.0457        | 5.8875  | 14000 | 3.0826          |
| 2.9936        | 6.3079  | 15000 | 3.0715          |
| 2.9888        | 6.7285  | 16000 | 3.0553          |
| 2.9411        | 7.1489  | 17000 | 3.0484          |
| 2.9517        | 7.5695  | 18000 | 3.0344          |
| 2.9439        | 7.9902  | 19000 | 3.0212          |
| 2.9073        | 8.4105  | 20000 | 3.0138          |
| 2.9174        | 8.8312  | 21000 | 3.0022          |
| 2.8815        | 9.2515  | 22000 | 3.0026          |
| 2.8825        | 9.6722  | 23000 | 2.9974          |
| 2.8312        | 10.0925 | 24000 | 2.9940          |
| 2.8472        | 10.5132 | 25000 | 2.9875          |
| 2.8536        | 10.9338 | 26000 | 2.9748          |
| 2.8264        | 11.3542 | 27000 | 2.9771          |
| 2.8321        | 11.7748 | 28000 | 2.9682          |
| 2.7887        | 12.1952 | 29000 | 2.9709          |
| 2.7964        | 12.6158 | 30000 | 2.9657          |
| 2.7693        | 13.0362 | 31000 | 2.9662          |
| 2.7821        | 13.4568 | 32000 | 2.9598          |
| 2.7789        | 13.8774 | 33000 | 2.9547          |
| 2.7499        | 14.2978 | 34000 | 2.9573          |
| 2.7644        | 14.7184 | 35000 | 2.9529          |
| 2.7347        | 15.1388 | 36000 | 2.9533          |
| 2.736         | 15.5594 | 37000 | 2.9505          |
| 2.7476        | 15.9801 | 38000 | 2.9454          |
| 2.7259        | 16.4004 | 39000 | 2.9481          |
| 2.7222        | 16.8211 | 40000 | 2.9446          |
| 2.7054        | 17.2414 | 41000 | 2.9468          |
| 2.7133        | 17.6621 | 42000 | 2.9437          |
| 2.6935        | 18.0824 | 43000 | 2.9455          |
| 2.6976        | 18.5031 | 44000 | 2.9438          |
| 2.7072        | 18.9237 | 45000 | 2.9419          |
| 2.6934        | 19.3441 | 46000 | 2.9426          |
| 2.6919        | 19.7647 | 47000 | 2.9422          |
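A quick sanity check on the last few rows of the table above: the validation loss moves by well under 0.005 per 1000 steps at the end of the run, so training is close to converged at 20 epochs.

```python
# Last few evaluation rows from the table above: (step, validation loss).
last_rows = [
    (44000, 2.9438),
    (45000, 2.9419),
    (46000, 2.9426),
    (47000, 2.9422),
]

# Step-to-step changes in validation loss over the final 3000 steps.
deltas = [b[1] - a[1] for a, b in zip(last_rows, last_rows[1:])]
print(deltas)  # all changes well under 0.005 in magnitude
```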

Framework versions

  • Transformers 4.54.0
  • PyTorch 2.10.0+cu128
  • Datasets 3.2.0
  • Tokenizers 0.21.4
Model details

  • Model size: 97.8M params (Safetensors)
  • Tensor type: F16