# opt-babylm1-randomremoval_seed-1024_5e-6

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9421

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 1024
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 20.0
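The hyperparameters above can be sanity-checked with a small sketch: the effective (total) batch size is the per-device batch size times the gradient-accumulation steps, and the linear scheduler warms up over the first 5% of steps before decaying to zero. The `total_steps` value and the single-GPU assumption are illustrative guesses, not taken from the card; only the hyperparameter values themselves come from this training run.

```python
# Sketch (assumptions labeled) of the batch-size arithmetic and the
# linear warmup/decay schedule implied by the hyperparameters above.

train_batch_size = 32
gradient_accumulation_steps = 8
# Single GPU assumed; the card reports total_train_batch_size = 256.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

peak_lr = 5e-6
warmup_ratio = 0.05
total_steps = 47000  # assumption: roughly the final step in the results table


def linear_schedule_lr(step, total_steps=total_steps,
                       warmup_ratio=warmup_ratio, peak_lr=peak_lr):
    """Linear warmup for the first warmup_ratio of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)


print(total_train_batch_size)    # 256
print(linear_schedule_lr(2350))  # peak LR (5e-06) reached at end of warmup
```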

### Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 5.2575        | 0.4236  | 1000  | 5.2274          |
| 4.6932        | 0.8472  | 2000  | 4.6903          |
| 4.2391        | 1.2707  | 3000  | 4.2385          |
| 3.872         | 1.6943  | 4000  | 3.8540          |
| 3.5682        | 2.1178  | 5000  | 3.5667          |
| 3.4179        | 2.5414  | 6000  | 3.4043          |
| 3.3127        | 2.9649  | 7000  | 3.3098          |
| 3.2297        | 3.3884  | 8000  | 3.2466          |
| 3.1928        | 3.8120  | 9000  | 3.1968          |
| 3.1198        | 4.2355  | 10000 | 3.1613          |
| 3.102         | 4.6591  | 11000 | 3.1299          |
| 3.0285        | 5.0826  | 12000 | 3.1074          |
| 3.0355        | 5.5062  | 13000 | 3.0867          |
| 3.0242        | 5.9298  | 14000 | 3.0685          |
| 2.9815        | 6.3533  | 15000 | 3.0631          |
| 2.9801        | 6.7769  | 16000 | 3.0397          |
| 2.932         | 7.2004  | 17000 | 3.0326          |
| 2.9292        | 7.6240  | 18000 | 3.0192          |
| 2.8848        | 8.0474  | 19000 | 3.0173          |
| 2.8934        | 8.4710  | 20000 | 3.0084          |
| 2.8941        | 8.8946  | 21000 | 2.9943          |
| 2.8572        | 9.3181  | 22000 | 2.9946          |
| 2.8679        | 9.7417  | 23000 | 2.9851          |
| 2.8201        | 10.1652 | 24000 | 2.9837          |
| 2.8429        | 10.5888 | 25000 | 2.9795          |
| 2.8276        | 11.0123 | 26000 | 2.9780          |
| 2.8103        | 11.4359 | 27000 | 2.9710          |
| 2.8168        | 11.8595 | 28000 | 2.9672          |
| 2.788         | 12.2830 | 29000 | 2.9668          |
| 2.7959        | 12.7066 | 30000 | 2.9622          |
| 2.7593        | 13.1300 | 31000 | 2.9603          |
| 2.7712        | 13.5536 | 32000 | 2.9570          |
| 2.7757        | 13.9772 | 33000 | 2.9500          |
| 2.7449        | 14.4007 | 34000 | 2.9553          |
| 2.7557        | 14.8243 | 35000 | 2.9492          |
| 2.7215        | 15.2478 | 36000 | 2.9536          |
| 2.7413        | 15.6714 | 37000 | 2.9473          |
| 2.7039        | 16.0949 | 38000 | 2.9495          |
| 2.7163        | 16.5185 | 39000 | 2.9461          |
| 2.7198        | 16.9421 | 40000 | 2.9426          |
| 2.6968        | 17.3656 | 41000 | 2.9458          |
| 2.7002        | 17.7892 | 42000 | 2.9421          |
| 2.6841        | 18.2126 | 43000 | 2.9447          |
| 2.6839        | 18.6362 | 44000 | 2.9430          |
| 2.6723        | 19.0597 | 45000 | 2.9432          |
| 2.6754        | 19.4833 | 46000 | 2.9427          |
| 2.6792        | 19.9069 | 47000 | 2.9421          |
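The validation loss is a cross-entropy in nats, so it can be read as a perplexity by exponentiating it. A minimal sketch, using the final evaluation loss reported above:

```python
import math

# Convert the final eval cross-entropy loss (nats/token) to perplexity.
final_eval_loss = 2.9421
perplexity = math.exp(final_eval_loss)
print(round(perplexity, 2))  # roughly 19 tokens of effective "uncertainty"
```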

### Framework versions

  • Transformers 4.54.0
  • Pytorch 2.10.0+cu128
  • Datasets 3.2.0
  • Tokenizers 0.21.4