# opt-babylm1-randomremoval_seed-1024_5e-6
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.9421
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 32
- eval_batch_size: 64
- seed: 1024
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch fused implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20.0
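The effective batch size and the linear warmup schedule implied by the hyperparameters above can be sketched in plain Python. This is a minimal sketch: the step count is approximated from the results table (~2,361 optimizer steps per epoch × 20 epochs), and the schedule mirrors the shape of a linear-with-warmup scheduler rather than reproducing the exact Trainer internals.

```python
# Effective batch size: per-device batch * gradient accumulation steps
# (a single training device is assumed here).
train_batch_size = 32
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 256

# Linear LR schedule with warmup (warmup_ratio = 0.05 of total steps).
base_lr = 5e-6
warmup_ratio = 0.05
total_steps = 47_220  # approximate, derived from the results table below

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup/decay."""
    warmup_steps = int(warmup_ratio * total_steps)  # ~2,361 steps
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 over the remaining steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

After warmup (~step 2,361) the learning rate peaks at 5e-6 and then decays linearly to zero by the final step.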
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.2575 | 0.4236 | 1000 | 5.2274 |
| 4.6932 | 0.8472 | 2000 | 4.6903 |
| 4.2391 | 1.2707 | 3000 | 4.2385 |
| 3.872 | 1.6943 | 4000 | 3.8540 |
| 3.5682 | 2.1178 | 5000 | 3.5667 |
| 3.4179 | 2.5414 | 6000 | 3.4043 |
| 3.3127 | 2.9649 | 7000 | 3.3098 |
| 3.2297 | 3.3884 | 8000 | 3.2466 |
| 3.1928 | 3.8120 | 9000 | 3.1968 |
| 3.1198 | 4.2355 | 10000 | 3.1613 |
| 3.102 | 4.6591 | 11000 | 3.1299 |
| 3.0285 | 5.0826 | 12000 | 3.1074 |
| 3.0355 | 5.5062 | 13000 | 3.0867 |
| 3.0242 | 5.9298 | 14000 | 3.0685 |
| 2.9815 | 6.3533 | 15000 | 3.0631 |
| 2.9801 | 6.7769 | 16000 | 3.0397 |
| 2.932 | 7.2004 | 17000 | 3.0326 |
| 2.9292 | 7.6240 | 18000 | 3.0192 |
| 2.8848 | 8.0474 | 19000 | 3.0173 |
| 2.8934 | 8.4710 | 20000 | 3.0084 |
| 2.8941 | 8.8946 | 21000 | 2.9943 |
| 2.8572 | 9.3181 | 22000 | 2.9946 |
| 2.8679 | 9.7417 | 23000 | 2.9851 |
| 2.8201 | 10.1652 | 24000 | 2.9837 |
| 2.8429 | 10.5888 | 25000 | 2.9795 |
| 2.8276 | 11.0123 | 26000 | 2.9780 |
| 2.8103 | 11.4359 | 27000 | 2.9710 |
| 2.8168 | 11.8595 | 28000 | 2.9672 |
| 2.788 | 12.2830 | 29000 | 2.9668 |
| 2.7959 | 12.7066 | 30000 | 2.9622 |
| 2.7593 | 13.1300 | 31000 | 2.9603 |
| 2.7712 | 13.5536 | 32000 | 2.9570 |
| 2.7757 | 13.9772 | 33000 | 2.9500 |
| 2.7449 | 14.4007 | 34000 | 2.9553 |
| 2.7557 | 14.8243 | 35000 | 2.9492 |
| 2.7215 | 15.2478 | 36000 | 2.9536 |
| 2.7413 | 15.6714 | 37000 | 2.9473 |
| 2.7039 | 16.0949 | 38000 | 2.9495 |
| 2.7163 | 16.5185 | 39000 | 2.9461 |
| 2.7198 | 16.9421 | 40000 | 2.9426 |
| 2.6968 | 17.3656 | 41000 | 2.9458 |
| 2.7002 | 17.7892 | 42000 | 2.9421 |
| 2.6841 | 18.2126 | 43000 | 2.9447 |
| 2.6839 | 18.6362 | 44000 | 2.9430 |
| 2.6723 | 19.0597 | 45000 | 2.9432 |
| 2.6754 | 19.4833 | 46000 | 2.9427 |
| 2.6792 | 19.9069 | 47000 | 2.9421 |
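For reference, the validation loss is a mean cross-entropy in nats per token, so the final value of 2.9421 corresponds to a perplexity of roughly exp(2.9421) ≈ 18.96:

```python
import math

# Perplexity is the exponential of the mean cross-entropy loss (in nats).
final_val_loss = 2.9421
perplexity = math.exp(final_val_loss)
print(f"{perplexity:.2f}")  # ~18.96
```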
### Framework versions
- Transformers 4.54.0
- Pytorch 2.10.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.4