# opt-babylm1_seed-42_1e-6
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.9858
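Assuming the reported evaluation loss is the mean per-token cross-entropy in nats (the Trainer default for causal language modeling), the corresponding validation perplexity can be computed directly:

```python
import math

# Final evaluation loss reported above (mean per-token cross-entropy, in nats)
eval_loss = 2.9858

# Perplexity is the exponential of the cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"Validation perplexity: {perplexity:.2f}")  # ~19.80
```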
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20.0
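As a sanity check, the effective batch size and the approximate linear-warmup length implied by these settings can be derived. The steps-per-epoch figure here is estimated from the training table, where step 1000 falls at epoch 0.4206:

```python
# Effective batch size: per-device batch x gradient accumulation steps
train_batch_size = 32
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 256, matching the value reported above

# Approximate schedule length, inferred from the training table:
# step 1000 corresponds to epoch 0.4206, so one epoch is ~2378 optimizer steps.
steps_per_epoch = round(1000 / 0.4206)
num_epochs = 20
total_steps = steps_per_epoch * num_epochs        # ~47,560 steps over 20 epochs
warmup_steps = round(0.05 * total_steps)          # warmup_ratio 0.05 -> ~2,378 steps
print(steps_per_epoch, total_steps, warmup_steps)
```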
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.964 | 0.4206 | 1000 | 5.8746 |
| 4.9145 | 0.8413 | 2000 | 4.8984 |
| 4.5344 | 1.2616 | 3000 | 4.5433 |
| 4.2322 | 1.6823 | 4000 | 4.2203 |
| 3.9694 | 2.1026 | 5000 | 3.9752 |
| 3.7958 | 2.5233 | 6000 | 3.7851 |
| 3.6053 | 2.9439 | 7000 | 3.5977 |
| 3.4888 | 3.3643 | 8000 | 3.4823 |
| 3.404 | 3.7849 | 9000 | 3.3924 |
| 3.3238 | 4.2053 | 10000 | 3.3380 |
| 3.2816 | 4.6259 | 11000 | 3.2862 |
| 3.2171 | 5.0463 | 12000 | 3.2522 |
| 3.1938 | 5.4669 | 13000 | 3.2163 |
| 3.1599 | 5.8875 | 14000 | 3.1851 |
| 3.124 | 6.3079 | 15000 | 3.1696 |
| 3.1094 | 6.7285 | 16000 | 3.1485 |
| 3.0677 | 7.1489 | 17000 | 3.1319 |
| 3.0715 | 7.5695 | 18000 | 3.1178 |
| 3.0578 | 7.9902 | 19000 | 3.1009 |
| 3.0319 | 8.4105 | 20000 | 3.0907 |
| 3.0204 | 8.8312 | 21000 | 3.0804 |
| 2.9903 | 9.2515 | 22000 | 3.0694 |
| 2.9874 | 9.6722 | 23000 | 3.0618 |
| 2.9539 | 10.0925 | 24000 | 3.0564 |
| 2.9538 | 10.5132 | 25000 | 3.0468 |
| 2.9552 | 10.9338 | 26000 | 3.0397 |
| 2.9319 | 11.3542 | 27000 | 3.0366 |
| 2.9305 | 11.7748 | 28000 | 3.0280 |
| 2.9145 | 12.1952 | 29000 | 3.0254 |
| 2.9091 | 12.6158 | 30000 | 3.0211 |
| 2.8855 | 13.0362 | 31000 | 3.0164 |
| 2.8941 | 13.4568 | 32000 | 3.0127 |
| 2.886 | 13.8774 | 33000 | 3.0080 |
| 2.8712 | 14.2978 | 34000 | 3.0073 |
| 2.8764 | 14.7184 | 35000 | 3.0029 |
| 2.8622 | 15.1388 | 36000 | 3.0007 |
| 2.865 | 15.5594 | 37000 | 2.9975 |
| 2.862 | 15.9801 | 38000 | 2.9947 |
| 2.8394 | 16.4004 | 39000 | 2.9937 |
| 2.85 | 16.8211 | 40000 | 2.9913 |
| 2.8272 | 17.2414 | 41000 | 2.9909 |
| 2.8336 | 17.6621 | 42000 | 2.9888 |
| 2.8245 | 18.0824 | 43000 | 2.9882 |
| 2.8293 | 18.5031 | 44000 | 2.9874 |
| 2.8366 | 18.9237 | 45000 | 2.9864 |
| 2.8243 | 19.3441 | 46000 | 2.9861 |
| 2.8229 | 19.7647 | 47000 | 2.9858 |
### Framework versions
- Transformers 4.54.0
- PyTorch 2.10.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.4