# opt-babylm1_seed-42_5e-6
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.9386
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20.0
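The relationships among these settings can be sketched in plain Python: the effective (total) train batch size is the per-device batch size times the gradient accumulation steps, and the `linear` scheduler warms the learning rate up over the first `warmup_ratio` fraction of steps, then decays it linearly to zero. This is a minimal sketch of the schedule shape, not the `transformers` implementation; `total_steps` is a free parameter here, since the card does not state it directly.

```python
# Hyperparameters taken from the card above.
learning_rate = 5e-6
train_batch_size = 32
gradient_accumulation_steps = 8
warmup_ratio = 0.05

# Effective train batch size: per-device batch * accumulation steps.
effective_batch = train_batch_size * gradient_accumulation_steps  # 256, matching the card

def linear_lr(step, total_steps, peak=learning_rate, warmup_ratio=warmup_ratio):
    """Linear warmup to `peak` over warmup_ratio * total_steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    return peak * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

For example, with `total_steps=10000` the warmup spans the first 500 steps, so `linear_lr(250, 10000)` is half the peak rate and `linear_lr(10000, 10000)` is zero.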
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.3454 | 0.4206 | 1000 | 5.2974 |
| 4.7218 | 0.8413 | 2000 | 4.7145 |
| 4.2551 | 1.2616 | 3000 | 4.2445 |
| 3.8988 | 1.6823 | 4000 | 3.8906 |
| 3.5861 | 2.1026 | 5000 | 3.5887 |
| 3.4352 | 2.5233 | 6000 | 3.4222 |
| 3.3183 | 2.9439 | 7000 | 3.3194 |
| 3.2419 | 3.3643 | 8000 | 3.2551 |
| 3.1900 | 3.7849 | 9000 | 3.2046 |
| 3.1198 | 4.2053 | 10000 | 3.1657 |
| 3.1022 | 4.6259 | 11000 | 3.1361 |
| 3.0287 | 5.0463 | 12000 | 3.1071 |
| 3.0287 | 5.4669 | 13000 | 3.0858 |
| 3.0104 | 5.8875 | 14000 | 3.0664 |
| 2.9693 | 6.3079 | 15000 | 3.0543 |
| 2.9655 | 6.7285 | 16000 | 3.0415 |
| 2.9171 | 7.1489 | 17000 | 3.0311 |
| 2.9326 | 7.5695 | 18000 | 3.0213 |
| 2.9255 | 7.9902 | 19000 | 3.0083 |
| 2.8958 | 8.4105 | 20000 | 3.0031 |
| 2.8911 | 8.8312 | 21000 | 2.9939 |
| 2.8555 | 9.2515 | 22000 | 2.9919 |
| 2.8593 | 9.6722 | 23000 | 2.9841 |
| 2.8150 | 10.0925 | 24000 | 2.9802 |
| 2.8263 | 10.5132 | 25000 | 2.9754 |
| 2.8338 | 10.9338 | 26000 | 2.9659 |
| 2.8013 | 11.3542 | 27000 | 2.9701 |
| 2.8076 | 11.7748 | 28000 | 2.9636 |
| 2.7792 | 12.1952 | 29000 | 2.9647 |
| 2.7825 | 12.6158 | 30000 | 2.9587 |
| 2.7472 | 13.0362 | 31000 | 2.9578 |
| 2.7633 | 13.4568 | 32000 | 2.9546 |
| 2.7613 | 13.8774 | 33000 | 2.9482 |
| 2.7369 | 14.2978 | 34000 | 2.9511 |
| 2.7461 | 14.7184 | 35000 | 2.9460 |
| 2.7200 | 15.1388 | 36000 | 2.9488 |
| 2.7303 | 15.5594 | 37000 | 2.9444 |
| 2.7309 | 15.9801 | 38000 | 2.9412 |
| 2.7003 | 16.4004 | 39000 | 2.9435 |
| 2.7130 | 16.8211 | 40000 | 2.9393 |
| 2.6812 | 17.2414 | 41000 | 2.9434 |
| 2.6914 | 17.6621 | 42000 | 2.9400 |
| 2.6731 | 18.0824 | 43000 | 2.9412 |
| 2.6814 | 18.5031 | 44000 | 2.9400 |
| 2.6883 | 18.9237 | 45000 | 2.9379 |
| 2.6706 | 19.3441 | 46000 | 2.9386 |
| 2.6699 | 19.7647 | 47000 | 2.9386 |
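Assuming the reported loss is mean per-token cross-entropy in nats (the `transformers` default for causal language models), the final validation loss of 2.9386 corresponds to a validation perplexity of roughly 18.9:

```python
import math

final_val_loss = 2.9386  # final validation loss from the table above

# Perplexity = exp(mean cross-entropy in nats), assuming per-token
# cross-entropy loss as reported by the Trainer for causal LMs.
perplexity = math.exp(final_val_loss)
print(f"{perplexity:.2f}")  # ≈ 18.89
```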
### Framework versions
- Transformers 4.54.0
- Pytorch 2.10.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.4