# opt-babylm1_seed-211_5e-6
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.9352
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 32
- eval_batch_size: 64
- seed: 211
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20.0
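A brief sketch (not the training script itself) of how the effective batch size listed above follows from the other hyperparameters: the per-device batch size is multiplied by the number of gradient-accumulation steps before each optimizer update.

```python
# Hyperparameters as listed in this card.
train_batch_size = 32          # per-device batch size
gradient_accumulation_steps = 8

# Gradients are accumulated over 8 micro-batches before each update,
# so the effective (total) training batch size is their product.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 256, matching total_train_batch_size above
```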
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.2657 | 0.4206 | 1000 | 5.2429 |
| 4.7058 | 0.8413 | 2000 | 4.7120 |
| 4.2262 | 1.2616 | 3000 | 4.2310 |
| 3.8735 | 1.6823 | 4000 | 3.8682 |
| 3.5661 | 2.1026 | 5000 | 3.5738 |
| 3.4174 | 2.5233 | 6000 | 3.4080 |
| 3.3118 | 2.9439 | 7000 | 3.3058 |
| 3.2284 | 3.3643 | 8000 | 3.2433 |
| 3.18 | 3.7849 | 9000 | 3.1947 |
| 3.1179 | 4.2053 | 10000 | 3.1561 |
| 3.0937 | 4.6259 | 11000 | 3.1282 |
| 3.0223 | 5.0463 | 12000 | 3.1030 |
| 3.0283 | 5.4669 | 13000 | 3.0828 |
| 3.0162 | 5.8875 | 14000 | 3.0614 |
| 2.9568 | 6.3079 | 15000 | 3.0483 |
| 2.965 | 6.7285 | 16000 | 3.0331 |
| 2.9084 | 7.1489 | 17000 | 3.0258 |
| 2.9168 | 7.5695 | 18000 | 3.0127 |
| 2.9277 | 7.9902 | 19000 | 3.0023 |
| 2.8789 | 8.4105 | 20000 | 2.9999 |
| 2.887 | 8.8312 | 21000 | 2.9910 |
| 2.85 | 9.2515 | 22000 | 2.9881 |
| 2.8541 | 9.6722 | 23000 | 2.9817 |
| 2.8065 | 10.0925 | 24000 | 2.9763 |
| 2.827 | 10.5132 | 25000 | 2.9726 |
| 2.8302 | 10.9338 | 26000 | 2.9610 |
| 2.7886 | 11.3542 | 27000 | 2.9628 |
| 2.8112 | 11.7748 | 28000 | 2.9584 |
| 2.7724 | 12.1952 | 29000 | 2.9601 |
| 2.7749 | 12.6158 | 30000 | 2.9572 |
| 2.7427 | 13.0362 | 31000 | 2.9548 |
| 2.7541 | 13.4568 | 32000 | 2.9505 |
| 2.763 | 13.8774 | 33000 | 2.9443 |
| 2.7283 | 14.2978 | 34000 | 2.9480 |
| 2.7383 | 14.7184 | 35000 | 2.9427 |
| 2.7111 | 15.1388 | 36000 | 2.9461 |
| 2.7177 | 15.5594 | 37000 | 2.9415 |
| 2.7221 | 15.9801 | 38000 | 2.9372 |
| 2.6956 | 16.4004 | 39000 | 2.9407 |
| 2.7097 | 16.8211 | 40000 | 2.9359 |
| 2.6858 | 17.2414 | 41000 | 2.9396 |
| 2.686 | 17.6621 | 42000 | 2.9358 |
| 2.6683 | 18.0824 | 43000 | 2.9378 |
| 2.6763 | 18.5031 | 44000 | 2.9361 |
| 2.6785 | 18.9237 | 45000 | 2.9346 |
| 2.6581 | 19.3441 | 46000 | 2.9354 |
| 2.6601 | 19.7647 | 47000 | 2.9352 |
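For readers who prefer perplexity, the final validation loss in the table converts directly, assuming (as is the Transformers default for causal LMs) that the reported loss is mean cross-entropy in nats:

```python
import math

# Final validation loss from the table above.
final_loss = 2.9352

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(final_loss)
print(round(perplexity, 1))  # ~18.8
```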
### Framework versions
- Transformers 4.54.0
- Pytorch 2.10.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.4