opt-babylm1-randomremoval_seed-42_5e-6
This model was trained from scratch on an unknown dataset. It achieves the following result on the evaluation set:
- Loss: 2.9395
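For a causal language model in Transformers, the evaluation loss is usually the mean per-token cross-entropy in nats. Under that convention, perplexity is the exponential of the loss. The card does not state the convention, so the snippet below is a minimal sketch that assumes it:

```python
import math

# Final evaluation loss reported on this card.
eval_loss = 2.9395

# Assumption: the loss is the mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

If that assumption holds, the final checkpoint's perplexity on the evaluation set is about 18.9.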
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED (fused PyTorch AdamW) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20.0
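The following minimal sketch shows how these settings interact, assuming the usual Transformers conventions. First, the effective batch size is the per-device batch size times the gradient accumulation steps times the number of devices. Second, the `linear` schedule ramps the learning rate up linearly over `warmup_ratio` of all optimizer steps, then decays it linearly to zero. Some values here are assumptions rather than logged facts:

- The helper names are hypothetical.
- The single-device setup is inferred from 32 × 8 = 256.
- The total step count is an estimate derived from the step/epoch ratio in the results table.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 5e-6
train_batch_size = 32            # per device
gradient_accumulation_steps = 8
warmup_ratio = 0.05
num_epochs = 20


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int) -> int:
    """Number of sequences contributing to each optimizer update."""
    return per_device * accum_steps * num_devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    # Assumption: the warmup length is rounded up to a whole optimizer step.
    return math.ceil(total_steps * ratio)


def linear_lr(step: int, total_steps: int, warmup: int, base_lr: float) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup:
        return base_lr * (step / max(1, warmup))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup))


# 32 * 8 * 1 = 256, matching total_train_batch_size on a single device.
print("effective batch:", effective_batch_size(train_batch_size, gradient_accumulation_steps, 1))

# Estimate only: about 2,361 optimizer steps per epoch (47000 / 19.9069 from
# the results table), times the configured number of epochs.
total_steps = 2361 * num_epochs
warmup = warmup_steps(total_steps, warmup_ratio)
for step in (0, warmup // 2, warmup, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps, warmup, learning_rate))
```

Because 5% of 20 epochs is exactly one epoch, the warmup phase covers roughly the first epoch of training, whatever the precise step count.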
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.2712 | 0.4236 | 1000 | 5.2297 |
| 4.7125 | 0.8472 | 2000 | 4.7089 |
| 4.2105 | 1.2707 | 3000 | 4.2034 |
| 3.8531 | 1.6943 | 4000 | 3.8440 |
| 3.5679 | 2.1178 | 5000 | 3.5593 |
| 3.408 | 2.5414 | 6000 | 3.4013 |
| 3.3092 | 2.9649 | 7000 | 3.3081 |
| 3.2229 | 3.3884 | 8000 | 3.2421 |
| 3.1825 | 3.8120 | 9000 | 3.1910 |
| 3.1175 | 4.2355 | 10000 | 3.1605 |
| 3.0947 | 4.6591 | 11000 | 3.1276 |
| 3.024 | 5.0826 | 12000 | 3.1013 |
| 3.0297 | 5.5062 | 13000 | 3.0816 |
| 3.0068 | 5.9298 | 14000 | 3.0617 |
| 2.9613 | 6.3533 | 15000 | 3.0500 |
| 2.9614 | 6.7769 | 16000 | 3.0341 |
| 2.9233 | 7.2004 | 17000 | 3.0316 |
| 2.9264 | 7.6240 | 18000 | 3.0157 |
| 2.8709 | 8.0474 | 19000 | 3.0090 |
| 2.8893 | 8.4710 | 20000 | 3.0024 |
| 2.8951 | 8.8946 | 21000 | 2.9894 |
| 2.8582 | 9.3181 | 22000 | 2.9951 |
| 2.8656 | 9.7417 | 23000 | 2.9820 |
| 2.8161 | 10.1652 | 24000 | 2.9869 |
| 2.8393 | 10.5888 | 25000 | 2.9735 |
| 2.8198 | 11.0123 | 26000 | 2.9739 |
| 2.8067 | 11.4359 | 27000 | 2.9667 |
| 2.8062 | 11.8595 | 28000 | 2.9608 |
| 2.7781 | 12.2830 | 29000 | 2.9631 |
| 2.7868 | 12.7066 | 30000 | 2.9567 |
| 2.7511 | 13.1300 | 31000 | 2.9622 |
| 2.7625 | 13.5536 | 32000 | 2.9537 |
| 2.7698 | 13.9772 | 33000 | 2.9473 |
| 2.7375 | 14.4007 | 34000 | 2.9522 |
| 2.7515 | 14.8243 | 35000 | 2.9454 |
| 2.7198 | 15.2478 | 36000 | 2.9490 |
| 2.7254 | 15.6714 | 37000 | 2.9462 |
| 2.7019 | 16.0949 | 38000 | 2.9481 |
| 2.7064 | 16.5185 | 39000 | 2.9441 |
| 2.7156 | 16.9421 | 40000 | 2.9404 |
| 2.6925 | 17.3656 | 41000 | 2.9435 |
| 2.6967 | 17.7892 | 42000 | 2.9396 |
| 2.6798 | 18.2126 | 43000 | 2.9414 |
| 2.6831 | 18.6362 | 44000 | 2.9400 |
| 2.6751 | 19.0597 | 45000 | 2.9399 |
| 2.6752 | 19.4833 | 46000 | 2.9398 |
| 2.6779 | 19.9069 | 47000 | 2.9395 |
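In the tail of the table, validation loss appears to have largely plateaued near 2.94 while training loss keeps decreasing slowly, so the gap between them widens. The sketch below uses the last ten rows, copied verbatim from the table; the function names are illustrative. It picks the checkpoint with the lowest validation loss and compares the train/validation gap at the start and end of that window:

```python
# (step, training loss, validation loss): the last ten rows of the table above.
rows = [
    (38000, 2.7019, 2.9481),
    (39000, 2.7064, 2.9441),
    (40000, 2.7156, 2.9404),
    (41000, 2.6925, 2.9435),
    (42000, 2.6967, 2.9396),
    (43000, 2.6798, 2.9414),
    (44000, 2.6831, 2.9400),
    (45000, 2.6751, 2.9399),
    (46000, 2.6752, 2.9398),
    (47000, 2.6779, 2.9395),
]


def best_checkpoint(rows):
    """Return the (step, train_loss, val_loss) row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss for one evaluation row."""
    _, train_loss, val_loss = row
    return val_loss - train_loss


best = best_checkpoint(rows)
print("best step:", best[0], "validation loss:", best[2])
print("gap at first row: %.4f, at last row: %.4f"
      % (generalization_gap(rows[0]), generalization_gap(rows[-1])))
```

In this window, the lowest validation loss is at the final logged step (47000), which matches the reported evaluation loss of 2.9395.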
Framework versions
- Transformers 4.54.0
- PyTorch 2.10.0+cu128
- Datasets 3.2.0
- Tokenizers 0.21.4