opt-babylm1-ntb_seed-42_5e-6

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9430

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 20.0
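The quantities implied by the hyperparameters above can be sanity-checked with plain arithmetic. This is an illustrative sketch, not code from the training script; the total step count is inferred from the results table below (step 46000 falls at epoch ~19.70):

```python
# Sanity-check derived training quantities. All variable names here are
# illustrative assumptions, not taken from the actual training script.

train_batch_size = 32            # per-device train batch size
gradient_accumulation_steps = 8
num_epochs = 20

# Effective train batch size = per-device batch * accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 256   # matches the reported value

# The results table logs step 46000 at epoch ~19.7004, so one epoch is
# roughly 46000 / 19.7004 ~= 2335 optimizer steps.
steps_per_epoch = round(46000 / 19.7004)
total_steps = steps_per_epoch * num_epochs   # ~46,700 steps over 20 epochs

# lr_scheduler_warmup_ratio = 0.05 means warmup spans ~5% of training.
warmup_steps = int(0.05 * total_steps)
print(total_train_batch_size, steps_per_epoch, warmup_steps)
```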

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|---------------|---------|-------|-----------------|
| 5.2628        | 0.4284  | 1000  | 5.2240          |
| 4.6939        | 0.8567  | 2000  | 4.7039          |
| 4.2394        | 1.2849  | 3000  | 4.2252          |
| 3.8775        | 1.7132  | 4000  | 3.8713          |
| 3.5715        | 2.1414  | 5000  | 3.5832          |
| 3.4254        | 2.5697  | 6000  | 3.4170          |
| 3.3181        | 2.9981  | 7000  | 3.3120          |
| 3.2325        | 3.4262  | 8000  | 3.2545          |
| 3.178         | 3.8546  | 9000  | 3.1972          |
| 3.1138        | 4.2827  | 10000 | 3.1649          |
| 3.0877        | 4.7111  | 11000 | 3.1289          |
| 3.029         | 5.1392  | 12000 | 3.1055          |
| 3.0266        | 5.5676  | 13000 | 3.0898          |
| 3.0196        | 5.9959  | 14000 | 3.0696          |
| 2.9691        | 6.4241  | 15000 | 3.0585          |
| 2.9671        | 6.8524  | 16000 | 3.0440          |
| 2.9281        | 7.2806  | 17000 | 3.0326          |
| 2.9242        | 7.7089  | 18000 | 3.0206          |
| 2.8772        | 8.1371  | 19000 | 3.0156          |
| 2.8874        | 8.5654  | 20000 | 3.0068          |
| 2.8931        | 8.9938  | 21000 | 2.9946          |
| 2.8553        | 9.4219  | 22000 | 2.9949          |
| 2.8518        | 9.8503  | 23000 | 2.9862          |
| 2.8246        | 10.2784 | 24000 | 2.9834          |
| 2.8336        | 10.7068 | 25000 | 2.9747          |
| 2.7922        | 11.1349 | 26000 | 2.9783          |
| 2.8071        | 11.5633 | 27000 | 2.9705          |
| 2.8117        | 11.9916 | 28000 | 2.9613          |
| 2.7766        | 12.4198 | 29000 | 2.9631          |
| 2.7804        | 12.8481 | 30000 | 2.9598          |
| 2.7576        | 13.2763 | 31000 | 2.9605          |
| 2.7614        | 13.7046 | 32000 | 2.9543          |
| 2.7294        | 14.1328 | 33000 | 2.9561          |
| 2.7397        | 14.5611 | 34000 | 2.9536          |
| 2.7472        | 14.9895 | 35000 | 2.9491          |
| 2.7139        | 15.4176 | 36000 | 2.9510          |
| 2.7276        | 15.8460 | 37000 | 2.9466          |
| 2.7032        | 16.2741 | 38000 | 2.9491          |
| 2.714         | 16.7025 | 39000 | 2.9455          |
| 2.6882        | 17.1306 | 40000 | 2.9476          |
| 2.693         | 17.5590 | 41000 | 2.9455          |
| 2.6907        | 17.9874 | 42000 | 2.9425          |
| 2.6765        | 18.4155 | 43000 | 2.9444          |
| 2.681         | 18.8439 | 44000 | 2.9423          |
| 2.6668        | 19.2720 | 45000 | 2.9433          |
| 2.6672        | 19.7004 | 46000 | 2.9430          |
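Since this is a causal language model trained with cross-entropy loss, the final validation loss maps directly to perplexity (exp of the loss in nats). A quick check puts the final eval perplexity just under 19:

```python
import math

# Perplexity of a causal LM is exp(cross-entropy loss in nats).
final_eval_loss = 2.9430  # final validation loss from the table above
perplexity = math.exp(final_eval_loss)
print(f"eval perplexity = {perplexity:.2f}")  # ~18.97
```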

Framework versions

  • Transformers 4.54.0
  • PyTorch 2.10.0+cu128
  • Datasets 3.2.0
  • Tokenizers 0.21.4
Model details

  • Model size: 97.8M params (Safetensors)
  • Tensor type: F16