opt-babylm1-randomremoval_seed-42_5e-6

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9395
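
Since the evaluation loss is a mean token-level cross-entropy (in nats), it converts to perplexity via exp(loss). A quick sanity check of what the final value above implies:

```python
import math

eval_loss = 2.9395  # final validation loss reported above
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # perplexity ≈ 18.9
```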

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 20.0
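
As a sanity check on the values above, the total train batch size is the per-device batch size times the gradient-accumulation steps (a minimal sketch; single-device training is assumed):

```python
train_batch_size = 32
gradient_accumulation_steps = 8

# Effective batch size per optimizer step
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 256, matching the value listed above
```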

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 5.2712        | 0.4236  | 1000  | 5.2297          |
| 4.7125        | 0.8472  | 2000  | 4.7089          |
| 4.2105        | 1.2707  | 3000  | 4.2034          |
| 3.8531        | 1.6943  | 4000  | 3.8440          |
| 3.5679        | 2.1178  | 5000  | 3.5593          |
| 3.408         | 2.5414  | 6000  | 3.4013          |
| 3.3092        | 2.9649  | 7000  | 3.3081          |
| 3.2229        | 3.3884  | 8000  | 3.2421          |
| 3.1825        | 3.8120  | 9000  | 3.1910          |
| 3.1175        | 4.2355  | 10000 | 3.1605          |
| 3.0947        | 4.6591  | 11000 | 3.1276          |
| 3.024         | 5.0826  | 12000 | 3.1013          |
| 3.0297        | 5.5062  | 13000 | 3.0816          |
| 3.0068        | 5.9298  | 14000 | 3.0617          |
| 2.9613        | 6.3533  | 15000 | 3.0500          |
| 2.9614        | 6.7769  | 16000 | 3.0341          |
| 2.9233        | 7.2004  | 17000 | 3.0316          |
| 2.9264        | 7.6240  | 18000 | 3.0157          |
| 2.8709        | 8.0474  | 19000 | 3.0090          |
| 2.8893        | 8.4710  | 20000 | 3.0024          |
| 2.8951        | 8.8946  | 21000 | 2.9894          |
| 2.8582        | 9.3181  | 22000 | 2.9951          |
| 2.8656        | 9.7417  | 23000 | 2.9820          |
| 2.8161        | 10.1652 | 24000 | 2.9869          |
| 2.8393        | 10.5888 | 25000 | 2.9735          |
| 2.8198        | 11.0123 | 26000 | 2.9739          |
| 2.8067        | 11.4359 | 27000 | 2.9667          |
| 2.8062        | 11.8595 | 28000 | 2.9608          |
| 2.7781        | 12.2830 | 29000 | 2.9631          |
| 2.7868        | 12.7066 | 30000 | 2.9567          |
| 2.7511        | 13.1300 | 31000 | 2.9622          |
| 2.7625        | 13.5536 | 32000 | 2.9537          |
| 2.7698        | 13.9772 | 33000 | 2.9473          |
| 2.7375        | 14.4007 | 34000 | 2.9522          |
| 2.7515        | 14.8243 | 35000 | 2.9454          |
| 2.7198        | 15.2478 | 36000 | 2.9490          |
| 2.7254        | 15.6714 | 37000 | 2.9462          |
| 2.7019        | 16.0949 | 38000 | 2.9481          |
| 2.7064        | 16.5185 | 39000 | 2.9441          |
| 2.7156        | 16.9421 | 40000 | 2.9404          |
| 2.6925        | 17.3656 | 41000 | 2.9435          |
| 2.6967        | 17.7892 | 42000 | 2.9396          |
| 2.6798        | 18.2126 | 43000 | 2.9414          |
| 2.6831        | 18.6362 | 44000 | 2.9400          |
| 2.6751        | 19.0597 | 45000 | 2.9399          |
| 2.6752        | 19.4833 | 46000 | 2.9398          |
| 2.6779        | 19.9069 | 47000 | 2.9395          |
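
The linear scheduler with lr_scheduler_warmup_ratio 0.05 configured above ramps the learning rate from 0 up to the peak of 5e-06 over the first 5% of optimizer steps, then decays it linearly to 0. A minimal sketch of that schedule (the step counts are illustrative, not taken from the run):

```python
def linear_lr(step, total_steps, peak_lr=5e-6, warmup_ratio=0.05):
    """Linear warmup to peak_lr over warmup_ratio of training, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total = 47000  # illustrative total step count
print(linear_lr(int(total * 0.05), total))  # peak lr at the end of warmup: 5e-06
print(linear_lr(total, total))              # decayed to 0.0 at the final step
```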

Framework versions

  • Transformers 4.54.0
  • PyTorch 2.10.0+cu128
  • Datasets 3.2.0
  • Tokenizers 0.21.4

Model size

  • 97.8M parameters
  • Tensor type: F16 (Safetensors)