olmo3-lean-sft

This model is a fine-tuned version of allenai/OLMo-3-1B on the lean_sft dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0655

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
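As a sanity check on the derived values in the list above, the sketch below recomputes the effective batch size (per-device batch × devices × accumulation steps, as the Transformers Trainer does) and the cosine-with-warmup learning-rate curve. The total step count is not stated in the card; it is estimated here from the results table (step 5100 at epoch 0.9887, so roughly 5158 optimizer steps per epoch).

```python
import math

# Values from the hyperparameter list above.
train_batch_size = 2   # per device
num_devices = 8
grad_accum = 2
learning_rate = 5e-5
warmup_ratio = 0.05

# Effective train batch size, as reported in the card: 2 * 8 * 2 = 32.
total_train_batch_size = train_batch_size * num_devices * grad_accum

# Estimated total optimizer steps, inferred from the results table
# (step 5100 at epoch 0.9887); this is an estimate, not a stated value.
total_steps = round(5100 / 0.9887)
warmup_steps = int(warmup_ratio * total_steps)

def lr_at(step):
    """Cosine schedule with linear warmup (the standard formulation
    behind transformers' get_cosine_schedule_with_warmup)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

print(total_train_batch_size)        # 32
print(round(lr_at(warmup_steps), 8)) # peak LR = 5e-05 right after warmup
```

The learning rate ramps linearly to 5e-05 over the first ~5% of steps, then decays along a cosine curve toward zero by the end of the single epoch.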

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 0.2558 | 0.0194 | 100 | 0.2367 |
| 0.1315 | 0.0388 | 200 | 0.1327 |
| 0.11 | 0.0582 | 300 | 0.1104 |
| 0.1012 | 0.0775 | 400 | 0.1001 |
| 0.0926 | 0.0969 | 500 | 0.0948 |
| 0.092 | 0.1163 | 600 | 0.0913 |
| 0.0866 | 0.1357 | 700 | 0.0884 |
| 0.0846 | 0.1551 | 800 | 0.0864 |
| 0.0843 | 0.1745 | 900 | 0.0848 |
| 0.08 | 0.1939 | 1000 | 0.0834 |
| 0.0813 | 0.2132 | 1100 | 0.0823 |
| 0.0801 | 0.2326 | 1200 | 0.0812 |
| 0.0814 | 0.2520 | 1300 | 0.0801 |
| 0.0781 | 0.2714 | 1400 | 0.0792 |
| 0.0774 | 0.2908 | 1500 | 0.0785 |
| 0.0754 | 0.3102 | 1600 | 0.0777 |
| 0.078 | 0.3296 | 1700 | 0.0771 |
| 0.0758 | 0.3489 | 1800 | 0.0763 |
| 0.074 | 0.3683 | 1900 | 0.0758 |
| 0.0731 | 0.3877 | 2000 | 0.0751 |
| 0.0739 | 0.4071 | 2100 | 0.0746 |
| 0.0736 | 0.4265 | 2200 | 0.0740 |
| 0.0739 | 0.4459 | 2300 | 0.0737 |
| 0.0692 | 0.4653 | 2400 | 0.0731 |
| 0.0712 | 0.4846 | 2500 | 0.0726 |
| 0.0714 | 0.5040 | 2600 | 0.0721 |
| 0.0698 | 0.5234 | 2700 | 0.0718 |
| 0.0718 | 0.5428 | 2800 | 0.0712 |
| 0.0691 | 0.5622 | 2900 | 0.0709 |
| 0.0693 | 0.5816 | 3000 | 0.0705 |
| 0.0691 | 0.6009 | 3100 | 0.0700 |
| 0.0707 | 0.6203 | 3200 | 0.0696 |
| 0.0693 | 0.6397 | 3300 | 0.0693 |
| 0.0694 | 0.6591 | 3400 | 0.0689 |
| 0.0687 | 0.6785 | 3500 | 0.0686 |
| 0.0659 | 0.6979 | 3600 | 0.0682 |
| 0.0658 | 0.7173 | 3700 | 0.0678 |
| 0.0677 | 0.7366 | 3800 | 0.0676 |
| 0.0671 | 0.7560 | 3900 | 0.0672 |
| 0.0662 | 0.7754 | 4000 | 0.0670 |
| 0.0675 | 0.7948 | 4100 | 0.0667 |
| 0.0665 | 0.8142 | 4200 | 0.0665 |
| 0.0644 | 0.8336 | 4300 | 0.0662 |
| 0.0643 | 0.8530 | 4400 | 0.0661 |
| 0.0636 | 0.8723 | 4500 | 0.0659 |
| 0.0667 | 0.8917 | 4600 | 0.0658 |
| 0.0644 | 0.9111 | 4700 | 0.0657 |
| 0.0654 | 0.9305 | 4800 | 0.0656 |
| 0.0662 | 0.9499 | 4900 | 0.0655 |
| 0.0651 | 0.9693 | 5000 | 0.0655 |
| 0.0642 | 0.9887 | 5100 | 0.0655 |
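The Epoch and Step columns above also let one back out the approximate size of the training run. The sketch below does that arithmetic; the resulting dataset-size figure is an estimate inferred from the table, not a number stated anywhere in this card.

```python
# Last table row: optimizer step 5100 corresponds to epoch 0.9887,
# so one epoch is roughly 5100 / 0.9887 ~ 5158 optimizer steps.
steps_per_epoch = 5100 / 0.9887

# Each optimizer step consumes an effective batch of 32 sequences
# (see total_train_batch_size in the hyperparameters), implying on
# the order of ~165k training examples -- an estimate only.
approx_examples = steps_per_epoch * 32

print(round(steps_per_epoch))  # 5158
print(round(approx_examples))  # 165065
```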

Framework versions

  • Transformers 4.57.3
  • PyTorch 2.9.0+cu129
  • Datasets 4.0.0
  • Tokenizers 0.22.1