train_rte_42_1774791065

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the rte dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1407
  • Num Input Tokens Seen: 2035272

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
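
The learning-rate schedule above combines 10% linear warmup with cosine decay. A minimal sketch of that shape, assuming the standard linear-warmup-then-cosine-decay-to-zero behavior of `transformers`' `get_cosine_schedule_with_warmup` (the `total_steps` value here is illustrative, not taken from this run):

```python
import math

def cosine_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # start of warmup
print(cosine_lr(100, 1000))   # end of warmup: peak learning rate
print(cosine_lr(1000, 1000))  # end of training: decayed to ~0
```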

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.2508        | 0.2527 | 71   | 0.1407          | 105024            |
| 0.1769        | 0.5053 | 142  | 0.1558          | 209536            |
| 0.1924        | 0.7580 | 213  | 0.1600          | 312576            |
| 0.1956        | 1.0107 | 284  | 0.1684          | 414040            |
| 0.1589        | 1.2633 | 355  | 0.1601          | 517656            |
| 0.1947        | 1.5160 | 426  | 0.1815          | 624344            |
| 0.1825        | 1.7687 | 497  | 0.1647          | 725656            |
| 0.1568        | 2.0214 | 568  | 0.1555          | 821416            |
| 0.1597        | 2.2740 | 639  | 0.1567          | 926760            |
| 0.1431        | 2.5267 | 710  | 0.1639          | 1025320           |
| 0.1986        | 2.7794 | 781  | 0.1541          | 1128104           |
| 0.1370        | 3.0320 | 852  | 0.1852          | 1229440           |
| 0.1422        | 3.2847 | 923  | 0.1646          | 1332544           |
| 0.0911        | 3.5374 | 994  | 0.1804          | 1438336           |
| 0.1203        | 3.7900 | 1065 | 0.1771          | 1539072           |
| 0.0551        | 4.0427 | 1136 | 0.1983          | 1642696           |
| 0.0577        | 4.2954 | 1207 | 0.3402          | 1743624           |
| 0.0319        | 4.5480 | 1278 | 0.3532          | 1849416           |
| 0.0846        | 4.8007 | 1349 | 0.3423          | 1954568           |
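
The reported evaluation loss of 0.1407 is the minimum validation loss in the table, reached at the first logged checkpoint (step 71); validation loss trends upward afterwards, which suggests the best checkpoint was saved early in training. A quick check over the table data:

```python
# (step, validation loss) pairs copied from the training results table
val_losses = [
    (71, 0.1407), (142, 0.1558), (213, 0.1600), (284, 0.1684),
    (355, 0.1601), (426, 0.1815), (497, 0.1647), (568, 0.1555),
    (639, 0.1567), (710, 0.1639), (781, 0.1541), (852, 0.1852),
    (923, 0.1646), (994, 0.1804), (1065, 0.1771), (1136, 0.1983),
    (1207, 0.3402), (1278, 0.3532), (1349, 0.3423),
]

# Find the checkpoint with the lowest validation loss
best_step, best_loss = min(val_losses, key=lambda pair: pair[1])
print(best_step, best_loss)  # → 71 0.1407
```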

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4