train_mrpc_42_1774791061

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the mrpc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1740
  • Num Input Tokens Seen: 1780000

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
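The list above maps closely onto the `transformers` `TrainingArguments` API. As a rough, untested sketch of how these settings might be reproduced (argument names are from the Transformers 4.51 API; the output directory is a placeholder, and the model/data wiring is omitted):

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
# "train_mrpc_42" is a placeholder output directory, not the original run's path.
args = TrainingArguments(
    output_dir="train_mrpc_42",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",        # OptimizerNames.ADAMW_TORCH
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
```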

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
0.1681          0.2518    104   0.1740              89600
0.2715          0.5036    208   0.2312             178688
0.2276          0.7554    312   0.2285             267968
0.5572          1.0073    416   0.2625             357488
0.1881          1.2591    520   0.1977             446896
0.1809          1.5109    624   0.1926             536176
0.1949          1.7627    728   0.1982             626992
0.2560          2.0145    832   0.1935             716344
0.1601          2.2663    936   0.3867             806712
0.1768          2.5182   1040   0.1944             895736
0.1964          2.7700   1144   0.1932             985592
0.1436          3.0218   1248   0.2053            1074624
0.2252          3.2736   1352   0.2092            1164544
0.1328          3.5254   1456   0.3492            1253248
0.1842          3.7772   1560   0.2190            1344000
0.0897          4.0291   1664   0.2532            1432880
0.0337          4.2809   1768   0.4315            1522544
0.1260          4.5327   1872   0.4220            1611760
0.0336          4.7845   1976   0.4348            1702832
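A quick sanity check on the table: dividing "Input Tokens Seen" by steps × train_batch_size gives the average tokenized length of a training example, which stays near 108 tokens from the first logged checkpoint to the last (the two pairs below are read directly from the first and final rows):

```python
# Derived only from the table above: average input tokens per training example
# = tokens_seen / (steps * train_batch_size), with train_batch_size = 8.
checkpoints = [(104, 89600), (1976, 1702832)]  # (step, input tokens seen)
batch_size = 8
for step, tokens in checkpoints:
    avg = tokens / (step * batch_size)
    print(f"step {step}: ~{avg:.1f} tokens per example")
```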

Framework versions

  • Transformers 4.51.3
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4