train_cola_42_1774791067

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the cola dataset (CoLA, the Corpus of Linguistic Acceptability from GLUE). It achieves the following results on the evaluation set:

  • Loss: 0.2517
  • Num Input Tokens Seen: 1932608

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
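How the warmup ratio and the cosine schedule above interact can be sketched in plain Python. The step counts here are inferred from the results table (roughly 964 optimizer steps per epoch, so about 4,820 total over 5 epochs) and are an assumption, not logged values:

```python
import math

# Assumed totals, inferred from the results table below.
BASE_LR = 5e-5
TOTAL_STEPS = 4820                      # ~964 steps/epoch * 5 epochs
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)   # warmup_ratio 0.1 -> 482 steps

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate climbs linearly from 0 to 5e-05 over the first ~10% of training, peaks at the end of warmup, and decays to 0 by the final step.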

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.2682        | 0.2505 | 241  | 0.3847          | 97664             |
| 0.3383        | 0.5010 | 482  | 0.4081          | 194560            |
| 0.2962        | 0.7516 | 723  | 0.2960          | 291712            |
| 0.2807        | 1.0021 | 964  | 0.2739          | 387464            |
| 0.2836        | 1.2526 | 1205 | 0.2581          | 485192            |
| 0.2936        | 1.5031 | 1446 | 0.2570          | 581704            |
| 0.2705        | 1.7536 | 1687 | 0.2560          | 677576            |
| 0.2243        | 2.0042 | 1928 | 0.2575          | 775312            |
| 0.2477        | 2.2547 | 2169 | 0.2924          | 873104            |
| 0.2379        | 2.5052 | 2410 | 0.2577          | 969360            |
| 0.2934        | 2.7557 | 2651 | 0.2561          | 1065232           |
| 0.2209        | 3.0062 | 2892 | 0.2571          | 1162016           |
| 0.2647        | 3.2568 | 3133 | 0.2563          | 1259168           |
| 0.2795        | 3.5073 | 3374 | 0.2642          | 1355552           |
| 0.2751        | 3.7578 | 3615 | 0.2587          | 1453088           |
| 0.2790        | 4.0083 | 3856 | 0.2559          | 1549360           |
| 0.2511        | 4.2588 | 4097 | 0.2517          | 1645808           |
| 0.2709        | 4.5094 | 4338 | 0.2577          | 1742960           |
| 0.2582        | 4.7599 | 4579 | 0.2605          | 1839344           |
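As a quick sanity check, the reported evaluation loss of 0.2517 is the minimum validation loss in the table above (step 4097, around epoch 4.26). A small sketch that confirms this from the (step, validation loss) pairs copied out of the table:

```python
# (step, validation_loss) pairs copied from the training results table.
rows = [
    (241, 0.3847), (482, 0.4081), (723, 0.2960), (964, 0.2739),
    (1205, 0.2581), (1446, 0.2570), (1687, 0.2560), (1928, 0.2575),
    (2169, 0.2924), (2410, 0.2577), (2651, 0.2561), (2892, 0.2571),
    (3133, 0.2563), (3374, 0.2642), (3615, 0.2587), (3856, 0.2559),
    (4097, 0.2517), (4338, 0.2577), (4579, 0.2605),
]

# Pick the checkpoint with the lowest validation loss.
best_step, best_loss = min(rows, key=lambda r: r[1])
print(best_step, best_loss)  # step 4097, loss 0.2517 -- matches the summary above
```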

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4