train_mrpc_42_1774791061

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the mrpc dataset. The evaluation-set summary is not reported in this card; the validation loss at each evaluation step is listed in the training results table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
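
To make the lr_scheduler_type and lr_scheduler_warmup_ratio settings concrete, here is a minimal sketch of a linear-warmup, half-cosine-decay schedule using the values listed above. The steps-per-epoch figure is an assumption inferred from the results table (step 104 is logged at epoch 0.2518), not something the card states, and the function name lr_at is hypothetical. The schedule actually used in the run may differ in detail.

```python
import math

# Values taken from the hyperparameter list in this card.
BASE_LR = 5e-05
WARMUP_RATIO = 0.1
NUM_EPOCHS = 5

# Assumption: about 413 optimizer steps per epoch, inferred from the results
# table (step 104 is logged at epoch 0.2518). The card does not state this.
STEPS_PER_EPOCH = 413
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS            # 2065
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 207


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 104, WARMUP_STEPS, 1040, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate is still ramping up at the first logged evaluation (step 104) and reaches its peak of 5e-05 at roughly step 207.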

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1681	0.2518	104	0.1740	89600
0.2715	0.5036	208	0.2312	178688
0.2276	0.7554	312	0.2285	267968
0.5572	1.0073	416	0.2625	357488
0.1881	1.2591	520	0.1977	446896
0.1809	1.5109	624	0.1926	536176
0.1949	1.7627	728	0.1982	626992
0.2560	2.0145	832	0.1935	716344
0.1601	2.2663	936	0.3867	806712
0.1768	2.5182	1040	0.1944	895736
0.1964	2.7700	1144	0.1932	985592
0.1436	3.0218	1248	0.2053	1074624
0.2252	3.2736	1352	0.2092	1164544
0.1328	3.5254	1456	0.3492	1253248
0.1842	3.7772	1560	0.2190	1344000
0.0897	4.0291	1664	0.2532	1432880
0.0337	4.2809	1768	0.4315	1522544
0.1260	4.5327	1872	0.4220	1611760
0.0336	4.7845	1976	0.4348	1702832
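
As a rough way to read the table, the sketch below copies its rows and picks out the evaluation step with the lowest validation loss, then compares training and validation loss at the last logged step. The variable names are illustrative only, and the card does not say which checkpoint was actually kept.

```python
# Rows copied from the table above:
# (training_loss, epoch, step, validation_loss, input_tokens_seen)
ROWS = [
    (0.1681, 0.2518, 104, 0.1740, 89600),
    (0.2715, 0.5036, 208, 0.2312, 178688),
    (0.2276, 0.7554, 312, 0.2285, 267968),
    (0.5572, 1.0073, 416, 0.2625, 357488),
    (0.1881, 1.2591, 520, 0.1977, 446896),
    (0.1809, 1.5109, 624, 0.1926, 536176),
    (0.1949, 1.7627, 728, 0.1982, 626992),
    (0.2560, 2.0145, 832, 0.1935, 716344),
    (0.1601, 2.2663, 936, 0.3867, 806712),
    (0.1768, 2.5182, 1040, 0.1944, 895736),
    (0.1964, 2.7700, 1144, 0.1932, 985592),
    (0.1436, 3.0218, 1248, 0.2053, 1074624),
    (0.2252, 3.2736, 1352, 0.2092, 1164544),
    (0.1328, 3.5254, 1456, 0.3492, 1253248),
    (0.1842, 3.7772, 1560, 0.2190, 1344000),
    (0.0897, 4.0291, 1664, 0.2532, 1432880),
    (0.0337, 4.2809, 1768, 0.4315, 1522544),
    (0.1260, 4.5327, 1872, 0.4220, 1611760),
    (0.0336, 4.7845, 1976, 0.4348, 1702832),
]

# Evaluation step with the lowest validation loss.
best = min(ROWS, key=lambda row: row[3])

# Gap between validation and training loss at the last logged step.
final = ROWS[-1]
final_gap = final[3] - final[0]

print(f"lowest validation loss: {best[3]:.4f} at step {best[2]} (epoch {best[1]})")
print(f"final step {final[2]}: train {final[0]:.4f}, validation {final[3]:.4f}, gap {final_gap:.4f}")
```

In these rows the lowest validation loss occurs at the first evaluation (step 104), while in the fifth epoch the training loss keeps falling and the validation loss rises above 0.42, a pattern usually associated with overfitting.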

Safetensors

Model size: 1B params
Tensor type: F32
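
For a sense of scale, the following back-of-the-envelope sketch estimates the size of the stored weights from the two figures above. It assumes the rounded "1B params" count and covers weights only (no optimizer state or activations); the helper name weight_gigabytes is hypothetical, and the BF16 line is shown purely for comparison.

```python
# Bytes used per parameter for common tensor types.
BYTES_PER_PARAM = {"F32": 4, "BF16": 2, "F16": 2}

# Assumption: the rounded "1B params" figure; the exact count is not given here.
PARAM_COUNT = 1_000_000_000


def weight_gigabytes(param_count: int, tensor_type: str) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return param_count * BYTES_PER_PARAM[tensor_type] / 1e9


f32_gb = weight_gigabytes(PARAM_COUNT, "F32")
bf16_gb = weight_gigabytes(PARAM_COUNT, "BF16")
print(f"F32 weights:  about {f32_gb:.1f} GB")
print(f"BF16 weights: about {bf16_gb:.1f} GB (for comparison only)")
```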
