train_cola_42_1774791066

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
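The learning rate follows a linear warmup over the first 10% of training and then a cosine decay. The sketch below reproduces that shape in plain Python, modeled on `transformers.get_cosine_schedule_with_warmup` with its default half cycle. Two things here are assumptions rather than values stated in this card. First, the total step count: about 962 optimizer steps per epoch is inferred from the step and epoch columns of the training-results table (for example, 964 steps at epoch 1.0021), assuming a single device and no gradient accumulation. Second, rounding the warmup up with `ceil`. The names `lr_at`, `TOTAL_STEPS`, and `WARMUP_STEPS` are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1
NUM_EPOCHS = 5

# Assumption: ~962 optimizer steps per epoch, inferred from the
# step/epoch columns of the training-results table (not stated in the card).
STEPS_PER_EPOCH = 962

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH            # 4810
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 481 (assumed rounding)


def lr_at(step: int,
          base_lr: float = LEARNING_RATE,
          warmup_steps: int = WARMUP_STEPS,
          total_steps: int = TOTAL_STEPS) -> float:
    """Hypothetical helper: learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr during warmup, then follows a
    half-cosine from base_lr down to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Print the schedule at a few steps that appear in the results table.
    for step in (0, 241, WARMUP_STEPS, 2892, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

With these assumptions, the peak rate of 5e-05 is reached around step 481. It then decays toward zero by the end of epoch 5.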

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.257	0.2505	241	0.2653	97664
0.2465	0.5010	482	0.2394	194560
0.2246	0.7516	723	0.2675	291712
0.2126	1.0021	964	0.2090	387464
0.2112	1.2526	1205	0.2047	485192
0.2259	1.5031	1446	0.2002	581704
0.0688	1.7536	1687	0.1908	677576
0.1317	2.0042	1928	0.2061	775312
0.1631	2.2547	2169	0.1976	873104
0.1441	2.5052	2410	0.1862	969360
0.0995	2.7557	2651	0.1822	1065232
0.1275	3.0062	2892	0.1780	1162016
0.2642	3.2568	3133	0.1875	1259168
0.1987	3.5073	3374	0.1781	1355552
0.2233	3.7578	3615	0.1860	1453088
0.1377	4.0083	3856	0.1792	1549360
0.0611	4.2588	4097	0.1857	1645808
0.1136	4.5094	4338	0.1867	1742960
0.1672	4.7599	4579	0.1868	1839344
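The card does not say which checkpoint was published. For example, it does not say whether the best checkpoint was reloaded at the end of training. The sketch below scans the rows of the table above to find the lowest validation loss and to estimate the number of optimizer steps per epoch. The rows are copied verbatim from the table. The helper names `best_checkpoint` and `steps_per_epoch` are hypothetical.

```python
# Rows copied from the training-results table above:
# (training_loss, epoch, step, validation_loss, input_tokens_seen)
ROWS = [
    (0.257, 0.2505, 241, 0.2653, 97664),
    (0.2465, 0.5010, 482, 0.2394, 194560),
    (0.2246, 0.7516, 723, 0.2675, 291712),
    (0.2126, 1.0021, 964, 0.2090, 387464),
    (0.2112, 1.2526, 1205, 0.2047, 485192),
    (0.2259, 1.5031, 1446, 0.2002, 581704),
    (0.0688, 1.7536, 1687, 0.1908, 677576),
    (0.1317, 2.0042, 1928, 0.2061, 775312),
    (0.1631, 2.2547, 2169, 0.1976, 873104),
    (0.1441, 2.5052, 2410, 0.1862, 969360),
    (0.0995, 2.7557, 2651, 0.1822, 1065232),
    (0.1275, 3.0062, 2892, 0.1780, 1162016),
    (0.2642, 3.2568, 3133, 0.1875, 1259168),
    (0.1987, 3.5073, 3374, 0.1781, 1355552),
    (0.2233, 3.7578, 3615, 0.1860, 1453088),
    (0.1377, 4.0083, 3856, 0.1792, 1549360),
    (0.0611, 4.2588, 4097, 0.1857, 1645808),
    (0.1136, 4.5094, 4338, 0.1867, 1742960),
    (0.1672, 4.7599, 4579, 0.1868, 1839344),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss (column 3)."""
    return min(rows, key=lambda row: row[3])


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch by averaging step / epoch."""
    estimates = [step / epoch for _, epoch, step, _, _ in rows]
    return round(sum(estimates) / len(estimates))


if __name__ == "__main__":
    _, epoch, step, val_loss, tokens = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch}), "
          f"{tokens} input tokens seen")
    print("estimated steps per epoch:", steps_per_epoch(ROWS))
```

On these rows, the lowest validation loss (0.1780) occurs at step 2892, around epoch 3. Later evaluations stay between 0.1781 and 0.1875.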
