train_rte_42_1774791065

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the rte dataset. Validation loss on the evaluation set, recorded at regular intervals during training, is tabulated below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
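
The cosine schedule with a 0.1 warmup ratio can be sketched in plain Python as follows. This is a minimal sketch, not the training code itself: it assumes linear warmup followed by a half-cosine decay to zero, and it assumes 1405 total optimizer steps, a figure inferred from the results table (step 71 at epoch 0.2527 implies roughly 281 steps per epoch, times 5 epochs). The names `lr_at`, `TOTAL_STEPS`, and `WARMUP_STEPS` are illustrative.

```python
import math

PEAK_LR = 5e-05      # learning_rate from the list above
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 1405   # assumption: about 281 steps per epoch x 5 epochs
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 141 under that assumption


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to zero.
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 71, WARMUP_STEPS, 710, TOTAL_STEPS):
        print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate is still ramping up at the first evaluation (step 71) and reaches its peak of 5e-05 around step 141.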

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2508	0.2527	71	0.1407	105024
0.1769	0.5053	142	0.1558	209536
0.1924	0.7580	213	0.1600	312576
0.1956	1.0107	284	0.1684	414040
0.1589	1.2633	355	0.1601	517656
0.1947	1.5160	426	0.1815	624344
0.1825	1.7687	497	0.1647	725656
0.1568	2.0214	568	0.1555	821416
0.1597	2.2740	639	0.1567	926760
0.1431	2.5267	710	0.1639	1025320
0.1986	2.7794	781	0.1541	1128104
0.137	3.0320	852	0.1852	1229440
0.1422	3.2847	923	0.1646	1332544
0.0911	3.5374	994	0.1804	1438336
0.1203	3.7900	1065	0.1771	1539072
0.0551	4.0427	1136	0.1983	1642696
0.0577	4.2954	1207	0.3402	1743624
0.0319	4.5480	1278	0.3532	1849416
0.0846	4.8007	1349	0.3423	1954568
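
One way to read the table is to find the evaluation step with the lowest validation loss and compare it with the end of training. The sketch below does this in plain Python on the rows above. The `Row` type and the variable names are illustrative, and selecting a checkpoint by validation loss is an assumption about how these numbers might be used, not something the card specifies.

```python
from typing import NamedTuple


class Row(NamedTuple):
    step: int
    train_loss: float
    val_loss: float


# Values copied from the table above (step, training loss, validation loss).
ROWS = [
    Row(71, 0.2508, 0.1407), Row(142, 0.1769, 0.1558),
    Row(213, 0.1924, 0.1600), Row(284, 0.1956, 0.1684),
    Row(355, 0.1589, 0.1601), Row(426, 0.1947, 0.1815),
    Row(497, 0.1825, 0.1647), Row(568, 0.1568, 0.1555),
    Row(639, 0.1597, 0.1567), Row(710, 0.1431, 0.1639),
    Row(781, 0.1986, 0.1541), Row(852, 0.1370, 0.1852),
    Row(923, 0.1422, 0.1646), Row(994, 0.0911, 0.1804),
    Row(1065, 0.1203, 0.1771), Row(1136, 0.0551, 0.1983),
    Row(1207, 0.0577, 0.3402), Row(1278, 0.0319, 0.3532),
    Row(1349, 0.0846, 0.3423),
]

# Evaluation step with the lowest validation loss.
best = min(ROWS, key=lambda r: r.val_loss)
# Evaluation step with the highest validation loss.
worst = max(ROWS, key=lambda r: r.val_loss)
# Gap between validation and training loss at the last evaluation.
final_gap = ROWS[-1].val_loss - ROWS[-1].train_loss

if __name__ == "__main__":
    print(f"lowest validation loss:  {best.val_loss:.4f} at step {best.step}")
    print(f"highest validation loss: {worst.val_loss:.4f} at step {worst.step}")
    print(f"final validation minus training loss: {final_gap:.4f}")
```

On these rows the lowest validation loss (0.1407) occurs at step 71, while the last three evaluations all exceed 0.34 even as training loss keeps falling, a pattern consistent with overfitting in the final epoch.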

Model size: 1B params (Safetensors)
Tensor type: F32
