# train_qqp_42_1773765557

This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the QQP (Quora Question Pairs) dataset. It achieves the following results on the evaluation set:
- Loss: 0.1541
- Num Input Tokens Seen: 137941664
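A minimal usage sketch, assuming the checkpoint is published as a full causal-LM fine-tune under `rbelanec/train_qqp_42_1773765557` (if it is a PEFT adapter instead, load the base model and attach the adapter with `peft`). The prompt template is an assumption, since the card does not document the format used during training:

```python
# Usage sketch: assumes a full causal-LM checkpoint at this repo id;
# if the repo holds a PEFT adapter, attach it to the base model instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rbelanec/train_qqp_42_1773765557"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# QQP is a duplicate-question detection task; this prompt format is an
# assumption, not the documented training template.
prompt = (
    "Are these two questions asking the same thing?\n"
    "Q1: How do I learn Python?\n"
    "Q2: What is the best way to learn Python?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```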
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
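A sketch of how these hyperparameters map onto `TrainingArguments` in Transformers 4.51. Dataset preparation and the `Trainer` call are omitted, and `output_dir` is an illustrative placeholder:

```python
# The reported hyperparameters expressed as TrainingArguments.
# AdamW betas=(0.9, 0.999) and eps=1e-8 are the adamw_torch defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_qqp_42_1773765557",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=5,
)
```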
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1833 | 0.2500 | 10234 | 0.1541 | 6910656 |
| 0.176 | 0.5000 | 20468 | 0.1903 | 13780928 |
| 0.171 | 0.7501 | 30702 | 0.1805 | 20680640 |
| 0.1392 | 1.0001 | 40936 | 0.1833 | 27591776 |
| 0.1229 | 1.2501 | 51170 | 0.1731 | 34492320 |
| 0.1602 | 1.5001 | 61404 | 0.1609 | 41393504 |
| 0.1392 | 1.7501 | 71638 | 0.1587 | 48287456 |
| 0.148 | 2.0001 | 81872 | 0.1649 | 55178600 |
| 0.1611 | 2.2502 | 92106 | 0.1666 | 62093992 |
| 0.069 | 2.5002 | 102340 | 0.1668 | 68988456 |
| 0.2083 | 2.7502 | 112574 | 0.1612 | 75874280 |
| 0.1196 | 3.0002 | 122808 | 0.1590 | 82772304 |
| 0.1143 | 3.2502 | 133042 | 0.2080 | 89675984 |
| 0.1297 | 3.5003 | 143276 | 0.1980 | 96560720 |
| 0.093 | 3.7503 | 153510 | 0.2051 | 103465808 |
| 0.0722 | 4.0003 | 163744 | 0.2060 | 110357352 |
| 0.137 | 4.2503 | 173978 | 0.2432 | 117230952 |
| 0.1015 | 4.5003 | 184212 | 0.2535 | 124100264 |
| 0.0344 | 4.7503 | 194446 | 0.2454 | 131030440 |
### Framework versions
- Transformers 4.51.3
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4