train_qnli_42_1773765556

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the qnli dataset. Its validation loss at each evaluation checkpoint is listed in the training results table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
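
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, written in plain Python. It is an illustration of the usual warmup-then-cosine shape, not code taken from the training run. The total step count (58920) is an assumption read off the last row of the results table, and the warmup length is assumed to be the warmup ratio times that total. `lr_at` is a hypothetical helper name.

```python
import math

BASE_LR = 5e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 58920    # assumption: final step in the results table (epoch 5.0)

# Assumption: warmup length is the warmup ratio times the total step count.
WARMUP_STEPS = round(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 2946, 5892, 32406, 58920):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at step 5892 (epoch 0.5) and has decayed to half its peak by step 32406 (epoch 2.75).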

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1284	0.25	2946	0.1156	2823680
0.3564	0.5	5892	0.1599	5652800
0.1437	0.75	8838	0.1423	8482944
0.1379	1.0	11784	0.1355	11312256
0.1616	1.25	14730	0.1326	14142784
0.1161	1.5	17676	0.1362	16969472
0.1136	1.75	20622	0.1173	19782400
0.1123	2.0	23568	0.1118	22629440
0.0635	2.25	26514	0.1285	25460032
0.0991	2.5	29460	0.1183	28284608
0.0864	2.75	32406	0.1215	31130432
0.0972	3.0	35352	0.1074	33947392
0.0982	3.25	38298	0.1493	36783040
0.0146	3.5	41244	0.1507	39604544
0.0408	3.75	44190	0.1402	42421440
0.0259	4.0	47136	0.1417	45265344
0.0213	4.25	50082	0.2081	48098944
0.014	4.5	53028	0.1985	50906176
0.0426	4.75	55974	0.1994	53746240
0.0186	5.0	58920	0.1999	56574368
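
In the table above, validation loss reaches its minimum partway through training and then rises while training loss keeps falling. The sketch below picks out the lowest-loss checkpoint using only values copied from the table. `best_checkpoint` is a hypothetical helper name, and the card does not state whether the published weights correspond to the final checkpoint or to the lowest-loss one.

```python
# (epoch, step, validation_loss) rows copied from the results table.
CHECKPOINTS = [
    (0.25, 2946, 0.1156),
    (0.5, 5892, 0.1599),
    (0.75, 8838, 0.1423),
    (1.0, 11784, 0.1355),
    (1.25, 14730, 0.1326),
    (1.5, 17676, 0.1362),
    (1.75, 20622, 0.1173),
    (2.0, 23568, 0.1118),
    (2.25, 26514, 0.1285),
    (2.5, 29460, 0.1183),
    (2.75, 32406, 0.1215),
    (3.0, 35352, 0.1074),
    (3.25, 38298, 0.1493),
    (3.5, 41244, 0.1507),
    (3.75, 44190, 0.1402),
    (4.0, 47136, 0.1417),
    (4.25, 50082, 0.2081),
    (4.5, 53028, 0.1985),
    (4.75, 55974, 0.1994),
    (5.0, 58920, 0.1999),
]


def best_checkpoint(rows):
    """Return the (epoch, step, validation_loss) row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


best = best_checkpoint(CHECKPOINTS)
final = CHECKPOINTS[-1]

if __name__ == "__main__":
    print(f"lowest validation loss: {best[2]} at epoch {best[0]} (step {best[1]})")
    print(f"final validation loss:  {final[2]} at epoch {final[0]} (step {final[1]})")
```

The gap between the two printed rows is the usual sign of overfitting in the later epochs, which is worth keeping in mind when choosing which checkpoint to evaluate.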

Model size: 1B params (Safetensors)
Tensor type: F32
