train_boolq_42_1774791063

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the boolq dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
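
The schedule implied by these settings (cosine decay with a 10% warmup) can be sketched in plain Python. This is a minimal illustration, not the training code itself: the function name `lr_at` is hypothetical, the total of 5305 optimizer steps is an estimate derived from the results table (about 1061 steps per epoch over 5 epochs), and the warmup length is assumed to be the total step count times the warmup ratio, rounded up.

```python
import math


def lr_at(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate after `step` updates: linear warmup, then cosine decay to 0."""
    # Assumption: warmup length is total_steps * warmup_ratio, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: estimated from the results table (step 266 is about 0.2507 epochs).
TOTAL_STEPS = 5305

if __name__ == "__main__":
    for step in (0, 266, 531, 2918, 5305):
        print(f"step {step:>4}: lr = {lr_at(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the learning rate peaks at 5e-05 around step 531 and decays to zero by the final step, so the last epoch trains at a very small learning rate.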

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3397	0.2507	266	0.3339	618432
0.3515	0.5014	532	0.3600	1225408
0.3116	0.7521	798	0.3553	1851072
0.3267	1.0028	1064	0.3294	2475808
0.3519	1.2535	1330	0.3309	3091552
0.3512	1.5042	1596	0.3332	3699104
0.3549	1.7549	1862	0.3334	4324256
0.3755	2.0057	2128	0.3262	4940992
0.3193	2.2564	2394	0.3295	5558144
0.3700	2.5071	2660	0.3561	6183872
0.2998	2.7578	2926	0.3229	6806208
0.3151	3.0085	3192	0.3395	7421856
0.2325	3.2592	3458	0.3571	8043744
0.2696	3.5099	3724	0.3871	8660768
0.2628	3.7606	3990	0.3420	9286304
0.2222	4.0113	4256	0.3638	9894624
0.1352	4.2620	4522	0.5489	10512416
0.1785	4.5127	4788	0.5068	11115040
0.2304	4.7634	5054	0.5038	11736672
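
To read the table above programmatically, a short sketch like the following finds the logged evaluation with the lowest validation loss and compares it with the final one. The rows are copied from the table; the variable names are illustrative, and nothing here says which checkpoint was actually published.

```python
# (epoch, step, validation_loss) for each logged evaluation, copied from the table.
ROWS = [
    (0.2507, 266, 0.3339),
    (0.5014, 532, 0.3600),
    (0.7521, 798, 0.3553),
    (1.0028, 1064, 0.3294),
    (1.2535, 1330, 0.3309),
    (1.5042, 1596, 0.3332),
    (1.7549, 1862, 0.3334),
    (2.0057, 2128, 0.3262),
    (2.2564, 2394, 0.3295),
    (2.5071, 2660, 0.3561),
    (2.7578, 2926, 0.3229),
    (3.0085, 3192, 0.3395),
    (3.2592, 3458, 0.3571),
    (3.5099, 3724, 0.3871),
    (3.7606, 3990, 0.3420),
    (4.0113, 4256, 0.3638),
    (4.2620, 4522, 0.5489),
    (4.5127, 4788, 0.5068),
    (4.7634, 5054, 0.5038),
]

# Evaluation with the lowest validation loss, and the last logged evaluation.
best = min(ROWS, key=lambda row: row[2])
last = ROWS[-1]

if __name__ == "__main__":
    print(f"best: epoch {best[0]}, step {best[1]}, val loss {best[2]}")
    print(f"last: epoch {last[0]}, step {last[1]}, val loss {last[2]}")
```

The lowest validation loss appears partway through epoch 3, while training loss keeps falling and validation loss rises sharply after epoch 4, a pattern consistent with overfitting in the final epoch.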

Safetensors

Model size: 1B params
Tensor type: F32
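
As a rough guide to what these figures mean for storage, the weight footprint can be estimated as parameter count times bytes per parameter. This is a back-of-the-envelope sketch: "1B params" is treated as exactly one billion, which is an assumption since the figure is rounded, and the helper name `weight_bytes` is hypothetical.

```python
# Bytes used per parameter for a few common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(n_params, dtype):
    """Approximate size of the raw weights in bytes (excludes file metadata)."""
    return n_params * BYTES_PER_PARAM[dtype]


# Assumption: "1B params" taken as exactly 1e9; the true count may differ.
N_PARAMS = 1_000_000_000

if __name__ == "__main__":
    size = weight_bytes(N_PARAMS, "F32")
    print(f"F32 weights: about {size / 1e9:.1f} GB ({size / 2**30:.2f} GiB)")
```

By the same estimate, casting the weights to a 16-bit type would roughly halve that footprint.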
