train_boolq_42_1774791062

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the boolq dataset. Evaluation-set results are reported as validation loss in the training log below; the lowest logged value is 0.1998, at step 3458 (epoch 3.2592).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
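
To make the scheduler settings above concrete, here is a minimal pure-Python sketch of a cosine schedule with linear warmup using the listed learning_rate and lr_scheduler_warmup_ratio. It is an illustration, not the training code: the total step count (5305) is an assumption estimated from the step and epoch columns of the training log (roughly 1061 steps per epoch over 5 epochs), rounding the warmup length up is also an assumption, and the function name lr_at is hypothetical.

```python
import math

BASE_LR = 5e-05      # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 5305   # ASSUMPTION: ~1061 steps/epoch x 5 epochs, estimated from the log


def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper).

    Linear warmup from 0 to base_lr, then cosine decay from base_lr to 0.
    """
    # ASSUMPTION: the warmup length is the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at a few steps: start, mid-warmup, peak, mid-decay, end.
    for step in (0, 266, 531, 2128, 5305):
        print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 5e-05 once warmup ends (around step 531) and decays toward zero by the final step.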

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3271	0.2507	266	0.3127	618432
0.2623	0.5014	532	0.2774	1225408
0.2987	0.7521	798	0.2546	1851072
0.3082	1.0028	1064	0.2344	2475808
0.2513	1.2535	1330	0.2254	3091552
0.2693	1.5042	1596	0.2147	3699104
0.1687	1.7549	1862	0.2256	4324256
0.2361	2.0057	2128	0.2062	4940992
0.2205	2.2564	2394	0.2048	5558144
0.2933	2.5071	2660	0.2098	6183872
0.1911	2.7578	2926	0.2094	6806208
0.2383	3.0085	3192	0.2040	7421856
0.172	3.2592	3458	0.1998	8043744
0.2434	3.5099	3724	0.2094	8660768
0.1925	3.7606	3990	0.2062	9286304
0.2996	4.0113	4256	0.2077	9894624
0.186	4.2620	4522	0.2098	10512416
0.1804	4.5127	4788	0.2097	11115040
0.1211	4.7634	5054	0.2100	11736672
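
The log above can be summarized with a short script. The sketch below copies the rows verbatim, picks the evaluation point with the lowest validation loss, and estimates steps per epoch and tokens per step from the last row. The variable names and the dictionary layout are illustrative choices, not part of the original training setup.

```python
# Rows copied from the training log:
# training loss, epoch, step, validation loss, input tokens seen
LOG = """\
0.3271 0.2507 266 0.3127 618432
0.2623 0.5014 532 0.2774 1225408
0.2987 0.7521 798 0.2546 1851072
0.3082 1.0028 1064 0.2344 2475808
0.2513 1.2535 1330 0.2254 3091552
0.2693 1.5042 1596 0.2147 3699104
0.1687 1.7549 1862 0.2256 4324256
0.2361 2.0057 2128 0.2062 4940992
0.2205 2.2564 2394 0.2048 5558144
0.2933 2.5071 2660 0.2098 6183872
0.1911 2.7578 2926 0.2094 6806208
0.2383 3.0085 3192 0.2040 7421856
0.172 3.2592 3458 0.1998 8043744
0.2434 3.5099 3724 0.2094 8660768
0.1925 3.7606 3990 0.2062 9286304
0.2996 4.0113 4256 0.2077 9894624
0.186 4.2620 4522 0.2098 10512416
0.1804 4.5127 4788 0.2097 11115040
0.1211 4.7634 5054 0.2100 11736672
"""

rows = []
for line in LOG.strip().splitlines():
    train_loss, epoch, step, val_loss, tokens = line.split()
    rows.append({
        "train_loss": float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "tokens": int(tokens),
    })

# Evaluation point with the lowest validation loss.
best = min(rows, key=lambda r: r["val_loss"])

# Rough per-epoch and per-step rates, estimated from the final logged row.
last = rows[-1]
steps_per_epoch = last["step"] / last["epoch"]
tokens_per_step = last["tokens"] / last["step"]

if __name__ == "__main__":
    print("best step:", best["step"], "val_loss:", best["val_loss"])
    print("approx. steps per epoch:", round(steps_per_epoch))
    print("approx. input tokens per step:", round(tokens_per_step))
```

Validation loss bottoms out partway through epoch 4 and stays roughly flat afterwards while training loss keeps falling, which suggests the later steps add little on the evaluation set.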
