train_sst2_42_1773765558

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the sst2 dataset. No summary evaluation metrics are listed here; the validation loss at each evaluation step is reported in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5
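
As a rough illustration of what lr_scheduler_type: cosine combined with lr_scheduler_warmup_ratio: 0.1 implies, the sketch below computes the learning rate at a given optimizer step. It assumes the common shape of a linear warmup to the peak rate followed by a half-cosine decay to zero. The function name `lr_at` and the `total_steps` value are hypothetical, chosen for illustration only; this card does not state the total number of optimizer steps, and the training code may round the warmup length differently.

```python
import math

PEAK_LR = 5e-05      # learning_rate from the list above
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the list above


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step`, assuming linear warmup then cosine decay to zero."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from the peak learning rate down to 0.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, for illustration only
    for s in (0, 50, 100, 550, 1000):
        print(s, f"{lr_at(s, total):.2e}")
```

Under these assumptions the rate peaks at 5e-05 once the first 10% of steps have elapsed and has fallen to half that value midway through the decay phase.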

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3401	0.2501	1895	0.2325	930944
0.2433	0.5002	3790	0.1713	1864128
0.1159	0.7503	5685	0.1951	2790656
0.0509	1.0004	7580	0.1712	3726464
0.1346	1.2505	9475	0.1622	4658240
0.1178	1.5006	11370	0.1559	5591680
0.2638	1.7507	13265	0.1337	6528448
0.0583	2.0008	15160	0.1541	7463024
0.0138	2.2509	17055	0.1594	8395632
0.2127	2.5010	18950	0.1499	9326256
0.1088	2.7511	20845	0.1446	10259504
0.0039	3.0012	22740	0.1714	11196096
0.0007	3.2513	24635	0.1864	12128448
0.042	3.5014	26530	0.1618	13069824
0.2528	3.7515	28425	0.1563	13996672
0.0009	4.0016	30320	0.1654	14924944
0.0941	4.2517	32215	0.1819	15859920
0.0022	4.5018	34110	0.1833	16790288
0.0004	4.7519	36005	0.1827	17721744
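
The training-loss column varies sharply from row to row, so the validation loss is probably the more useful column for comparing checkpoints. A minimal sketch for locating the evaluation step with the lowest validation loss follows; the rows are copied from the table above, and the helper name `best_checkpoint` is hypothetical.

```python
# (epoch, step, validation_loss) copied from the table above
ROWS = [
    (0.2501, 1895, 0.2325),
    (0.5002, 3790, 0.1713),
    (0.7503, 5685, 0.1951),
    (1.0004, 7580, 0.1712),
    (1.2505, 9475, 0.1622),
    (1.5006, 11370, 0.1559),
    (1.7507, 13265, 0.1337),
    (2.0008, 15160, 0.1541),
    (2.2509, 17055, 0.1594),
    (2.5010, 18950, 0.1499),
    (2.7511, 20845, 0.1446),
    (3.0012, 22740, 0.1714),
    (3.2513, 24635, 0.1864),
    (3.5014, 26530, 0.1618),
    (3.7515, 28425, 0.1563),
    (4.0016, 30320, 0.1654),
    (4.2517, 32215, 0.1819),
    (4.5018, 34110, 0.1833),
    (4.7519, 36005, 0.1827),
]


def best_checkpoint(rows):
    """Return the (epoch, step, validation_loss) row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, step, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch})")
```

For the rows above, the minimum is 0.1337 at step 13265 (epoch 1.7507). Validation loss is higher at every later evaluation step, which may indicate overfitting in the later epochs, although the card does not say which checkpoint was kept.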

Model size: 1B params
Tensor type: F32
Format: Safetensors
