# TinyStoriesV2_Llama-3.2-1B-7whtiyy8
This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on an unknown dataset (the model name suggests TinyStoriesV2). It achieves the following results on the evaluation set:
- Loss: 2.3259
- Model Preparation Time: 0.006
- Token Accuracy: 0.5029
- Token Error Rate: 0.4971
- Perplexity: inf (see the note after the training results table)
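As a minimal usage sketch, the checkpoint can be loaded with the standard `transformers` causal-LM API. The repo id below is assumed from the model name in this card and should be replaced with the actual Hub path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TinyStoriesV2_Llama-3.2-1B-7whtiyy8"  # hypothetical repo id; replace with the actual Hub path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
model.eval()

# Generate a short TinyStories-style continuation
inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```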
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.001
- num_epochs: 100
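As a minimal sketch, the listed values map onto Hugging Face `TrainingArguments` roughly as follows; the output directory, eval cadence, and the surrounding `Trainer` wiring are assumptions, not taken from this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="TinyStoriesV2_Llama-3.2-1B-7whtiyy8",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",            # AdamW; betas=(0.9, 0.999) and eps=1e-8 are the defaults
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.001,
    num_train_epochs=100,
    eval_strategy="steps",          # assumption: matches the 100-step eval cadence below
    eval_steps=100,
    logging_steps=100,
)
```

Note that although `num_epochs` is set to 100, the results table below stops at epoch 3.84, so training appears to have ended well before the configured epoch count (e.g., via early stopping or manual interruption).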
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Token Accuracy | Token Error Rate | Perplexity |
|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 12.1341 | 0.006 | 0.0000 | 1.0000 | inf |
| 3.8331 | 0.16 | 100 | 4.1262 | 0.006 | 0.2894 | 0.7106 | inf |
| 3.4808 | 0.32 | 200 | 3.5019 | 0.006 | 0.3497 | 0.6503 | inf |
| 3.0294 | 0.48 | 300 | 3.1814 | 0.006 | 0.3898 | 0.6102 | inf |
| 2.9986 | 0.64 | 400 | 2.9675 | 0.006 | 0.4171 | 0.5829 | inf |
| 2.898 | 0.8 | 500 | 2.8123 | 0.006 | 0.4359 | 0.5641 | inf |
| 2.6878 | 0.96 | 600 | 2.6999 | 0.006 | 0.4483 | 0.5517 | inf |
| 2.2593 | 1.12 | 700 | 2.6065 | 0.006 | 0.4630 | 0.5370 | inf |
| 2.362 | 1.28 | 800 | 2.5501 | 0.006 | 0.4697 | 0.5303 | inf |
| 2.2866 | 1.44 | 900 | 2.4962 | 0.006 | 0.4781 | 0.5219 | inf |
| 2.4233 | 1.6 | 1000 | 2.4508 | 0.006 | 0.4830 | 0.5170 | inf |
| 2.1492 | 1.76 | 1100 | 2.3965 | 0.006 | 0.4908 | 0.5092 | inf |
| 2.3272 | 1.92 | 1200 | 2.3582 | 0.006 | 0.4940 | 0.5060 | inf |
| 1.5257 | 2.08 | 1300 | 2.3568 | 0.006 | 0.5007 | 0.4993 | inf |
| 1.5882 | 2.24 | 1400 | 2.3598 | 0.006 | 0.4999 | 0.5001 | inf |
| 1.5947 | 2.4 | 1500 | 2.3523 | 0.006 | 0.5006 | 0.4994 | inf |
| 1.6552 | 2.56 | 1600 | 2.3391 | 0.006 | 0.5016 | 0.4984 | inf |
| 1.3803 | 2.72 | 1700 | 2.3301 | 0.006 | 0.5025 | 0.4975 | inf |
| 1.6149 | 2.88 | 1800 | 2.3259 | 0.006 | 0.5029 | 0.4971 | inf |
| 0.9457 | 3.04 | 1900 | 2.3909 | 0.006 | 0.5030 | 0.4970 | inf |
| 0.9481 | 3.2 | 2000 | 2.4530 | 0.006 | 0.4993 | 0.5007 | inf |
| 0.7384 | 3.36 | 2100 | 2.4962 | 0.006 | 0.4976 | 0.5024 | inf |
| 0.8767 | 3.52 | 2200 | 2.5152 | 0.006 | 0.4955 | 0.5045 | inf |
| 0.7595 | 3.68 | 2300 | 2.5172 | 0.006 | 0.4961 | 0.5039 | inf |
| 0.8285 | 3.84 | 2400 | 2.5394 | 0.006 | 0.4944 | 0.5056 | inf |
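Validation loss bottoms out at step 1800 (2.3259, matching the headline result above) and rises afterward while training loss keeps falling, the usual signature of overfitting. Perplexity is conventionally the exponential of the validation cross-entropy loss, so the `inf` values above are likely an artifact of how the metric was computed rather than a property of the model; the best validation loss implies a perplexity of about 10.2:

```python
import math

# Perplexity implied by the best validation loss (step 1800, epoch 2.88)
print(math.exp(2.3259))  # ~10.24
```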
### Framework versions

- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
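To reproduce this environment, an install sketch pinned to the versions above; using the CUDA 12.4 PyTorch wheel index is an assumption based on the `+cu124` tag:

```bash
pip install transformers==4.51.3 datasets==3.5.0 tokenizers==0.21.1
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
```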