TinyStoriesV2_Llama-3.2-1B-cqilj5xm

This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 64
seed: 42
optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.001
num_epochs: 100

Training Loss	Epoch	Step	Validation Loss	Model Preparation Time	Token Accuracy	Token Error Rate	Perplexity
No log	0	0	12.1341	0.0017	0.0000	1.0000	inf
3.8329	0.16	100	4.1258	0.0017	0.2893	0.7107	inf
3.4807	0.32	200	3.5020	0.0017	0.3494	0.6506	inf
3.0304	0.48	300	3.1815	0.0017	0.3897	0.6103	inf
2.9981	0.64	400	2.9675	0.0017	0.4173	0.5827	inf
2.8975	0.8	500	2.8119	0.0017	0.4357	0.5643	inf
2.691	0.96	600	2.6998	0.0017	0.4486	0.5514	inf
2.2577	1.12	700	2.6060	0.0017	0.4634	0.5366	inf
2.3614	1.28	800	2.5507	0.0017	0.4697	0.5303	inf
2.2865	1.44	900	2.4960	0.0017	0.4776	0.5224	inf
2.4233	1.6	1000	2.4489	0.0017	0.4835	0.5165	inf
2.1458	1.76	1100	2.3967	0.0017	0.4906	0.5094	inf
2.3283	1.92	1200	2.3582	0.0017	0.4943	0.5057	inf
1.5291	2.08	1300	2.3555	0.0017	0.5009	0.4991	inf
1.5903	2.24	1400	2.3568	0.0017	0.4996	0.5004	inf
1.5963	2.4	1500	2.3527	0.0017	0.5003	0.4997	inf
1.6494	2.56	1600	2.3405	0.0017	0.5015	0.4985	inf
1.3717	2.7200	1700	2.3306	0.0017	0.5023	0.4977	inf
1.619	2.88	1800	2.3244	0.0017	0.5021	0.4979	inf
0.9459	3.04	1900	2.3887	0.0017	0.5036	0.4964	inf
0.9456	3.2	2000	2.4541	0.0017	0.4995	0.5005	inf
0.734	3.36	2100	2.4907	0.0017	0.4967	0.5033	inf
0.8771	3.52	2200	2.5155	0.0017	0.4954	0.5046	inf
0.7737	3.68	2300	2.5191	0.0017	0.4958	0.5042	inf
0.8325	3.84	2400	2.5384	0.0017	0.4938	0.5062	inf

Safetensors

Model size

1B params

Tensor type

BF16

Base model

Finetuned

(903)

this model