TinyStoriesV2_Llama-3.2-1B-q38bgt8z

This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 64
seed: 42
optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.001
num_epochs: 100

Training Loss	Epoch	Step	Validation Loss	Model Preparation Time	Token Accuracy	Token Error Rate	Perplexity
No log	0	0	12.1341	0.0027	0.0000	1.0000	inf
3.8313	0.16	100	4.1267	0.0027	0.2895	0.7105	inf
3.4817	0.32	200	3.5024	0.0027	0.3494	0.6506	inf
3.0293	0.48	300	3.1821	0.0027	0.3893	0.6107	inf
2.9999	0.64	400	2.9671	0.0027	0.4170	0.5830	inf
2.8975	0.8	500	2.8111	0.0027	0.4360	0.5640	inf
2.6878	0.96	600	2.7002	0.0027	0.4485	0.5515	inf
2.2631	1.12	700	2.6067	0.0027	0.4634	0.5366	inf
2.3587	1.28	800	2.5508	0.0027	0.4698	0.5302	inf
2.2857	1.44	900	2.4960	0.0027	0.4779	0.5221	inf
2.4221	1.6	1000	2.4482	0.0027	0.4835	0.5165	inf
2.1476	1.76	1100	2.3961	0.0027	0.4905	0.5095	inf
2.3249	1.92	1200	2.3586	0.0027	0.4943	0.5057	inf
1.5277	2.08	1300	2.3577	0.0027	0.5012	0.4988	inf
1.5914	2.24	1400	2.3577	0.0027	0.5001	0.4999	inf
1.5934	2.4	1500	2.3522	0.0027	0.5002	0.4998	inf
1.6612	2.56	1600	2.3411	0.0027	0.5013	0.4987	inf
1.3713	2.7200	1700	2.3314	0.0027	0.5028	0.4972	inf
1.6183	2.88	1800	2.3228	0.0027	0.5027	0.4973	inf
0.9434	3.04	1900	2.3908	0.0027	0.5025	0.4975	inf
0.9399	3.2	2000	2.4527	0.0027	0.5003	0.4997	inf
0.7391	3.36	2100	2.4972	0.0027	0.4971	0.5029	inf
0.8745	3.52	2200	2.5165	0.0027	0.4957	0.5043	inf
0.7738	3.68	2300	2.5191	0.0027	0.4958	0.5042	inf
0.8281	3.84	2400	2.5401	0.0027	0.4942	0.5058	inf

Safetensors

Model size

1B params

Tensor type

BF16

Base model

Finetuned

(904)

this model