# GSM8K-Binary_Llama-3.2-1B-f8096090
This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on an unknown dataset (the model name suggests a binarized GSM8K task). It achieves the following results on the evaluation set:
- Loss: 0.6336
- Model Preparation Time: 0.0058
- MDL: 2262.3589
- Accumulated Loss: 1568.1477
- Correct Preds: 1973.0
- Total Preds: 2475.0
- Accuracy: 0.7972
- Correct Gen Preds: 369.0
- Gen Accuracy: 0.1491
- Per-label results (the suffixes 34192 and 41568 in the original metric names appear to be the token IDs of the two answer labels):
  - Label 34192: Correct Preds: 974.0, Total Labels: 1196.0, Accuracy: 0.8144, Correct Gen Preds: 0.0, Gen Accuracy: 0.0
  - Label 41568: Correct Preds: 999.0, Total Labels: 1267.0, Accuracy: 0.7885, Correct Gen Preds: 362.0, Gen Accuracy: 0.2857
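The relationships among these metrics are not documented in the card, but the reported numbers are internally consistent: the accumulated loss divided by the number of predictions gives the reported loss, and the MDL value equals the accumulated loss (in nats) converted to bits. These figures match the epoch 4 row of the training-results table below. A minimal sketch of those checks, assuming the metric names mean what they appear to:

```python
import math

# Reported evaluation metrics, copied from the list above.
correct_preds, total_preds = 1973, 2475
accumulated_loss_nats = 1568.1477

# Accuracy: correct predictions over total predictions (~0.7972).
accuracy = correct_preds / total_preds

# Loss: the accumulated loss averaged per prediction (~0.6336).
mean_loss = accumulated_loss_nats / total_preds

# MDL: the accumulated loss converted from nats to bits (~2262.36).
mdl_bits = accumulated_loss_nats / math.log(2)

print(f"accuracy={accuracy:.4f}  loss={mean_loss:.4f}  mdl={mdl_bits:.2f}")
```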
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.001
- num_epochs: 100
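The training script itself is not included in this card, but the hyperparameters above map directly onto `transformers.TrainingArguments`. A minimal, hypothetical reconstruction (the `output_dir` and any arguments not listed above are placeholders, not the author's actual configuration):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above.
training_args = TrainingArguments(
    output_dir="GSM8K-Binary_Llama-3.2-1B-f8096090",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    warmup_ratio=0.001,
    num_train_epochs=100,
)
```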
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | MDL | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 34192 | Correct Preds 34192 | Total Labels 34192 | Accuracy 34192 | Gen Accuracy 34192 | Correct Gen Preds 41568 | Correct Preds 41568 | Total Labels 41568 | Accuracy 41568 | Gen Accuracy 41568 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.4656 | 0.0058 | 5233.1723 | 3627.3586 | 1196.0 | 2475.0 | 0.4832 | 1204.0 | 0.4865 | 1196.0 | 1196.0 | 1196.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1267.0 | 0.0 | 0.0 |
| 0.5859 | 1.0 | 52 | 0.5818 | 0.0058 | 2077.5047 | 1440.0165 | 1847.0 | 2475.0 | 0.7463 | 8.0 | 0.0032 | 0.0 | 857.0 | 1196.0 | 0.7166 | 0.0 | 0.0 | 990.0 | 1267.0 | 0.7814 | 0.0 |
| 0.6145 | 2.0 | 104 | 0.5168 | 0.0058 | 1845.2524 | 1279.0315 | 1948.0 | 2475.0 | 0.7871 | 69.0 | 0.0279 | 0.0 | 1063.0 | 1196.0 | 0.8888 | 0.0 | 61.0 | 885.0 | 1267.0 | 0.6985 | 0.0481 |
| 0.2879 | 3.0 | 156 | 0.5778 | 0.0058 | 2063.1398 | 1430.0595 | 1868.0 | 2475.0 | 0.7547 | 53.0 | 0.0214 | 0.0 | 1106.0 | 1196.0 | 0.9247 | 0.0 | 46.0 | 762.0 | 1267.0 | 0.6014 | 0.0363 |
| 0.0501 | 4.0 | 208 | 0.6336 | 0.0058 | 2262.3589 | 1568.1477 | 1973.0 | 2475.0 | 0.7972 | 369.0 | 0.1491 | 0.0 | 974.0 | 1196.0 | 0.8144 | 0.0 | 362.0 | 999.0 | 1267.0 | 0.7885 | 0.2857 |
| 0.3604 | 5.0 | 260 | 1.7321 | 0.0058 | 6184.7525 | 4286.9438 | 1864.0 | 2475.0 | 0.7531 | 1135.0 | 0.4586 | 634.0 | 1105.0 | 1196.0 | 0.9239 | 0.5301 | 494.0 | 759.0 | 1267.0 | 0.5991 | 0.3899 |
| 0.0662 | 6.0 | 312 | 1.2469 | 0.0058 | 4452.3018 | 3086.1004 | 1972.0 | 2475.0 | 0.7968 | 1028.0 | 0.4154 | 359.0 | 1028.0 | 1196.0 | 0.8595 | 0.3002 | 661.0 | 944.0 | 1267.0 | 0.7451 | 0.5217 |
| 0.0 | 7.0 | 364 | 1.4682 | 0.0058 | 5242.5624 | 3633.8673 | 1970.0 | 2475.0 | 0.7960 | 1223.0 | 0.4941 | 464.0 | 1033.0 | 1196.0 | 0.8637 | 0.3880 | 751.0 | 937.0 | 1267.0 | 0.7395 | 0.5927 |
| 0.0003 | 8.0 | 416 | 1.9052 | 0.0058 | 6802.8127 | 4715.3504 | 1925.0 | 2475.0 | 0.7778 | 1504.0 | 0.6077 | 583.0 | 948.0 | 1196.0 | 0.7926 | 0.4875 | 914.0 | 977.0 | 1267.0 | 0.7711 | 0.7214 |
| 0.5881 | 9.0 | 468 | 1.9828 | 0.0058 | 7079.8847 | 4907.4021 | 1957.0 | 2475.0 | 0.7907 | 1879.0 | 0.7592 | 920.0 | 983.0 | 1196.0 | 0.8219 | 0.7692 | 952.0 | 974.0 | 1267.0 | 0.7687 | 0.7514 |
| 0.0 | 10.0 | 520 | 1.9968 | 0.0058 | 7129.8865 | 4942.0607 | 1957.0 | 2475.0 | 0.7907 | 1886.0 | 0.7620 | 913.0 | 972.0 | 1196.0 | 0.8127 | 0.7634 | 966.0 | 985.0 | 1267.0 | 0.7774 | 0.7624 |
| 0.5881 | 11.0 | 572 | 2.0014 | 0.0058 | 7146.2344 | 4953.3922 | 1959.0 | 2475.0 | 0.7915 | 1892.0 | 0.7644 | 918.0 | 972.0 | 1196.0 | 0.8127 | 0.7676 | 967.0 | 987.0 | 1267.0 | 0.7790 | 0.7632 |
| 0.0 | 12.0 | 624 | 2.0068 | 0.0058 | 7165.7013 | 4966.8857 | 1959.0 | 2475.0 | 0.7915 | 1890.0 | 0.7636 | 916.0 | 972.0 | 1196.0 | 0.8127 | 0.7659 | 967.0 | 987.0 | 1267.0 | 0.7790 | 0.7632 |
| 0.5882 | 13.0 | 676 | 2.0059 | 0.0058 | 7162.3520 | 4964.5641 | 1959.0 | 2475.0 | 0.7915 | 1893.0 | 0.7648 | 919.0 | 973.0 | 1196.0 | 0.8135 | 0.7684 | 967.0 | 986.0 | 1267.0 | 0.7782 | 0.7632 |
| 0.0 | 14.0 | 728 | 2.0106 | 0.0058 | 7179.0242 | 4976.1204 | 1958.0 | 2475.0 | 0.7911 | 1891.0 | 0.7640 | 918.0 | 972.0 | 1196.0 | 0.8127 | 0.7676 | 966.0 | 986.0 | 1267.0 | 0.7782 | 0.7624 |
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
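A minimal inference sketch using the versions pinned above, assuming the checkpoint is hosted under the repository name in the title; the prompt format used during fine-tuning is not documented, so the input below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "donoway/GSM8K-Binary_Llama-3.2-1B-f8096090"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Placeholder GSM8K-style binary prompt; the actual prompt format
# expected by this checkpoint is not documented in the card.
inputs = tokenizer("Is 17 + 25 equal to 42? Answer:", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```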