GSM8K-Binary_Llama-3.2-1B-18tzjprl

This model is a fine-tuned version of meta-llama/Llama-3.2-1B on a dataset that is not documented in this card (the model name suggests a binary-labeled GSM8K variant). It achieves the following results on the evaluation set (a consistency-check sketch follows the list):

  • Loss: 1.8522
  • Model Preparation Time: 0.0063
  • Mdl: 6613.4816
  • Accumulated Loss: 4584.1161
  • Correct Preds: 1948.0
  • Total Preds: 2475.0
  • Accuracy: 0.7871
  • Correct Gen Preds: 1842.0
  • Gen Accuracy: 0.7442
  • Correct Gen Preds 34192: 932.0
  • Correct Preds 34192: 998.0
  • Total Labels 34192: 1196.0
  • Accuracy 34192: 0.8344
  • Gen Accuracy 34192: 0.7793
  • Correct Gen Preds 41568: 903.0
  • Correct Preds 41568: 950.0
  • Total Labels 41568: 1267.0
  • Accuracy 41568: 0.7498
  • Gen Accuracy 41568: 0.7127
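
The headline numbers above are internally consistent: accuracy is correct predictions over total predictions, the loss is the accumulated (summed) cross-entropy divided by the number of predictions, and Mdl appears to be the accumulated loss converted from nats to bits. The snippet below is a sketch verifying those relationships from the reported counts; the metric definitions are inferred from the numbers, not taken from the training code, and the "34192"/"41568" suffixes are presumably the token ids of the two answer labels.

```python
import math

# Reported evaluation counts (from the list above).
total_preds = 2475
correct_preds = 1948
correct_gen_preds = 1842
accumulated_loss = 4584.1161  # summed cross-entropy, in nats

print(round(correct_preds / total_preds, 4))      # 0.7871 -> Accuracy
print(round(correct_gen_preds / total_preds, 4))  # 0.7442 -> Gen Accuracy
print(round(accumulated_loss / total_preds, 4))   # 1.8522 -> Loss (mean per prediction)
print(round(accumulated_loss / math.log(2), 2))   # ~6613.48 -> Mdl (nats converted to bits)

# Per-label breakdown, e.g. for label 34192:
print(round(998 / 1196, 4))  # 0.8344 -> Accuracy 34192
print(round(932 / 1196, 4))  # 0.7793 -> Gen Accuracy 34192
```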

Model description

More information needed

Intended uses & limitations

More information needed
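
As a minimal starting point, the checkpoint can be loaded like any other causal language model with 🤗 Transformers. This is a sketch only: the prompt format used during fine-tuning is not documented in this card, so the prompt below is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "donoway/GSM8K-Binary_Llama-3.2-1B-18tzjprl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder prompt; the actual fine-tuning prompt format is not documented here.
prompt = "Question: ...\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```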

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.001
  • num_epochs: 100
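
For reference, these settings map onto a 🤗 Transformers TrainingArguments configuration roughly as follows. This is a sketch under the assumption that the standard Trainer was used; the argument names follow the current Transformers API and are not copied from the actual training script.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above (output_dir is illustrative).
training_args = TrainingArguments(
    output_dir="GSM8K-Binary_Llama-3.2-1B-18tzjprl",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    warmup_ratio=0.001,
    num_train_epochs=100,
)
```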

Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Mdl | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 34192 | Correct Preds 34192 | Total Labels 34192 | Accuracy 34192 | Gen Accuracy 34192 | Correct Gen Preds 41568 | Correct Preds 41568 | Total Labels 41568 | Accuracy 41568 | Gen Accuracy 41568 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.4656 | 0.0063 | 5233.1723 | 3627.3586 | 1196.0 | 2475.0 | 0.4832 | 1204.0 | 0.4865 | 1196.0 | 1196.0 | 1196.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1267.0 | 0.0 | 0.0 |
| 0.6986 | 1.0 | 39 | 0.5737 | 0.0063 | 2048.4536 | 1419.8798 | 1819.0 | 2475.0 | 0.7349 | 8.0 | 0.0032 | 0.0 | 1032.0 | 1196.0 | 0.8629 | 0.0 | 0.0 | 787.0 | 1267.0 | 0.6212 | 0.0 |
| 0.0873 | 2.0 | 78 | 0.9806 | 0.0063 | 3501.2903 | 2426.9095 | 1495.0 | 2475.0 | 0.6040 | 7.0 | 0.0028 | 0.0 | 1190.0 | 1196.0 | 0.9950 | 0.0 | 0.0 | 305.0 | 1267.0 | 0.2407 | 0.0 |
| 0.0457 | 3.0 | 117 | 0.5995 | 0.0063 | 2140.5990 | 1483.7501 | 1932.0 | 2475.0 | 0.7806 | 8.0 | 0.0032 | 0.0 | 935.0 | 1196.0 | 0.7818 | 0.0 | 1.0 | 997.0 | 1267.0 | 0.7869 | 0.0008 |
| 0.1391 | 4.0 | 156 | 0.8943 | 0.0063 | 3193.1325 | 2213.3108 | 1788.0 | 2475.0 | 0.7224 | 20.0 | 0.0081 | 6.0 | 1132.0 | 1196.0 | 0.9465 | 0.0050 | 6.0 | 656.0 | 1267.0 | 0.5178 | 0.0047 |
| 0.0048 | 5.0 | 195 | 0.8029 | 0.0063 | 2866.8084 | 1987.1202 | 1937.0 | 2475.0 | 0.7826 | 317.0 | 0.1281 | 177.0 | 936.0 | 1196.0 | 0.7826 | 0.1480 | 132.0 | 1001.0 | 1267.0 | 0.7901 | 0.1042 |
| 0.0029 | 6.0 | 234 | 1.3201 | 0.0063 | 4713.7690 | 3267.3357 | 1897.0 | 2475.0 | 0.7665 | 1697.0 | 0.6857 | 732.0 | 879.0 | 1196.0 | 0.7349 | 0.6120 | 959.0 | 1018.0 | 1267.0 | 0.8035 | 0.7569 |
| 0.0 | 7.0 | 273 | 1.5486 | 0.0063 | 5529.4640 | 3832.7324 | 1913.0 | 2475.0 | 0.7729 | 1780.0 | 0.7192 | 807.0 | 890.0 | 1196.0 | 0.7441 | 0.6747 | 966.0 | 1023.0 | 1267.0 | 0.8074 | 0.7624 |
| 0.0 | 8.0 | 312 | 1.8522 | 0.0063 | 6613.4816 | 4584.1161 | 1948.0 | 2475.0 | 0.7871 | 1842.0 | 0.7442 | 932.0 | 998.0 | 1196.0 | 0.8344 | 0.7793 | 903.0 | 950.0 | 1267.0 | 0.7498 | 0.7127 |
| 0.3208 | 9.0 | 351 | 2.7180 | 0.0063 | 9704.9736 | 6726.9751 | 1795.0 | 2475.0 | 0.7253 | 1514.0 | 0.6117 | 1011.0 | 1128.0 | 1196.0 | 0.9431 | 0.8453 | 497.0 | 667.0 | 1267.0 | 0.5264 | 0.3923 |
| 0.0001 | 10.0 | 390 | 1.7598 | 0.0063 | 6283.5026 | 4355.3921 | 1934.0 | 2475.0 | 0.7814 | 1784.0 | 0.7208 | 865.0 | 964.0 | 1196.0 | 0.8060 | 0.7232 | 913.0 | 970.0 | 1267.0 | 0.7656 | 0.7206 |
| 0.0 | 11.0 | 429 | 1.9865 | 0.0063 | 7093.0480 | 4916.5262 | 1929.0 | 2475.0 | 0.7794 | 1837.0 | 0.7422 | 961.0 | 1015.0 | 1196.0 | 0.8487 | 0.8035 | 870.0 | 914.0 | 1267.0 | 0.7214 | 0.6867 |
| 0.0 | 12.0 | 468 | 2.1776 | 0.0063 | 7775.3417 | 5389.4561 | 1922.0 | 2475.0 | 0.7766 | 1827.0 | 0.7382 | 1014.0 | 1061.0 | 1196.0 | 0.8871 | 0.8478 | 807.0 | 861.0 | 1267.0 | 0.6796 | 0.6369 |
| 0.0 | 13.0 | 507 | 2.0786 | 0.0063 | 7421.9222 | 5144.4844 | 1921.0 | 2475.0 | 0.7762 | 1828.0 | 0.7386 | 964.0 | 1015.0 | 1196.0 | 0.8487 | 0.8060 | 857.0 | 906.0 | 1267.0 | 0.7151 | 0.6764 |
| 0.0 | 14.0 | 546 | 2.0790 | 0.0063 | 7423.4325 | 5145.5313 | 1920.0 | 2475.0 | 0.7758 | 1822.0 | 0.7362 | 958.0 | 1014.0 | 1196.0 | 0.8478 | 0.8010 | 857.0 | 906.0 | 1267.0 | 0.7151 | 0.6764 |
| 0.0 | 15.0 | 585 | 2.0766 | 0.0063 | 7414.9542 | 5139.6546 | 1920.0 | 2475.0 | 0.7758 | 1819.0 | 0.7349 | 958.0 | 1014.0 | 1196.0 | 0.8478 | 0.8010 | 855.0 | 906.0 | 1267.0 | 0.7151 | 0.6748 |
| 0.0 | 16.0 | 624 | 2.0779 | 0.0063 | 7419.5115 | 5142.8135 | 1923.0 | 2475.0 | 0.7770 | 1825.0 | 0.7374 | 960.0 | 1016.0 | 1196.0 | 0.8495 | 0.8027 | 858.0 | 907.0 | 1267.0 | 0.7159 | 0.6772 |
| 0.0 | 17.0 | 663 | 2.0790 | 0.0063 | 7423.4221 | 5145.5241 | 1922.0 | 2475.0 | 0.7766 | 1825.0 | 0.7374 | 960.0 | 1015.0 | 1196.0 | 0.8487 | 0.8027 | 858.0 | 907.0 | 1267.0 | 0.7159 | 0.6772 |
| 0.0 | 18.0 | 702 | 2.0758 | 0.0063 | 7412.1084 | 5137.6820 | 1921.0 | 2475.0 | 0.7762 | 1824.0 | 0.7370 | 960.0 | 1014.0 | 1196.0 | 0.8478 | 0.8027 | 857.0 | 907.0 | 1267.0 | 0.7159 | 0.6764 |
| 0.0 | 19.0 | 741 | 2.0786 | 0.0063 | 7422.1484 | 5144.6412 | 1920.0 | 2475.0 | 0.7758 | 1822.0 | 0.7362 | 958.0 | 1013.0 | 1196.0 | 0.8470 | 0.8010 | 857.0 | 907.0 | 1267.0 | 0.7159 | 0.6764 |
| 0.0 | 20.0 | 780 | 2.0767 | 0.0063 | 7415.1916 | 5139.8192 | 1921.0 | 2475.0 | 0.7762 | 1823.0 | 0.7366 | 958.0 | 1016.0 | 1196.0 | 0.8495 | 0.8010 | 858.0 | 905.0 | 1267.0 | 0.7143 | 0.6772 |
| 0.0 | 21.0 | 819 | 2.0793 | 0.0063 | 7424.3233 | 5146.1488 | 1922.0 | 2475.0 | 0.7766 | 1823.0 | 0.7366 | 959.0 | 1015.0 | 1196.0 | 0.8487 | 0.8018 | 857.0 | 907.0 | 1267.0 | 0.7159 | 0.6764 |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1