GSM8K-Binary_Llama-3.2-1B-eb5nfly3

This model is a fine-tuned version of meta-llama/Llama-3.2-1B. The fine-tuning dataset is not documented here, although the model name suggests a binary-label variant of GSM8K. It achieves the following results on the evaluation set, which match the epoch-4 checkpoint in the training results table below (a sketch of how these figures relate follows the list):

  • Loss: 0.7169
  • Model Preparation Time: 0.0058
  • Mdl: 2559.7869
  • Accumulated Loss: 1774.3091
  • Correct Preds: 1961.0
  • Total Preds: 2475.0
  • Accuracy: 0.7923
  • Correct Gen Preds: 731.0
  • Gen Accuracy: 0.2954
  • Correct Gen Preds 34192: 204.0
  • Correct Preds 34192: 933.0
  • Total Labels 34192: 1196.0
  • Accuracy 34192: 0.7801
  • Gen Accuracy 34192: 0.1706
  • Correct Gen Preds 41568: 518.0
  • Correct Preds 41568: 1028.0
  • Total Labels 41568: 1267.0
  • Accuracy 41568: 0.8114
  • Gen Accuracy 41568: 0.4088
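
The numeric suffixes 34192 and 41568 appear to be the tokenizer ids of the two class-label tokens, so the suffixed metrics are per-class breakdowns of the overall figures. The relationships below are inferred from the reported numbers rather than documented; a minimal sketch:

```python
import math

# Inferred relationships among the headline metrics (assumptions, not
# documented): accumulated loss looks like mean eval loss times the
# number of predictions, and Mdl looks like that total converted from
# nats to bits.
mean_loss, total_preds = 0.7169, 2475

accumulated = mean_loss * total_preds    # ≈ 1774.3  (reported 1774.3091)
mdl_bits = 1774.3091 / math.log(2)       # ≈ 2559.79 (reported 2559.7869)

accuracy = 1961.0 / total_preds          # ≈ 0.7923
gen_accuracy = 731.0 / total_preds       # ≈ 0.2954
acc_34192 = 933.0 / 1196.0               # ≈ 0.7801 (per-class)
acc_41568 = 1028.0 / 1267.0              # ≈ 0.8114 (per-class)
```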

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 64
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.001
  • num_epochs: 100
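
For reference, a minimal sketch of how these settings map onto transformers.TrainingArguments; the output_dir is hypothetical, train_batch_size is assumed to be per device, and with a constant scheduler the warmup ratio would ordinarily have no effect:

```python
from transformers import TrainingArguments

# Sketch only: output_dir is hypothetical, and the batch sizes are
# assumed to be per-device values. adamw_torch defaults to
# betas=(0.9, 0.999) and eps=1e-8, matching the values reported above.
args = TrainingArguments(
    output_dir="gsm8k-binary-llama-3.2-1b",  # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="constant",
    warmup_ratio=0.001,  # likely inert with a constant schedule
    num_train_epochs=100,
)
```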

Training results

| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Mdl | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 34192 | Correct Preds 34192 | Total Labels 34192 | Accuracy 34192 | Gen Accuracy 34192 | Correct Gen Preds 41568 | Correct Preds 41568 | Total Labels 41568 | Accuracy 41568 | Gen Accuracy 41568 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.4656 | 0.0058 | 5233.1723 | 3627.3586 | 1196.0 | 2475.0 | 0.4832 | 1204.0 | 0.4865 | 1196.0 | 1196.0 | 1196.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1267.0 | 0.0 | 0.0 |
| 0.4267 | 1.0 | 44 | 0.6381 | 0.0058 | 2278.4224 | 1579.2820 | 1751.0 | 2475.0 | 0.7075 | 9.0 | 0.0036 | 0.0 | 626.0 | 1196.0 | 0.5234 | 0.0 | 0.0 | 1125.0 | 1267.0 | 0.8879 | 0.0 |
| 0.2951 | 2.0 | 88 | 0.5564 | 0.0058 | 1986.8987 | 1377.2132 | 1855.0 | 2475.0 | 0.7495 | 8.0 | 0.0032 | 0.0 | 1112.0 | 1196.0 | 0.9298 | 0.0 | 0.0 | 743.0 | 1267.0 | 0.5864 | 0.0 |
| 0.2168 | 3.0 | 132 | 0.5809 | 0.0058 | 2074.3206 | 1437.8095 | 1900.0 | 2475.0 | 0.7677 | 8.0 | 0.0032 | 0.0 | 793.0 | 1196.0 | 0.6630 | 0.0 | 0.0 | 1107.0 | 1267.0 | 0.8737 | 0.0 |
| 0.1239 | 4.0 | 176 | 0.7169 | 0.0058 | 2559.7869 | 1774.3091 | 1961.0 | 2475.0 | 0.7923 | 731.0 | 0.2954 | 204.0 | 933.0 | 1196.0 | 0.7801 | 0.1706 | 518.0 | 1028.0 | 1267.0 | 0.8114 | 0.4088 |
| 0.2313 | 5.0 | 220 | 0.7906 | 0.0058 | 2823.1453 | 1956.8552 | 1944.0 | 2475.0 | 0.7855 | 442.0 | 0.1786 | 84.0 | 955.0 | 1196.0 | 0.7985 | 0.0702 | 349.0 | 989.0 | 1267.0 | 0.7806 | 0.2755 |
| 0.0149 | 6.0 | 264 | 1.4312 | 0.0058 | 5110.2939 | 3542.1858 | 1910.0 | 2475.0 | 0.7717 | 1166.0 | 0.4711 | 395.0 | 905.0 | 1196.0 | 0.7567 | 0.3303 | 763.0 | 1005.0 | 1267.0 | 0.7932 | 0.6022 |
| 0.0 | 7.0 | 308 | 1.8767 | 0.0058 | 6701.2420 | 4644.9470 | 1920.0 | 2475.0 | 0.7758 | 1771.0 | 0.7156 | 984.0 | 1065.0 | 1196.0 | 0.8905 | 0.8227 | 780.0 | 855.0 | 1267.0 | 0.6748 | 0.6156 |
| 0.0 | 8.0 | 352 | 1.8809 | 0.0058 | 6716.1771 | 4655.2992 | 1940.0 | 2475.0 | 0.7838 | 1924.0 | 0.7774 | 980.0 | 996.0 | 1196.0 | 0.8328 | 0.8194 | 936.0 | 944.0 | 1267.0 | 0.7451 | 0.7388 |
| 0.0 | 9.0 | 396 | 1.8805 | 0.0058 | 6714.6723 | 4654.2562 | 1952.0 | 2475.0 | 0.7887 | 1934.0 | 0.7814 | 1020.0 | 1026.0 | 1196.0 | 0.8579 | 0.8528 | 906.0 | 926.0 | 1267.0 | 0.7309 | 0.7151 |
| 0.0 | 10.0 | 440 | 2.1020 | 0.0058 | 7505.4300 | 5202.3676 | 1942.0 | 2475.0 | 0.7846 | 1917.0 | 0.7745 | 1035.0 | 1043.0 | 1196.0 | 0.8721 | 0.8654 | 874.0 | 899.0 | 1267.0 | 0.7096 | 0.6898 |
| 0.0 | 11.0 | 484 | 2.2177 | 0.0058 | 7918.5831 | 5488.7435 | 1945.0 | 2475.0 | 0.7859 | 1917.0 | 0.7745 | 1048.0 | 1058.0 | 1196.0 | 0.8846 | 0.8763 | 861.0 | 887.0 | 1267.0 | 0.7001 | 0.6796 |
| 0.0 | 12.0 | 528 | 2.2185 | 0.0058 | 7921.6693 | 5490.8827 | 1943.0 | 2475.0 | 0.7851 | 1916.0 | 0.7741 | 1049.0 | 1058.0 | 1196.0 | 0.8846 | 0.8771 | 859.0 | 885.0 | 1267.0 | 0.6985 | 0.6780 |
| 0.0 | 13.0 | 572 | 2.2196 | 0.0058 | 7925.2973 | 5493.3975 | 1939.0 | 2475.0 | 0.7834 | 1912.0 | 0.7725 | 1046.0 | 1056.0 | 1196.0 | 0.8829 | 0.8746 | 858.0 | 883.0 | 1267.0 | 0.6969 | 0.6772 |
| 0.0 | 14.0 | 616 | 2.2197 | 0.0058 | 7925.9213 | 5493.8300 | 1941.0 | 2475.0 | 0.7842 | 1913.0 | 0.7729 | 1046.0 | 1056.0 | 1196.0 | 0.8829 | 0.8746 | 859.0 | 885.0 | 1267.0 | 0.6985 | 0.6780 |
| 0.6919 | 15.0 | 660 | 2.2190 | 0.0058 | 7923.3928 | 5492.0774 | 1944.0 | 2475.0 | 0.7855 | 1915.0 | 0.7737 | 1046.0 | 1056.0 | 1196.0 | 0.8829 | 0.8746 | 861.0 | 888.0 | 1267.0 | 0.7009 | 0.6796 |
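
The table tracks two accuracy families that diverge sharply early in training: Accuracy is high from epoch 1, while Gen Accuracy stays near zero over epochs 1-3. This pattern is consistent with scored predictions being an argmax restricted to the two label tokens, while generated predictions require the model to actually emit a label token under unconstrained greedy decoding. A hedged sketch of that protocol, assuming 34192 and 41568 are the label token ids (the evaluation code itself is not published here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: 34192 and 41568 are the tokenizer ids of the two class
# labels; the repo id and BF16 dtype are taken from this card.
LABEL_IDS = [34192, 41568]
REPO = "donoway/GSM8K-Binary_Llama-3.2-1B-eb5nfly3"

tok = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(REPO, torch_dtype=torch.bfloat16)

def predict(prompt: str) -> tuple[int, int]:
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    # Scored prediction: argmax restricted to the two label tokens.
    scored = max(LABEL_IDS, key=lambda i: logits[i].item())
    # Generated prediction: unrestricted greedy next token; it only
    # counts as correct if the model emits the right label token.
    generated = int(logits.argmax().item())
    return scored, generated
```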

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1