# GSM8K-Binary_Llama-3.2-1B-18tzjprl
This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.8522
- Model Preparation Time: 0.0063
- Mdl: 6613.4816
- Accumulated Loss: 4584.1161
- Correct Preds: 1948.0
- Total Preds: 2475.0
- Accuracy: 0.7871
- Correct Gen Preds: 1842.0
- Gen Accuracy: 0.7442
- Correct Gen Preds 34192: 932.0
- Correct Preds 34192: 998.0
- Total Labels 34192: 1196.0
- Accuracy 34192: 0.8344
- Gen Accuracy 34192: 0.7793
- Correct Gen Preds 41568: 903.0
- Correct Preds 41568: 950.0
- Total Labels 41568: 1267.0
- Accuracy 41568: 0.7498
- Gen Accuracy 41568: 0.7127
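The headline numbers above are internally consistent: each accuracy is the corresponding correct-prediction count divided by the total, and the suffixed metrics follow the same pattern per label (34192 and 41568 appear to be the token IDs of the two answer labels; the card does not say which label each ID maps to). A quick check using the counts from this card:

```python
# Sanity-check the reported accuracies against the raw prediction counts.
def acc(correct, total):
    """Accuracy rounded to four decimals, matching the card's reporting."""
    return round(correct / total, 4)

# Overall metrics
assert acc(1948, 2475) == 0.7871   # Accuracy
assert acc(1842, 2475) == 0.7442   # Gen Accuracy

# Per-label metrics (token IDs 34192 and 41568)
assert acc(998, 1196) == 0.8344    # Accuracy 34192
assert acc(932, 1196) == 0.7793    # Gen Accuracy 34192
assert acc(950, 1267) == 0.7498    # Accuracy 41568
assert acc(903, 1267) == 0.7127    # Gen Accuracy 41568
print("all accuracy figures check out")
```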
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.001
- num_epochs: 100
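Assuming the standard Hugging Face `Trainer` setup, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows (a sketch of the configuration, not the actual training script):

```python
# Hypothetical reconstruction of the training configuration listed above,
# expressed as TrainingArguments-style keyword arguments.
training_kwargs = dict(
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",            # AdamW via torch
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    warmup_ratio=0.001,             # has no effect with a purely constant scheduler
    num_train_epochs=100,
)
# e.g.: args = transformers.TrainingArguments(output_dir="out", **training_kwargs)
print(f"{len(training_kwargs)} hyperparameters configured")
```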
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Mdl | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 34192 | Correct Preds 34192 | Total Labels 34192 | Accuracy 34192 | Gen Accuracy 34192 | Correct Gen Preds 41568 | Correct Preds 41568 | Total Labels 41568 | Accuracy 41568 | Gen Accuracy 41568 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.4656 | 0.0063 | 5233.1723 | 3627.3586 | 1196.0 | 2475.0 | 0.4832 | 1204.0 | 0.4865 | 1196.0 | 1196.0 | 1196.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1267.0 | 0.0 | 0.0 |
| 0.6986 | 1.0 | 39 | 0.5737 | 0.0063 | 2048.4536 | 1419.8798 | 1819.0 | 2475.0 | 0.7349 | 8.0 | 0.0032 | 0.0 | 1032.0 | 1196.0 | 0.8629 | 0.0 | 0.0 | 787.0 | 1267.0 | 0.6212 | 0.0 |
| 0.0873 | 2.0 | 78 | 0.9806 | 0.0063 | 3501.2903 | 2426.9095 | 1495.0 | 2475.0 | 0.6040 | 7.0 | 0.0028 | 0.0 | 1190.0 | 1196.0 | 0.9950 | 0.0 | 0.0 | 305.0 | 1267.0 | 0.2407 | 0.0 |
| 0.0457 | 3.0 | 117 | 0.5995 | 0.0063 | 2140.5990 | 1483.7501 | 1932.0 | 2475.0 | 0.7806 | 8.0 | 0.0032 | 0.0 | 935.0 | 1196.0 | 0.7818 | 0.0 | 1.0 | 997.0 | 1267.0 | 0.7869 | 0.0008 |
| 0.1391 | 4.0 | 156 | 0.8943 | 0.0063 | 3193.1325 | 2213.3108 | 1788.0 | 2475.0 | 0.7224 | 20.0 | 0.0081 | 6.0 | 1132.0 | 1196.0 | 0.9465 | 0.0050 | 6.0 | 656.0 | 1267.0 | 0.5178 | 0.0047 |
| 0.0048 | 5.0 | 195 | 0.8029 | 0.0063 | 2866.8084 | 1987.1202 | 1937.0 | 2475.0 | 0.7826 | 317.0 | 0.1281 | 177.0 | 936.0 | 1196.0 | 0.7826 | 0.1480 | 132.0 | 1001.0 | 1267.0 | 0.7901 | 0.1042 |
| 0.0029 | 6.0 | 234 | 1.3201 | 0.0063 | 4713.7690 | 3267.3357 | 1897.0 | 2475.0 | 0.7665 | 1697.0 | 0.6857 | 732.0 | 879.0 | 1196.0 | 0.7349 | 0.6120 | 959.0 | 1018.0 | 1267.0 | 0.8035 | 0.7569 |
| 0.0 | 7.0 | 273 | 1.5486 | 0.0063 | 5529.4640 | 3832.7324 | 1913.0 | 2475.0 | 0.7729 | 1780.0 | 0.7192 | 807.0 | 890.0 | 1196.0 | 0.7441 | 0.6747 | 966.0 | 1023.0 | 1267.0 | 0.8074 | 0.7624 |
| 0.0 | 8.0 | 312 | 1.8522 | 0.0063 | 6613.4816 | 4584.1161 | 1948.0 | 2475.0 | 0.7871 | 1842.0 | 0.7442 | 932.0 | 998.0 | 1196.0 | 0.8344 | 0.7793 | 903.0 | 950.0 | 1267.0 | 0.7498 | 0.7127 |
| 0.3208 | 9.0 | 351 | 2.7180 | 0.0063 | 9704.9736 | 6726.9751 | 1795.0 | 2475.0 | 0.7253 | 1514.0 | 0.6117 | 1011.0 | 1128.0 | 1196.0 | 0.9431 | 0.8453 | 497.0 | 667.0 | 1267.0 | 0.5264 | 0.3923 |
| 0.0001 | 10.0 | 390 | 1.7598 | 0.0063 | 6283.5026 | 4355.3921 | 1934.0 | 2475.0 | 0.7814 | 1784.0 | 0.7208 | 865.0 | 964.0 | 1196.0 | 0.8060 | 0.7232 | 913.0 | 970.0 | 1267.0 | 0.7656 | 0.7206 |
| 0.0 | 11.0 | 429 | 1.9865 | 0.0063 | 7093.0480 | 4916.5262 | 1929.0 | 2475.0 | 0.7794 | 1837.0 | 0.7422 | 961.0 | 1015.0 | 1196.0 | 0.8487 | 0.8035 | 870.0 | 914.0 | 1267.0 | 0.7214 | 0.6867 |
| 0.0 | 12.0 | 468 | 2.1776 | 0.0063 | 7775.3417 | 5389.4561 | 1922.0 | 2475.0 | 0.7766 | 1827.0 | 0.7382 | 1014.0 | 1061.0 | 1196.0 | 0.8871 | 0.8478 | 807.0 | 861.0 | 1267.0 | 0.6796 | 0.6369 |
| 0.0 | 13.0 | 507 | 2.0786 | 0.0063 | 7421.9222 | 5144.4844 | 1921.0 | 2475.0 | 0.7762 | 1828.0 | 0.7386 | 964.0 | 1015.0 | 1196.0 | 0.8487 | 0.8060 | 857.0 | 906.0 | 1267.0 | 0.7151 | 0.6764 |
| 0.0 | 14.0 | 546 | 2.0790 | 0.0063 | 7423.4325 | 5145.5313 | 1920.0 | 2475.0 | 0.7758 | 1822.0 | 0.7362 | 958.0 | 1014.0 | 1196.0 | 0.8478 | 0.8010 | 857.0 | 906.0 | 1267.0 | 0.7151 | 0.6764 |
| 0.0 | 15.0 | 585 | 2.0766 | 0.0063 | 7414.9542 | 5139.6546 | 1920.0 | 2475.0 | 0.7758 | 1819.0 | 0.7349 | 958.0 | 1014.0 | 1196.0 | 0.8478 | 0.8010 | 855.0 | 906.0 | 1267.0 | 0.7151 | 0.6748 |
| 0.0 | 16.0 | 624 | 2.0779 | 0.0063 | 7419.5115 | 5142.8135 | 1923.0 | 2475.0 | 0.7770 | 1825.0 | 0.7374 | 960.0 | 1016.0 | 1196.0 | 0.8495 | 0.8027 | 858.0 | 907.0 | 1267.0 | 0.7159 | 0.6772 |
| 0.0 | 17.0 | 663 | 2.0790 | 0.0063 | 7423.4221 | 5145.5241 | 1922.0 | 2475.0 | 0.7766 | 1825.0 | 0.7374 | 960.0 | 1015.0 | 1196.0 | 0.8487 | 0.8027 | 858.0 | 907.0 | 1267.0 | 0.7159 | 0.6772 |
| 0.0 | 18.0 | 702 | 2.0758 | 0.0063 | 7412.1084 | 5137.6820 | 1921.0 | 2475.0 | 0.7762 | 1824.0 | 0.7370 | 960.0 | 1014.0 | 1196.0 | 0.8478 | 0.8027 | 857.0 | 907.0 | 1267.0 | 0.7159 | 0.6764 |
| 0.0 | 19.0 | 741 | 2.0786 | 0.0063 | 7422.1484 | 5144.6412 | 1920.0 | 2475.0 | 0.7758 | 1822.0 | 0.7362 | 958.0 | 1013.0 | 1196.0 | 0.8470 | 0.8010 | 857.0 | 907.0 | 1267.0 | 0.7159 | 0.6764 |
| 0.0 | 20.0 | 780 | 2.0767 | 0.0063 | 7415.1916 | 5139.8192 | 1921.0 | 2475.0 | 0.7762 | 1823.0 | 0.7366 | 958.0 | 1016.0 | 1196.0 | 0.8495 | 0.8010 | 858.0 | 905.0 | 1267.0 | 0.7143 | 0.6772 |
| 0.0 | 21.0 | 819 | 2.0793 | 0.0063 | 7424.3233 | 5146.1488 | 1922.0 | 2475.0 | 0.7766 | 1823.0 | 0.7366 | 959.0 | 1015.0 | 1196.0 | 0.8487 | 0.8018 | 857.0 | 907.0 | 1267.0 | 0.7159 | 0.6764 |
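The loss columns in the table are related in a simple way: `Accumulated Loss` matches the mean validation loss summed over all 2475 predictions (in nats), and `Mdl` is that same total converted to bits (divided by ln 2). This is a reading inferred from the numbers, not documented in the card; it can be checked against the best row (epoch 8):

```python
import math

# Values from the epoch-8 row of the table above.
val_loss = 1.8522             # mean validation loss (nats per prediction)
accumulated_loss = 4584.1161  # total loss over the eval set (nats)
mdl = 6613.4816               # minimum description length (bits)
total_preds = 2475

# Mdl == accumulated loss / ln(2)   (nats -> bits conversion)
assert math.isclose(accumulated_loss / math.log(2), mdl, rel_tol=1e-5)

# Accumulated loss ~= mean loss * number of predictions
# (only approximate, since the mean loss is rounded to four decimals)
assert math.isclose(val_loss * total_preds, accumulated_loss, rel_tol=1e-4)
print("MDL / accumulated-loss relationships hold")
```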
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
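To reproduce this environment, the versions above can be pinned in a requirements file (package names assumed to be the standard PyPI distributions; the `+cu124` torch build comes from the PyTorch CUDA 12.4 wheel index):

```text
# requirements.txt (sketch)
transformers==4.51.3
torch==2.6.0
datasets==3.5.0
tokenizers==0.21.1
```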
## Model tree for donoway/GSM8K-Binary_Llama-3.2-1B-18tzjprl

Base model: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)