# GSM8K-Binary_Llama-3.2-1B-bpzb2ktm
This model is a fine-tuned version of meta-llama/Llama-3.2-1B. The training dataset is not recorded in the card metadata, though the model name suggests a binary-labeled variant of GSM8K. It achieves the following results on the evaluation set:
- Loss: 1.3525
- Model Preparation Time: 0.0056
- Mdl: 4829.4638
- Accumulated Loss: 3347.5292
- Correct Preds: 2001.0
- Total Preds: 2475.0
- Accuracy: 0.8085
- Correct Gen Preds: 2001.0
- Gen Accuracy: 0.8085
- Correct Gen Preds 34192: 1028.0
- Correct Preds 34192: 1033.0
- Total Labels 34192: 1196.0
- Accuracy 34192: 0.8637
- Gen Accuracy 34192: 0.8595
- Correct Gen Preds 41568: 965.0
- Correct Preds 41568: 968.0
- Total Labels 41568: 1267.0
- Accuracy 41568: 0.7640
- Gen Accuracy 41568: 0.7616
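These summary numbers are internally consistent: accuracy is correct predictions over total predictions, the accumulated loss is the mean loss times the number of predictions, and Mdl appears to be the accumulated cross-entropy converted from nats to bits. The 34192/41568 suffixes look like the token IDs of the two answer labels, so those entries are per-class breakdowns. A quick arithmetic check (the nats-to-bits reading of Mdl and the token-ID reading of the suffixes are assumptions):

```python
import math

loss, total_preds, correct_preds = 1.3525, 2475, 2001
accumulated_loss = 3347.5292

print(correct_preds / total_preds)     # 0.8085 -> Accuracy
print(loss * total_preds)              # 3347.4375 -> Accumulated Loss, up to rounding
print(accumulated_loss / math.log(2))  # 4829.4637 -> Mdl (nats converted to bits)

# Per-class accuracies, keyed by what appear to be label token IDs:
print(1033 / 1196)  # 0.8637 -> Accuracy 34192
print(968 / 1267)   # 0.7640 -> Accuracy 41568
```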
## Model description
More information needed
## Intended uses & limitations
More information needed
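In the absence of documented usage, here is a minimal inference sketch. It assumes the checkpoint is a standard causal LM like its base, and that the binary answer can be read off the next-token logits of the two label tokens (IDs 34192 and 41568 per the metrics above); the prompt format used during fine-tuning is not documented, so the prompt below is a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "donoway/GSM8K-Binary_Llama-3.2-1B-bpzb2ktm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

prompt = "..."  # a GSM8K-style question, formatted as during fine-tuning (format unknown)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]

# Assumed label token IDs, taken from the per-class metric names above.
label_ids = [34192, 41568]
pred_id = label_ids[int(torch.argmax(next_token_logits[label_ids]))]
print(tokenizer.decode([pred_id]))
```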
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 100 (the results below stop at epoch 27, so training appears to have ended early)
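A hedged reconstruction of this setup as `transformers` `TrainingArguments`; only the values listed above come from the card, and `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="GSM8K-Binary_Llama-3.2-1B",  # placeholder, not from the card
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    num_train_epochs=100,
)
```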
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Mdl | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 34192 | Correct Preds 34192 | Total Labels 34192 | Accuracy 34192 | Gen Accuracy 34192 | Correct Gen Preds 41568 | Correct Preds 41568 | Total Labels 41568 | Accuracy 41568 | Gen Accuracy 41568 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.4656 | 0.0056 | 5233.1723 | 3627.3586 | 1196.0 | 2475.0 | 0.4832 | 1204.0 | 0.4865 | 1196.0 | 1196.0 | 1196.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1267.0 | 0.0 | 0.0 |
| 0.4643 | 1.0 | 32 | 0.6079 | 0.0056 | 2170.4901 | 1504.4691 | 1770.0 | 2475.0 | 0.7152 | 1776.0 | 0.7176 | 821.0 | 821.0 | 1196.0 | 0.6865 | 0.6865 | 947.0 | 949.0 | 1267.0 | 0.7490 | 0.7474 |
| 0.5059 | 2.0 | 64 | 0.5626 | 0.0056 | 2008.7864 | 1392.3846 | 1839.0 | 2475.0 | 0.7430 | 10.0 | 0.0040 | 0.0 | 1091.0 | 1196.0 | 0.9122 | 0.0 | 1.0 | 748.0 | 1267.0 | 0.5904 | 0.0008 |
| 0.4259 | 3.0 | 96 | 0.6197 | 0.0056 | 2212.8014 | 1533.7970 | 1859.0 | 2475.0 | 0.7511 | 176.0 | 0.0711 | 0.0 | 1134.0 | 1196.0 | 0.9482 | 0.0 | 168.0 | 725.0 | 1267.0 | 0.5722 | 0.1326 |
| 0.5362 | 4.0 | 128 | 0.6927 | 0.0056 | 2473.5441 | 1714.5301 | 1919.0 | 2475.0 | 0.7754 | 508.0 | 0.2053 | 79.0 | 832.0 | 1196.0 | 0.6957 | 0.0661 | 420.0 | 1087.0 | 1267.0 | 0.8579 | 0.3315 |
| 0.1605 | 5.0 | 160 | 0.8818 | 0.0056 | 3148.5150 | 2182.3843 | 1925.0 | 2475.0 | 0.7778 | 1554.0 | 0.6279 | 667.0 | 892.0 | 1196.0 | 0.7458 | 0.5577 | 878.0 | 1033.0 | 1267.0 | 0.8153 | 0.6930 |
| 0.0332 | 6.0 | 192 | 1.1003 | 0.0056 | 3928.6770 | 2723.1514 | 1914.0 | 2475.0 | 0.7733 | 1550.0 | 0.6263 | 870.0 | 1085.0 | 1196.0 | 0.9072 | 0.7274 | 670.0 | 829.0 | 1267.0 | 0.6543 | 0.5288 |
| 0.1398 | 7.0 | 224 | 1.2557 | 0.0056 | 4483.5623 | 3107.7686 | 1896.0 | 2475.0 | 0.7661 | 1728.0 | 0.6982 | 675.0 | 777.0 | 1196.0 | 0.6497 | 0.5644 | 1043.0 | 1119.0 | 1267.0 | 0.8832 | 0.8232 |
| 0.6464 | 8.0 | 256 | 1.4174 | 0.0056 | 5061.1064 | 3508.0916 | 1859.0 | 2475.0 | 0.7511 | 1796.0 | 0.7257 | 690.0 | 734.0 | 1196.0 | 0.6137 | 0.5769 | 1097.0 | 1125.0 | 1267.0 | 0.8879 | 0.8658 |
| 0.0 | 9.0 | 288 | 1.4300 | 0.0056 | 5105.9013 | 3539.1411 | 1987.0 | 2475.0 | 0.8028 | 1989.0 | 0.8036 | 1058.0 | 1062.0 | 1196.0 | 0.8880 | 0.8846 | 922.0 | 925.0 | 1267.0 | 0.7301 | 0.7277 |
| 0.0003 | 10.0 | 320 | 1.2444 | 0.0056 | 4443.1854 | 3079.7814 | 1983.0 | 2475.0 | 0.8012 | 1968.0 | 0.7952 | 1022.0 | 1036.0 | 1196.0 | 0.8662 | 0.8545 | 938.0 | 947.0 | 1267.0 | 0.7474 | 0.7403 |
| 0.941 | 11.0 | 352 | 1.3525 | 0.0056 | 4829.4638 | 3347.5292 | 2001.0 | 2475.0 | 0.8085 | 2001.0 | 0.8085 | 1028.0 | 1033.0 | 1196.0 | 0.8637 | 0.8595 | 965.0 | 968.0 | 1267.0 | 0.7640 | 0.7616 |
| 0.0 | 12.0 | 384 | 1.3653 | 0.0056 | 4874.9553 | 3379.0615 | 2000.0 | 2475.0 | 0.8081 | 2001.0 | 0.8085 | 1030.0 | 1035.0 | 1196.0 | 0.8654 | 0.8612 | 963.0 | 965.0 | 1267.0 | 0.7616 | 0.7601 |
| 0.0 | 13.0 | 416 | 1.3662 | 0.0056 | 4878.3502 | 3381.4147 | 1999.0 | 2475.0 | 0.8077 | 2001.0 | 0.8085 | 1030.0 | 1034.0 | 1196.0 | 0.8645 | 0.8612 | 963.0 | 965.0 | 1267.0 | 0.7616 | 0.7601 |
| 0.0 | 14.0 | 448 | 1.3674 | 0.0056 | 4882.3632 | 3384.1963 | 1998.0 | 2475.0 | 0.8073 | 1998.0 | 0.8073 | 1029.0 | 1034.0 | 1196.0 | 0.8645 | 0.8604 | 961.0 | 964.0 | 1267.0 | 0.7609 | 0.7585 |
| 0.0 | 15.0 | 480 | 1.3707 | 0.0056 | 4894.2185 | 3392.4138 | 1997.0 | 2475.0 | 0.8069 | 1999.0 | 0.8077 | 1030.0 | 1034.0 | 1196.0 | 0.8645 | 0.8612 | 961.0 | 963.0 | 1267.0 | 0.7601 | 0.7585 |
| 0.4705 | 16.0 | 512 | 1.3717 | 0.0056 | 4898.0271 | 3395.0537 | 1998.0 | 2475.0 | 0.8073 | 2001.0 | 0.8085 | 1029.0 | 1033.0 | 1196.0 | 0.8637 | 0.8604 | 964.0 | 965.0 | 1267.0 | 0.7616 | 0.7609 |
| 0.0 | 17.0 | 544 | 1.3733 | 0.0056 | 4903.4386 | 3398.8046 | 1996.0 | 2475.0 | 0.8065 | 1999.0 | 0.8077 | 1031.0 | 1034.0 | 1196.0 | 0.8645 | 0.8620 | 960.0 | 962.0 | 1267.0 | 0.7593 | 0.7577 |
| 0.0 | 18.0 | 576 | 1.3761 | 0.0056 | 4913.6611 | 3405.8903 | 1996.0 | 2475.0 | 0.8065 | 1997.0 | 0.8069 | 1030.0 | 1035.0 | 1196.0 | 0.8654 | 0.8612 | 959.0 | 961.0 | 1267.0 | 0.7585 | 0.7569 |
| 0.0 | 19.0 | 608 | 1.3771 | 0.0056 | 4917.0501 | 3408.2394 | 1998.0 | 2475.0 | 0.8073 | 2000.0 | 0.8081 | 1030.0 | 1034.0 | 1196.0 | 0.8645 | 0.8612 | 962.0 | 964.0 | 1267.0 | 0.7609 | 0.7593 |
| 0.4705 | 20.0 | 640 | 1.3790 | 0.0056 | 4923.8825 | 3412.9753 | 2001.0 | 2475.0 | 0.8085 | 2003.0 | 0.8093 | 1033.0 | 1037.0 | 1196.0 | 0.8671 | 0.8637 | 962.0 | 964.0 | 1267.0 | 0.7609 | 0.7593 |
| 0.0 | 21.0 | 672 | 1.3786 | 0.0056 | 4922.3614 | 3411.9210 | 1996.0 | 2475.0 | 0.8065 | 1999.0 | 0.8077 | 1032.0 | 1035.0 | 1196.0 | 0.8654 | 0.8629 | 959.0 | 961.0 | 1267.0 | 0.7585 | 0.7569 |
| 0.0 | 22.0 | 704 | 1.3813 | 0.0056 | 4932.1419 | 3418.7003 | 1997.0 | 2475.0 | 0.8069 | 2000.0 | 0.8081 | 1033.0 | 1036.0 | 1196.0 | 0.8662 | 0.8637 | 959.0 | 961.0 | 1267.0 | 0.7585 | 0.7569 |
| 0.0 | 23.0 | 736 | 1.3823 | 0.0056 | 4935.9056 | 3421.3091 | 1995.0 | 2475.0 | 0.8061 | 1999.0 | 0.8077 | 1033.0 | 1035.0 | 1196.0 | 0.8654 | 0.8637 | 958.0 | 960.0 | 1267.0 | 0.7577 | 0.7561 |
| 0.0 | 24.0 | 768 | 1.3830 | 0.0056 | 4938.2231 | 3422.9154 | 1997.0 | 2475.0 | 0.8069 | 1999.0 | 0.8077 | 1032.0 | 1036.0 | 1196.0 | 0.8662 | 0.8629 | 959.0 | 961.0 | 1267.0 | 0.7585 | 0.7569 |
| 0.0 | 25.0 | 800 | 1.3842 | 0.0056 | 4942.4284 | 3425.8303 | 1998.0 | 2475.0 | 0.8073 | 2000.0 | 0.8081 | 1032.0 | 1036.0 | 1196.0 | 0.8662 | 0.8629 | 960.0 | 962.0 | 1267.0 | 0.7593 | 0.7577 |
| 0.0 | 26.0 | 832 | 1.3851 | 0.0056 | 4945.7223 | 3428.1134 | 1996.0 | 2475.0 | 0.8065 | 1998.0 | 0.8073 | 1032.0 | 1036.0 | 1196.0 | 0.8662 | 0.8629 | 958.0 | 960.0 | 1267.0 | 0.7577 | 0.7561 |
| 0.4705 | 27.0 | 864 | 1.3860 | 0.0056 | 4948.9212 | 3430.3308 | 1998.0 | 2475.0 | 0.8073 | 1999.0 | 0.8077 | 1032.0 | 1037.0 | 1196.0 | 0.8671 | 0.8629 | 959.0 | 961.0 | 1267.0 | 0.7585 | 0.7569 |
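Two things stand out in the table: the headline metrics at the top of the card match the epoch-11 row, and validation loss bottoms out at epoch 2 (0.5626) before climbing steadily while accuracy plateaus near 0.81, the usual overfitting signature when a small model is fine-tuned well past convergence. A minimal sketch to visualize this, with the first twelve epochs transcribed from the table:

```python
import matplotlib.pyplot as plt

# Epoch, validation loss, and accuracy for epochs 0-11, transcribed from the table above.
epochs = list(range(12))
val_loss = [1.4656, 0.6079, 0.5626, 0.6197, 0.6927, 0.8818,
            1.1003, 1.2557, 1.4174, 1.4300, 1.2444, 1.3525]
accuracy = [0.4832, 0.7152, 0.7430, 0.7511, 0.7754, 0.7778,
            0.7733, 0.7661, 0.7511, 0.8028, 0.8012, 0.8085]

fig, ax_loss = plt.subplots()
ax_loss.plot(epochs, val_loss, marker="o", label="validation loss")
ax_loss.set_xlabel("epoch")
ax_loss.set_ylabel("validation loss")

ax_acc = ax_loss.twinx()  # second y-axis for accuracy
ax_acc.plot(epochs, accuracy, marker="s", color="tab:orange", label="accuracy")
ax_acc.set_ylabel("accuracy")

fig.legend(loc="lower right")
plt.show()
```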
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1