# GSM8K-Binary_Llama-3.2-1B-jevfwxa5
This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.3576
- Model Preparation Time: 0.0059
- Mdl: 4847.5727
- Accumulated Loss: 3360.0814
- Correct Preds: 1896.0
- Total Preds: 2475.0
- Accuracy: 0.7661
- Correct Gen Preds: 1904.0
- Gen Accuracy: 0.7693
- Correct Gen Preds 34192: 961.0
- Correct Preds 34192: 961.0
- Total Labels 34192: 1196.0
- Accuracy 34192: 0.8035
- Gen Accuracy 34192: 0.8035
- Correct Gen Preds 41568: 935.0
- Correct Preds 41568: 935.0
- Total Labels 41568: 1267.0
- Accuracy 41568: 0.7380
- Gen Accuracy 41568: 0.7380
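The headline numbers above are straightforward ratios of the reported per-label counts, and the Mdl figure is the accumulated cross-entropy loss converted from nats to bits. A small sketch using the reported values:

```python
import math

# Counts reported on the evaluation set above
correct_preds, total_preds = 1896, 2475
accumulated_loss = 3360.0814          # summed cross-entropy, in nats

accuracy = correct_preds / total_preds        # 0.7661
mdl_bits = accumulated_loss / math.log(2)     # ~4847.57, the reported "Mdl"

# Per-label accuracy, e.g. for token id 34192
acc_34192 = 961 / 1196                        # 0.8035
```

The two label-wise accuracies (0.8035 for token 34192 and 0.7380 for token 41568) bracket the overall 0.7661 figure.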
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 100
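The cosine schedule with linear warmup that the Trainer applies can be sketched in plain Python. The step counts below are inferred from the results table (8 optimizer steps per epoch × 100 epochs) and are illustrative, not authoritative:

```python
import math

base_lr = 2e-5
total_steps = 800                         # 8 steps/epoch * 100 epochs (inferred)
warmup_steps = int(0.01 * total_steps)    # warmup_ratio = 0.01 -> 8 steps

def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps (linear warmup, cosine decay)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

This mirrors the shape of transformers' `get_cosine_schedule_with_warmup`: the rate climbs linearly to 2e-05 over the first 8 steps, then decays to zero along a half-cosine.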
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Mdl | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 34192 | Correct Preds 34192 | Total Labels 34192 | Accuracy 34192 | Gen Accuracy 34192 | Correct Gen Preds 41568 | Correct Preds 41568 | Total Labels 41568 | Accuracy 41568 | Gen Accuracy 41568 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.4656 | 0.0059 | 5233.1723 | 3627.3586 | 1196.0 | 2475.0 | 0.4832 | 1204.0 | 0.4865 | 1196.0 | 1196.0 | 1196.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1267.0 | 0.0 | 0.0 |
| 0.872 | 1.0 | 8 | 0.7247 | 0.0059 | 2587.5709 | 1793.5675 | 1455.0 | 2475.0 | 0.5879 | 8.0 | 0.0032 | 0.0 | 481.0 | 1196.0 | 0.4022 | 0.0 | 0.0 | 974.0 | 1267.0 | 0.7687 | 0.0 |
| 0.6521 | 2.0 | 16 | 0.6294 | 0.0059 | 2247.2598 | 1557.6818 | 1770.0 | 2475.0 | 0.7152 | 10.0 | 0.0040 | 0.0 | 912.0 | 1196.0 | 0.7625 | 0.0 | 2.0 | 858.0 | 1267.0 | 0.6772 | 0.0016 |
| 0.473 | 3.0 | 24 | 0.7213 | 0.0059 | 2575.3764 | 1785.1149 | 1616.0 | 2475.0 | 0.6529 | 14.0 | 0.0057 | 0.0 | 1160.0 | 1196.0 | 0.9699 | 0.0 | 6.0 | 456.0 | 1267.0 | 0.3599 | 0.0047 |
| 0.2653 | 4.0 | 32 | 0.5926 | 0.0059 | 2115.9963 | 1466.6969 | 1824.0 | 2475.0 | 0.7370 | 309.0 | 0.1248 | 70.0 | 1037.0 | 1196.0 | 0.8671 | 0.0585 | 231.0 | 787.0 | 1267.0 | 0.6212 | 0.1823 |
| 1.1097 | 5.0 | 40 | 0.7075 | 0.0059 | 2526.3900 | 1751.1601 | 1843.0 | 2475.0 | 0.7446 | 528.0 | 0.2133 | 39.0 | 823.0 | 1196.0 | 0.6881 | 0.0326 | 481.0 | 1020.0 | 1267.0 | 0.8051 | 0.3796 |
| 0.1011 | 6.0 | 48 | 0.7255 | 0.0059 | 2590.4327 | 1795.5511 | 1866.0 | 2475.0 | 0.7539 | 990.0 | 0.4 | 552.0 | 1056.0 | 1196.0 | 0.8829 | 0.4615 | 430.0 | 810.0 | 1267.0 | 0.6393 | 0.3394 |
| 0.0076 | 7.0 | 56 | 0.9199 | 0.0059 | 3284.7845 | 2276.8391 | 1863.0 | 2475.0 | 0.7527 | 1675.0 | 0.6768 | 923.0 | 998.0 | 1196.0 | 0.8344 | 0.7717 | 744.0 | 865.0 | 1267.0 | 0.6827 | 0.5872 |
| 0.1554 | 8.0 | 64 | 2.4840 | 0.0059 | 8869.4230 | 6147.8155 | 1658.0 | 2475.0 | 0.6699 | 1518.0 | 0.6133 | 1106.0 | 1160.0 | 1196.0 | 0.9699 | 0.9247 | 405.0 | 498.0 | 1267.0 | 0.3931 | 0.3197 |
| 0.0043 | 9.0 | 72 | 1.8407 | 0.0059 | 6572.5429 | 4555.7396 | 1833.0 | 2475.0 | 0.7406 | 1810.0 | 0.7313 | 1104.0 | 1109.0 | 1196.0 | 0.9273 | 0.9231 | 698.0 | 724.0 | 1267.0 | 0.5714 | 0.5509 |
| 0.0012 | 10.0 | 80 | 1.3940 | 0.0059 | 4977.4916 | 3450.1343 | 1856.0 | 2475.0 | 0.7499 | 1834.0 | 0.7410 | 1030.0 | 1040.0 | 1196.0 | 0.8696 | 0.8612 | 796.0 | 816.0 | 1267.0 | 0.6440 | 0.6283 |
| 1.8099 | 11.0 | 88 | 1.1367 | 0.0059 | 4058.7883 | 2813.3377 | 1833.0 | 2475.0 | 0.7406 | 1769.0 | 0.7147 | 754.0 | 784.0 | 1196.0 | 0.6555 | 0.6304 | 1007.0 | 1049.0 | 1267.0 | 0.8279 | 0.7948 |
| 0.0013 | 12.0 | 96 | 1.4702 | 0.0059 | 5249.6676 | 3638.7923 | 1824.0 | 2475.0 | 0.7370 | 1802.0 | 0.7281 | 1073.0 | 1078.0 | 1196.0 | 0.9013 | 0.8972 | 721.0 | 746.0 | 1267.0 | 0.5888 | 0.5691 |
| 0.9049 | 13.0 | 104 | 1.2923 | 0.0059 | 4614.4626 | 3198.5017 | 1882.0 | 2475.0 | 0.7604 | 1890.0 | 0.7636 | 931.0 | 931.0 | 1196.0 | 0.7784 | 0.7784 | 951.0 | 951.0 | 1267.0 | 0.7506 | 0.7506 |
| 0.0 | 14.0 | 112 | 1.3205 | 0.0059 | 4714.9289 | 3268.1397 | 1883.0 | 2475.0 | 0.7608 | 1891.0 | 0.7640 | 909.0 | 909.0 | 1196.0 | 0.7600 | 0.7600 | 974.0 | 974.0 | 1267.0 | 0.7687 | 0.7687 |
| 0.9048 | 15.0 | 120 | 1.3371 | 0.0059 | 4774.4603 | 3309.4037 | 1890.0 | 2475.0 | 0.7636 | 1898.0 | 0.7669 | 940.0 | 940.0 | 1196.0 | 0.7860 | 0.7860 | 950.0 | 950.0 | 1267.0 | 0.7498 | 0.7498 |
| 0.9048 | 16.0 | 128 | 1.3482 | 0.0059 | 4813.9970 | 3336.8084 | 1890.0 | 2475.0 | 0.7636 | 1898.0 | 0.7669 | 948.0 | 948.0 | 1196.0 | 0.7926 | 0.7926 | 942.0 | 942.0 | 1267.0 | 0.7435 | 0.7435 |
| 0.0 | 17.0 | 136 | 1.3548 | 0.0059 | 4837.6611 | 3353.2112 | 1889.0 | 2475.0 | 0.7632 | 1897.0 | 0.7665 | 955.0 | 955.0 | 1196.0 | 0.7985 | 0.7985 | 934.0 | 934.0 | 1267.0 | 0.7372 | 0.7372 |
| 0.9048 | 18.0 | 144 | 1.3576 | 0.0059 | 4847.5727 | 3360.0814 | 1896.0 | 2475.0 | 0.7661 | 1904.0 | 0.7693 | 961.0 | 961.0 | 1196.0 | 0.8035 | 0.8035 | 935.0 | 935.0 | 1267.0 | 0.7380 | 0.7380 |
| 0.0 | 19.0 | 152 | 1.3615 | 0.0059 | 4861.3027 | 3369.5983 | 1890.0 | 2475.0 | 0.7636 | 1898.0 | 0.7669 | 958.0 | 958.0 | 1196.0 | 0.8010 | 0.8010 | 932.0 | 932.0 | 1267.0 | 0.7356 | 0.7356 |
| 0.9048 | 20.0 | 160 | 1.3636 | 0.0059 | 4868.9856 | 3374.9237 | 1890.0 | 2475.0 | 0.7636 | 1898.0 | 0.7669 | 962.0 | 962.0 | 1196.0 | 0.8043 | 0.8043 | 928.0 | 928.0 | 1267.0 | 0.7324 | 0.7324 |
| 0.0 | 21.0 | 168 | 1.3645 | 0.0059 | 4872.2361 | 3377.1767 | 1892.0 | 2475.0 | 0.7644 | 1900.0 | 0.7677 | 963.0 | 963.0 | 1196.0 | 0.8052 | 0.8052 | 929.0 | 929.0 | 1267.0 | 0.7332 | 0.7332 |
| 0.0 | 22.0 | 176 | 1.3666 | 0.0059 | 4879.8524 | 3382.4559 | 1887.0 | 2475.0 | 0.7624 | 1895.0 | 0.7657 | 961.0 | 961.0 | 1196.0 | 0.8035 | 0.8035 | 926.0 | 926.0 | 1267.0 | 0.7309 | 0.7309 |
| 0.0 | 23.0 | 184 | 1.3683 | 0.0059 | 4885.5762 | 3386.4234 | 1890.0 | 2475.0 | 0.7636 | 1898.0 | 0.7669 | 967.0 | 967.0 | 1196.0 | 0.8085 | 0.8085 | 923.0 | 923.0 | 1267.0 | 0.7285 | 0.7285 |
| 0.0 | 24.0 | 192 | 1.3706 | 0.0059 | 4893.9476 | 3392.2260 | 1884.0 | 2475.0 | 0.7612 | 1892.0 | 0.7644 | 966.0 | 966.0 | 1196.0 | 0.8077 | 0.8077 | 918.0 | 918.0 | 1267.0 | 0.7245 | 0.7245 |
| 0.0 | 25.0 | 200 | 1.3715 | 0.0059 | 4897.1641 | 3394.4555 | 1894.0 | 2475.0 | 0.7653 | 1902.0 | 0.7685 | 969.0 | 969.0 | 1196.0 | 0.8102 | 0.8102 | 925.0 | 925.0 | 1267.0 | 0.7301 | 0.7301 |
| 0.0 | 26.0 | 208 | 1.3739 | 0.0059 | 4905.6882 | 3400.3640 | 1889.0 | 2475.0 | 0.7632 | 1897.0 | 0.7665 | 966.0 | 966.0 | 1196.0 | 0.8077 | 0.8077 | 923.0 | 923.0 | 1267.0 | 0.7285 | 0.7285 |
| 0.0 | 27.0 | 216 | 1.3761 | 0.0059 | 4913.4331 | 3405.7323 | 1885.0 | 2475.0 | 0.7616 | 1893.0 | 0.7648 | 967.0 | 967.0 | 1196.0 | 0.8085 | 0.8085 | 918.0 | 918.0 | 1267.0 | 0.7245 | 0.7245 |
| 0.0 | 28.0 | 224 | 1.3770 | 0.0059 | 4916.9617 | 3408.1781 | 1891.0 | 2475.0 | 0.7640 | 1899.0 | 0.7673 | 971.0 | 971.0 | 1196.0 | 0.8119 | 0.8119 | 920.0 | 920.0 | 1267.0 | 0.7261 | 0.7261 |
| 0.0 | 29.0 | 232 | 1.3771 | 0.0059 | 4917.0011 | 3408.2055 | 1887.0 | 2475.0 | 0.7624 | 1895.0 | 0.7657 | 970.0 | 970.0 | 1196.0 | 0.8110 | 0.8110 | 917.0 | 917.0 | 1267.0 | 0.7238 | 0.7238 |
| 0.0 | 30.0 | 240 | 1.3800 | 0.0059 | 4927.4709 | 3415.4626 | 1892.0 | 2475.0 | 0.7644 | 1900.0 | 0.7677 | 975.0 | 975.0 | 1196.0 | 0.8152 | 0.8152 | 917.0 | 917.0 | 1267.0 | 0.7238 | 0.7238 |
| 0.0 | 31.0 | 248 | 1.3791 | 0.0059 | 4924.1567 | 3413.1653 | 1892.0 | 2475.0 | 0.7644 | 1900.0 | 0.7677 | 972.0 | 972.0 | 1196.0 | 0.8127 | 0.8127 | 920.0 | 920.0 | 1267.0 | 0.7261 | 0.7261 |
| 0.0 | 32.0 | 256 | 1.3816 | 0.0059 | 4933.1472 | 3419.3971 | 1889.0 | 2475.0 | 0.7632 | 1897.0 | 0.7665 | 973.0 | 973.0 | 1196.0 | 0.8135 | 0.8135 | 916.0 | 916.0 | 1267.0 | 0.7230 | 0.7230 |
| 0.0 | 33.0 | 264 | 1.3827 | 0.0059 | 4937.3087 | 3422.2816 | 1888.0 | 2475.0 | 0.7628 | 1896.0 | 0.7661 | 975.0 | 975.0 | 1196.0 | 0.8152 | 0.8152 | 913.0 | 913.0 | 1267.0 | 0.7206 | 0.7206 |
| 0.0 | 34.0 | 272 | 1.3832 | 0.0059 | 4938.8121 | 3423.3237 | 1889.0 | 2475.0 | 0.7632 | 1897.0 | 0.7665 | 974.0 | 974.0 | 1196.0 | 0.8144 | 0.8144 | 915.0 | 915.0 | 1267.0 | 0.7222 | 0.7222 |
| 0.0 | 35.0 | 280 | 1.3851 | 0.0059 | 4945.8881 | 3428.2284 | 1891.0 | 2475.0 | 0.7640 | 1899.0 | 0.7673 | 978.0 | 978.0 | 1196.0 | 0.8177 | 0.8177 | 913.0 | 913.0 | 1267.0 | 0.7206 | 0.7206 |
| 0.0 | 36.0 | 288 | 1.3842 | 0.0059 | 4942.6233 | 3425.9654 | 1892.0 | 2475.0 | 0.7644 | 1900.0 | 0.7677 | 977.0 | 977.0 | 1196.0 | 0.8169 | 0.8169 | 915.0 | 915.0 | 1267.0 | 0.7222 | 0.7222 |
| 0.0 | 37.0 | 296 | 1.3861 | 0.0059 | 4949.1341 | 3430.4784 | 1886.0 | 2475.0 | 0.7620 | 1894.0 | 0.7653 | 975.0 | 975.0 | 1196.0 | 0.8152 | 0.8152 | 911.0 | 911.0 | 1267.0 | 0.7190 | 0.7190 |
| 0.0 | 38.0 | 304 | 1.3866 | 0.0059 | 4951.1992 | 3431.9098 | 1891.0 | 2475.0 | 0.7640 | 1899.0 | 0.7673 | 976.0 | 976.0 | 1196.0 | 0.8161 | 0.8161 | 915.0 | 915.0 | 1267.0 | 0.7222 | 0.7222 |
| 0.0 | 39.0 | 312 | 1.3877 | 0.0059 | 4954.8532 | 3434.4425 | 1891.0 | 2475.0 | 0.7640 | 1899.0 | 0.7673 | 979.0 | 979.0 | 1196.0 | 0.8186 | 0.8186 | 912.0 | 912.0 | 1267.0 | 0.7198 | 0.7198 |
| 0.9048 | 40.0 | 320 | 1.3879 | 0.0059 | 4955.7900 | 3435.0919 | 1891.0 | 2475.0 | 0.7640 | 1899.0 | 0.7673 | 978.0 | 978.0 | 1196.0 | 0.8177 | 0.8177 | 913.0 | 913.0 | 1267.0 | 0.7206 | 0.7206 |
| 0.0 | 41.0 | 328 | 1.3896 | 0.0059 | 4961.8652 | 3439.3029 | 1891.0 | 2475.0 | 0.7640 | 1899.0 | 0.7673 | 978.0 | 978.0 | 1196.0 | 0.8177 | 0.8177 | 913.0 | 913.0 | 1267.0 | 0.7206 | 0.7206 |
| 0.0 | 42.0 | 336 | 1.3883 | 0.0059 | 4957.0459 | 3435.9624 | 1891.0 | 2475.0 | 0.7640 | 1899.0 | 0.7673 | 980.0 | 980.0 | 1196.0 | 0.8194 | 0.8194 | 911.0 | 911.0 | 1267.0 | 0.7190 | 0.7190 |
| 0.0 | 43.0 | 344 | 1.3898 | 0.0059 | 4962.4755 | 3439.7259 | 1889.0 | 2475.0 | 0.7632 | 1897.0 | 0.7665 | 979.0 | 979.0 | 1196.0 | 0.8186 | 0.8186 | 910.0 | 910.0 | 1267.0 | 0.7182 | 0.7182 |
| 0.0 | 44.0 | 352 | 1.3905 | 0.0059 | 4965.0345 | 3441.4997 | 1889.0 | 2475.0 | 0.7632 | 1897.0 | 0.7665 | 980.0 | 980.0 | 1196.0 | 0.8194 | 0.8194 | 909.0 | 909.0 | 1267.0 | 0.7174 | 0.7174 |
| 0.0 | 45.0 | 360 | 1.3905 | 0.0059 | 4964.9209 | 3441.4209 | 1891.0 | 2475.0 | 0.7640 | 1899.0 | 0.7673 | 981.0 | 981.0 | 1196.0 | 0.8202 | 0.8202 | 910.0 | 910.0 | 1267.0 | 0.7182 | 0.7182 |
| 0.0 | 46.0 | 368 | 1.3915 | 0.0059 | 4968.4294 | 3443.8529 | 1891.0 | 2475.0 | 0.7640 | 1899.0 | 0.7673 | 980.0 | 980.0 | 1196.0 | 0.8194 | 0.8194 | 911.0 | 911.0 | 1267.0 | 0.7190 | 0.7190 |
| 0.0 | 47.0 | 376 | 1.3918 | 0.0059 | 4969.5671 | 3444.6414 | 1889.0 | 2475.0 | 0.7632 | 1897.0 | 0.7665 | 981.0 | 981.0 | 1196.0 | 0.8202 | 0.8202 | 908.0 | 908.0 | 1267.0 | 0.7167 | 0.7167 |
| 0.0 | 48.0 | 384 | 1.3916 | 0.0059 | 4968.9056 | 3444.1829 | 1890.0 | 2475.0 | 0.7636 | 1897.0 | 0.7665 | 980.0 | 981.0 | 1196.0 | 0.8202 | 0.8194 | 909.0 | 909.0 | 1267.0 | 0.7174 | 0.7174 |
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
Model tree for donoway/GSM8K-Binary_Llama-3.2-1B-jevfwxa5:

- Base model: meta-llama/Llama-3.2-1B
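A minimal inference sketch using the standard 🤗 Transformers API. The prompt framing is an assumption here; adapt it to however the binary GSM8K task was formatted during fine-tuning:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "donoway/GSM8K-Binary_Llama-3.2-1B-jevfwxa5"

def classify(question: str, max_new_tokens: int = 8) -> str:
    """Generate the model's (binary) answer for a GSM8K-style question."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Return only the newly generated continuation, not the prompt
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```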