# GSM8K-Binary_Llama-3.2-1B-f5e1gp29
This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.3710
- Model Preparation Time: 0.0058
- Mdl: 4895.4536
- Accumulated Loss: 3393.2699
- Correct Preds: 1979.0
- Total Preds: 2475.0
- Accuracy: 0.7996
- Correct Gen Preds: 1881.0
- Gen Accuracy: 0.76
- Correct Gen Preds 34192: 895.0
- Correct Preds 34192: 956.0
- Total Labels 34192: 1196.0
- Accuracy 34192: 0.7993
- Gen Accuracy 34192: 0.7483
- Correct Gen Preds 41568: 979.0
- Correct Preds 41568: 1023.0
- Total Labels 41568: 1267.0
- Accuracy 41568: 0.8074
- Gen Accuracy 41568: 0.7727
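The headline numbers above are internally consistent: the reported loss is the accumulated loss averaged over the number of predictions, Mdl matches the accumulated loss converted from nats to bits, and the accuracies are simple correct/total ratios (the metrics suffixed 34192 and 41568 appear to be per-label breakdowns keyed by token ID). A quick sanity check, with variable names of my own choosing:

```python
import math

# Figures copied from the evaluation summary above.
accumulated_loss = 3393.2699   # summed per-example NLL, in nats
total_preds = 2475
correct_preds = 1979
correct_gen_preds = 1881

# Mean validation loss = accumulated loss averaged over predictions.
loss = accumulated_loss / total_preds            # ≈ 1.3710

# Mdl appears to be the accumulated loss converted from nats to bits.
mdl = accumulated_loss / math.log(2)             # ≈ 4895.45

# Accuracies are simple ratios.
accuracy = correct_preds / total_preds           # ≈ 0.7996
gen_accuracy = correct_gen_preds / total_preds   # = 0.76

print(round(loss, 4), round(mdl, 2), round(accuracy, 4), gen_accuracy)
```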
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.001
- num_epochs: 100
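The hyperparameters above map roughly onto a Hugging Face `TrainingArguments` configuration as sketched below. This is a reconstruction for illustration only; the actual training script for this model is not published, and the `output_dir` name is assumed.

```python
from transformers import TrainingArguments

# A sketch of the listed hyperparameters as TrainingArguments;
# output_dir is an assumption, not taken from the model card.
args = TrainingArguments(
    output_dir="GSM8K-Binary_Llama-3.2-1B-f5e1gp29",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    optim="adamw_torch",          # AdamW with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="constant",
    warmup_ratio=0.001,
    num_train_epochs=100,
)
```

Note that with a `constant` scheduler the warmup ratio has little practical effect; it is reproduced here only because the card lists it.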
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Mdl | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 34192 | Correct Preds 34192 | Total Labels 34192 | Accuracy 34192 | Gen Accuracy 34192 | Correct Gen Preds 41568 | Correct Preds 41568 | Total Labels 41568 | Accuracy 41568 | Gen Accuracy 41568 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.4656 | 0.0058 | 5233.1723 | 3627.3586 | 1196.0 | 2475.0 | 0.4832 | 1204.0 | 0.4865 | 1196.0 | 1196.0 | 1196.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1267.0 | 0.0 | 0.0 |
| 0.3983 | 1.0 | 49 | 0.5710 | 0.0058 | 2038.7089 | 1413.1253 | 1846.0 | 2475.0 | 0.7459 | 8.0 | 0.0032 | 0.0 | 1068.0 | 1196.0 | 0.8930 | 0.0 | 0.0 | 778.0 | 1267.0 | 0.6140 | 0.0 |
| 0.7238 | 2.0 | 98 | 0.5804 | 0.0058 | 2072.3256 | 1436.4266 | 1842.0 | 2475.0 | 0.7442 | 54.0 | 0.0218 | 0.0 | 695.0 | 1196.0 | 0.5811 | 0.0 | 46.0 | 1147.0 | 1267.0 | 0.9053 | 0.0363 |
| 0.2749 | 3.0 | 147 | 0.6253 | 0.0058 | 2232.6603 | 1547.5622 | 1858.0 | 2475.0 | 0.7507 | 64.0 | 0.0259 | 0.0 | 701.0 | 1196.0 | 0.5861 | 0.0 | 56.0 | 1157.0 | 1267.0 | 0.9132 | 0.0442 |
| 0.597 | 4.0 | 196 | 0.8394 | 0.0058 | 2997.3738 | 2077.6212 | 1911.0 | 2475.0 | 0.7721 | 212.0 | 0.0857 | 95.0 | 1127.0 | 1196.0 | 0.9423 | 0.0794 | 109.0 | 784.0 | 1267.0 | 0.6188 | 0.0860 |
| 0.0501 | 5.0 | 245 | 0.8325 | 0.0058 | 2972.4858 | 2060.3702 | 1947.0 | 2475.0 | 0.7867 | 902.0 | 0.3644 | 211.0 | 870.0 | 1196.0 | 0.7274 | 0.1764 | 682.0 | 1077.0 | 1267.0 | 0.8500 | 0.5383 |
| 0.4066 | 6.0 | 294 | 1.4218 | 0.0058 | 5076.7544 | 3518.9380 | 1962.0 | 2475.0 | 0.7927 | 1696.0 | 0.6853 | 936.0 | 1062.0 | 1196.0 | 0.8880 | 0.7826 | 750.0 | 900.0 | 1267.0 | 0.7103 | 0.5919 |
| 0.9865 | 7.0 | 343 | 1.2377 | 0.0058 | 4419.2414 | 3063.1847 | 1900.0 | 2475.0 | 0.7677 | 1592.0 | 0.6432 | 600.0 | 778.0 | 1196.0 | 0.6505 | 0.5017 | 983.0 | 1122.0 | 1267.0 | 0.8856 | 0.7758 |
| 0.0072 | 8.0 | 392 | 1.3710 | 0.0058 | 4895.4536 | 3393.2699 | 1979.0 | 2475.0 | 0.7996 | 1881.0 | 0.76 | 895.0 | 956.0 | 1196.0 | 0.7993 | 0.7483 | 979.0 | 1023.0 | 1267.0 | 0.8074 | 0.7727 |
| 0.0 | 9.0 | 441 | 1.6015 | 0.0058 | 5718.5637 | 3963.8063 | 1971.0 | 2475.0 | 0.7964 | 1898.0 | 0.7669 | 958.0 | 999.0 | 1196.0 | 0.8353 | 0.8010 | 932.0 | 972.0 | 1267.0 | 0.7672 | 0.7356 |
| 0.0007 | 10.0 | 490 | 1.7766 | 0.0058 | 6343.6664 | 4397.0945 | 1967.0 | 2475.0 | 0.7947 | 1930.0 | 0.7798 | 998.0 | 1022.0 | 1196.0 | 0.8545 | 0.8344 | 924.0 | 945.0 | 1267.0 | 0.7459 | 0.7293 |
| 0.0 | 11.0 | 539 | 1.8418 | 0.0058 | 6576.5420 | 4558.5115 | 1960.0 | 2475.0 | 0.7919 | 1930.0 | 0.7798 | 1013.0 | 1034.0 | 1196.0 | 0.8645 | 0.8470 | 909.0 | 926.0 | 1267.0 | 0.7309 | 0.7174 |
| 0.0 | 12.0 | 588 | 1.8374 | 0.0058 | 6560.5850 | 4547.4510 | 1960.0 | 2475.0 | 0.7919 | 1926.0 | 0.7782 | 1009.0 | 1033.0 | 1196.0 | 0.8637 | 0.8436 | 909.0 | 927.0 | 1267.0 | 0.7316 | 0.7174 |
| 0.0 | 13.0 | 637 | 1.8365 | 0.0058 | 6557.4102 | 4545.2504 | 1958.0 | 2475.0 | 0.7911 | 1928.0 | 0.7790 | 1010.0 | 1031.0 | 1196.0 | 0.8620 | 0.8445 | 910.0 | 927.0 | 1267.0 | 0.7316 | 0.7182 |
| 0.0 | 14.0 | 686 | 1.8354 | 0.0058 | 6553.5613 | 4542.5825 | 1958.0 | 2475.0 | 0.7911 | 1932.0 | 0.7806 | 1011.0 | 1030.0 | 1196.0 | 0.8612 | 0.8453 | 913.0 | 928.0 | 1267.0 | 0.7324 | 0.7206 |
| 0.0 | 15.0 | 735 | 1.8357 | 0.0058 | 6554.7112 | 4543.3796 | 1964.0 | 2475.0 | 0.7935 | 1934.0 | 0.7814 | 1010.0 | 1031.0 | 1196.0 | 0.8620 | 0.8445 | 916.0 | 933.0 | 1267.0 | 0.7364 | 0.7230 |
| 0.4056 | 16.0 | 784 | 1.8367 | 0.0058 | 6558.1779 | 4545.7825 | 1963.0 | 2475.0 | 0.7931 | 1938.0 | 0.7830 | 1011.0 | 1030.0 | 1196.0 | 0.8612 | 0.8453 | 919.0 | 933.0 | 1267.0 | 0.7364 | 0.7253 |
| 0.0 | 17.0 | 833 | 1.8323 | 0.0058 | 6542.6779 | 4535.0388 | 1963.0 | 2475.0 | 0.7931 | 1935.0 | 0.7818 | 1012.0 | 1030.0 | 1196.0 | 0.8612 | 0.8462 | 915.0 | 933.0 | 1267.0 | 0.7364 | 0.7222 |
| 0.0 | 18.0 | 882 | 1.8342 | 0.0058 | 6549.3643 | 4539.6734 | 1964.0 | 2475.0 | 0.7935 | 1939.0 | 0.7834 | 1012.0 | 1030.0 | 1196.0 | 0.8612 | 0.8462 | 919.0 | 934.0 | 1267.0 | 0.7372 | 0.7253 |
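The evaluation summary at the top of this card matches the epoch-8 row of the table (validation loss 1.3710, accuracy 0.7996), which is also the best epoch by overall accuracy, suggesting the best checkpoint was kept even though training ran on. Selecting it programmatically from the table's (epoch, validation loss, accuracy) columns:

```python
# (epoch, validation_loss, accuracy) triples transcribed from the table above.
rows = [
    (0, 1.4656, 0.4832), (1, 0.5710, 0.7459), (2, 0.5804, 0.7442),
    (3, 0.6253, 0.7507), (4, 0.8394, 0.7721), (5, 0.8325, 0.7867),
    (6, 1.4218, 0.7927), (7, 1.2377, 0.7677), (8, 1.3710, 0.7996),
    (9, 1.6015, 0.7964), (10, 1.7766, 0.7947), (11, 1.8418, 0.7919),
    (12, 1.8374, 0.7919), (13, 1.8365, 0.7911), (14, 1.8354, 0.7911),
    (15, 1.8357, 0.7935), (16, 1.8367, 0.7931), (17, 1.8323, 0.7931),
    (18, 1.8342, 0.7935),
]

# Pick the row with the highest overall accuracy.
best_epoch, best_loss, best_acc = max(rows, key=lambda r: r[2])
print(best_epoch, best_acc)  # → 8 0.7996
```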
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1
## Model tree

donoway/GSM8K-Binary_Llama-3.2-1B-f5e1gp29 is fine-tuned from the base model meta-llama/Llama-3.2-1B.