# GSM8K-Binary_Llama-3.2-1B-thbeu27y
This model is a fine-tuned version of meta-llama/Llama-3.2-1B on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.6734
- Model Preparation Time: 0.0026
- Mdl: 5975.0439
- Accumulated Loss: 4141.5848
- Correct Preds: 2006.0
- Total Preds: 2475.0
- Accuracy: 0.8105
- Correct Gen Preds: 2008.0
- Gen Accuracy: 0.8113
- Correct Gen Preds 34192: 1008.0
- Correct Preds 34192: 1011.0
- Total Labels 34192: 1196.0
- Accuracy 34192: 0.8453
- Gen Accuracy 34192: 0.8428
- Correct Gen Preds 41568: 992.0
- Correct Preds 41568: 995.0
- Total Labels 41568: 1267.0
- Accuracy 41568: 0.7853
- Gen Accuracy 41568: 0.7830
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 64
- seed: 42
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 100
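The cosine schedule with warmup_ratio 0.01 can be sketched as follows. This is a minimal illustration, not the exact Transformers implementation; the steps-per-epoch value of 52 is taken from the results table, and with num_epochs 100 the warmup spans the first 52 of 5200 optimizer steps:

```python
import math

# Hedged sketch of a cosine LR schedule with linear warmup, using the
# hyperparameters above. Assumes 52 steps/epoch (from the results table).
learning_rate = 2e-5
steps_per_epoch = 52
num_epochs = 100
total_steps = steps_per_epoch * num_epochs        # 5200
warmup_steps = int(0.01 * total_steps)            # warmup_ratio 0.01 -> 52

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < warmup_steps:                       # linear warmup to peak LR
        return learning_rate * step / warmup_steps
    # cosine decay from peak LR down to 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Note that the results table only runs through epoch 20; with num_epochs set to 100, training appears to have been stopped well before the schedule's cosine decay completed, so the effective LR stayed near its peak for the logged epochs.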
### Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Mdl | Accumulated Loss | Correct Preds | Total Preds | Accuracy | Correct Gen Preds | Gen Accuracy | Correct Gen Preds 34192 | Correct Preds 34192 | Total Labels 34192 | Accuracy 34192 | Gen Accuracy 34192 | Correct Gen Preds 41568 | Correct Preds 41568 | Total Labels 41568 | Accuracy 41568 | Gen Accuracy 41568 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.4656 | 0.0026 | 5233.1723 | 3627.3586 | 1196.0 | 2475.0 | 0.4832 | 1204.0 | 0.4865 | 1196.0 | 1196.0 | 1196.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1267.0 | 0.0 | 0.0 |
| 0.5402 | 1.0 | 52 | 0.5851 | 0.0026 | 2089.1467 | 1448.0861 | 1830.0 | 2475.0 | 0.7394 | 1830.0 | 0.7394 | 707.0 | 711.0 | 1196.0 | 0.5945 | 0.5911 | 1115.0 | 1119.0 | 1267.0 | 0.8832 | 0.8800 |
| 0.6372 | 2.0 | 104 | 0.4997 | 0.0026 | 1784.3730 | 1236.8331 | 1971.0 | 2475.0 | 0.7964 | 1778.0 | 0.7184 | 907.0 | 969.0 | 1196.0 | 0.8102 | 0.7584 | 862.0 | 1002.0 | 1267.0 | 0.7908 | 0.6803 |
| 0.4523 | 3.0 | 156 | 0.4722 | 0.0026 | 1686.1189 | 1168.7286 | 1996.0 | 2475.0 | 0.8065 | 1420.0 | 0.5737 | 732.0 | 979.0 | 1196.0 | 0.8186 | 0.6120 | 681.0 | 1017.0 | 1267.0 | 0.8027 | 0.5375 |
| 0.1558 | 4.0 | 208 | 0.6308 | 0.0026 | 2252.3394 | 1561.2027 | 1920.0 | 2475.0 | 0.7758 | 1016.0 | 0.4105 | 737.0 | 1129.0 | 1196.0 | 0.9440 | 0.6162 | 271.0 | 791.0 | 1267.0 | 0.6243 | 0.2139 |
| 0.2356 | 5.0 | 260 | 0.6758 | 0.0026 | 2413.1342 | 1672.6572 | 1992.0 | 2475.0 | 0.8048 | 1324.0 | 0.5349 | 516.0 | 957.0 | 1196.0 | 0.8002 | 0.4314 | 800.0 | 1035.0 | 1267.0 | 0.8169 | 0.6314 |
| 0.0194 | 6.0 | 312 | 0.9312 | 0.0026 | 3325.0850 | 2304.7733 | 1940.0 | 2475.0 | 0.7838 | 1685.0 | 0.6808 | 961.0 | 1108.0 | 1196.0 | 0.9264 | 0.8035 | 715.0 | 832.0 | 1267.0 | 0.6567 | 0.5643 |
| 0.028 | 7.0 | 364 | 1.1226 | 0.0026 | 4008.2597 | 2778.3139 | 1998.0 | 2475.0 | 0.8073 | 1949.0 | 0.7875 | 983.0 | 1011.0 | 1196.0 | 0.8453 | 0.8219 | 956.0 | 987.0 | 1267.0 | 0.7790 | 0.7545 |
| 0.3 | 8.0 | 416 | 1.1704 | 0.0026 | 4178.9701 | 2896.6413 | 1929.0 | 2475.0 | 0.7794 | 1915.0 | 0.7737 | 813.0 | 827.0 | 1196.0 | 0.6915 | 0.6798 | 1092.0 | 1102.0 | 1267.0 | 0.8698 | 0.8619 |
| 0.5882 | 9.0 | 468 | 1.5126 | 0.0026 | 5400.9251 | 3743.6360 | 1996.0 | 2475.0 | 0.8065 | 1983.0 | 0.8012 | 1035.0 | 1046.0 | 1196.0 | 0.8746 | 0.8654 | 940.0 | 950.0 | 1267.0 | 0.7498 | 0.7419 |
| 0.0 | 10.0 | 520 | 1.6734 | 0.0026 | 5975.0439 | 4141.5848 | 2006.0 | 2475.0 | 0.8105 | 2008.0 | 0.8113 | 1008.0 | 1011.0 | 1196.0 | 0.8453 | 0.8428 | 992.0 | 995.0 | 1267.0 | 0.7853 | 0.7830 |
| 0.5881 | 11.0 | 572 | 1.7454 | 0.0026 | 6232.2057 | 4319.8358 | 1999.0 | 2475.0 | 0.8077 | 1999.0 | 0.8077 | 1024.0 | 1028.0 | 1196.0 | 0.8595 | 0.8562 | 967.0 | 971.0 | 1267.0 | 0.7664 | 0.7632 |
| 0.0 | 12.0 | 624 | 1.7462 | 0.0026 | 6235.0887 | 4321.8342 | 1997.0 | 2475.0 | 0.8069 | 1999.0 | 0.8077 | 1018.0 | 1021.0 | 1196.0 | 0.8537 | 0.8512 | 973.0 | 976.0 | 1267.0 | 0.7703 | 0.7680 |
| 0.5881 | 13.0 | 676 | 1.7479 | 0.0026 | 6241.0599 | 4325.9731 | 1998.0 | 2475.0 | 0.8073 | 2000.0 | 0.8081 | 1020.0 | 1023.0 | 1196.0 | 0.8554 | 0.8528 | 972.0 | 975.0 | 1267.0 | 0.7695 | 0.7672 |
| 0.0 | 14.0 | 728 | 1.7475 | 0.0026 | 6239.8590 | 4325.1407 | 1998.0 | 2475.0 | 0.8073 | 2001.0 | 0.8085 | 1020.0 | 1023.0 | 1196.0 | 0.8554 | 0.8528 | 973.0 | 975.0 | 1267.0 | 0.7695 | 0.7680 |
| 0.0 | 15.0 | 780 | 1.7505 | 0.0026 | 6250.3925 | 4332.4419 | 1998.0 | 2475.0 | 0.8073 | 2000.0 | 0.8081 | 1021.0 | 1025.0 | 1196.0 | 0.8570 | 0.8537 | 971.0 | 973.0 | 1267.0 | 0.7680 | 0.7664 |
| 0.0 | 16.0 | 832 | 1.7500 | 0.0026 | 6248.8051 | 4331.3416 | 1998.0 | 2475.0 | 0.8073 | 1999.0 | 0.8077 | 1020.0 | 1023.0 | 1196.0 | 0.8554 | 0.8528 | 971.0 | 975.0 | 1267.0 | 0.7695 | 0.7664 |
| 0.0 | 17.0 | 884 | 1.7511 | 0.0026 | 6252.4822 | 4333.8904 | 1999.0 | 2475.0 | 0.8077 | 2002.0 | 0.8089 | 1021.0 | 1024.0 | 1196.0 | 0.8562 | 0.8537 | 973.0 | 975.0 | 1267.0 | 0.7695 | 0.7680 |
| 0.0 | 18.0 | 936 | 1.7545 | 0.0026 | 6264.7927 | 4342.4234 | 1999.0 | 2475.0 | 0.8077 | 2000.0 | 0.8081 | 1020.0 | 1024.0 | 1196.0 | 0.8562 | 0.8528 | 972.0 | 975.0 | 1267.0 | 0.7695 | 0.7672 |
| 0.0 | 19.0 | 988 | 1.7568 | 0.0026 | 6272.8945 | 4348.0391 | 1997.0 | 2475.0 | 0.8069 | 1999.0 | 0.8077 | 1019.0 | 1022.0 | 1196.0 | 0.8545 | 0.8520 | 972.0 | 975.0 | 1267.0 | 0.7695 | 0.7672 |
| 0.0 | 20.0 | 1040 | 1.7562 | 0.0026 | 6270.8480 | 4346.6206 | 1998.0 | 2475.0 | 0.8073 | 2000.0 | 0.8081 | 1020.0 | 1023.0 | 1196.0 | 0.8554 | 0.8528 | 972.0 | 975.0 | 1267.0 | 0.7695 | 0.7672 |
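The Mdl and Accumulated Loss columns appear to be two views of the same quantity: accumulated loss looks like the summed evaluation cross-entropy in nats (validation loss × 2475 examples, up to rounding), and MDL the same total converted to bits. A hedged check on the epoch-10 row, assuming that interpretation:

```python
import math

# Assumption: "Accumulated Loss" = summed eval cross-entropy in nats
# (~ validation loss x eval examples) and "Mdl" = the same total in bits.
# Values taken from the epoch-10 row of the table above.
val_loss, total_preds = 1.6734, 2475
accumulated_loss = 4141.5848
mdl_bits = 5975.0439

print(val_loss * total_preds)          # close to accumulated_loss
print(accumulated_loss / math.log(2))  # close to mdl_bits
```

Both identities hold to within rounding of the reported figures, which supports reading MDL here as total description length in bits.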
### Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1