# rel-bf16-math-code-instruction-lr2e-5-g0.997-l1.0-gpu8-bs8-ga16-ep2-wu50-cut3000
This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on the open_code_reasoning_2_python_train and the deepmath_103k_train datasets. It achieves the following results on the evaluation set:
- Loss: 0.0094
- Token Mean Mae: 1297119276.5064
- Token Mean Rmse: 702740.8851
- Token Mean Seq Mean Mae: 71988.1939
- Token Mean Seq Mean Rmse: 2890.4310
- Token Mean Relerr: 0.2466
- Token Mean Seq Mean Relerr: 0.2451
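As a rough illustration only (the actual evaluation code is not shown in this card), token-level MAE, RMSE, and relative error over predicted versus target values might be computed as follows; the function name, the flat-list input format, and the `eps` guard are assumptions:

```python
import math

def token_metrics(preds, targets, eps=1e-8):
    """Compute token-level MAE, RMSE, and mean relative error.

    `preds` and `targets` are flat lists of per-token numeric
    predictions and ground-truth values (an assumption about the
    regression-style setup these metrics suggest).
    """
    n = len(preds)
    abs_errs = [abs(p - t) for p, t in zip(preds, targets)]
    mae = sum(abs_errs) / n
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / n)
    # Relative error: |pred - target| / (|target| + eps), averaged over tokens.
    relerr = sum(e / (abs(t) + eps) for e, t in zip(abs_errs, targets)) / n
    return mae, rmse, relerr

mae, rmse, relerr = token_metrics([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

The "Seq Mean" variants listed above presumably average each sequence first and then average across sequences, which weights short and long sequences equally rather than per token.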
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 12
- seed: 0
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 32
- total_train_batch_size: 2048
- total_eval_batch_size: 96
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 2.0
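For reference, the total batch sizes listed above follow from the per-device batch size, the device count, and (for training) the gradient accumulation steps, per the standard Trainer relationship:

```python
# Effective train batch = per-device batch x num devices x grad accumulation.
per_device_train_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 32
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)  # 8 * 8 * 32 = 2048

# Evaluation does not accumulate gradients, so it is just batch x devices.
per_device_eval_batch_size = 12
total_eval_batch_size = per_device_eval_batch_size * num_devices  # 12 * 8 = 96
```

Note that the hyperparameters imply 32 accumulation steps even though the model name above says `ga16`; the table values, not the name, are what the training run recorded.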
### Training results
| Training Loss | Epoch | Step | Validation Loss | Token Mean Mae | Token Mean Relerr | Token Mean Rmse | Token Mean Seq Mean Mae | Token Mean Seq Mean Relerr | Token Mean Seq Mean Rmse |
|---|---|---|---|---|---|---|---|---|---|
| 0.4755 | 0.1314 | 100 | 0.0135 | 1456415405.9805 | 0.3054 | 767039.2951 | 80777.0375 | 0.3126 | 3262.0017 |
| 0.3622 | 0.2628 | 200 | 0.0112 | 1506335022.9417 | 0.2737 | 792339.7735 | 83796.5537 | 0.2648 | 3314.5851 |
| 0.3805 | 0.3942 | 300 | 0.0112 | 1604541414.7167 | 0.2824 | 821278.0011 | 89832.8002 | 0.2687 | 3525.6876 |
| 0.3575 | 0.5256 | 400 | 0.0103 | 1353430084.3613 | 0.2605 | 718862.3113 | 75235.0765 | 0.2602 | 3028.8783 |
| 0.3151 | 0.6570 | 500 | 0.0108 | 1475595485.0556 | 0.2645 | 778537.6656 | 81954.1302 | 0.2542 | 3241.0204 |
| 0.3036 | 0.7884 | 600 | 0.0103 | 1450872533.8526 | 0.2586 | 768687.6390 | 80762.7081 | 0.2485 | 3187.3654 |
| 0.2912 | 0.9198 | 700 | 0.0100 | 1319307979.4934 | 0.2572 | 713181.0907 | 73189.7899 | 0.2582 | 2943.4583 |
| 0.2822 | 1.0499 | 800 | 0.0098 | 1331232200.1018 | 0.2579 | 722341.9749 | 73859.6232 | 0.2583 | 2959.9273 |
| 0.2784 | 1.1813 | 900 | 0.0097 | 1299024105.2114 | 0.2518 | 703705.5372 | 72039.4788 | 0.2523 | 2910.7108 |
| 0.2686 | 1.3127 | 1000 | 0.0096 | 1328926572.0214 | 0.2492 | 717018.2651 | 73735.3429 | 0.2460 | 2951.7896 |
| 0.2712 | 1.4441 | 1100 | 0.0095 | 1324941036.8788 | 0.2483 | 711871.2977 | 73622.6563 | 0.2450 | 2955.1585 |
| 0.2574 | 1.5755 | 1200 | 0.0094 | 1309249340.9649 | 0.2471 | 709189.9430 | 72619.9943 | 0.2449 | 2909.5304 |
| 0.2634 | 1.7069 | 1300 | 0.0094 | 1301741137.4120 | 0.2470 | 705186.4313 | 72221.8683 | 0.2453 | 2900.8766 |
| 0.2576 | 1.8383 | 1400 | 0.0094 | 1300029599.8485 | 0.2467 | 704446.0606 | 72146.3161 | 0.2452 | 2894.7313 |
| 0.2844 | 1.9697 | 1500 | 0.0094 | 1297352931.4169 | 0.2465 | 702856.5783 | 72001.5767 | 0.2451 | 2890.6912 |
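The cosine schedule with 50 warmup steps can be sketched with the generic linear-warmup, cosine-decay formula below (the same shape `transformers`' `get_cosine_schedule_with_warmup` produces); the total step count of 1522 is an assumption extrapolated from the table (~761 steps per epoch over 2 epochs):

```python
import math

def cosine_lr(step, base_lr=2e-5, warmup_steps=50, total_steps=1522):
    """Learning rate at a given optimizer step: linear warmup from 0
    to base_lr over warmup_steps, then cosine decay toward 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, the rate ramps to the full 2e-05 by step 50 and is near zero at the final step, which is consistent with the slow flattening of the validation loss in the last third of the table.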
### Framework versions
- Transformers 5.0.0
- Pytorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2