rel-bf16-math-code-instruction-lr2e-5-g0.997-l1.0-gpu8-bs8-ga16-ep2-wu50-cut3000

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct on the open_code_reasoning_2_python_train and deepmath_103k_train datasets. It achieves the following results on the evaluation set:

  • Loss: 0.0094
  • Token Mean MAE: 1297119276.5064
  • Token Mean RMSE: 702740.8851
  • Token Mean Seq Mean MAE: 71988.1939
  • Token Mean Seq Mean RMSE: 2890.4310
  • Token Mean RelErr: 0.2466
  • Token Mean Seq Mean RelErr: 0.2451
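Here the "token mean" metrics average the per-token error over every token in the evaluation set, while the "seq mean" variants average within each sequence first and then across sequences, so long sequences do not dominate the aggregate. A minimal sketch of the two aggregation schemes (illustrative values; the actual metric code is not published with this card):

```python
import numpy as np

# Per-sequence arrays of per-token absolute errors (illustrative values).
errors = [np.array([1.0, 3.0]), np.array([2.0, 2.0, 8.0])]

# Token-mean MAE: one average over all tokens in the eval set.
token_mean_mae = np.mean(np.concatenate(errors))    # (1+3+2+2+8)/5 = 3.2

# Seq-mean MAE: average within each sequence, then across sequences.
seq_mean_mae = np.mean([e.mean() for e in errors])  # (2.0+4.0)/2 = 3.0
```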

Model description

More information needed

Intended uses & limitations

More information needed
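The card does not document a usage recipe. If the checkpoint follows the standard Qwen2.5-style causal-LM layout (an assumption; any task-specific head is not described here), it can be loaded with transformers as a rough starting point:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical loading sketch; the exact head/task of this checkpoint is
# an assumption, as the card leaves intended uses unspecified.
model_id = "namezz/lvm-rel-a-qwen2.5-3b-instruct-b-qwen2.5-3b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)
```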

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 12
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 2048
  • total_eval_batch_size: 96
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 2.0
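
These settings map directly onto transformers TrainingArguments. A minimal reconstruction sketch, assuming an 8-GPU torchrun launch and omitting model and dataset setup (the output_dir name is hypothetical):

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters. With 8 devices:
# 8 (per-device) x 8 (devices) x 32 (accumulation) = 2048 effective
# train batch size, and 12 x 8 = 96 effective eval batch size.
args = TrainingArguments(
    output_dir="rel-bf16-math-code-instruction",  # hypothetical name
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=12,
    seed=0,
    gradient_accumulation_steps=32,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=2.0,
    bf16=True,  # matches the BF16 tensor type of the checkpoint
)
```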

Training results

| Training Loss | Epoch | Step | Validation Loss | Token Mean MAE | Token Mean RelErr | Token Mean RMSE | Token Mean Seq Mean MAE | Token Mean Seq Mean RelErr | Token Mean Seq Mean RMSE |
|--------------:|------:|-----:|----------------:|---------------:|------------------:|----------------:|------------------------:|---------------------------:|-------------------------:|
| 0.4755 | 0.1314 | 100 | 0.0135 | 1456415405.9805 | 0.3054 | 767039.2951 | 80777.0375 | 0.3126 | 3262.0017 |
| 0.3622 | 0.2628 | 200 | 0.0112 | 1506335022.9417 | 0.2737 | 792339.7735 | 83796.5537 | 0.2648 | 3314.5851 |
| 0.3805 | 0.3942 | 300 | 0.0112 | 1604541414.7167 | 0.2824 | 821278.0011 | 89832.8002 | 0.2687 | 3525.6876 |
| 0.3575 | 0.5256 | 400 | 0.0103 | 1353430084.3613 | 0.2605 | 718862.3113 | 75235.0765 | 0.2602 | 3028.8783 |
| 0.3151 | 0.6570 | 500 | 0.0108 | 1475595485.0556 | 0.2645 | 778537.6656 | 81954.1302 | 0.2542 | 3241.0204 |
| 0.3036 | 0.7884 | 600 | 0.0103 | 1450872533.8526 | 0.2586 | 768687.6390 | 80762.7081 | 0.2485 | 3187.3654 |
| 0.2912 | 0.9198 | 700 | 0.0100 | 1319307979.4934 | 0.2572 | 713181.0907 | 73189.7899 | 0.2582 | 2943.4583 |
| 0.2822 | 1.0499 | 800 | 0.0098 | 1331232200.1018 | 0.2579 | 722341.9749 | 73859.6232 | 0.2583 | 2959.9273 |
| 0.2784 | 1.1813 | 900 | 0.0097 | 1299024105.2114 | 0.2518 | 703705.5372 | 72039.4788 | 0.2523 | 2910.7108 |
| 0.2686 | 1.3127 | 1000 | 0.0096 | 1328926572.0214 | 0.2492 | 717018.2651 | 73735.3429 | 0.2460 | 2951.7896 |
| 0.2712 | 1.4441 | 1100 | 0.0095 | 1324941036.8788 | 0.2483 | 711871.2977 | 73622.6563 | 0.2450 | 2955.1585 |
| 0.2574 | 1.5755 | 1200 | 0.0094 | 1309249340.9649 | 0.2471 | 709189.9430 | 72619.9943 | 0.2449 | 2909.5304 |
| 0.2634 | 1.7069 | 1300 | 0.0094 | 1301741137.4120 | 0.2470 | 705186.4313 | 72221.8683 | 0.2453 | 2900.8766 |
| 0.2576 | 1.8383 | 1400 | 0.0094 | 1300029599.8485 | 0.2467 | 704446.0606 | 72146.3161 | 0.2452 | 2894.7313 |
| 0.2844 | 1.9697 | 1500 | 0.0094 | 1297352931.4169 | 0.2465 | 702856.5783 | 72001.5767 | 0.2451 | 2890.6912 |

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2