abs-bf16-7b-math-code-instruction-lr2e-5-g0.997-l1.0-gpu4-bs8-ga32-ep2-wu0-cut3000

This model is a fine-tuned version of /blue/ericxwang.ucsb/zzhan483.ucsc/models/Qwen/Qwen2.5-7B-Instruct on the 7b_math_95k_2_train, 7b_code_100k_2_train, and 7b_instruction_100k_2_train datasets. It achieves the following results on the evaluation set:

  • Loss: 0.0041
  • Token Mean MAE: 1610558922.1324
  • Token Mean RMSE: 803252.6947
  • Token Mean Seq Mean MAE: 56994.9658
  • Token Mean Seq Mean RMSE: 2333.1020
  • Token Mean RelErr: 0.2722
  • Token Mean Seq Mean RelErr: 0.3018
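
The card does not define these metrics. A minimal sketch of one plausible reading follows, assuming "Token Mean" pools the error over every evaluated token and "Token Mean Seq Mean" averages within each sequence first and then across sequences; the function name, array shapes, and masking convention below are assumptions, not documented behavior.

```python
import numpy as np

def token_metrics(preds, targets, mask):
    """One plausible reading of the metrics above (assumption, not documented).

    preds, targets, mask: float arrays of shape (num_seqs, seq_len);
    mask is 1.0 for tokens that count toward the metrics, 0.0 otherwise.
    """
    err = (preds - targets) * mask
    n = mask.sum()
    # "Token Mean ...": pooled over every evaluated token in the set.
    mae = np.abs(err).sum() / n
    rmse = np.sqrt((err ** 2).sum() / n)
    relerr = (np.abs(err) / np.clip(np.abs(targets), 1e-8, None) * mask).sum() / n
    # "... Seq Mean ...": per-sequence mean first, then mean over sequences.
    tokens_per_seq = np.clip(mask.sum(axis=1), 1, None)
    seq_mae = (np.abs(err).sum(axis=1) / tokens_per_seq).mean()
    seq_rmse = np.sqrt((err ** 2).sum(axis=1) / tokens_per_seq).mean()
    seq_relerr = ((np.abs(err) / np.clip(np.abs(targets), 1e-8, None) * mask)
                  .sum(axis=1) / tokens_per_seq).mean()
    return {"mae": mae, "rmse": rmse, "relerr": relerr,
            "seq_mean_mae": seq_mae, "seq_mean_rmse": seq_rmse,
            "seq_mean_relerr": seq_relerr}
```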

Model description

More information needed

Intended uses & limitations

More information needed
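
If the checkpoint behaves as a standard Qwen2.5-style chat model, it can be loaded with the usual transformers causal-LM API. The sketch below assumes exactly that: the repository id is taken from this page, bfloat16 matches the checkpoint's tensor type, and the tokenizer is assumed to keep the Qwen2.5-Instruct chat template. It may not reflect how the authors intend the model to be used.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: assumes the checkpoint works as an ordinary causal chat LM
# and kept the Qwen2.5-Instruct tokenizer/chat template.
model_id = "namezz/lvm-a-qwen2.5-7b-instruct-b-qwen2.5-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

messages = [{"role": "user", "content": "Solve: what is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```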

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 1024 (8 per device × 4 GPUs × 32 gradient accumulation steps)
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 2.0
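
As a reference point, here is a minimal sketch of how these settings map onto transformers.TrainingArguments. Everything not listed above (output path, dataset wiring, eval/save cadence) is a placeholder, and bf16 is inferred from the checkpoint's BF16 tensor type rather than stated in the card.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; paths and cadence settings
# are placeholders, not taken from the card.
training_args = TrainingArguments(
    output_dir="out",                    # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=8,       # x 4 GPUs x 32 accum steps = 1024
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=32,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=2.0,
    bf16=True,                           # assumption, from the BF16 tensor type
)
```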

Training results

| Training Loss | Epoch | Step | Validation Loss | Token Mean MAE | Token Mean RMSE | Token Mean Seq Mean MAE | Token Mean Seq Mean RMSE | Token Mean RelErr | Token Mean Seq Mean RelErr |
|---|---|---|---|---|---|---|---|---|---|
| 0.1852 | 0.0868 | 50 | 0.0059 | 1907160306.1471 | 930283.1019 | 66624.9641 | 2711.1534 | 0.4319 | 0.5864 |
| 0.2126 | 0.1737 | 100 | 0.0055 | 1851835288.9751 | 896672.0433 | 65090.5387 | 2682.0112 | 0.3751 | 0.4526 |
| 0.1595 | 0.2605 | 150 | 0.0050 | 1876889416.7664 | 928085.8894 | 65985.0544 | 2660.5587 | 0.3031 | 0.3266 |
| 0.1648 | 0.3473 | 200 | 0.0047 | 1695688670.0366 | 833579.5052 | 59803.6419 | 2465.7253 | 0.2989 | 0.3304 |
| 0.1442 | 0.4341 | 250 | 0.0045 | 1740636749.0589 | 882800.1614 | 60960.1234 | 2464.5775 | 0.2982 | 0.3583 |
| 0.1378 | 0.5210 | 300 | 0.0044 | 1714206070.0226 | 863226.5934 | 60292.4662 | 2448.7114 | 0.2908 | 0.3270 |
| 0.1447 | 0.6078 | 350 | 0.0047 | 1843493220.4167 | 921955.9684 | 64646.9641 | 2602.8592 | 0.2809 | 0.2948 |
| 0.1375 | 0.6946 | 400 | 0.0044 | 1664994861.1452 | 836578.3385 | 58447.9679 | 2403.9304 | 0.3329 | 0.4024 |
| 0.1271 | 0.7815 | 450 | 0.0043 | 1670403568.1074 | 850520.7112 | 58580.1415 | 2400.5498 | 0.2737 | 0.2928 |
| 0.1290 | 0.8683 | 500 | 0.0042 | 1679348890.4706 | 848237.0045 | 59136.7752 | 2409.9989 | 0.2813 | 0.3124 |
| 0.1417 | 0.9551 | 550 | 0.0042 | 1707826151.5538 | 869221.7546 | 59880.2944 | 2427.0415 | 0.2859 | 0.3147 |
| 0.0907 | 1.0417 | 600 | 0.0042 | 1645061997.1007 | 842375.6200 | 57551.9569 | 2352.6436 | 0.2742 | 0.3057 |
| 0.0971 | 1.1285 | 650 | 0.0042 | 1702075684.7239 | 867903.5378 | 59530.6614 | 2408.3558 | 0.2720 | 0.2965 |
| 0.1009 | 1.2153 | 700 | 0.0042 | 1662731180.8235 | 850913.0256 | 58185.7074 | 2376.2369 | 0.2764 | 0.3033 |
| 0.1026 | 1.3022 | 750 | 0.0042 | 1667689617.8062 | 849387.5361 | 58762.5025 | 2387.5846 | 0.2687 | 0.2900 |
| 0.1017 | 1.3890 | 800 | 0.0043 | 1651699128.7428 | 830843.4633 | 58324.0720 | 2389.4457 | 0.2739 | 0.2993 |
| 0.0953 | 1.4758 | 850 | 0.0042 | 1634418392.2972 | 838966.8687 | 57134.1139 | 2336.2125 | 0.2879 | 0.3227 |
| 0.0977 | 1.5627 | 900 | 0.0042 | 1607481269.1738 | 818639.6959 | 56572.7680 | 2321.0729 | 0.2757 | 0.3067 |
| 0.1006 | 1.6495 | 950 | 0.0043 | 1691131081.2398 | 854469.1827 | 59520.5266 | 2413.2577 | 0.2675 | 0.2842 |
| 0.1072 | 1.7363 | 1000 | 0.0041 | 1610062288.9798 | 798837.3019 | 57059.2424 | 2334.3221 | 0.2940 | 0.3303 |
| 0.1097 | 1.8231 | 1050 | 0.0041 | 1653332151.5669 | 832029.1876 | 58211.0662 | 2362.7339 | 0.2798 | 0.3081 |
| 0.1074 | 1.9100 | 1100 | 0.0041 | 1661387640.9335 | 837864.4103 | 58503.7164 | 2360.7811 | 0.2835 | 0.3246 |
| 0.0977 | 1.9968 | 1150 | 0.0041 | 1617136020.5541 | 808006.3436 | 57160.0393 | 2338.4854 | 0.2782 | 0.3126 |

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2