abs-data-n16-bf16-7b-instruction-lr2e-5-g0.997-l1.0-gpu4-bs8-ga32-ep2-wu50-cut3000

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the 7b_instruction_100k_16_train dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the list):

  • Loss: 0.0037
  • Token Mean MAE: 386685541.7458
  • Token Mean RMSE: 453861.6612
  • Token Mean Seq Mean MAE: 31277.6791
  • Token Mean Seq Mean RMSE: 1353.6186
  • Token Mean RelErr: 0.3004
  • Token Mean Seq Mean RelErr: 0.4871
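
The snippet below is a minimal loading sketch and is not part of the original card: it assumes the checkpoint is published under the repository id namezz/lvm-instruct-0327-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct and that it keeps the standard causal-LM chat interface of its Qwen/Qwen2.5-1.5B-Instruct base; adjust the id and generation settings to your setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id for this card; verify before use.
repo_id = "namezz/lvm-instruct-0327-a-qwen2.5-7b-instruct-b-qwen2.5-1.5b-instruct"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Assumes the fine-tune keeps the Qwen2.5 chat template of its base model.
messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```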

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 1024
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 2.0
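
As referenced above, the following is a configuration sketch only, mapping the listed hyperparameters onto Hugging Face TrainingArguments; the actual training script, dataset wiring, and any extra arguments are not documented in this card, and output_dir below is illustrative.

```python
from transformers import TrainingArguments

# Sketch: the hyperparameters listed above expressed as TrainingArguments.
# Effective train batch size: 8 per device x 4 GPUs x 32 accumulation steps = 1024.
training_args = TrainingArguments(
    output_dir="./outputs",            # illustrative; not taken from the card
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=32,
    num_train_epochs=2.0,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=0,
    bf16=True,                         # BF16 per the model name and tensor type
)
```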

Training results

Training Loss Epoch Step Validation Loss Token Mean MAE Token Mean RelErr Token Mean RMSE Token Mean Seq Mean MAE Token Mean Seq Mean RelErr Token Mean Seq Mean RMSE
0.0082 0.032 50 0.0080 608425155.8249 0.4860 634337.1071 49016.6042 0.8634 2066.2285
0.0049 0.064 100 0.0052 488127386.7483 0.3914 536577.3978 38943.8809 0.7013 1669.2678
0.0049 0.096 150 0.0048 475120389.5790 0.3856 531628.7342 37852.7598 0.6617 1610.7651
0.0048 0.128 200 0.0049 481805109.8853 0.3577 533769.3790 38637.7402 0.6499 1623.8317
0.0042 0.16 250 0.0045 451726004.4679 0.3549 511599.7858 36269.7874 0.6364 1546.9978
0.0043 0.192 300 0.0045 452136195.4396 0.3354 506556.3922 36330.8552 0.5529 1537.9397
0.0039 0.224 350 0.0043 431075755.8509 0.3767 483720.3007 35022.8476 0.7019 1508.8952
0.0040 0.256 400 0.0042 428685113.7043 0.3644 488718.9404 34441.1474 0.7210 1482.5354
0.0044 0.288 450 0.0043 436808096.1055 0.3618 490426.2621 35491.9788 0.6338 1509.2700
0.0036 0.32 500 0.0043 425259647.9635 0.3842 475208.2751 34596.7729 0.6689 1503.8005
0.0038 0.352 550 0.0041 424481052.6458 0.3325 485194.9456 33965.9044 0.5568 1459.7803
0.0042 0.384 600 0.0041 416627471.4934 0.3477 473553.8559 33659.7396 0.6224 1451.3301
0.0043 0.416 650 0.0041 418919517.4599 0.3307 477484.7566 33742.9811 0.5391 1452.1818
0.0043 0.448 700 0.0041 425921146.5534 0.3072 490523.6028 34315.8195 0.4590 1471.0800
0.0036 0.48 750 0.0041 412971921.8994 0.3083 472029.2194 33441.2669 0.5165 1443.8789
0.0041 0.512 800 0.0040 421228461.5054 0.3372 485035.1077 33899.3248 0.5555 1450.2831
0.0041 0.544 850 0.0040 409085666.9095 0.3424 466717.6581 33191.6016 0.5851 1440.9357
0.0038 0.576 900 0.0042 412179388.5635 0.3695 467476.0377 33627.2967 0.6272 1451.9416
0.0036 0.608 950 0.0041 408180795.0375 0.3463 462878.8329 33098.6940 0.5822 1435.1700
0.0042 0.64 1000 0.0040 414082129.8004 0.3051 474770.3497 33469.6209 0.4600 1436.9670
0.0033 0.672 1050 0.0040 416573746.2561 0.3010 478667.6112 33390.7520 0.4735 1432.4401
0.0037 0.704 1100 0.0039 409080410.9753 0.3223 471784.7408 33069.0180 0.5653 1423.4942
0.0039 0.736 1150 0.0040 414529748.5283 0.3060 480363.9079 33461.9558 0.4526 1430.7729
0.0039 0.768 1200 0.0038 408060349.0209 0.3099 472453.4875 33012.9848 0.5047 1414.1793
0.0035 0.8 1250 0.0039 410530731.4820 0.3277 478798.7453 32934.7524 0.5611 1412.0277
0.0039 0.832 1300 0.0039 400444054.3551 0.3261 457524.1809 32501.9058 0.5401 1411.6451
0.0034 0.864 1350 0.0038 401191161.5502 0.3162 461911.0799 32531.2734 0.5506 1402.6450
0.0036 0.896 1400 0.0039 411589472.5259 0.3036 479207.3545 33028.4667 0.4974 1412.5106
0.0039 0.928 1450 0.0039 403316449.4579 0.2976 461425.6247 32678.8754 0.4571 1401.9725
0.0036 0.96 1500 0.0038 397996640.9071 0.3197 460458.7850 32191.5058 0.5528 1388.1403
0.0036 0.992 1550 0.0038 397878383.1569 0.3172 459268.7323 32082.3904 0.5086 1390.6809
0.0033 1.0237 1600 0.0038 403549105.5117 0.3001 470122.2352 32460.6207 0.4609 1393.3032
0.0032 1.0557 1650 0.0038 404166162.9221 0.2993 476033.3722 32285.2127 0.4485 1389.7234
0.0031 1.0877 1700 0.0038 393612283.8630 0.3053 460389.1674 31846.0432 0.5003 1382.9924
0.0031 1.1197 1750 0.0039 401458449.5915 0.3046 464624.1376 32486.4189 0.4536 1397.6015
0.0030 1.1517 1800 0.0038 387094167.9261 0.3162 442174.7457 31852.4104 0.5006 1384.3810
0.0028 1.1837 1850 0.0038 394617804.1685 0.2920 462627.0268 31840.2037 0.4669 1375.5938
0.0032 1.2157 1900 0.0038 392901609.9683 0.2958 461826.8537 31643.1195 0.4796 1370.3818
0.0029 1.2477 1950 0.0038 401406658.1758 0.3061 471571.4465 32139.0364 0.5093 1378.6055
0.0027 1.2797 2000 0.0038 393478169.1530 0.2977 462308.5230 31684.1146 0.4902 1366.1339
0.0031 1.3117 2050 0.0038 394208720.6071 0.2976 462952.6062 31793.0456 0.4881 1371.8230
0.0031 1.3437 2100 0.0037 396781398.3369 0.3049 463251.8912 32050.0905 0.4972 1377.4668
0.0036 1.3757 2150 0.0037 395437093.0954 0.3051 462512.9335 31641.5281 0.4831 1366.7757
0.0031 1.4077 2200 0.0037 395949867.5542 0.3025 461985.5436 31789.9705 0.4970 1368.6927
0.0031 1.4397 2250 0.0037 386212822.5036 0.3106 445312.4503 31486.1063 0.5107 1370.1466
0.0028 1.4717 2300 0.0038 389202214.0629 0.3226 449137.1804 31690.3852 0.5504 1376.1187
0.0033 1.5037 2350 0.0037 391200473.4113 0.3053 458915.1965 31654.9283 0.4863 1364.4807
0.0030 1.5357 2400 0.0037 388488638.5763 0.3043 454479.3683 31442.2902 0.4930 1360.3722
0.0030 1.5677 2450 0.0037 392525139.1263 0.3009 460564.8294 31699.8567 0.4748 1365.3168
0.0030 1.5997 2500 0.0037 393120724.2463 0.2990 462593.0911 31594.6174 0.4850 1362.0217
0.0026 1.6317 2550 0.0037 386413276.0903 0.3103 450750.4296 31360.8288 0.5154 1361.8087
0.0029 1.6637 2600 0.0037 387943931.7722 0.2964 454269.9894 31459.7615 0.4839 1359.0512
0.0030 1.6957 2650 0.0037 387453941.8684 0.2977 453519.8739 31395.3375 0.4791 1357.8446
0.0030 1.7277 2700 0.0037 388700001.2806 0.2988 455841.6119 31446.9061 0.4788 1357.2810
0.0029 1.7597 2750 0.0037 388746166.5099 0.2962 456414.0732 31448.9568 0.4697 1356.8091
0.0032 1.7917 2800 0.0037 388479514.7293 0.2973 456098.5994 31386.6973 0.4799 1355.9567
0.0028 1.8237 2850 0.0037 386241442.0539 0.3001 452872.4607 31286.2562 0.4876 1354.1872
0.0029 1.8557 2900 0.0037 386503821.0774 0.2998 453532.6895 31259.5156 0.4842 1353.4503
0.0030 1.8877 2950 0.0037 387462689.0075 0.3002 455169.0309 31311.5292 0.4874 1353.9187
0.0031 1.9197 3000 0.0037 386798994.6583 0.3009 454121.0297 31273.6807 0.4855 1353.7956
0.0033 1.9517 3050 0.0037 387538013.3891 0.2992 455192.9192 31321.1101 0.4847 1354.1975
0.0028 1.9837 3100 0.0037 386674178.7403 0.3004 453785.7409 31281.2366 0.4870 1353.7729

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.10.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.2