abs-bf16-7b-math-code-instruction-lr2e-5-g0.997-l1.0-gpu4-bs8-ga32-ep2-wu0-cut3000

This model is a fine-tuned version of /blue/ericxwang.ucsb/zzhan483.ucsc/models/Qwen/Qwen2.5-7B-Instruct on the 7b_math_95k_2_train, 7b_code_100k_2_train, and 7b_instruction_100k_2_train datasets. It achieves the following results on the evaluation set:

  • Loss: 0.0041
  • Token Mean MAE: 1610558922.1324
  • Token Mean RMSE: 803252.6947
  • Token Mean Seq Mean MAE: 56994.9658
  • Token Mean Seq Mean RMSE: 2333.1020
  • Token Mean RelErr: 0.2722
  • Token Mean Seq Mean RelErr: 0.3018
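
The card does not define these metrics. A minimal sketch of one plausible reading follows, assuming "Token Mean" pools the error over every evaluated token and "Token Mean Seq Mean" averages within each sequence first and then across sequences; the function name, array shapes, and masking convention below are assumptions, not documented behavior.

```python
import numpy as np

def token_metrics(preds, targets, mask):
    """One plausible reading of the metrics above (assumption, not documented).

    preds, targets, mask: float arrays of shape (num_seqs, seq_len);
    mask is 1.0 for tokens that count toward the metrics, 0.0 otherwise.
    """
    err = (preds - targets) * mask
    n = mask.sum()
    # "Token Mean ...": pooled over every evaluated token in the set.
    mae = np.abs(err).sum() / n
    rmse = np.sqrt((err ** 2).sum() / n)
    relerr = (np.abs(err) / np.clip(np.abs(targets), 1e-8, None) * mask).sum() / n
    # "... Seq Mean ...": per-sequence mean first, then mean over sequences.
    tokens_per_seq = np.clip(mask.sum(axis=1), 1, None)
    seq_mae = (np.abs(err).sum(axis=1) / tokens_per_seq).mean()
    seq_rmse = np.sqrt((err ** 2).sum(axis=1) / tokens_per_seq).mean()
    seq_relerr = ((np.abs(err) / np.clip(np.abs(targets), 1e-8, None) * mask)
                  .sum(axis=1) / tokens_per_seq).mean()
    return {"mae": mae, "rmse": rmse, "relerr": relerr,
            "seq_mean_mae": seq_mae, "seq_mean_rmse": seq_rmse,
            "seq_mean_relerr": seq_relerr}
```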

Model description

More information needed

Intended uses & limitations

More information needed
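
If the checkpoint behaves as a standard Qwen2.5-style chat model, it can be loaded with the usual transformers causal-LM API. The sketch below assumes exactly that: the repository id is taken from this page, bfloat16 matches the checkpoint's tensor type, and the tokenizer is assumed to keep the Qwen2.5-Instruct chat template. It may not reflect how the authors intend the model to be used.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: assumes the checkpoint works as an ordinary causal chat LM
# and kept the Qwen2.5-Instruct tokenizer/chat template.
model_id = "namezz/lvm-a-qwen2.5-7b-instruct-b-qwen2.5-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

messages = [{"role": "user", "content": "Solve: what is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```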

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 0
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 1024 (8 per device × 4 GPUs × 32 gradient accumulation steps)
  • total_eval_batch_size: 32
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 2.0
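
As a reference point, here is a minimal sketch of how these settings map onto transformers.TrainingArguments. Everything not listed above (output path, dataset wiring, eval/save cadence) is a placeholder, and bf16 is inferred from the checkpoint's BF16 tensor type rather than stated in the card.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; paths and cadence settings
# are placeholders, not taken from the card.
training_args = TrainingArguments(
    output_dir="out",                    # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=8,       # x 4 GPUs x 32 accum steps = 1024
    per_device_eval_batch_size=8,
    seed=0,
    gradient_accumulation_steps=32,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=2.0,
    bf16=True,                           # assumption, from the BF16 tensor type
)
```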

Training results

| Training Loss | Epoch | Step | Validation Loss | Token Mean MAE | Token Mean RMSE | Token Mean Seq Mean MAE | Token Mean Seq Mean RMSE | Token Mean RelErr | Token Mean Seq Mean RelErr |
|---|---|---|---|---|---|---|---|---|---|
| 0.1852 | 0.0868 | 50 | 0.0059 | 1907160306.1471 | 930283.1019 | 66624.9641 | 2711.1534 | 0.4319 | 0.5864 |
| 0.2126 | 0.1737 | 100 | 0.0055 | 1851835288.9751 | 896672.0433 | 65090.5387 | 2682.0112 | 0.3751 | 0.4526 |
| 0.1595 | 0.2605 | 150 | 0.0050 | 1876889416.7664 | 928085.8894 | 65985.0544 | 2660.5587 | 0.3031 | 0.3266 |
| 0.1648 | 0.3473 | 200 | 0.0047 | 1695688670.0366 | 833579.5052 | 59803.6419 | 2465.7253 | 0.2989 | 0.3304 |
| 0.1442 | 0.4341 | 250 | 0.0045 | 1740636749.0589 | 882800.1614 | 60960.1234 | 2464.5775 | 0.2982 | 0.3583 |
| 0.1378 | 0.5210 | 300 | 0.0044 | 1714206070.0226 | 863226.5934 | 60292.4662 | 2448.7114 | 0.2908 | 0.3270 |
| 0.1447 | 0.6078 | 350 | 0.0047 | 1843493220.4167 | 921955.9684 | 64646.9641 | 2602.8592 | 0.2809 | 0.2948 |
| 0.1375 | 0.6946 | 400 | 0.0044 | 1664994861.1452 | 836578.3385 | 58447.9679 | 2403.9304 | 0.3329 | 0.4024 |
| 0.1271 | 0.7815 | 450 | 0.0043 | 1670403568.1074 | 850520.7112 | 58580.1415 | 2400.5498 | 0.2737 | 0.2928 |
| 0.1290 | 0.8683 | 500 | 0.0042 | 1679348890.4706 | 848237.0045 | 59136.7752 | 2409.9989 | 0.2813 | 0.3124 |
| 0.1417 | 0.9551 | 550 | 0.0042 | 1707826151.5538 | 869221.7546 | 59880.2944 | 2427.0415 | 0.2859 | 0.3147 |
| 0.0907 | 1.0417 | 600 | 0.0042 | 1645061997.1007 | 842375.6200 | 57551.9569 | 2352.6436 | 0.2742 | 0.3057 |
| 0.0971 | 1.1285 | 650 | 0.0042 | 1702075684.7239 | 867903.5378 | 59530.6614 | 2408.3558 | 0.2720 | 0.2965 |
| 0.1009 | 1.2153 | 700 | 0.0042 | 1662731180.8235 | 850913.0256 | 58185.7074 | 2376.2369 | 0.2764 | 0.3033 |
| 0.1026 | 1.3022 | 750 | 0.0042 | 1667689617.8062 | 849387.5361 | 58762.5025 | 2387.5846 | 0.2687 | 0.2900 |
| 0.1017 | 1.3890 | 800 | 0.0043 | 1651699128.7428 | 830843.4633 | 58324.0720 | 2389.4457 | 0.2739 | 0.2993 |
| 0.0953 | 1.4758 | 850 | 0.0042 | 1634418392.2972 | 838966.8687 | 57134.1139 | 2336.2125 | 0.2879 | 0.3227 |
| 0.0977 | 1.5627 | 900 | 0.0042 | 1607481269.1738 | 818639.6959 | 56572.7680 | 2321.0729 | 0.2757 | 0.3067 |
| 0.1006 | 1.6495 | 950 | 0.0043 | 1691131081.2398 | 854469.1827 | 59520.5266 | 2413.2577 | 0.2675 | 0.2842 |
| 0.1072 | 1.7363 | 1000 | 0.0041 | 1610062288.9798 | 798837.3019 | 57059.2424 | 2334.3221 | 0.2940 | 0.3303 |
| 0.1097 | 1.8231 | 1050 | 0.0041 | 1653332151.5669 | 832029.1876 | 58211.0662 | 2362.7339 | 0.2798 | 0.3081 |
| 0.1074 | 1.9100 | 1100 | 0.0041 | 1661387640.9335 | 837864.4103 | 58503.7164 | 2360.7811 | 0.2835 | 0.3246 |
| 0.0977 | 1.9968 | 1150 | 0.0041 | 1617136020.5541 | 808006.3436 | 57160.0393 | 2338.4854 | 0.2782 | 0.3126 |

Framework versions

  • Transformers 5.0.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2