gptoss_20b_all_zhtw_lr5e-7_ep3_16_64_128_turn_v5data_500

This model is a fine-tuned version of openai/gpt-oss-20b on the following datasets:

  • multi_turn_miss_func_zh_tw_function_mix_v5_500_turn_oss_gpt_oss_20b_pretokenized
  • multi_turn_miss_param_zh_tw_function_mix_v5_500_turn_oss_gpt_oss_20b_pretokenized
  • multi_turn_zh_tw_function_mix_v5_500_turn_oss_gpt_oss_20b_pretokenized
  • new_irrelevance_zh_tw_oss3000_oss_gpt_oss_20b_pretokenized
  • apigen_zhtwV3_remove_sys_gpt_oss_20b_pretokenized

It achieves the following results on the evaluation set:

  • Loss: 1.0354
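
The Framework versions below include PEFT, which suggests this checkpoint is a parameter-efficient adapter rather than full model weights. A minimal loading sketch, assuming a PEFT adapter published under the repo id shown in the title (an assumption; verify the id and adapter format before use):

```python
# A minimal loading sketch, assuming this checkpoint is a PEFT adapter on top
# of openai/gpt-oss-20b. The adapter repo id below is taken from the model
# title and should be verified before use.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "openai/gpt-oss-20b"
adapter_id = "a3ilab-llm-uncertainty/gptoss_20b_all_zhtw_lr5e-7_ep3_16_64_128_turn_v5data_500"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter

# The training data targets Traditional Chinese function calling, so a zh-TW
# prompt is a natural smoke test.
messages = [{"role": "user", "content": "你好，請用繁體中文介紹你自己。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```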

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 128 (= 2 per device × 64 accumulation steps, suggesting a single training device; see the configuration sketch after this list)
  • optimizer: adamw_torch_fused (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 15
  • num_epochs: 3
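
For reference, these values map directly onto transformers.TrainingArguments. A hedged sketch of an equivalent configuration; the actual training script is not published, and output_dir is a hypothetical placeholder:

```python
# Configuration sketch matching the hyperparameters above. The field names are
# standard transformers.TrainingArguments arguments; the original training
# script is an assumption, and output_dir is hypothetical.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gptoss_20b_all_zhtw_lr5e-7_ep3",  # hypothetical
    learning_rate=5e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=64,  # 2 x 64 = 128 effective examples per update
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=15,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```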

Training results

Training Loss  Epoch   Step  Validation Loss
1.5138         0.0435  10    1.7372
1.8616         0.0870  20    1.7316
1.5077         0.1305  30    1.7197
1.5093         0.1739  40    1.6988
1.5221         0.2174  50    1.6783
1.4645         0.2609  60    1.6571
1.4865         0.3044  70    1.6276
1.24           0.3479  80    1.6035
1.482          0.3914  90    1.5790
1.3486         0.4349  100   1.5459
1.4871         0.4784  110   1.5275
1.3186         0.5218  120   1.5016
1.3892         0.5653  130   1.4813
1.4606         0.6088  140   1.4613
1.2964         0.6523  150   1.4345
1.2039         0.6958  160   1.4139
1.1981         0.7393  170   1.3928
1.2071         0.7828  180   1.3773
1.2505         0.8263  190   1.3566
1.241          0.8697  200   1.3361
1.4614         0.9132  210   1.3231
1.1201         0.9567  220   1.3060
1.1996         1.0     230   1.2896
1.1315         1.0435  240   1.2759
1.182          1.0870  250   1.2652
1.1442         1.1305  260   1.2473
1.1309         1.1739  270   1.2370
0.9868         1.2174  280   1.2237
1.1573         1.2609  290   1.2061
1.1479         1.3044  300   1.2005
1.1099         1.3479  310   1.1852
1.1088         1.3914  320   1.1753
1.0451         1.4349  330   1.1658
1.0777         1.4784  340   1.1505
1.0902         1.5218  350   1.1406
1.0681         1.5653  360   1.1321
1.0797         1.6088  370   1.1250
1.1503         1.6523  380   1.1155
1.1035         1.6958  390   1.1087
1.0596         1.7393  400   1.0997
1.0397         1.7828  410   1.0939
1.1203         1.8263  420   1.0878
0.9642         1.8697  430   1.0845
1.0697         1.9132  440   1.0780
0.9721         1.9567  450   1.0719
1.0128         2.0     460   1.0680
1.0888         2.0435  470   1.0658
1.0267         2.0870  480   1.0626
1.0126         2.1305  490   1.0589
0.9058         2.1739  500   1.0564
1.031          2.2174  510   1.0544
1.0226         2.2609  520   1.0503
1.1264         2.3044  530   1.0485
1.0121         2.3479  540   1.0477
1.0678         2.3914  550   1.0450
1.0438         2.4349  560   1.0406
1.1264         2.4784  570   1.0405
1.0563         2.5218  580   1.0410
1.0601         2.5653  590   1.0386
0.9677         2.6088  600   1.0440
1.1078         2.6523  610   1.0392
1.0677         2.6958  620   1.0387
1.0246         2.7393  630   1.0376
1.0707         2.7828  640   1.0387
0.9897         2.8263  650   1.0386
0.9436         2.8697  660   1.0377
1.1909         2.9132  670   1.0375
0.9286         2.9567  680   1.0350
1.0323         3.0     690   1.0354

Framework versions

  • PEFT 0.18.1
  • Transformers 4.57.6
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2
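
To reproduce this environment, the versions above can be pinned with pip. A sketch (the 2.8.0+cu128 torch build may require installing from the matching PyTorch index rather than plain PyPI):

```python
# Environment pinning sketch for the framework versions listed above. Plain
# "torch==2.8.0" from PyPI may not match the 2.8.0+cu128 CUDA build; use the
# appropriate PyTorch index URL for your CUDA version if needed.
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "peft==0.18.1",
    "transformers==4.57.6",
    "torch==2.8.0",
    "datasets==4.0.0",
    "tokenizers==0.22.2",
])
```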