gptoss_20b_all_zhtw_lr5e-7_ep1_16_64_128_turn

This model is a fine-tuned version of openai/gpt-oss-20b, trained on the following pretokenized datasets:

  • multi_turn_miss_func_zh_tw_function_mix500_turn_oss_gpt_oss_20b_pretokenized
  • multi_turn_miss_para_zh_tw_function_mix500_turn_oss_gpt_oss_20b_pretokenized
  • multi_turn_zh_tw_function_mix500_turn_oss_gpt_oss_20b_pretokenized
  • irrelevance_zh_tw_oss3000_gpt_oss_20b_pretokenized
  • apigen_zhtwV3_remove_sys_gpt_oss_20b_pretokenized

It achieves the following results on the evaluation set:

  • Loss: 1.7413
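For context, a cross-entropy loss maps to perplexity via exp(loss). A minimal check of what the reported eval loss implies (the loss value is from this card; the perplexity is derived, not separately reported):

```python
import math

# Validation perplexity implied by the reported eval loss of 1.7413
eval_loss = 1.7413
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # ≈ 5.70
```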

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: AdamW (adamw_torch_fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 15
  • num_epochs: 1
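The learning-rate schedule above (cosine decay with 15 linear warmup steps) can be sketched in plain Python. This mirrors the standard cosine-with-warmup schedule; the total step count of 214 is an assumption inferred from the training log (epoch ≈ 0.98 at step 210), not a value stated in this card:

```python
import math

def cosine_lr(step, base_lr=5e-7, warmup=15, total_steps=214):
    """Linear warmup to base_lr over `warmup` steps, then cosine decay
    toward 0 by `total_steps` (total_steps is an estimate, see above)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0))    # 0.0 (start of warmup)
print(cosine_lr(15))   # 5e-07 (peak LR)
print(cosine_lr(214))  # ~0.0 (end of schedule)
```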

Training results

Training Loss  Epoch   Step  Validation Loss
1.9188         0.0467  10    2.0771
2.1396         0.0934  20    2.0681
2.0909         0.1401  30    2.0564
2.0204         0.1868  40    2.0281
1.9068         0.2335  50    2.0003
1.9153         0.2803  60    1.9579
1.8924         0.3270  70    1.9261
2.0036         0.3737  80    1.9034
1.8911         0.4204  90    1.8709
1.8540         0.4671  100   1.8492
1.8588         0.5138  110   1.8197
1.7130         0.5605  120   1.8010
1.6542         0.6072  130   1.7828
1.5714         0.6539  140   1.7707
1.7184         0.7006  150   1.7601
1.6565         0.7473  160   1.7521
1.7325         0.7940  170   1.7503
1.8492         0.8408  180   1.7432
1.7250         0.8875  190   1.7447
1.7197         0.9342  200   1.7353
1.8110         0.9809  210   1.7406
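The log above also lets us estimate the training-set size (an inference from the logged epoch fractions, not a number reported in this card): 10 optimizer steps cover 0.0467 of an epoch, so one epoch is about 214 steps, and at the effective batch size of 128 that is roughly 27,000 examples:

```python
# Rough dataset-size estimate derived from the training log
steps_per_epoch = 10 / 0.0467          # ≈ 214 steps per epoch
examples = steps_per_epoch * 128       # effective batch size is 128
print(round(steps_per_epoch), round(examples))
```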

Framework versions

  • PEFT 0.18.1
  • Transformers 4.57.6
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2

Model tree for a3ilab-llm-uncertainty/gptoss_20b_all_zhtw_lr5e-7_ep1_16_64_128_turn