llama3_1_8B_all_zhtw_lr1e-5_ep1_16_32_128_turn

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the following datasets:

  • multi_turn_miss_func_zh_tw_function_mix500_turn_llama3.1_pretokenized
  • multi_turn_miss_para_zh_tw_function_mix500_turn_llama3.1_pretokenized
  • multi_turn_zh_tw_function_mix500_turn_llama3.1_pretokenized
  • irrelevance_zh_tw3000_llama3.1_pretokenized
  • apigen_zhtwV3_remove_sys_llama3.1_pretokenized

It achieves the following results on the evaluation set:

  • Loss: 0.3873
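
Since this is a PEFT adapter rather than a full checkpoint, it is loaded on top of the base model. The following is a minimal sketch, assuming the adapter's hub path matches the model name above and that transformers, peft, and accelerate are installed; it is not taken from the model card itself.

```python
# Minimal sketch (assumed usage, not from the model card): load the LoRA adapter
# on top of the base Llama-3.1-8B-Instruct model with transformers + peft.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
# Assumed hub path derived from the model name above; adjust if the repo differs.
adapter_id = "a3ilab-llm-uncertainty/llama3_1_8B_all_zhtw_lr1e-5_ep1_16_32_128_turn"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Build a chat prompt and generate a short completion.
messages = [{"role": "user", "content": "Introduce yourself briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```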

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 15
  • num_epochs: 1
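
As a rough illustration only, the hyperparameters above would map onto transformers.TrainingArguments as sketched below; the output directory name is an assumption and the actual training script is not published here.

```python
# Illustrative sketch (assumed): mapping the listed hyperparameters onto
# transformers.TrainingArguments. Not the exact training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_1_8B_all_zhtw_lr1e-5_ep1_16_32_128_turn",  # assumed path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=32,   # 4 x 32 = 128 effective train batch size
    num_train_epochs=1,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=15,
)
```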

Training results

Training Loss | Epoch  | Step | Validation Loss
0.8972        | 0.0645 | 10   | 0.7509
0.7576        | 0.1290 | 20   | 0.5763
0.5547        | 0.1935 | 30   | 0.4981
0.5052        | 0.2581 | 40   | 0.4673
0.5385        | 0.3226 | 50   | 0.4456
0.5042        | 0.3871 | 60   | 0.4303
0.5493        | 0.4516 | 70   | 0.4183
0.5523        | 0.5161 | 80   | 0.4085
0.5716        | 0.5806 | 90   | 0.4009
0.4327        | 0.6452 | 100  | 0.3960
0.5545        | 0.7097 | 110  | 0.3922
0.4067        | 0.7742 | 120  | 0.3895
0.4078        | 0.8387 | 130  | 0.3880
0.4392        | 0.9032 | 140  | 0.3875
0.4435        | 0.9677 | 150  | 0.3873

Framework versions

  • PEFT 0.18.1
  • Transformers 4.57.6
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2
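
A quick way to check that a local environment matches these versions before loading the adapter (assumed usage, not part of the original card):

```python
# Print installed versions of the libraries listed above for comparison.
import peft, transformers, torch, datasets, tokenizers

print("peft:", peft.__version__)
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("datasets:", datasets.__version__)
print("tokenizers:", tokenizers.__version__)
```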