llama3_1_8B_all_zhtw_turn_lr1e-5_ep1_16_32_128

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the following datasets:

  • multi_turn_miss_func_zh_tw_function_mix_sharegpt500_turn
  • multi_turn_miss_param_zh_tw_function_mix_sharegpt500_turn
  • multi_turn_zh_tw_function_mix_sharegpt500_turn
  • irrelevance_zh_tw_sharegpt3000
  • apigen_zhtwV3_remove_sys_sharegpt

It achieves the following results on the evaluation set:

  • Loss: 0.7852

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto TrainingArguments):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 128
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 15
  • num_epochs: 1
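
As a rough guide, the settings above map onto transformers.TrainingArguments as in the sketch below. This is illustrative only: the actual training framework and script are not stated in this card, and output_dir is a placeholder.

```python
# Minimal sketch mapping the listed hyperparameters onto TrainingArguments.
# Assumptions: a standard Hugging Face Trainer-style setup; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_1_8B_all_zhtw_turn_lr1e-5_ep1_16_32_128",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=128,  # effective train batch size: 1 * 128 = 128
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=15,
    num_train_epochs=1,
)
```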

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.5538        | 0.0467 | 10   | 1.7379          |
| 1.4577        | 0.0933 | 20   | 1.6165          |
| 1.4283        | 0.1400 | 30   | 1.4638          |
| 0.9057        | 0.1866 | 40   | 1.3257          |
| 1.0109        | 0.2333 | 50   | 1.2057          |
| 0.9861        | 0.2799 | 60   | 1.0982          |
| 0.9197        | 0.3266 | 70   | 1.0025          |
| 0.7071        | 0.3732 | 80   | 0.9370          |
| 0.8973        | 0.4199 | 90   | 0.8938          |
| 0.8647        | 0.4665 | 100  | 0.8634          |
| 0.9744        | 0.5132 | 110  | 0.8408          |
| 0.7256        | 0.5598 | 120  | 0.8232          |
| 0.7378        | 0.6065 | 130  | 0.8116          |
| 0.7114        | 0.6531 | 140  | 0.8037          |
| 0.7931        | 0.6998 | 150  | 0.7972          |
| 0.754         | 0.7464 | 160  | 0.7923          |
| 0.9558        | 0.7931 | 170  | 0.7890          |
| 0.8876        | 0.8397 | 180  | 0.7870          |
| 0.7552        | 0.8864 | 190  | 0.7857          |
| 0.7593        | 0.9330 | 200  | 0.7851          |
| 0.8785        | 0.9797 | 210  | 0.7850          |

Framework versions

  • PEFT 0.17.0
  • Transformers 4.57.1
  • Pytorch 2.4.1+cu121
  • Datasets 4.0.0
  • Tokenizers 0.22.2
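
Because this checkpoint is a PEFT adapter rather than a full model, it has to be loaded on top of the base model. The sketch below is a minimal example, assuming the adapter repository id matches the model name above and the base model is meta-llama/Llama-3.1-8B-Instruct; the prompt and generation settings are illustrative only.

```python
# Minimal loading sketch (assumptions: adapter repo id and base model as stated in this card).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "a3ilab-llm-uncertainty/llama3_1_8B_all_zhtw_turn_lr1e-5_ep1_16_32_128"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# Illustrative Traditional Chinese prompt ("What is the weather like in Taipei today?")
messages = [{"role": "user", "content": "台北今天的天氣如何？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```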