llama3_1_8B_all_zhtw_lr1e5_ep1_16_32_128

This model is a fine-tuned version of meta-llama/Llama-3.1-8B-Instruct on the multi_turn_miss_func_zh_tw_function_mix_sharegpt_clean1, multi_turn_miss_param_zh_tw_function_mix_sharegpt_clean1, multi_turn_zh_tw_function_mix_sharegpt, irrelevance_zh_tw_sharegpt3000, and apigen_zhtwV3_remove_sys_sharegpt datasets. It achieves the following results on the evaluation set:

  • Loss: 0.6250
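
The framework versions below include PEFT, so this checkpoint is an adapter trained on top of meta-llama/Llama-3.1-8B-Instruct. The following is a minimal loading sketch, not an official usage snippet from the authors; the adapter repository id is taken from this page, and the prompt is only an illustration.

```python
# Minimal sketch: load the base model and attach this PEFT adapter.
# The adapter repo id below is assumed from this model card's page.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "a3ilab-llm-uncertainty/llama3_1_8B_all_zhtw_lr1e5_ep1_16_32_128"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Example chat-style prompt (illustrative only).
messages = [{"role": "user", "content": "你好，請介紹一下你自己。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```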

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 128
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1
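
For reference, the sketch below maps these hyperparameters onto a transformers TrainingArguments configuration. The surrounding training script (dataset loading, Trainer/SFT wiring) is not part of this card, so everything other than the listed values is an assumption.

```python
# Sketch only: the hyperparameters above expressed as transformers TrainingArguments.
# output_dir and the rest of the training setup are assumptions, not from this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_1_8B_all_zhtw_lr1e5_ep1_16_32_128",
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=128,  # effective train batch size: 1 * 128 = 128
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=1,
)
```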

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.9075        | 0.0673 | 10   | 1.1429          |
| 0.873         | 0.1347 | 20   | 1.0365          |
| 0.7009        | 0.2020 | 30   | 0.9636          |
| 0.8234        | 0.2694 | 40   | 0.8995          |
| 0.5767        | 0.3367 | 50   | 0.8376          |
| 0.6782        | 0.4041 | 60   | 0.7834          |
| 0.6734        | 0.4714 | 70   | 0.7381          |
| 0.6834        | 0.5388 | 80   | 0.7015          |
| 0.5739        | 0.6061 | 90   | 0.6727          |
| 0.7269        | 0.6735 | 100  | 0.6514          |
| 0.5797        | 0.7408 | 110  | 0.6378          |
| 0.5396        | 0.8082 | 120  | 0.6303          |
| 0.5815        | 0.8755 | 130  | 0.6265          |
| 0.4435        | 0.9429 | 140  | 0.6248          |

Framework versions

  • PEFT 0.17.0
  • Transformers 4.57.1
  • PyTorch 2.4.1+cu121
  • Datasets 4.0.0
  • Tokenizers 0.22.2