# llama3_1_8B_all_zhtw_lr1e-5_ep1_16_32_128_turn
This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on the following datasets:
- multi_turn_miss_func_zh_tw_function_mix500_turn_llama3.1_pretokenized
- multi_turn_miss_para_zh_tw_function_mix500_turn_llama3.1_pretokenized
- multi_turn_zh_tw_function_mix500_turn_llama3.1_pretokenized
- irrelevance_zh_tw3000_llama3.1_pretokenized
- apigen_zhtwV3_remove_sys_llama3.1_pretokenized

It achieves the following results on the evaluation set:
- Loss: 0.3873
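
Because PEFT is listed under the framework versions below, the released weights are presumably a parameter-efficient adapter on top of the base model rather than full weights. The following is a minimal inference sketch, assuming the adapter repo id shown on this page (`a3ilab-llm-uncertainty/llama3_1_8B_all_zhtw_lr1e-5_ep1_16_32_128_turn_tokenize`) and the standard `transformers`/`peft` loading path; it is not an official usage example from the model authors:

```python
# Minimal inference sketch. Assumptions: the repo hosts a PEFT (LoRA-style)
# adapter, and the base model's chat template is used unchanged.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "a3ilab-llm-uncertainty/llama3_1_8B_all_zhtw_lr1e-5_ep1_16_32_128_turn_tokenize"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# Placeholder prompt; the datasets suggest Traditional Chinese function-calling turns.
messages = [{"role": "user", "content": "..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For function-calling use, tool definitions would be passed through the Llama 3.1 chat template in the usual way; the exact prompt format used during fine-tuning is not documented in this card.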
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 15
- num_epochs: 1
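
These settings map onto Hugging Face `TrainingArguments` roughly as below. This is a hypothetical reconstruction assuming the run used the standard `transformers` Trainer; the actual training script and dataset wiring are not part of this card:

```python
# Sketch of the training configuration implied by the hyperparameter list above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama3_1_8B_all_zhtw_lr1e-5_ep1_16_32_128_turn",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_steps=15,
    optim="adamw_torch_fused",       # betas=(0.9, 0.999), eps=1e-8 are the defaults
    seed=42,
)
```

The total train batch size of 128 is consistent with a per-device batch of 4 and 32 accumulation steps on a single device.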
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.8972 | 0.0645 | 10 | 0.7509 |
| 0.7576 | 0.1290 | 20 | 0.5763 |
| 0.5547 | 0.1935 | 30 | 0.4981 |
| 0.5052 | 0.2581 | 40 | 0.4673 |
| 0.5385 | 0.3226 | 50 | 0.4456 |
| 0.5042 | 0.3871 | 60 | 0.4303 |
| 0.5493 | 0.4516 | 70 | 0.4183 |
| 0.5523 | 0.5161 | 80 | 0.4085 |
| 0.5716 | 0.5806 | 90 | 0.4009 |
| 0.4327 | 0.6452 | 100 | 0.3960 |
| 0.5545 | 0.7097 | 110 | 0.3922 |
| 0.4067 | 0.7742 | 120 | 0.3895 |
| 0.4078 | 0.8387 | 130 | 0.3880 |
| 0.4392 | 0.9032 | 140 | 0.3875 |
| 0.4435 | 0.9677 | 150 | 0.3873 |
### Framework versions
- PEFT 0.18.1
- Transformers 4.57.6
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2
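
To match this environment, the listed versions can be checked at runtime; a small sketch, assuming all five packages are installed:

```python
# Compare the local environment against the versions listed in this card.
import peft, transformers, torch, datasets, tokenizers

expected = {
    peft: "0.18.1",
    transformers: "4.57.6",
    torch: "2.8.0",        # card lists the cu128 build: 2.8.0+cu128
    datasets: "4.0.0",
    tokenizers: "0.22.2",
}
for mod, want in expected.items():
    print(f"{mod.__name__}: installed {mod.__version__}, card lists {want}")
```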