llama-3-8b-base-kto-ultrafeedback-8xh200

This model is a fine-tuned version of W-61/llama-3-8b-base-sft-ultrachat-8xh200 on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3658
  • Rewards/chosen: 0.1622
  • Logps/chosen: -286.2337
  • Rewards/rejected: -2.5444
  • Logps/rejected: -292.3963
  • Rewards/margins: 2.7066
  • KL: 0.0
  • Logits/chosen: -140467840.0
  • Logits/rejected: -139209600.0
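
The card does not include a usage snippet; below is a minimal loading sketch using the standard transformers API. The prompt and generation settings are illustrative, not from the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jackf857/llama-3-8b-base-kto-ultrafeedback-8xh200"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in F32; bf16 here only to cut memory
    device_map="auto",           # requires the accelerate package
)

prompt = "Summarize the benefits of KTO fine-tuning in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```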

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
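
The metric names above (rewards/chosen, logps/rejected, kl) match the logging keys of trl's KTOTrainer, so that trainer is assumed in the sketch below; the actual training script is not published with this card, and details such as the KTO beta are unstated.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer, maybe_unpair_preference_dataset

base_id = "W-61/llama-3-8b-base-sft-ultrachat-8xh200"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Hyperparameters copied from the list above. The total train batch size of
# 128 is 4 (per device) x 8 (GPUs) x 4 (gradient accumulation steps).
training_args = KTOConfig(
    output_dir="llama-3-8b-base-kto-ultrafeedback-8xh200",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# KTO trains on unpaired (prompt, completion, label) examples; the binarized
# UltraFeedback splits are paired chosen/rejected, so unpair them first
# (helper available in recent trl versions).
raw = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train_dataset = maybe_unpair_preference_dataset(raw["train_prefs"])
eval_dataset = maybe_unpair_preference_dataset(raw["test_prefs"])

trainer = KTOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,
)
trainer.train()
```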

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Rewards/rejected | Logps/rejected | Rewards/margins | KL | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.5841 | 0.2094 | 200 | 0.3971 | -0.1699 | -289.5548 | -1.7146 | -284.0978 | 1.5447 | 0.0 | -151004736.0 | -149476768.0 |
| 1.404 | 0.4188 | 400 | 0.3773 | -0.0342 | -288.1983 | -2.3874 | -290.8255 | 2.3531 | 0.0 | -143785152.0 | -142386976.0 |
| 1.4253 | 0.6283 | 600 | 0.3684 | -0.3211 | -291.0670 | -3.1407 | -298.3589 | 2.8196 | 0.0 | -145117536.0 | -143700400.0 |
| 1.4432 | 0.8377 | 800 | 0.3658 | 0.1622 | -286.2337 | -2.5444 | -292.3963 | 2.7066 | 0.0 | -140467840.0 | -139209600.0 |
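
One reading aid for the table: the Rewards/margins column is simply Rewards/chosen minus Rewards/rejected. A quick check against the final eval row:

```python
# rewards/margins = rewards/chosen - rewards/rejected (final eval row above)
chosen, rejected = 0.1622, -2.5444
print(round(chosen - rejected, 4))  # 2.7066, matching the reported margin
```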

Framework versions

  • Transformers 4.51.0
  • Pytorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.21.4