# zephyr-7b-dpo-full-alpha_0.5_batch64_0.003
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 0.7764
- Rewards/chosen: -1.2722
- Rewards/rejected: -2.3007
- Rewards/accuracies: 0.7798
- Rewards/margins: 1.0285
- Logps/rejected: -490.2682
- Logps/chosen: -409.1960
- Logits/rejected: -0.1893
- Logits/chosen: -1.1488
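The reward columns above are the implicit DPO rewards: beta times the summed log-probability ratio between this policy and the frozen SFT reference, so Rewards/margins = Rewards/chosen - Rewards/rejected, and Rewards/accuracies is the fraction of preference pairs with a positive margin. A minimal sketch of these relationships follows; the beta value and the vanilla sigmoid loss are assumptions, and the `alpha_0.5` in the run name suggests a modified objective that this sketch does not reproduce:

```python
import torch
import torch.nn.functional as F

def dpo_eval_stats(policy_logps_chosen, policy_logps_rejected,
                   ref_logps_chosen, ref_logps_rejected, beta=0.1):
    """Implicit DPO rewards from summed completion log-probs.

    beta=0.1 is illustrative; the value used for this run is not
    recorded in the card.
    """
    rewards_chosen = beta * (policy_logps_chosen - ref_logps_chosen)
    rewards_rejected = beta * (policy_logps_rejected - ref_logps_rejected)
    margins = rewards_chosen - rewards_rejected        # Rewards/margins
    accuracy = (margins > 0).float().mean()            # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()               # vanilla DPO loss
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), accuracy
```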
## Model description
More information needed
## Intended uses & limitations
More information needed
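Pending details here, a minimal inference sketch using standard transformers usage; the repo id is taken from this card, and the chat-template call assumes the tokenizer inherits the Zephyr SFT chat template:

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="YeongminKim/zephyr-7b-dpo-full-alpha_0.5_batch64_0.003",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
out = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```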
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
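These settings map onto trl's `DPOTrainer` as roughly the configuration below. This is a sketch assuming the trl-based alignment-handbook recipe: the `beta` and `bf16` values are assumptions, the chat-template preprocessing the recipe applies to the dataset splits is omitted, and the `alpha_0.5` / `0.003` variants in the run name are not shown because the card does not specify them.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(sft_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(sft_id)

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = DPOConfig(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x 4 GPUs x 2 accumulation = 64 total
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,   # illustrative; the actual DPO beta is not recorded in the card
    bf16=True,  # assumption; precision is not stated in the card
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl builds a frozen reference copy when None
    args=args,
    train_dataset=ds["train_prefs"],
    eval_dataset=ds["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```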
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.932 | 0.1047 | 100 | 0.9289 | -0.0447 | -0.2483 | 0.7103 | 0.2036 | -285.0338 | -286.4436 | -2.3925 | -2.4620 |
| 0.8626 | 0.2093 | 200 | 0.8748 | -0.8410 | -1.5326 | 0.7381 | 0.6917 | -413.4665 | -366.0735 | -0.5487 | -1.1429 |
| 0.8319 | 0.3140 | 300 | 0.8334 | -0.9854 | -1.7894 | 0.7579 | 0.8041 | -439.1472 | -380.5152 | -0.7096 | -1.3168 |
| 0.8266 | 0.4186 | 400 | 0.8083 | -0.7498 | -1.4939 | 0.7778 | 0.7441 | -409.5971 | -356.9564 | -1.3008 | -1.8553 |
| 0.7846 | 0.5233 | 500 | 0.7918 | -1.1813 | -2.1016 | 0.7817 | 0.9203 | -470.3610 | -400.1062 | -0.9569 | -1.4479 |
| 0.7725 | 0.6279 | 600 | 0.7836 | -1.1925 | -2.1692 | 0.7679 | 0.9767 | -477.1200 | -401.2269 | 0.0171 | -0.9912 |
| 0.747 | 0.7326 | 700 | 0.7802 | -1.2403 | -2.2603 | 0.7758 | 1.0200 | -486.2288 | -406.0034 | -0.1232 | -1.1647 |
| 0.7634 | 0.8373 | 800 | 0.7777 | -1.1944 | -2.1827 | 0.7837 | 0.9883 | -478.4758 | -401.4192 | -0.5323 | -1.3216 |
| 0.7538 | 0.9419 | 900 | 0.7767 | -1.2710 | -2.3022 | 0.7778 | 1.0313 | -490.4274 | -409.0746 | -0.1798 | -1.1420 |
### Framework versions
- Transformers 4.44.2
- PyTorch 2.2.1+cu118
- Datasets 2.14.7
- Tokenizers 0.19.1
## Model tree

YeongminKim/zephyr-7b-dpo-full-alpha_0.5_batch64_0.003
- Base model: mistralai/Mistral-7B-v0.1
- Fine-tuned from: alignment-handbook/zephyr-7b-sft-full