
| Epoch | Training Loss | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.6220 | 0.6291 | -0.7651 | -0.9942 | 0.6389 | 0.2290 | -51.33 | -53.62 | -0.5743 | -0.5584 |
| 2 | 0.2153 | 0.3662 | -1.6715 | -2.8304 | 0.8333 | 1.1588 | -60.39 | -71.98 | -0.8344 | -0.8035 |
| 3 | 0.0216 | 0.2678 | -3.9962 | -6.7451 | 0.8056 | 2.7488 | -83.64 | -111.13 | -0.9501 | -0.9175 |
| 4 | 0.0034 | 0.2886 | -7.7645 | -11.9930 | 0.8333 | 4.2285 | -121.32 | -163.61 | -0.8709 | -0.8464 |
| 5 | 0.0012 | 0.3095 | -8.7604 | -13.2704 | 0.8611 | 4.5100 | -131.28 | -176.38 | -0.8497 | -0.8267 |

This is a hands-on DPO experiment. It didn't turn out amazingly well, since I had far fewer DPO preference pairs than a big lab would use, but I learned a lot.
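As a rough sanity check on the results table, here is a minimal Python sketch that re-reads a few of its columns. It assumes that Rewards/margins is the chosen reward minus the rejected reward (the table values agree with that within rounding), and it treats a growing gap between validation and training loss as a sign of overfitting, which would fit a small set of pairs. The helper names (`margin_matches`, `best_epoch`, `train_val_gap`) are made up for illustration and are not part of any training library.

```python
# Values copied from the results table:
# (epoch, training loss, validation loss, reward chosen, reward rejected, reward margin)
rows = [
    (1, 0.6220, 0.6291, -0.7651, -0.9942, 0.2290),
    (2, 0.2153, 0.3662, -1.6715, -2.8304, 1.1588),
    (3, 0.0216, 0.2678, -3.9962, -6.7451, 2.7488),
    (4, 0.0034, 0.2886, -7.7645, -11.9930, 4.2285),
    (5, 0.0012, 0.3095, -8.7604, -13.2704, 4.5100),
]


def margin_matches(row, tol=1e-3):
    """Check the assumed relation: margin == chosen - rejected (within rounding)."""
    _, _, _, chosen, rejected, margin = row
    return abs((chosen - rejected) - margin) <= tol


def best_epoch(rows):
    """Return the epoch with the lowest validation loss."""
    return min(rows, key=lambda r: r[2])[0]


def train_val_gap(rows):
    """Map each epoch to (validation loss - training loss)."""
    return {r[0]: round(r[2] - r[1], 4) for r in rows}


if __name__ == "__main__":
    print("margins consistent:", all(margin_matches(r) for r in rows))
    print("lowest validation loss at epoch:", best_epoch(rows))
    for epoch, gap in train_val_gap(rows).items():
        print(f"epoch {epoch}: val - train = {gap}")
```

Read this way, validation loss is lowest at epoch 3 (0.2678) and rises afterwards while training loss keeps falling toward zero, so an earlier checkpoint may be the more reasonable one to keep.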
