
| Epoch | Training Loss | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.6220 | 0.6291 | -0.7651 | -0.9942 | 0.6389 | 0.2290 | -51.33 | -53.62 | -0.5743 | -0.5584 |
| 2 | 0.2153 | 0.3662 | -1.6715 | -2.8304 | 0.8333 | 1.1588 | -60.39 | -71.98 | -0.8344 | -0.8035 |
| 3 | 0.0216 | 0.2678 | -3.9962 | -6.7451 | 0.8056 | 2.7488 | -83.64 | -111.13 | -0.9501 | -0.9175 |
| 4 | 0.0034 | 0.2886 | -7.7645 | -11.9930 | 0.8333 | 4.2285 | -121.32 | -163.61 | -0.8709 | -0.8464 |
| 5 | 0.0012 | 0.3095 | -8.7604 | -13.2704 | 0.8611 | 4.5100 | -131.28 | -176.38 | -0.8497 | -0.8267 |
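
For readers unfamiliar with the reward columns, here is a minimal sketch of how these per-pair quantities are usually defined in DPO. It is not the training code used for this model: the function name `dpo_metrics`, the example log-probabilities, and `beta=0.1` are assumptions made for illustration, and the card does not state which beta was used. The last part only checks, against the numbers in the table above, that Rewards/margins equals Rewards/chosen minus Rewards/rejected up to rounding.

```python
import math

def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO quantities (hypothetical helper; beta is an assumed value)."""
    # Implicit rewards: scaled log-prob ratio of policy vs. reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # Sigmoid DPO loss: -log(sigmoid(margin)), written as a stable softplus.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    # For one pair, "accuracy" is whether the chosen answer gets the higher reward.
    accuracy = 1.0 if reward_chosen > reward_rejected else 0.0
    return {
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "loss": loss,
        "accuracy": accuracy,
    }

# Made-up log-probabilities for a single preference pair.
example = dpo_metrics(-50.0, -60.0, -45.0, -50.0, beta=0.1)

# (Rewards/chosen, Rewards/rejected, Rewards/margins) per epoch, from the table.
table_rows = [
    (-0.7651, -0.9942, 0.2290),
    (-1.6715, -2.8304, 1.1588),
    (-3.9962, -6.7451, 2.7488),
    (-7.7645, -11.9930, 4.2285),
    (-8.7604, -13.2704, 4.5100),
]
margins_consistent = all(
    abs((chosen - rejected) - margin) < 1e-3
    for chosen, rejected, margin in table_rows
)
```

The reported loss is an average over pairs, so applying the loss formula to the averaged margin in a table row will not reproduce the Training Loss or Validation Loss column exactly.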

This is a hands-on DPO experiment. It didn't turn out amazingly well, since I had far fewer DPO preference pairs than a large lab would use, but I learned a lot.
