| Epoch | Training Loss | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.6220 | 0.6291 | -0.7651 | -0.9942 | 0.6389 | 0.2290 | -51.33 | -53.62 | -0.5743 | -0.5584 |
| 2 | 0.2153 | 0.3662 | -1.6715 | -2.8304 | 0.8333 | 1.1588 | -60.39 | -71.98 | -0.8344 | -0.8035 |
| 3 | 0.0216 | 0.2678 | -3.9962 | -6.7451 | 0.8056 | 2.7488 | -83.64 | -111.13 | -0.9501 | -0.9175 |
| 4 | 0.0034 | 0.2886 | -7.7645 | -11.9930 | 0.8333 | 4.2285 | -121.32 | -163.61 | -0.8709 | -0.8464 |
| 5 | 0.0012 | 0.3095 | -8.7604 | -13.2704 | 0.8611 | 4.5100 | -131.28 | -176.38 | -0.8497 | -0.8267 |

This is a hands-on DPO experiment. The results are not amazing, since my set of preference pairs was much smaller than what a large lab would use, but I learned a lot.
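
As a quick sanity check on the table above, here is a minimal sketch that recomputes the reward margin for each epoch and finds the epoch with the lowest validation loss. It assumes the `Rewards/margins` column is simply `Rewards/chosen` minus `Rewards/rejected`, as the column names suggest. The names `ROWS`, `recomputed_margin`, and `best_epoch` are made up for this illustration and are not part of the training code.

```python
# Values copied from the table above:
# (epoch, validation loss, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (1, 0.6291, -0.7651, -0.9942, 0.2290),
    (2, 0.3662, -1.6715, -2.8304, 1.1588),
    (3, 0.2678, -3.9962, -6.7451, 2.7488),
    (4, 0.2886, -7.7645, -11.9930, 4.2285),
    (5, 0.3095, -8.7604, -13.2704, 4.5100),
]


def recomputed_margin(chosen, rejected):
    """Assumed definition: margin = chosen reward - rejected reward."""
    return chosen - rejected


def best_epoch(rows):
    """Return the epoch number with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    for epoch, val_loss, chosen, rejected, margin in ROWS:
        diff = recomputed_margin(chosen, rejected)
        # Logged values are rounded to 4 decimals, so allow a small tolerance.
        status = "ok" if abs(diff - margin) < 1e-3 else "mismatch"
        print(f"epoch {epoch}: logged {margin:.4f}, recomputed {diff:.4f} ({status})")
    print("lowest validation loss at epoch", best_epoch(ROWS))
```

Under that assumption the logged margins match the recomputed ones up to rounding. Validation loss bottoms out at epoch 3 while training loss keeps falling, which would be consistent with overfitting on a small set of preference pairs.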