Question: did you use beta=0.1?
#1
by eengad - opened
I'm asking because 0.1 is the default in the alignment handbook.
BTW I ran MT-bench and got:
gemma-2b-zephyr-dpo 4.347826
gemma-2b-zephyr-sft 4.215625
Here is the run: https://wandb.ai/llm_surgery/gemma-zephyr/runs/lbqi9kvq
Nope, beta=0.01. I think the default in the new recipe is 0.05.
The idea here was to use the "original recipe", and in that recipe beta is 0.01: https://github.com/huggingface/alignment-handbook/blob/ff618a4d13a2c77cf97479fac8af2c576619062a/recipes/zephyr-7b-beta/dpo/config_full.yaml#L16
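For anyone wondering what changing beta (0.01 vs 0.05 vs 0.1) actually does: in DPO, beta scales the implicit reward margin inside the sigmoid loss. Below is a minimal sketch in plain Python of the standard sigmoid DPO loss for a single preference pair. It assumes you already have summed log-probabilities of the chosen and rejected responses from both the policy and the frozen reference model. The function name `dpo_loss` and the example numbers are hypothetical and are not taken from the alignment handbook or TRL code.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float) -> float:
    """Sigmoid DPO loss for one pair: -log sigmoid(beta * margin).

    Hypothetical helper for illustration; inputs are summed log-probs.
    """
    # Log-ratio of policy vs. reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin: how much more the policy favours the chosen
    # response than the reference model does.
    margin = chosen_logratio - rejected_logratio
    # -log(sigmoid(x)) == softplus(-x)
    return softplus(-beta * margin)


if __name__ == "__main__":
    # Same (made-up) log-probs, different beta values.
    for beta in (0.01, 0.05, 0.1):
        loss = dpo_loss(-10.0, -15.0, -12.0, -13.0, beta)
        print(f"beta={beta}: loss={loss:.4f}")
```

With a smaller beta, the same margin gives a loss closer to log 2, so the policy has to move further away from the reference model before the loss saturates. This is one way to read beta as the strength of the tie to the reference model.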