Which post-training method was actually used for this model, GDPO or GRPO?

#1
by roseblooming - opened

The model name includes GDPO, but the model card states that it was trained using GRPO. Could you clarify which method was actually used?

GDPO
