Which post-training method was actually used for this model, GDPO or GRPO?
#1
by roseblooming - opened
The model name includes GDPO, but the model card states that it was trained using GRPO. Could you clarify which method was actually used?
GDPO