MMR-DR_GRPO-lambda-0.8 / reward_plots
8.02 MB
kangdawei
Training in progress, step 500
576de5e verified