Hello author, how did you train with GRPO?

by hzb29 - opened 13 days ago

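I tried GRPO myself a while ago, using an LLM judge to score the outputs, but the number of concurrent judge requests got too high and I couldn't get it to run. I'd like to compare notes on how you did it.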

Owner 8 days ago

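Thanks for your interest. My thinking is that, besides LLM-as-judge, you could use some rule-based rewards: for example, checking whether the output starts with a good opening ("好事起手"), or adding reward when certain keywords are mentioned. I haven't implemented this yet, though, so I'm happy to discuss.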
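To make the rule-based idea concrete, here is a minimal sketch and not an actual implementation from this repo. The opening phrases, keywords, and bonus values are all hypothetical placeholders, and the function names are my own; how you plug the per-completion scores into a GRPO trainer depends on the framework you use.

```python
from typing import List, Sequence

# Hypothetical placeholders for illustration only; replace with task-specific values.
GOOD_OPENINGS = ("Good news",)
KEYWORDS = ("keyword_a", "keyword_b")
OPENING_BONUS = 1.0   # assumed reward for a desired opening
KEYWORD_BONUS = 0.5   # assumed reward per distinct keyword mentioned


def rule_based_reward(
    completion: str,
    openings: Sequence[str] = GOOD_OPENINGS,
    keywords: Sequence[str] = KEYWORDS,
) -> float:
    """Score one completion using two simple rules: opening check and keyword hits."""
    text = completion.strip()
    reward = 0.0

    # Rule 1: the completion starts with one of the desired openings.
    if text.startswith(tuple(openings)):
        reward += OPENING_BONUS

    # Rule 2: each distinct keyword that appears (case-insensitive) adds a bonus.
    lowered = text.lower()
    reward += KEYWORD_BONUS * sum(1 for kw in keywords if kw.lower() in lowered)

    return reward


def batch_rewards(completions: Sequence[str]) -> List[float]:
    """Score a group of sampled completions, one reward per completion."""
    return [rule_based_reward(c) for c in completions]
```

Since these rules are plain string checks that run locally, they need no judge requests at all, so the concurrency problem does not arise for this part of the reward.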
