Hi @sseymens
Thank you for your comments.
I can help answer your question about the MoE on-policy part.
old_log_prob = log_prob.detach() does not solve the on-policy issue by itself: the probability is computed with the current policy, but the sampling distribution can still differ because of expert selection. That said, old_log_prob = log_prob.detach() will alleviate the issue if this is the root cause. This is just for hypothesis testing.
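To make the point concrete, here is a minimal sketch in plain Python (no autograd, so detach() is modeled as simply copying the value). Everything in it is a made-up assumption for illustration: the two-expert toy model, the EXPERT_LOGITS table, the router scores, and the helper token_log_prob are all hypothetical and not taken from any real training code. It only shows that the ratio built from old_log_prob = log_prob.detach() is 1 by construction, while the ratio against the actual sampling distribution can be far from 1 if the rollout and training passes select different experts.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-expert logits over a 3-token vocabulary (assumption).
EXPERT_LOGITS = {
    0: [2.0, 0.5, 0.1],
    1: [0.2, 1.8, 0.3],
}


def token_log_prob(router_scores, token):
    """Top-1 routing: pick the highest-scoring expert, return log pi(token)."""
    expert = max(range(len(router_scores)), key=lambda i: router_scores[i])
    return math.log(softmax(EXPERT_LOGITS[expert])[token])


# Near-tie router scores. A tiny numerical difference between the rollout
# pass and the training pass (assumed here) flips the selected expert.
rollout_router = [0.5001, 0.5000]  # rollout pass selects expert 0
train_router = [0.5000, 0.5001]    # training pass selects expert 1

token = 0  # the token that was sampled during rollout
rollout_log_prob = token_log_prob(rollout_router, token)
log_prob = token_log_prob(train_router, token)

# Numerically, old_log_prob = log_prob.detach() is just the same value.
old_log_prob = log_prob

ratio_detach = math.exp(log_prob - old_log_prob)    # 1 by construction
ratio_true = math.exp(log_prob - rollout_log_prob)  # vs. sampling distribution

print("ratio with detach():", ratio_detach)
print("ratio vs. rollout distribution:", ratio_true)
```

In this toy setup the detach() ratio carries no information about the mismatch, which is why it is only useful as a test of one hypothesis and not as a fix for expert-selection drift.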