Consider using RL after SFT
#1
by BryanADA - opened
I think RL could further improve Gemopus's capabilities after SFT. Would you like to try it in the next version of Gemopus?
Thanks for the suggestion! I’ve actually been exploring RL training on Qwopus, but some challenges are still unresolved, especially multimodal compatibility and algorithm optimization.
That said, I definitely plan to incorporate RL in future training once these issues are addressed.