1.93 MB
InosLihka
Claude Sonnet 4.6
fix: reduce kl_coef to prevent training instability
0bdfeaa