fix: remove max_new_tokens from GRPOConfig, set batch_size=2 (multiple of num_generations) e99351e muskan singh Claude Opus 4.7 committed 13 days ago
fix: pin trl<=0.24, multi-step reward, lower LR, reduce NUM_GEN 2ab0fe0 muskan singh Claude Opus 4.7 committed 13 days ago
fix: remove max_new_tokens from GRPOConfig (not supported by Unsloth patched version) 7a0b2ce muskan singh committed 13 days ago
fix: add missing server modules and unsloth import order 7557a2b muskan singh Claude Sonnet 4.6 committed 13 days ago