feat: introduce GRPO GPU fallback support, enhance training script with warmstart tagging, and add learning rate parameter for improved training flexibility 1b6d30b Humanlearning commited on 28 days ago