Running RL 1 Algo Reasoning Environment 🧠1 Submit Rust code and reasoning to get a correctness reward
The Chinchilla paper actually shows that, for a fixed compute budget, it is better to train a smaller model on more data than to train a larger model for fewer steps.
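To make the trade-off above concrete, here is a rough back-of-the-envelope sketch. It assumes the common approximation that training compute is about 6 × parameters × tokens, and the widely quoted rule of thumb of roughly 20 training tokens per parameter; both are simplifications rather than the paper's exact fitted values, and the function names are just illustrative.

```python
import math

# Assumption: training FLOPs C ~= 6 * N * D (N = parameters, D = tokens).
# Assumption: compute-optimal training uses about 20 tokens per parameter.
# Both are rules of thumb, not the exact fitted scaling laws.

def compute_optimal(flops, tokens_per_param=20.0):
    """Split a FLOP budget into (parameters, tokens) under the assumptions above."""
    # With D = r * N, C = 6 * r * N^2, so N = sqrt(C / (6 * r)).
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

def tokens_for_model(flops, n_params):
    """Tokens a model of a given size can see within the same FLOP budget."""
    return flops / (6.0 * n_params)

if __name__ == "__main__":
    budget = 6.0 * 70e9 * 1.4e12  # budget of a 70B model trained on 1.4T tokens
    n, d = compute_optimal(budget)
    print(f"balanced split: {n / 1e9:.0f}B params, {d / 1e12:.1f}T tokens")

    big = 280e9  # a 4x larger model under the same budget
    d_big = tokens_for_model(budget, big)
    print(f"280B model: {d_big / 1e9:.0f}B tokens, {d_big / big:.2f} tokens per param")
```

Under these assumptions the budget works out to roughly 70B parameters and 1.4T tokens, while a 280B model on the same budget would only see about 350B tokens, i.e. a bit over one token per parameter.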