feat: implement core RL training infrastructure and architecture documentation f3080d1 Humanlearning committed 13 days ago