feat: implement core RL training infrastructure and architecture documentation
f3080d1 · Humanlearning committed 16 days ago