49.8 kB
Humanlearning
feat: enhance reward configuration management with new logging functions, add parallel Modal training guidelines to documentation, and improve reward config hashing for deterministic behavior
0e7f59c