feat: enhance reward configuration management with new logging functions, add parallel Modal training guidelines to documentation, and improve reward config hashing for deterministic behavior 0e7f59c Humanlearning committed 12 days ago
feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment be8eade Humanlearning committed 13 days ago
feat: update training configuration and documentation for Modal execution, including new model integration and enhanced tracking utilities b3ee507 Humanlearning committed 13 days ago
feat: implement RL environment server with training infrastructure and Modal integration 6abc8c5 Humanlearning committed 13 days ago
feat: integrate Trackio for experiment tracking and add Modal training infrastructure with environment and test utilities 4e663d8 Humanlearning committed 13 days ago
feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts 3807ea3 Humanlearning committed 13 days ago