feat: enhance reward configuration management with new logging functions, add parallel Modal training guidelines to documentation, and improve reward config hashing for deterministic behavior 0e7f59c Humanlearning committed 12 days ago
feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment be8eade Humanlearning committed 12 days ago
feat: update training configuration and documentation for Modal execution, including new model integration and enhanced tracking utilities b3ee507 Humanlearning committed 13 days ago
feat: add cybersecurity-owasp-trainer skill with reference notes and update AGENTS.md documentation 28685f3 Humanlearning committed 13 days ago
feat: integrate Trackio for experiment tracking, add GRPO training support, and deploy web-based monitoring tools 0e95d4f Humanlearning committed 13 days ago