Commit History

Sync README training commands
0b1f1bd
verified

Humanlearning commited on

docs: add architecture and RL training flow diagrams to README and blog for improved clarity on system design
b2ee80d

Humanlearning commited on

feat: enhance SFT training process with new tokenization method, implement custom trainer class for loss computation, and update README with GRPO launcher details for Unsloth LoRA integration
e5fe6f5

Humanlearning commited on

fix: update README with SFT training configuration details, modify modal training scripts to disable assistant-only loss and packing for compatibility, and adjust test assertions to reflect these changes
1544ce8

Humanlearning commited on

feat: expand README with synthetic SFT dataset generation instructions, enhance dataset verification and pushing to Hugging Face Hub, and improve modal training scripts with default configurations for curriculum and GPU fallback
60f97ab

Humanlearning commited on

feat: enhance reward configuration management with new logging functions, add parallel Modal training guidelines to documentation, and improve reward config hashing for deterministic behavior
0e7f59c

Humanlearning commited on

feat: update README with GPU-utilization tuning instructions, enhance modal training script with run name parameter, and modify GRPO configuration for trace logging and vLLM settings
7d32451

Humanlearning commited on

feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment
be8eade

Humanlearning commited on

feat: update training configuration and documentation for Modal execution, including new model integration and enhanced tracking utilities
b3ee507

Humanlearning commited on

feat: implement RL environment server with training infrastructure and Modal integration
6abc8c5

Humanlearning commited on

feat: integrate Trackio for experiment tracking and add Modal training infrastructure with environment and test utilities.
4e663d8

Humanlearning commited on

feat: implement core RL training infrastructure and architecture documentation
f3080d1

Humanlearning commited on

feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts.
3807ea3

Humanlearning commited on