Cyber_analyst-round1 / scripts / modal_train_grpo.py

Commit History

feat: introduce GRPO GPU fallback support, enhance training script with warmstart tagging, and add learning rate parameter for improved training flexibility

1b6d30b

Humanlearning committed 12 days ago

feat: enhance SFT training process with new tokenization method, implement custom trainer class for loss computation, and update README with GRPO launcher details for Unsloth LoRA integration

e5fe6f5

Humanlearning committed 12 days ago

fix: update README with SFT training configuration details, modify modal training scripts to disable assistant-only loss and packing for compatibility, and adjust test assertions to reflect these changes

1544ce8

Humanlearning committed 12 days ago

feat: introduce reward ablation configurations for enhanced training flexibility, implement YAML loading with extends support, and add reward variant tracking in training scripts

f7b8ac6

Humanlearning committed 12 days ago

feat: enhance reward configuration management with new logging functions, add parallel Modal training guidelines to documentation, and improve reward config hashing for deterministic behavior

0e7f59c

Humanlearning committed 12 days ago

feat: update README with GPU-utilization tuning instructions, enhance modal training script with run name parameter, and modify GRPO configuration for trace logging and vLLM settings

7d32451

Humanlearning committed 12 days ago

feat: enhance CyberSecurity_OWASP observation model with scenario prompt, improve GRPO batch configuration validation, and add scenario grouping for adaptive difficulty curriculum

632c145

Humanlearning committed 12 days ago

feat: add episode trace fingerprinting for improved trace logging and update reward penalties in GRPO configuration

2eada22

Humanlearning committed 13 days ago

feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment

be8eade

Humanlearning committed 13 days ago

feat: enhance training image setup and add startup notice for Modal execution, improve dependency installation process, and implement training heartbeat for monitoring

448eddd

Humanlearning committed 13 days ago

feat: update training configuration and documentation for Modal execution, including new model integration and enhanced tracking utilities

b3ee507

Humanlearning committed 13 days ago

feat: implement RL environment server with training infrastructure and Modal integration

6abc8c5

Humanlearning committed 13 days ago

feat: integrate Trackio for experiment tracking and add Modal training infrastructure with environment and test utilities

4e663d8

Humanlearning committed 13 days ago

feat: implement core RL training infrastructure and architecture documentation

f3080d1

Humanlearning committed 13 days ago