Space: Humanlearning/Cyber_analyst-round1

1.45 MB

1 contributor

feat: enhance SFT training process with new tokenization method, implement custom trainer class for loss computation, and update README with GRPO launcher details for Unsloth LoRA integration

e5fe6f5 12 days ago

| Name | Size | Last commit | Updated |
|---|---|---|---|
| .agents | - | feat: enhance reward configuration management with new logging functions, add parallel Modal training guidelines to documentation, and improve reward config hashing for deterministic behavior | 12 days ago |
| .codex | - | feat: integrate Trackio for experiment tracking, add GRPO training support, and deploy web-based monitoring tools | 13 days ago |
| assets | - | diagrams updated | 12 days ago |
| configs | - | feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment | 13 days ago |
| scripts | - | feat: enhance SFT training process with new tokenization method, implement custom trainer class for loss computation, and update README with GRPO launcher details for Unsloth LoRA integration | 12 days ago |
| server | - | feat: enhance CyberSecurity_OWASP observation model with scenario prompt, improve GRPO batch configuration validation, and add scenario grouping for adaptive difficulty curriculum | 13 days ago |
| tests | - | feat: enhance SFT training process with new tokenization method, implement custom trainer class for loss computation, and update README with GRPO launcher details for Unsloth LoRA integration | 12 days ago |
| training | - | feat: introduce reward ablation configurations for enhanced training flexibility, implement YAML loading with extends support, and add reward variant tracking in training scripts | 12 days ago |
| .dockerignore | 100 Bytes | feat: implement core RL training infrastructure and architecture documentation | 13 days ago |
| .gitignore | 82 Bytes | feat: integrate Trackio for experiment tracking and add Modal training infrastructure with environment and test utilities. | 13 days ago |
| .hfignore | 163 Bytes | feat: implement core RL training infrastructure and architecture documentation | 13 days ago |
| 00_PROJECT_BRIEF.md | 6.97 kB | feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment | 13 days ago |
| 01_ARCHITECTURE.md | 20 kB | feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment | 13 days ago |
| AGENTS.md | 38.4 kB | feat: enhance reward configuration management with new logging functions, add parallel Modal training guidelines to documentation, and improve reward config hashing for deterministic behavior | 12 days ago |
| Dockerfile | 927 Bytes | feat: integrate Trackio for experiment tracking and add Modal training infrastructure with environment and test utilities. | 13 days ago |
| README.md | 20.2 kB | feat: enhance SFT training process with new tokenization method, implement custom trainer class for loss computation, and update README with GRPO launcher details for Unsloth LoRA integration | 12 days ago |
| __init__.py | 594 Bytes | feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts. | 13 days ago |
| bug_mutator.py | 692 Bytes | feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts. | 13 days ago |
| client.py | 1.26 kB | feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts. | 13 days ago |
| config.py | 7.72 kB | feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment | 13 days ago |
| evals.py | 2.91 kB | feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment | 13 days ago |
| fixture_generator.py | 537 Bytes | feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts. | 13 days ago |
| models.py | 3.88 kB | feat: enhance CyberSecurity_OWASP observation model with scenario prompt, improve GRPO batch configuration validation, and add scenario grouping for adaptive difficulty curriculum | 13 days ago |
| openenv.yaml | 103 Bytes | Initial commit | 13 days ago |
| policy_graph.py | 3.48 kB | feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts. | 13 days ago |
| pyproject.toml | 1.78 kB | feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment | 13 days ago |
| reward_config.py | 10.2 kB | feat: introduce reward ablation configurations for enhanced training flexibility, implement YAML loading with extends support, and add reward variant tracking in training scripts | 12 days ago |
| rewards.py | 15.7 kB | feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment | 13 days ago |
| safety.py | 420 Bytes | feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts. | 13 days ago |
| scenario_compiler.py | 680 Bytes | feat: implement RL environment server with training infrastructure and Modal integration | 13 days ago |
| template_renderer.py | 2.86 kB | feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts. | 13 days ago |
| uv.lock | 817 kB | feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment | 13 days ago |
| validators.py | 10.7 kB | feat: enhance scenario authoring and caching mechanisms, update action submission terminology, and improve reward configuration for CyberSecurity_OWASP environment | 13 days ago |