---
title: CyberSecurity_OWASP Environment Server
emoji: 🛡️
colorFrom: blue
colorTo: gray
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - cybersecurity
  - owasp
---

# CyberSecurity_OWASP

`CyberSecurity_OWASP` is an OpenEnv-compliant reinforcement-learning environment in which a single LLM agent performs a defensive authorization-repair workflow:

```text
inspect generated app + policy -> discover authorization bug -> submit finding -> patch code -> preserve intended behavior
```

The current implementation includes a functional MVP scenario: an invoices FastAPI-style app with one injected OWASP A01 BOLA/IDOR defect, visible tests, hidden deterministic verifier checks, anti-cheat safeguards, and a decomposed reward.

## Quick Start

```bash
uv sync --extra dev
uv run --extra dev pytest
uv run server --port 8000
```

Then connect with the OpenEnv client:

```python
from CyberSecurity_OWASP import CyberSecurityOWASPAction, CyberSecurityOWASPEnv

with CyberSecurityOWASPEnv(base_url="http://localhost:8000") as env:
    # Start a fresh episode; the seed makes the generated scenario reproducible.
    result = env.reset(seed=7)
    print(result.observation.task_brief)

    # Take one step by calling a discovery tool.
    result = env.step(CyberSecurityOWASPAction(tool_name="list_routes"))
    print(result.observation.last_tool_result)
```

## Action Space

The agent emits one JSON action at a time:

```json
{"tool_name":"read_file","arguments":{"path":"app/routes/invoices.py"}}
```
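
A minimal sketch of how an action string of this shape could be checked before it is dispatched. `ParsedAction` and `parse_action` are hypothetical names used only for illustration; they are not part of the package, and treating a missing `arguments` field as an empty object is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ParsedAction:
    """Hypothetical container for one decoded action (illustration only)."""

    tool_name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def parse_action(raw: str) -> ParsedAction:
    """Decode one JSON action string, raising ValueError on malformed input."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"action is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("action must be a JSON object")

    tool_name = payload.get("tool_name")
    if not isinstance(tool_name, str) or not tool_name:
        raise ValueError("action needs a non-empty string 'tool_name'")

    # Assumption: a missing 'arguments' field means "no arguments".
    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("'arguments' must be a JSON object")
    return ParsedAction(tool_name=tool_name, arguments=arguments)
```
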

Supported tools:

- `inspect_policy_graph`
- `list_routes`
- `read_openapi`
- `read_file`
- `search_code`
- `send_local_request`
- `compare_identities`
- `submit_finding`
- `patch_file`
- `run_visible_tests`
- `submit_fix`
- `noop`

Tools are phase-gated:

- `discover`: inspect policy/routes/files, run safe local requests, compare identities, submit finding.
- `patch`: read/search, patch editable app files, run visible tests, submit final fix.
- `done`: stable terminal observation only.
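
The phase gating above can be sketched as a lookup from phase to allowed tool names. This is an illustrative reading of the list, not the environment's actual table: placing `read_openapi`, `read_file`, and `search_code` in `discover`, and allowing `noop` in both active phases, are assumptions. `ALLOWED_TOOLS` and `is_allowed` are hypothetical names.

```python
# Assumed mapping from phase to permitted tools, derived from the phase list.
ALLOWED_TOOLS = {
    "discover": frozenset({
        "inspect_policy_graph", "list_routes", "read_openapi", "read_file",
        "search_code", "send_local_request", "compare_identities",
        "submit_finding", "noop",
    }),
    "patch": frozenset({
        "read_file", "search_code", "patch_file", "run_visible_tests",
        "submit_fix", "noop",
    }),
    # Terminal phase: no tool does any further work.
    "done": frozenset(),
}


def is_allowed(phase: str, tool_name: str) -> bool:
    """Return True if tool_name may be called in the given phase."""
    if phase not in ALLOWED_TOOLS:
        raise ValueError(f"unknown phase: {phase!r}")
    return tool_name in ALLOWED_TOOLS[phase]


print(is_allowed("discover", "patch_file"))  # False
print(is_allowed("patch", "patch_file"))     # True
```

Keeping the table as data rather than branching logic makes it easy to reject an out-of-phase action with a single membership check.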

## Reward

The terminal reward is reported as a stable set of components:

```python
{
    "discovery": 0.0,
    "security": 0.0,
    "regression": 0.0,
    "public_routes": 0.0,
    "patch_quality": 0.0,
    "visible_tests": 0.0,
    "safety": 0.0,
    "anti_cheat": 0.0,
    "total": 0.0,
}
```

The verifier rewards blocking the hidden exploit while preserving legitimate owner/admin behavior and intentionally public routes. It penalizes deny-all fixes, hardcoded IDs, hidden file probes, external URL attempts, and test/fixture tampering.
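
How `total` is derived from the other components is not specified here. The sketch below assumes a plain weighted sum with a default weight of 1.0, which is an illustrative choice only; `combine_reward` and the sample values are hypothetical.

```python
COMPONENT_KEYS = (
    "discovery", "security", "regression", "public_routes",
    "patch_quality", "visible_tests", "safety", "anti_cheat",
)


def combine_reward(components, weights=None):
    """Return all component scores plus 'total' as their weighted sum.

    Assumption: missing components count as 0.0 and missing weights as 1.0.
    """
    weights = weights or {}
    scored = {key: float(components.get(key, 0.0)) for key in COMPONENT_KEYS}
    scored["total"] = sum(weights.get(key, 1.0) * value for key, value in scored.items())
    return scored


# Hypothetical deny-all patch: the exploit is blocked, but legitimate
# owner/admin access breaks, so the regression penalty cancels the gain.
deny_all = combine_reward({"security": 1.0, "regression": -1.0})
print(deny_all["total"])  # 0.0
```

Under this assumed scheme a deny-all patch earns nothing overall, which matches the stated intent of penalizing fixes that block the exploit by breaking intended behavior.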

## Scenario Generation

`reset(seed)` compiles a fresh, isolated workspace under a temp directory. The MVP compiler generates:

- an invoices-domain policy graph;
- randomized users, tenants, invoices, and IDs;
- generated app files under `app/`;
- visible tests under `tests/test_visible.py`;
- hidden facts kept only in state for deterministic verification.

Additional domains and bug families are scaffolded for extension.
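
A minimal sketch of seed-driven generation in the spirit of the list above, assuming a private `random.Random(seed)` for the randomized entities and `tempfile.mkdtemp` for the isolated workspace. `build_scenario`, the ID formats, and the directory prefix are hypothetical; the real compiler also emits app files, tests, and a policy graph, which are omitted here.

```python
import random
import tempfile
from pathlib import Path


def build_scenario(seed: int) -> dict:
    """Generate reproducible entities in a fresh temp workspace (illustrative)."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched

    # rng.sample draws without replacement, so IDs within a kind are unique.
    tenants = [f"tenant-{n}" for n in rng.sample(range(1000, 10000), 2)]
    users = [
        {"id": f"user-{n}", "tenant": tenant}
        for n, tenant in zip(rng.sample(range(10000, 100000), 2), tenants)
    ]
    invoices = [
        {"id": f"inv-{n}", "owner": user["id"]}
        for n, user in zip(rng.sample(range(100000, 1000000), 2), users)
    ]

    # Each call gets its own directory, even when the seed repeats.
    workspace = Path(tempfile.mkdtemp(prefix="owasp_env_"))
    (workspace / "app").mkdir()
    (workspace / "tests").mkdir()
    return {"workspace": workspace, "tenants": tenants, "users": users, "invoices": invoices}


first, second = build_scenario(7), build_scenario(7)
print(first["invoices"] == second["invoices"])    # True: same seed, same data
print(first["workspace"] == second["workspace"])  # False: separate workspaces
```

Separating the seeded data from the workspace path keeps episodes reproducible while ensuring that no two episodes share files on disk.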

## Testing

```bash
uv run --extra dev pytest
```

The suite covers model serialization, reset/step/state behavior, seed reproducibility, invalid actions, reward outcomes, anti-cheat checks, and scripted rollout policies.

## Training Scaffold

Training files are under `training/`:

- `rollout.py`
- `reward_funcs.py`
- `train_grpo.py`
- `eval_before_after.py`
- `trackio_utils.py`
- `configs/grpo_small.yaml`

The training scaffold is intentionally minimal until the environment/verifier behavior is stable. Trackio metric names and GRPO defaults follow the project brief.

## Modal Ephemeral Runs

Modal Labs support is kept in a separate launcher script so the local OpenEnv server and core training scaffold stay unchanged.

Install the optional local Modal client:

```bash
uv sync --extra modal
```

Run a temporary Modal app for a cheap environment/training smoke check:

```bash
uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode smoke --episodes 4
```

The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/`.

You can also validate the GRPO config construction remotely:

```bash
uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode grpo-config
```

The shell wrapper is equivalent:

```bash
MODE=smoke EPISODES=4 uv run --extra modal bash scripts/modal_run_ephemeral.sh
```

## Docker / Spaces

```bash
# Docker image names must be lowercase, so the tag differs from the package name.
docker build -t cybersecurity_owasp:latest -f server/Dockerfile .
docker run --rm -p 8000:8000 cybersecurity_owasp:latest
openenv push --repo-id <username>/CyberSecurity_OWASP
```