Humanlearning's picture
feat: implement core RL training infrastructure, including GRPO training, evaluation utilities, custom environments, and Modal-based execution scripts.
3807ea3
|
raw
history blame
4.48 kB
---
title: CyberSecurity_OWASP Environment Server
emoji: 🛡️
colorFrom: blue
colorTo: gray
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
- cybersecurity
- owasp
---
# CyberSecurity_OWASP
`CyberSecurity_OWASP` is an OpenEnv-compliant reinforcement-learning environment for a single LLM agent that performs a defensive authorization-repair workflow:
```text
inspect generated app + policy -> discover authorization bug -> submit finding -> patch code -> preserve intended behavior
```
The current implementation includes a functional MVP scenario: an invoices FastAPI-style app with one injected OWASP A01 BOLA/IDOR defect, visible tests, hidden deterministic verifier checks, anti-cheat safeguards, and decomposed reward.
## Quick Start
```bash
uv sync --extra dev
uv run --extra dev pytest
uv run server --port 8000
```
Then connect with the OpenEnv client:
```python
from CyberSecurity_OWASP import CyberSecurityOWASPAction, CyberSecurityOWASPEnv
with CyberSecurityOWASPEnv(base_url="http://localhost:8000") as env:
result = env.reset(seed=7)
print(result.observation.task_brief)
result = env.step(CyberSecurityOWASPAction(tool_name="list_routes"))
print(result.observation.last_tool_result)
```
## Action Space
The agent emits one JSON action at a time:
```json
{"tool_name":"read_file","arguments":{"path":"app/routes/invoices.py"}}
```
Supported tools:
- `inspect_policy_graph`
- `list_routes`
- `read_openapi`
- `read_file`
- `search_code`
- `send_local_request`
- `compare_identities`
- `submit_finding`
- `patch_file`
- `run_visible_tests`
- `submit_fix`
- `noop`
Tools are phase-gated:
- `discover`: inspect policy/routes/files, run safe local requests, compare identities, submit finding.
- `patch`: read/search, patch editable app files, run visible tests, submit final fix.
- `done`: stable terminal observation only.
## Reward
Terminal reward uses stable components:
```python
{
"discovery": 0.0,
"security": 0.0,
"regression": 0.0,
"public_routes": 0.0,
"patch_quality": 0.0,
"visible_tests": 0.0,
"safety": 0.0,
"anti_cheat": 0.0,
"total": 0.0,
}
```
The verifier rewards blocking the hidden exploit while preserving legitimate owner/admin behavior and intentionally public routes. It penalizes deny-all fixes, hardcoded IDs, hidden file probes, external URL attempts, and test/fixture tampering.
## Scenario Generation
`reset(seed)` compiles a fresh isolated workspace under a temp directory. The MVP compiler generates:
- invoices domain policy graph;
- randomized users, tenants, invoices, and IDs;
- generated app files under `app/`;
- visible tests under `tests/test_visible.py`;
- hidden facts kept only in state for deterministic verification.
Additional domains and bug families are scaffolded for extension.
## Testing
```bash
uv run --extra dev pytest
```
The suite covers model serialization, reset/step/state behavior, seed reproducibility, invalid actions, reward outcomes, anti-cheat checks, and scripted rollout policies.
## Training Scaffold
Training files are under `training/`:
- `rollout.py`
- `reward_funcs.py`
- `train_grpo.py`
- `eval_before_after.py`
- `trackio_utils.py`
- `configs/grpo_small.yaml`
The training scaffold is intentionally minimal until the environment/verifier behavior is stable. Trackio metric names and GRPO defaults follow the project brief.
## Modal Ephemeral Runs
Modal Labs support is kept in a separate launcher script so the local OpenEnv server and core training scaffold stay unchanged.
Install the optional local Modal client:
```bash
uv sync --extra modal
```
Run a temporary Modal app for a cheap environment/training smoke check:
```bash
uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode smoke --episodes 4
```
The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/`.
You can also validate the GRPO config construction remotely:
```bash
uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode grpo-config
```
The shell wrapper is equivalent:
```bash
MODE=smoke EPISODES=4 uv run --extra modal bash scripts/modal_run_ephemeral.sh
```
## Docker / Spaces
```bash
docker build -t CyberSecurity_OWASP:latest -f server/Dockerfile .
docker run --rm -p 8000:8000 CyberSecurity_OWASP:latest
openenv push --repo-id <username>/CyberSecurity_OWASP
```