---
title: CyberSecurity_OWASP Environment Server
emoji: 🛡️
colorFrom: blue
colorTo: gray
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
- cybersecurity
- owasp
---
# CyberSecurity_OWASP
`CyberSecurity_OWASP` is an OpenEnv-compliant reinforcement-learning environment for a single LLM agent that performs a defensive authorization-repair workflow:
```text
inspect generated app + policy -> discover authorization bug -> submit finding -> patch code -> preserve intended behavior
```
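The current implementation includes a functional MVP scenario: an invoices FastAPI-style app with one injected OWASP A01 (Broken Access Control) defect of the BOLA/IDOR kind, where an object is served by ID without checking that the caller may access it. The scenario also includes visible tests, hidden deterministic verifier checks, anti-cheat safeguards, and a decomposed reward.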
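For readers unfamiliar with the bug class, here is a minimal sketch of a BOLA/IDOR defect and one possible repair, in plain Python. It is not the generated app's code: `User`, `Invoice`, `INVOICES`, and both handler functions are hypothetical names chosen for illustration, and the tenant-scoped admin rule is an assumption.
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: str
    tenant_id: str
    role: str  # "member" or "admin" (assumed roles)


@dataclass(frozen=True)
class Invoice:
    id: str
    owner_id: str
    tenant_id: str


# Hypothetical in-memory store standing in for the app's data layer.
INVOICES = {"inv-1": Invoice(id="inv-1", owner_id="alice", tenant_id="t1")}


def get_invoice_vulnerable(user: User, invoice_id: str) -> Optional[Invoice]:
    # BOLA/IDOR: the object is looked up by ID alone, so any
    # authenticated user can read any invoice.
    return INVOICES.get(invoice_id)


def get_invoice_fixed(user: User, invoice_id: str) -> Optional[Invoice]:
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return None
    is_owner = invoice.owner_id == user.id
    # Assumption: admins may read invoices only within their own tenant.
    is_tenant_admin = user.role == "admin" and invoice.tenant_id == user.tenant_id
    return invoice if (is_owner or is_tenant_admin) else None
```
The fixed handler keeps owner and same-tenant admin access working while refusing everyone else, which mirrors the "preserve intended behavior" step of the workflow.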
## Quick Start
```bash
uv sync --extra dev
uv run --extra dev pytest
uv run server --port 8000
```
Then connect with the OpenEnv client:
```python
from CyberSecurity_OWASP import CyberSecurityOWASPAction, CyberSecurityOWASPEnv

# Connect to the locally running server started above.
with CyberSecurityOWASPEnv(base_url="http://localhost:8000") as env:
    # Start a new episode; the seed selects the generated scenario.
    result = env.reset(seed=7)
    print(result.observation.task_brief)

    # Take one step: list the app's routes.
    result = env.step(CyberSecurityOWASPAction(tool_name="list_routes"))
    print(result.observation.last_tool_result)
```
## Action Space
The agent emits one JSON action at a time:
```json
{"tool_name":"read_file","arguments":{"path":"app/routes/invoices.py"}}
```
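As a rough illustration of this wire format, an action can be built and checked before it is sent. `build_action` and `SUPPORTED_TOOLS` are hypothetical helpers, not part of the package; the tool names are the ones listed in this section, and whether the server accepts an empty `arguments` object for argument-free tools is an assumption.
```python
import json

# Tool names as listed in this README.
SUPPORTED_TOOLS = frozenset({
    "inspect_policy_graph", "list_routes", "read_openapi", "read_file",
    "search_code", "send_local_request", "compare_identities",
    "submit_finding", "patch_file", "run_visible_tests", "submit_fix", "noop",
})


def build_action(tool_name: str, **arguments) -> str:
    """Serialize one action in the {"tool_name", "arguments"} shape."""
    if tool_name not in SUPPORTED_TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    payload = {"tool_name": tool_name, "arguments": arguments}
    # Compact separators reproduce the single-line form shown above.
    return json.dumps(payload, separators=(",", ":"))


print(build_action("read_file", path="app/routes/invoices.py"))
```
Rejecting unknown tool names on the client side is only a convenience; the environment's own handling of invalid actions remains authoritative.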
Supported tools:
- `inspect_policy_graph`
- `list_routes`
- `read_openapi`
- `read_file`
- `search_code`
- `send_local_request`
- `compare_identities`
- `submit_finding`
- `patch_file`
- `run_visible_tests`
- `submit_fix`
- `noop`
Tools are phase-gated:
- `discover`: inspect policy/routes/files, run safe local requests, compare identities, submit finding.
- `patch`: read/search, patch editable app files, run visible tests, submit final fix.
- `done`: stable terminal observation only.
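The gating above could be modeled as a lookup from phase to allowed tools. This is a sketch only: `PHASE_TOOLS` and `is_allowed` are hypothetical, the mapping is inferred from the phase descriptions rather than taken from the server code, and `noop` is left out because its phase is not stated here.
```python
# Assumed mapping, inferred from the phase descriptions in this README;
# the real environment may gate tools differently.
PHASE_TOOLS = {
    "discover": frozenset({
        "inspect_policy_graph", "list_routes", "read_openapi", "read_file",
        "search_code", "send_local_request", "compare_identities",
        "submit_finding",
    }),
    "patch": frozenset({
        "read_file", "search_code", "patch_file", "run_visible_tests",
        "submit_fix",
    }),
    # Terminal phase: no tool calls are expected to change anything.
    "done": frozenset(),
}


def is_allowed(phase: str, tool_name: str) -> bool:
    """Return True if `tool_name` may be called in `phase` under the assumed mapping."""
    return tool_name in PHASE_TOOLS.get(phase, frozenset())
```
An agent-side check like this can avoid spending steps on calls the current phase would reject.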
## Reward
The terminal reward is broken down into a fixed set of named components:
```python
{
    "discovery": 0.0,
    "security": 0.0,
    "regression": 0.0,
    "public_routes": 0.0,
    "patch_quality": 0.0,
    "visible_tests": 0.0,
    "safety": 0.0,
    "anti_cheat": 0.0,
    "total": 0.0,
}
```
The verifier rewards blocking the hidden exploit while preserving legitimate owner/admin behavior and intentionally public routes. It penalizes deny-all fixes, hardcoded IDs, hidden file probes, external URL attempts, and test/fixture tampering.
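To see why a deny-all fix is penalized, consider the toy check below. It is only a sketch: `score_patch`, the three candidate handlers, and the 0/1 scoring are hypothetical, and the real verifier's checks and weights are not described in this README.
```python
from typing import Callable, Dict

# Toy data: invoice "inv-1" belongs to "alice"; "mallory" is another user.
OWNERS = {"inv-1": "alice"}

# (user_id, invoice_id) -> True if access is granted.
Handler = Callable[[str, str], bool]


def no_check(user_id: str, invoice_id: str) -> bool:
    # The injected defect: any user may read any existing invoice.
    return invoice_id in OWNERS


def deny_all(user_id: str, invoice_id: str) -> bool:
    # Blocks the exploit, but also blocks the legitimate owner.
    return False


def owner_only(user_id: str, invoice_id: str) -> bool:
    # Grants access only to the invoice's owner.
    return OWNERS.get(invoice_id) == user_id


def score_patch(handler: Handler) -> Dict[str, float]:
    """Score a handler on two of the components (hypothetical 0/1 scoring)."""
    return {
        "security": 0.0 if handler("mallory", "inv-1") else 1.0,
        "regression": 1.0 if handler("alice", "inv-1") else 0.0,
    }
```
Under these assumptions only `owner_only` scores 1.0 on both components; `deny_all` earns `security` but loses `regression`, and `no_check` does the reverse.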
## Scenario Generation
`reset(seed)` compiles a fresh isolated workspace under a temp directory. The MVP compiler generates:
- invoices domain policy graph;
- randomized users, tenants, invoices, and IDs;
- generated app files under `app/`;
- visible tests under `tests/test_visible.py`;
- hidden facts kept only in state for deterministic verification.
Additional domains and bug families are scaffolded for extension.
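Seed-driven generation of this kind is commonly built on a local random number generator. The sketch below shows the idea, not the MVP compiler itself: `generate_scenario`, its parameters, and the ID formats are all hypothetical.
```python
import random
from typing import Dict, List


def generate_scenario(seed: int, n_users: int = 3, n_invoices: int = 5) -> Dict[str, List[dict]]:
    """Build a toy invoices scenario deterministically from `seed`."""
    # A local RNG keeps the result reproducible and leaves global state alone.
    rng = random.Random(seed)
    tenants = [f"tenant-{rng.randrange(10**6):06d}" for _ in range(2)]
    users = [
        {"id": f"user-{rng.randrange(10**6):06d}", "tenant_id": rng.choice(tenants)}
        for _ in range(n_users)
    ]
    invoices = []
    for _ in range(n_invoices):
        owner = rng.choice(users)
        invoices.append({
            "id": f"inv-{rng.randrange(10**6):06d}",
            "owner_id": owner["id"],
            # Each invoice inherits its owner's tenant.
            "tenant_id": owner["tenant_id"],
        })
    return {
        "tenants": [{"id": t} for t in tenants],
        "users": users,
        "invoices": invoices,
    }
```
Calling `generate_scenario(7)` twice returns equal data, which is the property that makes hidden verification deterministic for a given seed.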
## Testing
```bash
uv run --extra dev pytest
```
The suite covers model serialization, reset/step/state behavior, seed reproducibility, invalid actions, reward outcomes, anti-cheat checks, and scripted rollout policies.
## Training Scaffold
Training files are under `training/`:
- `rollout.py`
- `reward_funcs.py`
- `train_grpo.py`
- `eval_before_after.py`
- `trackio_utils.py`
- `configs/grpo_small.yaml`
The training scaffold is intentionally minimal until the environment/verifier behavior is stable. Trackio metric names and GRPO defaults follow the project brief.
## Modal Ephemeral Runs
Modal Labs support is kept in a separate launcher script so the local OpenEnv server and core training scaffold stay unchanged.
Install the optional local Modal client:
```bash
uv sync --extra modal
```
Run a temporary Modal app for a cheap environment/training smoke check:
```bash
uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode smoke --episodes 4
```
The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/`.
You can also validate the GRPO config construction remotely:
```bash
uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode grpo-config
```
The shell wrapper is equivalent:
```bash
MODE=smoke EPISODES=4 uv run --extra modal bash scripts/modal_run_ephemeral.sh
```
## Docker / Spaces
```bash
docker build -t cybersecurity_owasp:latest -f server/Dockerfile .
docker run --rm -p 8000:8000 cybersecurity_owasp:latest
openenv push --repo-id <username>/CyberSecurity_OWASP
```