
	---
	title: CyberSecurity_OWASP Environment Server
	emoji: 🛡️
	colorFrom: blue
	colorTo: gray
	sdk: docker
	pinned: false
	app_port: 8000
	base_path: /web
	tags:
	- openenv
	- cybersecurity
	- owasp
	---

	# CyberSecurity_OWASP

	`CyberSecurity_OWASP` is an OpenEnv-compliant reinforcement-learning environment for a single LLM agent that performs a defensive authorization-repair workflow:

	```text
	inspect generated app + policy -> discover authorization bug -> submit finding -> patch code -> preserve intended behavior
	```

	The current implementation includes a functional MVP scenario: a FastAPI-style invoices app with one injected OWASP A01 BOLA/IDOR defect, visible tests, hidden deterministic verifier checks, anti-cheat safeguards, and a decomposed reward.

	## Quick Start

	```bash
	uv sync --extra dev
	uv run --extra dev pytest
	uv run server --port 8000
	```

	Then connect with the OpenEnv client:

	```python
	from CyberSecurity_OWASP import CyberSecurityOWASPAction, CyberSecurityOWASPEnv

	with CyberSecurityOWASPEnv(base_url="http://localhost:8000") as env:
	    result = env.reset(seed=7)
	    print(result.observation.task_brief)
	    result = env.step(CyberSecurityOWASPAction(tool_name="list_routes"))
	    print(result.observation.last_tool_result)
	```

	## Action Space

	The agent emits one JSON action at a time:

	```json
	{"tool_name":"read_file","arguments":{"path":"app/routes/invoices.py"}}
	```
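
A model's raw output is not guaranteed to be well-formed, so it can help to validate the JSON before handing it to the environment. The following is a minimal sketch, not the environment's own parser: `parse_action` is a hypothetical helper, and treating `arguments` as optional is an assumption made for illustration.

```python
import json


def parse_action(raw: str) -> tuple[str, dict]:
    """Parse one JSON action string into (tool_name, arguments).

    Hypothetical helper for illustration; the real environment may
    validate actions differently.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"action is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError("action must be a JSON object")

    tool_name = payload.get("tool_name")
    if not isinstance(tool_name, str) or not tool_name:
        raise ValueError("action needs a non-empty string 'tool_name'")

    # Assumption: 'arguments' may be omitted and then defaults to {}.
    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("'arguments' must be a JSON object")

    return tool_name, arguments
```

Rejecting malformed output with a `ValueError` keeps the failure on the caller's side, where it can be logged or turned into a `noop` instead of reaching the server.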

	Supported tools:

	- `inspect_policy_graph`
	- `list_routes`
	- `read_openapi`
	- `read_file`
	- `search_code`
	- `send_local_request`
	- `compare_identities`
	- `submit_finding`
	- `patch_file`
	- `run_visible_tests`
	- `submit_fix`
	- `noop`

	Tools are phase-gated:

	- `discover`: inspect policy/routes/files, run safe local requests, compare identities, submit finding.
	- `patch`: read/search, patch editable app files, run visible tests, submit final fix.
	- `done`: stable terminal observation only.
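
The gating above can be pictured as a lookup from phase to permitted tools. This is a sketch under stated assumptions: the exact tool sets per phase, the availability of `noop`, and the transitions (`submit_finding` moves to `patch`, `submit_fix` moves to `done`) are inferred from the descriptions above and are not confirmed by the environment's code. `PHASE_TOOLS`, `is_allowed`, and `next_phase` are hypothetical names.

```python
# Hypothetical phase -> tool mapping inferred from the phase descriptions.
# The real environment may gate tools differently.
PHASE_TOOLS = {
    "discover": {
        "inspect_policy_graph", "list_routes", "read_openapi", "read_file",
        "search_code", "send_local_request", "compare_identities",
        "submit_finding", "noop",
    },
    "patch": {
        "read_file", "search_code", "patch_file", "run_visible_tests",
        "submit_fix", "noop",
    },
    # Terminal phase: no tool changes the observation any more.
    "done": set(),
}


def is_allowed(phase: str, tool_name: str) -> bool:
    """Return True if the tool may be called in the given phase."""
    return tool_name in PHASE_TOOLS.get(phase, set())


def next_phase(phase: str, tool_name: str) -> str:
    """Assumed transitions: a finding opens patching, a fix ends the episode."""
    if phase == "discover" and tool_name == "submit_finding":
        return "patch"
    if phase == "patch" and tool_name == "submit_fix":
        return "done"
    return phase
```

For example, `is_allowed("discover", "patch_file")` is `False` in this sketch, which mirrors the idea that the agent must submit a finding before it can edit code.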

	## Reward

	The terminal reward is reported as a fixed set of named components:

	```python
	{
	    "discovery": 0.0,
	    "security": 0.0,
	    "regression": 0.0,
	    "public_routes": 0.0,
	    "patch_quality": 0.0,
	    "visible_tests": 0.0,
	    "safety": 0.0,
	    "anti_cheat": 0.0,
	    "total": 0.0,
	}
	```

	The verifier rewards blocking the hidden exploit while preserving legitimate owner/admin behavior and intentionally public routes. It penalizes deny-all fixes, hardcoded IDs, hidden file probes, external URL attempts, and test/fixture tampering.
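
One way to read the decomposed reward is as a weighted sum over the named components. The sketch below only illustrates that idea: the equal weights, the sign convention (penalties passed as negative values), and the names `WEIGHTS` and `finalize_reward` are assumptions, not the verifier's actual formula.

```python
COMPONENTS = (
    "discovery", "security", "regression", "public_routes",
    "patch_quality", "visible_tests", "safety", "anti_cheat",
)

# Hypothetical weights: equal weighting is an assumption for illustration.
WEIGHTS = {name: 1.0 for name in COMPONENTS}


def finalize_reward(components: dict, weights: dict = WEIGHTS) -> dict:
    """Fill in missing components with 0.0 and compute a weighted total."""
    unknown = set(components) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown reward components: {sorted(unknown)}")

    # Always emit every key so downstream logging sees the same schema.
    reward = {name: float(components.get(name, 0.0)) for name in COMPONENTS}
    reward["total"] = sum(weights[name] * reward[name] for name in COMPONENTS)
    return reward
```

Emitting every key on every episode, even when a component is zero, keeps metric names consistent across runs.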

	## Scenario Generation

	`reset(seed)` compiles a fresh isolated workspace under a temp directory. The MVP compiler generates:

	- invoices domain policy graph;
	- randomized users, tenants, invoices, and IDs;
	- generated app files under `app/`;
	- visible tests under `tests/test_visible.py`;
	- hidden facts kept only in state for deterministic verification.

	Additional domains and bug families are scaffolded for extension.
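
Seed-driven generation of this kind usually relies on a private random number generator, so the same seed yields the same entities while each reset still gets its own directory. The following sketch shows that pattern only. `generate_scenario`, the ID formats, and the workspace prefix are hypothetical and do not reflect the MVP compiler's real output.

```python
import random
import tempfile
from pathlib import Path


def generate_scenario(seed: int) -> dict:
    """Build a toy invoices scenario in a fresh temp workspace.

    Hypothetical sketch: entity shapes and ID formats are illustrative.
    """
    # A private RNG keeps results reproducible without touching global state.
    rng = random.Random(seed)

    # sample() guarantees distinct IDs within each entity type.
    tenant_ids = [f"tenant-{n}" for n in rng.sample(range(1000, 10000), 2)]
    users = [
        {"id": f"user-{n}", "tenant": tenant}
        for n, tenant in zip(rng.sample(range(1000, 10000), 2), tenant_ids)
    ]
    invoices = [
        {"id": f"inv-{n}", "owner": user["id"], "tenant": user["tenant"]}
        for n, user in zip(rng.sample(range(10000, 100000), 2), users)
    ]

    # Each call gets its own isolated directory, even for a repeated seed.
    workspace = Path(tempfile.mkdtemp(prefix="owasp_env_"))
    (workspace / "app").mkdir()
    (workspace / "tests").mkdir()

    return {
        "workspace": workspace,
        "tenants": tenant_ids,
        "users": users,
        "invoices": invoices,
    }
```

Calling `generate_scenario(7)` twice in this sketch returns identical users and invoices in two different workspace directories, which is the property a seed reproducibility test would check.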

	## Testing

	```bash
	uv run --extra dev pytest
	```

	The suite covers model serialization, reset/step/state behavior, seed reproducibility, invalid actions, reward outcomes, anti-cheat checks, and scripted rollout policies.

	## Training Scaffold

	Training files are under `training/`:

	- `rollout.py`
	- `reward_funcs.py`
	- `train_grpo.py`
	- `eval_before_after.py`
	- `trackio_utils.py`
	- `configs/grpo_small.yaml`

	The training scaffold is intentionally minimal until the environment/verifier behavior is stable. Trackio metric names and GRPO defaults follow the project brief.

	## Modal Ephemeral Runs

	Modal Labs support is kept in a separate launcher script so the local OpenEnv server and core training scaffold stay unchanged.

	Install the optional local Modal client:

	```bash
	uv sync --extra modal
	```

	Run a temporary Modal app for a cheap environment/training smoke check:

	```bash
	uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode smoke --episodes 4
	```

	The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/`.

	You can also validate the GRPO config construction remotely:

	```bash
	uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode grpo-config
	```

	The shell wrapper is equivalent:

	```bash
	MODE=smoke EPISODES=4 uv run --extra modal bash scripts/modal_run_ephemeral.sh
	```

	## Docker / Spaces

	```bash
	# Docker image names must be lowercase, so the local tag differs from the repo name.
	docker build -t cybersecurity_owasp:latest -f server/Dockerfile .
	docker run --rm -p 8000:8000 cybersecurity_owasp:latest
	openenv push --repo-id <username>/CyberSecurity_OWASP
	```