# 01_ARCHITECTURE.md

# CyberSecurity_OWASP: Architecture

## 1. System goal

`CyberSecurity_OWASP` is an OpenEnv environment for training a **single LLM policy** to perform a complete defensive authorization-repair workflow:

```text
Understand policy → discover local evidence → patch code → validate → submit
```

The environment is intentionally not a two-agent red-team/blue-team setup. The agent is one model with one trajectory. It must learn both sides of the defensive workflow: finding the policy violation and fixing it safely.

## 2. Final architecture diagram

```mermaid
flowchart TB
    %% =========================
    %% Offline Build Layer
    %% =========================
    subgraph A[Offline Scenario Factory]
        A1[Policy Graph Generator\nroles, users, tenants, ownership, route intent]
        A2[App Template Library\nFastAPI, Express, Django MVP templates]
        A3[Bug Injector\nmissing guard, IDOR, tenant leak, role confusion, query omission]
        A4[Scenario Compiler\nmaterializes app + DB + public tests + hidden invariants]
        A5[Split Manager\ntrain seeds, validation seeds, hidden held-out seeds]
        A1 --> A4
        A2 --> A4
        A3 --> A4
        A5 --> A4
    end
    %% =========================
    %% OpenEnv Runtime
    %% =========================
    subgraph B[CyberSecurity_OWASP OpenEnv Server]
        B1["reset()\nselect scenario + start sandbox"]
        B2[Sandbox App Runtime\nlocal app, DB fixture, logs, route map]
        B3["Tool API exposed through step(action)\nReadFile, ListRoutes, SendLocalRequest, RunTests, ApplyPatch, SubmitFix"]
        B4[State Store\nepisode_id, step_count, scenario_id, patch diff, test history]
        B5[Deterministic Reward Engine\npolicy tests + hidden tests + regression tests + penalties]
        B6["state()\nstructured metadata for debugging/eval"]
        B1 --> B2
        B2 --> B3
        B3 --> B4
        B4 --> B5
        B4 --> B6
    end
    %% =========================
    %% Agent + Training
    %% =========================
    subgraph C[Single LLM Agent]
        C1[Observation Parser]
        C2[Planner\npolicy reasoning + patch strategy]
        C3[Action Generator\nchooses next OpenEnv action]
        C1 --> C2 --> C3
    end
    subgraph D[Training + Evaluation]
        D1[Rollout Loop\nreset → step* → final reward]
        D2[GRPO / TRL / Unsloth Training]
        D3[Trackio Metrics\nreward curves, pass rates, patch size, steps]
        D4[Held-out Eval Suite\nunseen templates, seeds, names, route structures]
        D5[Demo Artifacts\nbefore/after traces, mini-blog, 2-minute video]
        D1 --> D2 --> D3
        D3 --> D4 --> D5
    end
    A4 --> B1
    C3 -->|typed action| B3
    B3 -->|observation + reward + done| C1
    B5 --> D1
    D2 --> C1
    B5 --> D4
```

## 3. Component responsibilities

### 3.1 Scenario Factory

The scenario factory generates many small but realistic web apps from a structured authorization policy.

It should output:

- application code;
- route map;
- database fixture;
- user/session/token fixtures;
- policy graph;
- intentionally injected access-control bug;
- public tests visible to the agent;
- hidden tests invisible to the agent;
- metadata for eval and debugging.

The scenario compiler is the main anti-overfitting mechanism. It should vary:

- route names;
- schema names;
- ORM query structure;
- framework template;
- role names;
- tenant IDs;
- object ownership patterns;
- file layout;
- visible test coverage;
- hidden invariant seeds.
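
One way to picture this seed-driven variation is a sampler that derives every surface detail from a single seed, so that a scenario can be regenerated exactly. The sketch below is illustrative only: `ScenarioSpec`, `sample_scenario`, and the small value pools are hypothetical names and placeholders, not part of the project. The framework and bug-class names are taken from the architecture diagram.

```python
import random
from dataclasses import dataclass

# Hypothetical value pools; a real compiler would draw from much larger sets.
FRAMEWORKS = ["fastapi", "express", "django"]
RESOURCE_NAMES = ["invoice", "ticket", "document", "order"]
ROLE_SETS = [("user", "support", "admin"), ("member", "editor", "owner")]
BUG_CLASSES = ["missing_guard", "idor", "tenant_leak", "role_confusion", "query_omission"]


@dataclass(frozen=True)
class ScenarioSpec:
    seed: int
    framework: str
    resource: str
    roles: tuple
    bug_class: str
    route_prefix: str
    tenant_ids: tuple


def sample_scenario(seed: int) -> ScenarioSpec:
    """Derive all surface details from one seed so the scenario is reproducible."""
    rng = random.Random(seed)  # local RNG, independent of global random state
    resource = rng.choice(RESOURCE_NAMES)
    return ScenarioSpec(
        seed=seed,
        framework=rng.choice(FRAMEWORKS),
        resource=resource,
        roles=rng.choice(ROLE_SETS),
        bug_class=rng.choice(BUG_CLASSES),
        route_prefix=f"/api/v{rng.randint(1, 3)}/{resource}s",
        # Two distinct tenant IDs so cross-tenant cases can be constructed.
        tenant_ids=tuple(f"tenant_{n:04d}" for n in rng.sample(range(10_000), 2)),
    )
```

Because the sampler uses its own `random.Random(seed)` instance, the same seed always yields the same spec, which keeps failures reproducible during debugging.
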

### 3.2 Policy Graph Generator

The policy graph is the ground truth for intended behavior.

Example internal representation:

```yaml
resources:
  invoice:
    owner_field: owner_user_id
    tenant_field: tenant_id
roles:
  user:
    can:
      - read:invoice where owner_user_id == actor.user_id
      - update:invoice where owner_user_id == actor.user_id and status != locked
  support:
    can:
      - read:invoice where tenant_id == actor.tenant_id
  admin:
    can:
      - read:any_invoice where tenant_id == actor.tenant_id
      - update:any_invoice where tenant_id == actor.tenant_id
public_routes:
  - GET /health
  - GET /pricing
forbidden:
  - cross_tenant_read
  - cross_tenant_update
  - user_reads_other_user_invoice
```

The policy graph prevents false rewards for over-securing intentionally public or intentionally allowed routes.
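
To make the example concrete, the rules above can be mirrored by a small deny-by-default check. This is a hand-translated sketch of the example policy only, not a parser for the `where` expressions; `Actor`, `Invoice`, `is_allowed`, and `is_public` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Routes the example policy marks as intentionally public.
PUBLIC_ROUTES = {("GET", "/health"), ("GET", "/pricing")}


@dataclass(frozen=True)
class Actor:
    user_id: str
    tenant_id: str
    role: str


@dataclass(frozen=True)
class Invoice:
    owner_user_id: str
    tenant_id: str
    status: str = "open"


def is_public(method: str, path: str) -> bool:
    """True for routes that must stay reachable without authorization."""
    return (method, path) in PUBLIC_ROUTES


def is_allowed(actor: Actor, action: str, invoice: Invoice) -> bool:
    """Hand-coded mirror of the example policy; anything unlisted is denied."""
    same_tenant = invoice.tenant_id == actor.tenant_id
    is_owner = invoice.owner_user_id == actor.user_id
    if actor.role == "user":
        if action == "read":
            return is_owner
        if action == "update":
            return is_owner and invoice.status != "locked"
    if actor.role == "support":
        return action == "read" and same_tenant
    if actor.role == "admin":
        return action in {"read", "update"} and same_tenant
    return False
```

A hidden invariant such as `user_reads_other_user_invoice` then reduces to asserting that `is_allowed` returns `False` for a user reading an invoice owned by someone else, while `is_public("GET", "/pricing")` must remain `True` after any patch.
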

### 3.3 Bug Injector

The bug injector creates controlled, defensive lab scenarios. It should only generate bugs inside local synthetic apps.

MVP bug classes:

| Bug class | Example failure mode | Expected fix type |
|---|---|---|
| Missing route guard | Protected endpoint lacks authorization middleware | Add policy check/middleware |
| IDOR / ownership bug | User can access another user’s object by changing ID | Add owner check in query/policy |
| Tenant leak | Tenant A can list Tenant B records | Add tenant filter |
| Role confusion | Support/editor/admin boundary is wrong | Correct role-to-permission mapping |
| Client-side-only auth | Server trusts UI to hide forbidden action | Enforce server-side authorization |
| Query omission | List/export/search endpoint lacks auth filter | Filter query by actor permissions |
| Over-broad mutation | User can update/delete forbidden object | Add mutation permission check |
| Public route decoy | Agent may wrongly lock down intended public endpoint | Preserve intended public behavior |
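
As an illustration of the IDOR / ownership row, the sketch below contrasts an injected bug with the expected fix. It uses an in-memory dictionary in place of a database and a plain exception in place of an HTTP 403; all names (`INVOICES`, `Forbidden`, the two handler functions) are hypothetical and framework-agnostic.

```python
# Toy fixture standing in for a database table.
INVOICES = {
    1: {"id": 1, "owner_user_id": "alice", "tenant_id": "t1"},
    2: {"id": 2, "owner_user_id": "bob", "tenant_id": "t1"},
}


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response."""


def get_invoice_vulnerable(actor_id: str, invoice_id: int) -> dict:
    # Injected bug: lookup by ID only, so any caller can read any invoice.
    return INVOICES[invoice_id]


def get_invoice_fixed(actor_id: str, invoice_id: int) -> dict:
    invoice = INVOICES.get(invoice_id)
    # Expected fix: enforce ownership on the server before returning the object.
    if invoice is None or invoice["owner_user_id"] != actor_id:
        raise Forbidden(f"{actor_id} may not read invoice {invoice_id}")
    return invoice
```

Note that the fixed handler still serves the owner's own invoice; a patch that simply rejected every request would close the hole but fail regression tests.
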

### 3.4 OpenEnv Server

The OpenEnv server should implement the standard lifecycle:

- `reset()`: initialize a fresh scenario instance.
- `step(action)`: execute one typed action and return observation, reward, and done.
- `state()`: expose episode metadata for debugging and evaluation.

Recommended package/class names:

```text
Repo name:      CyberSecurity_OWASP
Python package: cybersecurity_owasp
Client class:   CyberSecurityOWASPEnv
Action class:   CyberSecurityOWASPAction
Observation:    CyberSecurityOWASPObservation
State:          CyberSecurityOWASPState
```

### 3.5 Tool API

The agent should interact through typed actions. Keep the interface small enough for RL but expressive enough for realistic repair.

```python
from dataclasses import dataclass
from typing import Literal

# `Action` is the OpenEnv base action type; import it from the installed OpenEnv package.


@dataclass
class CyberSecurityOWASPAction(Action):
    action_type: Literal[
        "read_file",
        "list_files",
        "list_routes",
        "inspect_policy",
        "send_local_request",
        "run_public_tests",
        "apply_patch",
        "submit_fix",
    ]
    arguments: dict
```

Recommended actions:

| Action | Purpose | Safety boundary |
|---|---|---|
| `inspect_policy` | Read intended authorization rules. | Only synthetic policy. |
| `list_routes` | See local app route map. | No internet target. |
| `read_file` | Inspect selected source file. | Sandbox allowlist only. |
| `send_local_request` | Validate behavior against local app. | Local generated app only. |
| `run_public_tests` | Run visible tests. | No hidden test disclosure. |
| `apply_patch` | Modify source through unified diff. | Patch size and file allowlist limits. |
| `submit_fix` | End episode and trigger hidden eval. | Final hidden score only, no leaked test details. |
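
Before dispatching an action to a tool, the server would likely want to reject malformed ones early. The sketch below shows one possible validation step. The argument names in `REQUIRED_ARGS` (`path`, `method`, `diff`) are assumptions, since the source does not specify argument schemas, and `ToolAction` is a simplified stand-in for `CyberSecurityOWASPAction` so the example runs without OpenEnv installed.

```python
from dataclasses import dataclass, field

# Assumed argument names per action type; the architecture does not define them.
REQUIRED_ARGS = {
    "read_file": {"path"},
    "list_files": set(),
    "list_routes": set(),
    "inspect_policy": set(),
    "send_local_request": {"method", "path"},
    "run_public_tests": set(),
    "apply_patch": {"diff"},
    "submit_fix": set(),
}


@dataclass
class ToolAction:  # simplified stand-in for CyberSecurityOWASPAction
    action_type: str
    arguments: dict = field(default_factory=dict)


def validate_action(action: ToolAction) -> list[str]:
    """Return a list of problems; an empty list means the action is well-formed."""
    if action.action_type not in REQUIRED_ARGS:
        return [f"unknown action_type: {action.action_type!r}"]
    missing = REQUIRED_ARGS[action.action_type] - set(action.arguments)
    return [f"missing argument: {name}" for name in sorted(missing)]
```

Returning a list of problems rather than raising lets the environment turn an invalid action into an ordinary observation, so a malformed step costs the agent a turn without ending the episode.
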

### 3.6 Observation schema

Observations should be compact and structured.

```python
from dataclasses import dataclass

# `Observation` is the OpenEnv base observation type.


@dataclass
class CyberSecurityOWASPObservation(Observation):
    message: str
    visible_policy_summary: str
    route_summary: list[dict]
    last_action_result: dict
    public_test_summary: dict
    patch_summary: dict
    done_reason: str | None = None
```

Do not expose hidden test bodies, hidden expected outputs, or seed-specific solution hints.
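
One simple way to honor this rule is to collapse hidden test results into aggregate counts before anything reaches an observation. The sketch below assumes a per-test result shape of `{"name", "passed", "expected", "actual"}`, which is hypothetical; only the two counters, which match fields in the state schema, survive.

```python
def summarize_hidden_results(results: list[dict]) -> dict:
    """Collapse per-test hidden results into aggregate counts only.

    Each input dict is assumed to look like
    {"name": str, "passed": bool, "expected": ..., "actual": ...}.
    """
    total = len(results)
    passed = sum(1 for r in results if r["passed"])
    # Deliberately drop test names, expected values, and actual values.
    return {"hidden_tests_total": total, "hidden_tests_passed": passed}
```
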

### 3.7 State schema

State should support debugging and training analytics.

```python
from dataclasses import dataclass, field
from typing import Literal

# `State` is the OpenEnv base state type.


@dataclass
class CyberSecurityOWASPState(State):
    episode_id: str
    scenario_id: str
    split: Literal["train", "validation", "heldout"]
    step_count: int = 0
    max_steps: int = 30
    scenario_family: str = ""
    app_template: str = ""
    files_touched: list[str] = field(default_factory=list)
    public_tests_passed: int = 0
    public_tests_total: int = 0
    hidden_tests_passed: int = 0
    hidden_tests_total: int = 0
    accumulated_reward: float = 0.0
```

## 4. Episode lifecycle

```text
1. reset()
   - sample train/validation scenario seed
   - compile app from policy graph + template + injected bug
   - start local sandbox app and DB fixture
   - return initial observation
2. agent loop
   - inspect policy/routes/files
   - send local requests only inside sandbox
   - run public tests
   - apply one or more patches
   - rerun public tests
3. submit_fix
   - freeze patch
   - run public tests
   - run hidden authorization invariants
   - run regression tests
   - compute deterministic reward
   - return final observation, reward, done=True
4. logging
   - record scenario_id, action trace, patch diff, reward components
   - send metrics to Trackio during training/eval
```
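
The lifecycle above can be exercised end to end with a toy stand-in. The sketch below is not the OpenEnv client API: `ToyEnv`, `scripted_policy`, and `run_episode` are hypothetical, the environment runs no sandbox or tests, and the reward is a placeholder that pays out only when a patch was applied before `submit_fix`.

```python
from dataclasses import dataclass, field

MAX_STEPS = 30  # matches max_steps in the state schema


@dataclass
class ToyEnv:
    """In-memory stand-in for the environment; no sandbox, no real tests."""
    step_count: int = 0
    patched: bool = False
    trace: list = field(default_factory=list)

    def reset(self) -> dict:
        self.step_count, self.patched, self.trace = 0, False, []
        return {"message": "episode started", "done": False, "reward": 0.0}

    def step(self, action_type: str) -> dict:
        self.step_count += 1
        self.trace.append(action_type)
        if action_type == "apply_patch":
            self.patched = True
        done = action_type == "submit_fix" or self.step_count >= MAX_STEPS
        # Placeholder reward: 1.0 only for submitting after a patch was applied.
        reward = 1.0 if (action_type == "submit_fix" and self.patched) else 0.0
        return {"message": action_type, "done": done, "reward": reward}

    def state(self) -> dict:
        return {"step_count": self.step_count, "trace": list(self.trace)}


def scripted_policy(step_index: int) -> str:
    """Fixed action sequence standing in for the LLM policy."""
    plan = ["inspect_policy", "list_routes", "read_file", "run_public_tests",
            "apply_patch", "run_public_tests", "submit_fix"]
    return plan[min(step_index, len(plan) - 1)]


def run_episode(env: ToyEnv) -> float:
    """reset, then step until done, returning the summed reward."""
    obs = env.reset()
    total = 0.0
    while not obs["done"]:
        obs = env.step(scripted_policy(env.step_count))
        total += obs["reward"]
    return total
```

In a real rollout, `scripted_policy` would be replaced by a model call that maps the latest observation to the next typed action, and the step cap guarantees termination even if the model never submits.
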

## 5. Reward design

The reward should be deterministic, decomposed, and resistant to reward hacking.

Recommended reward formula:

```text
R = 0.35 * public_policy_tests
  + 0.30 * hidden_authz_invariants
  + 0.15 * regression_preservation
  + 0.10 * evidence_quality
  + 0.05 * patch_minimality
  + 0.05 * efficiency
  - penalties
```

### Reward components

| Component | Weight | What it rewards |
|---|---:|---|
| Public policy tests | 0.35 | Agent fixes known failing behavior. |
| Hidden authz invariants | 0.30 | Patch generalizes beyond visible tests. |
| Regression preservation | 0.15 | Valid user flows and intended public routes still work. |
| Evidence quality | 0.10 | Agent gathered relevant policy/test/file evidence before patching. |
| Patch minimality | 0.05 | Small focused patches instead of broad rewrites. |
| Efficiency | 0.05 | Fewer wasted steps and repeated actions. |

### Penalties

| Penalty | Trigger |
|---:|---|
| `-0.25` | Breaks a route intentionally marked public. |
| `-0.25` | Deletes tests, policy file, or route instead of fixing authorization. |
| `-0.20` | Hardcodes seed-specific IDs, users, tenants, or hidden assumptions. |
| `-0.15` | Over-broad denial that blocks legitimate authorized users. |
| `-0.10` | Patch exceeds file or diff-size budget. |
| `-1.00` | Attempts external network access, credential extraction, persistence, or other unsafe behavior. |

An LLM judge, if used at all, should only annotate trace quality for analysis. It must not decide security-critical reward.
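
The formula and penalty table translate directly into a small deterministic function. In the sketch below the weights and penalty magnitudes come from the tables above, while the penalty key names and the assumption that each component score lies in [0, 1] are illustrative choices, not part of the specification.

```python
WEIGHTS = {
    "public_policy_tests": 0.35,
    "hidden_authz_invariants": 0.30,
    "regression_preservation": 0.15,
    "evidence_quality": 0.10,
    "patch_minimality": 0.05,
    "efficiency": 0.05,
}

# Hypothetical keys for the penalty triggers listed in the table.
PENALTIES = {
    "broke_public_route": 0.25,
    "deleted_tests_or_policy": 0.25,
    "hardcoded_fixture": 0.20,
    "overbroad_denial": 0.15,
    "patch_over_budget": 0.10,
    "unsafe_behavior": 1.00,
}


def compute_reward(components: dict, triggered: set) -> float:
    """Weighted sum of component scores (each assumed in [0, 1]) minus penalties."""
    for name, score in components.items():
        if name not in WEIGHTS:
            raise KeyError(f"unknown reward component: {name}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {score}")
    # Components that are not reported count as 0.0.
    base = sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
    penalty = sum(PENALTIES[name] for name in triggered)
    return round(base - penalty, 6)  # rounding hides float summation noise
```

For example, a patch that passes every public and regression test but fails all hidden invariants tops out at 0.70, and a single over-broad denial on an otherwise perfect episode yields 0.85. The sketch does not clip the result, so stacked penalties can push the reward below zero; whether to clip is left open here.
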

## 6. Hidden tests and anti-overfitting

Hidden tests are necessary because visible tests can be gamed or memorized. They should test policy invariants rather than exact implementation details.

Use **four anti-overfitting layers**:

1. **Seed diversity**: route names, user IDs, tenant IDs, object names, and schemas change every episode.
2. **Template diversity**: the same policy bug appears in different frameworks and file layouts.
3. **Hidden invariant tests**: final reward uses unseen authorization cases.
4. **Held-out eval split**: at least 20% of scenario families/seeds are never used in training.

Recommended split:

```text
Train:      70%
Validation: 10%
Held-out:   20%
```
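
A split like this stays stable across runs and machines if it is derived from a hash of the scenario family rather than from random shuffling. The sketch below is one possible approach, with `assign_split` as a hypothetical helper. It uses `hashlib` because Python's built-in `hash()` for strings is randomized per process by default.

```python
import hashlib


def assign_split(scenario_family: str) -> str:
    """Map a scenario family to a split using a stable hash (70/10/20)."""
    digest = hashlib.sha256(scenario_family.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    if bucket < 70:
        return "train"
    if bucket < 80:
        return "validation"
    return "heldout"
```

Hashing the family name rather than the individual seed keeps every seed of a family in the same split, so a held-out family is never partially seen during training. The proportions are only approximate for small numbers of families.
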

## 7. Evaluation plan

Run before/after evaluation on the same held-out suite.

### Metrics

| Metric | Meaning |
|---|---|
| `episode_success_rate` | Public + hidden + regression tests pass. |
| `hidden_authz_pass_rate` | Security-critical hidden checks pass. |
| `regression_pass_rate` | Normal valid behavior remains intact. |
| `oversecure_rate` | Agent blocks intended legitimate/public behavior. |
| `patch_compile_rate` | Patch applies and app still runs. |
| `median_steps_to_submit` | Efficiency of the repair workflow. |
| `median_files_changed` | Patch focus/minimality. |
| `reward_hacking_rate` | Attempts to delete tests, hardcode fixtures, or bypass eval. |
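
Most of these metrics are simple aggregates over per-episode records. The sketch below computes a subset of them; the record fields (`public_ok`, `hidden_ok`, `regression_ok`, `oversecured`, `steps`, `files_changed`) are an assumed logging format, not one defined by the project.

```python
from statistics import median


def summarize(episodes: list[dict]) -> dict:
    """Aggregate per-episode records into headline evaluation metrics.

    Each record is assumed to carry boolean flags `public_ok`, `hidden_ok`,
    `regression_ok`, `oversecured`, plus integers `steps` and `files_changed`.
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to summarize")

    def rate(predicate) -> float:
        return sum(1 for e in episodes if predicate(e)) / n

    return {
        # Success requires public, hidden, and regression tests to all pass.
        "episode_success_rate": rate(
            lambda e: e["public_ok"] and e["hidden_ok"] and e["regression_ok"]
        ),
        "hidden_authz_pass_rate": rate(lambda e: e["hidden_ok"]),
        "regression_pass_rate": rate(lambda e: e["regression_ok"]),
        "oversecure_rate": rate(lambda e: e["oversecured"]),
        "median_steps_to_submit": median(e["steps"] for e in episodes),
        "median_files_changed": median(e["files_changed"] for e in episodes),
    }
```

Running the same function over base-model and trained-model episodes on the held-out split produces directly comparable rows for the eval table.
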

### Eval table template

| Model | Split | Success | Hidden authz | Regression | Oversecure | Median steps | Median files changed |
|---|---|---:|---:|---:|---:|---:|---:|
| Base model | heldout | TBD | TBD | TBD | TBD | TBD | TBD |
| RL-trained model | heldout | TBD | TBD | TBD | TBD | TBD | TBD |

## 8. Training flow

```text
1. Build CyberSecurity_OWASP OpenEnv server.
2. Generate 600 MVP scenarios.
3. Run baseline eval with the base model.
4. Train with GRPO/TRL or Unsloth using rollout episodes.
5. Log reward components to Trackio.
6. Run held-out eval every N training steps.
7. Inspect failure clusters.
8. Add scenario mutations only if failures reveal overfitting.
9. Produce final demo: before/after trace + reward curve + held-out eval table.
```

Recommended initial training setup:

```text
Model:                 Qwen/Qwen3-1.7B or similar small instruct model
Algorithm:             GRPO via TRL or Unsloth-compatible loop
Dataset prompt:        repeated task instruction with randomized scenario IDs
Max steps per episode: 30
Rollouts per prompt:   2-4
Logging:               Trackio
Primary eval:          held-out deterministic test pass rate
```

## 9. Deployment architecture

The environment should be runnable in three modes:

| Mode | Purpose |
|---|---|
| Local Uvicorn | Fast engineer iteration. |
| Docker | Reproducible local training/eval. |
| Hugging Face Spaces | Public hackathon demo and OpenEnv-compliant hosting. |

Expected endpoints:

```text
/ws      OpenEnv client session
/health  health check
/reset   debug reset
/step    debug step
/state   debug state
/docs    FastAPI docs
/web     optional web UI
```

## 10. Implementation milestones

### Milestone 1: Skeleton environment

- `models.py`
- `client.py`
- `server/environment.py`
- `server/app.py`
- `server/Dockerfile`
- `openenv.yaml`
- health check
- one hand-written scenario

### Milestone 2: Scenario compiler

- policy graph format
- app template renderer
- bug injector
- DB fixture generator
- public and hidden test generator

### Milestone 3: Reward engine

- public test score
- hidden invariant score
- regression score
- patch minimality score
- safety/reward-hacking penalties
- reward component logging

### Milestone 4: Training script

- rollout loop
- GRPO/TRL or Unsloth training script
- Trackio logging
- checkpoint save/push
- baseline and post-training eval

### Milestone 5: Hackathon demo

- HF Spaces deployment
- mini-blog
- 2-minute video
- before/after traces
- reward curve
- held-out eval table

## 11. Engineering notes

- Keep scenario apps small: ideally 5-15 files each.
- Prefer deterministic tests over LLM judging.
- Keep hidden test details out of observations.
- Log enough trace data to debug failures, but never leak hidden tests to the agent.
- Include intentionally public routes and allowed cross-role cases so the model does not learn “add auth everywhere.”
- The best demo is not just “agent finds bug,” but “agent learns not to break valid business behavior.”

## 12. Source notes and credibility

| Source | How it informs this architecture | Credibility |
|---|---|---:|
| OWASP Top 10 2025 / A01 Broken Access Control | Confirms why access control is the right security focus. | 10/10 |
| OWASP ASVS access-control guidance | Informs policy invariants and server-side authorization checks. | 9.5/10 |
| OpenEnv environment-building docs | Defines required models, reset/step/state, FastAPI server, Docker, and client. | 8.5/10 |
| OpenEnv quickstart/architecture docs | Informs WebSocket client/server design, typed EnvClient, and container isolation. | 8.5/10 |
| OpenEnv deployment docs | Informs HF Spaces deployment, endpoints, Docker workflow, and installable client package. | 8.5/10 |
| Hackathon judging criteria | Informs demo priorities: innovation, storytelling, reward improvement, and training pipeline. | 9/10 |
| TRL/OpenEnv training example | Informs rollout function, decomposed reward functions, and Trackio logging pattern. | 8/10 |