# CyberSecurity_OWASP — Architecture

## 1. System goal

`CyberSecurity_OWASP` is an OpenEnv environment for training a **single LLM policy** to perform a complete defensive authorization-repair workflow:

```text
Understand policy → discover local evidence → patch code → validate → submit
```

The environment is intentionally not a two-agent red-team/blue-team setup. The agent is one model with one trajectory. It must learn both sides of the defensive workflow: finding the policy violation and fixing it safely.

## 2. Final architecture diagram

```mermaid
flowchart TB
    %% Offline Build Layer
    subgraph A[Offline Scenario Factory]
        A1["Policy Graph Generator\nroles, users, tenants, ownership, route intent"]
        A2["App Template Library\nFastAPI, Express, Django MVP templates"]
        A3["Bug Injector\nmissing guard, IDOR, tenant leak, role confusion, query omission"]
        A4["Scenario Compiler\nmaterializes app + DB + public tests + hidden invariants"]
        A5["Split Manager\ntrain seeds, validation seeds, hidden held-out seeds"]
        A1 --> A4
        A2 --> A4
        A3 --> A4
        A5 --> A4
    end

    %% OpenEnv Runtime
    subgraph B[CyberSecurity_OWASP OpenEnv Server]
        B1["reset()\nselect scenario + start sandbox"]
        B2["Sandbox App Runtime\nlocal app, DB fixture, logs, route map"]
        B3["Tool API exposed through step(action)\nReadFile, ListRoutes, SendLocalRequest, RunTests, ApplyPatch, SubmitFix"]
        B4["State Store\nepisode_id, step_count, scenario_id, patch diff, test history"]
        B5["Deterministic Reward Engine\npolicy tests + hidden tests + regression tests + penalties"]
        B6["state()\nstructured metadata for debugging/eval"]
        B1 --> B2
        B2 --> B3
        B3 --> B4
        B4 --> B5
        B4 --> B6
    end

    %% Agent + Training
    subgraph C[Single LLM Agent]
        C1[Observation Parser]
        C2["Planner\npolicy reasoning + patch strategy"]
        C3["Action Generator\nchooses next OpenEnv action"]
        C1 --> C2 --> C3
    end

    subgraph D[Training + Evaluation]
        D1["Rollout Loop\nreset → step* → final reward"]
        D2["GRPO / TRL / Unsloth Training"]
        D3["Trackio Metrics\nreward curves, pass rates, patch size, steps"]
        D4["Held-out Eval Suite\nunseen templates, seeds, names, route structures"]
        D5["Demo Artifacts\nbefore/after traces, mini-blog, 2-minute video"]
        D1 --> D2 --> D3
        D3 --> D4 --> D5
    end

    A4 --> B1
    C3 -->|typed action| B3
    B3 -->|observation + reward + done| C1
    B5 --> D1
    D2 --> C1
    B5 --> D4
```

## 3. Component responsibilities

### 3.1 Scenario Factory

The scenario factory generates many small but realistic web apps from a structured authorization policy. It should output:

- application code;
- route map;
- database fixture;
- user/session/token fixtures;
- policy graph;
- intentionally injected access-control bug;
- public tests visible to the agent;
- hidden tests invisible to the agent;
- metadata for eval and debugging.

The scenario compiler is the main anti-overfitting mechanism. It should vary:

- route names;
- schema names;
- ORM query structure;
- framework template;
- role names;
- tenant IDs;
- object ownership patterns;
- file layout;
- visible test coverage;
- hidden invariant seeds.
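As an illustrative sketch of the compiler output, assuming a seed-driven generator where every name, template, and bug class below is a hypothetical placeholder:

```python
import random
from dataclasses import dataclass

@dataclass
class CompiledScenario:
    scenario_id: str
    app_template: str       # which framework template was rendered
    bug_class: str          # which access-control bug was injected
    route_prefix: str       # surface detail varied per seed
    role_names: list[str]   # varied so the agent cannot memorize "admin"
    tenant_ids: list[str]

def compile_scenario(seed: int) -> CompiledScenario:
    """Derive every surface detail from the seed so no two episodes look alike."""
    rng = random.Random(seed)
    return CompiledScenario(
        scenario_id=f"scn_{seed:06d}",
        app_template=rng.choice(["fastapi_mvp", "express_mvp", "django_mvp"]),
        bug_class=rng.choice(
            ["missing_guard", "idor", "tenant_leak", "role_confusion", "query_omission"]
        ),
        route_prefix=rng.choice(["/api/v1", "/api", "/internal"]),
        role_names=rng.sample(["viewer", "member", "support", "manager", "admin"], k=3),
        tenant_ids=[f"t_{rng.randrange(10_000):04d}" for _ in range(2)],
    )
```

The same seed should reproduce the same app byte-for-byte, which keeps both the sandbox and the reward engine deterministic.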
### 3.2 Policy Graph Generator

The policy graph is the ground truth for intended behavior. Example internal representation:

```yaml
resources:
  invoice:
    owner_field: owner_user_id
    tenant_field: tenant_id
roles:
  user:
    can:
      - read:invoice where owner_user_id == actor.user_id
      - update:invoice where owner_user_id == actor.user_id and status != locked
  support:
    can:
      - read:invoice where tenant_id == actor.tenant_id
  admin:
    can:
      - read:any_invoice where tenant_id == actor.tenant_id
      - update:any_invoice where tenant_id == actor.tenant_id
public_routes:
  - GET /health
  - GET /pricing
forbidden:
  - cross_tenant_read
  - cross_tenant_update
  - user_reads_other_user_invoice
```

The policy graph prevents false rewards when the agent over-secures intentionally public or intentionally allowed routes.

### 3.3 Bug Injector

The bug injector creates controlled, defensive lab scenarios. It should only generate bugs inside local synthetic apps.

MVP bug classes:

| Bug class | Example failure mode | Expected fix type |
|---|---|---|
| Missing route guard | Protected endpoint lacks authorization middleware | Add policy check/middleware |
| IDOR / ownership bug | User can access another user’s object by changing ID | Add owner check in query/policy |
| Tenant leak | Tenant A can list Tenant B records | Add tenant filter |
| Role confusion | Support/editor/admin boundary is wrong | Correct role-to-permission mapping |
| Client-side-only auth | Server trusts UI to hide forbidden action | Enforce server-side authorization |
| Query omission | List/export/search endpoint lacks auth filter | Filter query by actor permissions |
| Over-broad mutation | User can update/delete forbidden object | Add mutation permission check |
| Public route decoy | Agent may wrongly lock down intended public endpoint | Preserve intended public behavior |

### 3.4 OpenEnv Server

The OpenEnv server should implement the standard lifecycle:

- `reset()` — initialize a fresh scenario instance.
- `step(action)` — execute one typed action and return observation, reward, and done.
- `state()` — expose episode metadata for debugging and evaluation.

Recommended package/class names:

```text
Repo name:       CyberSecurity_OWASP
Python package:  cybersecurity_owasp
Client class:    CyberSecurityOWASPEnv
Action class:    CyberSecurityOWASPAction
Observation:     CyberSecurityOWASPObservation
State:           CyberSecurityOWASPState
```

### 3.5 Tool API

The agent should interact through typed actions. Keep the interface small enough for RL but expressive enough for realistic repair.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class CyberSecurityOWASPAction(Action):  # Action is the OpenEnv base action type
    action_type: Literal[
        "read_file",
        "list_files",
        "list_routes",
        "inspect_policy",
        "send_local_request",
        "run_public_tests",
        "apply_patch",
        "submit_fix",
    ]
    arguments: dict
```

Recommended actions:

| Action | Purpose | Safety boundary |
|---|---|---|
| `inspect_policy` | Read intended authorization rules. | Only synthetic policy. |
| `list_routes` | See local app route map. | No internet target. |
| `read_file` | Inspect a selected source file. | Sandbox allowlist only. |
| `send_local_request` | Validate behavior against local app. | Local generated app only. |
| `run_public_tests` | Run visible tests. | No hidden test disclosure. |
| `apply_patch` | Modify source through unified diff. | Patch size and file allowlist limits. |
| `submit_fix` | End the episode and trigger hidden eval. | Final hidden score only, no leaked test details. |
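A hypothetical episode fragment, assuming the client class above exposes the standard OpenEnv `reset()`/`step()` lifecycle; the constructor arguments, request path, and actor name are illustrative, not the actual API:

```python
from cybersecurity_owasp import CyberSecurityOWASPAction, CyberSecurityOWASPEnv

# Connect to a locally running environment server (URL is an assumption).
env = CyberSecurityOWASPEnv(base_url="http://localhost:8000")

result = env.reset()  # compiles a scenario and starts the sandbox app

# Gather evidence before patching: policy first, then the route map.
result = env.step(CyberSecurityOWASPAction(action_type="inspect_policy", arguments={}))
result = env.step(CyberSecurityOWASPAction(action_type="list_routes", arguments={}))

# Validate current behavior against the local sandbox only.
result = env.step(CyberSecurityOWASPAction(
    action_type="send_local_request",
    arguments={"method": "GET", "path": "/api/v1/invoices/42", "actor": "user_b"},
))

# Run the visible tests, then end the episode to trigger the hidden eval.
result = env.step(CyberSecurityOWASPAction(action_type="run_public_tests", arguments={}))
result = env.step(CyberSecurityOWASPAction(action_type="submit_fix", arguments={}))
```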
### 3.6 Observation schema

Observations should be compact and structured.

```python
from dataclasses import dataclass

@dataclass
class CyberSecurityOWASPObservation(Observation):  # Observation is the OpenEnv base type
    message: str
    visible_policy_summary: str
    route_summary: list[dict]
    last_action_result: dict
    public_test_summary: dict
    patch_summary: dict
    done_reason: str | None = None
```

Do not expose hidden test bodies, hidden expected outputs, or seed-specific solution hints.

### 3.7 State schema

State should support debugging and training analytics.

```python
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class CyberSecurityOWASPState(State):  # State is the OpenEnv base type
    episode_id: str
    scenario_id: str
    split: Literal["train", "validation", "heldout"]
    step_count: int = 0
    max_steps: int = 30
    scenario_family: str = ""
    app_template: str = ""
    files_touched: list[str] = field(default_factory=list)
    public_tests_passed: int = 0
    public_tests_total: int = 0
    hidden_tests_passed: int = 0
    hidden_tests_total: int = 0
    accumulated_reward: float = 0.0
```

## 4. Episode lifecycle

```text
1. reset()
   - sample train/validation scenario seed
   - compile app from policy graph + template + injected bug
   - start local sandbox app and DB fixture
   - return initial observation

2. agent loop
   - inspect policy/routes/files
   - send local requests only inside sandbox
   - run public tests
   - apply one or more patches
   - rerun public tests

3. submit_fix
   - freeze patch
   - run public tests
   - run hidden authorization invariants
   - run regression tests
   - compute deterministic reward
   - return final observation, reward, done=True

4. logging
   - record scenario_id, action trace, patch diff, reward components
   - send metrics to Trackio during training/eval
```

## 5. Reward design

The reward should be deterministic, decomposed, and resistant to reward hacking. Recommended reward formula:

```text
R = 0.35 * public_policy_tests
  + 0.30 * hidden_authz_invariants
  + 0.15 * regression_preservation
  + 0.10 * evidence_quality
  + 0.05 * patch_minimality
  + 0.05 * efficiency
  - penalties
```

### Reward components

| Component | Weight | What it rewards |
|---|---:|---|
| Public policy tests | 0.35 | Agent fixes known failing behavior. |
| Hidden authz invariants | 0.30 | Patch generalizes beyond visible tests. |
| Regression preservation | 0.15 | Valid user flows and intended public routes still work. |
| Evidence quality | 0.10 | Agent gathered relevant policy/test/file evidence before patching. |
| Patch minimality | 0.05 | Small focused patches instead of broad rewrites. |
| Efficiency | 0.05 | Fewer wasted steps and repeated actions. |

### Penalties

| Penalty | Trigger |
|---|---|
| `-0.25` | Breaks a route intentionally marked public. |
| `-0.25` | Deletes tests, policy file, or route instead of fixing authorization. |
| `-0.20` | Hardcodes seed-specific IDs, users, tenants, or hidden assumptions. |
| `-0.15` | Over-broad denial that blocks legitimate authorized users. |
| `-0.10` | Patch exceeds file or diff-size budget. |
| `-1.00` | Attempts external network access, credential extraction, persistence, or other unsafe behavior. |

The LLM judge, if used at all, should only annotate trace quality for analysis. It must not decide security-critical reward.

## 6. Hidden tests and anti-overfitting

Hidden tests are necessary because visible tests can be gamed or memorized. They should test policy invariants rather than exact implementation details.

Use **four anti-overfitting layers**:

1. **Seed diversity** — route names, user IDs, tenant IDs, object names, and schemas change every episode.
2. **Template diversity** — the same policy bug appears in different frameworks and file layouts.
3. **Hidden invariant tests** — final reward uses unseen authorization cases.
4. **Held-out eval split** — at least 20% of scenario families/seeds are never used in training.
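A minimal sketch of layers 1 and 4 combined, assuming split assignment is keyed to a stable hash of the scenario family and seed so held-out material can never drift into training; the helper names are illustrative and the ratios match the recommended split below:

```python
import hashlib
import random

SPLITS = [("train", 0.70), ("validation", 0.10), ("heldout", 0.20)]

def assign_split(scenario_family: str, seed: int) -> str:
    """Deterministically bucket a scenario; the same (family, seed) always lands in the same split."""
    digest = hashlib.sha256(f"{scenario_family}:{seed}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    acc = 0.0
    for name, frac in SPLITS:
        acc += frac
        if u < acc:
            return name
    return SPLITS[-1][0]

def randomize_surface(seed: int) -> dict:
    """Layer 1: per-episode surface diversity driven by the same seed."""
    rng = random.Random(seed)
    return {
        "resource": rng.choice(["invoice", "order", "ticket", "report"]),
        "tenant_id": f"t_{rng.randrange(10_000):04d}",
        "route_prefix": rng.choice(["/api/v1", "/api", "/internal"]),
    }
```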
Recommended split:

```text
Train:      70%
Validation: 10%
Held-out:   20%
```

## 7. Evaluation plan

Run before/after evaluation on the same held-out suite.

### Metrics

| Metric | Meaning |
|---|---|
| `episode_success_rate` | Public + hidden + regression tests pass. |
| `hidden_authz_pass_rate` | Security-critical hidden checks pass. |
| `regression_pass_rate` | Normal valid behavior remains intact. |
| `oversecure_rate` | Agent blocks intended legitimate/public behavior. |
| `patch_compile_rate` | Patch applies and the app still runs. |
| `median_steps_to_submit` | Efficiency of the repair workflow. |
| `median_files_changed` | Patch focus/minimality. |
| `reward_hacking_rate` | Attempts to delete tests, hardcode fixtures, or bypass eval. |

### Eval table template

| Model | Split | Success | Hidden authz | Regression | Oversecure | Median steps | Median files changed |
|---|---|---:|---:|---:|---:|---:|---:|
| Base model | heldout | TBD | TBD | TBD | TBD | TBD | TBD |
| RL-trained model | heldout | TBD | TBD | TBD | TBD | TBD | TBD |

## 8. Training flow

```text
1. Build the CyberSecurity_OWASP OpenEnv server.
2. Generate 600 MVP scenarios.
3. Run baseline eval with the base model.
4. Train with GRPO/TRL or Unsloth using rollout episodes.
5. Log reward components to Trackio.
6. Run held-out eval every N training steps.
7. Inspect failure clusters.
8. Add scenario mutations only if failures reveal overfitting.
9. Produce final demo: before/after trace + reward curve + held-out eval table.
```

Recommended initial training setup:

```text
Model:                  Qwen/Qwen3-1.7B or a similar small instruct model
Algorithm:              GRPO via TRL or an Unsloth-compatible loop
Dataset prompt:         repeated task instruction with randomized scenario IDs
Max steps per episode:  30
Rollouts per prompt:    2-4
Logging:                Trackio
Primary eval:           held-out deterministic test pass rate
```

## 9. Deployment architecture

The environment should be runnable in three modes:

| Mode | Purpose |
|---|---|
| Local Uvicorn | Fast engineer iteration. |
| Docker | Reproducible local training/eval. |
| Hugging Face Spaces | Public hackathon demo and OpenEnv-compliant hosting. |

Expected endpoints:

```text
/ws      OpenEnv client session
/health  health check
/reset   debug reset
/step    debug step
/state   debug state
/docs    FastAPI docs
/web     optional web UI
```

## 10. Implementation milestones

### Milestone 1 — Skeleton environment

- `models.py`
- `client.py`
- `server/environment.py`
- `server/app.py`
- `server/Dockerfile`
- `openenv.yaml`
- health check
- one hand-written scenario

### Milestone 2 — Scenario compiler

- policy graph format
- app template renderer
- bug injector
- DB fixture generator
- public and hidden test generator

### Milestone 3 — Reward engine

- public test score
- hidden invariant score
- regression score
- patch minimality score
- safety/reward-hacking penalties
- reward component logging

### Milestone 4 — Training script

- rollout loop
- GRPO/TRL or Unsloth training script
- Trackio logging
- checkpoint save/push
- baseline and post-training eval

### Milestone 5 — Hackathon demo

- HF Spaces deployment
- mini-blog
- 2-minute video
- before/after traces
- reward curve
- held-out eval table

## 11. Engineering notes

- Keep scenario apps small: ideally 5-15 files each.
- Prefer deterministic tests over LLM judging.
- Hide final hidden test details from observations.
- Log enough trace data to debug failures, but never leak hidden tests to the agent.
- Include intentionally public routes and allowed cross-role cases so the model does not learn “add auth everywhere.”
- The best demo is not just “agent finds bug,” but “agent learns not to break valid business behavior.”
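For example, a hidden authorization invariant can be expressed as a deterministic request-level check rather than an LLM judgment. A sketch assuming an `httpx`-style client against the sandbox app; the fixture names, routes, and status codes are illustrative, and real checks would be generated per scenario seed:

```python
import httpx

def check_invariants(base_url: str, token_tenant_a: str, invoice_id_tenant_b: str) -> dict:
    """Pair a denial assertion with a preservation assertion."""
    with httpx.Client(base_url=base_url) as client:
        # Cross-tenant read must be denied after the patch.
        cross_tenant = client.get(
            f"/api/v1/invoices/{invoice_id_tenant_b}",
            headers={"Authorization": f"Bearer {token_tenant_a}"},
        )
        # Intentionally public route must not be locked down.
        public = client.get("/pricing")
    return {
        "cross_tenant_read_denied": cross_tenant.status_code in (403, 404),
        "public_route_preserved": public.status_code == 200,
    }
```

Pairing the two assertions is what penalizes both the original bug and the “add auth everywhere” failure mode in the same check.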
## 12. Source notes and credibility

| Source | How it informs this architecture | Credibility |
|---|---|---:|
| OWASP Top 10 2025 / A01 Broken Access Control | Confirms why access control is the right security focus. | 10/10 |
| OWASP ASVS access-control guidance | Informs policy invariants and server-side authorization checks. | 9.5/10 |
| OpenEnv environment-building docs | Defines required models, reset/step/state, FastAPI server, Docker, and client. | 8.5/10 |
| OpenEnv quickstart/architecture docs | Informs WebSocket client/server design, typed EnvClient, and container isolation. | 8.5/10 |
| OpenEnv deployment docs | Informs HF Spaces deployment, endpoints, Docker workflow, and installable client package. | 8.5/10 |
| Hackathon judging criteria | Informs demo priorities: innovation, storytelling, reward improvement, and training pipeline. | 9/10 |
| TRL/OpenEnv training example | Informs rollout function, decomposed reward functions, and Trackio logging pattern. | 8/10 |