Committed by Humanlearning
Commit f3080d1 · 1 Parent(s): 3807ea3

feat: implement core RL training infrastructure and architecture documentation

.dockerignore ADDED
@@ -0,0 +1,9 @@
+ .git
+ .venv
+ __pycache__
+ *.pyc
+ .pytest_cache
+ openenv_CyberSecurity_OWASP.egg-info
+ outputs
+ .env
+ .env.*
.gitignore ADDED
@@ -0,0 +1,4 @@
+ .env.local
+ .env.*
+ __pycache__/
+ *.pyc
.hfignore ADDED
@@ -0,0 +1,12 @@
+ .git/
+ .venv/
+ __pycache__/
+ **/__pycache__/
+ *.pyc
+ .pytest_cache/
+ openenv_CyberSecurity_OWASP.egg-info/
+ outputs/logs/*
+ outputs/evals/*
+ outputs/rollouts/*
+ .env
+ .env.*
01_ARCHITECTURE.md CHANGED
@@ -14,6 +14,12 @@ The environment is intentionally not a two-agent red-team/blue-team setup. The a
 
  ## 2. Final architecture diagram
 
+ Rendered asset:
+
+ ![CyberSecurity_OWASP architecture](assets/architecture_diagram.svg)
+
+ Editable source: `assets/architecture_diagram.mmd`
+
  ```mermaid
  flowchart TB
  %% =========================
@@ -363,6 +369,12 @@ Run before/after evaluation on the same held-out suite.
 
  ## 8. Training flow
 
+ Rendered asset:
+
+ ![CyberSecurity_OWASP RL training flow](assets/env_rl_training_flow_diagram.svg)
+
+ Editable source: `assets/env_rl_training_flow_diagram.mmd`
+
  ```text
  1. Build CyberSecurity_OWASP OpenEnv server.
  2. Generate 600 MVP scenarios.
@@ -476,4 +488,3 @@ Expected endpoints:
  | OpenEnv deployment docs | Informs HF Spaces deployment, endpoints, Docker workflow, and installable client package. | 8.5/10 |
  | Hackathon judging criteria | Informs demo priorities: innovation, storytelling, reward improvement, and training pipeline. | 9/10 |
  | TRL/OpenEnv training example | Informs rollout function, decomposed reward functions, and Trackio logging pattern. | 8/10 |
-
Dockerfile ADDED
@@ -0,0 +1,28 @@
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app/env
+
+ COPY pyproject.toml uv.lock ./
+ COPY README.md openenv.yaml ./
+ COPY __init__.py client.py models.py ./
+ COPY bug_mutator.py evals.py fixture_generator.py policy_graph.py rewards.py safety.py scenario_compiler.py template_renderer.py validators.py ./
+ COPY server ./server
+ COPY training ./training
+ COPY scripts ./scripts
+ COPY tests ./tests
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     uv sync --frozen --no-editable
+
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app/env
+ COPY --from=builder /app/env /app/env
+ ENV PATH="/app/env/.venv/bin:$PATH"
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ CMD ["uvicorn", "CyberSecurity_OWASP.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
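Review note on the Dockerfile above: the `HEALTHCHECK` probes `http://localhost:8000/health` every 30 seconds with a 3 second timeout and 3 retries. The same probe can be approximated from Python when testing the container locally. This is a minimal sketch, not part of the commit: `wait_for_health` and its injectable `opener` and `sleep` parameters are hypothetical names, and treating any 2xx status as healthy is an assumption that loosely mirrors `curl -f`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_for_health(
    url: str = "http://localhost:8000/health",
    retries: int = 3,
    interval_s: float = 30.0,
    timeout_s: float = 3.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as one probe answers with a 2xx status, else False."""
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout_s) as response:
                if 200 <= response.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection errors, timeouts, and HTTP errors count as a failed probe.
            pass
        if attempt < retries - 1:
            sleep(interval_s)
    return False
```

Calling `wait_for_health(interval_s=1.0)` after `docker run -p 8000:8000 ...` gives a quick local readiness check; the `opener` and `sleep` hooks exist only so the function can be exercised without a running server.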
README.md CHANGED
@@ -23,6 +23,14 @@ inspect generated app + policy -> discover authorization bug -> submit finding -
 
  The current implementation includes a functional MVP scenario: an invoices FastAPI-style app with one injected OWASP A01 BOLA/IDOR defect, visible tests, hidden deterministic verifier checks, anti-cheat safeguards, and decomposed reward.
 
+ ## Diagrams
+
+ ![CyberSecurity_OWASP architecture](assets/architecture_diagram.svg)
+
+ ![CyberSecurity_OWASP RL training flow](assets/env_rl_training_flow_diagram.svg)
+
+ Editable Mermaid sources are available in `assets/architecture_diagram.mmd` and `assets/env_rl_training_flow_diagram.mmd`.
+
  ## Quick Start
 
  ```bash
@@ -155,6 +163,39 @@ The shell wrapper is equivalent:
  MODE=smoke EPISODES=4 uv run --extra modal bash scripts/modal_run_ephemeral.sh
  ```
 
+ ## Modal GRPO Training
+
+ The persistent GPU training launcher packages this local repo into Modal, trains
+ a small LoRA GRPO run, logs metrics and traces to Trackio, stores checkpoints in
+ the `CyberSecurity_OWASP-grpo-runs` Modal volume, and pushes the output adapter
+ to Hugging Face Hub.
+
+ Create a Modal secret named `CyberSecurity_OWASP-secrets` with `HF_TOKEN`, then
+ run the import/config check:
+
+ ```bash
+ uv run --extra modal modal run scripts/modal_train_grpo.py --mode config
+ ```
+
+ Run the default smoke GRPO job:
+
+ ```bash
+ uv run --extra modal modal run scripts/modal_train_grpo.py \
+   --max-steps 10 \
+   --dataset-size 16 \
+   --num-generations 2 \
+   --difficulty 0
+ ```
+
+ Defaults are derived from `HF_TOKEN`:
+
+ - Trackio Space: `<hf-user>/CyberSecurity_OWASP-trackio`
+ - Trackio project: `CyberSecurity_OWASP-grpo`
+ - Output repo: `<hf-user>/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora`
+
+ Override these with `--trackio-space-id`, `--trackio-project`, and
+ `--output-repo-id` when needed.
+
  ## Docker / Spaces
 
  ```bash
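Review note on the README defaults above: "derived from `HF_TOKEN`" means the launcher resolves the Hugging Face username for the token (the script in this commit calls `huggingface_hub.whoami`) and builds the default names from it. The sketch below isolates that naming rule as a pure function; `derive_defaults` is a hypothetical helper written for illustration, and the username is passed in directly so no token or network call is needed.

```python
def derive_defaults(
    hf_user: str,
    trackio_space_id: str = "",
    trackio_project: str = "",
    output_repo_id: str = "",
) -> dict[str, str]:
    """Fill in any value the caller left empty, mirroring the README defaults."""
    return {
        "trackio_space_id": trackio_space_id or f"{hf_user}/CyberSecurity_OWASP-trackio",
        "trackio_project": trackio_project or "CyberSecurity_OWASP-grpo",
        "output_repo_id": output_repo_id
        or f"{hf_user}/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora",
    }


if __name__ == "__main__":
    # "alice" is a placeholder username, not a real account from the commit.
    for key, value in derive_defaults("alice").items():
        print(f"{key}: {value}")
```

An explicit override always wins, which matches the `--trackio-space-id`, `--trackio-project`, and `--output-repo-id` flags described in the README.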
assets/architecture_diagram.mmd ADDED
@@ -0,0 +1,51 @@
+ flowchart LR
+     subgraph Factory["Scenario Factory"]
+         Policy["Policy graph\nusers, roles, tenants, ownership"]
+         Templates["FastAPI template renderer\nroutes, services, auth helpers"]
+         Mutator["A01 bug mutator\none injected authorization defect"]
+         Fixtures["Fixture generator\nvisible tests + hidden facts"]
+         Compiler["Scenario compiler\nseeded workspace"]
+         Policy --> Compiler
+         Templates --> Compiler
+         Mutator --> Compiler
+         Fixtures --> Compiler
+     end
+
+     subgraph Runtime["CyberSecurity_OWASP OpenEnv Runtime"]
+         Reset["reset(seed)\ncompile fresh scenario"]
+         Env["Environment state\nphase, history, metrics, hidden facts"]
+         Tools["Typed step(action) tools\ninspect, read, request, patch, test, submit"]
+         Sandbox["Generated local app workspace\neditable app files only"]
+         Verifier["Deterministic verifier\nsecurity + regression + public routes"]
+         Reward["Reward engine\nstable component breakdown"]
+         App["FastAPI OpenEnv server\n/ws, /reset, /step, /state"]
+         Reset --> Env
+         Env --> Tools
+         Tools <--> Sandbox
+         Tools --> Verifier
+         Verifier --> Reward
+         Reward --> Env
+         Env --> App
+     end
+
+     subgraph Agent["Single LLM Agent"]
+         Obs["Observation parser"]
+         Reason["Policy and code reasoning"]
+         Act["One JSON action"]
+         Obs --> Reason --> Act
+     end
+
+     subgraph Ops["Training, Evaluation, Demo"]
+         Rollout["Rollout loop\nreset -> step* -> terminal reward"]
+         GRPO["TRL GRPO / LoRA training"]
+         Trackio["Trackio metrics\nreward and pass rates"]
+         Eval["Held-out evaluation\nunseen seeds/layouts/domains"]
+         Artifacts["Rollout artifacts\nbefore/after traces"]
+         Rollout --> GRPO --> Trackio --> Eval --> Artifacts
+     end
+
+     Compiler --> Reset
+     App --> Obs
+     Act --> App
+     Reward --> Rollout
+     GRPO --> Agent
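Review note on the architecture diagram above: the "Rollout loop" node (`reset -> step* -> terminal reward`) is the contract between the runtime and the trainer. The sketch below shows that loop against a toy stand-in environment. `ToyEnv`, `ToyObservation`, `make_scripted_policy`, and `rollout` are all hypothetical names for illustration; the real environment in this commit exposes richer observations and a typed action model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyObservation:
    phase: str
    done: bool
    reward: float


@dataclass
class ToyEnv:
    """Hypothetical stand-in: ends on submit_fix or after max_steps."""

    max_steps: int = 5
    steps: int = 0

    def reset(self, seed: int) -> ToyObservation:
        self.steps = 0
        return ToyObservation(phase="discover", done=False, reward=0.0)

    def step(self, action: dict) -> ToyObservation:
        self.steps += 1
        if action.get("tool_name") == "submit_fix":
            # Terminal action: the toy "verifier" always accepts the fix.
            return ToyObservation(phase="done", done=True, reward=1.0)
        done = self.steps >= self.max_steps
        return ToyObservation(phase="done" if done else "discover", done=done, reward=0.0)


def make_scripted_policy(inspect_steps: int = 2) -> Callable[[ToyObservation], dict]:
    """Inspect a fixed number of times, then submit the fix."""
    calls = {"count": 0}

    def policy(obs: ToyObservation) -> dict:
        calls["count"] += 1
        if calls["count"] <= inspect_steps:
            return {"tool_name": "inspect_policy_graph", "arguments": {}}
        return {"tool_name": "submit_fix", "arguments": {}}

    return policy


def rollout(env: ToyEnv, policy: Callable[[ToyObservation], dict], seed: int) -> float:
    """reset -> step* -> terminal reward."""
    obs = env.reset(seed)
    while not obs.done:
        obs = env.step(policy(obs))
    return obs.reward
```

The trainer only needs the scalar returned by `rollout`; everything else (phase gating, verification, reward breakdown) stays inside the environment.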
assets/architecture_diagram.svg ADDED
assets/env_rl_training_flow_diagram.mmd ADDED
@@ -0,0 +1,26 @@
+ flowchart TD
+     Start["Start run\nselect base model + config"] --> Cache["Prepare scenario splits\ntrain, validation, hidden_eval"]
+     Cache --> Baseline["Baseline evaluation\nscripted/model rollouts"]
+     Baseline --> TrainLoop["GRPO training loop"]
+
+     subgraph Episode["One OpenEnv Episode"]
+         Reset["env.reset(seed)\nnew generated app + policy"] --> Observe["Observation\nphase, hints, available tools"]
+         Observe --> Prompt["Build action prompt\nJSON action only"]
+         Prompt --> Generate["LLM generates action"]
+         Generate --> Step["env.step(action)\nphase gate + execute tool"]
+         Step --> Intermediate{"done?"}
+         Intermediate -- "no" --> Observe
+         Intermediate -- "yes" --> Final["Terminal verifier\nhidden security + regression + anti-cheat"]
+     end
+
+     TrainLoop --> Reset
+     Final --> Rewards["Reward components\ndiscovery, security, regression, public_routes,\npatch_quality, visible_tests, safety, anti_cheat"]
+     Rewards --> Update["GRPO update\nLoRA adapter checkpoint"]
+     Update --> Metrics["Trackio logging\nreward means, pass rates, invalid actions, latency"]
+     Metrics --> Validate{"Validation plateau\nor failure cluster?"}
+     Validate -- "continue" --> TrainLoop
+     Validate -- "adjust curriculum" --> Curriculum["Curriculum controller\nrebalance difficulty and traps"]
+     Curriculum --> TrainLoop
+     Validate -- "final checkpoint" --> Heldout["Held-out eval\nunseen seeds/layouts/domain combos"]
+     Heldout --> Compare["Before/after summary\nsuccess, reward, exploit-block, regression preservation"]
+     Compare --> Artifacts["Saved artifacts\noutputs/evals + outputs/rollouts"]
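Review note on the training flow above: the "GRPO update" node consumes the per-episode rewards in groups. GRPO scores each completion relative to the other completions sampled for the same prompt, so a group whose rewards are all equal contributes no learning signal. The sketch below illustrates that normalization in plain Python. It is an assumption-laden simplification: `group_relative_advantages` is a hypothetical helper, and it uses the population standard deviation and an `eps` of `1e-4`, which may differ from what TRL does internally.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one prompt's group of sampled completions."""
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; a simplifying assumption
    return [(reward - mean) / (std + eps) for reward in rewards]


if __name__ == "__main__":
    # With --num-generations 2, each group holds two episode rewards.
    print(group_relative_advantages([1.0, 0.0]))  # one success, one failure
    print(group_relative_advantages([0.5, 0.5]))  # identical rewards: zero signal
```

This is one reason the smoke run's `--num-generations 2` is only a pipeline check: with two samples per prompt, many groups end up with identical rewards and therefore zero advantage.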
assets/env_rl_training_flow_diagram.svg ADDED
evals.py CHANGED
@@ -5,7 +5,10 @@ from __future__ import annotations
  import difflib
  from typing import Iterable
 
- from .models import CyberSecurityOWASPAction
+ try:
+     from .models import CyberSecurityOWASPAction
+ except ImportError:  # pragma: no cover
+     from models import CyberSecurityOWASPAction
 
 
  def random_policy() -> Iterable[CyberSecurityOWASPAction]:
rewards.py CHANGED
@@ -2,7 +2,10 @@
 
  from __future__ import annotations
 
- from .models import CyberSecurityOWASPAction, CyberSecurityOWASPState
+ try:
+     from .models import CyberSecurityOWASPAction, CyberSecurityOWASPState
+ except ImportError:  # pragma: no cover
+     from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
 
 
  REWARD_KEYS = (
scenario_compiler.py CHANGED
@@ -6,9 +6,14 @@ import tempfile
  from pathlib import Path
  from typing import Any
 
- from .fixture_generator import visible_workspace_summary
- from .policy_graph import build_invoice_policy
- from .template_renderer import render_fastapi_basic
+ try:
+     from .fixture_generator import visible_workspace_summary
+     from .policy_graph import build_invoice_policy
+     from .template_renderer import render_fastapi_basic
+ except ImportError:  # pragma: no cover
+     from fixture_generator import visible_workspace_summary
+     from policy_graph import build_invoice_policy
+     from template_renderer import render_fastapi_basic
 
 
  def compile_scenario(seed: int, split: str = "train", difficulty: int = 0) -> dict[str, Any]:
scripts/modal_train_grpo.py ADDED
@@ -0,0 +1,765 @@
+ """Persistent Modal GRPO launcher for CyberSecurity_OWASP.
+
+ This packages the local repository into a Modal GPU image, runs a small
+ tool-use GRPO job against the in-process CyberSecurity_OWASP environment, logs
+ metrics/traces to Trackio, and saves LoRA checkpoints in a persistent Modal
+ volume.
+
+ Example:
+
+     uv run --extra modal modal run scripts/modal_train_grpo.py \
+         --max-steps 10 \
+         --dataset-size 16 \
+         --num-generations 2 \
+         --difficulty 0
+ """
+
+ from __future__ import annotations
+
+ import os
+ import pathlib
+ import subprocess
+ import sys
+ from datetime import datetime, timezone
+ from typing import Any
+
+ import modal
+
+
+ APP_NAME = "CyberSecurity_OWASP-grpo"
+ VOLUME_NAME = "CyberSecurity_OWASP-grpo-runs"
+ SECRET_NAME = "CyberSecurity_OWASP-secrets"
+ RUNS_DIR = pathlib.Path("/runs")
+ REMOTE_PROJECT = "/root/CyberSecurity_OWASP"
+ PROJECT_ROOT = pathlib.Path(__file__).resolve().parents[1]
+
+
+ def _load_local_env_file() -> None:
+     env_path = PROJECT_ROOT / ".env.local"
+     if not env_path.exists():
+         return
+     for raw_line in env_path.read_text(encoding="utf-8").splitlines():
+         line = raw_line.strip()
+         if not line or line.startswith("#") or "=" not in line:
+             continue
+         key, value = line.split("=", 1)
+         key = key.strip()
+         if key not in {"TRACKIO_SPACE_ID", "TRACKIO_PROJECT"}:
+             continue
+         value = value.strip().strip('"').strip("'")
+         os.environ.setdefault(key, value)
+
+
+ def _modal_secrets() -> list[modal.Secret]:
+     if _is_config_mode():
+         return []
+     return [modal.Secret.from_name(SECRET_NAME, required_keys=["HF_TOKEN"])]
+
+
+ def _is_config_mode() -> bool:
+     args = sys.argv[1:]
+     for index, arg in enumerate(args):
+         if arg == "--mode" and index + 1 < len(args):
+             return args[index + 1] == "config"
+         if arg.startswith("--mode="):
+             return arg.split("=", 1)[1] == "config"
+     return False
+
+
+ _load_local_env_file()
+
+
+ def _training_image() -> modal.Image:
+     return (
+         modal.Image.from_registry(
+             "nvidia/cuda:12.8.0-devel-ubuntu22.04",
+             add_python="3.11",
+         )
+         .apt_install("git", "build-essential", "curl")
+         .uv_pip_install(
+             "torch==2.10.0",
+             "triton>=3.4.0",
+             "torchvision==0.25.0",
+             "bitsandbytes",
+             "accelerate",
+             "datasets",
+             "huggingface_hub",
+             "peft",
+             "tokenizers",
+             "nvidia-ml-py",
+             "trackio>=0.25.0",
+             "transformers>=5.5.0",
+             "trl>=0.28.0",
+             "openenv-core[core]>=0.2.3",
+             "pydantic>=2.11.7,<3",
+         )
+         .uv_pip_install(
+             "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo",
+             "unsloth[base] @ git+https://github.com/unslothai/unsloth",
+         )
+         .uv_pip_install("mergekit", "immutables==0.21", extra_options="--no-deps")
+         .uv_pip_install("trl>=0.28.0", "transformers>=5.5.0", "jmespath")
+         .add_local_dir(
+             PROJECT_ROOT,
+             remote_path=REMOTE_PROJECT,
+             copy=True,
+             ignore=[
+                 ".git",
+                 ".venv",
+                 "__pycache__",
+                 ".pytest_cache",
+                 "outputs",
+                 "*.pyc",
+             ],
+         )
+         .run_commands(
+             f"python -m pip install -e {REMOTE_PROJECT}",
+             "python -c \"import os, torch; import transformers.utils.hub as hub; "
+             "hub.TRANSFORMERS_CACHE = getattr(hub, 'TRANSFORMERS_CACHE', "
+             "os.path.join(os.path.expanduser('~'), '.cache', 'huggingface', 'hub')); "
+             "from trl import GRPOConfig, GRPOTrainer; "
+             "from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import "
+             "CybersecurityOwaspEnvironment; print('trainer import ok', torch.__version__)\"",
+         )
+         .workdir(REMOTE_PROJECT)
+     )
+
+
+ app = modal.App(APP_NAME)
+ volume = modal.Volume.from_name(VOLUME_NAME, create_if_missing=True)
+ secrets = _modal_secrets()
+
+
+ @app.function(
+     image=_training_image(),
+     gpu=["L4", "A10G"],
+     timeout=4 * 60 * 60,
+     volumes={RUNS_DIR: volume},
+     secrets=secrets,
+ )
+ def check_training_imports() -> dict[str, str]:
+     import torch
+     import trackio
+     from datasets import Dataset
+     from trl import GRPOConfig, GRPOTrainer
+     from unsloth import FastLanguageModel
+
+     from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import (
+         CybersecurityOwaspEnvironment,
+     )
+
+     env = CybersecurityOwaspEnvironment()
+     obs = env.reset(seed=0, split="validation", difficulty=0)
+     return {
+         "torch": torch.__version__,
+         "trackio": getattr(trackio, "__version__", "unknown"),
+         "dataset": Dataset.__name__,
+         "grpo_config": GRPOConfig.__name__,
+         "grpo_trainer": GRPOTrainer.__name__,
+         "unsloth_model": FastLanguageModel.__name__,
+         "env": CybersecurityOwaspEnvironment.__name__,
+         "reset_phase": obs.phase,
+     }
+
+
+ @app.function(
+     image=_training_image(),
+     gpu=["L4", "A10G"],
+     timeout=4 * 60 * 60,
+     volumes={RUNS_DIR: volume},
+     secrets=secrets,
+ )
+ def train_cybersecurity_owasp_grpo(
+     env_repo_id: str = "",
+     output_repo_id: str = "",
+     max_steps: int = 10,
+     dataset_size: int = 16,
+     difficulty: int = 0,
+     split: str = "train",
+     model_name: str = "Qwen/Qwen3-1.7B",
+     max_seq_length: int = 4096,
+     max_completion_length: int = 768,
+     lora_rank: int = 32,
+     trackio_space_id: str = "",
+     trackio_project: str = "CyberSecurity_OWASP-grpo",
+     num_generations: int = 2,
+     seed_start: int = 0,
+     git_sha: str = "nogit",
+     run_name: str = "",
+ ) -> dict[str, str | int | float]:
+     import statistics
+
+     import torch
+     import transformers.utils.hub as transformers_hub
+     from datasets import Dataset
+     from huggingface_hub import whoami
+     from transformers import TrainerCallback
+     from trl import GRPOConfig, GRPOTrainer
+     from unsloth import FastLanguageModel
+
+     import trackio
+
+     from CyberSecurity_OWASP.models import CyberSecurityOWASPAction
+     from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import (
+         CybersecurityOwaspEnvironment,
+     )
+
+     if not hasattr(transformers_hub, "TRANSFORMERS_CACHE"):
+         transformers_hub.TRANSFORMERS_CACHE = os.path.join(
+             os.path.expanduser("~"),
+             ".cache",
+             "huggingface",
+             "hub",
+         )
+
+     hf_token = os.environ.get("HF_TOKEN")
+     if not hf_token:
+         raise RuntimeError(
+             f"HF_TOKEN is missing from the Modal secret {SECRET_NAME}."
+         )
+
+     user = whoami(token=hf_token)["name"]
+     env_repo_id = env_repo_id or f"{user}/CyberSecurity_OWASP"
+     output_repo_id = output_repo_id or f"{user}/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora"
+     trackio_space_id = trackio_space_id or f"{user}/CyberSecurity_OWASP-trackio"
+
+     os.environ["TRACKIO_SPACE_ID"] = trackio_space_id
+     os.environ["TRACKIO_PROJECT"] = trackio_project
+
+     model_slug = model_name.replace("/", "-")
+     stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
+     run_name = run_name or (
+         f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-{stamp}-{git_sha[:8]}"
+     )
+     output_dir = RUNS_DIR / run_name
+     output_dir.mkdir(parents=True, exist_ok=True)
+
+     training_prompt = (
+         "You are a defensive AppSec repair agent in the local CyberSecurity_OWASP "
+         "OpenEnv environment. Use only the provided local tools. Do not target real "
+         "systems. Work step by step: inspect policy and generated code, reproduce the "
+         "authorization issue locally, submit a policy-tied finding, patch the generated "
+         "app, run visible tests, then submit the fix. Do not write explanations unless "
+         "a tool argument needs evidence text."
+     )
+
+     dataset = Dataset.from_list(
+         [
+             {
+                 "prompt": [{"role": "user", "content": training_prompt}],
+                 "seed": seed_start + index,
+                 "difficulty": difficulty,
+                 "split": split,
+             }
+             for index in range(dataset_size)
+         ]
+     )
+
+     def _state_snapshot(env: CybersecurityOwaspEnvironment) -> dict[str, Any]:
+         state = env.state
+         return {
+             "episode_id": state.episode_id,
+             "task_id": state.task_id,
+             "seed": state.seed,
+             "split": state.split,
+             "difficulty": state.difficulty,
+             "domain": state.domain,
+             "bug_family": state.bug_family,
+             "phase": state.phase,
+             "step_count": state.step_count,
+             "done": state.done,
+             "success": state.success,
+             "failure_reason": state.failure_reason,
+             "anti_cheat_flags": list(state.anti_cheat_flags),
+         }
+
+     class CyberSecurityOWASPToolEnv:
+         def __init__(self):
+             self._env = CybersecurityOwaspEnvironment()
+             self.reward = 0.0
+             self.reward_breakdown: dict[str, float] = {}
+             self.done = False
+             self.success = False
+             self.invalid_actions = 0
+             self.trace_messages: list[dict[str, str]] = []
+             self.trace_metadata: dict[str, Any] = {}
+
+         def reset(self, **kwargs) -> str:
+             seed = int(kwargs.get("seed", seed_start))
+             current_difficulty = int(kwargs.get("difficulty", difficulty))
+             current_split = str(kwargs.get("split", split))
+             obs = self._env.reset(
+                 seed=seed,
+                 split=current_split,
+                 difficulty=current_difficulty,
+             )
+             self.reward = 0.0
+             self.reward_breakdown = {}
+             self.done = bool(obs.done)
+             self.success = False
+             self.invalid_actions = 0
+             self.trace_messages = [
+                 {
+                     "role": "user",
+                     "content": (
+                         f"{training_prompt}\n\nInitial observation:\n"
+                         f"Phase: {obs.phase}\n"
+                         f"Task: {obs.task_brief}\n"
+                         f"Available actions: {obs.available_actions}\n"
+                         f"Workspace summary: {obs.workspace_summary}\n"
+                         f"Policy hint: {obs.visible_policy_hint}\n"
+                         f"Message: {obs.message}"
+                     ),
+                 }
+             ]
+             self.trace_metadata = _state_snapshot(self._env)
+             return obs.message
+
+         def _step(self, tool_name: str, arguments: dict[str, Any] | None = None) -> str:
+             if self.done:
+                 raise ValueError("Episode is already over.")
+             action = CyberSecurityOWASPAction(
+                 tool_name=tool_name,
+                 arguments=arguments or {},
+             )
+             obs = self._env.step(action)
+             if not obs.last_action_valid:
+                 self.invalid_actions += 1
+             self.reward_breakdown = dict(obs.reward_breakdown or {})
+             self.reward = float(self.reward_breakdown.get("total", obs.reward or 0.0))
+             self.done = bool(obs.done)
+             self.success = bool(self._env.state.success)
+             self.trace_messages.extend(
+                 [
+                     {
+                         "role": "assistant",
+                         "content": f"{tool_name}({arguments or {}})",
+                     },
+                     {"role": "tool", "content": obs.message},
+                 ]
+             )
+             self.trace_metadata.update(_state_snapshot(self._env))
+             self.trace_metadata.update(
+                 {
+                     "last_action_valid": obs.last_action_valid,
+                     "last_action_error": obs.last_action_error,
+                     "reward": self.reward,
+                     "reward_breakdown": self.reward_breakdown,
+                     "invalid_actions": self.invalid_actions,
+                 }
+             )
+             return obs.message
+
+         def inspect_policy_graph(self) -> str:
+             """Return public policy hints for the generated local scenario."""
+             return self._step("inspect_policy_graph")
+
+         def list_routes(self) -> str:
+             """List generated local app route summaries."""
+             return self._step("list_routes")
+
+         def read_openapi(self) -> str:
+             """Read generated OpenAPI metadata for the local app."""
+             return self._step("read_openapi")
+
+         def read_file(self, path: str) -> str:
+             """Read an editable generated workspace file by relative path."""
+             return self._step("read_file", {"path": path})
+
+         def search_code(self, query: str) -> str:
+             """Search editable generated workspace files for a string."""
+             return self._step("search_code", {"query": query})
+
+         def send_local_request(
+             self,
+             path: str,
+             method: str = "GET",
+             user_id: str | None = None,
+         ) -> str:
+             """Send a request to the generated local app only."""
+             return self._step(
+                 "send_local_request",
+                 {"path": path, "method": method, "user_id": user_id},
+             )
+
+         def compare_identities(
+             self,
+             path: str,
+             first_user_id: str,
+             second_user_id: str,
+             method: str = "GET",
+         ) -> str:
+             """Compare one local request as two generated users."""
+             return self._step(
+                 "compare_identities",
+                 {
+                     "path": path,
+                     "method": method,
+                     "first_user_id": first_user_id,
+                     "second_user_id": second_user_id,
+                 },
+             )
+
+         def submit_finding(
+             self,
+             summary: str,
+             evidence: str,
+             policy_rule: str,
+         ) -> str:
+             """Submit structured evidence for the suspected authorization bug."""
+             return self._step(
+                 "submit_finding",
+                 {
+                     "summary": summary,
+                     "evidence": evidence,
+                     "policy_rule": policy_rule,
+                 },
+             )
+
+         def patch_file(
+             self,
+             path: str,
+             content: str | None = None,
+             diff: str | None = None,
+         ) -> str:
+             """Patch an editable generated app file with full content or a unified diff."""
+             args: dict[str, Any] = {"path": path}
+             if content is not None:
+                 args["content"] = content
+             if diff is not None:
+                 args["diff"] = diff
+             return self._step("patch_file", args)
+
+         def run_visible_tests(self) -> str:
+             """Run visible tests only; hidden tests are never exposed."""
+             return self._step("run_visible_tests")
+
+         def submit_fix(self) -> str:
+             """Submit the final patch to the hidden deterministic verifier."""
+             return self._step("submit_fix")
+
+         def noop(self) -> str:
+             """Take no action."""
+             return self._step("noop")
+
+         def _score(self) -> float:
+             return float(self.reward)
+
+         def __del__(self):
+             try:
+                 self._env.close()
+             except Exception:
+                 pass
+
+     trace_step = {"value": 0}
+
+     def _completion_to_text(completion) -> str:
+         if completion is None:
+             return ""
+         if isinstance(completion, str):
+             return completion
+         if isinstance(completion, list):
+             parts = []
+             for item in completion:
+                 if isinstance(item, dict):
+                     parts.append(str(item.get("content", item)))
+                 else:
+                     parts.append(str(item))
+             return "\n".join(parts)
+         return str(completion)
+
+     def _mean(values: list[float]) -> float:
+         return float(sum(values) / len(values)) if values else 0.0
+
+     def cybersecurity_owasp_reward(environments, **kwargs) -> list[float]:
+         rewards = [float(env._score()) for env in environments]
+         completions = kwargs.get("completions") or kwargs.get("completion") or []
+         trace_step["value"] += 1
+
+         breakdowns = [getattr(env, "reward_breakdown", {}) or {} for env in environments]
+         metrics = {
+             "train/reward_total_mean": _mean(rewards),
+             "train/reward_discovery_mean": _mean(
+                 [float(item.get("discovery", 0.0)) for item in breakdowns]
+             ),
+             "train/reward_security_mean": _mean(
+                 [float(item.get("security", 0.0)) for item in breakdowns]
+             ),
+             "train/reward_regression_mean": _mean(
+                 [float(item.get("regression", 0.0)) for item in breakdowns]
+             ),
+             "train/reward_public_routes_mean": _mean(
+                 [float(item.get("public_routes", 0.0)) for item in breakdowns]
+             ),
+             "train/reward_patch_quality_mean": _mean(
+                 [float(item.get("patch_quality", 0.0)) for item in breakdowns]
+             ),
+             "train/reward_visible_tests_mean": _mean(
+                 [float(item.get("visible_tests", 0.0)) for item in breakdowns]
+             ),
+             "train/reward_anti_cheat_mean": _mean(
+                 [float(item.get("anti_cheat", 0.0)) for item in breakdowns]
+             ),
+             "train/success_rate": _mean(
+                 [1.0 if bool(getattr(env, "success", False)) else 0.0 for env in environments]
+             ),
+             "train/invalid_action_rate": _mean(
+                 [float(getattr(env, "invalid_actions", 0)) for env in environments]
+             ),
+             "train/episode_length_mean": _mean(
+                 [
+                     float(getattr(env, "trace_metadata", {}).get("step_count", 0))
+                     for env in environments
+                 ]
+             ),
+         }
+
+         try:
+             trackio.log(metrics, step=trace_step["value"])
+         except Exception as exc:
+             print(f"Trackio metric logging skipped: {exc!r}")
+
+         for index, env in enumerate(environments):
+             messages = list(getattr(env, "trace_messages", []))
+             if index < len(completions):
+                 completion_text = _completion_to_text(completions[index])
+                 if completion_text:
+                     messages.append(
+                         {
+                             "role": "assistant",
+                             "content": f"Raw generated completion:\n{completion_text}",
+                         }
+                     )
+             metadata = dict(getattr(env, "trace_metadata", {}))
+             metadata.update(
+                 {
+                     "sample_index": index,
+                     "reward": rewards[index],
+                     "trace_step": trace_step["value"],
+                     "run_name": run_name,
+                 }
+             )
+             try:
+                 trackio.log(
+                     {
+                         f"cybersecurity_owasp_trace/sample_{index}": trackio.Trace(
+                             messages=messages,
+                             metadata=metadata,
+                         )
+                     },
+                     step=trace_step["value"],
+                 )
+             except Exception as exc:
+                 print(f"Trackio trace logging skipped: {exc!r}")
+
+         if rewards:
+             print(
+                 "Reward batch: "
+                 f"mean={statistics.mean(rewards):.3f}, "
+                 f"min={min(rewards):.3f}, max={max(rewards):.3f}"
+             )
+         return rewards
+
+    class TrackioSystemMetricsCallback(TrainerCallback):
+        def on_log(self, args, state, control, logs=None, **kwargs):
+            try:
+                metrics = trackio.log_gpu()
+            except Exception as exc:
+                print(f"Trackio GPU metrics skipped: {exc!r}")
+                return control
+            if metrics:
+                summary = ", ".join(f"{key}={value}" for key, value in sorted(metrics.items())[:4])
+                print(f"Trackio GPU metrics logged at step {state.global_step}: {summary}")
+            return control
+
+    print(f"CUDA available: {torch.cuda.is_available()}")
+    print(f"Packaged local CyberSecurity_OWASP repo; default env repo id: {env_repo_id}")
+    print(f"Trackio Space: {trackio_space_id}")
+    print(f"Trackio Project: {trackio_project}")
+    print(f"Output repo: {output_repo_id}")
+    print(f"Run name: {run_name}")
+
+    model, tokenizer = FastLanguageModel.from_pretrained(
+        model_name=model_name,
+        max_seq_length=max_seq_length,
+        load_in_4bit=False,
+        fast_inference=False,
+        token=hf_token,
+    )
+    model = FastLanguageModel.get_peft_model(
+        model,
+        r=lora_rank,
+        target_modules=[
+            "q_proj",
+            "k_proj",
+            "v_proj",
+            "o_proj",
+            "gate_proj",
+            "up_proj",
+            "down_proj",
+        ],
+        lora_alpha=lora_rank * 2,
+        use_gradient_checkpointing="unsloth",
+        random_state=3407,
+    )
+    FastLanguageModel.for_training(model)
+
+    training_args = GRPOConfig(
+        temperature=1.0,
+        learning_rate=5e-6,
+        weight_decay=0.001,
+        warmup_ratio=0.1,
+        lr_scheduler_type="linear",
+        optim="adamw_8bit",
+        logging_steps=1,
+        per_device_train_batch_size=1,
+        gradient_accumulation_steps=max(2, num_generations),
+        num_generations=num_generations,
+        max_prompt_length=max_seq_length,
+        max_completion_length=max_completion_length,
+        max_steps=max_steps,
+        save_steps=max(10, max_steps),
+        report_to="trackio",
+        trackio_space_id=trackio_space_id,
+        run_name=run_name,
+        output_dir=str(output_dir),
+        push_to_hub=True,
+        hub_model_id=output_repo_id,
+        hub_private_repo=True,
+        hub_strategy="every_save",
+        gradient_checkpointing=True,
+        gradient_checkpointing_kwargs={"use_reentrant": False},
+        epsilon=0.2,
+        epsilon_high=0.28,
+        delta=1.5,
+        loss_type="bnpo",
+        mask_truncated_completions=False,
+    )
+
+    trainer = GRPOTrainer(
+        model=model,
+        processing_class=tokenizer,
+        reward_funcs=cybersecurity_owasp_reward,
+        args=training_args,
+        train_dataset=dataset,
+        environment_factory=CyberSecurityOWASPToolEnv,
+        callbacks=[TrackioSystemMetricsCallback()],
+    )
+    trainer.train()
+    trainer.push_to_hub()
+    volume.commit()
+
+    return {
+        "run_name": run_name,
+        "env_repo_id": env_repo_id,
+        "output_repo_id": output_repo_id,
+        "trackio_space_id": trackio_space_id,
+        "trackio_project": trackio_project,
+        "max_steps": max_steps,
+        "dataset_size": dataset_size,
+        "difficulty": difficulty,
+        "split": split,
+        "model_name": model_name,
+        "max_completion_length": max_completion_length,
+        "num_generations": num_generations,
+    }
+
+
+@app.local_entrypoint()
+def main(
+    mode: str = "train",
+    env_repo_id: str = "",
+    output_repo_id: str = "",
+    max_steps: int = 10,
+    dataset_size: int = 16,
+    difficulty: int = 0,
+    split: str = "train",
+    model_name: str = "Qwen/Qwen3-1.7B",
+    max_seq_length: int = 4096,
+    max_completion_length: int = 768,
+    lora_rank: int = 32,
+    trackio_space_id: str = "",
+    trackio_project: str = "CyberSecurity_OWASP-grpo",
+    num_generations: int = 2,
+    seed_start: int = 0,
+    git_sha: str = "nogit",
+) -> None:
+    if mode == "config":
+        result = check_training_imports.remote()
+        print(result)
+        return
+    if mode != "train":
+        raise ValueError("mode must be 'train' or 'config'")
+
+    trackio_space_id = trackio_space_id or os.environ.get("TRACKIO_SPACE_ID", "")
+    trackio_project = trackio_project or os.environ.get(
+        "TRACKIO_PROJECT", "CyberSecurity_OWASP-grpo"
+    )
+    resolved_trackio_space_id = trackio_space_id
+    resolved_output_repo_id = output_repo_id
+    if not resolved_trackio_space_id or not resolved_output_repo_id:
+        hf_token = os.environ.get("HF_TOKEN")
+        if hf_token:
+            try:
+                from huggingface_hub import whoami
+
+                user = whoami(token=hf_token)["name"]
+                resolved_trackio_space_id = (
+                    resolved_trackio_space_id or f"{user}/CyberSecurity_OWASP-trackio"
+                )
+                resolved_output_repo_id = (
+                    resolved_output_repo_id
+                    or f"{user}/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora"
+                )
+            except Exception as exc:
+                print(f"Could not resolve Hugging Face defaults locally: {exc!r}")
+
+    if git_sha == "nogit":
+        try:
+            git_sha = subprocess.check_output(
+                ["git", "rev-parse", "HEAD"],
+                cwd=PROJECT_ROOT,
+                text=True,
+                stderr=subprocess.DEVNULL,
+            ).strip()
+        except Exception:
+            git_sha = "nogit"
+
+    model_slug = model_name.replace("/", "-")
+    local_stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
+    run_name = (
+        f"CyberSecurity_OWASP-{model_slug}-grpo-level{difficulty}-"
+        f"{local_stamp}-{git_sha[:8]}"
+    )
+
+    call = train_cybersecurity_owasp_grpo.spawn(
+        env_repo_id=env_repo_id,
+        output_repo_id=output_repo_id,
+        max_steps=max_steps,
+        dataset_size=dataset_size,
+        difficulty=difficulty,
+        split=split,
+        model_name=model_name,
+        max_seq_length=max_seq_length,
+        max_completion_length=max_completion_length,
+        lora_rank=lora_rank,
+        trackio_space_id=trackio_space_id,
+        trackio_project=trackio_project,
+        num_generations=num_generations,
+        seed_start=seed_start,
+        git_sha=git_sha,
+        run_name=run_name,
+    )
+    print(f"Spawned Modal training call: {call.object_id}")
+    print(f"Run name: {run_name}")
+    if resolved_trackio_space_id:
+        print(f"Trackio Space: https://huggingface.co/spaces/{resolved_trackio_space_id}")
+    else:
+        print("Trackio Space: derived remotely from HF_TOKEN as <hf-user>/CyberSecurity_OWASP-trackio")
+    if resolved_output_repo_id:
+        print(f"Output model repo: https://huggingface.co/{resolved_output_repo_id}")
+    else:
+        print(
+            "Output model repo: derived remotely from HF_TOKEN as "
+            "<hf-user>/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora"
+        )
server/app.py CHANGED
@@ -16,7 +16,7 @@ except Exception as e: # pragma: no cover
 try:
     from ..models import CyberSecurityOWASPAction, CyberSecurityOWASPObservation
     from .CyberSecurity_OWASP_environment import CybersecurityOwaspEnvironment
-except ModuleNotFoundError:
+except ImportError:
     from models import CyberSecurityOWASPAction, CyberSecurityOWASPObservation
     from server.CyberSecurity_OWASP_environment import CybersecurityOwaspEnvironment
 
validators.py CHANGED
@@ -5,7 +5,10 @@ from __future__ import annotations
 from pathlib import Path
 from typing import Any
 
-from .models import CyberSecurityOWASPAction, CyberSecurityOWASPState
+try:
+    from .models import CyberSecurityOWASPAction, CyberSecurityOWASPState
+except ImportError:  # pragma: no cover
+    from models import CyberSecurityOWASPAction, CyberSecurityOWASPState
 
 
 BLOCKED_PATH_MARKERS = (