Humanlearning committed on
Commit
4e663d8
·
1 Parent(s): 0ff6d8a

feat: integrate Trackio for experiment tracking and add Modal training infrastructure with environment and test utilities.

.gitignore CHANGED
@@ -2,5 +2,8 @@
  .env.*
  __pycache__/
  *.pyc
+ .pytest_cache/
+ outputs/
+ codex_tmp_*/

- *.egg*
+ *.egg*
Dockerfile CHANGED
@@ -21,6 +21,7 @@ WORKDIR /app/env
  COPY --from=builder /app/env /app/env
  ENV PATH="/app/env/.venv/bin:$PATH"
  ENV PYTHONPATH="/app/env:$PYTHONPATH"
+ ENV ENABLE_WEB_INTERFACE=true

  HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
      CMD curl -f http://localhost:8000/health || exit 1
README.md CHANGED
@@ -133,6 +133,24 @@ Training files are under `training/`:

  The training scaffold is intentionally minimal until the environment/verifier behavior is stable. Trackio metric names and GRPO defaults follow the project brief.

+ ## Trackio Run Tracking
+
+ Trackio is the default tracker for official runs. Set `TRACKIO_SPACE_ID` to log to a hosted Hugging Face Trackio Space; otherwise Trackio records locally.
+
+ ```bash
+ export TRACKIO_SPACE_ID=<hf-user>/CyberSecurity_OWASP-trackio
+ export TRACKIO_PROJECT=CyberSecurity_OWASP-grpo
+ ```
+
+ Use the tracked smoke wrapper instead of invoking pytest directly when producing run artifacts:
+
+ ```bash
+ bash scripts/smoke_test.sh
+ uv run python scripts/track_pytest.py tests
+ ```
+
+ Evaluation summaries saved through `training.eval_before_after.save_eval_summary(...)`, Modal smoke runs, and GRPO training configs all initialize Trackio runs with CyberSecurity_OWASP run names.
+
  ## Modal Ephemeral Runs

  Modal Labs support is kept in a separate launcher script so the local OpenEnv server and core training scaffold stay unchanged.
@@ -149,7 +167,7 @@ Run a temporary Modal app for a cheap environment/training smoke check:
  uv run --extra modal modal run scripts/modal_ephemeral_train.py --mode smoke --episodes 4
  ```

- The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/`.
+ The app is ephemeral: Modal starts it for the command and stops it when the command exits. The remote result is written locally under `outputs/rollouts/` and the summary metrics are logged to Trackio.

  You can also validate the GRPO config construction remotely:

@@ -187,6 +205,20 @@ uv run --extra modal modal run scripts/modal_train_grpo.py \
    --difficulty 0
  ```

+ If running from a public repository and you do not want Modal to package the
+ local workspace, use public source mode:
+
+ ```bash
+ uv run --extra modal modal run scripts/modal_train_grpo.py \
+   --source-mode public \
+   --repo-url https://github.com/humandotlearning/CyberSecurity_OWASP.git \
+   --repo-branch master \
+   --max-steps 10 \
+   --dataset-size 16 \
+   --num-generations 2 \
+   --difficulty 0
+ ```
+
  Defaults are derived from `HF_TOKEN`:

  - Trackio Space: `<hf-user>/CyberSecurity_OWASP-trackio`
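
Review note: the README change says a run is hosted when `TRACKIO_SPACE_ID` is set and local otherwise, with the Space name derived from the Hugging Face user when nothing is configured. The sketch below shows one way that resolution could look. The function name `resolve_trackio_target`, the `hf_user` argument, and the use of `CyberSecurity_OWASP-grpo` as a fallback project are assumptions for illustration, not code from this commit.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumed fallback, taken from the README export example above.
DEFAULT_PROJECT = "CyberSecurity_OWASP-grpo"


def resolve_trackio_target(
    env: Mapping[str, str] | None = None,
    hf_user: str | None = None,
) -> dict[str, str | None]:
    """Decide whether a run would log to a hosted Space or locally (hypothetical helper)."""
    source = os.environ if env is None else env
    # An empty value is treated the same as an unset one.
    space_id = source.get("TRACKIO_SPACE_ID") or None
    if space_id is None and hf_user:
        # Mirrors the documented default: <hf-user>/CyberSecurity_OWASP-trackio
        space_id = f"{hf_user}/CyberSecurity_OWASP-trackio"
    project = source.get("TRACKIO_PROJECT") or DEFAULT_PROJECT
    mode = "hosted" if space_id else "local"
    return {"space_id": space_id, "project": project, "mode": mode}


if __name__ == "__main__":
    print(resolve_trackio_target({}, hf_user=None))
    print(resolve_trackio_target({"TRACKIO_SPACE_ID": "alice/demo-trackio"}))
```

Passing the environment as a mapping keeps the helper testable without mutating `os.environ`.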
pyproject.toml CHANGED
@@ -18,6 +18,7 @@ dependencies = [
      # install from github
      # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
      "openenv-core[core]>=0.2.2",
+     "trackio>=0.22.0",
      # Environment-specific dependencies
      # Add all dependencies needed for your environment here
      # Examples:
scenario_compiler.py CHANGED
@@ -2,9 +2,11 @@

  from __future__ import annotations

+ import os
  import tempfile
  from pathlib import Path
  from typing import Any
+ from uuid import uuid4

  try:
      from .fixture_generator import visible_workspace_summary
@@ -16,11 +18,24 @@ except ImportError: # pragma: no cover
      from template_renderer import render_fastapi_basic


+ def _make_workspace(prefix: str) -> Path:
+     root = Path(os.getenv("CYBERSECURITY_OWASP_WORKSPACE_ROOT", tempfile.gettempdir()))
+     root.mkdir(parents=True, exist_ok=True)
+     for _ in range(100):
+         workspace = root / f"{prefix}{uuid4().hex[:12]}"
+         try:
+             workspace.mkdir()
+         except FileExistsError:
+             continue
+         return workspace
+     raise RuntimeError("Unable to create isolated scenario workspace")
+
+
  def compile_scenario(seed: int, split: str = "train", difficulty: int = 0) -> dict[str, Any]:
      """Compile one isolated MVP authorization-repair scenario."""

      compiled = build_invoice_policy(seed)
-     workspace = Path(tempfile.mkdtemp(prefix=f"cybersecurity_owasp_{split}_{seed}_"))
+     workspace = _make_workspace(prefix=f"cybersecurity_owasp_{split}_{seed}_")
      editable_files = render_fastapi_basic(workspace, compiled.public_hint, compiled.hidden_facts)
      task_id = f"{split}-invoices-bola-{seed}"
      hidden = dict(compiled.hidden_facts)
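
Review note: the new `_make_workspace` helper replaces `tempfile.mkdtemp` with a retry loop over random suffixes under a configurable root. The standalone sketch below reproduces that pattern so it can be exercised outside the package. The `root` and `attempts` parameters are additions for testability and are not part of the committed function.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path
from uuid import uuid4


def make_workspace(prefix: str, root: Path | None = None, attempts: int = 100) -> Path:
    """Create a fresh, uniquely named directory and return its path."""
    # Same lookup order as the diff: env var override, then the system temp dir.
    base = root or Path(os.getenv("CYBERSECURITY_OWASP_WORKSPACE_ROOT", tempfile.gettempdir()))
    base.mkdir(parents=True, exist_ok=True)
    for _ in range(attempts):
        candidate = base / f"{prefix}{uuid4().hex[:12]}"
        try:
            # mkdir() without exist_ok raises if the name is taken,
            # so a successful call means this process owns the directory.
            candidate.mkdir()
        except FileExistsError:
            continue
        return candidate
    raise RuntimeError("Unable to create isolated scenario workspace")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = make_workspace("cybersecurity_owasp_train_0_", root=Path(tmp))
        second = make_workspace("cybersecurity_owasp_train_0_", root=Path(tmp))
        print(first.name, second.name)
```

One behavioral difference from `tempfile.mkdtemp` worth keeping in mind: `mkdtemp` creates the directory with owner-only permissions, while `Path.mkdir()` uses the default mode subject to the process umask.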
scripts/modal_ephemeral_train.py CHANGED
@@ -62,12 +62,18 @@ class NoopTrainer:


  @app.function(image=image, timeout=60 * 30)
- def run_ephemeral_smoke(episodes: int = 4, seed_start: int = 0) -> dict[str, Any]:
+ def run_ephemeral_smoke(
+     episodes: int = 4,
+     seed_start: int = 0,
+     trackio_space_id: str = "",
+     trackio_project: str = "CyberSecurity_OWASP-smoke",
+ ) -> dict[str, Any]:
      from CyberSecurity_OWASP.models import CyberSecurityOWASPAction
      from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import (
          CybersecurityOwaspEnvironment,
      )
      from training.rollout import rollout_once
+     from training.trackio_utils import log_trackio_metrics, trackio_run

      baseline = []
      oracle = []
@@ -128,8 +134,9 @@ def run_ephemeral_smoke(episodes: int = 4, seed_start: int = 0) -> dict[str, Any
      def mean(items: list[dict[str, Any]], key: str) -> float:
          return sum(float(item.get(key, 0.0)) for item in items) / max(1, len(items))

-     return {
-         "run_name": f"{APP_NAME}-{datetime.utcnow().strftime('%Y%m%d-%H%M%S')}",
+     run_name = f"{APP_NAME}-{datetime.utcnow().strftime('%Y%m%d-%H%M%S')}"
+     result = {
+         "run_name": run_name,
          "mode": "smoke",
          "episodes": episodes,
          "seed_start": seed_start,
@@ -139,6 +146,28 @@ def run_ephemeral_smoke(episodes: int = 4, seed_start: int = 0) -> dict[str, Any
          "baseline": baseline,
          "oracle": oracle,
      }
+     with trackio_run(
+         run_name=run_name,
+         run_type="modal_ephemeral_smoke",
+         project=trackio_project,
+         space_id=trackio_space_id,
+         config={
+             "episodes": episodes,
+             "seed_start": seed_start,
+             "mode": "smoke",
+         },
+         group="smoke",
+     ):
+         log_trackio_metrics(
+             {
+                 "smoke/baseline_mean_reward": result["baseline_mean_reward"],
+                 "smoke/oracle_mean_reward": result["oracle_mean_reward"],
+                 "smoke/oracle_success_rate": result["oracle_success_rate"],
+                 "smoke/episodes": episodes,
+             },
+             step=0,
+         )
+     return result


  @app.function(image=image, timeout=60 * 10)
@@ -149,9 +178,20 @@ def run_grpo_config_check() -> str:


  @app.local_entrypoint()
- def main(mode: str = "smoke", episodes: int = 4, seed_start: int = 0) -> None:
+ def main(
+     mode: str = "smoke",
+     episodes: int = 4,
+     seed_start: int = 0,
+     trackio_space_id: str = "",
+     trackio_project: str = "CyberSecurity_OWASP-smoke",
+ ) -> None:
      if mode == "smoke":
-         result = run_ephemeral_smoke.remote(episodes=episodes, seed_start=seed_start)
+         result = run_ephemeral_smoke.remote(
+             episodes=episodes,
+             seed_start=seed_start,
+             trackio_space_id=trackio_space_id,
+             trackio_project=trackio_project,
+         )
          output_dir = PROJECT_ROOT / "outputs" / "rollouts"
          output_dir.mkdir(parents=True, exist_ok=True)
          output_path = output_dir / f"{result['run_name']}.json"
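
Review note: two small pieces of the smoke function are easy to check in isolation. The `mean` helper is copied from the diff and returns `0.0` for an empty list because of the `max(1, len(items))` guard. The run-name helper is a hypothetical variant (`smoke_run_name` is not in the commit) that uses a timezone-aware timestamp, since `datetime.utcnow()` is deprecated as of Python 3.12. The `"reward"` key in the demo is an assumed field name.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def mean(items: list[dict[str, Any]], key: str) -> float:
    """Average a numeric field; missing keys count as 0.0, empty input yields 0.0."""
    return sum(float(item.get(key, 0.0)) for item in items) / max(1, len(items))


def smoke_run_name(app_name: str, now: datetime | None = None) -> str:
    """Build '<app>-YYYYMMDD-HHMMSS' from a UTC timestamp (hypothetical helper)."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d-%H%M%S")
    return f"{app_name}-{stamp}"


if __name__ == "__main__":
    episodes = [{"reward": 1.0}, {"reward": 0.0}]  # "reward" is an assumed key
    print(mean(episodes, "reward"))
    print(smoke_run_name("demo-app"))
```

Accepting `now` as a parameter keeps the name deterministic in tests while preserving the same timestamp format the commit uses.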
scripts/modal_run_ephemeral.sh CHANGED
@@ -1,3 +1,8 @@
  #!/usr/bin/env bash
  set -euo pipefail
- modal run scripts/modal_ephemeral_train.py --mode "${MODE:-smoke}" --episodes "${EPISODES:-4}" --seed-start "${SEED_START:-0}"
+ modal run scripts/modal_ephemeral_train.py \
+   --mode "${MODE:-smoke}" \
+   --episodes "${EPISODES:-4}" \
+   --seed-start "${SEED_START:-0}" \
+   --trackio-space-id "${TRACKIO_SPACE_ID:-}" \
+   --trackio-project "${TRACKIO_PROJECT:-CyberSecurity_OWASP-smoke}"
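
Review note: the wrapper relies on the shell's `${VAR:-default}` expansion, which falls back to the default when the variable is unset or empty. The Python sketch below mirrors that rule and builds the same argument list. The helper names `env_default` and `build_modal_args` are illustrative and do not exist in the repository.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def env_default(name: str, default: str, env: Mapping[str, str] | None = None) -> str:
    """Python counterpart of ${NAME:-default}: unset and empty both fall back."""
    source = os.environ if env is None else env
    return source.get(name) or default


def build_modal_args(env: Mapping[str, str] | None = None) -> list[str]:
    """Assemble the argv that scripts/modal_run_ephemeral.sh would run."""
    return [
        "modal", "run", "scripts/modal_ephemeral_train.py",
        "--mode", env_default("MODE", "smoke", env),
        "--episodes", env_default("EPISODES", "4", env),
        "--seed-start", env_default("SEED_START", "0", env),
        "--trackio-space-id", env_default("TRACKIO_SPACE_ID", "", env),
        "--trackio-project", env_default("TRACKIO_PROJECT", "CyberSecurity_OWASP-smoke", env),
    ]


if __name__ == "__main__":
    print(build_modal_args({"EPISODES": "8"}))
```

Note that an unset `TRACKIO_SPACE_ID` still passes an empty string to `--trackio-space-id`, which matches the empty-string default of the `trackio_space_id` parameter in `main`.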
scripts/modal_train_grpo.py CHANGED
@@ -32,6 +32,8 @@ SECRET_NAME = "CyberSecurity_OWASP-secrets"
32
  RUNS_DIR = pathlib.Path("/runs")
33
  REMOTE_PROJECT = "/root/CyberSecurity_OWASP"
34
  PROJECT_ROOT = pathlib.Path(__file__).resolve().parents[1]
 
 
35
 
36
 
37
  def _load_local_env_file() -> None:
@@ -44,7 +46,7 @@ def _load_local_env_file() -> None:
44
  continue
45
  key, value = line.split("=", 1)
46
  key = key.strip()
47
- if key not in {"TRACKIO_SPACE_ID", "TRACKIO_PROJECT"}:
48
  continue
49
  value = value.strip().strip('"').strip("'")
50
  os.environ.setdefault(key, value)
@@ -69,8 +71,23 @@ def _is_config_mode() -> bool:
69
  _load_local_env_file()
70
 
71
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
72
  def _training_image() -> modal.Image:
73
- return (
74
  modal.Image.from_registry(
75
  "nvidia/cuda:12.8.0-devel-ubuntu22.04",
76
  add_python="3.11",
@@ -85,21 +102,33 @@ def _training_image() -> modal.Image:
85
  "datasets",
86
  "huggingface_hub",
87
  "peft",
 
88
  "tokenizers",
89
  "nvidia-ml-py",
90
  "trackio>=0.25.0",
91
  "transformers>=5.5.0",
92
  "trl>=0.28.0",
93
  "openenv-core[core]>=0.2.3",
94
- "pydantic>=2.11.7,<3",
95
  )
96
  .uv_pip_install(
97
  "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo",
98
  "unsloth[base] @ git+https://github.com/unslothai/unsloth",
99
  )
 
100
  .uv_pip_install("mergekit", "immutables==0.21", extra_options="--no-deps")
 
101
  .uv_pip_install("trl>=0.28.0", "transformers>=5.5.0", "jmespath")
102
- .add_local_dir(
 
 
 
 
 
 
 
 
 
 
103
  PROJECT_ROOT,
104
  remote_path=REMOTE_PROJECT,
105
  copy=True,
@@ -112,17 +141,18 @@ def _training_image() -> modal.Image:
112
  "*.pyc",
113
  ],
114
  )
115
- .run_commands(
116
  f"python -m pip install -e {REMOTE_PROJECT}",
117
- "python -c \"import os, torch; import transformers.utils.hub as hub; "
118
- "hub.TRANSFORMERS_CACHE = getattr(hub, 'TRANSFORMERS_CACHE', "
119
- "os.path.join(os.path.expanduser('~'), '.cache', 'huggingface', 'hub')); "
120
- "from trl import GRPOConfig, GRPOTrainer; "
121
- "from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import "
122
- "CybersecurityOwaspEnvironment; print('trainer import ok', torch.__version__)\"",
123
  )
124
- .workdir(REMOTE_PROJECT)
125
- )
 
 
 
 
 
 
 
126
 
127
 
128
  app = modal.App(APP_NAME)
@@ -186,16 +216,21 @@ def train_cybersecurity_owasp_grpo(
186
  seed_start: int = 0,
187
  git_sha: str = "nogit",
188
  run_name: str = "",
 
 
 
189
  ) -> dict[str, str | int | float]:
 
190
  import statistics
191
 
192
  import torch
 
193
  import transformers.utils.hub as transformers_hub
194
  from datasets import Dataset
195
  from huggingface_hub import whoami
196
  from transformers import TrainerCallback
197
- from trl import GRPOConfig, GRPOTrainer
198
- from unsloth import FastLanguageModel
199
 
200
  import trackio
201
 
@@ -363,11 +398,27 @@ def train_cybersecurity_owasp_grpo(
363
  return self._step("read_openapi")
364
 
365
  def read_file(self, path: str) -> str:
366
- """Read an editable generated workspace file by relative path."""
 
 
 
 
 
 
 
 
367
  return self._step("read_file", {"path": path})
368
 
369
  def search_code(self, query: str) -> str:
370
- """Search editable generated workspace files for a string."""
 
 
 
 
 
 
 
 
371
  return self._step("search_code", {"query": query})
372
 
373
  def send_local_request(
@@ -376,7 +427,17 @@ def train_cybersecurity_owasp_grpo(
376
  method: str = "GET",
377
  user_id: str | None = None,
378
  ) -> str:
379
- """Send a request to the generated local app only."""
 
 
 
 
 
 
 
 
 
 
380
  return self._step(
381
  "send_local_request",
382
  {"path": path, "method": method, "user_id": user_id},
@@ -389,7 +450,18 @@ def train_cybersecurity_owasp_grpo(
389
  second_user_id: str,
390
  method: str = "GET",
391
  ) -> str:
392
- """Compare one local request as two generated users."""
 
 
 
 
 
 
 
 
 
 
 
393
  return self._step(
394
  "compare_identities",
395
  {
@@ -406,7 +478,17 @@ def train_cybersecurity_owasp_grpo(
406
  evidence: str,
407
  policy_rule: str,
408
  ) -> str:
409
- """Submit structured evidence for the suspected authorization bug."""
 
 
 
 
 
 
 
 
 
 
410
  return self._step(
411
  "submit_finding",
412
  {
@@ -422,7 +504,17 @@ def train_cybersecurity_owasp_grpo(
422
  content: str | None = None,
423
  diff: str | None = None,
424
  ) -> str:
425
- """Patch an editable generated app file with full content or a unified diff."""
 
 
 
 
 
 
 
 
 
 
426
  args: dict[str, Any] = {"path": path}
427
  if content is not None:
428
  args["content"] = content
@@ -573,7 +665,10 @@ def train_cybersecurity_owasp_grpo(
573
  return control
574
 
575
  print(f"CUDA available: {torch.cuda.is_available()}")
576
- print(f"Packaged local CyberSecurity_OWASP repo; default env repo id: {env_repo_id}")
 
 
 
577
  print(f"Trackio Space: {trackio_space_id}")
578
  print(f"Trackio Project: {trackio_project}")
579
  print(f"Output repo: {output_repo_id}")
@@ -586,6 +681,18 @@ def train_cybersecurity_owasp_grpo(
586
  fast_inference=False,
587
  token=hf_token,
588
  )
 
 
 
 
 
 
 
 
 
 
 
 
589
  model = FastLanguageModel.get_peft_model(
590
  model,
591
  r=lora_rank,
@@ -604,46 +711,68 @@ def train_cybersecurity_owasp_grpo(
604
  )
605
  FastLanguageModel.for_training(model)
606
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
607
  training_args = GRPOConfig(
608
- temperature=1.0,
609
- learning_rate=5e-6,
610
- weight_decay=0.001,
611
- warmup_ratio=0.1,
612
- lr_scheduler_type="linear",
613
- optim="adamw_8bit",
614
- logging_steps=1,
615
- per_device_train_batch_size=1,
616
- gradient_accumulation_steps=max(2, num_generations),
617
- num_generations=num_generations,
618
- max_prompt_length=max_seq_length,
619
- max_completion_length=max_completion_length,
620
- max_steps=max_steps,
621
- save_steps=max(10, max_steps),
622
- report_to="trackio",
623
- trackio_space_id=trackio_space_id,
624
- run_name=run_name,
625
- output_dir=str(output_dir),
626
- push_to_hub=True,
627
- hub_model_id=output_repo_id,
628
- hub_private_repo=True,
629
- hub_strategy="every_save",
630
- gradient_checkpointing=True,
631
- gradient_checkpointing_kwargs={"use_reentrant": False},
632
- epsilon=0.2,
633
- epsilon_high=0.28,
634
- delta=1.5,
635
- loss_type="bnpo",
636
- mask_truncated_completions=False,
637
  )
638
 
 
 
 
 
 
 
 
 
 
 
 
 
 
639
  trainer = GRPOTrainer(
640
- model=model,
641
- processing_class=tokenizer,
642
- reward_funcs=cybersecurity_owasp_reward,
643
- args=training_args,
644
- train_dataset=dataset,
645
- environment_factory=CyberSecurityOWASPToolEnv,
646
- callbacks=[TrackioSystemMetricsCallback()],
647
  )
648
  trainer.train()
649
  trainer.push_to_hub()
@@ -662,6 +791,9 @@ def train_cybersecurity_owasp_grpo(
662
  "model_name": model_name,
663
  "max_completion_length": max_completion_length,
664
  "num_generations": num_generations,
 
 
 
665
  }
666
 
667
 
@@ -683,6 +815,10 @@ def main(
683
  num_generations: int = 2,
684
  seed_start: int = 0,
685
  git_sha: str = "nogit",
 
 
 
 
686
  ) -> None:
687
  if mode == "config":
688
  result = check_training_imports.remote()
@@ -732,7 +868,23 @@ def main(
732
  f"{local_stamp}-{git_sha[:8]}"
733
  )
734
 
735
- call = train_cybersecurity_owasp_grpo.spawn(
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
736
  env_repo_id=env_repo_id,
737
  output_repo_id=output_repo_id,
738
  max_steps=max_steps,
@@ -749,17 +901,13 @@ def main(
749
  seed_start=seed_start,
750
  git_sha=git_sha,
751
  run_name=run_name,
 
 
 
752
  )
753
- print(f"Spawned Modal training call: {call.object_id}")
754
- print(f"Run name: {run_name}")
755
- if resolved_trackio_space_id:
756
- print(f"Trackio Space: https://huggingface.co/spaces/{resolved_trackio_space_id}")
757
  else:
758
- print("Trackio Space: derived remotely from HF_TOKEN as <hf-user>/CyberSecurity_OWASP-trackio")
759
- if resolved_output_repo_id:
760
- print(f"Output model repo: https://huggingface.co/{resolved_output_repo_id}")
761
- else:
762
- print(
763
- "Output model repo: derived remotely from HF_TOKEN as "
764
- "<hf-user>/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora"
765
- )
 
32
  RUNS_DIR = pathlib.Path("/runs")
33
  REMOTE_PROJECT = "/root/CyberSecurity_OWASP"
34
  PROJECT_ROOT = pathlib.Path(__file__).resolve().parents[1]
35
+ PUBLIC_REPO_URL = "https://github.com/humandotlearning/CyberSecurity_OWASP.git"
36
+ PUBLIC_REPO_BRANCH = "master"
37
 
38
 
39
  def _load_local_env_file() -> None:
 
46
  continue
47
  key, value = line.split("=", 1)
48
  key = key.strip()
49
+ if key not in {"TRACKIO_PROJECT"}:
50
  continue
51
  value = value.strip().strip('"').strip("'")
52
  os.environ.setdefault(key, value)
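
Review note: this hunk narrows the set of keys that `_load_local_env_file` copies from the local env file to `TRACKIO_PROJECT` only. The sketch below shows the allowlist pattern on plain strings. The rules for skipping blank lines, comments, and lines without `=` are assumptions, because that part of the function is outside the hunk; `parse_env_lines` and `apply_defaults` are illustrative names.

```python
from __future__ import annotations

from collections.abc import Iterable, MutableMapping


def parse_env_lines(lines: Iterable[str], allowed: set[str]) -> dict[str, str]:
    """Collect KEY=VALUE pairs whose key is in the allowlist."""
    values: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # Assumed skip rules: blanks, comments, and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key not in allowed:
            continue
        # Same quote stripping as the diff; the first occurrence of a key wins.
        values.setdefault(key, value.strip().strip('"').strip("'"))
    return values


def apply_defaults(values: dict[str, str], environ: MutableMapping[str, str]) -> None:
    """Copy parsed values without overriding variables that are already set."""
    for key, value in values.items():
        environ.setdefault(key, value)


if __name__ == "__main__":
    sample = ['TRACKIO_SPACE_ID="alice/demo"', "TRACKIO_PROJECT='demo-project'", "HF_TOKEN=secret"]
    print(parse_env_lines(sample, {"TRACKIO_PROJECT"}))
```

With the narrowed allowlist, a `TRACKIO_SPACE_ID` entry in the env file is ignored and the Space must come from the real environment, a CLI flag, or the remote default.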
 
71
  _load_local_env_file()
72
 
73
 
74
+ def _cli_arg_value(name: str, default: str = "") -> str:
75
+ args = sys.argv[1:]
76
+ flag = f"--{name}"
77
+ for index, arg in enumerate(args):
78
+ if arg == flag and index + 1 < len(args):
79
+ return args[index + 1]
80
+ if arg.startswith(f"{flag}="):
81
+ return arg.split("=", 1)[1]
82
+ return default
83
+
84
+
85
+ def _source_mode() -> str:
86
+ return _cli_arg_value("source-mode", os.environ.get("MODAL_SOURCE_MODE", "local"))
87
+
88
+
89
  def _training_image() -> modal.Image:
90
+ image = (
91
  modal.Image.from_registry(
92
  "nvidia/cuda:12.8.0-devel-ubuntu22.04",
93
  add_python="3.11",
 
102
  "datasets",
103
  "huggingface_hub",
104
  "peft",
105
+ "pillow",
106
  "tokenizers",
107
  "nvidia-ml-py",
108
  "trackio>=0.25.0",
109
  "transformers>=5.5.0",
110
  "trl>=0.28.0",
111
  "openenv-core[core]>=0.2.3",
 
112
  )
113
  .uv_pip_install(
114
  "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo",
115
  "unsloth[base] @ git+https://github.com/unslothai/unsloth",
116
  )
117
+ .uv_pip_install("pydantic==2.10.6")
118
  .uv_pip_install("mergekit", "immutables==0.21", extra_options="--no-deps")
119
+ .uv_pip_install("llm-blender", "weave")
120
  .uv_pip_install("trl>=0.28.0", "transformers>=5.5.0", "jmespath")
121
+ )
122
+
123
+ if _source_mode() == "public":
124
+ repo_url = _cli_arg_value("repo-url", PUBLIC_REPO_URL)
125
+ repo_branch = _cli_arg_value("repo-branch", PUBLIC_REPO_BRANCH)
126
+ image = image.run_commands(
127
+ f"git clone --depth 1 --branch {repo_branch} {repo_url} {REMOTE_PROJECT}",
128
+ f"python -m pip install -e {REMOTE_PROJECT}",
129
+ )
130
+ else:
131
+ image = image.add_local_dir(
132
  PROJECT_ROOT,
133
  remote_path=REMOTE_PROJECT,
134
  copy=True,
 
141
  "*.pyc",
142
  ],
143
  )
144
+ image = image.run_commands(
145
  f"python -m pip install -e {REMOTE_PROJECT}",
 
 
 
 
 
 
146
  )
147
+
148
+ return image.run_commands(
149
+ "python -c \"import os, torch; import transformers.utils.hub as hub; "
150
+ "hub.TRANSFORMERS_CACHE = getattr(hub, 'TRANSFORMERS_CACHE', "
151
+ "os.path.join(os.path.expanduser('~'), '.cache', 'huggingface', 'hub')); "
152
+ "from trl import GRPOConfig, GRPOTrainer; "
153
+ "from CyberSecurity_OWASP.server.CyberSecurity_OWASP_environment import "
154
+ "CybersecurityOwaspEnvironment; print('trainer import ok', torch.__version__)\"",
155
+ ).workdir(REMOTE_PROJECT)
156
 
157
 
158
  app = modal.App(APP_NAME)
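
Review note: in public source mode the image build interpolates `repo_branch` and `repo_url` straight into a `git clone` shell command, and both values come from command-line flags. The sketch below pairs a standalone copy of the flag scanner with a quoted variant of the clone command using `shlex.quote`. The `clone_command` helper is a suggestion, not code from the commit, and `cli_arg_value` takes the argument list explicitly instead of reading `sys.argv`.

```python
from __future__ import annotations

import shlex


def cli_arg_value(args: list[str], name: str, default: str = "") -> str:
    """Return the value of --name from args, accepting '--name value' and '--name=value'."""
    flag = f"--{name}"
    for index, arg in enumerate(args):
        if arg == flag and index + 1 < len(args):
            return args[index + 1]
        if arg.startswith(f"{flag}="):
            return arg.split("=", 1)[1]
    return default


def clone_command(repo_url: str, repo_branch: str, dest: str) -> str:
    """Build the shallow-clone command with every user-supplied part shell-quoted."""
    return "git clone --depth 1 --branch {} {} {}".format(
        shlex.quote(repo_branch), shlex.quote(repo_url), shlex.quote(dest)
    )


if __name__ == "__main__":
    argv = ["--source-mode", "public", "--repo-branch=master"]
    branch = cli_arg_value(argv, "repo-branch", "master")
    url = "https://github.com/humandotlearning/CyberSecurity_OWASP.git"
    print(clone_command(url, branch, "/root/CyberSecurity_OWASP"))
```

For the default URL and branch the quoted command is identical to the unquoted one; quoting only changes the result when a value contains shell metacharacters or spaces.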
 
216
  seed_start: int = 0,
217
  git_sha: str = "nogit",
218
  run_name: str = "",
219
+ source_mode: str = "local",
220
+ repo_url: str = PUBLIC_REPO_URL,
221
+ repo_branch: str = PUBLIC_REPO_BRANCH,
222
  ) -> dict[str, str | int | float]:
223
+ import inspect
224
  import statistics
225
 
226
  import torch
227
+ from unsloth import FastLanguageModel
228
  import transformers.utils.hub as transformers_hub
229
  from datasets import Dataset
230
  from huggingface_hub import whoami
231
  from transformers import TrainerCallback
232
+ from trl import GRPOConfig, GRPOTrainer, clone_chat_template
233
+ from trl.chat_template_utils import add_response_schema
234
 
235
  import trackio
236
 
 
398
  return self._step("read_openapi")
399
 
400
  def read_file(self, path: str) -> str:
401
+ """
402
+ Read an editable generated workspace file by relative path.
403
+
404
+ Args:
405
+ path: Relative path inside the generated editable workspace.
406
+
407
+ Returns:
408
+ The file contents or a safe tool error observation.
409
+ """
410
  return self._step("read_file", {"path": path})
411
 
412
  def search_code(self, query: str) -> str:
413
+ """
414
+ Search editable generated workspace files for a string.
415
+
416
+ Args:
417
+ query: Search text to find in editable generated app files.
418
+
419
+ Returns:
420
+ Matching file lines or a no-match message.
421
+ """
422
  return self._step("search_code", {"query": query})
423
 
424
  def send_local_request(
 
427
  method: str = "GET",
428
  user_id: str | None = None,
429
  ) -> str:
430
+ """
431
+ Send a request to the generated local app only.
432
+
433
+ Args:
434
+ path: Local route path such as /health or /invoices/<id>.
435
+ method: HTTP method to use for the local request.
436
+ user_id: Optional generated user identifier for authentication.
437
+
438
+ Returns:
439
+ JSON response from the simulated local app request.
440
+ """
441
  return self._step(
442
  "send_local_request",
443
  {"path": path, "method": method, "user_id": user_id},
 
450
  second_user_id: str,
451
  method: str = "GET",
452
  ) -> str:
453
+ """
454
+ Compare one local request as two generated users.
455
+
456
+ Args:
457
+ path: Local route path to request as both generated users.
458
+ first_user_id: First generated user identifier.
459
+ second_user_id: Second generated user identifier.
460
+ method: HTTP method to use for both local requests.
461
+
462
+ Returns:
463
+ JSON summary of both simulated local responses.
464
+ """
465
  return self._step(
466
  "compare_identities",
467
  {
 
478
  evidence: str,
479
  policy_rule: str,
480
  ) -> str:
481
+ """
482
+ Submit structured evidence for the suspected authorization bug.
483
+
484
+ Args:
485
+ summary: Concise description of the suspected access-control bug.
486
+ evidence: Local reproduction evidence from policy, code, or requests.
487
+ policy_rule: Policy rule that the observed behavior violates.
488
+
489
+ Returns:
490
+ Finding acceptance result and next phase information.
491
+ """
492
  return self._step(
493
  "submit_finding",
494
  {
 
504
  content: str | None = None,
505
  diff: str | None = None,
506
  ) -> str:
507
+ """
508
+ Patch an editable generated app file with full content or a unified diff.
509
+
510
+ Args:
511
+ path: Relative path of the editable generated app file to patch.
512
+ content: Complete replacement file content, when using full-file patching.
513
+ diff: Unified diff to apply, when using diff patching.
514
+
515
+ Returns:
516
+ Patch application result.
517
+ """
518
  args: dict[str, Any] = {"path": path}
519
  if content is not None:
520
  args["content"] = content
 
665
  return control
666
 
667
  print(f"CUDA available: {torch.cuda.is_available()}")
668
+ if source_mode == "public":
669
+ print(f"Installed CyberSecurity_OWASP from public repo: {repo_url}@{repo_branch}")
670
+ else:
671
+ print(f"Packaged local CyberSecurity_OWASP repo; default env repo id: {env_repo_id}")
672
  print(f"Trackio Space: {trackio_space_id}")
673
  print(f"Trackio Project: {trackio_project}")
674
  print(f"Output repo: {output_repo_id}")
 
681
  fast_inference=False,
682
  token=hf_token,
683
  )
684
+ try:
685
+ tokenizer = add_response_schema(tokenizer)
686
+ except Exception as exc:
687
+ print(f"Tokenizer response schema add failed before cloning: {exc!r}")
688
+ model, tokenizer, added_tokens = clone_chat_template(
689
+ model,
690
+ tokenizer,
691
+ "Qwen/Qwen3-0.6B",
692
+ )
693
+ print(f"Cloned Qwen3 chat template; added {len(added_tokens)} tokens.")
694
+ tokenizer = add_response_schema(tokenizer)
695
+
696
  model = FastLanguageModel.get_peft_model(
697
  model,
698
  r=lora_rank,
 
711
  )
712
  FastLanguageModel.for_training(model)
713
 
714
+ grpo_config_values = {
715
+ "temperature": 1.0,
716
+ "learning_rate": 5e-6,
717
+ "weight_decay": 0.001,
718
+ "warmup_ratio": 0.1,
719
+ "lr_scheduler_type": "linear",
720
+ "optim": "adamw_8bit",
721
+ "logging_steps": 1,
722
+ "per_device_train_batch_size": 1,
723
+ "gradient_accumulation_steps": max(2, num_generations),
724
+ "num_generations": num_generations,
725
+ "max_prompt_length": max_seq_length,
726
+ "max_completion_length": max_completion_length,
727
+ "max_steps": max_steps,
728
+ "save_steps": max(10, max_steps),
729
+ "report_to": "trackio",
730
+ "trackio_space_id": trackio_space_id,
731
+ "run_name": run_name,
732
+ "output_dir": str(output_dir),
733
+ "push_to_hub": True,
734
+ "hub_model_id": output_repo_id,
735
+ "hub_private_repo": True,
736
+ "hub_strategy": "every_save",
737
+ "gradient_checkpointing": True,
738
+ "gradient_checkpointing_kwargs": {"use_reentrant": False},
739
+ "epsilon": 0.2,
740
+ "epsilon_high": 0.28,
741
+ "delta": 1.5,
742
+ "loss_type": "bnpo",
743
+ "mask_truncated_completions": False,
744
+ }
745
+ grpo_config_parameters = set(inspect.signature(GRPOConfig).parameters)
746
+ skipped_config_keys = sorted(set(grpo_config_values) - grpo_config_parameters)
747
+ if skipped_config_keys:
748
+ print(f"Skipping unsupported GRPOConfig keys: {skipped_config_keys}")
749
  training_args = GRPOConfig(
750
+ **{
751
+ key: value
752
+ for key, value in grpo_config_values.items()
753
+ if key in grpo_config_parameters
754
+ }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
755
  )
756
 
757
+ trainer_values = {
758
+ "model": model,
759
+ "processing_class": tokenizer,
760
+ "reward_funcs": cybersecurity_owasp_reward,
761
+ "args": training_args,
762
+ "train_dataset": dataset,
763
+ "environment_factory": CyberSecurityOWASPToolEnv,
764
+ "callbacks": [TrackioSystemMetricsCallback()],
765
+ }
766
+ trainer_parameters = set(inspect.signature(GRPOTrainer).parameters)
767
+ skipped_trainer_keys = sorted(set(trainer_values) - trainer_parameters)
768
+ if skipped_trainer_keys:
769
+ print(f"Skipping unsupported GRPOTrainer keys: {skipped_trainer_keys}")
770
  trainer = GRPOTrainer(
771
+ **{
772
+ key: value
773
+ for key, value in trainer_values.items()
774
+ if key in trainer_parameters
775
+ }
 
 
776
  )
777
  trainer.train()
778
  trainer.push_to_hub()
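
Review note: the trainer setup now builds keyword dictionaries and drops any key that the installed `GRPOConfig` or `GRPOTrainer` signature does not accept. The sketch below isolates that filtering step with `inspect.signature`. `DemoConfig` is a stand-in class, not a TRL type, and the `**kwargs` pass-through branch is an addition that the commit does not have.

```python
from __future__ import annotations

import inspect
from typing import Any, Callable


def filter_supported_kwargs(
    target: Callable[..., Any], values: dict[str, Any]
) -> tuple[dict[str, Any], list[str]]:
    """Split values into (accepted by target's signature, sorted skipped keys)."""
    parameters = inspect.signature(target).parameters
    # Addition to the commit's logic: a **kwargs parameter accepts any key.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()):
        return dict(values), []
    supported = {key: value for key, value in values.items() if key in parameters}
    skipped = sorted(set(values) - set(parameters))
    return supported, skipped


class DemoConfig:
    """Hypothetical stand-in for a config class whose fields vary by version."""

    def __init__(self, learning_rate: float = 1e-5, max_steps: int = 10) -> None:
        self.learning_rate = learning_rate
        self.max_steps = max_steps


if __name__ == "__main__":
    wanted = {"learning_rate": 5e-6, "max_steps": 10, "delta": 1.5}
    supported, skipped = filter_supported_kwargs(DemoConfig, wanted)
    if skipped:
        print(f"Skipping unsupported DemoConfig keys: {skipped}")
    config = DemoConfig(**supported)
    print(config.learning_rate, config.max_steps)
```

The trade-off is that a dropped key such as `environment_factory` or `loss_type` changes training behavior without raising, so the printed list of skipped keys is the only signal that the run differs from the intended configuration.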
 
791
  "model_name": model_name,
792
  "max_completion_length": max_completion_length,
793
  "num_generations": num_generations,
794
+ "source_mode": source_mode,
795
+ "repo_url": repo_url,
796
+ "repo_branch": repo_branch,
797
  }
798
 
799
 
 
815
  num_generations: int = 2,
816
  seed_start: int = 0,
817
  git_sha: str = "nogit",
818
+ source_mode: str = "local",
819
+ repo_url: str = PUBLIC_REPO_URL,
820
+ repo_branch: str = PUBLIC_REPO_BRANCH,
821
+ detach: bool = False,
822
  ) -> None:
823
  if mode == "config":
824
  result = check_training_imports.remote()
 
868
  f"{local_stamp}-{git_sha[:8]}"
869
  )
870
 
871
+ print(f"Run name: {run_name}")
872
+ print(f"Source mode: {source_mode}")
873
+ if source_mode == "public":
874
+ print(f"Public repo: {repo_url}@{repo_branch}")
875
+ if resolved_trackio_space_id:
876
+ print(f"Trackio Space: https://huggingface.co/spaces/{resolved_trackio_space_id}")
877
+ else:
878
+ print("Trackio Space: derived remotely from HF_TOKEN as <hf-user>/CyberSecurity_OWASP-trackio")
879
+ if resolved_output_repo_id:
880
+ print(f"Output model repo: https://huggingface.co/{resolved_output_repo_id}")
881
+ else:
882
+ print(
883
+ "Output model repo: derived remotely from HF_TOKEN as "
884
+ "<hf-user>/CyberSecurity_OWASP-qwen3-1.7b-grpo-lora"
885
+ )
886
+
887
+ kwargs = dict(
888
  env_repo_id=env_repo_id,
889
  output_repo_id=output_repo_id,
890
  max_steps=max_steps,
 
901
  seed_start=seed_start,
902
  git_sha=git_sha,
903
  run_name=run_name,
904
+ source_mode=source_mode,
905
+ repo_url=repo_url,
906
+ repo_branch=repo_branch,
907
  )
908
+ if detach:
909
+ call = train_cybersecurity_owasp_grpo.spawn(**kwargs)
910
+ print(f"Spawned Modal training call: {call.object_id}")
 
911
  else:
912
+ result = train_cybersecurity_owasp_grpo.remote(**kwargs)
913
+ print(f"Training result: {result}")
 
 
 
 
 
 
scripts/smoke_test.sh CHANGED
@@ -1,3 +1,3 @@
  #!/usr/bin/env bash
  set -euo pipefail
- uv run pytest tests/test_models.py tests/test_reset_step_state.py
+ uv run python scripts/track_pytest.py tests/test_models.py tests/test_reset_step_state.py
scripts/track_pytest.py ADDED
@@ -0,0 +1,58 @@
+"""Run pytest and record the result as a Trackio run."""
+
+from __future__ import annotations
+
+import argparse
+import subprocess
+import sys
+import time
+from pathlib import Path
+
+PROJECT_ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(PROJECT_ROOT))
+
+from training.trackio_utils import build_run_name, get_git_sha, log_trackio_metrics, trackio_run
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Run pytest with Trackio tracking.")
+    parser.add_argument("pytest_args", nargs="*", help="Arguments passed through to pytest.")
+    parser.add_argument("--run-name", default="", help="Trackio run name override.")
+    parser.add_argument("--difficulty", type=int, default=0)
+    args, passthrough = parser.parse_known_args()
+
+    run_name = args.run_name or build_run_name(
+        "pytest",
+        "smoke",
+        args.difficulty,
+        git_sha=get_git_sha(),
+    )
+    pytest_args = [*args.pytest_args, *passthrough] or ["tests"]
+    command = [sys.executable, "-m", "pytest", *pytest_args]
+    started = time.perf_counter()
+
+    with trackio_run(
+        run_name=run_name,
+        run_type="pytest",
+        config={
+            "command": " ".join(command),
+            "pytest_args": pytest_args,
+        },
+        group="smoke",
+    ):
+        completed = subprocess.run(command)
+        duration = time.perf_counter() - started
+        log_trackio_metrics(
+            {
+                "smoke/pytest_exit_code": completed.returncode,
+                "smoke/pytest_passed": completed.returncode == 0,
+                "smoke/duration_seconds": duration,
+            },
+            step=0,
+        )
+
+    return completed.returncode
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
server/Dockerfile CHANGED
@@ -70,6 +70,7 @@ ENV PATH="/app/.venv/bin:$PATH"
 
 # Set PYTHONPATH so imports work correctly
 ENV PYTHONPATH="/app/env:$PYTHONPATH"
+ENV ENABLE_WEB_INTERFACE=true
 
 # Health check
 HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
server/app.py CHANGED
@@ -6,6 +6,13 @@
 
 """FastAPI application for the CyberSecurity_OWASP OpenEnv server."""
 
+import os
+
+# OpenEnv disables the Gradio playground unless this flag is enabled. Default it
+# on so Docker/HF Spaces show the reset/step/state UI, while explicit env values
+# such as ENABLE_WEB_INTERFACE=false still take precedence.
+os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
+
 try:
     from openenv.core.env_server.http_server import create_app
 except Exception as e:  # pragma: no cover
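The comment in `server/app.py` relies on `os.environ.setdefault` only writing a value when the variable is not already set. A minimal sketch of that precedence, run outside the server and using only the standard library (the two result variable names are illustrative, not part of the commit):

```python
import os

# Case 1: the variable is unset, so setdefault applies the default.
os.environ.pop("ENABLE_WEB_INTERFACE", None)
os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
unset_result = os.environ["ENABLE_WEB_INTERFACE"]

# Case 2: an explicit value is already present, so setdefault leaves it alone.
os.environ["ENABLE_WEB_INTERFACE"] = "false"
os.environ.setdefault("ENABLE_WEB_INTERFACE", "true")
explicit_result = os.environ["ENABLE_WEB_INTERFACE"]

print(unset_result, explicit_result)
```

This only works as intended because the `setdefault` call runs before the OpenEnv import, which is presumably where the flag is read.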
tests/test_web_interface.py ADDED
@@ -0,0 +1,47 @@
+from fastapi.testclient import TestClient
+
+from server.app import app
+
+
+def test_space_root_redirects_to_openenv_web_ui():
+    client = TestClient(app)
+
+    response = client.get("/", follow_redirects=False)
+
+    assert response.status_code == 307
+    assert response.headers["location"] == "/web/"
+
+
+def test_openenv_web_ui_and_api_routes_are_available():
+    client = TestClient(app)
+
+    web_response = client.get("/web/")
+    health_response = client.get("/health")
+    state_response = client.get("/web/state")
+
+    assert web_response.status_code == 200
+    assert "text/html" in web_response.headers["content-type"]
+    assert "Reset" in web_response.text
+    assert "Step" in web_response.text
+    assert "Get state" in web_response.text
+
+    assert health_response.status_code == 200
+    assert health_response.json() == {"status": "healthy"}
+
+    assert state_response.status_code == 200
+    state = state_response.json()
+    assert "episode_id" in state
+    assert "step_count" in state
+
+
+def test_web_reset_returns_cybersecurity_observation():
+    client = TestClient(app)
+
+    response = client.post("/web/reset")
+
+    assert response.status_code == 200
+    payload = response.json()
+    observation = payload["observation"]
+    assert observation["phase"] == "discover"
+    assert "authorization" in observation["task_brief"]
+    assert "inspect_policy_graph" in observation["available_actions"]
training/eval_before_after.py CHANGED
@@ -5,6 +5,8 @@ from __future__ import annotations
 import json
 from pathlib import Path
 
+from training.trackio_utils import log_eval_summary
+
 
 def summarize_runs(baseline: list[dict], trained: list[dict], heldout: list[dict]) -> dict:
     def mean(items: list[dict], key: str) -> float:
@@ -19,11 +21,27 @@ def summarize_runs(baseline: list[dict], trained: list[dict], heldout: list[dict
         "absolute_reward_improvement": mean(trained, "reward_total") - mean(baseline, "reward_total"),
         "heldout_success_rate": mean(heldout, "success"),
         "heldout_mean_reward": mean(heldout, "reward_total"),
+        "exploit_block_rate": mean(trained, "exploit_blocked"),
+        "regression_preservation_rate": mean(trained, "regression_preserved"),
+        "public_route_preservation_rate": mean(trained, "public_routes_preserved"),
+        "anti_cheat_pass_rate": mean(trained, "anti_cheat_pass"),
+        "invalid_action_rate": mean(trained, "invalid_action_rate"),
+        "timeout_rate": mean(trained, "timeout"),
+        "safety_violation_rate": mean(trained, "safety_violation"),
+        "mean_episode_length": mean(trained, "episode_length"),
     }
 
 
-def save_eval_summary(run_name: str, summary: dict) -> Path:
+def save_eval_summary(
+    run_name: str,
+    summary: dict,
+    *,
+    track: bool = True,
+    trackio_config: dict | None = None,
+) -> Path:
     output = Path("outputs/evals") / f"{run_name}_eval_summary.json"
     output.parent.mkdir(parents=True, exist_ok=True)
     output.write_text(json.dumps(summary, indent=2, sort_keys=True), encoding="utf-8")
+    if track:
+        log_eval_summary(run_name, summary, config=trackio_config)
     return output
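The new summary keys in `summarize_runs` are all per-episode fields averaged over the trained rollouts. The body of the inner `mean` helper is not shown in this diff, so the version below is a hypothetical stand-in (empty lists and missing keys are treated as 0, which is an assumption) applied to made-up episode records:

```python
def mean(items: list[dict], key: str) -> float:
    """Average a numeric or boolean field across episode records.

    Hypothetical stand-in for the helper elided from the diff: empty input
    and missing keys both contribute 0.
    """
    if not items:
        return 0.0
    return sum(float(item.get(key, 0)) for item in items) / len(items)


# Illustrative episode records; the field names follow the keys used in the diff.
trained = [
    {"success": True, "exploit_blocked": True, "episode_length": 6},
    {"success": False, "exploit_blocked": True, "episode_length": 10},
]

rates = {
    "trained_success_rate": mean(trained, "success"),
    "exploit_block_rate": mean(trained, "exploit_blocked"),
    "mean_episode_length": mean(trained, "episode_length"),
}
print(rates)
```

Because booleans convert to 0.0 or 1.0, the same helper yields rates for flag fields and plain averages for numeric fields such as `episode_length`.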
training/trackio_utils.py CHANGED
@@ -2,7 +2,12 @@
 
 from __future__ import annotations
 
+import os
+import subprocess
+from contextlib import contextmanager
 from datetime import datetime
+from pathlib import Path
+from typing import Any, Iterator
 
 
 TRAIN_METRICS = [
@@ -34,7 +39,133 @@ TRAIN_METRICS = [
 ]
 
 
+EVAL_METRICS = [
+    "eval/baseline_success_rate",
+    "eval/trained_success_rate",
+    "eval/absolute_success_improvement",
+    "eval/baseline_mean_reward",
+    "eval/trained_mean_reward",
+    "eval/absolute_reward_improvement",
+    "eval/heldout_success_rate",
+    "eval/heldout_mean_reward",
+    "eval/exploit_block_rate",
+    "eval/regression_preservation_rate",
+    "eval/public_route_preservation_rate",
+    "eval/anti_cheat_pass_rate",
+    "eval/invalid_action_rate",
+    "eval/timeout_rate",
+    "eval/safety_violation_rate",
+    "eval/mean_episode_length",
+]
+
+
 def build_run_name(model: str, algo: str, difficulty: int, git_sha: str = "nogit") -> str:
-    stamp = datetime.utcnow().strftime("%Y%m%d-%H%M")
+    stamp = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
     model_slug = model.replace("/", "-")
     return f"CyberSecurity_OWASP-{model_slug}-{algo}-level{difficulty}-{stamp}-{git_sha[:8]}"
+
+
+def get_git_sha(default: str = "nogit") -> str:
+    try:
+        result = subprocess.run(
+            ["git", "rev-parse", "HEAD"],
+            check=True,
+            capture_output=True,
+            text=True,
+        )
+    except Exception:
+        return default
+    return result.stdout.strip() or default
+
+
+def _load_trackio():
+    os.environ.setdefault("TRACKIO_DIR", str((Path.cwd() / "outputs" / "trackio").resolve()))
+    try:
+        import trackio
+    except ImportError as exc:
+        raise RuntimeError(
+            "Trackio is required for CyberSecurity_OWASP runs. Install dependencies "
+            "with `uv sync` and set TRACKIO_SPACE_ID when you want remote HF Spaces tracking."
+        ) from exc
+    return trackio
+
+
+def init_trackio_run(
+    *,
+    run_name: str,
+    run_type: str,
+    config: dict[str, Any] | None = None,
+    project: str | None = None,
+    space_id: str | None = None,
+    group: str | None = None,
+):
+    trackio = _load_trackio()
+    project = project or os.getenv("TRACKIO_PROJECT", "CyberSecurity_OWASP")
+    space_id = space_id if space_id is not None else os.getenv("TRACKIO_SPACE_ID", "")
+    run_config = {
+        "environment": "CyberSecurity_OWASP",
+        "run_type": run_type,
+        **(config or {}),
+    }
+    kwargs: dict[str, Any] = {
+        "project": project,
+        "name": run_name,
+        "config": run_config,
+    }
+    if space_id:
+        kwargs["space_id"] = space_id
+    if group:
+        kwargs["group"] = group
+    return trackio.init(**kwargs)
+
+
+def log_trackio_metrics(metrics: dict[str, Any], step: int | None = None) -> None:
+    trackio = _load_trackio()
+    numeric = {
+        key: value
+        for key, value in metrics.items()
+        if isinstance(value, (int, float, bool))
+    }
+    if step is None:
+        trackio.log(numeric)
+    else:
+        trackio.log(numeric, step=step)
+
+
+def finish_trackio_run() -> None:
+    trackio = _load_trackio()
+    trackio.finish()
+
+
+@contextmanager
+def trackio_run(
+    *,
+    run_name: str,
+    run_type: str,
+    config: dict[str, Any] | None = None,
+    project: str | None = None,
+    space_id: str | None = None,
+    group: str | None = None,
+) -> Iterator[Any]:
+    run = init_trackio_run(
+        run_name=run_name,
+        run_type=run_type,
+        config=config,
+        project=project,
+        space_id=space_id,
+        group=group,
+    )
+    try:
+        yield run
+    finally:
+        finish_trackio_run()
+
+
+def log_eval_summary(run_name: str, summary: dict[str, Any], config: dict[str, Any] | None = None) -> None:
+    metrics = {
+        f"eval/{key}": float(value)
+        for key, value in summary.items()
+        if isinstance(value, (int, float, bool))
+    }
+    with trackio_run(run_name=run_name, run_type="eval", config=config, group="eval"):
+        log_trackio_metrics(metrics, step=0)
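`log_eval_summary` keeps only numeric or boolean summary values, converts them to floats, and prefixes each key with `eval/` so the names line up with `EVAL_METRICS`. A minimal sketch of that transformation without a Trackio dependency; `to_eval_metrics` is a hypothetical name used here for illustration and the sample summary is made up:

```python
from typing import Any


def to_eval_metrics(summary: dict[str, Any]) -> dict[str, float]:
    """Mirror the dict comprehension in log_eval_summary (hypothetical helper)."""
    return {
        f"eval/{key}": float(value)
        for key, value in summary.items()
        # bool is a subclass of int, so listing it is redundant but explicit.
        if isinstance(value, (int, float, bool))
    }


summary = {
    "trained_success_rate": 0.75,
    "anti_cheat_pass_rate": True,  # booleans become 1.0 or 0.0
    "notes": "strings are dropped",
}
metrics = to_eval_metrics(summary)
print(metrics)
```

Non-numeric entries are silently skipped, so a summary field with an unexpected type will be missing from the dashboard and no error is raised.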
training/train_grpo.py CHANGED
@@ -9,16 +9,26 @@ from __future__ import annotations
 
 import os
 
+from training.trackio_utils import build_run_name, get_git_sha
+
 
 def build_grpo_config():
     from trl import GRPOConfig
 
+    model_name = os.getenv("MODEL_NAME", "Qwen/Qwen3-1.7B")
+    difficulty = int(os.getenv("DIFFICULTY", "0"))
     output_dir = os.getenv("OUTPUT_DIR", "CyberSecurity_OWASP-qwen3-1.7b-grpo")
     trackio_space_id = os.getenv("TRACKIO_SPACE_ID", output_dir)
+    os.environ.setdefault("TRACKIO_PROJECT", "CyberSecurity_OWASP-grpo")
+    run_name = os.getenv(
+        "RUN_NAME",
+        build_run_name(model_name, "grpo", difficulty, git_sha=get_git_sha()),
+    )
     return GRPOConfig(
         output_dir=output_dir,
         report_to="trackio",
         trackio_space_id=trackio_space_id,
+        run_name=run_name,
         logging_steps=1,
         save_steps=25,
         learning_rate=5e-6,
uv.lock CHANGED
@@ -1283,6 +1283,49 @@ wheels = [
1283
  { url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = "2025-04-24T22:06:20.566Z" },
1284
  ]
1285
 
1286
  [[package]]
1287
  name = "httpx"
1288
  version = "0.28.1"
@@ -2136,6 +2179,7 @@ version = "0.1.0"
2136
  source = { editable = "." }
2137
  dependencies = [
2138
  { name = "openenv-core", extra = ["core"] },
 
2139
  ]
2140
 
2141
  [package.optional-dependencies]
@@ -2153,6 +2197,7 @@ requires-dist = [
2153
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
2154
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
2155
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
 
2156
  ]
2157
  provides-extras = ["dev", "modal"]
2158
 
@@ -3411,6 +3456,26 @@ wheels = [
3411
  { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
3412
  ]
3413
 
3414
  [[package]]
3415
  name = "typer"
3416
  version = "0.24.2"
@@ -3506,6 +3571,61 @@ wheels = [
3506
  { url = "https://files.pythonhosted.org/packages/31/a3/5b1562db76a5a488274b2332a97199b32d0442aca0ed193697fd47786316/uvicorn-0.46.0-py3-none-any.whl", hash = "sha256:bbebbcbed972d162afca128605223022bedd345b7bc7855ce66deb31487a9048", size = 70926, upload-time = "2026-04-23T07:15:58.355Z" },
3507
  ]
3508
 
3509
  [[package]]
3510
  name = "watchfiles"
3511
  version = "1.1.1"
 
1283
  { url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784, upload-time = "2025-04-24T22:06:20.566Z" },
1284
  ]
1285
 
1286
+ [[package]]
1287
+ name = "httptools"
1288
+ version = "0.7.1"
1289
+ source = { registry = "https://pypi.org/simple" }
1290
+ sdist = { url = "https://files.pythonhosted.org/packages/b5/46/120a669232c7bdedb9d52d4aeae7e6c7dfe151e99dc70802e2fc7a5e1993/httptools-0.7.1.tar.gz", hash = "sha256:abd72556974f8e7c74a259655924a717a2365b236c882c3f6f8a45fe94703ac9", size = 258961, upload-time = "2025-10-10T03:55:08.559Z" }
1291
+ wheels = [
1292
+ { url = "https://files.pythonhosted.org/packages/c7/e5/c07e0bcf4ec8db8164e9f6738c048b2e66aabf30e7506f440c4cc6953f60/httptools-0.7.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:11d01b0ff1fe02c4c32d60af61a4d613b74fad069e47e06e9067758c01e9ac78", size = 204531, upload-time = "2025-10-10T03:54:20.887Z" },
1293
+ { url = "https://files.pythonhosted.org/packages/7e/4f/35e3a63f863a659f92ffd92bef131f3e81cf849af26e6435b49bd9f6f751/httptools-0.7.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:84d86c1e5afdc479a6fdabf570be0d3eb791df0ae727e8dbc0259ed1249998d4", size = 109408, upload-time = "2025-10-10T03:54:22.455Z" },
1294
+ { url = "https://files.pythonhosted.org/packages/f5/71/b0a9193641d9e2471ac541d3b1b869538a5fb6419d52fd2669fa9c79e4b8/httptools-0.7.1-cp310-cp310-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:c8c751014e13d88d2be5f5f14fc8b89612fcfa92a9cc480f2bc1598357a23a05", size = 440889, upload-time = "2025-10-10T03:54:23.753Z" },
1295
+ { url = "https://files.pythonhosted.org/packages/eb/d9/2e34811397b76718750fea44658cb0205b84566e895192115252e008b152/httptools-0.7.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:654968cb6b6c77e37b832a9be3d3ecabb243bbe7a0b8f65fbc5b6b04c8fcabed", size = 440460, upload-time = "2025-10-10T03:54:25.313Z" },
1296
+ { url = "https://files.pythonhosted.org/packages/01/3f/a04626ebeacc489866bb4d82362c0657b2262bef381d68310134be7f40bb/httptools-0.7.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:b580968316348b474b020edf3988eecd5d6eec4634ee6561e72ae3a2a0e00a8a", size = 425267, upload-time = "2025-10-10T03:54:26.81Z" },
1297
+ { url = "https://files.pythonhosted.org/packages/a5/99/adcd4f66614db627b587627c8ad6f4c55f18881549bab10ecf180562e7b9/httptools-0.7.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:d496e2f5245319da9d764296e86c5bb6fcf0cf7a8806d3d000717a889c8c0b7b", size = 424429, upload-time = "2025-10-10T03:54:28.174Z" },
1298
+ { url = "https://files.pythonhosted.org/packages/d5/72/ec8fc904a8fd30ba022dfa85f3bbc64c3c7cd75b669e24242c0658e22f3c/httptools-0.7.1-cp310-cp310-win_amd64.whl", hash = "sha256:cbf8317bfccf0fed3b5680c559d3459cccf1abe9039bfa159e62e391c7270568", size = 86173, upload-time = "2025-10-10T03:54:29.5Z" },
1299
+ { url = "https://files.pythonhosted.org/packages/9c/08/17e07e8d89ab8f343c134616d72eebfe03798835058e2ab579dcc8353c06/httptools-0.7.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:474d3b7ab469fefcca3697a10d11a32ee2b9573250206ba1e50d5980910da657", size = 206521, upload-time = "2025-10-10T03:54:31.002Z" },
1300
+ { url = "https://files.pythonhosted.org/packages/aa/06/c9c1b41ff52f16aee526fd10fbda99fa4787938aa776858ddc4a1ea825ec/httptools-0.7.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:a3c3b7366bb6c7b96bd72d0dbe7f7d5eead261361f013be5f6d9590465ea1c70", size = 110375, upload-time = "2025-10-10T03:54:31.941Z" },
1301
+ { url = "https://files.pythonhosted.org/packages/cc/cc/10935db22fda0ee34c76f047590ca0a8bd9de531406a3ccb10a90e12ea21/httptools-0.7.1-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:379b479408b8747f47f3b253326183d7c009a3936518cdb70db58cffd369d9df", size = 456621, upload-time = "2025-10-10T03:54:33.176Z" },
1302
+ { url = "https://files.pythonhosted.org/packages/0e/84/875382b10d271b0c11aa5d414b44f92f8dd53e9b658aec338a79164fa548/httptools-0.7.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:cad6b591a682dcc6cf1397c3900527f9affef1e55a06c4547264796bbd17cf5e", size = 454954, upload-time = "2025-10-10T03:54:34.226Z" },
1303
+ { url = "https://files.pythonhosted.org/packages/30/e1/44f89b280f7e46c0b1b2ccee5737d46b3bb13136383958f20b580a821ca0/httptools-0.7.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:eb844698d11433d2139bbeeb56499102143beb582bd6c194e3ba69c22f25c274", size = 440175, upload-time = "2025-10-10T03:54:35.942Z" },
1304
+ { url = "https://files.pythonhosted.org/packages/6f/7e/b9287763159e700e335028bc1824359dc736fa9b829dacedace91a39b37e/httptools-0.7.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f65744d7a8bdb4bda5e1fa23e4ba16832860606fcc09d674d56e425e991539ec", size = 440310, upload-time = "2025-10-10T03:54:37.1Z" },
1305
+ { url = "https://files.pythonhosted.org/packages/b3/07/5b614f592868e07f5c94b1f301b5e14a21df4e8076215a3bccb830a687d8/httptools-0.7.1-cp311-cp311-win_amd64.whl", hash = "sha256:135fbe974b3718eada677229312e97f3b31f8a9c8ffa3ae6f565bf808d5b6bcb", size = 86875, upload-time = "2025-10-10T03:54:38.421Z" },
1306
+ { url = "https://files.pythonhosted.org/packages/53/7f/403e5d787dc4942316e515e949b0c8a013d84078a915910e9f391ba9b3ed/httptools-0.7.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:38e0c83a2ea9746ebbd643bdfb521b9aa4a91703e2cd705c20443405d2fd16a5", size = 206280, upload-time = "2025-10-10T03:54:39.274Z" },
1307
+ { url = "https://files.pythonhosted.org/packages/2a/0d/7f3fd28e2ce311ccc998c388dd1c53b18120fda3b70ebb022b135dc9839b/httptools-0.7.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:f25bbaf1235e27704f1a7b86cd3304eabc04f569c828101d94a0e605ef7205a5", size = 110004, upload-time = "2025-10-10T03:54:40.403Z" },
1308
+ { url = "https://files.pythonhosted.org/packages/84/a6/b3965e1e146ef5762870bbe76117876ceba51a201e18cc31f5703e454596/httptools-0.7.1-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:2c15f37ef679ab9ecc06bfc4e6e8628c32a8e4b305459de7cf6785acd57e4d03", size = 517655, upload-time = "2025-10-10T03:54:41.347Z" },
1309
+ { url = "https://files.pythonhosted.org/packages/11/7d/71fee6f1844e6fa378f2eddde6c3e41ce3a1fb4b2d81118dd544e3441ec0/httptools-0.7.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7fe6e96090df46b36ccfaf746f03034e5ab723162bc51b0a4cf58305324036f2", size = 511440, upload-time = "2025-10-10T03:54:42.452Z" },
1310
+ { url = "https://files.pythonhosted.org/packages/22/a5/079d216712a4f3ffa24af4a0381b108aa9c45b7a5cc6eb141f81726b1823/httptools-0.7.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:f72fdbae2dbc6e68b8239defb48e6a5937b12218e6ffc2c7846cc37befa84362", size = 495186, upload-time = "2025-10-10T03:54:43.937Z" },
1311
+ { url = "https://files.pythonhosted.org/packages/e9/9e/025ad7b65278745dee3bd0ebf9314934c4592560878308a6121f7f812084/httptools-0.7.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:e99c7b90a29fd82fea9ef57943d501a16f3404d7b9ee81799d41639bdaae412c", size = 499192, upload-time = "2025-10-10T03:54:45.003Z" },
1312
+ { url = "https://files.pythonhosted.org/packages/6d/de/40a8f202b987d43afc4d54689600ff03ce65680ede2f31df348d7f368b8f/httptools-0.7.1-cp312-cp312-win_amd64.whl", hash = "sha256:3e14f530fefa7499334a79b0cf7e7cd2992870eb893526fb097d51b4f2d0f321", size = 86694, upload-time = "2025-10-10T03:54:45.923Z" },
1313
+ { url = "https://files.pythonhosted.org/packages/09/8f/c77b1fcbfd262d422f12da02feb0d218fa228d52485b77b953832105bb90/httptools-0.7.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:6babce6cfa2a99545c60bfef8bee0cc0545413cb0018f617c8059a30ad985de3", size = 202889, upload-time = "2025-10-10T03:54:47.089Z" },
1314
+ { url = "https://files.pythonhosted.org/packages/0a/1a/22887f53602feaa066354867bc49a68fc295c2293433177ee90870a7d517/httptools-0.7.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:601b7628de7504077dd3dcb3791c6b8694bbd967148a6d1f01806509254fb1ca", size = 108180, upload-time = "2025-10-10T03:54:48.052Z" },
1315
+ { url = "https://files.pythonhosted.org/packages/32/6a/6aaa91937f0010d288d3d124ca2946d48d60c3a5ee7ca62afe870e3ea011/httptools-0.7.1-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:04c6c0e6c5fb0739c5b8a9eb046d298650a0ff38cf42537fc372b28dc7e4472c", size = 478596, upload-time = "2025-10-10T03:54:48.919Z" },
1316
+ { url = "https://files.pythonhosted.org/packages/6d/70/023d7ce117993107be88d2cbca566a7c1323ccbaf0af7eabf2064fe356f6/httptools-0.7.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:69d4f9705c405ae3ee83d6a12283dc9feba8cc6aaec671b412917e644ab4fa66", size = 473268, upload-time = "2025-10-10T03:54:49.993Z" },
1317
+ { url = "https://files.pythonhosted.org/packages/32/4d/9dd616c38da088e3f436e9a616e1d0cc66544b8cdac405cc4e81c8679fc7/httptools-0.7.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:44c8f4347d4b31269c8a9205d8a5ee2df5322b09bbbd30f8f862185bb6b05346", size = 455517, upload-time = "2025-10-10T03:54:51.066Z" },
1318
+ { url = "https://files.pythonhosted.org/packages/1d/3a/a6c595c310b7df958e739aae88724e24f9246a514d909547778d776799be/httptools-0.7.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:465275d76db4d554918aba40bf1cbebe324670f3dfc979eaffaa5d108e2ed650", size = 458337, upload-time = "2025-10-10T03:54:52.196Z" },
1319
+ { url = "https://files.pythonhosted.org/packages/fd/82/88e8d6d2c51edc1cc391b6e044c6c435b6aebe97b1abc33db1b0b24cd582/httptools-0.7.1-cp313-cp313-win_amd64.whl", hash = "sha256:322d00c2068d125bd570f7bf78b2d367dad02b919d8581d7476d8b75b294e3e6", size = 85743, upload-time = "2025-10-10T03:54:53.448Z" },
1320
+ { url = "https://files.pythonhosted.org/packages/34/50/9d095fcbb6de2d523e027a2f304d4551855c2f46e0b82befd718b8b20056/httptools-0.7.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:c08fe65728b8d70b6923ce31e3956f859d5e1e8548e6f22ec520a962c6757270", size = 203619, upload-time = "2025-10-10T03:54:54.321Z" },
1321
+ { url = "https://files.pythonhosted.org/packages/07/f0/89720dc5139ae54b03f861b5e2c55a37dba9a5da7d51e1e824a1f343627f/httptools-0.7.1-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:7aea2e3c3953521c3c51106ee11487a910d45586e351202474d45472db7d72d3", size = 108714, upload-time = "2025-10-10T03:54:55.163Z" },
1322
+ { url = "https://files.pythonhosted.org/packages/b3/cb/eea88506f191fb552c11787c23f9a405f4c7b0c5799bf73f2249cd4f5228/httptools-0.7.1-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:0e68b8582f4ea9166be62926077a3334064d422cf08ab87d8b74664f8e9058e1", size = 472909, upload-time = "2025-10-10T03:54:56.056Z" },
1323
+ { url = "https://files.pythonhosted.org/packages/e0/4a/a548bdfae6369c0d078bab5769f7b66f17f1bfaa6fa28f81d6be6959066b/httptools-0.7.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:df091cf961a3be783d6aebae963cc9b71e00d57fa6f149025075217bc6a55a7b", size = 470831, upload-time = "2025-10-10T03:54:57.219Z" },
1324
+ { url = "https://files.pythonhosted.org/packages/4d/31/14df99e1c43bd132eec921c2e7e11cda7852f65619bc0fc5bdc2d0cb126c/httptools-0.7.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:f084813239e1eb403ddacd06a30de3d3e09a9b76e7894dcda2b22f8a726e9c60", size = 452631, upload-time = "2025-10-10T03:54:58.219Z" },
1325
+ { url = "https://files.pythonhosted.org/packages/22/d2/b7e131f7be8d854d48cb6d048113c30f9a46dca0c9a8b08fcb3fcd588cdc/httptools-0.7.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:7347714368fb2b335e9063bc2b96f2f87a9ceffcd9758ac295f8bbcd3ffbc0ca", size = 452910, upload-time = "2025-10-10T03:54:59.366Z" },
1326
+ { url = "https://files.pythonhosted.org/packages/53/cf/878f3b91e4e6e011eff6d1fa9ca39f7eb17d19c9d7971b04873734112f30/httptools-0.7.1-cp314-cp314-win_amd64.whl", hash = "sha256:cfabda2a5bb85aa2a904ce06d974a3f30fb36cc63d7feaddec05d2050acede96", size = 88205, upload-time = "2025-10-10T03:55:00.389Z" },
1327
+ ]
1328
+
1329
  [[package]]
1330
  name = "httpx"
1331
  version = "0.28.1"
 
2179
  source = { editable = "." }
2180
  dependencies = [
2181
  { name = "openenv-core", extra = ["core"] },
2182
+ { name = "trackio" },
2183
  ]
2184
 
2185
  [package.optional-dependencies]
 
2197
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
2198
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
2199
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
2200
+ { name = "trackio", specifier = ">=0.22.0" },
2201
  ]
2202
  provides-extras = ["dev", "modal"]
2203
 
 
3456
  { url = "https://files.pythonhosted.org/packages/16/e1/3079a9ff9b8e11b846c6ac5c8b5bfb7ff225eee721825310c91b3b50304f/tqdm-4.67.3-py3-none-any.whl", hash = "sha256:ee1e4c0e59148062281c49d80b25b67771a127c85fc9676d3be5f243206826bf", size = 78374, upload-time = "2026-02-03T17:35:50.982Z" },
3457
  ]
3458
 
3459
+ [[package]]
3460
+ name = "trackio"
3461
+ version = "0.25.0"
3462
+ source = { registry = "https://pypi.org/simple" }
3463
+ dependencies = [
3464
+ { name = "gradio-client" },
3465
+ { name = "huggingface-hub" },
3466
+ { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" },
3467
+ { name = "numpy", version = "2.4.4", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
3468
+ { name = "orjson" },
3469
+ { name = "pillow" },
3470
+ { name = "python-multipart" },
3471
+ { name = "starlette" },
3472
+ { name = "tomli", marker = "python_full_version < '3.11'" },
3473
+ { name = "uvicorn", extra = ["standard"] },
3474
+ ]
3475
+ wheels = [
3476
+ { url = "https://files.pythonhosted.org/packages/e7/4d/2aa0e1ca6daebdfac79fadd2ab308d5880c8d0305b2ce8b88900f95a8415/trackio-0.25.0-py3-none-any.whl", hash = "sha256:6c1ae7decef6e35d1165a6b2536d6df8c67594329bdf6bd9f1786c153a532b9f", size = 1653706, upload-time = "2026-04-23T15:45:29.887Z" },
3477
+ ]
3478
+
3479
  [[package]]
3480
  name = "typer"
3481
  version = "0.24.2"
 
3571
  { url = "https://files.pythonhosted.org/packages/31/a3/5b1562db76a5a488274b2332a97199b32d0442aca0ed193697fd47786316/uvicorn-0.46.0-py3-none-any.whl", hash = "sha256:bbebbcbed972d162afca128605223022bedd345b7bc7855ce66deb31487a9048", size = 70926, upload-time = "2026-04-23T07:15:58.355Z" },
3572
  ]
3573
 
3574
+ [package.optional-dependencies]
3575
+ standard = [
3576
+ { name = "colorama", marker = "sys_platform == 'win32'" },
3577
+ { name = "httptools" },
3578
+ { name = "python-dotenv" },
3579
+ { name = "pyyaml" },
3580
+ { name = "uvloop", marker = "platform_python_implementation != 'PyPy' and sys_platform != 'cygwin' and sys_platform != 'win32'" },
3581
+ { name = "watchfiles" },
3582
+ { name = "websockets" },
3583
+ ]
3584
+
3585
+ [[package]]
3586
+ name = "uvloop"
3587
+ version = "0.22.1"
3588
+ source = { registry = "https://pypi.org/simple" }
3589
+ sdist = { url = "https://files.pythonhosted.org/packages/06/f0/18d39dbd1971d6d62c4629cc7fa67f74821b0dc1f5a77af43719de7936a7/uvloop-0.22.1.tar.gz", hash = "sha256:6c84bae345b9147082b17371e3dd5d42775bddce91f885499017f4607fdaf39f", size = 2443250, upload-time = "2025-10-16T22:17:19.342Z" }
3590
+ wheels = [
3591
+ { url = "https://files.pythonhosted.org/packages/eb/14/ecceb239b65adaaf7fde510aa8bd534075695d1e5f8dadfa32b5723d9cfb/uvloop-0.22.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:ef6f0d4cc8a9fa1f6a910230cd53545d9a14479311e87e3cb225495952eb672c", size = 1343335, upload-time = "2025-10-16T22:16:11.43Z" },
3592
+ { url = "https://files.pythonhosted.org/packages/ba/ae/6f6f9af7f590b319c94532b9567409ba11f4fa71af1148cab1bf48a07048/uvloop-0.22.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:7cd375a12b71d33d46af85a3343b35d98e8116134ba404bd657b3b1d15988792", size = 742903, upload-time = "2025-10-16T22:16:12.979Z" },
3593
+ { url = "https://files.pythonhosted.org/packages/09/bd/3667151ad0702282a1f4d5d29288fce8a13c8b6858bf0978c219cd52b231/uvloop-0.22.1-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ac33ed96229b7790eb729702751c0e93ac5bc3bcf52ae9eccbff30da09194b86", size = 3648499, upload-time = "2025-10-16T22:16:14.451Z" },
+ { url = "https://files.pythonhosted.org/packages/b3/f6/21657bb3beb5f8c57ce8be3b83f653dd7933c2fd00545ed1b092d464799a/uvloop-0.22.1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:481c990a7abe2c6f4fc3d98781cc9426ebd7f03a9aaa7eb03d3bfc68ac2a46bd", size = 3700133, upload-time = "2025-10-16T22:16:16.272Z" },
+ { url = "https://files.pythonhosted.org/packages/09/e0/604f61d004ded805f24974c87ddd8374ef675644f476f01f1df90e4cdf72/uvloop-0.22.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:a592b043a47ad17911add5fbd087c76716d7c9ccc1d64ec9249ceafd735f03c2", size = 3512681, upload-time = "2025-10-16T22:16:18.07Z" },
+ { url = "https://files.pythonhosted.org/packages/bb/ce/8491fd370b0230deb5eac69c7aae35b3be527e25a911c0acdffb922dc1cd/uvloop-0.22.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:1489cf791aa7b6e8c8be1c5a080bae3a672791fcb4e9e12249b05862a2ca9cec", size = 3615261, upload-time = "2025-10-16T22:16:19.596Z" },
+ { url = "https://files.pythonhosted.org/packages/c7/d5/69900f7883235562f1f50d8184bb7dd84a2fb61e9ec63f3782546fdbd057/uvloop-0.22.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:c60ebcd36f7b240b30788554b6f0782454826a0ed765d8430652621b5de674b9", size = 1352420, upload-time = "2025-10-16T22:16:21.187Z" },
+ { url = "https://files.pythonhosted.org/packages/a8/73/c4e271b3bce59724e291465cc936c37758886a4868787da0278b3b56b905/uvloop-0.22.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:3b7f102bf3cb1995cfeaee9321105e8f5da76fdb104cdad8986f85461a1b7b77", size = 748677, upload-time = "2025-10-16T22:16:22.558Z" },
+ { url = "https://files.pythonhosted.org/packages/86/94/9fb7fad2f824d25f8ecac0d70b94d0d48107ad5ece03769a9c543444f78a/uvloop-0.22.1-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:53c85520781d84a4b8b230e24a5af5b0778efdb39142b424990ff1ef7c48ba21", size = 3753819, upload-time = "2025-10-16T22:16:23.903Z" },
+ { url = "https://files.pythonhosted.org/packages/74/4f/256aca690709e9b008b7108bc85fba619a2bc37c6d80743d18abad16ee09/uvloop-0.22.1-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:56a2d1fae65fd82197cb8c53c367310b3eabe1bbb9fb5a04d28e3e3520e4f702", size = 3804529, upload-time = "2025-10-16T22:16:25.246Z" },
+ { url = "https://files.pythonhosted.org/packages/7f/74/03c05ae4737e871923d21a76fe28b6aad57f5c03b6e6bfcfa5ad616013e4/uvloop-0.22.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:40631b049d5972c6755b06d0bfe8233b1bd9a8a6392d9d1c45c10b6f9e9b2733", size = 3621267, upload-time = "2025-10-16T22:16:26.819Z" },
+ { url = "https://files.pythonhosted.org/packages/75/be/f8e590fe61d18b4a92070905497aec4c0e64ae1761498cad09023f3f4b3e/uvloop-0.22.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:535cc37b3a04f6cd2c1ef65fa1d370c9a35b6695df735fcff5427323f2cd5473", size = 3723105, upload-time = "2025-10-16T22:16:28.252Z" },
+ { url = "https://files.pythonhosted.org/packages/3d/ff/7f72e8170be527b4977b033239a83a68d5c881cc4775fca255c677f7ac5d/uvloop-0.22.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:fe94b4564e865d968414598eea1a6de60adba0c040ba4ed05ac1300de402cd42", size = 1359936, upload-time = "2025-10-16T22:16:29.436Z" },
+ { url = "https://files.pythonhosted.org/packages/c3/c6/e5d433f88fd54d81ef4be58b2b7b0cea13c442454a1db703a1eea0db1a59/uvloop-0.22.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:51eb9bd88391483410daad430813d982010f9c9c89512321f5b60e2cddbdddd6", size = 752769, upload-time = "2025-10-16T22:16:30.493Z" },
+ { url = "https://files.pythonhosted.org/packages/24/68/a6ac446820273e71aa762fa21cdcc09861edd3536ff47c5cd3b7afb10eeb/uvloop-0.22.1-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:700e674a166ca5778255e0e1dc4e9d79ab2acc57b9171b79e65feba7184b3370", size = 4317413, upload-time = "2025-10-16T22:16:31.644Z" },
+ { url = "https://files.pythonhosted.org/packages/5f/6f/e62b4dfc7ad6518e7eff2516f680d02a0f6eb62c0c212e152ca708a0085e/uvloop-0.22.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7b5b1ac819a3f946d3b2ee07f09149578ae76066d70b44df3fa990add49a82e4", size = 4426307, upload-time = "2025-10-16T22:16:32.917Z" },
+ { url = "https://files.pythonhosted.org/packages/90/60/97362554ac21e20e81bcef1150cb2a7e4ffdaf8ea1e5b2e8bf7a053caa18/uvloop-0.22.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:e047cc068570bac9866237739607d1313b9253c3051ad84738cbb095be0537b2", size = 4131970, upload-time = "2025-10-16T22:16:34.015Z" },
+ { url = "https://files.pythonhosted.org/packages/99/39/6b3f7d234ba3964c428a6e40006340f53ba37993f46ed6e111c6e9141d18/uvloop-0.22.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:512fec6815e2dd45161054592441ef76c830eddaad55c8aa30952e6fe1ed07c0", size = 4296343, upload-time = "2025-10-16T22:16:35.149Z" },
+ { url = "https://files.pythonhosted.org/packages/89/8c/182a2a593195bfd39842ea68ebc084e20c850806117213f5a299dfc513d9/uvloop-0.22.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:561577354eb94200d75aca23fbde86ee11be36b00e52a4eaf8f50fb0c86b7705", size = 1358611, upload-time = "2025-10-16T22:16:36.833Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/14/e301ee96a6dc95224b6f1162cd3312f6d1217be3907b79173b06785f2fe7/uvloop-0.22.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:1cdf5192ab3e674ca26da2eada35b288d2fa49fdd0f357a19f0e7c4e7d5077c8", size = 751811, upload-time = "2025-10-16T22:16:38.275Z" },
+ { url = "https://files.pythonhosted.org/packages/b7/02/654426ce265ac19e2980bfd9ea6590ca96a56f10c76e63801a2df01c0486/uvloop-0.22.1-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6e2ea3d6190a2968f4a14a23019d3b16870dd2190cd69c8180f7c632d21de68d", size = 4288562, upload-time = "2025-10-16T22:16:39.375Z" },
+ { url = "https://files.pythonhosted.org/packages/15/c0/0be24758891ef825f2065cd5db8741aaddabe3e248ee6acc5e8a80f04005/uvloop-0.22.1-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0530a5fbad9c9e4ee3f2b33b148c6a64d47bbad8000ea63704fa8260f4cf728e", size = 4366890, upload-time = "2025-10-16T22:16:40.547Z" },
+ { url = "https://files.pythonhosted.org/packages/d2/53/8369e5219a5855869bcee5f4d317f6da0e2c669aecf0ef7d371e3d084449/uvloop-0.22.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:bc5ef13bbc10b5335792360623cc378d52d7e62c2de64660616478c32cd0598e", size = 4119472, upload-time = "2025-10-16T22:16:41.694Z" },
+ { url = "https://files.pythonhosted.org/packages/f8/ba/d69adbe699b768f6b29a5eec7b47dd610bd17a69de51b251126a801369ea/uvloop-0.22.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:1f38ec5e3f18c8a10ded09742f7fb8de0108796eb673f30ce7762ce1b8550cad", size = 4239051, upload-time = "2025-10-16T22:16:43.224Z" },
+ { url = "https://files.pythonhosted.org/packages/90/cd/b62bdeaa429758aee8de8b00ac0dd26593a9de93d302bff3d21439e9791d/uvloop-0.22.1-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:3879b88423ec7e97cd4eba2a443aa26ed4e59b45e6b76aabf13fe2f27023a142", size = 1362067, upload-time = "2025-10-16T22:16:44.503Z" },
+ { url = "https://files.pythonhosted.org/packages/0d/f8/a132124dfda0777e489ca86732e85e69afcd1ff7686647000050ba670689/uvloop-0.22.1-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:4baa86acedf1d62115c1dc6ad1e17134476688f08c6efd8a2ab076e815665c74", size = 752423, upload-time = "2025-10-16T22:16:45.968Z" },
+ { url = "https://files.pythonhosted.org/packages/a3/94/94af78c156f88da4b3a733773ad5ba0b164393e357cc4bd0ab2e2677a7d6/uvloop-0.22.1-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:297c27d8003520596236bdb2335e6b3f649480bd09e00d1e3a99144b691d2a35", size = 4272437, upload-time = "2025-10-16T22:16:47.451Z" },
+ { url = "https://files.pythonhosted.org/packages/b5/35/60249e9fd07b32c665192cec7af29e06c7cd96fa1d08b84f012a56a0b38e/uvloop-0.22.1-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:c1955d5a1dd43198244d47664a5858082a3239766a839b2102a269aaff7a4e25", size = 4292101, upload-time = "2025-10-16T22:16:49.318Z" },
+ { url = "https://files.pythonhosted.org/packages/02/62/67d382dfcb25d0a98ce73c11ed1a6fba5037a1a1d533dcbb7cab033a2636/uvloop-0.22.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:b31dc2fccbd42adc73bc4e7cdbae4fc5086cf378979e53ca5d0301838c5682c6", size = 4114158, upload-time = "2025-10-16T22:16:50.517Z" },
+ { url = "https://files.pythonhosted.org/packages/f0/7a/f1171b4a882a5d13c8b7576f348acfe6074d72eaf52cccef752f748d4a9f/uvloop-0.22.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:93f617675b2d03af4e72a5333ef89450dfaa5321303ede6e67ba9c9d26878079", size = 4177360, upload-time = "2025-10-16T22:16:52.646Z" },
+ { url = "https://files.pythonhosted.org/packages/79/7b/b01414f31546caf0919da80ad57cbfe24c56b151d12af68cee1b04922ca8/uvloop-0.22.1-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:37554f70528f60cad66945b885eb01f1bb514f132d92b6eeed1c90fd54ed6289", size = 1454790, upload-time = "2025-10-16T22:16:54.355Z" },
+ { url = "https://files.pythonhosted.org/packages/d4/31/0bb232318dd838cad3fa8fb0c68c8b40e1145b32025581975e18b11fab40/uvloop-0.22.1-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:b76324e2dc033a0b2f435f33eb88ff9913c156ef78e153fb210e03c13da746b3", size = 796783, upload-time = "2025-10-16T22:16:55.906Z" },
+ { url = "https://files.pythonhosted.org/packages/42/38/c9b09f3271a7a723a5de69f8e237ab8e7803183131bc57c890db0b6bb872/uvloop-0.22.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:badb4d8e58ee08dad957002027830d5c3b06aea446a6a3744483c2b3b745345c", size = 4647548, upload-time = "2025-10-16T22:16:57.008Z" },
+ { url = "https://files.pythonhosted.org/packages/c1/37/945b4ca0ac27e3dc4952642d4c900edd030b3da6c9634875af6e13ae80e5/uvloop-0.22.1-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b91328c72635f6f9e0282e4a57da7470c7350ab1c9f48546c0f2866205349d21", size = 4467065, upload-time = "2025-10-16T22:16:58.206Z" },
+ { url = "https://files.pythonhosted.org/packages/97/cc/48d232f33d60e2e2e0b42f4e73455b146b76ebe216487e862700457fbf3c/uvloop-0.22.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:daf620c2995d193449393d6c62131b3fbd40a63bf7b307a1527856ace637fe88", size = 4328384, upload-time = "2025-10-16T22:16:59.36Z" },
+ { url = "https://files.pythonhosted.org/packages/e4/16/c1fd27e9549f3c4baf1dc9c20c456cd2f822dbf8de9f463824b0c0357e06/uvloop-0.22.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:6cde23eeda1a25c75b2e07d39970f3374105d5eafbaab2a4482be82f272d5a5e", size = 4296730, upload-time = "2025-10-16T22:17:00.744Z" },
+ ]
+
  [[package]]
  name = "watchfiles"
  version = "1.1.1"