{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# PolyGuard SFT + GRPO One-Run Runner\n",
        "\n",
        "`POLYGUARD_ONE_RUN_RUNNER`\n",
        "\n",
        "Run this notebook from top to bottom to execute the PolyGuard pipeline from data build through SFT baseline training, GRPO environment-reward training, artifact pull, inference validation, report/chart generation, and Hugging Face Space deployment.\n",
        "\n",
        "Default behavior uses Hugging Face Spaces for GPU training, not local Ollama or local GPU training. Keep `HF_TOKEN` in an environment variable or notebook secret; do not paste it into a cell output or commit it.\n",
        "\n",
        "Reward values are expected to remain numeric, rounded to 3 decimals, and clamped to `[0.001, 0.999]` throughout the API, reports, and charts."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 0) Configuration Notes\n",
        "\n",
        "The notebook is intentionally root-level in `polyguard-rl/`. If opened from Colab without the rest of the repo, the first cell clones the GitHub repo and changes into `polyguard-rl/`.\n",
        "\n",
        "Useful overrides:\n",
        "\n",
        "- `HF_TOKEN`: write token for Spaces, model artifact repos, and private artifact pulls.\n",
        "- `HF_USERNAME`: target Hub namespace. If omitted, the authenticated username is used.\n",
        "- `POLYGUARD_MODEL_SWEEP`: comma-separated models, default Qwen 0.5B, 1.5B, and 3B instruct.\n",
        "- `POLYGUARD_SFT_EPOCHS`, `POLYGUARD_GRPO_EPOCHS`: training epochs.\n",
        "- `POLYGUARD_SFT_MAX_STEPS=0`, `POLYGUARD_GRPO_MAX_STEPS=0`, `POLYGUARD_GRPO_MAX_PROMPTS=0`: full-corpus/full-epoch mode.\n",
        "- `POLYGUARD_WAIT_FOR_REMOTE_TRAINING=1`: keep polling until artifacts are pulled or timeout hits.\n",
        "- `POLYGUARD_RUN_LOCAL_SMOKE=1`: also run a tiny local SFT/GRPO smoke loop."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from __future__ import annotations\n",
        "\n",
        "import json\n",
        "import os\n",
        "from pathlib import Path\n",
        "import subprocess\n",
        "import sys\n",
        "import time\n",
        "\n",
        "PROJECT_SUBDIR = \"polyguard-rl\"\n",
        "DEFAULT_REPO_URL = \"https://github.com/Vishwa-docs/Meta_Pytorch_OpenEnv_Scaler_VK.git\"\n",
        "REPO_URL = os.getenv(\"POLYGUARD_GITHUB_REPO_URL\", DEFAULT_REPO_URL)\n",
        "\n",
        "cwd = Path.cwd().resolve()\n",
        "if (cwd / \"pyproject.toml\").exists() and (cwd / \"scripts\").exists():\n",
        "    ROOT = cwd\n",
        "elif (cwd / PROJECT_SUBDIR / \"pyproject.toml\").exists():\n",
        "    ROOT = cwd / PROJECT_SUBDIR\n",
        "else:\n",
        "    clone_root = Path(os.getenv(\"POLYGUARD_REPO_DIR\", \"/content/Meta_Pytorch_OpenEnv_Scaler_VK\")).resolve()\n",
        "    if not clone_root.exists():\n",
        "        subprocess.run([\"git\", \"clone\", REPO_URL, str(clone_root)], check=True)\n",
        "    ROOT = clone_root / PROJECT_SUBDIR\n",
        "\n",
        "os.chdir(ROOT)\n",
        "print(f\"PolyGuard root: {ROOT}\")\n",
        "\n",
        "def run(cmd: list[str] | str, *, check: bool = True, env: dict[str, str] | None = None) -> subprocess.CompletedProcess[str]:\n",
        "    printable = cmd if isinstance(cmd, str) else \" \".join(cmd)\n",
        "    print(f\"\\n$ {printable}\")\n",
        "    merged_env = os.environ.copy()\n",
        "    if env:\n",
        "        merged_env.update(env)\n",
        "    completed = subprocess.run(cmd, check=False, text=True, env=merged_env)\n",
        "    if check and completed.returncode != 0:\n",
        "        raise RuntimeError(f\"command_failed:{printable}\")\n",
        "    return completed\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Install local runtime dependencies. This keeps the notebook kernel light while project commands run through uv.\n",
        "run([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", \"-U\", \"uv\", \"huggingface_hub\", \"gradio_client\"])\n",
        "run([\"uv\", \"sync\"])\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def read_colab_secret(name: str) -> str:\n",
        "    try:\n",
        "        from google.colab import userdata  # type: ignore\n",
        "    except Exception:\n",
        "        return \"\"\n",
        "    try:\n",
        "        return str(userdata.get(name) or \"\")\n",
        "    except Exception:\n",
        "        return \"\"\n",
        "\n",
        "HF_TOKEN = os.getenv(\"HF_TOKEN\", \"\") or read_colab_secret(\"HF_TOKEN\")\n",
        "if HF_TOKEN:\n",
        "    os.environ[\"HF_TOKEN\"] = HF_TOKEN\n",
        "\n",
        "if os.getenv(\"POLYGUARD_REQUIRE_HF_TOKEN\", \"1\") == \"1\" and not HF_TOKEN:\n",
        "    raise RuntimeError(\"Set HF_TOKEN as an environment variable or Colab secret before running the remote training cells.\")\n",
        "\n",
        "HF_USERNAME = os.getenv(\"HF_USERNAME\", \"\")\n",
        "if HF_TOKEN and not HF_USERNAME:\n",
        "    from huggingface_hub import HfApi\n",
        "\n",
        "    whoami = HfApi(token=HF_TOKEN).whoami(token=HF_TOKEN)\n",
        "    HF_USERNAME = str(whoami.get(\"name\") or whoami.get(\"fullname\") or \"\")\n",
        "\n",
        "if not HF_USERNAME:\n",
        "    HF_USERNAME = \"TheJackBright\"\n",
        "\n",
        "MODEL_SWEEP = os.getenv(\n",
        "    \"POLYGUARD_MODEL_SWEEP\",\n",
        "    \"Qwen/Qwen2.5-0.5B-Instruct,Qwen/Qwen2.5-1.5B-Instruct,Qwen/Qwen2.5-3B-Instruct\",\n",
        ")\n",
        "TRAINING_SPACE_REPO_ID = os.getenv(\"POLYGUARD_TRAINING_SPACE_REPO_ID\", f\"{HF_USERNAME}/polyguard-openenv-training-full\")\n",
        "ARTIFACT_REPO_ID = os.getenv(\"POLYGUARD_ARTIFACT_REPO_ID\", f\"{HF_USERNAME}/polyguard-openenv-training-full-artifacts\")\n",
        "PRODUCT_SPACE_REPO_ID = os.getenv(\"POLYGUARD_PRODUCT_SPACE_REPO_ID\", f\"{HF_USERNAME}/polyguard-openenv\")\n",
        "\n",
        "SFT_EPOCHS = os.getenv(\"POLYGUARD_SFT_EPOCHS\", \"2\")\n",
        "GRPO_EPOCHS = os.getenv(\"POLYGUARD_GRPO_EPOCHS\", \"1\")\n",
        "SFT_MAX_STEPS = os.getenv(\"POLYGUARD_SFT_MAX_STEPS\", \"0\")\n",
        "GRPO_MAX_STEPS = os.getenv(\"POLYGUARD_GRPO_MAX_STEPS\", \"0\")\n",
        "GRPO_MAX_PROMPTS = os.getenv(\"POLYGUARD_GRPO_MAX_PROMPTS\", \"0\")\n",
        "GRPO_NUM_GENERATIONS = os.getenv(\"POLYGUARD_GRPO_NUM_GENERATIONS\", \"2\")\n",
        "DATA_PROFILE = os.getenv(\"POLYGUARD_DATA_PROFILE\", \"massive\")\n",
        "\n",
        "RUN_REMOTE_TRAINING = os.getenv(\"POLYGUARD_RUN_REMOTE_TRAINING\", \"1\") == \"1\"\n",
        "WAIT_FOR_REMOTE_TRAINING = os.getenv(\"POLYGUARD_WAIT_FOR_REMOTE_TRAINING\", \"1\") == \"1\"\n",
        "RUN_LOCAL_SMOKE = os.getenv(\"POLYGUARD_RUN_LOCAL_SMOKE\", \"0\") == \"1\"\n",
        "DEPLOY_PRODUCT_SPACE = os.getenv(\"POLYGUARD_DEPLOY_PRODUCT_SPACE\", \"1\") == \"1\"\n",
        "PRODUCT_SPACE_PRIVATE = os.getenv(\"POLYGUARD_PRODUCT_SPACE_PRIVATE\", \"0\") == \"1\"\n",
        "REMOTE_TIMEOUT_HOURS = float(os.getenv(\"POLYGUARD_REMOTE_TIMEOUT_HOURS\", \"12\"))\n",
        "REMOTE_POLL_SECONDS = int(os.getenv(\"POLYGUARD_REMOTE_POLL_SECONDS\", \"300\"))\n",
        "\n",
        "print(json.dumps({\n",
        "    \"hf_username\": HF_USERNAME,\n",
        "    \"model_sweep\": MODEL_SWEEP,\n",
        "    \"training_space_repo_id\": TRAINING_SPACE_REPO_ID,\n",
        "    \"artifact_repo_id\": ARTIFACT_REPO_ID,\n",
        "    \"product_space_repo_id\": PRODUCT_SPACE_REPO_ID,\n",
        "    \"data_profile\": DATA_PROFILE,\n",
        "    \"run_remote_training\": RUN_REMOTE_TRAINING,\n",
        "    \"wait_for_remote_training\": WAIT_FOR_REMOTE_TRAINING,\n",
        "    \"run_local_smoke\": RUN_LOCAL_SMOKE,\n",
        "    \"deploy_product_space\": DEPLOY_PRODUCT_SPACE,\n",
        "}, indent=2))\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1) Build Data And Training Corpora\n",
        "\n",
        "This builds processed data, scenario artifacts, SFT records, and GRPO prompt episodes. The training Space repeats the full build inside its container so remote training is reproducible."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run([\"uv\", \"run\", \"python\", \"scripts/bootstrap_data.py\"])\n",
        "run([\n",
        "    \"uv\", \"run\", \"python\", \"scripts/build_training_corpus.py\",\n",
        "    \"--profile\", DATA_PROFILE,\n",
        "    \"--with-local\",\n",
        "    \"--with-synthetic\",\n",
        "    \"--with-hf\",\n",
        "])\n",
        "summary_path = Path(\"data/processed/training_corpus_summary.json\")\n",
        "print(summary_path.read_text(encoding=\"utf-8\") if summary_path.exists() else \"training_corpus_summary_missing\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2) Local Contract Checks\n",
        "\n",
        "These checks verify the package, OpenEnv contract, reward bounds, and report-generation surfaces before spending GPU time."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run([\"uv\", \"run\", \"pytest\"])\n",
        "run([\"uv\", \"run\", \"openenv\", \"validate\", \".\"])\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3) Optional Local Smoke SFT And GRPO\n",
        "\n",
        "The final training path is the HF Space below. Set `POLYGUARD_RUN_LOCAL_SMOKE=1` only if you want a tiny local compliance run before the remote job."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "if RUN_LOCAL_SMOKE:\n",
        "    local_model = os.getenv(\"POLYGUARD_LOCAL_SMOKE_MODEL\", \"Qwen/Qwen2.5-0.5B-Instruct\")\n",
        "    run([\n",
        "        \"uv\", \"run\", \"python\", \"scripts/train_sft_trl.py\",\n",
        "        \"--model-id\", local_model,\n",
        "        \"--dataset-path\", \"data/processed/training_corpus_sft.json\",\n",
        "        \"--output-dir\", \"checkpoints/sft_adapter\",\n",
        "        \"--report-path\", \"outputs/reports/sft_trl_run.json\",\n",
        "        \"--epochs\", \"1\",\n",
        "        \"--max-steps\", \"20\",\n",
        "        \"--batch-size\", \"1\",\n",
        "        \"--use-unsloth\",\n",
        "    ])\n",
        "    run([\n",
        "        \"uv\", \"run\", \"python\", \"scripts/train_grpo_trl.py\",\n",
        "        \"--model-id\", local_model,\n",
        "        \"--prompts-path\", \"data/processed/training_corpus_grpo_prompts.jsonl\",\n",
        "        \"--output-dir\", \"checkpoints/grpo_adapter\",\n",
        "        \"--report-path\", \"outputs/reports/grpo_trl_run.json\",\n",
        "        \"--max-steps\", \"20\",\n",
        "        \"--max-prompts\", \"64\",\n",
        "        \"--num-generations\", \"2\",\n",
        "        \"--batch-size\", \"1\",\n",
        "        \"--use-unsloth\",\n",
        "    ])\n",
        "else:\n",
        "    print(\"Local smoke skipped. Remote HF Space training remains the main path.\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4) Start SFT Baseline And GRPO Training On Hugging Face Spaces\n",
        "\n",
        "This deploys the private training Space and artifact repo, starts the Docker runner, builds the full corpus inside the Space, trains SFT as the baseline, trains GRPO with environment-backed rewards, runs post-save inference and ablations, then uploads reports, plots, adapters, and manifests."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "if RUN_REMOTE_TRAINING:\n",
        "    deploy_cmd = [\n",
        "        \"uv\", \"run\", \"python\", \"scripts/deploy_training_space.py\",\n",
        "        \"--repo-id\", TRAINING_SPACE_REPO_ID,\n",
        "        \"--artifact-repo-id\", ARTIFACT_REPO_ID,\n",
        "        \"--hardware\", os.getenv(\"POLYGUARD_HF_HARDWARE\", \"a10g-large\"),\n",
        "        \"--model-sweep\", MODEL_SWEEP,\n",
        "        \"--training-mode\", os.getenv(\"POLYGUARD_TRAINING_MODE\", \"full\"),\n",
        "        \"--sft-epochs\", SFT_EPOCHS,\n",
        "        \"--grpo-epochs\", GRPO_EPOCHS,\n",
        "        \"--sft-max-steps\", SFT_MAX_STEPS,\n",
        "        \"--grpo-max-steps\", GRPO_MAX_STEPS,\n",
        "        \"--grpo-max-prompts\", GRPO_MAX_PROMPTS,\n",
        "        \"--grpo-num-generations\", GRPO_NUM_GENERATIONS,\n",
        "    ]\n",
        "    if os.getenv(\"POLYGUARD_TRAINING_SPACE_PUBLIC\", \"0\") == \"1\":\n",
        "        deploy_cmd.append(\"--public\")\n",
        "    run(deploy_cmd)\n",
        "    print(f\"Training Space: https://huggingface.co/spaces/{TRAINING_SPACE_REPO_ID}\")\n",
        "    print(f\"Artifact repo: https://huggingface.co/{ARTIFACT_REPO_ID}\")\n",
        "else:\n",
        "    print(\"Remote training deployment skipped by POLYGUARD_RUN_REMOTE_TRAINING=0\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 5) Monitor Space And Pull Artifacts\n",
        "\n",
        "If `POLYGUARD_WAIT_FOR_REMOTE_TRAINING=1`, this cell keeps polling until `scripts/pull_training_artifacts.py` succeeds or the timeout is reached. It never prints the token."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "monitor_output = \"outputs/reports/training_space_runtime_status.json\"\n",
        "\n",
        "def monitor_once() -> int:\n",
        "    return run([\n",
        "        \"uv\", \"run\", \"python\", \"scripts/monitor_training_space_status.py\",\n",
        "        \"--space-id\", TRAINING_SPACE_REPO_ID,\n",
        "        \"--artifact-repo-id\", ARTIFACT_REPO_ID,\n",
        "        \"--output\", monitor_output,\n",
        "    ], check=False).returncode\n",
        "\n",
        "def pull_once() -> bool:\n",
        "    return run([\n",
        "        \"uv\", \"run\", \"python\", \"scripts/pull_training_artifacts.py\",\n",
        "        \"--artifact-repo-id\", ARTIFACT_REPO_ID,\n",
        "    ], check=False).returncode == 0\n",
        "\n",
        "pulled = False\n",
        "if RUN_REMOTE_TRAINING and WAIT_FOR_REMOTE_TRAINING:\n",
        "    deadline = time.time() + REMOTE_TIMEOUT_HOURS * 3600\n",
        "    attempt = 0\n",
        "    while time.time() < deadline:\n",
        "        attempt += 1\n",
        "        print(f\"Remote poll {attempt}\")\n",
        "        monitor_once()\n",
        "        pulled = pull_once()\n",
        "        if pulled:\n",
        "            print(\"Remote training artifacts pulled successfully.\")\n",
        "            break\n",
        "        print(f\"Artifacts not ready yet. Sleeping {REMOTE_POLL_SECONDS} seconds.\")\n",
        "        time.sleep(REMOTE_POLL_SECONDS)\n",
        "    if not pulled:\n",
        "        raise TimeoutError(\"Remote training did not produce pullable artifacts before timeout.\")\n",
        "else:\n",
        "    monitor_once()\n",
        "    pulled = pull_once()\n",
        "    print(f\"Single pull attempt success: {pulled}\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 6) Generate Reports, Charts, And Evidence Bundles\n",
        "\n",
        "This creates SFT-vs-GRPO charts, Qwen model comparison charts, reward component bars, anti-hacking/overfit checks, basic-LLM-vs-PolyGuard evidence, action traces, and curated submission evidence folders."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run([\"uv\", \"run\", \"python\", \"scripts/generate_hf_training_report.py\", \"--mode\", os.getenv(\"POLYGUARD_TRAINING_MODE\", \"full\")], check=False)\n",
        "run([\"uv\", \"run\", \"python\", \"scripts/evaluate_policy_ablations.py\", \"--episodes\", os.getenv(\"POLYGUARD_ABLATION_EPISODES\", \"8\")], check=False)\n",
        "run([\n",
        "    \"uv\", \"run\", \"python\", \"scripts/generate_submission_evidence.py\",\n",
        "    \"--models\", os.getenv(\"POLYGUARD_EVIDENCE_MODELS\", \"qwen-qwen2-5-0-5b-instruct,qwen-qwen2-5-1-5b-instruct\"),\n",
        "    \"--artifact-repo-id\", ARTIFACT_REPO_ID,\n",
        "    \"--training-space-url\", f\"https://{TRAINING_SPACE_REPO_ID.replace('/', '-').lower()}.hf.space\",\n",
        "    \"--episodes\", os.getenv(\"POLYGUARD_EVIDENCE_EPISODES\", \"8\"),\n",
        "], check=False)\n",
        "run([\"uv\", \"run\", \"python\", \"scripts/build_improvement_evidence_bundle.py\"], check=False)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 7) Activate A Model For Product Inference And Validate Post-Save Inference\n",
        "\n",
        "The app reads `checkpoints/active/active_model_manifest.json`. The default active run is Qwen 0.5B because it is the smallest practical implementation target; switch `POLYGUARD_ACTIVE_RUN_ID` to the 1.5B or 3B run after those artifacts are pulled."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ACTIVE_RUN_ID = os.getenv(\"POLYGUARD_ACTIVE_RUN_ID\", \"qwen-qwen2-5-0-5b-instruct\")\n",
        "run([\n",
        "    \"uv\", \"run\", \"python\", \"scripts/activate_sweep_model.py\",\n",
        "    \"--source\", \"sweep\",\n",
        "    \"--run-id\", ACTIVE_RUN_ID,\n",
        "    \"--preferred-artifact\", os.getenv(\"POLYGUARD_PREFERRED_ARTIFACT\", \"grpo_adapter\"),\n",
        "], check=False)\n",
        "run([\"uv\", \"run\", \"python\", \"scripts/test_inference_postsave.py\", \"--samples\", os.getenv(\"POLYGUARD_INFERENCE_SAMPLES\", \"3\")], check=False)\n",
        "run([\"uv\", \"run\", \"python\", \"scripts/benchmark_inference.py\"], check=False)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 8) Deploy The Product OpenEnv Space\n",
        "\n",
        "This deploys the FastAPI/OpenEnv product Space. It is separate from the private GPU training Space."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "if DEPLOY_PRODUCT_SPACE:\n",
        "    product_cmd = [\"uv\", \"run\", \"python\", \"scripts/deploy_space_api.py\", \"--repo-id\", PRODUCT_SPACE_REPO_ID]\n",
        "    if PRODUCT_SPACE_PRIVATE:\n",
        "        product_cmd.append(\"--private\")\n",
        "    run(product_cmd)\n",
        "    runtime_url = f\"https://{PRODUCT_SPACE_REPO_ID.replace('/', '-').lower()}.hf.space\"\n",
        "    run([\"uv\", \"run\", \"openenv\", \"validate\", \"--url\", runtime_url], check=False)\n",
        "    print(f\"Product Space: https://huggingface.co/spaces/{PRODUCT_SPACE_REPO_ID}\")\n",
        "    print(f\"Runtime URL: {runtime_url}\")\n",
        "else:\n",
        "    print(\"Product Space deploy skipped by POLYGUARD_DEPLOY_PRODUCT_SPACE=0\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 9) Final Acceptance Gate And Output Summary"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run([\"uv\", \"run\", \"python\", \"scripts/acceptance_gate.py\"], check=False)\n",
        "\n",
        "summary = {\n",
        "    \"training_space\": f\"https://huggingface.co/spaces/{TRAINING_SPACE_REPO_ID}\",\n",
        "    \"artifact_repo\": f\"https://huggingface.co/{ARTIFACT_REPO_ID}\",\n",
        "    \"product_space\": f\"https://huggingface.co/spaces/{PRODUCT_SPACE_REPO_ID}\",\n",
        "    \"reports\": [\n",
        "        \"outputs/reports/hf_sweep_summary.json\",\n",
        "        \"outputs/reports/anti_hacking_overfit_report.json\",\n",
        "        \"outputs/reports/postsave_inference.json\",\n",
        "        \"docs/results/submission_evidence_qwen_0_5b_1_5b/README.md\",\n",
        "        \"docs/results/model_improvement_evidence_qwen_0_5b_1_5b/README.md\",\n",
        "    ],\n",
        "    \"plots_dir\": \"outputs/plots\",\n",
        "    \"active_model_manifest\": \"checkpoints/active/active_model_manifest.json\",\n",
        "}\n",
        "print(json.dumps(summary, indent=2))\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}