# PolyGuard Space UI — demo recording script (shot-by-shot)
Use this document while screen-recording the Hugging Face Space (or local Docker). Target length: **8–14 minutes** for a full pass, or **3–5 minutes** for a highlights reel.
---
## Before you hit record
1. **Open the Space** in a clean browser profile or incognito (fewer extensions → fewer glitches).
2. **Set resolution**: 1920×1080 or 1440×900; browser zoom **100%**.
3. **Fullscreen** the Space iframe or use HF “Open in new tab” so the URL bar shows the Space domain.
4. **Wait for cold start**: first load may download the model bundle (several minutes). The **Event Log** and **Model Truth** panels will tell you if the policy failed to load (the heuristic fallback is still usable for env steps).
5. **Optional**: hide mouse cursor in OBS if you prefer; otherwise move slowly and pause **2 seconds** on each panel after major clicks.
**Primary Space (product):** `https://huggingface.co/spaces/TheJackBright/polyguard-openenv-workbench`
Runtime: nginx fronts the **product API** (default `8200`) and **OpenEnv service** (`8100`); see `docker/space/entrypoint.sh`.
---
## Where the model lives (Qwen and artifacts)
This matters for what you say on camera.
| Location | What it is |
| --- | --- |
| **On the Space container** | Working directory `/app` (see `entrypoint.sh`: `cd /app`). |
| **Downloaded bundle** | If `checkpoints/active/grpo_adapter/adapter_config.json` is missing at boot, `scripts/install_hf_active_bundle.py` pulls the **HF usable model bundle** into `checkpoints/active/`. |
| **Typical layout after install** | `checkpoints/active/active_model_manifest.json` — which artifact is active (often **GRPO adapter** on top of base). |
| **Weights** | `checkpoints/active/grpo_adapter/` (LoRA/PEFT), optionally `checkpoints/active/merged/` (full merged weights), `checkpoints/active/sft_adapter/`. |
| **Base model name** | Usually **`Qwen/Qwen2.5-0.5B-Instruct`** as the Transformers base for adapters (set via env e.g. `POLYGUARD_HF_MODEL`). |
**What the UI proves:** the **Model Truth** panel calls **`GET /policy/model_status`** (product API). It shows `model_id` / `base_model`, `run_id`, `preferred_artifact` / `loaded_source`, and availability flags. Say on camera: *“This is live from the API, not hard-coded in the frontend.”*
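The strict check behind the "Qwen 0.5B active" chip can be sketched in a few lines. This is a hypothetical Python rendering of the frontend logic described above (the field names `model_id`, `base_model`, `run_id`, `preferred_artifact`, and `availability` come from this document; the exact flag names and helper are assumptions, not the real `App.tsx` code):

```python
import re

# Assumed regex target per the doc: the chip only verifies Qwen2.5-0.5B-Instruct.
QWEN_RE = re.compile(r"Qwen2\.5-0\.5B-Instruct", re.IGNORECASE)

def qwen_verified(status: dict) -> bool:
    """Sketch of the chip logic: enabled + active + availability + model-id regex."""
    model_id = status.get("model_id") or status.get("base_model") or ""
    return (
        bool(status.get("enabled"))
        and bool(status.get("active"))
        and all(status.get("availability", {}).values())
        and bool(QWEN_RE.search(model_id))
    )

# Example payload shaped like the fields the Model Truth panel reads.
sample = {
    "model_id": "Qwen/Qwen2.5-0.5B-Instruct",
    "run_id": "run_123",
    "preferred_artifact": "grpo_adapter",
    "enabled": True,
    "active": True,
    "availability": {"adapter": True, "tokenizer": True},
}
```

The point to make on camera stands either way: the chip is derived from the API response, not hard-coded.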
---
## UI map (what appears on screen)
| Region | Purpose |
| --- | --- |
| **Hero** (“PolyGuard neural safety cockpit”) | Marketing copy + quick stats. |
| **Top bar** | **Agent Workbench** vs **Env Explorer**, **Task** dropdown, **Reset Episode**, **Q Tips**. |
| **Status chips** | “Live” / model line; in Env mode one chip reads **ws env** (WebSocket to OpenEnv). |
| **Model Truth** | Qwen / artifact / run / availability. |
| **Advanced strip** | Only if Task = **Advanced** — pick raw `difficulty` + `sub_environment`. |
| **Episode Overview** | Mode, task, difficulty, environment, step budget, last reward, patient id, **Patient Summary**, **Risk Delta**. |
| **Candidate Actions** | Legal moves: `candidate_id`, action type, target/replacement, estimated safety delta (or **Blocked**). |
| **Action Console** | Confidence, rationale, **Submit** vs **Run Agent** (Agent mode only for Run Agent). |
| **Reward Channels** | Bars for total + primary + component scores (see below). |
| **Current Medications** | Cards from observation. |
| **Action History / Warnings** | Step trace and env warnings. |
| **Decision / Explanation / Evidence** | **Agent mode only** (filled after API steps that return those fields). |
| **Event Log** | Human-readable trace of resets, steps, rewards, errors. |
---
## Feature encyclopedia — every panel, branch, and agent
Use this section as a **script appendix** or **judge handout**. It mirrors the React workbench in `app/ui/frontend/src/App.tsx`, the API in `app/api/`, and the orchestrator in `app/agents/orchestrator.py`.
### A. How the Space is wired (under the hood)
| Piece | Role |
| --- | --- |
| **Browser → nginx** | HF Space exposes one origin; nginx routes paths. |
| **Product API** | Vite uses `API_BASE` (default **`/api`**). FastAPI serves catalog, reset, step_candidate, orchestrate, model_status, reward_breakdown, etc. |
| **OpenEnv HTTP/WS** | `ENV_BASE` defaults to **same origin** on Spaces (not localhost). Web UI opens **`ws(s)://<origin>/ws`** for Env Explorer. |
| **Two Python processes** | `entrypoint.sh` starts **uvicorn** for `app.env.fastapi_app` (env, port **8100**) and **uvicorn** for `app.api` (product API, port **8200**). Agent mode reset/step still use the **API’s** in-process `PolyGuardEnv`; Env mode uses the **separate** env service over WebSocket. |
| **Important** | Agent and Env UIs maintain **separate React state** (`agentObservation` vs `envObservation`). Toggling mode **clears the Event Log** and clears the inactive branch’s episode state so you always know which backend path you are exercising. |
### B. Hero (“PolyGuard neural safety cockpit”)
| Stat | Source | What to say on camera |
| --- | --- | --- |
| **Runtime** | `mode === "agent"` → “Agent Workbench”; else “Env Explorer”. | “This is which transport I am using right now.” |
| **Scenario** | Human label for current `taskId` from catalog presets or Advanced. | “Which curriculum preset is bound to difficulty + sub-environment.” |
| **Candidates** | `candidate_action_set.length` from the **active** observation. | “How many legal moves the env is offering after the last reset/step.” |
| **Reward** | Last scalar reward for the active branch (`null` → shown as `-`). | “Verifier scalar after the last step in this mode only.” |
### C. Top bar — every control
| Control | Behavior |
| --- | --- |
| **Agent Workbench** | Sets `mode` to `agent`. Clears env state, event log, error; clears agent panels if switching from env (see `handleModeChange`). |
| **Env Explorer** | Sets `mode` to `env`. Clears agent-specific observation/reward/decision/evidence. |
| **Task** `<select>` | Options: each **task preset** from `GET /env/catalog` (`task_presets`), plus **Advanced**. Changing a preset updates internal `difficulty` + `sub_environment` to match the preset. |
| **Reset Episode** | **Agent:** `POST /env/reset` with body from preset (`{ task_id }`) or `{ difficulty, sub_environment }`. Refreshes **Model Truth** first. Clears reward breakdown, decision, explanation, evidence, sets default candidate. **Env:** WebSocket `reset` with `{ difficulty, sub_environment }` only (no `task_id` in WS path—preset is flattened to those two fields). **Always** clears `events` at start of reset handler, then appends one “Reset … in agent/env” line. |
| **Q Tips** | Opens modal walkthrough; highlights DOM nodes with `[data-guide="…"]`. **Skip** stores `polyguard.qtips.v2.seen` in localStorage so first visit auto-opens tips. |
| **Status chips** | First chip: **Live** if observation loaded and not done, else **Complete** / **Ready**. Second chip: in Agent mode, derived from **`modelSignal()`** (Qwen verified or not); in Env mode shows **`ws env`**. |
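The Reset Episode split described above (preset vs Advanced, REST vs WebSocket) reduces to two tiny body builders. A minimal sketch, assuming only the shapes stated in this section (function names are mine):

```python
def agent_reset_body(task_id: str, difficulty: str, sub_environment: str) -> dict:
    """Agent mode: presets send only task_id; Advanced sends the raw pair."""
    if task_id != "advanced":
        return {"task_id": task_id}
    return {"difficulty": difficulty, "sub_environment": sub_environment}

def env_ws_reset_body(difficulty: str, sub_environment: str) -> dict:
    """Env mode: the WS path has no task_id; presets are flattened first."""
    return {"difficulty": difficulty, "sub_environment": sub_environment}
```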
### D. Model Truth panel — field by field
Data from **`GET /policy/model_status`** (`PolicyProviderRouter` / `active_model_status`).
| Field in UI | Typical meaning |
| --- | --- |
| **Heading label** | “Qwen 0.5B active” only when Space config matches a strict check (enabled + active + availability + model id regex for **Qwen2.5-0.5B-Instruct**); else “Qwen not verified” or Ollama-specific text if Ollama wins locally. |
| **Detail paragraph** | Human sentence: model name, artifact, `run_id`, optional **load_error**. |
| **Model** | `model_id` or `base_model` — HF id of the loaded or configured base. |
| **Run** | `run_id` from manifest / sweep activation (which training bundle). |
| **Artifact** | `loaded_source` or `preferred_artifact` — e.g. **`grpo_adapter`**, **`merged`**, **`sft_adapter`**. |
| **Availability** | Key/value pairs from `availability` dict (which load stages succeeded). |
**Ollama branch (local dev):** If `status.ollama.enabled && available`, the UI labels **Ollama Qwen active** and mentions `POLYGUARD_PROVIDER_PREFERENCE` order. Spaces Dockerfile sets **`POLYGUARD_ENABLE_OLLAMA=false`** by default.
### E. Advanced strip (Task = Advanced)
Only rendered when `taskId === "advanced"`. Two selects:
1. **Difficulty:** `easy` \| `medium` \| `hard` — passed to reset as `difficulty`.
2. **Environment:** every string in `catalog.sub_environments` (DDI, BANDIT_MINING, REGIMEN_RISK, PRECISION_DOSING, LONGITUDINAL_DEPRESCRIBING, WEB_SEARCH_MISSING_DATA, ALTERNATIVE_SUGGESTION, NEW_DRUG_DECOMPOSITION).
**What each sub-environment stresses (one line each):**
| Sub-environment | What the episode emphasizes |
| --- | --- |
| **DDI** | Drug–drug interaction exposure and pair risk. |
| **BANDIT_MINING** | Policy / bandit exploration style scenario (see preset “Bandit Mining”). |
| **REGIMEN_RISK** | Overall regimen burden and safety tradeoffs. |
| **PRECISION_DOSING** | Dose buckets, organ-sensitive flags in observation. |
| **LONGITUDINAL_DEPRESCRIBING** | Multi-step taper / stop sequences over time. |
| **WEB_SEARCH_MISSING_DATA** | Rewards process fidelity for evidence-fetch actions. |
| **ALTERNATIVE_SUGGESTION** | Substitution / alternative action types rewarded more. |
| **NEW_DRUG_DECOMPOSITION** | Hard track: decompose novel drug string into components. |
### F. Episode Overview — every KPI and subsection
**KPI grid (always eight rows):**
| KPI | Source |
| --- | --- |
| **Mode** | Literal “Agent Workbench” or “Env Explorer”. |
| **Task** | Preset label or “Advanced”. |
| **Difficulty** | `observation.deterministic_contract.difficulty` or `-`. |
| **Environment** | `deterministic_contract.sub_environment` or `observation.sub_environment`. |
| **Step Budget** | `observation.step_budget_remaining`. |
| **Last Reward** | Active branch’s last reward (after reset, Agent clears to `-` until first step). |
| **Patient** | `patient_summary.patient_id` or `patient_summary.id`. |
| **Status** | Complete if `done`, else Live if observation exists, else Ready. |
**Patient Summary `<dl>`:** First **8** keys of `observation.patient_summary` (keys humanized: underscores → spaces, title case). Typical keys include demographics, allergies, high-level clinical flags—whatever the backend puts on `PolyGuardObservation`.
**Risk Delta `<dl>`:** First **8** entries of `observation.burden_score_summary` — burden-related scalars the env uses for reward deltas.
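The "first 8 keys, humanized" rule for both `<dl>` panels is easy to demonstrate. A sketch under the stated convention (underscores to spaces, title case; helper names assumed):

```python
def humanize(key: str) -> str:
    """Underscores become spaces; words are title-cased, as in the panels."""
    return key.replace("_", " ").title()

def first_n_items(summary: dict, n: int = 8) -> list:
    """Take the first n entries of the summary dict for display."""
    return [(humanize(k), v) for k, v in list(summary.items())[:n]]
```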
### G. Candidate Actions list — each column
Each row is one **`CandidateAction`** from `candidate_action_set`.
| Column / concept | Meaning |
| --- | --- |
| **`candidate_id`** | Stable id (e.g. `cand_…`) — must match when submitting. |
| **Action label** | Humanized `action_type` (STOP_DRUG, SUBSTITUTE_WITHIN_CLASS, …). |
| **Third column** | `target_drug` **or** `replacement_drug` **or** `mode` — whichever is most informative. |
| **Right column** | `estimated_safety_delta` formatted to 3 decimals, or **Blocked** if `legality_precheck === false`. |
| **Disabled rows** | You cannot select illegal candidates; click does nothing. |
| **Default selection** | **Agent:** first candidate in list. **Env:** first **legal** candidate that is not `KEEP_REGIMEN` and not `REQUEST_*`, else first legal non–KEEP_REGIMEN, else first in list (`defaultCandidateForMode`). |
**Hidden fields you can mention if showing JSON elsewhere:** `dose_bucket`, `taper_days`, `monitoring_plan`, `evidence_query`, `new_drug_name`, `candidate_components`, `uncertainty_score`, `rationale_tags`, `required_monitoring`, `burden_delta`, `disease_stability_estimate`.
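The `defaultCandidateForMode` fallback chain quoted above can be written out explicitly. This is a hedged Python rendering of those rules, not the actual TypeScript (field names `candidate_id`, `action_type`, `legality_precheck` are from this section):

```python
def default_candidate(candidates: list, mode: str):
    """Agent: first row. Env: first legal non-KEEP_REGIMEN, non-REQUEST_* row,
    else first legal non-KEEP_REGIMEN, else first row."""
    if not candidates:
        return None
    if mode == "agent":
        return candidates[0]
    legal = [c for c in candidates if c.get("legality_precheck", True)]
    for c in legal:
        t = c.get("action_type", "")
        if t != "KEEP_REGIMEN" and not t.startswith("REQUEST_"):
            return c
    for c in legal:
        if c.get("action_type") != "KEEP_REGIMEN":
            return c
    return candidates[0]

cands = [
    {"candidate_id": "c1", "action_type": "KEEP_REGIMEN", "legality_precheck": True},
    {"candidate_id": "c2", "action_type": "REQUEST_SPECIALIST_REVIEW", "legality_precheck": True},
    {"candidate_id": "c3", "action_type": "STOP_DRUG", "legality_precheck": False},
    {"candidate_id": "c4", "action_type": "STOP_DRUG", "legality_precheck": True},
]
```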
### H. Action Console — every input and button
| UI element | Effect |
| --- | --- |
| **Type / Mode / Target / Replacement / Dose / Uncertainty** | Read-only snapshot of the **currently selected** candidate. |
| **Confidence** | Number input **0.001–0.999** step 0.001; sent as `confidence` on **Submit Candidate** (Agent) or embedded in WS payload (Env). |
| **Rationale** | Free text → `rationale_brief` / rationale on the action. |
| **Submit Candidate** (Agent) | Calls `POST /env/step_candidate` with `{ candidate_id, confidence, rationale_brief }`. API finds matching legal action and calls `env.step`. |
| **Submit Env Step** (Env) | Same confidence/rationale + full action payload built by `buildActionPayload` → WS `step`. |
| **Run Agent** | **Only when** `mode === "agent"` **and** observation exists **and** not `done`. Calls `POST /agents/orchestrate` with empty JSON body. **Disabled** in Env mode. |
| **Done notice** | If `done`, shows which mode completed and `termination_reason` from `info` if present. Primary button becomes **Reset Episode** (shortcut). |
### I. Reward Channels — every bar (exact keys)
The UI renders **exactly these keys** in order (`REWARD_KEYS` in `App.tsx` — **14** rows):
| # | Key | Role |
| --- | --- | --- |
| 1 | `total_reward` | Weighted aggregate of component scores (`aggregate_rewards` in `reward_scaling.py`). |
| 2 | `primary_safety_legality` | Roll-up: legality, candidate alignment, anti-cheat, uncertainty calibration (`reward_router.compute_primary_reward_channels`). |
| 3 | `primary_clinical_improvement` | Roll-up: safety delta, burden improvement, disease stability. |
| 4 | `primary_dosing_quality` | Roll-up: dosing quality + abstention quality. |
| 5 | `primary_process_integrity` | Roll-up: format compliance, efficiency, process fidelity, explanation grounding. |
| 6 | `legality_score` | Action legal per safety verifier. |
| 7 | `safety_delta_score` | Movement on severe DDI / risk proxy vs pre-step state. |
| 8 | `burden_improvement_score` | Medication burden before vs after. |
| 9 | `disease_stability_score` | Stability heuristic vs disruptive action types. |
| 10 | `dosing_quality_score` | Dose-mode and bucket appropriateness. |
| 11 | `process_fidelity_score` | Follows intended workflow for sub-environment (e.g. fetch evidence when required). |
| 12 | `explanation_grounding_score` | Rationale present / grounded. |
| 13 | `anti_cheat_score` | Collapses when anti-cheat triggers. |
| 14 | `uncertainty_calibration_score` | Confidence vs uncertainty alignment. |
**Note:** `total_reward` is row 1; rows 2–5 are **primary** channels; rows 6–14 are **exposed component** scores. Other components (`format_compliance_score`, `efficiency_score`, `candidate_alignment_score`, `abstention_quality_score`) still exist **in the backend** `RewardBreakdown` and feed primaries + total, but this UI **does not** give them their own bar rows.
Bars show **`-`** when the value is missing (no step yet or breakdown not returned). Bar width = value × 100% with value clamped to `[0.001, 0.999]`.
**Breakdown source:** In **Agent** mode, after a step the UI prefers `info.reward_breakdown` and may also call **`GET /env/reward_breakdown`**. In **Env** mode it uses `info.reward_breakdown` from the WebSocket step packet; if that is empty, the UI clears the reward panel.
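The bar rendering rule stated above (dash when missing, width = value × 100% with the value clamped to `[0.001, 0.999]`, 3 decimals) is small enough to sketch directly. Function name assumed:

```python
def bar(value):
    """Return (label, width_pct) for one reward bar; '-' when no value yet."""
    if value is None:
        return "-", 0.0
    clamped = min(max(float(value), 0.001), 0.999)  # bound per the UI rule
    return f"{clamped:.3f}", clamped * 100.0
```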
### J. Current Medications cards
Built from `observation.medication_table[]`. Each card:
- **Title:** `drug` / `drug_id` / `name`.
- **High-risk ribbon:** if `high_risk` or `is_high_risk_elderly` or Beers / warning flags.
- **Body:** `indication` or `class_name` or `atc_class`.
- **Meta row:** dose bucket or mg dose; taper vs `monitoring` or `route`.
### K. Action History vs Warnings
| Panel | Source |
| --- | --- |
| **Action History** | `observation.action_history` — each item shows step index and `action_type` / `candidate_id` / reward snippet. |
| **Warnings** | `observation.warning_summary` — list of human-readable env warnings (DDIs, constraints, etc.). |
### L. Decision / Explanation / Evidence (Agent only)
Rendered as JSON `<pre>` blocks:
| Title | When populated | Content origin |
| --- | --- | --- |
| **Decision** | Agent mode only. | **`final_action`** on the packet. For **`step_candidate`**, the API returns the standard **step** payload — **typically no `final_action` field**, so this panel may stay **empty after manual submit**. For **`orchestrate`**, **`final_action`** is the **`PolyGuardAction`** after critic (what actually hit `env.step`). |
| **Explanation** | Agent mode only. | **`explanation`** — output of **`ExplainerAgent`** after the step (`orchestrate` returns it). Usually **empty** after raw `step_candidate` unless API adds it. |
| **Evidence** | Agent mode only. | **`evidence`** key on packet. **`orchestrate`** returns **`evidence_out`** from **`EvidenceAgent.run(state)`** (retrieval / web-fallback bundle). **`step_candidate`** does not attach orchestrator evidence — panel often **empty** on manual clicks. |
**Demo takeaway:** Tell viewers: *“To populate Decision / Explanation / Evidence in the UI, use **Run Agent** (orchestrate). Manual **Submit Candidate** updates the env and rewards but does not replay the full multi-agent JSON into those three panels.”*
### M. Event Log vs Q Tips
| Feature | Behavior |
| --- | --- |
| **Event Log** | Prepends timestamped strings: resets, each step’s reward line, errors. **Capped** at 24 lines. Cleared when you click **Reset Episode** (handler starts with `setEvents([])` then appends) — *not* the same as mode switch clearing. |
| **Q Tips** | 10-step overlay; does not mutate env. |
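The Event Log behavior (newest line prepended, list capped at 24) can be mimicked in one line of Python; a sketch with an assumed helper name:

```python
def log_event(events: list, line: str, cap: int = 24) -> list:
    """Prepend the newest entry and trim to the cap, as the Event Log does."""
    return ([line] + events)[:cap]
```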
### N. Orchestrator — every agent in order (`Run Agent`)
When **`POST /agents/orchestrate`** runs, `Orchestrator.run_step` executes:
| Step | Agent class | What it does (operator language) |
| --- | --- | --- |
| 1 | **`MedRecAgent`** | Summarizes current medication list / reconciliation view for downstream modules. Output key: `medrec`. |
| 2 | **`EvidenceAgent`** | Retrieves **local evidence** (and optional web fallback) for missing or thin context. Shown in UI **`evidence`** when orchestrating. |
| 3 | **`GraphSafetyAgent`** | Graph-style **DDI / duplicate therapy** style signals. Output: `graph`. |
| 4 | **`DosingAgent`** | Flags **dose-sensitive** windows and dosing opportunities. Feeds **`dosing_active`** into supervisor. |
| 5 | **`CandidateAgent`** | Wraps env **candidate builder** — produces the legal `CandidateAction` list. |
| 6 | **`SupervisorAgent`** | Chooses planner **mode**: regimen vs dose vs **REVIEW** (conservative routing). |
| 7 | **Contextual bandit** | **`ContextualBanditPolicy`** (LinUCB or Thompson sampling via `POLYGUARD_BANDIT_ALGO`) proposes **top-k** (`POLYGUARD_BANDIT_TOP_K`) candidates for the planner to consider. |
| 8 | **`PlannerAgent`** | Calls **`PolicyProviderRouter.select_candidate`** — this is where **Transformers + Qwen + PEFT** (or Ollama, or **safety ranker fallback**) picks a **`candidate_id`** and rationale. |
| 9 | **`CriticAgent`** | Safety veto / repair. May replace proposed action with a safer **`final_action`**. |
| 10 | **Replan / debate** (optional) | If `coordination_mode` is `replan_on_veto` or `lightweight_debate` and critic rejects, planner may rerun on **review** candidates; `debate_rounds` increments. |
| 11 | **`PolyGuardEnv.step`** | Commits **`final_action`**, returns `observation`, `reward`, `done`, `info`. |
| 12 | **Bandit `update`** | If the chosen candidate was in the bandit pool, **updates bandit statistics with the reward** (learning signal for next orchestrate). |
| 13 | **`ExplainerAgent`** | Builds **`explanation`** object for audit / UI. |
**Environment variables (mention for power users):**
| Variable | Effect |
| --- | --- |
| **`POLYGUARD_POLICY_STACK`** | `llm+bandit` (default): planner sees **bandit-shortlisted** candidates. `llm-only`: all supervisor-filtered candidates. `bandit-only`: **no LLM** — first bandit pick with fixed rationale. |
| **`POLYGUARD_BANDIT_*`** | Algorithm, alpha, epsilon, seed, top-k. |
| **`POLYGUARD_PROVIDER_PREFERENCE`** | e.g. `transformers` vs `ollama` order. |
| **`POLYGUARD_ENABLE_ACTIVE_MODEL`** | Must be true on Space for bundle path; **`POLYGUARD_HF_MODEL`** sets base id for adapters. |
### O. Qwen and fallbacks (planner path)
`PolicyProviderRouter` (`app/models/policy/provider_runtime.py`):
1. Builds a **JSON instruction** listing candidates and asks for `candidate_id=…; rationale=…`.
2. Tries providers in **`POLYGUARD_PROVIDER_PREFERENCE`** (default **Transformers** on Space).
3. Parses model text for a legal `candidate_id`; on failure uses **`safety_ranker`** deterministic ordering.
**So:** Even without Qwen load, **Run Agent** still completes using **ranker / bandit** — mention that if Model Truth is red.
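The parse-then-fallback step can be illustrated with a minimal sketch. This assumes only the `candidate_id=…; rationale=…` text format described above plus a pre-ranked fallback list; it is not the real `provider_runtime.py` parser:

```python
import re

def pick_candidate(model_text, legal_ids, ranked_fallback):
    """Extract a legal candidate_id from model output, else use the
    deterministic safety-ranker ordering (assumed pre-sorted)."""
    m = re.search(r"candidate_id\s*=\s*([\w-]+)", model_text or "")
    if m and m.group(1) in legal_ids:
        rat = re.search(r"rationale\s*=\s*(.+)", model_text)
        return m.group(1), (rat.group(1).strip() if rat else "")
    return ranked_fallback[0], "safety_ranker fallback"
```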
### P. Full observation contract (API / types)
The TypeScript type `EnvObservation` (`lib/types.ts`) lists fields the backend **may** send. The main workbench **highlights** patient summary, medication table, candidates, burden summary, action history, warnings, step budget, and sub-environment. **Not all fields get their own panel** — if you open browser DevTools → Network → `reset` / `step` response, you can narrate extras:
| Field | Typical use |
| --- | --- |
| `comorbidity_summary` | Comorbidity list for the patient. |
| `organ_function_summary` | eGFR / hepatic flags for dosing scenarios. |
| `labs_vitals_summary` | Labs relevant to risk scoring. |
| `graph_safety_summary` | Aggregated graph / DDI context. |
| `precision_dosing_flags` | Tags when sub-environment is dosing-heavy. |
| `unresolved_conflicts` | Specialist conflict strings. |
| `abstention_indicators` | When the env suggests review / abstain. |
| `deterministic_contract` | Difficulty + sub-environment + scenario id contract for reproducibility. |
### Q. Q Tips — copy for each slide (matches `GUIDE_STEPS`)
| # | Title | Body (read aloud or paraphrase) |
| --- | --- | --- |
| 1 | Start here | PolyGuard is an interactive OpenEnv workbench; top bar picks runtime, scenario, reset. |
| 2 | Choose the runtime | Agent Workbench = REST API + reward breakdown + Qwen path; Env Explorer = WebSocket to OpenEnv. |
| 3 | Pick a scenario | Presets load real patient/regimen state from backend. |
| 4 | Check the model truth | `/policy/model_status`; Qwen only “verified” when API says adapters live. |
| 5 | Read the episode state | Task, patient, step budget, reward, risk delta from latest env response. |
| 6 | Review legal actions | Candidate rows = legal moves; inspect safety delta and mode. |
| 7 | Submit or ask the agent | Submit Candidate vs Run Agent; check model panel before claiming LLM. |
| 8 | Inspect reward channels | Real scorer output per channel; empty = no step yet. |
| 9 | Track regimen changes | Medication cards + history + warnings = not canned. |
| 10 | Follow the run | Event log shows resets, steps, rewards, errors plainly. |
---
## Agent Workbench vs Env Explorer (say this exactly on camera)
| | **Agent Workbench** | **Env Explorer** |
| --- | --- | --- |
| **Reset** | `POST /env/reset` with task preset (e.g. `{ "task_id": "easy_screening" }`) via product API. | WebSocket `reset` message to OpenEnv **`/ws`** with `{ difficulty, sub_environment }`. |
| **Submit** | `POST /env/step_candidate` — product API resolves `candidate_id` + your confidence + rationale into a full action and steps the **same** in-process `PolyGuardEnv`. | WebSocket `step` — payload built from selected candidate; talks **directly** to OpenEnv service. |
| **Run Agent** | **`POST /agents/orchestrate`** — runs the full **orchestrator** (med rec, evidence, graph, dosing, candidates, supervisor, bandit, **planner/LLM**, critic, env step, explainer). | Button **disabled** — there is no orchestrator path over raw WS-only mode in this UI. |
| **Decision / Explanation / Evidence panels** | **Populated** after orchestrate or after steps that echo `final_action` / `explanation` / `evidence` (orchestrate returns rich `evidence` from `EvidenceAgent` pipeline). | **Always empty** in the UI by design — those panels are `null` in Env mode (`App.tsx` only passes agent-mode state to DetailPanels). |
| **Reward breakdown** | From step `info.reward_breakdown` or fallback `GET /env/reward_breakdown`. | From WS step packet `info.reward_breakdown` when present. |
| **Switching mode** | Clears the **Event Log** and resets the other mode’s transient state — mention that so viewers don’t think it’s a bug. | Same. |
**One-liner for judges:** *“Agent Workbench is the full product API plus optional LLM-orchestrated policy; Env Explorer is the raw OpenEnv WebSocket contract for the same underlying environment.”*
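For narration, it can help to show the two Env Explorer message shapes side by side. A hedged sketch: the `difficulty` / `sub_environment` / candidate / `confidence` / `rationale_brief` fields come from this document, but the envelope (the `"type"` key, the `"action"` wrapper) is an assumption about the wire format, not the actual protocol:

```python
def ws_reset_msg(difficulty: str, sub_environment: str) -> dict:
    # Reset carries only the flattened pair; no task_id on the WS path.
    return {"type": "reset", "difficulty": difficulty,
            "sub_environment": sub_environment}

def ws_step_msg(candidate: dict, confidence: float, rationale: str) -> dict:
    # Step sends a full action payload built from the selected candidate.
    action = dict(candidate)
    action["confidence"] = confidence
    action["rationale_brief"] = rationale
    return {"type": "step", "action": action}
```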
---
## Reward channels — what they mean and how they’re computed (talk track)
Rewards are **verifier-backed**, **bounded** to roughly **`[0.001, 0.999]`** (3 decimal places in UI).
### Four primary channels (high level)
These are **averages of component groups** (`app/env/reward_router.py` — `compute_primary_reward_channels`):
1. **`primary_safety_legality`** — legality, candidate id alignment, anti-cheat, uncertainty calibration.
2. **`primary_clinical_improvement`** — safety delta vs severe pairs, burden improvement, disease stability.
3. **`primary_dosing_quality`** — dosing quality + abstention (e.g. appropriate review requests under uncertainty).
4. **`primary_process_integrity`** — format compliance, efficiency (step budget), process fidelity, explanation grounding.
### Components (examples — `compute_reward_breakdown`)
The environment builds scores such as:
- **`legality_score`**: high if the action is legal per safety report.
- **`safety_delta_score` / `burden_improvement_score`**: from **before/after** burden and severe DDI pair counts (`_delta_to_reward`).
- **`anti_cheat_score`**: collapses if anti-cheat flags the trajectory.
- **`uncertainty_calibration_score`**: penalizes overconfidence vs modeled uncertainty.
- **Sub-environment tweaks**: e.g. `WEB_SEARCH_MISSING_DATA` boosts process fidelity when using `FETCH_EXTERNAL_EVIDENCE`; `NEW_DRUG_DECOMPOSITION` rewards decomposition actions with components.
Then components are **scaled/clamped**, **primary channels** recomputed, and **`total_reward`** = weighted aggregate (`aggregate_rewards`).
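The "averages of component groups" step can be sketched to make the talk track concrete. The groupings below follow the four primary channels as listed in this document; the function is an illustrative stand-in for `compute_primary_reward_channels`, not the real implementation, and the backend's scaling and weights are not reproduced:

```python
PRIMARY_GROUPS = {
    "primary_safety_legality": [
        "legality_score", "candidate_alignment_score",
        "anti_cheat_score", "uncertainty_calibration_score"],
    "primary_clinical_improvement": [
        "safety_delta_score", "burden_improvement_score",
        "disease_stability_score"],
    "primary_dosing_quality": [
        "dosing_quality_score", "abstention_quality_score"],
    "primary_process_integrity": [
        "format_compliance_score", "efficiency_score",
        "process_fidelity_score", "explanation_grounding_score"],
}

def primary_channels(components: dict) -> dict:
    """Average whichever group members are present in the breakdown."""
    out = {}
    for name, keys in PRIMARY_GROUPS.items():
        vals = [components[k] for k in keys if k in components]
        out[name] = sum(vals) / len(vals) if vals else 0.0
    return out
```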
**Demo line:** *“Bars update only after a real step — empty fields mean we haven’t stepped yet, not fake filler.”*
---
## Built-in **Q Tips** (on-screen tour)
Click **Q Tips** in the top bar. The app cycles **10 slides** (`App.tsx` → `GUIDE_STEPS`):
1. Start here — top bar, scenarios, reset.
2. Choose the runtime — Agent vs Env.
3. Pick a scenario — presets load real patient/regimen state.
4. Check the model truth — `/policy/model_status`.
5. Read episode state — overview + patient summary.
6. Review legal actions — candidates.
7. Submit or ask the agent — Submit vs Run Agent.
8. Inspect reward channels.
9. Medications + history/warnings.
10. Event log — errors and connectivity.
**Recording tip:** Record **Q Tips** once in full voiceover (“I’ll use the in-app tour…”) then dismiss and do the live walkthrough below.
---
## Shot-by-shot recording script
### Scene 0 — Intro (30–45 s)
**Action:** Scroll slightly so hero + top bar are visible.
**Say:** *“This is PolyGuard on Hugging Face Spaces: an OpenEnv workbench for polypharmacy safety. The backend runs a real `PolyGuardEnv` with verifiable rewards; the UI can drive it through the product API or raw OpenEnv WebSockets.”*
---
### Scene 1 — Model Truth (45–60 s)
**Action:** Stay on **Agent Workbench**. Click nothing yet; point at **Model Truth**.
**Say:** *“Model Truth is live from `/policy/model_status`. Here we see the base model—typically Qwen 2.5 0.5B Instruct—which artifact is loaded—often the GRPO adapter—and the run id. On Spaces, weights are under `/app/checkpoints/active` after the bundle installer runs.”*
**If panel shows unavailable:** *“Cold start or CPU load can delay the bundle; the environment still works for manual candidate submission; Run Agent may fall back to non-LLM routing depending on config.”*
---
### Scene 2 — Easy task, manual submit (Agent) (90–120 s)
**Action:** Task → **Easy Screening** (DDI, easy). **Reset Episode.**
**Say:** *“Easy Screening fixes difficulty easy and sub-environment DDI—drug–drug interaction screening.”*
**Action:** Pan **Episode Overview** — read **Patient Summary** and **Risk Delta** aloud briefly.
**Say:** *“This patient block and risk delta come straight from the observation object.”*
**Action:** **Candidate Actions** — click 2–3 rows; show **Blocked** vs legal. Select a **legal** row.
**Say:** *“Candidates are legal moves from the env; illegal rows are disabled.”*
**Action:** **Action Console** — tweak **Confidence** and **Rationale** slightly. Click **Submit Candidate**.
**Say:** *“Submit Candidate hits `/env/step_candidate` with my chosen legal action, confidence, and rationale.”*
**Action:** After response, pause on **Reward Channels** and **Last Reward** in overview.
**Say:** *“These bars are the verifier breakdown; total reward is the scalar GRPO-style signal we train on.”*
**Action:** **Action History** — show one new line. **Event Log** — show the new reward line.
**Say:** *“History and event log give an audit trail—not a canned animation.”*
---
### Scene 3 — Run Agent (orchestrator + LLM path) (90–120 s)
**Prerequisite:** Prefer recording when Model Truth shows **enabled** and **active** with Qwen artifacts.
**Action:** **Reset Episode** again (same or different task). Click **Run Agent**. Wait for completion.
**Say:** *“Run Agent calls `/agents/orchestrate`. That runs med reconciliation, evidence retrieval, graph safety, dosing hints, candidate generation, supervisor mode, a contextual bandit shortlist, then the planner—here that’s where the loaded Qwen policy can choose among candidates—the critic veto, environment step, and explainer.”*
**Action:** Scroll to **Decision**, **Explanation**, **Evidence** JSON panels.
**Say:** *“These three panels are only populated in Agent Workbench mode. Env Explorer deliberately hides them because the raw WebSocket client doesn’t run the full orchestrator response bundle.”*
**Action:** Point at **Evidence** — mention structured retriever output vs empty object if task didn’t fetch.
**Say:** *“Evidence is whatever the evidence agent produced for this state—grounding for clinician trust.”*
---
### Scene 4 — Env Explorer contrast (60–90 s)
**Action:** Click **Env Explorer**. **Reset Episode** (same task: Easy Screening).
**Say:** *“Now the UI resets over WebSocket `reset` to the OpenEnv service on port 8100—same scenarios, different transport.”*
**Action:** Select a candidate, **Submit Env Step**.
**Say:** *“Submit Env Step sends a WebSocket `step` with the action payload—no `/agents/orchestrate`.”*
**Action:** Scroll to **Decision / Explanation / Evidence** — show they stay **empty** or “No data.”
**Say:** *“This is intentional: I’m proving the low-level env API, not the full agent stack.”*
**Action:** **Event Log** — note new lines tagged from env step.
---
### Scene 5 — Task variety (2–3 minutes, optional montage)
For each preset, do **Reset** + **one** legal **Submit** (Agent mode is enough):
| Task | Difficulty | Sub-environment | What to say |
| --- | --- | --- | --- |
| **Easy Screening** | easy | DDI | “Fast DDI-focused episode.” |
| **Budgeted Screening** | medium | REGIMEN_RISK | “More steps, regimen-risk tradeoffs.” |
| **Complex Tradeoff** | hard | REGIMEN_RISK | “Harder patient draw, tighter budgets.” |
| **Bandit Mining** | hard | BANDIT_MINING | “Bandit-style policy mining scenario.” |
**Action:** Switch Task to **Advanced**. Set e.g. **hard** + **PRECISION_DOSING**. Reset.
**Say:** *“Advanced exposes every sub-environment enum the backend supports—precision dosing, deprescribing, web-search missing data, alternatives, new-drug decomposition.”*
---
### Scene 6 — Medications + warnings (45 s)
**Action:** After any step with regimen change, show **Current Medications** cards (high-risk styling).
**Say:** *“Cards mirror `medication_table` from the observation; warnings list is explicit env output.”*
---
### Scene 7 — Closing (30 s)
**Say:** *“That’s the full loop: HF Space hosts OpenEnv + API, Qwen adapters live under checkpoints/active, Agent Workbench demonstrates orchestrated LLM decisions with evidence and explanations, and Env Explorer proves the same environment over raw WebSockets for OpenEnv compatibility.”*
---
## OBS / QuickTime checklist
- [ ] Capture **system audio** if you add voiceover in post; or record mic in OBS.
- [ ] **1920×1080**, 30 fps (or 60 if you want smooth cursor).
- [ ] **2 s pause** after each button click before scrolling away.
- [ ] If Space sleeps, **mouse jiggle** or refresh before recording.
- [ ] Export **MP4 H.264** for YouTube / HF dataset card.
---
## Quick troubleshooting on camera (if something breaks)
| Symptom | What to say / do |
| --- | --- |
| WebSocket errors in Event Log | “Env service reconnect—refresh page; WS URL is derived from the Space origin.” |
| Run Agent fails | “Check Model Truth—model may still be downloading or Ollama disabled on Space.” |
| Reward bars all dash | “No step yet—reset and submit once.” |
| Candidates empty | “Reset episode—env didn’t initialize.” |
---
## Related docs
- [UI overview](ui.md)
- [Deployment](deployment.md)
- [Environment design](environment_design.md)
- [Reward design](reward_design.md)
- [Architecture](architecture.md)