polyguard-openenv-workbench/polyguard-rl/docs/DEMO_RECORDING_SCRIPT.md

PolyGuard Space UI — demo recording script (shot-by-shot)

Use this document while screen-recording the Hugging Face Space (or local Docker). Target length: 8–14 minutes for a full pass, or 3–5 minutes for a highlights reel.


Before you hit record

  1. Open the Space in a clean browser profile or incognito (fewer extensions → fewer glitches).
  2. Set resolution: 1920×1080 or 1440×900; browser zoom 100%.
  3. Fullscreen the Space iframe or use HF “Open in new tab” so the URL bar shows the Space domain.
  4. Wait for cold start: first load may download the model bundle (several minutes). The Event Log and Model Truth panel will tell you if the policy failed to load (heuristic fallback is still usable for env steps).
  5. Optional: hide mouse cursor in OBS if you prefer; otherwise move slowly and pause 2 seconds on each panel after major clicks.

Primary Space (product): https://huggingface.co/spaces/TheJackBright/polyguard-openenv-workbench
Runtime: nginx fronts the product API (default port 8200) and the OpenEnv service (port 8100); see docker/space/entrypoint.sh.


Where the model lives (Qwen and artifacts)

This matters for what you say on camera.

| Location | What it is |
| --- | --- |
| On the Space container | Working directory /app (see entrypoint.sh: cd /app). |
| Downloaded bundle | If checkpoints/active/grpo_adapter/adapter_config.json is missing at boot, scripts/install_hf_active_bundle.py pulls the HF usable-model bundle into checkpoints/active/. |
| Typical layout after install | checkpoints/active/active_model_manifest.json — which artifact is active (often the GRPO adapter on top of the base). |
| Weights | checkpoints/active/grpo_adapter/ (LoRA/PEFT), optionally checkpoints/active/merged/ (full merged weights), checkpoints/active/sft_adapter/. |
| Base model name | Usually Qwen/Qwen2.5-0.5B-Instruct as the Transformers base for adapters (set via env, e.g. POLYGUARD_HF_MODEL). |
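The boot check in the table reduces to a sentinel-file test. A minimal sketch, assuming install_bundle stands in for whatever scripts/install_hf_active_bundle.py actually does (the installer callable and return value are illustrative, not the real script's interface):

```python
from pathlib import Path

def ensure_active_bundle(root: Path, install_bundle) -> bool:
    """Install the HF bundle only when the GRPO adapter config is missing.

    The sentinel is checkpoints/active/grpo_adapter/adapter_config.json,
    as described in the table above. Returns True when an install ran.
    """
    sentinel = root / "checkpoints" / "active" / "grpo_adapter" / "adapter_config.json"
    if sentinel.exists():
        return False  # bundle already present; nothing to do
    install_bundle(root / "checkpoints" / "active")  # hypothetical installer hook
    return True
```

On the Space this runs once at container start, which is why first load can take several minutes while the bundle downloads.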

What the UI proves: the Model Truth panel calls GET /policy/model_status (product API). It shows model_id / base_model, run_id, preferred_artifact / loaded_source, and availability flags. Say on camera: “This is live from the API, not hard-coded in the frontend.”


UI map (what appears on screen)

| Region | Purpose |
| --- | --- |
| Hero (“PolyGuard neural safety cockpit”) | Marketing copy + quick stats. |
| Top bar | Agent Workbench vs Env Explorer, Task dropdown, Reset Episode, Q Tips. |
| Status chips | “Live” / model line; in Env mode one chip reads ws env (WebSocket to OpenEnv). |
| Model Truth | Qwen / artifact / run / availability. |
| Advanced strip | Only if Task = Advanced — pick raw difficulty + sub_environment. |
| Episode Overview | Mode, task, difficulty, environment, step budget, last reward, patient id, Patient Summary, Risk Delta. |
| Candidate Actions | Legal moves: candidate_id, action type, target/replacement, estimated safety delta (or Blocked). |
| Action Console | Confidence, rationale, Submit vs Run Agent (Run Agent is Agent mode only). |
| Reward Channels | Bars for total + primary + component scores (see below). |
| Current Medications | Cards from the observation. |
| Action History / Warnings | Step trace and env warnings. |
| Decision / Explanation / Evidence | Agent mode only (filled after API steps that return those fields). |
| Event Log | Human-readable trace of resets, steps, rewards, errors. |

Feature encyclopedia — every panel, branch, and agent

Use this section as a script appendix or judge handout. It mirrors the React workbench in app/ui/frontend/src/App.tsx, the API in app/api/, and the orchestrator in app/agents/orchestrator.py.

A. How the Space is wired (under the hood)

| Piece | Role |
| --- | --- |
| Browser → nginx | The HF Space exposes one origin; nginx routes paths. |
| Product API | Vite uses API_BASE (default /api). FastAPI serves catalog, reset, step_candidate, orchestrate, model_status, reward_breakdown, etc. |
| OpenEnv HTTP/WS | ENV_BASE defaults to the same origin on Spaces (not localhost). The web UI opens ws(s)://<origin>/ws for Env Explorer. |
| Two Python processes | entrypoint.sh starts uvicorn for app.env.fastapi_app (env, port 8100) and uvicorn for app.api (product API, port 8200). Agent-mode reset/step still use the API’s in-process PolyGuardEnv; Env mode uses the separate env service over WebSocket. |
| Important | The Agent and Env UIs maintain separate React state (agentObservation vs envObservation). Toggling mode clears the Event Log and the inactive branch’s episode state so you always know which backend path you are exercising. |

B. Hero (“PolyGuard neural safety cockpit”)

| Stat | Source | What to say on camera |
| --- | --- | --- |
| Runtime | mode === "agent" → “Agent Workbench”; else “Env Explorer”. | “This is which transport I am using right now.” |
| Scenario | Human label for the current taskId from catalog presets, or Advanced. | “Which curriculum preset is bound to difficulty + sub-environment.” |
| Candidates | candidate_action_set.length from the active observation. | “How many legal moves the env is offering after the last reset/step.” |
| Reward | Last scalar reward for the active branch (null → shown as -). | “Verifier scalar after the last step in this mode only.” |

C. Top bar — every control

| Control | Behavior |
| --- | --- |
| Agent Workbench | Sets mode to agent. Clears env state, event log, error; clears agent panels if switching from env (see handleModeChange). |
| Env Explorer | Sets mode to env. Clears agent-specific observation/reward/decision/evidence. |
| Task select | One option per task preset from GET /env/catalog (task_presets), plus Advanced. Changing a preset updates the internal difficulty + sub_environment to match it. |
| Reset Episode | Agent: POST /env/reset with a body from the preset ({ task_id }) or { difficulty, sub_environment }. Refreshes Model Truth first; clears reward breakdown, decision, explanation, evidence; sets the default candidate. Env: WebSocket reset with { difficulty, sub_environment } only (no task_id in the WS path — the preset is flattened to those two fields). Always clears events at the start of the reset handler, then appends one “Reset … in agent/env” line. |
| Q Tips | Opens the modal walkthrough; highlights DOM nodes with [data-guide="…"]. Skipping stores polyguard.qtips.v2.seen in localStorage so the first visit auto-opens the tips. |
| Status chips | First chip: Live if an observation is loaded and not done, else Complete / Ready. Second chip: in Agent mode, derived from modelSignal() (Qwen verified or not); in Env mode it reads ws env. |
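The two reset body shapes above can be sketched as one helper (a sketch of the rule, not the actual frontend code; names are illustrative):

```python
def build_reset_body(task_id: str, difficulty: str, sub_environment: str, mode: str) -> dict:
    """Build the reset payload per the rules above.

    Agent mode with a preset sends {"task_id": ...}; Advanced, and every
    Env-mode reset, flattens to difficulty + sub_environment because the
    WS path has no task_id field.
    """
    if mode == "agent" and task_id != "advanced":
        return {"task_id": task_id}
    return {"difficulty": difficulty, "sub_environment": sub_environment}
```

This is why resetting the same preset in both modes can still show identical difficulty and sub-environment in the Episode Overview.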

D. Model Truth panel — field by field

Data from GET /policy/model_status (PolicyProviderRouter / active_model_status).

| Field in UI | Typical meaning |
| --- | --- |
| Heading label | “Qwen 0.5B active” only when the Space config passes a strict check (enabled + active + availability + model-id regex for Qwen2.5-0.5B-Instruct); else “Qwen not verified”, or Ollama-specific text if Ollama wins locally. |
| Detail paragraph | Human sentence: model name, artifact, run_id, optional load_error. |
| Model | model_id or base_model — HF id of the loaded or configured base. |
| Run | run_id from the manifest / sweep activation (which training bundle). |
| Artifact | loaded_source or preferred_artifact — e.g. grpo_adapter, merged, sft_adapter. |
| Availability | Key/value pairs from the availability dict (which load stages succeeded). |

Ollama branch (local dev): If status.ollama.enabled && available, the UI labels Ollama Qwen active and mentions POLYGUARD_PROVIDER_PREFERENCE order. Spaces Dockerfile sets POLYGUARD_ENABLE_OLLAMA=false by default.

E. Advanced strip (Task = Advanced)

Only rendered when taskId === "advanced". Two selects:

  1. Difficulty: easy | medium | hard — passed to reset as difficulty.
  2. Environment: every string in catalog.sub_environments (DDI, BANDIT_MINING, REGIMEN_RISK, PRECISION_DOSING, LONGITUDINAL_DEPRESCRIBING, WEB_SEARCH_MISSING_DATA, ALTERNATIVE_SUGGESTION, NEW_DRUG_DECOMPOSITION).

What each sub-environment stresses (one line each):

| Sub-environment | What the episode emphasizes |
| --- | --- |
| DDI | Drug–drug interaction exposure and pair risk. |
| BANDIT_MINING | Policy / bandit exploration-style scenario (see preset “Bandit Mining”). |
| REGIMEN_RISK | Overall regimen burden and safety tradeoffs. |
| PRECISION_DOSING | Dose buckets and organ-sensitive flags in the observation. |
| LONGITUDINAL_DEPRESCRIBING | Multi-step taper / stop sequences over time. |
| WEB_SEARCH_MISSING_DATA | Rewards process fidelity for evidence-fetch actions. |
| ALTERNATIVE_SUGGESTION | Substitution / alternative action types rewarded more. |
| NEW_DRUG_DECOMPOSITION | Hard track: decompose a novel drug string into components. |

F. Episode Overview — every KPI and subsection

KPI grid (always eight rows):

| KPI | Source |
| --- | --- |
| Mode | Literal “Agent Workbench” or “Env Explorer”. |
| Task | Preset label or “Advanced”. |
| Difficulty | observation.deterministic_contract.difficulty or -. |
| Environment | deterministic_contract.sub_environment or observation.sub_environment. |
| Step Budget | observation.step_budget_remaining. |
| Last Reward | The active branch’s last reward (after reset, Agent clears to - until the first step). |
| Patient | patient_summary.patient_id or patient_summary.id. |
| Status | Complete if done, else Live if an observation exists, else Ready. |

Patient Summary <dl>: First 8 keys of observation.patient_summary (keys humanized: underscores → spaces, title case). Typical keys include demographics, allergies, high-level clinical flags—whatever the backend puts on PolyGuardObservation.

Risk Delta <dl>: First 8 entries of observation.burden_score_summary — burden-related scalars the env uses for reward deltas.

G. Candidate Actions list — each column

Each row is one CandidateAction from candidate_action_set.

| Column / concept | Meaning |
| --- | --- |
| candidate_id | Stable id (e.g. cand_…) — must match when submitting. |
| Action label | Humanized action_type (STOP_DRUG, SUBSTITUTE_WITHIN_CLASS, …). |
| Third column | target_drug, replacement_drug, or mode — whichever is most informative. |
| Right column | estimated_safety_delta formatted to 3 decimals, or Blocked if legality_precheck === false. |
| Disabled rows | You cannot select illegal candidates; clicking does nothing. |
| Default selection | Agent: the first candidate in the list. Env: the first legal candidate that is neither KEEP_REGIMEN nor REQUEST_*; else the first legal non–KEEP_REGIMEN candidate; else the first in the list (defaultCandidateForMode). |

Hidden fields you can mention if showing JSON elsewhere: dose_bucket, taper_days, monitoring_plan, evidence_query, new_drug_name, candidate_components, uncertainty_score, rationale_tags, required_monitoring, burden_delta, disease_stability_estimate.

H. Action Console — every input and button

| UI element | Effect |
| --- | --- |
| Type / Mode / Target / Replacement / Dose / Uncertainty | Read-only snapshot of the currently selected candidate. |
| Confidence | Number input 0.001–0.999, step 0.001; sent as confidence on Submit Candidate (Agent) or embedded in the WS payload (Env). |
| Rationale | Free text → rationale_brief / rationale on the action. |
| Submit Candidate (Agent) | Calls POST /env/step_candidate with { candidate_id, confidence, rationale_brief }. The API finds the matching legal action and calls env.step. |
| Submit Env Step (Env) | Same confidence/rationale plus the full action payload built by buildActionPayload → WS step. |
| Run Agent | Only when mode === "agent", an observation exists, and the episode is not done. Calls POST /agents/orchestrate with an empty JSON body. Disabled in Env mode. |
| Done notice | If done, shows which mode completed and termination_reason from info when present. The primary button becomes Reset Episode (shortcut). |
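The Env-mode step payload assembly can be sketched as follows (a loose stand-in for buildActionPayload, which lives in the frontend; the message envelope and field handling here are assumptions for illustration):

```python
def build_ws_step(candidate: dict, confidence: float, rationale: str) -> dict:
    """Assemble a WS step message from the selected candidate (illustrative).

    Copies the candidate's populated fields, drops None values, and attaches
    the operator's confidence and rationale, as described above.
    """
    action = {k: v for k, v in candidate.items() if v is not None}
    action["confidence"] = confidence
    action["rationale"] = rationale
    return {"type": "step", "action": action}
```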

I. Reward Channels — every bar (exact keys)

The UI renders exactly these keys in order (REWARD_KEYS in App.tsx, 14 rows):

| # | Key | Role |
| --- | --- | --- |
| 1 | total_reward | Weighted aggregate of the component scores (aggregate_rewards in reward_scaling.py). |
| 2 | primary_safety_legality | Roll-up: legality, candidate alignment, anti-cheat, uncertainty calibration (reward_router.compute_primary_reward_channels). |
| 3 | primary_clinical_improvement | Roll-up: safety delta, burden improvement, disease stability. |
| 4 | primary_dosing_quality | Roll-up: dosing quality + abstention quality. |
| 5 | primary_process_integrity | Roll-up: format compliance, efficiency, process fidelity, explanation grounding. |
| 6 | legality_score | Action is legal per the safety verifier. |
| 7 | safety_delta_score | Movement on the severe-DDI / risk proxy vs the pre-step state. |
| 8 | burden_improvement_score | Medication burden before vs after. |
| 9 | disease_stability_score | Stability heuristic vs disruptive action types. |
| 10 | dosing_quality_score | Dose-mode and bucket appropriateness. |
| 11 | process_fidelity_score | Follows the intended workflow for the sub-environment (e.g. fetch evidence when required). |
| 12 | explanation_grounding_score | Rationale present / grounded. |
| 13 | anti_cheat_score | Collapses when anti-cheat triggers. |
| 14 | uncertainty_calibration_score | Confidence vs uncertainty alignment. |

Note: total_reward is row 1; rows 2–5 are primary channels; rows 6–14 are exposed component scores. Other components (format_compliance_score, efficiency_score, candidate_alignment_score, abstention_quality_score) still exist in the backend RewardBreakdown and feed primaries + total, but this UI does not give them their own bar rows.

Bars show - when the value is missing (no step yet or breakdown not returned). Bar width = value × 100% with value clamped to [0.001, 0.999].

Breakdown source: in Agent mode, after a step the UI prefers info.reward_breakdown and may also call GET /env/reward_breakdown. In Env mode it uses info.reward_breakdown from the WebSocket step packet; if that is empty, the UI clears the reward panel.

J. Current Medications cards

Built from observation.medication_table[]. Each card:

  • Title: drug / drug_id / name.
  • High-risk ribbon: if high_risk or is_high_risk_elderly or Beers / warning flags.
  • Body: indication or class_name or atc_class.
  • Meta row: dose bucket or mg dose; taper vs monitoring or route.

K. Action History vs Warnings

| Panel | Source |
| --- | --- |
| Action History | observation.action_history — each item shows the step index and an action_type / candidate_id / reward snippet. |
| Warnings | observation.warning_summary — list of human-readable env warnings (DDIs, constraints, etc.). |

L. Decision / Explanation / Evidence (Agent only)

Rendered as JSON <pre> blocks:

| Title | When populated | Content origin |
| --- | --- | --- |
| Decision | Agent mode only. | final_action on the packet. For step_candidate, the API returns the standard step payload — typically no final_action field, so this panel may stay empty after a manual submit. For orchestrate, final_action is the PolyGuardAction after the critic (what actually hit env.step). |
| Explanation | Agent mode only. | explanation — output of ExplainerAgent after the step (orchestrate returns it). Usually empty after a raw step_candidate unless the API adds it. |
| Evidence | Agent mode only. | The evidence key on the packet. orchestrate returns evidence_out from EvidenceAgent.run(state) (retrieval / web-fallback bundle). step_candidate does not attach orchestrator evidence — the panel is often empty on manual clicks. |

Demo takeaway: Tell viewers: “To populate Decision / Explanation / Evidence in the UI, use Run Agent (orchestrate). Manual Submit Candidate updates the env and rewards but does not replay the full multi-agent JSON into those three panels.”

M. Event Log vs Q Tips

| Feature | Behavior |
| --- | --- |
| Event Log | Prepends timestamped strings: resets, each step’s reward line, errors. Capped at 24 lines. Cleared when you click Reset Episode (the handler starts with setEvents([]) then appends) — not the same as the mode-switch clearing. |
| Q Tips | 10-step overlay; does not mutate the env. |

N. Orchestrator — every agent in order (Run Agent)

When POST /agents/orchestrate runs, Orchestrator.run_step executes:

| Step | Agent class | What it does (operator language) |
| --- | --- | --- |
| 1 | MedRecAgent | Summarizes the current medication list / reconciliation view for downstream modules. Output key: medrec. |
| 2 | EvidenceAgent | Retrieves local evidence (and an optional web fallback) for missing or thin context. Shown in the UI evidence panel when orchestrating. |
| 3 | GraphSafetyAgent | Graph-style DDI / duplicate-therapy signals. Output: graph. |
| 4 | DosingAgent | Flags dose-sensitive windows and dosing opportunities. Feeds dosing_active into the supervisor. |
| 5 | CandidateAgent | Wraps the env candidate builder — produces the legal CandidateAction list. |
| 6 | SupervisorAgent | Chooses the planner mode: regimen vs dose vs REVIEW (conservative routing). |
| 7 | Contextual bandit | ContextualBanditPolicy (LinUCB or Thompson sampling via POLYGUARD_BANDIT_ALGO) proposes the top-k (POLYGUARD_BANDIT_TOP_K) candidates for the planner to consider. |
| 8 | PlannerAgent | Calls PolicyProviderRouter.select_candidate — this is where Transformers + Qwen + PEFT (or Ollama, or the safety-ranker fallback) picks a candidate_id and rationale. |
| 9 | CriticAgent | Safety veto / repair. May replace the proposed action with a safer final_action. |
| 10 | Replan / debate (optional) | If coordination_mode is replan_on_veto or lightweight_debate and the critic rejects, the planner may rerun on review candidates; debate_rounds increments. |
| 11 | PolyGuardEnv.step | Commits final_action; returns observation, reward, done, info. |
| 12 | Bandit update | If the chosen candidate was in the bandit pool, updates the bandit statistics with the reward (learning signal for the next orchestrate). |
| 13 | ExplainerAgent | Builds the explanation object for audit / UI. |

Environment variables (mention for power users):

| Variable | Effect |
| --- | --- |
| POLYGUARD_POLICY_STACK | llm+bandit (default): the planner sees bandit-shortlisted candidates. llm-only: all supervisor-filtered candidates. bandit-only: no LLM — the first bandit pick with a fixed rationale. |
| POLYGUARD_BANDIT_* | Algorithm, alpha, epsilon, seed, top-k. |
| POLYGUARD_PROVIDER_PREFERENCE | e.g. transformers vs ollama order. |
| POLYGUARD_ENABLE_ACTIVE_MODEL | Must be true on the Space for the bundle path; POLYGUARD_HF_MODEL sets the base id for adapters. |

O. Qwen and fallbacks (planner path)

PolicyProviderRouter (app/models/policy/provider_runtime.py):

  1. Builds a JSON instruction listing candidates and asks for candidate_id=…; rationale=….
  2. Tries providers in POLYGUARD_PROVIDER_PREFERENCE (default Transformers on Space).
  3. Parses model text for a legal candidate_id; on failure uses safety_ranker deterministic ordering.

Takeaway: even without the Qwen weights loaded, Run Agent still completes using the safety ranker / bandit. Mention that if Model Truth is red.
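The parse-or-fallback step can be sketched like this (the regex and the fallback ordering are assumptions; the real logic lives in provider_runtime.py and the deterministic ordering comes from safety_ranker):

```python
import re

def pick_candidate(model_text: str, legal_ids: list[str]) -> tuple[str, str]:
    """Parse 'candidate_id=...; rationale=...' from model output.

    Falls back to the first legal id (standing in for the deterministic
    safety-ranker ordering) when the reply is malformed or names an
    illegal candidate.
    """
    m = re.search(r"candidate_id\s*=\s*(\S+?)\s*;\s*rationale\s*=\s*(.*)", model_text)
    if m and m.group(1) in legal_ids:
        return m.group(1), m.group(2).strip()
    return legal_ids[0], "fallback: deterministic safety ranking"
```

This is why the UI never blocks on a bad model reply: an illegal or unparseable answer degrades to the ranker, not to an error.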

P. Full observation contract (API / types)

The TypeScript type EnvObservation (lib/types.ts) lists fields the backend may send. The main workbench highlights patient summary, medication table, candidates, burden summary, action history, warnings, step budget, and sub-environment. Not all fields get their own panel — if you open browser DevTools → Network → reset / step response, you can narrate extras:

| Field | Typical use |
| --- | --- |
| comorbidity_summary | Comorbidity list for the patient. |
| organ_function_summary | eGFR / hepatic flags for dosing scenarios. |
| labs_vitals_summary | Labs relevant to risk scoring. |
| graph_safety_summary | Aggregated graph / DDI context. |
| precision_dosing_flags | Tags when the sub-environment is dosing-heavy. |
| unresolved_conflicts | Specialist conflict strings. |
| abstention_indicators | When the env suggests review / abstain. |
| deterministic_contract | Difficulty + sub-environment + scenario-id contract for reproducibility. |

Q. Q Tips — copy for each slide (matches GUIDE_STEPS)

| # | Title | Body (read aloud or paraphrase) |
| --- | --- | --- |
| 1 | Start here | PolyGuard is an interactive OpenEnv workbench; the top bar picks runtime, scenario, reset. |
| 2 | Choose the runtime | Agent Workbench = REST API + reward breakdown + Qwen path; Env Explorer = WebSocket to OpenEnv. |
| 3 | Pick a scenario | Presets load real patient/regimen state from the backend. |
| 4 | Check the model truth | /policy/model_status; Qwen is only “verified” when the API says the adapters are live. |
| 5 | Read the episode state | Task, patient, step budget, reward, risk delta from the latest env response. |
| 6 | Review legal actions | Candidate rows = legal moves; inspect safety delta and mode. |
| 7 | Submit or ask the agent | Submit Candidate vs Run Agent; check the model panel before claiming LLM. |
| 8 | Inspect reward channels | Real scorer output per channel; empty = no step yet. |
| 9 | Track regimen changes | Medication cards + history + warnings = not canned. |
| 10 | Follow the run | The event log shows resets, steps, rewards, and errors plainly. |

Agent Workbench vs Env Explorer (say this exactly on camera)

| | Agent Workbench | Env Explorer |
| --- | --- | --- |
| Reset | POST /env/reset with a task preset (e.g. { "task_id": "easy_screening" }) via the product API. | WebSocket reset message to OpenEnv /ws with { difficulty, sub_environment }. |
| Submit | POST /env/step_candidate — the product API resolves candidate_id + your confidence + rationale into a full action and steps the same in-process PolyGuardEnv. | WebSocket step — payload built from the selected candidate; talks directly to the OpenEnv service. |
| Run Agent | POST /agents/orchestrate — runs the full orchestrator (med rec, evidence, graph, dosing, candidates, supervisor, bandit, planner/LLM, critic, env step, explainer). | Button disabled — there is no orchestrator path over raw WS in this UI. |
| Decision / Explanation / Evidence panels | Populated after orchestrate, or after steps that echo final_action / explanation / evidence (orchestrate returns rich evidence from the EvidenceAgent pipeline). | Always empty in the UI by design — those panels are null in Env mode (App.tsx only passes agent-mode state to DetailPanels). |
| Reward breakdown | From step info.reward_breakdown or the fallback GET /env/reward_breakdown. | From the WS step packet’s info.reward_breakdown when present. |
| Switching mode | Clears the Event Log and resets the other mode’s transient state — mention that so viewers don’t think it’s a bug. | Same. |

One-liner for judges: “Agent Workbench is the full product API plus optional LLM-orchestrated policy; Env Explorer is the raw OpenEnv WebSocket contract for the same underlying environment.”


Reward channels — what they mean and how they’re computed (talk track)

Rewards are verifier-backed, bounded to roughly [0.001, 0.999] (3 decimal places in UI).

Four primary channels (high level)

These are averages of component groups (compute_primary_reward_channels in app/env/reward_router.py):

  1. primary_safety_legality — legality, candidate id alignment, anti-cheat, uncertainty calibration.
  2. primary_clinical_improvement — safety delta vs severe pairs, burden improvement, disease stability.
  3. primary_dosing_quality — dosing quality + abstention (e.g. appropriate review requests under uncertainty).
  4. primary_process_integrity — format compliance, efficiency (step budget), process fidelity, explanation grounding.

Components (examples — compute_reward_breakdown)

The environment builds scores such as:

  • legality_score: high if the action is legal per safety report.
  • safety_delta_score / burden_improvement_score: from before/after burden and severe DDI pair counts (_delta_to_reward).
  • anti_cheat_score: collapses if anti-cheat flags the trajectory.
  • uncertainty_calibration_score: penalizes overconfidence vs modeled uncertainty.
  • Sub-environment tweaks: e.g. WEB_SEARCH_MISSING_DATA boosts process fidelity when using FETCH_EXTERNAL_EVIDENCE; NEW_DRUG_DECOMPOSITION rewards decomposition actions with components.

Then components are scaled/clamped, primary channels recomputed, and total_reward = weighted aggregate (aggregate_rewards).
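That pipeline can be sketched end-to-end. This is a minimal sketch: the channel groupings follow the four primaries above, but the equal 0.25 weights, the plain averaging, and the exact clamp placement are assumptions, not the actual values in reward_scaling.py:

```python
def clamp(x: float) -> float:
    """Bound a score to the [0.001, 0.999] range the UI displays."""
    return min(max(x, 0.001), 0.999)

def aggregate(components: dict[str, float]) -> dict[str, float]:
    """Clamp components, average them into primary channels, then weight a total."""
    c = {k: clamp(v) for k, v in components.items()}
    primaries = {
        "primary_safety_legality": [c["legality_score"], c["anti_cheat_score"],
                                    c["uncertainty_calibration_score"]],
        "primary_clinical_improvement": [c["safety_delta_score"],
                                         c["burden_improvement_score"],
                                         c["disease_stability_score"]],
        "primary_dosing_quality": [c["dosing_quality_score"]],
        "primary_process_integrity": [c["process_fidelity_score"],
                                      c["explanation_grounding_score"]],
    }
    out = {k: sum(v) / len(v) for k, v in primaries.items()}
    # Illustrative equal weighting of the four primaries into the total
    out["total_reward"] = clamp(sum(0.25 * v for v in out.values()))
    return out
```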

Demo line: “Bars update only after a real step — empty fields mean we haven’t stepped yet, not fake filler.”


Built-in Q Tips (on-screen tour)

Click Q Tips in the top bar. The app cycles through 10 slides (GUIDE_STEPS in App.tsx):

  1. Start here — top bar, scenarios, reset.
  2. Choose the runtime — Agent vs Env.
  3. Pick a scenario — presets load real patient/regimen state.
  4. Check the model truth — /policy/model_status.
  5. Read episode state — overview + patient summary.
  6. Review legal actions — candidates.
  7. Submit or ask the agent — Submit vs Run Agent.
  8. Inspect reward channels.
  9. Medications + history/warnings.
  10. Event log — errors and connectivity.

Recording tip: Record Q Tips once in full voiceover (“I’ll use the in-app tour…”) then dismiss and do the live walkthrough below.


Shot-by-shot recording script

Scene 0 — Intro (30–45 s)

Action: Scroll slightly so hero + top bar are visible.
Say: “This is PolyGuard on Hugging Face Spaces: an OpenEnv workbench for polypharmacy safety. The backend runs a real PolyGuardEnv with verifiable rewards; the UI can drive it through the product API or raw OpenEnv WebSockets.”


Scene 1 — Model Truth (45–60 s)

Action: Stay on Agent Workbench. Click nothing yet; point at Model Truth.
Say: “Model Truth is live from /policy/model_status. Here we see the base model—typically Qwen 2.5 0.5B Instruct—which artifact is loaded—often the GRPO adapter—and the run id. On Spaces, weights are under /app/checkpoints/active after the bundle installer runs.”

If panel shows unavailable: “Cold start or CPU load can delay the bundle; the environment still works for manual candidate submission; Run Agent may fall back to non-LLM routing depending on config.”


Scene 2 — Easy task, manual submit (Agent) (90–120 s)

Action: Task → Easy Screening (DDI, easy). Reset Episode.
Say: “Easy Screening fixes difficulty easy and sub-environment DDI—drug–drug interaction screening.”

Action: Pan Episode Overview — read Patient Summary and Risk Delta aloud briefly.
Say: “This patient block and risk delta come straight from the observation object.”

Action: Candidate Actions — click 2–3 rows; show Blocked vs legal. Select a legal row.
Say: “Candidates are legal moves from the env; illegal rows are disabled.”

Action: Action Console — tweak Confidence and Rationale slightly. Click Submit Candidate.
Say: “Submit Candidate hits /env/step_candidate with my chosen legal action, confidence, and rationale.”

Action: After response, pause on Reward Channels and Last Reward in overview.
Say: “These bars are the verifier breakdown; total reward is the scalar GRPO-style signal we train on.”

Action: Action History — show one new line. Event Log — show the new reward line.
Say: “History and event log give an audit trail—not a canned animation.”


Scene 3 — Run Agent (orchestrator + LLM path) (90–120 s)

Prerequisite: Prefer recording when Model Truth shows enabled and active with Qwen artifacts.

Action: Reset Episode again (same or different task). Click Run Agent. Wait for completion.
Say: “Run Agent calls /agents/orchestrate. That runs med reconciliation, evidence retrieval, graph safety, dosing hints, candidate generation, supervisor mode, a contextual bandit shortlist, then the planner—here that’s where the loaded Qwen policy can choose among candidates—the critic veto, environment step, and explainer.”

Action: Scroll to Decision, Explanation, Evidence JSON panels.
Say: “These three panels are only populated in Agent Workbench mode. Env Explorer deliberately hides them because the raw WebSocket client doesn’t run the full orchestrator response bundle.”

Action: Point at Evidence — mention structured retriever output vs empty object if task didn’t fetch.
Say: “Evidence is whatever the evidence agent produced for this state—grounding for clinician trust.”


Scene 4 — Env Explorer contrast (60–90 s)

Action: Click Env Explorer. Reset Episode (same task: Easy Screening).
Say: “Now the UI sends a WebSocket reset to the OpenEnv service on port 8100—same scenarios, different transport.”

Action: Select a candidate, Submit Env Step.
Say: “Submit Env Step sends a WebSocket step with the action payload—no /agents/orchestrate.”

Action: Scroll to Decision / Explanation / Evidence — show they stay empty or “No data.”
Say: “This is intentional: I’m proving the low-level env API, not the full agent stack.”

Action: Event Log — note new lines tagged from env step.


Scene 5 — Task variety (2–3 minutes, optional montage)

For each preset, do Reset + one legal Submit (Agent mode is enough):

| Task | Difficulty | Sub-environment | What to say |
| --- | --- | --- | --- |
| Easy Screening | easy | DDI | “Fast DDI-focused episode.” |
| Budgeted Screening | medium | REGIMEN_RISK | “More steps, regimen-risk tradeoffs.” |
| Complex Tradeoff | hard | REGIMEN_RISK | “Harder patient draw, tighter budgets.” |
| Bandit Mining | hard | BANDIT_MINING | “Bandit-style policy mining scenario.” |

Action: Switch Task to Advanced. Set e.g. hard + PRECISION_DOSING. Reset.
Say: “Advanced exposes every sub-environment enum the backend supports—precision dosing, deprescribing, web-search missing data, alternatives, new-drug decomposition.”


Scene 6 — Medications + warnings (45 s)

Action: After any step with regimen change, show Current Medications cards (high-risk styling).
Say: “Cards mirror medication_table from the observation; warnings list is explicit env output.”


Scene 7 — Closing (30 s)

Say: “That’s the full loop: HF Space hosts OpenEnv + API, Qwen adapters live under checkpoints/active, Agent Workbench demonstrates orchestrated LLM decisions with evidence and explanations, and Env Explorer proves the same environment over raw WebSockets for OpenEnv compatibility.”


OBS / QuickTime checklist

  • Capture system audio if you add voiceover in post; or record mic in OBS.
  • 1920×1080, 30 fps (or 60 if you want smooth cursor).
  • 2 s pause after each button click before scrolling away.
  • If Space sleeps, mouse jiggle or refresh before recording.
  • Export MP4 H.264 for YouTube / HF dataset card.

Quick troubleshooting on camera (if something breaks)

| Symptom | What to say / do |
| --- | --- |
| WebSocket errors in the Event Log | “Env service reconnect — refresh the page; the WS URL is derived from the Space origin.” |
| Run Agent fails | “Check Model Truth — the model may still be downloading, or Ollama is disabled on the Space.” |
| Reward bars all show - | “No step yet — reset and submit once.” |
| Candidates empty | “Reset the episode — the env didn’t initialize.” |

Related docs