polyguard-openenv-workbench/polyguard-rl/docs/DEMO_RECORDING_SCRIPT.md

PolyGuard Space UI — demo recording script (shot-by-shot)

Use this document while screen-recording the Hugging Face Space (or local Docker). Target length: 8–14 minutes for a full pass, or 3–5 minutes for a highlights reel.


Before you hit record

  1. Open the Space in a clean browser profile or incognito (fewer extensions → fewer glitches).
  2. Set resolution: 1920×1080 or 1440×900; browser zoom 100%.
  3. Fullscreen the Space iframe or use HF “Open in new tab” so the URL bar shows the Space domain.
  4. Wait for cold start: first load may download the model bundle (several minutes). The Event Log and Model Truth panel will tell you if the policy failed to load (heuristic fallback is still usable for env steps).
  5. Optional: hide mouse cursor in OBS if you prefer; otherwise move slowly and pause 2 seconds on each panel after major clicks.

Primary Space (product): https://huggingface.co/spaces/TheJackBright/polyguard-openenv-workbench
Runtime: nginx fronts the product API (default port 8200) and the OpenEnv service (port 8100); see docker/space/entrypoint.sh.


Where the model lives (Qwen and artifacts)

This matters for what you say on camera.

| Location | What it is |
| --- | --- |
| On the Space container | Working directory /app (see entrypoint.sh: cd /app). |
| Downloaded bundle | If checkpoints/active/grpo_adapter/adapter_config.json is missing at boot, scripts/install_hf_active_bundle.py pulls the HF usable-model bundle into checkpoints/active/. |
| Typical layout after install | checkpoints/active/active_model_manifest.json — which artifact is active (often the GRPO adapter on top of the base). |
| Weights | checkpoints/active/grpo_adapter/ (LoRA/PEFT), optionally checkpoints/active/merged/ (full merged weights), checkpoints/active/sft_adapter/. |
| Base model name | Usually Qwen/Qwen2.5-0.5B-Instruct as the Transformers base for adapters (set via env, e.g. POLYGUARD_HF_MODEL). |
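The boot check in the table reduces to a sentinel-file test. A minimal sketch, assuming install_bundle stands in for whatever scripts/install_hf_active_bundle.py actually does (the installer callable and return value are illustrative, not the real script's interface):

```python
from pathlib import Path

def ensure_active_bundle(root: Path, install_bundle) -> bool:
    """Install the HF bundle only when the GRPO adapter config is missing.

    The sentinel is checkpoints/active/grpo_adapter/adapter_config.json,
    as described in the table above. Returns True when an install ran.
    """
    sentinel = root / "checkpoints" / "active" / "grpo_adapter" / "adapter_config.json"
    if sentinel.exists():
        return False  # bundle already present; nothing to do
    install_bundle(root / "checkpoints" / "active")  # hypothetical installer hook
    return True
```

On the Space this runs once at container start, which is why first load can take several minutes while the bundle downloads.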

What the UI proves: the Model Truth panel calls GET /policy/model_status (product API). It shows model_id / base_model, run_id, preferred_artifact / loaded_source, and availability flags. Say on camera: “This is live from the API, not hard-coded in the frontend.”


UI map (what appears on screen)

| Region | Purpose |
| --- | --- |
| Hero (“PolyGuard neural safety cockpit”) | Marketing copy + quick stats. |
| Top bar | Agent Workbench vs Env Explorer, Task dropdown, Reset Episode, Q Tips. |
| Status chips | “Live” / model line; in Env mode one chip reads ws env (WebSocket to OpenEnv). |
| Model Truth | Qwen / artifact / run / availability. |
| Advanced strip | Only if Task = Advanced — pick raw difficulty + sub_environment. |
| Episode Overview | Mode, task, difficulty, environment, step budget, last reward, patient id, Patient Summary, Risk Delta. |
| Candidate Actions | Legal moves: candidate_id, action type, target/replacement, estimated safety delta (or Blocked). |
| Action Console | Confidence, rationale, Submit vs Run Agent (Run Agent is Agent mode only). |
| Reward Channels | Bars for total + primary + component scores (see below). |
| Current Medications | Cards from the observation. |
| Action History / Warnings | Step trace and env warnings. |
| Decision / Explanation / Evidence | Agent mode only (filled after API steps that return those fields). |
| Event Log | Human-readable trace of resets, steps, rewards, errors. |

Feature encyclopedia — every panel, branch, and agent

Use this section as a script appendix or judge handout. It mirrors the React workbench in app/ui/frontend/src/App.tsx, the API in app/api/, and the orchestrator in app/agents/orchestrator.py.

A. How the Space is wired (under the hood)

| Piece | Role |
| --- | --- |
| Browser → nginx | The HF Space exposes one origin; nginx routes paths. |
| Product API | Vite uses API_BASE (default /api). FastAPI serves catalog, reset, step_candidate, orchestrate, model_status, reward_breakdown, etc. |
| OpenEnv HTTP/WS | ENV_BASE defaults to the same origin on Spaces (not localhost). The web UI opens ws(s)://<origin>/ws for Env Explorer. |
| Two Python processes | entrypoint.sh starts uvicorn for app.env.fastapi_app (env, port 8100) and uvicorn for app.api (product API, port 8200). Agent-mode reset/step still use the API’s in-process PolyGuardEnv; Env mode uses the separate env service over WebSocket. |
| Important | The Agent and Env UIs maintain separate React state (agentObservation vs envObservation). Toggling mode clears the Event Log and the inactive branch’s episode state so you always know which backend path you are exercising. |

B. Hero (“PolyGuard neural safety cockpit”)

| Stat | Source | What to say on camera |
| --- | --- | --- |
| Runtime | mode === "agent" → “Agent Workbench”; else “Env Explorer”. | “This is which transport I am using right now.” |
| Scenario | Human label for the current taskId from catalog presets, or Advanced. | “Which curriculum preset is bound to difficulty + sub-environment.” |
| Candidates | candidate_action_set.length from the active observation. | “How many legal moves the env is offering after the last reset/step.” |
| Reward | Last scalar reward for the active branch (null → shown as -). | “Verifier scalar after the last step in this mode only.” |

C. Top bar — every control

| Control | Behavior |
| --- | --- |
| Agent Workbench | Sets mode to agent. Clears env state, event log, error; clears agent panels if switching from env (see handleModeChange). |
| Env Explorer | Sets mode to env. Clears agent-specific observation/reward/decision/evidence. |
| Task select | One option per task preset from GET /env/catalog (task_presets), plus Advanced. Changing a preset updates the internal difficulty + sub_environment to match it. |
| Reset Episode | Agent: POST /env/reset with a body from the preset ({ task_id }) or { difficulty, sub_environment }. Refreshes Model Truth first; clears reward breakdown, decision, explanation, evidence; sets the default candidate. Env: WebSocket reset with { difficulty, sub_environment } only (no task_id in the WS path — the preset is flattened to those two fields). Always clears events at the start of the reset handler, then appends one “Reset … in agent/env” line. |
| Q Tips | Opens the modal walkthrough; highlights DOM nodes with [data-guide="…"]. Skipping stores polyguard.qtips.v2.seen in localStorage so the first visit auto-opens the tips. |
| Status chips | First chip: Live if an observation is loaded and not done, else Complete / Ready. Second chip: in Agent mode, derived from modelSignal() (Qwen verified or not); in Env mode it reads ws env. |
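The two reset body shapes above can be sketched as one helper (a sketch of the rule, not the actual frontend code; names are illustrative):

```python
def build_reset_body(task_id: str, difficulty: str, sub_environment: str, mode: str) -> dict:
    """Build the reset payload per the rules above.

    Agent mode with a preset sends {"task_id": ...}; Advanced, and every
    Env-mode reset, flattens to difficulty + sub_environment because the
    WS path has no task_id field.
    """
    if mode == "agent" and task_id != "advanced":
        return {"task_id": task_id}
    return {"difficulty": difficulty, "sub_environment": sub_environment}
```

This is why resetting the same preset in both modes can still show identical difficulty and sub-environment in the Episode Overview.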

D. Model Truth panel — field by field

Data from GET /policy/model_status (PolicyProviderRouter / active_model_status).

| Field in UI | Typical meaning |
| --- | --- |
| Heading label | “Qwen 0.5B active” only when the Space config passes a strict check (enabled + active + availability + model-id regex for Qwen2.5-0.5B-Instruct); else “Qwen not verified”, or Ollama-specific text if Ollama wins locally. |
| Detail paragraph | Human sentence: model name, artifact, run_id, optional load_error. |
| Model | model_id or base_model — HF id of the loaded or configured base. |
| Run | run_id from the manifest / sweep activation (which training bundle). |
| Artifact | loaded_source or preferred_artifact — e.g. grpo_adapter, merged, sft_adapter. |
| Availability | Key/value pairs from the availability dict (which load stages succeeded). |

Ollama branch (local dev): If status.ollama.enabled && available, the UI labels Ollama Qwen active and mentions POLYGUARD_PROVIDER_PREFERENCE order. Spaces Dockerfile sets POLYGUARD_ENABLE_OLLAMA=false by default.

E. Advanced strip (Task = Advanced)

Only rendered when taskId === "advanced". Two selects:

  1. Difficulty: easy | medium | hard — passed to reset as difficulty.
  2. Environment: every string in catalog.sub_environments (DDI, BANDIT_MINING, REGIMEN_RISK, PRECISION_DOSING, LONGITUDINAL_DEPRESCRIBING, WEB_SEARCH_MISSING_DATA, ALTERNATIVE_SUGGESTION, NEW_DRUG_DECOMPOSITION).

What each sub-environment stresses (one line each):

| Sub-environment | What the episode emphasizes |
| --- | --- |
| DDI | Drug–drug interaction exposure and pair risk. |
| BANDIT_MINING | Policy / bandit exploration-style scenario (see preset “Bandit Mining”). |
| REGIMEN_RISK | Overall regimen burden and safety tradeoffs. |
| PRECISION_DOSING | Dose buckets and organ-sensitive flags in the observation. |
| LONGITUDINAL_DEPRESCRIBING | Multi-step taper / stop sequences over time. |
| WEB_SEARCH_MISSING_DATA | Rewards process fidelity for evidence-fetch actions. |
| ALTERNATIVE_SUGGESTION | Substitution / alternative action types rewarded more. |
| NEW_DRUG_DECOMPOSITION | Hard track: decompose a novel drug string into components. |

F. Episode Overview — every KPI and subsection

KPI grid (always eight rows):

| KPI | Source |
| --- | --- |
| Mode | Literal “Agent Workbench” or “Env Explorer”. |
| Task | Preset label or “Advanced”. |
| Difficulty | observation.deterministic_contract.difficulty or -. |
| Environment | deterministic_contract.sub_environment or observation.sub_environment. |
| Step Budget | observation.step_budget_remaining. |
| Last Reward | The active branch’s last reward (after reset, Agent clears to - until the first step). |
| Patient | patient_summary.patient_id or patient_summary.id. |
| Status | Complete if done, else Live if an observation exists, else Ready. |

Patient Summary <dl>: First 8 keys of observation.patient_summary (keys humanized: underscores → spaces, title case). Typical keys include demographics, allergies, high-level clinical flags—whatever the backend puts on PolyGuardObservation.

Risk Delta <dl>: First 8 entries of observation.burden_score_summary — burden-related scalars the env uses for reward deltas.

G. Candidate Actions list — each column

Each row is one CandidateAction from candidate_action_set.

| Column / concept | Meaning |
| --- | --- |
| candidate_id | Stable id (e.g. cand_…) — must match when submitting. |
| Action label | Humanized action_type (STOP_DRUG, SUBSTITUTE_WITHIN_CLASS, …). |
| Third column | target_drug, replacement_drug, or mode — whichever is most informative. |
| Right column | estimated_safety_delta formatted to 3 decimals, or Blocked if legality_precheck === false. |
| Disabled rows | You cannot select illegal candidates; clicking does nothing. |
| Default selection | Agent: the first candidate in the list. Env: the first legal candidate that is neither KEEP_REGIMEN nor REQUEST_*; else the first legal non–KEEP_REGIMEN candidate; else the first in the list (defaultCandidateForMode). |

Hidden fields you can mention if showing JSON elsewhere: dose_bucket, taper_days, monitoring_plan, evidence_query, new_drug_name, candidate_components, uncertainty_score, rationale_tags, required_monitoring, burden_delta, disease_stability_estimate.

H. Action Console — every input and button

| UI element | Effect |
| --- | --- |
| Type / Mode / Target / Replacement / Dose / Uncertainty | Read-only snapshot of the currently selected candidate. |
| Confidence | Number input 0.001–0.999, step 0.001; sent as confidence on Submit Candidate (Agent) or embedded in the WS payload (Env). |
| Rationale | Free text → rationale_brief / rationale on the action. |
| Submit Candidate (Agent) | Calls POST /env/step_candidate with { candidate_id, confidence, rationale_brief }. The API finds the matching legal action and calls env.step. |
| Submit Env Step (Env) | Same confidence/rationale plus the full action payload built by buildActionPayload → WS step. |
| Run Agent | Only when mode === "agent", an observation exists, and the episode is not done. Calls POST /agents/orchestrate with an empty JSON body. Disabled in Env mode. |
| Done notice | If done, shows which mode completed and termination_reason from info when present. The primary button becomes Reset Episode (shortcut). |
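The Env-mode step payload assembly can be sketched as follows (a loose stand-in for buildActionPayload, which lives in the frontend; the message envelope and field handling here are assumptions for illustration):

```python
def build_ws_step(candidate: dict, confidence: float, rationale: str) -> dict:
    """Assemble a WS step message from the selected candidate (illustrative).

    Copies the candidate's populated fields, drops None values, and attaches
    the operator's confidence and rationale, as described above.
    """
    action = {k: v for k, v in candidate.items() if v is not None}
    action["confidence"] = confidence
    action["rationale"] = rationale
    return {"type": "step", "action": action}
```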

I. Reward Channels — every bar (exact keys)

The UI renders exactly these keys in order (REWARD_KEYS in App.tsx, 14 rows):

| # | Key | Role |
| --- | --- | --- |
| 1 | total_reward | Weighted aggregate of the component scores (aggregate_rewards in reward_scaling.py). |
| 2 | primary_safety_legality | Roll-up: legality, candidate alignment, anti-cheat, uncertainty calibration (reward_router.compute_primary_reward_channels). |
| 3 | primary_clinical_improvement | Roll-up: safety delta, burden improvement, disease stability. |
| 4 | primary_dosing_quality | Roll-up: dosing quality + abstention quality. |
| 5 | primary_process_integrity | Roll-up: format compliance, efficiency, process fidelity, explanation grounding. |
| 6 | legality_score | Action is legal per the safety verifier. |
| 7 | safety_delta_score | Movement on the severe-DDI / risk proxy vs the pre-step state. |
| 8 | burden_improvement_score | Medication burden before vs after. |
| 9 | disease_stability_score | Stability heuristic vs disruptive action types. |
| 10 | dosing_quality_score | Dose-mode and bucket appropriateness. |
| 11 | process_fidelity_score | Follows the intended workflow for the sub-environment (e.g. fetch evidence when required). |
| 12 | explanation_grounding_score | Rationale present / grounded. |
| 13 | anti_cheat_score | Collapses when anti-cheat triggers. |
| 14 | uncertainty_calibration_score | Confidence vs uncertainty alignment. |

Note: total_reward is row 1; rows 2–5 are primary channels; rows 6–14 are exposed component scores. Other components (format_compliance_score, efficiency_score, candidate_alignment_score, abstention_quality_score) still exist in the backend RewardBreakdown and feed primaries + total, but this UI does not give them their own bar rows.

Bars show - when the value is missing (no step yet or breakdown not returned). Bar width = value × 100% with value clamped to [0.001, 0.999].

Breakdown source: in Agent mode, after a step the UI prefers info.reward_breakdown and may also call GET /env/reward_breakdown. In Env mode it uses info.reward_breakdown from the WebSocket step packet; if that is empty, the UI clears the reward panel.

J. Current Medications cards

Built from observation.medication_table[]. Each card:

  • Title: drug / drug_id / name.
  • High-risk ribbon: if high_risk or is_high_risk_elderly or Beers / warning flags.
  • Body: indication or class_name or atc_class.
  • Meta row: dose bucket or mg dose; taper vs monitoring or route.

K. Action History vs Warnings

| Panel | Source |
| --- | --- |
| Action History | observation.action_history — each item shows the step index and an action_type / candidate_id / reward snippet. |
| Warnings | observation.warning_summary — list of human-readable env warnings (DDIs, constraints, etc.). |

L. Decision / Explanation / Evidence (Agent only)

Rendered as JSON <pre> blocks:

| Title | When populated | Content origin |
| --- | --- | --- |
| Decision | Agent mode only. | final_action on the packet. For step_candidate, the API returns the standard step payload — typically no final_action field, so this panel may stay empty after a manual submit. For orchestrate, final_action is the PolyGuardAction after the critic (what actually hit env.step). |
| Explanation | Agent mode only. | explanation — output of ExplainerAgent after the step (orchestrate returns it). Usually empty after a raw step_candidate unless the API adds it. |
| Evidence | Agent mode only. | The evidence key on the packet. orchestrate returns evidence_out from EvidenceAgent.run(state) (retrieval / web-fallback bundle). step_candidate does not attach orchestrator evidence — the panel is often empty on manual clicks. |

Demo takeaway: Tell viewers: “To populate Decision / Explanation / Evidence in the UI, use Run Agent (orchestrate). Manual Submit Candidate updates the env and rewards but does not replay the full multi-agent JSON into those three panels.”

M. Event Log vs Q Tips

| Feature | Behavior |
| --- | --- |
| Event Log | Prepends timestamped strings: resets, each step’s reward line, errors. Capped at 24 lines. Cleared when you click Reset Episode (the handler starts with setEvents([]) then appends) — not the same as the mode-switch clearing. |
| Q Tips | 10-step overlay; does not mutate the env. |

N. Orchestrator — every agent in order (Run Agent)

When POST /agents/orchestrate runs, Orchestrator.run_step executes:

| Step | Agent class | What it does (operator language) |
| --- | --- | --- |
| 1 | MedRecAgent | Summarizes the current medication list / reconciliation view for downstream modules. Output key: medrec. |
| 2 | EvidenceAgent | Retrieves local evidence (and an optional web fallback) for missing or thin context. Shown in the UI evidence panel when orchestrating. |
| 3 | GraphSafetyAgent | Graph-style DDI / duplicate-therapy signals. Output: graph. |
| 4 | DosingAgent | Flags dose-sensitive windows and dosing opportunities. Feeds dosing_active into the supervisor. |
| 5 | CandidateAgent | Wraps the env candidate builder — produces the legal CandidateAction list. |
| 6 | SupervisorAgent | Chooses the planner mode: regimen vs dose vs REVIEW (conservative routing). |
| 7 | Contextual bandit | ContextualBanditPolicy (LinUCB or Thompson sampling via POLYGUARD_BANDIT_ALGO) proposes the top-k (POLYGUARD_BANDIT_TOP_K) candidates for the planner to consider. |
| 8 | PlannerAgent | Calls PolicyProviderRouter.select_candidate — this is where Transformers + Qwen + PEFT (or Ollama, or the safety-ranker fallback) picks a candidate_id and rationale. |
| 9 | CriticAgent | Safety veto / repair. May replace the proposed action with a safer final_action. |
| 10 | Replan / debate (optional) | If coordination_mode is replan_on_veto or lightweight_debate and the critic rejects, the planner may rerun on review candidates; debate_rounds increments. |
| 11 | PolyGuardEnv.step | Commits final_action; returns observation, reward, done, info. |
| 12 | Bandit update | If the chosen candidate was in the bandit pool, updates the bandit statistics with the reward (learning signal for the next orchestrate). |
| 13 | ExplainerAgent | Builds the explanation object for audit / UI. |

Environment variables (mention for power users):

| Variable | Effect |
| --- | --- |
| POLYGUARD_POLICY_STACK | llm+bandit (default): the planner sees bandit-shortlisted candidates. llm-only: all supervisor-filtered candidates. bandit-only: no LLM — the first bandit pick with a fixed rationale. |
| POLYGUARD_BANDIT_* | Algorithm, alpha, epsilon, seed, top-k. |
| POLYGUARD_PROVIDER_PREFERENCE | e.g. transformers vs ollama order. |
| POLYGUARD_ENABLE_ACTIVE_MODEL | Must be true on the Space for the bundle path; POLYGUARD_HF_MODEL sets the base id for adapters. |

O. Qwen and fallbacks (planner path)

PolicyProviderRouter (app/models/policy/provider_runtime.py):

  1. Builds a JSON instruction listing candidates and asks for candidate_id=…; rationale=….
  2. Tries providers in POLYGUARD_PROVIDER_PREFERENCE (default Transformers on Space).
  3. Parses model text for a legal candidate_id; on failure uses safety_ranker deterministic ordering.

Takeaway: even without the Qwen weights loaded, Run Agent still completes using the safety ranker / bandit. Mention that if Model Truth is red.
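The parse-or-fallback step can be sketched like this (the regex and the fallback ordering are assumptions; the real logic lives in provider_runtime.py and the deterministic ordering comes from safety_ranker):

```python
import re

def pick_candidate(model_text: str, legal_ids: list[str]) -> tuple[str, str]:
    """Parse 'candidate_id=...; rationale=...' from model output.

    Falls back to the first legal id (standing in for the deterministic
    safety-ranker ordering) when the reply is malformed or names an
    illegal candidate.
    """
    m = re.search(r"candidate_id\s*=\s*(\S+?)\s*;\s*rationale\s*=\s*(.*)", model_text)
    if m and m.group(1) in legal_ids:
        return m.group(1), m.group(2).strip()
    return legal_ids[0], "fallback: deterministic safety ranking"
```

This is why the UI never blocks on a bad model reply: an illegal or unparseable answer degrades to the ranker, not to an error.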

P. Full observation contract (API / types)

The TypeScript type EnvObservation (lib/types.ts) lists fields the backend may send. The main workbench highlights patient summary, medication table, candidates, burden summary, action history, warnings, step budget, and sub-environment. Not all fields get their own panel — if you open browser DevTools → Network → reset / step response, you can narrate extras:

| Field | Typical use |
| --- | --- |
| comorbidity_summary | Comorbidity list for the patient. |
| organ_function_summary | eGFR / hepatic flags for dosing scenarios. |
| labs_vitals_summary | Labs relevant to risk scoring. |
| graph_safety_summary | Aggregated graph / DDI context. |
| precision_dosing_flags | Tags when the sub-environment is dosing-heavy. |
| unresolved_conflicts | Specialist conflict strings. |
| abstention_indicators | When the env suggests review / abstain. |
| deterministic_contract | Difficulty + sub-environment + scenario-id contract for reproducibility. |

Q. Q Tips — copy for each slide (matches GUIDE_STEPS)

| # | Title | Body (read aloud or paraphrase) |
| --- | --- | --- |
| 1 | Start here | PolyGuard is an interactive OpenEnv workbench; the top bar picks runtime, scenario, reset. |
| 2 | Choose the runtime | Agent Workbench = REST API + reward breakdown + Qwen path; Env Explorer = WebSocket to OpenEnv. |
| 3 | Pick a scenario | Presets load real patient/regimen state from the backend. |
| 4 | Check the model truth | /policy/model_status; Qwen is only “verified” when the API says the adapters are live. |
| 5 | Read the episode state | Task, patient, step budget, reward, risk delta from the latest env response. |
| 6 | Review legal actions | Candidate rows = legal moves; inspect safety delta and mode. |
| 7 | Submit or ask the agent | Submit Candidate vs Run Agent; check the model panel before claiming LLM. |
| 8 | Inspect reward channels | Real scorer output per channel; empty = no step yet. |
| 9 | Track regimen changes | Medication cards + history + warnings = not canned. |
| 10 | Follow the run | The event log shows resets, steps, rewards, and errors plainly. |

Agent Workbench vs Env Explorer (say this exactly on camera)

| | Agent Workbench | Env Explorer |
| --- | --- | --- |
| Reset | POST /env/reset with a task preset (e.g. { "task_id": "easy_screening" }) via the product API. | WebSocket reset message to OpenEnv /ws with { difficulty, sub_environment }. |
| Submit | POST /env/step_candidate — the product API resolves candidate_id + your confidence + rationale into a full action and steps the same in-process PolyGuardEnv. | WebSocket step — payload built from the selected candidate; talks directly to the OpenEnv service. |
| Run Agent | POST /agents/orchestrate — runs the full orchestrator (med rec, evidence, graph, dosing, candidates, supervisor, bandit, planner/LLM, critic, env step, explainer). | Button disabled — there is no orchestrator path over raw WS in this UI. |
| Decision / Explanation / Evidence panels | Populated after orchestrate, or after steps that echo final_action / explanation / evidence (orchestrate returns rich evidence from the EvidenceAgent pipeline). | Always empty in the UI by design — those panels are null in Env mode (App.tsx only passes agent-mode state to DetailPanels). |
| Reward breakdown | From step info.reward_breakdown or the fallback GET /env/reward_breakdown. | From the WS step packet’s info.reward_breakdown when present. |
| Switching mode | Clears the Event Log and resets the other mode’s transient state — mention that so viewers don’t think it’s a bug. | Same. |

One-liner for judges: “Agent Workbench is the full product API plus optional LLM-orchestrated policy; Env Explorer is the raw OpenEnv WebSocket contract for the same underlying environment.”


Reward channels — what they mean and how they’re computed (talk track)

Rewards are verifier-backed, bounded to roughly [0.001, 0.999] (3 decimal places in UI).

Four primary channels (high level)

These are averages of component groups (compute_primary_reward_channels in app/env/reward_router.py):

  1. primary_safety_legality — legality, candidate id alignment, anti-cheat, uncertainty calibration.
  2. primary_clinical_improvement — safety delta vs severe pairs, burden improvement, disease stability.
  3. primary_dosing_quality — dosing quality + abstention (e.g. appropriate review requests under uncertainty).
  4. primary_process_integrity — format compliance, efficiency (step budget), process fidelity, explanation grounding.

Components (examples — compute_reward_breakdown)

The environment builds scores such as:

  • legality_score: high if the action is legal per safety report.
  • safety_delta_score / burden_improvement_score: from before/after burden and severe DDI pair counts (_delta_to_reward).
  • anti_cheat_score: collapses if anti-cheat flags the trajectory.
  • uncertainty_calibration_score: penalizes overconfidence vs modeled uncertainty.
  • Sub-environment tweaks: e.g. WEB_SEARCH_MISSING_DATA boosts process fidelity when using FETCH_EXTERNAL_EVIDENCE; NEW_DRUG_DECOMPOSITION rewards decomposition actions with components.

Then components are scaled/clamped, primary channels recomputed, and total_reward = weighted aggregate (aggregate_rewards).
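That pipeline can be sketched end-to-end. This is a minimal sketch: the channel groupings follow the four primaries above, but the equal 0.25 weights, the plain averaging, and the exact clamp placement are assumptions, not the actual values in reward_scaling.py:

```python
def clamp(x: float) -> float:
    """Bound a score to the [0.001, 0.999] range the UI displays."""
    return min(max(x, 0.001), 0.999)

def aggregate(components: dict[str, float]) -> dict[str, float]:
    """Clamp components, average them into primary channels, then weight a total."""
    c = {k: clamp(v) for k, v in components.items()}
    primaries = {
        "primary_safety_legality": [c["legality_score"], c["anti_cheat_score"],
                                    c["uncertainty_calibration_score"]],
        "primary_clinical_improvement": [c["safety_delta_score"],
                                         c["burden_improvement_score"],
                                         c["disease_stability_score"]],
        "primary_dosing_quality": [c["dosing_quality_score"]],
        "primary_process_integrity": [c["process_fidelity_score"],
                                      c["explanation_grounding_score"]],
    }
    out = {k: sum(v) / len(v) for k, v in primaries.items()}
    # Illustrative equal weighting of the four primaries into the total
    out["total_reward"] = clamp(sum(0.25 * v for v in out.values()))
    return out
```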

Demo line: “Bars update only after a real step — empty fields mean we haven’t stepped yet, not fake filler.”


Built-in Q Tips (on-screen tour)

Click Q Tips in the top bar. The app cycles through 10 slides (GUIDE_STEPS in App.tsx):

  1. Start here — top bar, scenarios, reset.
  2. Choose the runtime — Agent vs Env.
  3. Pick a scenario — presets load real patient/regimen state.
  4. Check the model truth — /policy/model_status.
  5. Read episode state — overview + patient summary.
  6. Review legal actions — candidates.
  7. Submit or ask the agent — Submit vs Run Agent.
  8. Inspect reward channels.
  9. Medications + history/warnings.
  10. Event log — errors and connectivity.

Recording tip: Record Q Tips once in full voiceover (“I’ll use the in-app tour…”) then dismiss and do the live walkthrough below.


Shot-by-shot recording script

Scene 0 — Intro (30–45 s)

Action: Scroll slightly so hero + top bar are visible.
Say: “This is PolyGuard on Hugging Face Spaces: an OpenEnv workbench for polypharmacy safety. The backend runs a real PolyGuardEnv with verifiable rewards; the UI can drive it through the product API or raw OpenEnv WebSockets.”


Scene 1 — Model Truth (45–60 s)

Action: Stay on Agent Workbench. Click nothing yet; point at Model Truth.
Say: “Model Truth is live from /policy/model_status. Here we see the base model—typically Qwen 2.5 0.5B Instruct—which artifact is loaded—often the GRPO adapter—and the run id. On Spaces, weights are under /app/checkpoints/active after the bundle installer runs.”

If panel shows unavailable: “Cold start or CPU load can delay the bundle; the environment still works for manual candidate submission; Run Agent may fall back to non-LLM routing depending on config.”


Scene 2 — Easy task, manual submit (Agent) (90–120 s)

Action: Task → Easy Screening (DDI, easy). Reset Episode.
Say: “Easy Screening fixes difficulty easy and sub-environment DDI—drug–drug interaction screening.”

Action: Pan Episode Overview — read Patient Summary and Risk Delta aloud briefly.
Say: “This patient block and risk delta come straight from the observation object.”

Action: Candidate Actions — click 2–3 rows; show Blocked vs legal. Select a legal row.
Say: “Candidates are legal moves from the env; illegal rows are disabled.”

Action: Action Console — tweak Confidence and Rationale slightly. Click Submit Candidate.
Say: “Submit Candidate hits /env/step_candidate with my chosen legal action, confidence, and rationale.”

Action: After response, pause on Reward Channels and Last Reward in overview.
Say: “These bars are the verifier breakdown; total reward is the scalar GRPO-style signal we train on.”

Action: Action History — show one new line. Event Log — show the new reward line.
Say: “History and event log give an audit trail—not a canned animation.”


Scene 3 — Run Agent (orchestrator + LLM path) (90–120 s)

Prerequisite: Prefer recording when Model Truth shows enabled and active with Qwen artifacts.

Action: Reset Episode again (same or different task). Click Run Agent. Wait for completion.
Say: “Run Agent calls /agents/orchestrate. That runs med reconciliation, evidence retrieval, graph safety, dosing hints, candidate generation, supervisor mode, a contextual bandit shortlist, then the planner—here that’s where the loaded Qwen policy can choose among candidates—the critic veto, environment step, and explainer.”

Action: Scroll to Decision, Explanation, Evidence JSON panels.
Say: “These three panels are only populated in Agent Workbench mode. Env Explorer deliberately hides them because the raw WebSocket client doesn’t run the full orchestrator response bundle.”

Action: Point at Evidence — mention structured retriever output vs empty object if task didn’t fetch.
Say: “Evidence is whatever the evidence agent produced for this state—grounding for clinician trust.”


Scene 4 — Env Explorer contrast (60–90 s)

Action: Click Env Explorer. Reset Episode (same task: Easy Screening).
Say: “Now the UI sends a WebSocket reset to the OpenEnv service on port 8100—same scenarios, different transport.”

Action: Select a candidate, Submit Env Step.
Say: “Submit Env Step sends a WebSocket step with the action payload—no /agents/orchestrate.”

Action: Scroll to Decision / Explanation / Evidence — show they stay empty or “No data.”
Say: “This is intentional: I’m proving the low-level env API, not the full agent stack.”

Action: Event Log — note new lines tagged from env step.


Scene 5 — Task variety (2–3 minutes, optional montage)

For each preset, do Reset + one legal Submit (Agent mode is enough):

| Task | Difficulty | Sub-environment | What to say |
| --- | --- | --- | --- |
| Easy Screening | easy | DDI | “Fast DDI-focused episode.” |
| Budgeted Screening | medium | REGIMEN_RISK | “More steps, regimen-risk tradeoffs.” |
| Complex Tradeoff | hard | REGIMEN_RISK | “Harder patient draw, tighter budgets.” |
| Bandit Mining | hard | BANDIT_MINING | “Bandit-style policy mining scenario.” |

Action: Switch Task to Advanced. Set e.g. hard + PRECISION_DOSING. Reset.
Say: “Advanced exposes every sub-environment enum the backend supports—precision dosing, deprescribing, web-search missing data, alternatives, new-drug decomposition.”


Scene 6 — Medications + warnings (45 s)

Action: After any step with regimen change, show Current Medications cards (high-risk styling).
Say: “Cards mirror medication_table from the observation; warnings list is explicit env output.”


Scene 7 — Closing (30 s)

Say: “That’s the full loop: HF Space hosts OpenEnv + API, Qwen adapters live under checkpoints/active, Agent Workbench demonstrates orchestrated LLM decisions with evidence and explanations, and Env Explorer proves the same environment over raw WebSockets for OpenEnv compatibility.”


OBS / QuickTime checklist

  • Capture system audio if you add voiceover in post; or record mic in OBS.
  • 1920×1080, 30 fps (or 60 if you want smooth cursor).
  • 2 s pause after each button click before scrolling away.
  • If Space sleeps, mouse jiggle or refresh before recording.
  • Export MP4 H.264 for YouTube / HF dataset card.

Quick troubleshooting on camera (if something breaks)

| Symptom | What to say / do |
| --- | --- |
| WebSocket errors in the Event Log | “Env service reconnect — refresh the page; the WS URL is derived from the Space origin.” |
| Run Agent fails | “Check Model Truth — the model may still be downloading, or Ollama is disabled on the Space.” |
| Reward bars all show - | “No step yet — reset and submit once.” |
| Candidates empty | “Reset the episode — the env didn’t initialize.” |

Related docs