anugrahteesdollar committed on
Commit e681925 · verified · 1 Parent(s): 77bad22

initial: drugenv trainer control panel

README.md CHANGED
@@ -1,10 +1,235 @@
  ---
- title: Drugenv Trainer
- emoji: 👀
- colorFrom: blue
- colorTo: gray
  sdk: docker
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Drug Target Validation Environment
  sdk: docker
  pinned: false
+ app_port: 8000
+ tags:
+ - openenv
+ - reinforcement-learning
+ - drug-discovery
+ - pharma
  ---

+ # 🧬 DrugEnv Drug Target Validation Environment
+
+ > **DrugEnv** — an OpenEnv RL environment that teaches LLMs to do computational drug-target validation.
+
+ This repository implements an OpenEnv-compatible reinforcement learning environment in which an agent acts as a **computational drug discovery scientist**. Given a proposed drug target (gene / protein) and a disease context, the agent must investigate target viability by issuing simulated bioinformatics, clinical, and experimental queries, and finally submit a calibrated **go / no-go** validation report with a confidence score.
+
+ The environment is designed as a partially observable Markov decision process (POMDP) with:
+
+ - a hidden ground-truth `TargetProfile` (expression, druggability, selectivity, toxicity, clinical precedent)
+ - noisy database / assay outputs governed by `DataQualityState`
+ - a single unified **experimental credit** budget per episode
+ - visible task metadata, a dossier of accumulated findings, and step history
+ - dense step-wise rewards plus a terminal reward for decision quality and evidence coverage
+
+ ## Why drug target validation?
+
+ Roughly **90% of drug development programs fail** in clinical trials, and a large fraction of those failures trace back to mistakes during target validation: targets that are not actually disease-driving, are undruggable, lack selectivity, or have hidden toxicity. The cost of progressing a single bad target through Phase III can run into the **billions of dollars**, so even modest improvements in early-stage decision quality translate into enormous savings and faster cures.
+
+ This environment lets you train and benchmark agents on exactly that bottleneck: **acquiring the right evidence cheaply and submitting a well-calibrated go / no-go**.
+
+ ## How it works
+
+ At a high level, each episode looks like this:
+
+ 1. `reset()` selects a drug-target-validation scenario and seeds the simulator.
+ 2. The agent receives a `ValidationObservation` describing the target, indication, remaining credits, accumulated dossier, and step history.
+ 3. The agent submits a `DrugTargetAction` such as `query_expression`, `druggability_screen`, `off_target_screen`, or `submit_validation_report`.
+ 4. The rule engine checks the credit budget, redundancy, and ordering prerequisites.
+ 5. The transition engine deducts credits and asks the output generator to simulate evidence from the hidden `TargetProfile`.
+ 6. The reward computer scores the step for novelty, reasoning coherence, credit efficiency, and rule compliance.
+ 7. The environment returns a new observation with an updated `EvidenceDossier`, the latest output, violations, and reward.
+ 8. The episode ends when the agent submits a validation report, exhausts its credits, or hits the step limit.
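A minimal sketch of steps 4-7 of this loop, assuming hypothetical names and toy reward values; the real rule / transition / reward engines live in the server package and are considerably richer:

```python
from dataclasses import dataclass, field

@dataclass
class EpisodeState:
    # Illustrative per-episode bookkeeping, not the real server types.
    credits_remaining: int = 30
    dossier: list = field(default_factory=list)
    history: list = field(default_factory=list)

def step(state: EpisodeState, action_type: str, cost: int) -> dict:
    """One pass through steps 4-7: rule checks, credit deduction,
    simulated evidence accumulation, and a toy step reward."""
    violations = []
    if cost > state.credits_remaining:
        violations.append("insufficient_credits")
    if action_type in state.history:
        violations.append("redundant_call")  # redundancy check
    if not violations:
        state.credits_remaining -= cost      # deduct credits
        state.dossier.append(f"simulated evidence for {action_type}")
    state.history.append(action_type)
    reward = 0.1 if not violations else -0.1  # toy values, not the real shaping
    done = action_type == "submit_validation_report"
    return {"reward": reward, "done": done, "violations": violations}
```

A redundant call is recorded in the history but adds nothing to the dossier and costs no credits, mirroring how the environment penalises repeats without letting them burn the budget.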
+
+ ## The core mental model
+
+ ### Hidden state
+
+ The simulator maintains a `FullLatentState` that the agent never sees directly:
+
+ - `TargetProfile` — true expression level / tissue specificity / disease over-expression, druggability score, binding-pocket quality, selectivity ratio, off-target genes, toxicity profile, clinical precedent, expected in-vitro and in-vivo behaviour, plus the hidden `correct_decision`, `true_viability_score`, `key_evidence_dimensions`, and any `misleading_signals`.
+ - `DataQualityState` — noise level, false-positive rate, false-negative rate, database coverage.
+ - `CreditState` — total / used / remaining experimental credits.
+ - `ValidationProgress` — boolean flags for which evidence dimensions have been investigated and whether a report has been submitted.
+
+ ### Visible state
+
+ The agent only sees `ValidationObservation`, which includes:
+
+ - `target_gene`, `disease_context`, `indication`
+ - `credits_remaining` / `credits_total`
+ - `dossier` — a running `EvidenceDossier` of expression / protein / clinical / safety / literature / experimental findings, plus any `flagged_red_flags`
+ - `pipeline_history` — list of past actions and their summary outputs
+ - `latest_output` — typed `IntermediateOutput` from the most recent step
+ - `rule_violations` and `step_reward_breakdown` for the last step
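For orientation only, the visible fields can be pictured as a plain dataclass. The real `ValidationObservation` in `models.py` is a richer typed model; the container types below are guesses:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ObservationSketch:
    """Illustrative shape of the visible state (not the real model)."""
    target_gene: str
    disease_context: str
    indication: str
    credits_remaining: int
    credits_total: int
    dossier: dict = field(default_factory=dict)      # findings by category
    pipeline_history: list = field(default_factory=list)
    latest_output: Optional[dict] = None             # typed IntermediateOutput in the real env
    rule_violations: list = field(default_factory=list)
    step_reward_breakdown: dict = field(default_factory=dict)
```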
+
+ ## Action space
+
+ | Category | Action | Cost (credits) |
+ |---|---|---|
+ | Expression & omics | `query_expression`, `differential_expression`, `pathway_enrichment`, `coexpression_network` | 2 |
+ | Protein & structure | `protein_structure_lookup`, `binding_site_analysis`, `druggability_screen` | 3 |
+ | Protein & structure | `protein_interaction_network` | 2 |
+ | Clinical & safety | `clinical_trial_lookup`, `toxicity_panel`, `off_target_screen`, `patient_stratification` | 3 |
+ | Literature | `literature_search`, `evidence_synthesis`, `competitor_landscape` | 1 |
+ | Experimental | `crispr_knockout`, `biomarker_correlation` | 4 / 3 |
+ | Experimental | `in_vitro_assay` | 5 |
+ | Experimental | `in_vivo_model` | 8 |
+ | Meta | `flag_red_flag`, `request_expert_review` | 0 / 1 |
+ | Terminal | `submit_validation_report` | 0 |
+
+ `submit_validation_report` carries two extra fields: `final_decision` (`"go"` or `"no_go"`) and `confidence` in `[0, 1]`. The episode ends as soon as the report is submitted.
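For client-side budgeting, the cost schedule is easy to mirror as a lookup table. The costs below are transcribed from the table above; the `affordable` helper is illustrative, not part of the environment API:

```python
# Credit costs per action, transcribed from the README table.
ACTION_COSTS = {
    "query_expression": 2, "differential_expression": 2,
    "pathway_enrichment": 2, "coexpression_network": 2,
    "protein_structure_lookup": 3, "binding_site_analysis": 3,
    "druggability_screen": 3, "protein_interaction_network": 2,
    "clinical_trial_lookup": 3, "toxicity_panel": 3,
    "off_target_screen": 3, "patient_stratification": 3,
    "literature_search": 1, "evidence_synthesis": 1,
    "competitor_landscape": 1,
    "crispr_knockout": 4, "biomarker_correlation": 3,
    "in_vitro_assay": 5, "in_vivo_model": 8,
    "flag_red_flag": 0, "request_expert_review": 1,
    "submit_validation_report": 0,
}

def affordable(action_type: str, credits_remaining: int) -> bool:
    """Check whether an action fits in the remaining credit budget."""
    return ACTION_COSTS[action_type] <= credits_remaining
```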
+
+ ## Reward function
+
+ Every step receives a decomposed reward:
+
+ ```
+ R_t = evidence_novelty_bonus
+     + reasoning_coherence_bonus
+     + credit_efficiency_penalty
+     + rule_violation_penalty
+     + [φ(s_{t+1}) − φ(s_t)]
+ ```
+
+ When the episode ends, a terminal reward is added:
+
+ ```
+ R_T = 0.40 * decision_accuracy
+     + 0.35 * evidence_coverage
+     + 0.15 * credit_efficiency
+     + 0.10 * reasoning_coherence
+ ```
+
+ Where:
+
+ - `decision_accuracy` — `1.0` if the final go / no-go matched the hidden `correct_decision`, scaled by `2 * |confidence - 0.5|` so a confidently correct answer is fully rewarded and a confidently wrong answer is fully penalised.
+ - `evidence_coverage` — fraction of the scenario's `key_evidence_dimensions` (e.g. `expression`, `druggability`, `off_target`, `clinical`, `in_vitro`) that the agent actually investigated.
+ - `credit_efficiency` — `1 − redundant_calls / total_calls`.
+ - `reasoning_coherence` — fraction of actions whose soft prerequisites (e.g. `expression` before `toxicity`, `in_vitro` before `in_vivo`) were satisfied.
+
+ Hard penalties are applied for: submitting without any evidence, submitting without a decision or confidence, and exhausting credits without ever submitting a report.
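Under one plausible reading of the definitions above, the terminal reward works out as follows. Treating a wrong decision as *negative* `decision_accuracy` is our assumption; the actual sign convention lives in the server's reward computer:

```python
def terminal_reward(correct: bool, confidence: float,
                    dims_covered: int, key_dims: int,
                    redundant_calls: int, total_calls: int,
                    coherent_actions: int) -> float:
    """Sketch of R_T per the formula above (sign convention for a wrong
    decision is an assumption, not confirmed by the source)."""
    calibration = 2 * abs(confidence - 0.5)          # 0 at conf=0.5, 1 at conf=0 or 1
    decision_accuracy = calibration if correct else -calibration
    evidence_coverage = dims_covered / key_dims if key_dims else 0.0
    credit_efficiency = 1 - redundant_calls / total_calls if total_calls else 0.0
    reasoning_coherence = coherent_actions / total_calls if total_calls else 0.0
    return (0.40 * decision_accuracy
            + 0.35 * evidence_coverage
            + 0.15 * credit_efficiency
            + 0.10 * reasoning_coherence)
```

Note how the calibration factor makes an unconfident answer (confidence near 0.5) contribute almost nothing to the decision term either way.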
+
+ ## Curated scenarios
+
+ | Name | Difficulty | Correct decision | Why it's interesting |
+ |---|---|---|---|
+ | `egfr_nsclc_viable` | easy | `go` | Clear viable target — expression + druggability alone are sufficient. |
+ | `kras_pdac_borderline` | medium | `go` | Historically undruggable; recent inhibitor literature is decisive. |
+ | `cd33_aml_misleading` | hard | `no_go` | A naive expression query says "go", but off-target + toxicity + clinical evidence reveal the right answer. |
+ | `tp53_solid_tumors_clear_fail` | easy-medium | `no_go` | A druggability check alone is sufficient. |
+ | `ptpn11_juvenile_mml_complex` | very hard | `go` | Requires `binding_site_analysis(include_allosteric=True)`, off-target work, patient stratification, and an in-vitro assay. |
+
+ The procedural generator (`server/tasks/procedural_generator.py`) layers on additional easy / medium / hard scenarios sampled from a pool of 20 real cancer targets and 8 cancer indications.
+
+ ## Setup
+
+ ```bash
+ # 1. Install dependencies (env runtime only)
+ pip install -e .
+
+ # 2. Or install with training extras (torch + transformers + trl + peft pinned to a working set)
+ pip install -e .[train]
+
+ # 3. Run the environment server
+ PYTHONPATH=. python -m server.app
+ # server is now available at http://localhost:8000
+ ```
+
+ The legacy `uv sync` workflow still works if you have `uv.lock` checked in locally; the editable `pip install` path above is the primary supported route.
+
+ ## Talking to the environment
+
+ ```python
+ from client import DrugTargetEnv
+ from models import DrugTargetAction
+
+ with DrugTargetEnv(base_url="http://localhost:8000") as env:
+     result = env.reset()
+     print(result.observation.target_gene, "/", result.observation.indication)
+
+     result = env.step(DrugTargetAction(
+         action_type="query_expression",
+         parameters={"database": "GTEx"},
+         reasoning="Establish tissue baseline",
+     ))
+     print(result.observation.latest_output.summary)
+
+     result = env.step(DrugTargetAction(
+         action_type="submit_validation_report",
+         reasoning="Sufficient evidence for go",
+         final_decision="go",
+         confidence=0.85,
+     ))
+     print("done:", result.done, "reward:", result.reward)
+ ```
+
+ ## Running the baseline agent
+
+ ```bash
+ PYTHONPATH=. python run_agent.py
+ ```
+
+ The script writes a live JSON snapshot to `_dashboard_state.json` after every step so you can watch the agent's progress. The default model is `Qwen/Qwen2.5-3B-Instruct`.
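A small helper for reading that snapshot from your own tooling. This is a sketch only: the file name comes from the paragraph above, and the schema is whatever `run_agent.py` writes:

```python
import json
from pathlib import Path
from typing import Optional

def read_snapshot(path: str = "_dashboard_state.json") -> Optional[dict]:
    """Return the latest agent snapshot, or None if the file is missing
    or the agent is mid-write (it rewrites the file after every step)."""
    p = Path(path)
    if not p.exists():
        return None
    try:
        return json.loads(p.read_text())
    except json.JSONDecodeError:
        return None
```

Polling this in a loop (e.g. every couple of seconds) gives a crude text-mode version of the dashboard.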
+
+ ## Reproduce
+
+ Three commands cover the env-locally / training-locally / training-on-Space paths:
+
+ ```bash
+ # 1. Env locally (CPU is fine — the env itself is dependency-light)
+ pip install -e . && PYTHONPATH=. python -m server.app
+ # → http://localhost:8000 (also at https://huggingface.co/spaces/anugrahteesdollar/drugenv when deployed)
+
+ # 2. Training locally (single GPU, vanilla GRPO)
+ pip install -e .[train]
+ PYTHONPATH=. python -m training.training_script \
+     --model-id Qwen/Qwen2.5-3B-Instruct \
+     --evidence-dir evidence \
+     --output-dir runs/grpo-output
+
+ # 3. Training on a Hugging Face Space (H200 single-GPU)
+ # Push space/training/ to anugrahteesdollar/drugenv-trainer, set PUSH_REPO + HF_TOKEN
+ # in the Space variables, then POST /train.
+ # → https://huggingface.co/spaces/anugrahteesdollar/drugenv-trainer
+ ```
+
+ The trainer Space's FastAPI control panel (`space/training/app.py`) streams a live evidence dashboard while training runs — a per-step training curve, mid-training checkpoint progression, and a before / after summary card. The default expected hardware is a **single H200 GPU** (`h200x1`); an H200 delivers roughly 4× the throughput of an A100, which works out to about $0.05–0.10 per step on Qwen2.5-3B-class GRPO.
+
+ An optional **SFT warm-start** (`training/sft_warmstart.py`) is controlled via the `SFT_WARMSTART` env var on the Space (default on). It collects oracle trajectories on the curated scenario library, fine-tunes the base model with a small LoRA, and hands the merged checkpoint to GRPO so the policy starts with a non-zero prior over correct trajectories.
+
+ ## Baseline scores
+
+ | Difficulty bucket | Random policy | Heuristic policy | Trained Qwen2.5-3B |
+ |---|---|---|---|
+ | Easy (`egfr_nsclc_viable`) | _filled in after first training run_ | _filled in after first training run_ | _filled in after first training run_ |
+ | Medium (`kras_pdac_borderline`) | _filled in after first training run_ | _filled in after first training run_ | _filled in after first training run_ |
+ | Hard (`cd33_aml_misleading`) | _filled in after first training run_ | _filled in after first training run_ | _filled in after first training run_ |
+
+ The trainer Space writes the populated table to `evidence/before_after_metrics.json` automatically on every run.
+
+ ## Evolution note
+
+ The deployment scaffolding in this repository — the trainer Space control panel, the live training-evidence callback, the SFT warm-start script, and the working dependency pin set — was originally validated against a particle-physics-themed prototype and then carried forward when we pivoted to drug discovery. The simulator, scenarios, action space, reward function, and rules engine are all drug-domain native; the inheritance is exclusively in the training and evaluation scaffolding.
client.py ADDED
@@ -0,0 +1,54 @@
+ """Drug Target Validation Environment Client.
+
+ Provides the ``DrugTargetEnv`` class that communicates with the
+ environment server over WebSocket / HTTP using the OpenEnv protocol.
+ """
+
+ from typing import Dict
+
+ from openenv.core import EnvClient
+ from openenv.core.client_types import StepResult
+ from openenv.core.env_server.types import State
+
+ try:  # pragma: no cover - package import path
+     from .models import DrugTargetAction, ValidationObservation
+ except ImportError:  # pragma: no cover - direct module import path
+     from models import DrugTargetAction, ValidationObservation
+
+
+ class DrugTargetEnv(
+     EnvClient[DrugTargetAction, ValidationObservation, State]
+ ):
+     """Client for the Drug Target Validation Environment.
+
+     Example:
+         >>> with DrugTargetEnv(base_url="http://localhost:8000") as env:
+         ...     result = env.reset()
+         ...     print(result.observation.target_gene)
+         ...     result = env.step(DrugTargetAction(
+         ...         action_type="query_expression",
+         ...         parameters={"database": "GTEx"},
+         ...         reasoning="baseline expression survey",
+         ...     ))
+         ...     print(result.observation.latest_output.summary)
+     """
+
+     def _step_payload(self, action: DrugTargetAction) -> Dict:
+         return action.model_dump()
+
+     def _parse_result(
+         self, payload: Dict
+     ) -> StepResult[ValidationObservation]:
+         obs_data = payload.get("observation", {})
+         observation = ValidationObservation(**obs_data)
+         return StepResult(
+             observation=observation,
+             reward=payload.get("reward"),
+             done=payload.get("done", False),
+         )
+
+     def _parse_state(self, payload: Dict) -> State:
+         return State(
+             episode_id=payload.get("episode_id"),
+             step_count=payload.get("step_count", 0),
+         )
dashboard.html ADDED
@@ -0,0 +1,543 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="utf-8" />
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
+ <title>Bio-Experiment Agent Dashboard</title>
+ <link rel="preconnect" href="https://fonts.googleapis.com" />
+ <link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;600&family=DM+Sans:wght@400;500;700&display=swap" rel="stylesheet" />
+ <style>
+ :root {
+   --bg: #0c0e14;
+   --surface: #151822;
+   --surface2: #1c2030;
+   --border: #2a2f42;
+   --text: #e2e4ea;
+   --text-dim: #8b90a5;
+   --accent: #5ce0d8;
+   --accent2: #7c6cf0;
+   --green: #4ade80;
+   --red: #f87171;
+   --amber: #fbbf24;
+   --blue: #60a5fa;
+   --pink: #f472b6;
+ }
+ *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+ body { background: var(--bg); color: var(--text); font-family: 'DM Sans', system-ui, sans-serif; line-height: 1.5; min-height: 100vh; }
+ .mono { font-family: 'JetBrains Mono', monospace; }
+
+ .header { display: flex; align-items: center; justify-content: space-between; padding: 14px 28px; border-bottom: 1px solid var(--border); background: var(--surface); }
+ .header h1 { font-size: 18px; font-weight: 700; letter-spacing: -.3px; }
+ .header h1 span { color: var(--accent); }
+ .header-right { display: flex; align-items: center; gap: 10px; }
+ .status-pill { font-size: 12px; padding: 4px 14px; border-radius: 20px; font-weight: 600; text-transform: uppercase; letter-spacing: .5px; }
+ .status-pill.live { background: rgba(76,222,128,.15); color: var(--green); }
+ .status-pill.done { background: rgba(248,113,113,.15); color: var(--red); }
+ .status-pill.waiting { background: rgba(139,144,165,.15); color: var(--text-dim); }
+
+ .btn { padding: 6px 16px; border-radius: 8px; border: 1px solid var(--border); background: var(--surface2); color: var(--text); font-size: 12px; font-weight: 600; cursor: pointer; transition: all .15s; }
+ .btn:hover { border-color: var(--accent); color: var(--accent); }
+ .btn.primary { background: rgba(92,224,216,.12); border-color: var(--accent); color: var(--accent); }
+ .btn.primary:hover { background: rgba(92,224,216,.25); }
+ .btn.danger { border-color: var(--red); color: var(--red); }
+ .btn.danger:hover { background: rgba(248,113,113,.12); }
+
+ .grid { display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 16px; padding: 20px 28px; max-width: 1600px; }
+ @media (max-width: 1100px) { .grid { grid-template-columns: 1fr 1fr; } }
+ @media (max-width: 700px) { .grid { grid-template-columns: 1fr; } }
+
+ .card { background: var(--surface); border: 1px solid var(--border); border-radius: 12px; padding: 18px 20px; overflow: hidden; }
+ .card h2 { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 1px; color: var(--text-dim); margin-bottom: 12px; }
+ .card.span2 { grid-column: span 2; }
+ .card.span3 { grid-column: span 3; }
+ @media (max-width: 700px) { .card.span2, .card.span3 { grid-column: span 1; } }
+
+ .gauge-row { display: flex; gap: 14px; flex-wrap: wrap; }
+ .gauge { flex: 1; min-width: 130px; background: var(--surface2); border-radius: 10px; padding: 14px; }
+ .gauge-label { font-size: 11px; color: var(--text-dim); margin-bottom: 6px; text-transform: uppercase; letter-spacing: .5px; }
+ .gauge-value { font-size: 22px; font-weight: 700; }
+ .gauge-bar { height: 5px; border-radius: 3px; background: var(--border); margin-top: 8px; overflow: hidden; }
+ .gauge-bar-fill { height: 100%; border-radius: 3px; transition: width .6s ease; }
+
+ .timeline { position: relative; padding-left: 20px; }
+ .timeline::before { content: ''; position: absolute; left: 6px; top: 0; bottom: 0; width: 2px; background: var(--border); }
+ .timeline-item { position: relative; margin-bottom: 14px; padding-left: 18px; }
+ .timeline-item::before { content: ''; position: absolute; left: -18px; top: 6px; width: 10px; height: 10px; border-radius: 50%; border: 2px solid var(--accent); background: var(--bg); }
+ .timeline-item.fail::before { border-color: var(--red); }
+ .tl-action { font-weight: 600; font-size: 14px; }
+ .tl-meta { font-size: 12px; color: var(--text-dim); margin-top: 2px; }
+
+ .mini-table { width: 100%; font-size: 13px; border-collapse: collapse; }
+ .mini-table td { padding: 5px 8px; border-bottom: 1px solid var(--border); vertical-align: top; }
+ .mini-table td:first-child { color: var(--text-dim); white-space: nowrap; width: 40%; }
+
+ .tag-list { display: flex; flex-wrap: wrap; gap: 6px; }
+ .tag { font-size: 12px; padding: 3px 10px; border-radius: 6px; background: var(--surface2); border: 1px solid var(--border); font-family: 'JetBrains Mono', monospace; }
+ .tag.green { border-color: rgba(76,222,128,.3); color: var(--green); }
+ .tag.pink { border-color: rgba(244,114,182,.3); color: var(--pink); }
+ .tag.amber { border-color: rgba(251,191,36,.3); color: var(--amber); }
+ .tag.red { border-color: rgba(248,113,113,.3); color: var(--red); }
+ .tag.match { background: rgba(76,222,128,.15); }
+ .tag.miss { background: rgba(248,113,113,.08); }
+
+ .code-block { background: var(--surface2); border: 1px solid var(--border); border-radius: 8px; padding: 12px 14px; font-family: 'JetBrains Mono', monospace; font-size: 12px; white-space: pre-wrap; word-break: break-all; max-height: 220px; overflow-y: auto; color: var(--text-dim); line-height: 1.6; }
+
+ .progress-grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(150px, 1fr)); gap: 6px; }
+ .progress-item { display: flex; align-items: center; gap: 6px; font-size: 12px; }
+ .dot { width: 8px; height: 8px; border-radius: 50%; flex-shrink: 0; background: var(--border); }
+ .dot.done { background: var(--green); }
+
+ .pop-bar-container { margin-bottom: 10px; }
+ .pop-bar-label { font-size: 12px; margin-bottom: 3px; display: flex; justify-content: space-between; }
+ .pop-bar { height: 14px; border-radius: 4px; background: var(--surface2); overflow: hidden; }
+ .pop-bar-fill { height: 100%; border-radius: 4px; }
+
+ #reward-chart { width: 100%; height: 120px; }
+ ::-webkit-scrollbar { width: 6px; }
+ ::-webkit-scrollbar-track { background: transparent; }
+ ::-webkit-scrollbar-thumb { background: var(--border); border-radius: 3px; }
+
+ .conclusion-card { background: var(--surface2); border: 1px solid var(--border); border-radius: 10px; padding: 14px 16px; margin-bottom: 12px; }
+ .conclusion-card .cc-header { display: flex; justify-content: space-between; align-items: center; margin-bottom: 8px; }
+ .cc-type { font-size: 11px; padding: 2px 10px; border-radius: 4px; font-weight: 600; text-transform: uppercase; letter-spacing: .5px; }
+ .cc-type.causal { background: rgba(244,114,182,.15); color: var(--pink); }
+ .cc-type.correlative { background: rgba(96,165,250,.15); color: var(--blue); }
+ .cc-type.descriptive { background: rgba(139,144,165,.15); color: var(--text-dim); }
+ .cc-conf { font-family: 'JetBrains Mono', monospace; font-size: 13px; font-weight: 600; }
+ .cc-claim { font-size: 14px; margin-bottom: 8px; line-height: 1.5; }
+ .cc-section-label { font-size: 10px; color: var(--text-dim); text-transform: uppercase; letter-spacing: .5px; margin-bottom: 3px; margin-top: 8px; }
+
+ /* ── control panel ────────────────────────────── */
+ .control-panel { background: var(--surface); border: 1px solid var(--border); border-radius: 12px; margin: 20px 28px 0; padding: 18px 20px; }
+ .control-panel summary { cursor: pointer; font-size: 13px; font-weight: 600; color: var(--accent); }
+ .control-panel[open] summary { margin-bottom: 14px; }
+ .form-row { display: flex; gap: 12px; margin-bottom: 10px; flex-wrap: wrap; align-items: end; }
+ .form-field { display: flex; flex-direction: column; gap: 4px; }
+ .form-field label { font-size: 11px; color: var(--text-dim); text-transform: uppercase; letter-spacing: .5px; }
+ .form-field input, .form-field textarea, .form-field select {
+   background: var(--surface2); border: 1px solid var(--border); border-radius: 6px;
+   color: var(--text); padding: 7px 10px; font-size: 13px; font-family: inherit; outline: none;
+ }
+ .form-field input:focus, .form-field textarea:focus, .form-field select:focus { border-color: var(--accent); }
+ .form-field textarea { min-height: 60px; resize: vertical; }
+
+ /* ── final report ─────────────────────────────── */
+ .report-overlay { display: none; position: fixed; inset: 0; z-index: 100; background: rgba(12,14,20,.85); backdrop-filter: blur(6px); overflow-y: auto; padding: 40px 20px; }
+ .report-overlay.visible { display: flex; justify-content: center; align-items: flex-start; }
+ .report-card { background: var(--surface); border: 1px solid var(--border); border-radius: 16px; padding: 32px 36px; max-width: 900px; width: 100%; }
+ .report-card h2 { font-size: 22px; font-weight: 700; margin-bottom: 4px; color: var(--text); text-transform: none; letter-spacing: normal; }
+ .report-card .subtitle { font-size: 13px; color: var(--text-dim); margin-bottom: 20px; }
+ .report-section { margin-bottom: 20px; }
+ .report-section h3 { font-size: 12px; color: var(--accent); text-transform: uppercase; letter-spacing: 1px; margin-bottom: 8px; }
+ .comparison-row { display: flex; gap: 20px; margin-bottom: 16px; }
+ .comparison-col { flex: 1; }
+ .comparison-col h4 { font-size: 11px; color: var(--text-dim); text-transform: uppercase; margin-bottom: 6px; }
+
+ .pulse { animation: pulse 1.5s ease-in-out infinite; }
+ @keyframes pulse { 0%,100% { opacity: 1; } 50% { opacity: .5; } }
+ </style>
+ </head>
+ <body>
+
+ <div class="header">
+   <h1><span>BioExp</span> Agent Dashboard</h1>
+   <div class="header-right">
+     <span id="thinking-badge" class="mono" style="font-size:11px;color:var(--accent2);display:none">REASONING ON</span>
+     <span id="step-label" class="mono" style="font-size:13px;color:var(--text-dim)">Step 0</span>
+     <span id="status-pill" class="status-pill waiting">Waiting</span>
+     <button class="btn primary" onclick="doRestart()">Restart</button>
+     <button class="btn" onclick="showReport()">Report</button>
+   </div>
+ </div>
+
+ <!-- Control Panel (collapsible) -->
+ <details class="control-panel" id="control-panel">
+   <summary>New Task / Custom Ground Truth</summary>
+   <div class="form-row">
+     <div class="form-field" style="flex:2">
+       <label>Scenario (leave blank for random)</label>
+       <select id="f-scenario"><option value="">— random —</option></select>
+     </div>
+     <div class="form-field" style="flex:1">
+       <label>True Markers (comma-separated)</label>
+       <input id="f-markers" placeholder="e.g. MYH7, TNNT2, ACTA1" />
+     </div>
+     <div class="form-field" style="flex:1">
+       <label>Causal Mechanisms (comma-separated)</label>
+       <input id="f-mechanisms" placeholder="e.g. sarcomere dysfunction" />
+     </div>
+   </div>
+   <div class="form-row">
+     <div class="form-field" style="flex:2">
+       <label>True Pathways (name:score, comma-sep)</label>
+       <input id="f-pathways" placeholder="e.g. Wnt_signaling:0.8, MAPK:0.6" />
+     </div>
+     <div class="form-field">
+       <button class="btn primary" onclick="doCustomRun()">Run with Ground Truth</button>
+     </div>
+   </div>
+ </details>
+
+ <div class="grid">
+   <div class="card span2" id="card-task">
+     <h2>Task</h2>
+     <div id="task-statement" style="font-size:15px;font-weight:500;margin-bottom:8px;">—</div>
+     <div id="task-meta" style="font-size:13px;color:var(--text-dim)"></div>
+   </div>
+
+   <div class="card">
+     <h2>Reward</h2>
+     <div id="reward-value" class="mono" style="font-size:32px;font-weight:700;margin-bottom:6px;">0.000</div>
+     <canvas id="reward-chart"></canvas>
+   </div>
+
+   <div class="card span3"><h2>Resources</h2><div class="gauge-row" id="gauges"></div></div>
+
+   <div class="card span2" style="max-height:460px;overflow-y:auto">
+     <h2>Pipeline History <span style="color:var(--accent);font-size:10px">OBSERVABLE</span></h2>
+     <div class="timeline" id="timeline"></div>
+   </div>
+
+   <div class="card">
+     <h2>Current Action</h2>
+     <table class="mini-table" id="action-table"><tbody></tbody></table>
+     <h2 id="thinking-header" style="margin-top:14px;display:none">Model Reasoning</h2>
+     <div class="code-block" id="model-thinking" style="display:none;border-color:rgba(124,108,240,.2);max-height:140px;margin-bottom:10px">—</div>
+     <h2 style="margin-top:10px">Model Raw Output</h2>
+     <div class="code-block" id="model-response">—</div>
+   </div>
+
+   <div class="card">
+     <h2>Discovered Markers <span style="color:var(--accent);font-size:10px">OBSERVABLE</span></h2>
+     <div class="tag-list" id="markers-list"><span class="tag" style="color:var(--text-dim)">none yet</span></div>
+     <h2 style="margin-top:14px">Candidate Mechanisms</h2>
+     <div class="tag-list" id="mechanisms-list"><span class="tag" style="color:var(--text-dim)">none yet</span></div>
+   </div>
+
+   <div class="card">
+     <h2>Rule Violations</h2>
+     <div id="violations" style="font-size:13px;color:var(--text-dim)">None</div>
+     <h2 style="margin-top:14px">Uncertainty Summary</h2>
+     <table class="mini-table" id="uncertainty-table"><tbody></tbody></table>
+     <h2 style="margin-top:14px">Reward Breakdown</h2>
+     <table class="mini-table" id="reward-breakdown-table"><tbody></tbody></table>
+   </div>
+
+   <div class="card">
+     <h2>Latest Output</h2>
+     <table class="mini-table" id="output-table"><tbody></tbody></table>
+     <div class="code-block" id="output-data" style="margin-top:10px;max-height:140px">—</div>
+   </div>
+
+   <div class="card span3" id="card-conclusions" style="display:none;border-color:rgba(76,222,128,.25)">
+     <h2 style="color:var(--green)">Synthesized Conclusions</h2>
+     <div id="conclusions-list"></div>
+   </div>
+
+   <!-- Ground Truth Comparison (shown when episode done + has conclusions) -->
+   <div class="card span3" id="card-gt-comparison" style="display:none;border-color:rgba(251,191,36,.25)">
+     <h2 style="color:var(--amber)">Ground Truth Comparison</h2>
+     <div class="comparison-row">
+       <div class="comparison-col">
+         <h4>Agent's Markers</h4>
+         <div class="tag-list" id="gt-agent-markers"></div>
+       </div>
+       <div class="comparison-col">
+         <h4>True Markers</h4>
+         <div class="tag-list" id="gt-true-markers"></div>
+       </div>
+     </div>
+     <div class="comparison-row">
+       <div class="comparison-col">
+         <h4>Agent's Mechanisms</h4>
+         <div class="tag-list" id="gt-agent-mechs"></div>
+       </div>
+       <div class="comparison-col">
+         <h4>True Mechanisms</h4>
+         <div class="tag-list" id="gt-true-mechs"></div>
+       </div>
+     </div>
+     <div id="gt-score" style="margin-top:8px;font-size:14px;font-weight:600"></div>
+   </div>
+
+   <div class="card" style="border-color:rgba(124,108,240,.25)">
+     <h2 style="color:var(--accent2)">Cell Populations <span style="font-size:10px">HIDDEN</span></h2>
+     <div id="populations"></div>
+   </div>
+   <div class="card" style="border-color:rgba(124,108,240,.25)">
+     <h2 style="color:var(--accent2)">Ground Truth <span style="font-size:10px">HIDDEN</span></h2>
+     <div style="margin-bottom:8px"><span style="font-size:11px;color:var(--text-dim);text-transform:uppercase">True Markers</span><div class="tag-list" id="true-markers" style="margin-top:4px"></div></div>
+     <div style="margin-bottom:8px"><span style="font-size:11px;color:var(--text-dim);text-transform:uppercase">Causal Mechanisms</span><div class="tag-list" id="true-mechanisms" style="margin-top:4px"></div></div>
+     <div><span style="font-size:11px;color:var(--text-dim);text-transform:uppercase">Top Pathways</span><table class="mini-table" id="pathways-table" style="margin-top:4px"><tbody></tbody></table></div>
+   </div>
+   <div class="card" style="border-color:rgba(124,108,240,.25)">
+     <h2 style="color:var(--accent2)">Technical State <span style="font-size:10px">HIDDEN</span></h2>
+     <table class="mini-table" id="technical-table"><tbody></tbody></table>
+     <h2 style="margin-top:14px;color:var(--accent2)">Failure Conditions <span style="font-size:10px">HIDDEN</span></h2>
+     <div class="tag-list" id="failure-conditions"></div>
+   </div>
+   <div class="card span3" style="border-color:rgba(124,108,240,.25)">
+     <h2 style="color:var(--accent2)">Experiment Progress <span style="font-size:10px">HIDDEN</span></h2>
+     <div class="progress-grid" id="progress-grid"></div>
+   </div>
+ </div>
+
+ <!-- Final Report Overlay -->
+ <div class="report-overlay" id="report-overlay" onclick="if(event.target===this)hideReport()">
+   <div class="report-card" id="report-content"></div>
+ </div>
+
+ <script>
+ const POLL_MS = 1200;
292
+ const POP_COLORS = ['#5ce0d8','#7c6cf0','#f472b6','#60a5fa','#fbbf24','#4ade80','#f87171','#c084fc','#fb923c','#38bdf8'];
293
+ let rewardHistory = [];
294
+ let lastTimestamp = 0;
295
+ let latestState = null;
296
+
297
+ function $(id) { return document.getElementById(id); }
298
+ function setHTML(id, html) { $(id).innerHTML = html; }
299
+ function tagsHTML(arr, cls) {
300
+ if (!arr || !arr.length) return '<span class="tag" style="color:var(--text-dim)">—</span>';
301
+ return arr.map(t => `<span class="tag ${cls||''}">${esc(t)}</span>`).join('');
302
+ }
303
+ function esc(s) { if (s == null) return '—'; const d = document.createElement('div'); d.textContent = String(s); return d.innerHTML; }
304
+ function pct(used, total) { if (!total) return 0; return Math.min(100, Math.max(0, (used / total) * 100)); }
305
+ function gaugeColor(p) { return p < 50 ? 'var(--green)' : p < 80 ? 'var(--amber)' : 'var(--red)'; }
306
+ function fmt(n) { if (n == null) return '0'; return Number(n).toLocaleString('en-US', { maximumFractionDigits: 0 }); }
307
+ function uniqueItems(arr) {
308
+ const out = [];
309
+ const seen = new Set();
310
+ (arr || []).forEach(item => {
311
+ if (item == null) return;
312
+ const text = String(item).trim();
313
+ if (!text) return;
314
+ const key = text.toUpperCase();
315
+ if (seen.has(key)) return;
316
+ seen.add(key);
317
+ out.push(text);
318
+ });
319
+ return out;
320
+ }
321
+ function gauge(label, value, pctVal, inv) {
322
+ let bar = '';
323
+ if (pctVal != null) { const c = inv ? gaugeColor(100-pctVal) : gaugeColor(pctVal); bar = `<div class="gauge-bar"><div class="gauge-bar-fill" style="width:${pctVal.toFixed(1)}%;background:${c}"></div></div>`; }
324
+ return `<div class="gauge"><div class="gauge-label">${label}</div><div class="gauge-value mono">${value}</div>${bar}</div>`;
325
+ }
326
+ function miniRows(obj) { return Object.entries(obj).map(([k,v]) => `<tr><td>${esc(k)}</td><td>${esc(v)}</td></tr>`).join(''); }
327
+
328
+ function drawRewardChart(canvas, data) {
329
+ const ctx = canvas.getContext('2d'); const W = canvas.width = canvas.offsetWidth * 2; const H = canvas.height = canvas.offsetHeight * 2;
330
+ ctx.clearRect(0, 0, W, H); if (data.length < 2) return;
331
+ const vals = data.map(d => d.v); const minV = Math.min(0, ...vals); const maxV = Math.max(0.1, ...vals); const range = maxV - minV || 1; const pad = 8;
332
+ ctx.strokeStyle = 'rgba(92,224,216,.4)'; ctx.lineWidth = 2; ctx.beginPath();
333
+ const yZ = H - pad - ((0 - minV) / range) * (H - 2*pad); ctx.moveTo(pad, yZ); ctx.lineTo(W-pad, yZ); ctx.stroke();
334
+ ctx.strokeStyle = '#5ce0d8'; ctx.lineWidth = 3; ctx.beginPath();
335
+ data.forEach((d,i) => { const x = pad+(i/(data.length-1))*(W-2*pad); const y = H-pad-((d.v-minV)/range)*(H-2*pad); i===0?ctx.moveTo(x,y):ctx.lineTo(x,y); }); ctx.stroke();
336
+ data.forEach((d,i) => { const x = pad+(i/(data.length-1))*(W-2*pad); const y = H-pad-((d.v-minV)/range)*(H-2*pad); ctx.fillStyle = d.v>=0?'#4ade80':'#f87171'; ctx.beginPath(); ctx.arc(x,y,5,0,Math.PI*2); ctx.fill(); });
337
+ }
338
+
339
+ function comparedTags(agentArr, trueArr, cls) {
340
+ if (!agentArr || !agentArr.length) return '<span class="tag" style="color:var(--text-dim)">—</span>';
341
+ const trueSet = new Set((trueArr||[]).map(t => t.toUpperCase()));
342
+ return agentArr.map(t => {
343
+ const hit = trueSet.has(t.toUpperCase());
344
+ return `<span class="tag ${cls} ${hit?'match':'miss'}">${esc(t)} ${hit?'✓':'✗'}</span>`;
345
+ }).join('');
346
+ }
347
+
348
+ // ── API actions ──
349
+ async function doRestart() {
350
+ rewardHistory = []; lastTimestamp = 0;
351
+ await fetch('/api/restart', { method: 'POST' });
352
+ }
353
+
354
+ async function doCustomRun() {
355
+ const scenario = $('f-scenario').value || undefined;
356
+ const markers = $('f-markers').value.split(',').map(s=>s.trim()).filter(Boolean);
357
+ const mechs = $('f-mechanisms').value.split(',').map(s=>s.trim()).filter(Boolean);
358
+ const pwRaw = $('f-pathways').value.split(',').map(s=>s.trim()).filter(Boolean);
359
+ const pathways = {};
360
+ pwRaw.forEach(p => { const [k,v] = p.split(':'); if (k && v) pathways[k.trim()] = parseFloat(v); });
361
+ const gt = {};
362
+ if (markers.length) gt.true_markers = markers;
363
+ if (mechs.length) gt.causal_mechanisms = mechs;
364
+ if (Object.keys(pathways).length) gt.true_pathways = pathways;
365
+ rewardHistory = []; lastTimestamp = 0;
366
+ await fetch('/api/run', { method: 'POST', headers: {'Content-Type':'application/json'}, body: JSON.stringify({ scenario_name: scenario, ground_truth: Object.keys(gt).length ? gt : undefined }) });
367
+ }
368
+
369
+ function showReport() {
370
+ const s = latestState; if (!s) return;
371
+ const rc = $('report-content');
372
+ const t = s.task || {};
373
+ const lat = s.latent || {};
374
+ const conc = s.conclusions || [];
375
+ const trueM = lat.true_markers || [];
376
+ const trueMech = lat.causal_mechanisms || [];
377
+ const conclusionMarkers = uniqueItems(conc.flatMap(c => c.top_markers || []));
378
+ const conclusionMechanisms = uniqueItems(conc.flatMap(c => c.causal_mechanisms || []));
379
+ const agentM = uniqueItems((s.discovered_markers && s.discovered_markers.length) ? s.discovered_markers : conclusionMarkers);
380
+ const agentMechanisms = uniqueItems((s.candidate_mechanisms && s.candidate_mechanisms.length) ? s.candidate_mechanisms : conclusionMechanisms);
381
+ const markerHits = agentM.filter(m => trueM.some(t => t.toUpperCase() === m.toUpperCase()));
382
+ const r = s.resources || {};
383
+
384
+ let html = `<h2>Experiment Report</h2>
385
+ <div class="subtitle">${esc(t.problem_statement)}</div>
386
+ <div class="report-section"><h3>Summary</h3>
387
+ <table class="mini-table"><tbody>
388
+ <tr><td>Status</td><td>${s.episode_done ? 'Completed' : 'In Progress'}</td></tr>
389
+ <tr><td>Steps</td><td>${s.step}</td></tr>
390
+ <tr><td>Cumulative Reward</td><td style="color:${(s.cumulative_reward||0)>=0?'var(--green)':'var(--red)'}">${((s.cumulative_reward||0)>=0?'+':'')}${(s.cumulative_reward||0).toFixed(3)}</td></tr>
391
+ <tr><td>Budget Used</td><td>$${fmt(r.budget_used)} / $${fmt((r.budget_used||0)+(r.budget_remaining||0))}</td></tr>
392
+ <tr><td>Time Used</td><td>${(r.time_used_days||0).toFixed(0)}d / ${((r.time_used_days||0)+(r.time_remaining_days||0)).toFixed(0)}d</td></tr>
393
+ <tr><td>Markers Found</td><td>${agentM.length} (${markerHits.length} match ground truth)</td></tr>
394
+ </tbody></table>
395
+ </div>`;
396
+
397
+ if (conc.length) {
398
+ html += `<div class="report-section"><h3>Conclusions</h3>`;
399
+ conc.forEach(c => {
400
+ html += `<div class="conclusion-card"><div class="cc-header"><span class="cc-type ${(c.claim_type||'').toLowerCase()}">${esc(c.claim_type)}</span><span class="cc-conf" style="color:${c.confidence>=.7?'var(--green)':c.confidence>=.4?'var(--amber)':'var(--red)'}">${((c.confidence||0)*100).toFixed(0)}%</span></div>`;
401
+ if (c.claim) html += `<div class="cc-claim">${esc(c.claim)}</div>`;
402
+ if (c.top_markers?.length) html += `<div class="cc-section-label">Top Markers</div><div class="tag-list">${c.top_markers.map(m=>`<span class="tag green">${esc(m)}</span>`).join('')}</div>`;
403
+ if (c.causal_mechanisms?.length) html += `<div class="cc-section-label">Causal Mechanisms</div><div class="tag-list">${c.causal_mechanisms.map(m=>`<span class="tag pink">${esc(m)}</span>`).join('')}</div>`;
404
+ if (c.predicted_pathways && Object.keys(c.predicted_pathways).length) html += `<div class="cc-section-label">Predicted Pathways</div><table class="mini-table"><tbody>${Object.entries(c.predicted_pathways).map(([k,v])=>`<tr><td>${esc(k)}</td><td>${Number(v).toFixed(3)}</td></tr>`).join('')}</tbody></table>`;
405
+ html += `</div>`;
406
+ });
407
+ html += `</div>`;
408
+ }
409
+
410
+ html += `<div class="report-section"><h3>Ground Truth Comparison</h3>
411
+ <div class="comparison-row"><div class="comparison-col"><h4>Agent's Markers</h4><div class="tag-list">${comparedTags(agentM, trueM, 'green')}</div></div>
412
+ <div class="comparison-col"><h4>True Markers</h4><div class="tag-list">${tagsHTML(trueM,'green')}</div></div></div>
413
+ <div class="comparison-row"><div class="comparison-col"><h4>Agent's Mechanisms</h4><div class="tag-list">${comparedTags(agentMechanisms, trueMech, 'pink')}</div></div>
414
+ <div class="comparison-col"><h4>True Mechanisms</h4><div class="tag-list">${tagsHTML(trueMech,'pink')}</div></div></div>
415
+ </div>`;
416
+
417
+ const hist = s.pipeline_history || [];
418
+ if (hist.length) {
419
+ html += `<div class="report-section"><h3>Pipeline Steps</h3><table class="mini-table"><tbody>`;
420
+ hist.forEach(h => { html += `<tr><td>${h.success?'✓':'✗'} ${esc(h.action_type)}</td><td>${esc(h.output_summary)} · q=${h.quality_score}</td></tr>`; });
421
+ html += `</tbody></table></div>`;
422
+ }
423
+
424
+ html += `<div style="margin-top:20px;text-align:right"><button class="btn" onclick="hideReport()">Close</button> <button class="btn primary" onclick="doRestart();hideReport()">New Run</button></div>`;
425
+ rc.innerHTML = html;
426
+ $('report-overlay').classList.add('visible');
427
+ }
428
+ function hideReport() { $('report-overlay').classList.remove('visible'); }
429
+
430
+ function renderState(s) {
431
+ latestState = s;
432
+ if (s.error) { $('status-pill').className='status-pill waiting'; $('status-pill').textContent='Waiting'; $('task-statement').textContent=s.error; return; }
433
+
434
+ const pill = $('status-pill');
435
+ if (s.episode_done) { pill.className='status-pill done'; pill.textContent='Done'; } else { pill.className='status-pill live'; pill.textContent='Live'; }
436
+ $('step-label').textContent = `Step ${s.step}`;
437
+
438
+ if (s.thinking_enabled) { $('thinking-badge').style.display = ''; } else { $('thinking-badge').style.display = 'none'; }
439
+
440
+ const t = s.task || {};
441
+ $('task-statement').textContent = t.problem_statement || '—';
442
+ $('task-meta').innerHTML = [t.organism, t.tissue, t.modality, t.conditions ? t.conditions.join(' vs ') : null].filter(Boolean).map(v => `<span class="tag">${esc(v)}</span>`).join(' ');
443
+
444
+ const cum = s.cumulative_reward || 0;
445
+ $('reward-value').textContent = (cum >= 0 ? '+' : '') + cum.toFixed(3);
446
+ $('reward-value').style.color = cum >= 0 ? 'var(--green)' : 'var(--red)';
447
+ if (s.timestamp !== lastTimestamp && s.step > 0) { rewardHistory.push({ step: s.step, v: cum }); lastTimestamp = s.timestamp; }
448
+ drawRewardChart($('reward-chart'), rewardHistory);
449
+
450
+ const r = s.resources || {};
451
+ const bT = (r.budget_used||0)+(r.budget_remaining||0), tT = (r.time_used_days||0)+(r.time_remaining_days||0);
452
+ const bP = pct(r.budget_used, bT), tP = pct(r.time_used_days, tT);
453
+ $('gauges').innerHTML = [gauge('Budget Used',`$${fmt(r.budget_used)}`,bP), gauge('Budget Left',`$${fmt(r.budget_remaining)}`,100-bP,true), gauge('Time Used',`${(r.time_used_days||0).toFixed(0)}d`,tP), gauge('Time Left',`${(r.time_remaining_days||0).toFixed(0)}d`,100-tP,true), gauge('Samples',String(r.samples_consumed||0),null), gauge('Compute',`${(r.compute_hours_used||0).toFixed(1)}h`,null)].join('');
454
+
455
+ const hist = s.pipeline_history || [];
456
+ $('timeline').innerHTML = hist.length ? hist.map(h => `<div class="timeline-item ${!h.success?'fail':''}"><div class="tl-action">${esc(h.action_type)}${h.method?` <span style="color:var(--text-dim);font-weight:400;font-size:12px">${esc(h.method)}</span>`:''}</div><div class="tl-meta">${h.success?'✓':'✗'} ${esc(h.output_summary)} · q=${h.quality_score} · $${fmt(h.resource_cost)} · ${h.time_cost_days}d</div></div>`).join('') : '<div style="color:var(--text-dim);font-size:13px">No steps yet</div>';
457
+
458
+ const a = s.current_action;
459
+ if (a) { $('action-table').querySelector('tbody').innerHTML = miniRows({'Type':a.action_type,'Method':a.method||'—','Confidence':a.confidence?.toFixed(2),'Justification':a.justification||'—','Fallback?':s.used_fallback?'YES':'no'}); }
460
+
461
+ if (s.model_thinking) { $('model-thinking').style.display=''; $('model-thinking').textContent = s.model_thinking; } else { $('model-thinking').style.display='none'; }
462
+ $('model-response').textContent = s.model_response_raw || '—';
463
+
464
+ setHTML('markers-list', tagsHTML(s.discovered_markers, 'green'));
465
+ setHTML('mechanisms-list', tagsHTML(s.candidate_mechanisms, 'pink'));
466
+
467
+ const v = s.rule_violations || [];
468
+ $('violations').innerHTML = v.length ? v.map(x=>`<div class="tag red" style="margin-bottom:4px">${esc(x)}</div>`).join('') : '<span style="color:var(--text-dim)">None</span>';
469
+ $('uncertainty-table').querySelector('tbody').innerHTML = miniRows(s.uncertainty_summary || {});
470
+ const rb = s.reward_breakdown || {};
471
+ $('reward-breakdown-table').querySelector('tbody').innerHTML = miniRows(Object.fromEntries(Object.entries(rb).map(([k,v])=>[k,(v>=0?'+':'')+v.toFixed(4)])));
472
+
473
+ const lo = s.latest_output;
474
+ if (lo) { $('output-table').querySelector('tbody').innerHTML = miniRows({'Summary':lo.summary,'Success':lo.success?'✓':'✗','Quality':lo.quality_score,'Uncertainty':lo.uncertainty,'Warnings':(lo.warnings||[]).join('; ')||'—'}); $('output-data').textContent = lo.data_preview||'—'; }
475
+
476
+ const conc = s.conclusions || [];
477
+ if (conc.length) {
478
+ $('card-conclusions').style.display = '';
479
+ $('conclusions-list').innerHTML = conc.map(c => {
480
+ const confColor = c.confidence>=.7?'var(--green)':c.confidence>=.4?'var(--amber)':'var(--red)';
481
+ let h = `<div class="conclusion-card"><div class="cc-header"><span class="cc-type ${(c.claim_type||'').toLowerCase()}">${esc(c.claim_type||'unknown')}</span><span class="cc-conf" style="color:${confColor}">${((c.confidence||0)*100).toFixed(0)}%</span></div>`;
482
+ if (c.claim) h += `<div class="cc-claim">${esc(c.claim)}</div>`;
483
+ if (c.top_markers?.length) h += `<div class="cc-section-label">Top Markers</div><div class="tag-list">${c.top_markers.map(m=>`<span class="tag green">${esc(m)}</span>`).join('')}</div>`;
484
+ if (c.causal_mechanisms?.length) h += `<div class="cc-section-label">Causal Mechanisms</div><div class="tag-list">${c.causal_mechanisms.map(m=>`<span class="tag pink">${esc(m)}</span>`).join('')}</div>`;
485
+ if (c.predicted_pathways && Object.keys(c.predicted_pathways).length) h += `<div class="cc-section-label">Predicted Pathways</div><table class="mini-table"><tbody>${Object.entries(c.predicted_pathways).map(([k,v])=>`<tr><td>${esc(k)}</td><td>${Number(v).toFixed(3)}</td></tr>`).join('')}</tbody></table>`;
486
+ return h + '</div>';
487
+ }).join('');
488
+ } else { $('card-conclusions').style.display = 'none'; }
489
+
490
+ // Ground truth comparison (visible when done or has conclusions)
491
+ const lat = s.latent;
492
+ if ((s.episode_done || conc.length) && lat) {
493
+ const conclusionMarkers = uniqueItems(conc.flatMap(c => c.top_markers || []));
494
+ const conclusionMechanisms = uniqueItems(conc.flatMap(c => c.causal_mechanisms || []));
495
+ const comparisonMarkers = uniqueItems((s.discovered_markers && s.discovered_markers.length) ? s.discovered_markers : conclusionMarkers);
496
+ const comparisonMechanisms = uniqueItems((s.candidate_mechanisms && s.candidate_mechanisms.length) ? s.candidate_mechanisms : conclusionMechanisms);
497
+ $('card-gt-comparison').style.display = '';
498
+ setHTML('gt-agent-markers', comparedTags(comparisonMarkers, lat.true_markers, 'green'));
499
+ setHTML('gt-true-markers', tagsHTML(lat.true_markers, 'green'));
500
+ setHTML('gt-agent-mechs', comparedTags(comparisonMechanisms, lat.causal_mechanisms, 'pink'));
501
+ setHTML('gt-true-mechs', tagsHTML(lat.causal_mechanisms, 'pink'));
502
+ const hits = comparisonMarkers.filter(m => (lat.true_markers||[]).some(t => t.toUpperCase()===m.toUpperCase()));
503
+ $('gt-score').innerHTML = `Marker accuracy: <span style="color:var(--accent)">${hits.length}</span> / ${(lat.true_markers||[]).length} true markers recovered`;
504
+ } else { $('card-gt-comparison').style.display = 'none'; }
505
+
506
+ if (!lat) return;
507
+ const pops = lat.cell_populations || [];
508
+ $('populations').innerHTML = pops.map((p,i) => { const c = POP_COLORS[i%POP_COLORS.length]; const w = (p.proportion*100).toFixed(1); return `<div class="pop-bar-container"><div class="pop-bar-label"><span>${esc(p.name)} <span style="color:var(--text-dim);font-size:11px">${p.state}</span></span><span class="mono" style="font-size:12px">${w}%</span></div><div class="pop-bar"><div class="pop-bar-fill" style="width:${w}%;background:${c}"></div></div><div class="tag-list" style="margin-top:3px">${p.marker_genes.map(g=>`<span class="tag" style="font-size:11px">${esc(g)}</span>`).join('')}</div></div>`; }).join('') || '<span style="color:var(--text-dim)">—</span>';
509
+
510
+ setHTML('true-markers', tagsHTML(lat.true_markers, 'green'));
511
+ setHTML('true-mechanisms', tagsHTML(lat.causal_mechanisms, 'pink'));
512
+ const pw = lat.true_pathways || {};
513
+ $('pathways-table').querySelector('tbody').innerHTML = miniRows(Object.fromEntries(Object.entries(pw).slice(0,10).map(([k,v])=>[k,v.toFixed(3)])));
514
+ $('technical-table').querySelector('tbody').innerHTML = miniRows(lat.technical || {});
515
+ setHTML('failure-conditions', tagsHTML(lat.hidden_failure_conditions, 'red'));
516
+ const prog = lat.progress || {};
517
+ const bK = Object.entries(prog).filter(([,v])=>typeof v==='boolean'), nK = Object.entries(prog).filter(([,v])=>typeof v!=='boolean');
518
+ $('progress-grid').innerHTML = bK.map(([k,v])=>`<div class="progress-item"><div class="dot ${v?'done':''}"></div>${k.replace(/_/g,' ')}</div>`).join('') + nK.map(([k,v])=>`<div class="progress-item" style="color:var(--accent)"><span class="mono" style="font-size:11px;margin-right:4px">${v??'—'}</span>${k.replace(/_/g,' ')}</div>`).join('');
519
+
520
+ if (s.episode_done && !reportShownForTimestamp && s.timestamp) { reportShownForTimestamp = s.timestamp; setTimeout(showReport, 800); }
521
+ }
522
+
523
+ let reportShownForTimestamp = null;
524
+
525
+ async function loadScenarios() {
526
+ try {
527
+ const res = await fetch('/api/scenarios');
528
+ const data = await res.json();
529
+ const sel = $('f-scenario');
530
+ (data.scenarios || []).forEach(n => { const o = document.createElement('option'); o.value = n; o.textContent = n; sel.appendChild(o); });
531
+ } catch(e) {}
532
+ }
533
+
534
+ async function poll() {
535
+ try { const res = await fetch('/api/state',{cache:'no-store'}); const data = await res.json(); renderState(data); } catch(e) {}
536
+ setTimeout(poll, POLL_MS);
537
+ }
538
+
539
+ loadScenarios();
540
+ poll();
541
+ </script>
542
+ </body>
543
+ </html>
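For reference, the `doCustomRun()` helper above posts a JSON body to `/api/run` whose optional `ground_truth` block carries `true_markers`, `causal_mechanisms`, and `true_pathways` (pathway inputs are typed as comma-separated `name:weight` pairs). A minimal Python sketch of the same payload construction, with a hypothetical `build_run_payload` helper name:

```python
import json

def build_run_payload(scenario=None, markers=(), mechanisms=(), pathways=()):
    """Mirror doCustomRun(): pathways arrive as "name:weight" strings."""
    gt = {}
    if markers:
        gt["true_markers"] = list(markers)
    if mechanisms:
        gt["causal_mechanisms"] = list(mechanisms)
    parsed = {}
    for p in pathways:
        name, _, weight = p.partition(":")
        if name and weight:  # same k && v guard as the JS
            parsed[name.strip()] = float(weight)
    if parsed:
        gt["true_pathways"] = parsed
    body = {"scenario_name": scenario}
    if gt:
        body["ground_truth"] = gt
    return json.dumps(body)

payload = build_run_payload("demo", ["CD19"], ["apoptosis"], ["WNT:0.4"])
```

The resulting string can be POSTed to `/api/run` with any HTTP client; the server simply rewrites it as a `restart` command for the agent.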
dashboard.py ADDED
@@ -0,0 +1,129 @@
+ """Lightweight dashboard server for the drug-target-validation agent.
+
+ No external dependencies — uses only the Python standard library.
+
+ Usage:
+ python dashboard.py # serves on http://localhost:8050
+ python dashboard.py --port 9000
+ """
+
+ from __future__ import annotations
+
+ import argparse
+ import json
+ from http.server import HTTPServer, SimpleHTTPRequestHandler
+ from pathlib import Path
+
+ ROOT = Path(__file__).parent
+ STATE_FILE = ROOT / "_dashboard_state.json"
+ CMD_FILE = ROOT / "_dashboard_cmd.json"
+ DASHBOARD_HTML = ROOT / "dashboard.html"
+
+
+ class DashboardHandler(SimpleHTTPRequestHandler):
+ def do_GET(self):
+ if self.path == "/" or self.path == "/index.html":
+ self._serve_file(DASHBOARD_HTML, "text/html")
+ elif self.path == "/api/state":
+ self._serve_state()
+ elif self.path == "/api/scenarios":
+ self._serve_scenarios()
+ else:
+ self.send_error(404)
+
+ def do_POST(self):
+ if self.path == "/api/restart":
+ self._handle_command({"action": "restart"})
+ elif self.path == "/api/run":
+ body = self._read_body()
+ if body is None:
+ return
+ body["action"] = "restart"
+ self._handle_command(body)
+ else:
+ self.send_error(404)
+
+ def do_OPTIONS(self):
+ self.send_response(204)
+ self.send_header("Access-Control-Allow-Origin", "*")
+ self.send_header("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
+ self.send_header("Access-Control-Allow-Headers", "Content-Type")
+ self.end_headers()
+
+ def _read_body(self):
+ length = int(self.headers.get("Content-Length", 0))
+ if length == 0:
+ return {}
+ raw = self.rfile.read(length)
+ try:
+ return json.loads(raw)
+ except json.JSONDecodeError:
+ self._json_response(400, {"error": "Invalid JSON"})
+ return None
+
+ def _handle_command(self, cmd: dict):
+ CMD_FILE.write_text(json.dumps(cmd), encoding="utf-8")
+ self._json_response(200, {"ok": True, "command": cmd.get("action")})
+
+ def _serve_state(self):
+ self.send_response(200)
+ self.send_header("Content-Type", "application/json")
+ self.send_header("Access-Control-Allow-Origin", "*")
+ self.send_header("Cache-Control", "no-cache")
+ self.end_headers()
+ try:
+ data = STATE_FILE.read_bytes()
+ except FileNotFoundError:
+ data = b'{"error": "No state file yet. Run run_agent.py to start an episode."}'
+ self.wfile.write(data)
+
+ def _serve_scenarios(self):
+ try:
+ from server.tasks.scenarios import SCENARIO_LIBRARY
+ names = [s.name for s in SCENARIO_LIBRARY]
+ except Exception:
+ names = []
+ self._json_response(200, {"scenarios": names})
+
+ def _serve_file(self, path: Path, content_type: str):
+ try:
+ body = path.read_bytes()
+ except FileNotFoundError:
+ self.send_error(404, f"{path.name} not found")
+ return
+ self.send_response(200)
+ self.send_header("Content-Type", content_type)
+ self.send_header("Content-Length", str(len(body)))
+ self.end_headers()
+ self.wfile.write(body)
+
+ def _json_response(self, code: int, obj: dict):
+ body = json.dumps(obj).encode()
+ self.send_response(code)
+ self.send_header("Content-Type", "application/json")
+ self.send_header("Access-Control-Allow-Origin", "*")
+ self.send_header("Content-Length", str(len(body)))
+ self.end_headers()
+ self.wfile.write(body)
+
+ def log_message(self, format, *args):
+ pass
+
+
+ def main():
+ parser = argparse.ArgumentParser(description="Drug-target-validation dashboard server")
+ parser.add_argument("--port", type=int, default=8050)
+ args = parser.parse_args()
+
+ server = HTTPServer(("0.0.0.0", args.port), DashboardHandler)
+ print(f"Dashboard running at http://localhost:{args.port}")
+ print("Waiting for agent state from run_agent.py ...")
+ try:
+ server.serve_forever()
+ except KeyboardInterrupt:
+ print("\nShutting down.")
+ server.server_close()
+
+
+ if __name__ == "__main__":
+ main()
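The server above never talks to the agent directly: `/api/restart` and `/api/run` drop a JSON command into `_dashboard_cmd.json`, and `/api/state` serves whatever `_dashboard_state.json` currently contains. A minimal sketch of the agent-side half of that file handoff (the function names `publish_state` and `consume_command` are illustrative, not part of the repo; the delete-after-read convention is an assumption about how `run_agent.py` avoids re-running a command):

```python
import json
from pathlib import Path

def publish_state(root: Path, state: dict) -> None:
    # Written by the agent loop; served verbatim by GET /api/state.
    (root / "_dashboard_state.json").write_text(
        json.dumps(state), encoding="utf-8"
    )

def consume_command(root: Path):
    # POST /api/restart and /api/run write this file via _handle_command();
    # deleting it after reading makes each command fire exactly once.
    cmd_file = root / "_dashboard_cmd.json"
    if not cmd_file.exists():
        return None
    cmd = json.loads(cmd_file.read_text(encoding="utf-8"))
    cmd_file.unlink()
    return cmd
```

Because both sides only touch flat files in the repo root, the dashboard and the agent can run as fully independent processes.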
demo.html ADDED
@@ -0,0 +1,1639 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>BioEnv</title>
7
+ <style>
8
+ @import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700;800&family=JetBrains+Mono:wght@400;500;600&display=swap');
9
+
10
+ :root {
11
+ --bg: #07090d;
12
+ --bg-surface: #0c0f16;
13
+ --bg-raised: #111827;
14
+ --bg-hover: #1a2235;
15
+ --border: #1e293b;
16
+ --border-active: #334155;
17
+ --text: #e2e8f0;
18
+ --text-dim: #94a3b8;
19
+ --text-muted: #475569;
20
+ --accent: #38bdf8;
21
+ --accent-dim: rgba(56,189,248,0.12);
22
+ --green: #34d399;
23
+ --green-dim: rgba(52,211,153,0.10);
24
+ --amber: #fbbf24;
25
+ --amber-dim: rgba(251,191,36,0.10);
26
+ --red: #f87171;
27
+ --red-dim: rgba(248,113,113,0.10);
28
+ --cyan: #22d3ee;
29
+ --cyan-dim: rgba(34,211,238,0.10);
30
+ --pink: #f472b6;
31
+ --purple: #a78bfa;
32
+ }
33
+
34
+ * { margin: 0; padding: 0; box-sizing: border-box; }
35
+ html, body { height: 100%; overflow: hidden; }
36
+
37
+ body {
38
+ font-family: 'Inter', -apple-system, sans-serif;
39
+ background: var(--bg);
40
+ color: var(--text);
41
+ display: flex;
42
+ flex-direction: column;
43
+ }
44
+
45
+ /* ---- Top Bar ---- */
46
+ .topbar {
47
+ height: 48px;
48
+ min-height: 48px;
49
+ background: var(--bg-surface);
50
+ border-bottom: 1px solid var(--border);
51
+ display: flex;
52
+ align-items: center;
53
+ padding: 0 20px;
54
+ gap: 16px;
55
+ z-index: 10;
56
+ }
57
+ .topbar-logo {
58
+ font-size: 15px;
59
+ font-weight: 800;
60
+ letter-spacing: -0.5px;
61
+ background: linear-gradient(135deg, #38bdf8, #22d3ee);
62
+ -webkit-background-clip: text;
63
+ -webkit-text-fill-color: transparent;
64
+ }
65
+ .topbar-sep { width: 1px; height: 20px; background: var(--border); }
66
+ .topbar-env {
67
+ font-size: 12px;
68
+ color: var(--text-dim);
69
+ font-family: 'JetBrains Mono', monospace;
70
+ }
71
+ .topbar-status {
72
+ display: flex;
73
+ align-items: center;
74
+ gap: 6px;
75
+ margin-left: auto;
76
+ font-size: 12px;
77
+ color: var(--text-dim);
78
+ }
79
+ .status-dot {
80
+ width: 7px; height: 7px;
81
+ border-radius: 50%;
82
+ background: var(--text-muted);
83
+ }
84
+ .status-dot.live {
85
+ background: var(--green);
86
+ box-shadow: 0 0 8px var(--green);
87
+ animation: pulse 2s infinite;
88
+ }
89
+ @keyframes pulse {
90
+ 0%, 100% { opacity: 1; }
91
+ 50% { opacity: 0.5; }
92
+ }
93
+ .topbar-btn {
94
+ font-size: 12px;
95
+ font-weight: 600;
96
+ padding: 6px 14px;
97
+ border-radius: 6px;
98
+ border: none;
99
+ cursor: pointer;
100
+ transition: all 0.15s;
101
+ font-family: inherit;
102
+ }
103
+ .btn-primary { background: var(--accent); color: #07090d; font-weight: 700; }
104
+ .btn-primary:hover { background: #7dd3fc; }
105
+ .btn-primary:disabled { opacity: 0.4; cursor: not-allowed; }
106
+ .btn-ghost {
107
+ background: transparent;
108
+ color: var(--text-dim);
109
+ border: 1px solid var(--border);
110
+ }
111
+ .btn-ghost:hover { background: var(--bg-hover); color: var(--text); }
112
+
113
+ /* ---- Main Layout ---- */
+ .main {
+ flex: 1;
+ display: grid;
+ grid-template-columns: 260px 1fr 340px;
+ overflow: hidden;
+ }
+
+ /* ---- Left Sidebar ---- */
+ .sidebar {
+ background: var(--bg-surface);
+ border-right: 1px solid var(--border);
+ display: flex;
+ flex-direction: column;
+ overflow-y: auto;
+ }
+ .sidebar-section {
+ padding: 16px;
+ border-bottom: 1px solid var(--border);
+ }
+ .sidebar-heading {
+ font-size: 10px;
+ font-weight: 600;
+ text-transform: uppercase;
+ letter-spacing: 1.5px;
+ color: var(--text-muted);
+ margin-bottom: 10px;
+ }
+ .scenario-list { display: flex; flex-direction: column; gap: 4px; }
+ .scenario-opt {
+ display: flex;
+ align-items: center;
+ gap: 10px;
+ padding: 8px 10px;
+ border-radius: 6px;
+ cursor: pointer;
+ transition: all 0.15s;
+ border: 1px solid transparent;
+ }
+ .scenario-opt:hover { background: var(--bg-hover); }
+ .scenario-opt.active {
+ background: var(--accent-dim);
+ border-color: rgba(56,189,248,0.2);
+ }
+ .scenario-opt .sc-dot { width: 8px; height: 8px; border-radius: 50%; flex-shrink: 0; }
+ .scenario-opt .sc-name {
+ font-size: 12px; font-weight: 500; flex: 1;
+ white-space: nowrap; overflow: hidden; text-overflow: ellipsis;
+ }
+ .scenario-opt .sc-diff {
+ font-size: 10px; font-weight: 600;
+ text-transform: uppercase; letter-spacing: 0.5px;
+ }
+ .gauge { margin-bottom: 14px; }
+ .gauge:last-child { margin-bottom: 0; }
+ .gauge-header {
+ display: flex; justify-content: space-between;
+ align-items: baseline; margin-bottom: 6px;
+ }
+ .gauge-label { font-size: 12px; color: var(--text-dim); font-weight: 500; }
+ .gauge-value {
+ font-size: 12px; font-weight: 600;
+ font-family: 'JetBrains Mono', monospace;
+ }
+ .gauge-track {
+ height: 4px; background: var(--bg-hover);
+ border-radius: 4px; overflow: hidden;
+ }
+ .gauge-fill {
+ height: 100%; border-radius: 4px;
+ transition: width 0.8s cubic-bezier(0.4,0,0.2,1);
+ }
+ .pipeline-steps { display: flex; flex-direction: column; gap: 2px; }
+ .pipe-step {
+ display: flex; align-items: center; gap: 8px;
+ padding: 5px 8px; border-radius: 4px;
+ font-size: 11px; font-family: 'JetBrains Mono', monospace;
+ color: var(--text-muted);
+ opacity: 0; transform: translateX(-8px);
+ transition: all 0.3s ease;
+ }
+ .pipe-step.visible { opacity: 1; transform: translateX(0); }
+ .pipe-step.active { color: var(--text); background: var(--accent-dim); }
+ .pipe-step.done { color: var(--text-dim); }
+ .pipe-step .step-icon {
+ width: 16px; height: 16px; border-radius: 50%;
+ border: 1.5px solid var(--text-muted);
+ display: flex; align-items: center; justify-content: center;
+ font-size: 8px; flex-shrink: 0; transition: all 0.3s;
+ }
+ .pipe-step.done .step-icon {
+ background: var(--green-dim); border-color: var(--green); color: var(--green);
+ }
+ .pipe-step.active .step-icon {
+ border-color: var(--accent); background: var(--accent-dim);
+ color: var(--accent); animation: pulse 1.5s infinite;
+ }
+
+ /* ---- Center: Lab + Terminal ---- */
+ .center {
+ display: flex;
+ flex-direction: column;
+ overflow: hidden;
+ background: var(--bg);
+ }
+
+ /* Lab canvas */
+ .lab-panel {
+ height: 300px;
+ min-height: 300px;
+ background: var(--bg-surface);
+ border-bottom: 1px solid var(--border);
+ position: relative;
+ overflow: hidden;
+ }
+ .lab-panel canvas {
+ display: block;
+ width: 100%;
+ height: 100%;
+ }
+ .lab-label {
+ position: absolute;
+ top: 8px;
+ left: 12px;
+ font-size: 10px;
+ font-weight: 600;
+ text-transform: uppercase;
+ letter-spacing: 1.5px;
+ color: var(--text-muted);
+ z-index: 2;
+ pointer-events: none;
+ }
+ .lab-action-label {
+ position: absolute;
+ bottom: 10px;
+ left: 50%;
+ transform: translateX(-50%);
+ font-size: 11px;
+ font-family: 'JetBrains Mono', monospace;
+ color: var(--text-dim);
+ background: rgba(12,15,22,0.85);
+ padding: 4px 14px;
+ border-radius: 100px;
+ border: 1px solid var(--border);
+ z-index: 2;
+ pointer-events: none;
+ opacity: 0;
+ transition: opacity 0.3s;
+ }
+ .lab-action-label.visible { opacity: 1; }
+
+ .center-header {
+ height: 36px;
+ min-height: 36px;
+ display: flex;
+ align-items: center;
+ padding: 0 16px;
+ background: var(--bg-surface);
+ border-bottom: 1px solid var(--border);
+ gap: 8px;
+ }
+ .tab {
+ font-size: 11px; font-weight: 500;
+ padding: 4px 12px; border-radius: 4px;
+ color: var(--text-dim); cursor: pointer;
+ transition: all 0.15s;
+ }
+ .tab.active { color: var(--text); background: var(--bg-hover); }
+ .tab:hover { color: var(--text); }
+
+ .terminal {
+ flex: 1;
+ overflow-y: auto;
+ padding: 16px 20px;
+ font-family: 'JetBrains Mono', monospace;
+ font-size: 12.5px;
+ line-height: 1.9;
+ scrollbar-width: thin;
+ scrollbar-color: var(--border) transparent;
+ }
+ .terminal::-webkit-scrollbar { width: 6px; }
+ .terminal::-webkit-scrollbar-track { background: transparent; }
+ .terminal::-webkit-scrollbar-thumb { background: var(--border); border-radius: 3px; }
+
+ .t-line {
+ white-space: pre-wrap;
+ opacity: 0;
+ animation: lineIn 0.25s ease forwards;
+ }
+ @keyframes lineIn {
+ from { opacity: 0; transform: translateY(4px); }
+ to { opacity: 1; transform: translateY(0); }
+ }
+ .t-prompt { color: var(--green); }
+ .t-cmd { color: var(--text); }
+ .t-dim { color: var(--text-muted); }
+ .t-label { color: var(--accent); }
+ .t-str { color: var(--amber); }
+ .t-kw { color: var(--pink); }
+ .t-fn { color: var(--cyan); }
+ .t-num { color: var(--purple); }
+ .t-ok { color: var(--green); }
+ .t-warn { color: var(--amber); }
+ .t-err { color: var(--red); }
+ .t-sub { color: var(--text-dim); }
+
+ /* ---- Right Panel ---- */
+ .right {
+ background: var(--bg-surface);
+ border-left: 1px solid var(--border);
+ display: flex;
+ flex-direction: column;
+ overflow-y: auto;
+ scrollbar-width: thin;
+ scrollbar-color: var(--border) transparent;
+ }
+ .panel-section {
+ padding: 16px;
+ border-bottom: 1px solid var(--border);
+ }
+ .panel-heading {
+ font-size: 10px; font-weight: 600;
+ text-transform: uppercase; letter-spacing: 1.5px;
+ color: var(--text-muted); margin-bottom: 12px;
+ display: flex; align-items: center; justify-content: space-between;
+ }
+ .reward-row {
+ display: flex; align-items: center; gap: 10px; margin-bottom: 8px;
+ }
+ .reward-row:last-child { margin-bottom: 0; }
+ .rw-label {
+ font-size: 11px; font-weight: 500; width: 80px;
+ color: var(--text-dim); text-align: right;
+ }
+ .rw-track {
+ flex: 1; height: 18px;
+ background: rgba(255,255,255,0.03);
+ border-radius: 4px; overflow: hidden; position: relative;
+ }
+ .rw-fill {
+ height: 100%; border-radius: 4px; width: 0%;
+ transition: width 0.6s cubic-bezier(0.4,0,0.2,1);
+ display: flex; align-items: center; justify-content: flex-end;
+ padding-right: 6px; font-size: 10px; font-weight: 600;
+ font-family: 'JetBrains Mono', monospace;
+ color: rgba(255,255,255,0.85); min-width: fit-content;
+ }
+ .rw-fill.validity { background: linear-gradient(90deg, rgba(52,211,153,0.5), rgba(52,211,153,0.85)); }
+ .rw-fill.ordering { background: linear-gradient(90deg, rgba(34,211,238,0.5), rgba(34,211,238,0.85)); }
+ .rw-fill.info_gain { background: linear-gradient(90deg, rgba(56,189,248,0.5), rgba(56,189,248,0.85)); }
+ .rw-fill.efficiency { background: linear-gradient(90deg, rgba(251,191,36,0.5), rgba(251,191,36,0.85)); }
+ .rw-fill.novelty { background: linear-gradient(90deg, rgba(167,139,250,0.5), rgba(167,139,250,0.85)); }
+ .rw-fill.penalty { background: linear-gradient(90deg, rgba(248,113,113,0.5), rgba(248,113,113,0.85)); }
+ .cumulative-row {
+ display: flex; align-items: baseline; justify-content: space-between;
+ margin-top: 12px; padding-top: 12px; border-top: 1px solid var(--border);
+ }
+ .cum-label { font-size: 11px; color: var(--text-dim); }
+ .cum-value {
+ font-size: 20px; font-weight: 700;
+ font-family: 'JetBrains Mono', monospace; color: var(--green);
+ }
+ .discovery-list { display: flex; flex-direction: column; gap: 6px; }
+ .discovery {
+ display: flex; align-items: flex-start; gap: 8px;
+ padding: 8px 10px; background: var(--bg-raised);
+ border-radius: 6px; border: 1px solid var(--border);
+ opacity: 0; transform: scale(0.95); transition: all 0.3s ease;
+ }
+ .discovery.visible { opacity: 1; transform: scale(1); }
+ .disc-icon {
+ width: 20px; height: 20px; border-radius: 4px;
+ display: flex; align-items: center; justify-content: center;
+ font-size: 10px; flex-shrink: 0; margin-top: 1px;
+ }
+ .disc-body { flex: 1; }
+ .disc-title { font-size: 11px; font-weight: 600; }
+ .disc-detail {
+ font-size: 10px; color: var(--text-dim); margin-top: 2px;
+ font-family: 'JetBrains Mono', monospace;
+ }
+ .empty-state {
+ font-size: 11px; color: var(--text-muted);
+ font-style: italic; padding: 8px 0;
+ }
+ .step-reward-mini {
+ display: flex; align-items: center; justify-content: space-between;
+ padding: 6px 10px; background: var(--bg-raised);
+ border-radius: 6px; margin-bottom: 4px;
+ font-size: 11px; font-family: 'JetBrains Mono', monospace;
+ opacity: 0; transition: all 0.3s;
+ }
+ .step-reward-mini.visible { opacity: 1; }
+ .step-reward-mini .srm-name { color: var(--text-dim); }
+ .step-reward-mini .srm-val { font-weight: 600; }
+ .step-reward-mini .srm-val.pos { color: var(--green); }
+ .step-reward-mini .srm-val.neg { color: var(--red); }
+ </style>
+ </head>
+ <body>
+
+ <!-- Top Bar -->
+ <div class="topbar">
+ <div class="topbar-logo">BioEnv</div>
+ <div class="topbar-sep"></div>
+ <div class="topbar-env">biomarker_validation_lung</div>
+ <div class="topbar-status">
+ <div class="status-dot" id="statusDot"></div>
+ <span id="statusText">Ready</span>
+ </div>
+ <button class="topbar-btn btn-ghost" id="resetBtn" onclick="resetDemo()">Reset</button>
+ <button class="topbar-btn btn-primary" id="runBtn" onclick="startDemo()">Run Episode</button>
+ </div>
+
+ <div class="main">
+ <!-- Left Sidebar -->
+ <div class="sidebar">
+ <div class="sidebar-section">
+ <div class="sidebar-heading">Scenario</div>
+ <div class="scenario-list">
+ <div class="scenario-opt" onclick="selectScenario(this)">
+ <div class="sc-dot" style="background: var(--green);"></div>
+ <span class="sc-name">Cardiac Disease DE</span>
+ <span class="sc-diff" style="color: var(--green);">Easy</span>
+ </div>
+ <div class="scenario-opt" onclick="selectScenario(this)">
+ <div class="sc-dot" style="background: var(--amber);"></div>
+ <span class="sc-name">Hematopoiesis Trajectory</span>
+ <span class="sc-diff" style="color: var(--amber);">Med</span>
+ </div>
+ <div class="scenario-opt" onclick="selectScenario(this)">
+ <div class="sc-dot" style="background: var(--amber);"></div>
+ <span class="sc-name">Perturbation Immune</span>
+ <span class="sc-diff" style="color: var(--amber);">Med</span>
+ </div>
+ <div class="scenario-opt active" onclick="selectScenario(this)">
+ <div class="sc-dot" style="background: var(--red);"></div>
+ <span class="sc-name">Biomarker Validation (Lung)</span>
+ <span class="sc-diff" style="color: var(--red);">Hard</span>
+ </div>
+ </div>
+ </div>
+ <div class="sidebar-section">
+ <div class="sidebar-heading">Environment State</div>
+ <div class="gauge">
+ <div class="gauge-header">
+ <span class="gauge-label">Budget</span>
+ <span class="gauge-value" id="budgetVal">$100,000</span>
+ </div>
+ <div class="gauge-track"><div class="gauge-fill" id="budgetFill" style="width:100%;background:var(--green);"></div></div>
+ </div>
+ <div class="gauge">
+ <div class="gauge-header">
+ <span class="gauge-label">Time</span>
+ <span class="gauge-value" id="timeVal">180 / 180 days</span>
+ </div>
+ <div class="gauge-track"><div class="gauge-fill" id="timeFill" style="width:100%;background:var(--cyan);"></div></div>
+ </div>
+ <div class="gauge">
+ <div class="gauge-header">
+ <span class="gauge-label">Steps</span>
+ <span class="gauge-value" id="stepVal">0 / 30</span>
+ </div>
+ <div class="gauge-track"><div class="gauge-fill" id="stepFill" style="width:0%;background:var(--accent);"></div></div>
+ </div>
+ </div>
+ <div class="sidebar-section" style="flex:1;overflow-y:auto;">
+ <div class="sidebar-heading">Pipeline</div>
+ <div class="pipeline-steps" id="pipelineSteps"></div>
+ </div>
+ </div>
+
+ <!-- Center: Lab + Terminal -->
+ <div class="center">
+ <div class="lab-panel">
+ <div class="lab-label">Virtual Lab</div>
+ <div class="lab-action-label" id="labActionLabel"></div>
+ <canvas id="labCanvas"></canvas>
+ </div>
+ <div class="center-header">
+ <div class="tab active">Agent Log</div>
+ <div class="tab">Raw JSON</div>
+ </div>
+ <div class="terminal" id="terminal"></div>
+ </div>
+
+ <!-- Right Panel -->
+ <div class="right">
+ <div class="panel-section">
+ <div class="panel-heading">
+ Step Reward
+ <span id="stepRewardLabel" style="font-family:'JetBrains Mono',monospace;font-size:11px;color:var(--text-dim);">--</span>
+ </div>
+ <div id="rewardBars">
+ <div class="reward-row"><span class="rw-label">Validity</span><div class="rw-track"><div class="rw-fill validity" id="rw-validity"></div></div></div>
+ <div class="reward-row"><span class="rw-label">Ordering</span><div class="rw-track"><div class="rw-fill ordering" id="rw-ordering"></div></div></div>
+ <div class="reward-row"><span class="rw-label">Info Gain</span><div class="rw-track"><div class="rw-fill info_gain" id="rw-info_gain"></div></div></div>
+ <div class="reward-row"><span class="rw-label">Efficiency</span><div class="rw-track"><div class="rw-fill efficiency" id="rw-efficiency"></div></div></div>
+ <div class="reward-row"><span class="rw-label">Novelty</span><div class="rw-track"><div class="rw-fill novelty" id="rw-novelty"></div></div></div>
+ <div class="reward-row"><span class="rw-label">Penalty</span><div class="rw-track"><div class="rw-fill penalty" id="rw-penalty"></div></div></div>
+ </div>
+ <div class="cumulative-row">
+ <span class="cum-label">Cumulative Reward</span>
+ <span class="cum-value" id="cumReward">0.00</span>
+ </div>
+ </div>
+ <div class="panel-section">
+ <div class="panel-heading">Reward History</div>
+ <div id="rewardHistory"><div class="empty-state">No steps yet</div></div>
+ </div>
+ <div class="panel-section">
+ <div class="panel-heading">Discoveries</div>
+ <div class="discovery-list" id="discoveries"><div class="empty-state">No discoveries yet</div></div>
+ </div>
+ <div class="panel-section">
+ <div class="panel-heading">Violations</div>
+ <div id="violations"><div class="empty-state">No violations</div></div>
+ </div>
+ </div>
+ </div>
+
+ <script>
+ // =====================================================
+ // VIRTUAL LAB - Canvas rendering
+ // =====================================================
+ const labCanvas = document.getElementById('labCanvas');
+ const ctx = labCanvas.getContext('2d');
+ let labW, labH, dpr;
+
+ function resizeLab() {
+ const rect = labCanvas.parentElement.getBoundingClientRect();
+ dpr = window.devicePixelRatio || 1;
+ labW = rect.width;
+ labH = rect.height;
+ labCanvas.width = labW * dpr;
+ labCanvas.height = labH * dpr;
+ ctx.setTransform(dpr, 0, 0, dpr, 0, 0);
+ }
+ resizeLab();
+ window.addEventListener('resize', () => { resizeLab(); });
+
+ // Lab stations (positions as fractions of canvas, converted in draw)
+ const STATIONS = {
+ idle: { fx: 0.06, fy: 0.55, label: 'ENTRANCE', icon: 'door', color: '#475569' },
+ sample: { fx: 0.20, fy: 0.35, label: 'SAMPLE BENCH', icon: 'bench', color: '#34d399' },
+ cohort: { fx: 0.20, fy: 0.75, label: 'COHORT SELECT', icon: 'people', color: '#34d399' },
+ prep: { fx: 0.38, fy: 0.35, label: 'LIBRARY PREP', icon: 'flask', color: '#2dd4bf' },
+ sequencer: { fx: 0.38, fy: 0.75, label: 'SEQUENCER', icon: 'machine', color: '#22d3ee' },
+ computer: { fx: 0.62, fy: 0.50, label: 'COMPUTE', icon: 'screen', color: '#38bdf8' },
+ whiteboard: { fx: 0.84, fy: 0.45, label: 'SYNTHESIS', icon: 'board', color: '#a78bfa' },
+ };
+
+ // Map actions to stations
+ const ACTION_STATION = {
+ collect_sample: 'sample',
+ select_cohort: 'cohort',
+ prepare_library: 'prep',
+ sequence_cells: 'sequencer',
+ run_qc: 'computer',
+ normalize_data: 'computer',
+ cluster_cells: 'computer',
+ differential_expression: 'computer',
+ pathway_enrichment: 'computer',
+ marker_selection: 'computer',
+ validate_marker: 'computer',
+ synthesize_conclusion: 'whiteboard',
+ };
+
+ // Agent state
+ let agent = { x: 0, y: 0, targetX: 0, targetY: 0, station: 'idle', working: false };
+ let agentTrail = [];
+ let workingTick = 0;
+ let terminalLines = []; // fake terminal on computer screen
+ let activeStationKey = null;
+ let particlesLab = [];
+
+ function stationPos(key) {
+ const s = STATIONS[key];
+ return { x: s.fx * labW, y: s.fy * labH };
+ }
+
+ function initAgent() {
+ const p = stationPos('idle');
+ agent.x = p.x; agent.y = p.y;
+ agent.targetX = p.x; agent.targetY = p.y;
+ agent.station = 'idle';
+ agent.working = false;
+ agent.facing = 1;
+ agentTrail = [];
+ terminalLines = [];
+ activeStationKey = null;
+ particlesLab = [];
+ }
+ initAgent();
+
+ function moveAgentTo(stationKey) {
+ const p = stationPos(stationKey);
+ agent.targetX = p.x;
+ agent.targetY = p.y;
+ agent.station = stationKey;
+ agent.working = false;
+ activeStationKey = stationKey;
+ }
+
+ function setAgentWorking(actionName) {
+ agent.working = true;
+ workingTick = 0;
+ // If at computer, set up terminal lines
+ if (agent.station === 'computer') {
+ terminalLines = [];
+ typeComputerLines(actionName);
+ }
+ }
+
+ const COMP_COMMANDS = {
+ run_qc: ['$ scanpy.pp.filter_cells()', ' filtering 11847 cells...', ' 10234 passed QC', ' doublet rate: 3.2%'],
+ normalize_data: ['$ scran.normalize(adata)', ' computing size factors...', ' log1p transform', ' HVGs: 3000 selected'],
+ cluster_cells: ['$ sc.tl.leiden(adata, 0.8)', ' building kNN graph...', ' optimizing modularity', ' 14 clusters found'],
+ differential_expression: ['$ DESeq2.run(IPF, Ctrl)', ' fitting GLM...', ' 1847 DE genes', ' SPP1 log2FC=3.42 ***'],
+ pathway_enrichment: ['$ gseapy.enrich(de_genes)', ' KEGG + Reactome...', ' ECM-receptor p=4.2e-12', ' TGF-beta p=1.8e-09'],
+ marker_selection: ['$ rank_markers(candidates)', ' SPP1 AUROC: 0.94', ' MMP7 AUROC: 0.87', ' COL1A1 AUROC: 0.81'],
+ validate_marker: ['$ cross_validate("SPP1")', ' fold 1: 0.93', ' fold 2: 0.89', ' mean AUROC: 0.91 OK'],
+ };
+
+ async function typeComputerLines(actionName) {
+ const lines = COMP_COMMANDS[actionName] || ['$ processing...', ' computing...', ' done'];
+ for (let i = 0; i < lines.length; i++) {
+ await wait(250);
+ terminalLines.push(lines[i]);
+ if (terminalLines.length > 5) terminalLines.shift();
+ }
+ }
+
+ // Particles burst
+ function spawnParticles(x, y, color, count = 8) {
+ for (let i = 0; i < count; i++) {
+ const angle = (Math.PI * 2 / count) * i + Math.random() * 0.5;
+ particlesLab.push({
+ x, y,
+ vx: Math.cos(angle) * (1.5 + Math.random() * 2),
+ vy: Math.sin(angle) * (1.5 + Math.random() * 2),
+ life: 1,
+ color,
+ size: 2 + Math.random() * 2,
+ });
+ }
+ }
+
+ // ---- Draw loop ----
+ let frameCount = 0;
+ const FLOOR_COLOR = '#0f1520';
+ const WALL_COLOR = '#1a2332';
+ const FLOOR_TILE_A = '#0d1219';
+ const FLOOR_TILE_B = '#10161f';
+
+ function drawLab() {
+ frameCount++;
+ ctx.clearRect(0, 0, labW, labH);
+
+ // Floor - checkerboard tiles
+ const tileSize = 24;
+ for (let ty = 0; ty < labH; ty += tileSize) {
+ for (let tx = 0; tx < labW; tx += tileSize) {
+ const checker = ((Math.floor(tx / tileSize) + Math.floor(ty / tileSize)) % 2 === 0);
+ ctx.fillStyle = checker ? FLOOR_TILE_A : FLOOR_TILE_B;
+ ctx.fillRect(tx, ty, tileSize, tileSize);
+ }
+ }
+
+ // Walls - top and bottom border
+ ctx.fillStyle = WALL_COLOR;
+ ctx.fillRect(0, 0, labW, 18);
+ ctx.fillRect(0, labH - 8, labW, 8);
+ ctx.strokeStyle = '#253040';
+ ctx.lineWidth = 1;
+ ctx.beginPath(); ctx.moveTo(0, 18); ctx.lineTo(labW, 18); ctx.stroke();
+
+ // Draw equipment at each station (behind the person)
+ for (const [key, s] of Object.entries(STATIONS)) {
+ const pos = stationPos(key);
+ const isActive = key === activeStationKey;
+ drawEquipment(key, pos.x, pos.y, s.color, isActive);
+ }
+
+ // Draw walking path (subtle floor markings)
+ ctx.strokeStyle = 'rgba(56,189,248,0.06)';
+ ctx.lineWidth = 16;
+ ctx.lineCap = 'round';
+ ctx.lineJoin = 'round';
+ const pathOrder = ['idle','sample','prep','computer','whiteboard'];
+ ctx.beginPath();
+ const p0 = stationPos(pathOrder[0]);
+ ctx.moveTo(p0.x, p0.y + 10);
+ for (let i = 1; i < pathOrder.length; i++) {
+ const p = stationPos(pathOrder[i]);
+ ctx.lineTo(p.x, p.y + 10);
+ }
+ ctx.stroke();
+ // Lower path
+ ctx.beginPath();
+ const pl0 = stationPos('idle');
+ ctx.moveTo(pl0.x, pl0.y + 10);
+ const pl1 = stationPos('cohort');
+ ctx.lineTo(pl1.x, pl1.y + 10);
+ const pl2 = stationPos('sequencer');
+ ctx.lineTo(pl2.x, pl2.y + 10);
+ const pl3 = stationPos('computer');
+ ctx.lineTo(pl3.x, pl3.y + 10);
+ ctx.stroke();
+ ctx.lineCap = 'butt';
+
+ // Floating terminal popup at computer
+ if (agent.station === 'computer' && agent.working && terminalLines.length > 0) {
+ const cp = stationPos('computer');
+ const sx = cp.x + 55, sy = cp.y - 65;
+ const sw = 170, sh = 95;
+
+ // Shadow
+ ctx.fillStyle = 'rgba(0,0,0,0.4)';
+ roundRect(ctx, sx + 3, sy + 3, sw, sh, 6);
+ ctx.fill();
+
+ ctx.fillStyle = 'rgba(7,9,13,0.97)';
+ ctx.strokeStyle = 'rgba(56,189,248,0.3)';
+ ctx.lineWidth = 1;
+ roundRect(ctx, sx, sy, sw, sh, 6);
+ ctx.fill(); ctx.stroke();
+
+ // Title bar
+ ctx.fillStyle = 'rgba(30,41,59,0.5)';
+ ctx.fillRect(sx + 1, sy + 1, sw - 2, 14);
+ ctx.fillStyle = '#475569';
+ ctx.font = '500 7px Inter, sans-serif';
+ ctx.textAlign = 'left';
+ ctx.fillText('terminal', sx + 6, sy + 10);
+ // dots
+ ctx.fillStyle = '#f87171'; ctx.beginPath(); ctx.arc(sx + sw - 28, sy + 7, 3, 0, Math.PI*2); ctx.fill();
+ ctx.fillStyle = '#fbbf24'; ctx.beginPath(); ctx.arc(sx + sw - 18, sy + 7, 3, 0, Math.PI*2); ctx.fill();
+ ctx.fillStyle = '#34d399'; ctx.beginPath(); ctx.arc(sx + sw - 8, sy + 7, 3, 0, Math.PI*2); ctx.fill();
+
+ ctx.font = '500 9px JetBrains Mono, monospace';
+ const startY = sy + 28;
+ for (let i = 0; i < terminalLines.length; i++) {
+ const line = terminalLines[i];
+ ctx.fillStyle = line.startsWith('$') ? '#34d399' : line.includes('***') || line.includes('OK') ? '#34d399' : '#94a3b8';
+ ctx.fillText(line.substring(0, 24), sx + 8, startY + i * 14);
+ }
+ if (frameCount % 60 < 30) {
+ ctx.fillStyle = '#34d399';
+ ctx.fillRect(sx + 8, startY + terminalLines.length * 14 - 8, 6, 11);
+ }
+ }
+
+ // Whiteboard popup
+ if (agent.station === 'whiteboard' && agent.working) {
+ const wp = stationPos('whiteboard');
+ const bx = wp.x - 60, by = wp.y - 75;
+ const bw = 120, bh = 72;
+ ctx.fillStyle = 'rgba(0,0,0,0.3)';
+ roundRect(ctx, bx + 3, by + 3, bw, bh, 6);
+ ctx.fill();
+ ctx.fillStyle = 'rgba(17,24,39,0.95)';
+ ctx.strokeStyle = 'rgba(167,139,250,0.3)';
+ ctx.lineWidth = 1;
+ roundRect(ctx, bx, by, bw, bh, 6);
+ ctx.fill(); ctx.stroke();
+ ctx.font = '600 8px JetBrains Mono, monospace';
+ ctx.textAlign = 'left';
+ ctx.fillStyle = '#a78bfa';
+ ctx.fillText('CONCLUSION', bx + 8, by + 14);
+ ctx.font = '400 7.5px JetBrains Mono, monospace';
+ const synthLines = ['SPP1 validated', 'AUROC = 0.91', 'Confidence: 0.85', 'Match: 4/5'];
+ for (let i = 0; i < synthLines.length; i++) {
+ ctx.fillStyle = i === 0 ? '#34d399' : '#94a3b8';
+ ctx.fillText(synthLines[i], bx + 8, by + 28 + i * 12);
+ }
+ }
+
+ // Activity text above active station
+ if (agent.working && activeStationKey && activeStationKey !== 'idle') {
+ const sp = stationPos(activeStationKey);
+ const actTexts = {
+ sample: 'collecting tissue...', cohort: 'selecting cohort...',
+ prep: 'preparing library...', sequencer: 'sequencing...',
+ computer: 'computing...', whiteboard: 'synthesizing...',
+ };
+ ctx.fillStyle = STATIONS[activeStationKey].color;
+ ctx.font = '500 9px JetBrains Mono, monospace';
+ ctx.textAlign = 'center';
+ ctx.globalAlpha = 0.5 + 0.3 * Math.sin(frameCount * 0.06);
+ const yOff = ['sample','prep'].includes(activeStationKey) ? -55 : -50;
+ ctx.fillText(actTexts[activeStationKey] || 'working...', sp.x, sp.y + yOff);
+ ctx.globalAlpha = 1;
+ }
+
+ // Move agent smoothly
+ const dx = agent.targetX - agent.x;
+ const dy = agent.targetY - agent.y;
+ const dist = Math.sqrt(dx * dx + dy * dy);
+ const isWalking = dist > 2;
+ if (isWalking) {
+ const speed = 0.05;
+ agent.x += dx * speed;
+ agent.y += dy * speed;
+ agent.facing = dx > 0 ? 1 : dx < -0.5 ? -1 : agent.facing;
+ }
+
+ // Draw person
+ drawPerson(agent.x, agent.y, isWalking, agent.working, agent.facing || 1);
+
+ // Particles
+ for (let i = particlesLab.length - 1; i >= 0; i--) {
+ const p = particlesLab[i];
+ p.x += p.vx; p.y += p.vy;
+ p.vx *= 0.95; p.vy *= 0.95;
+ p.life -= 0.02;
+ if (p.life <= 0) { particlesLab.splice(i, 1); continue; }
+ ctx.globalAlpha = p.life * 0.6;
+ ctx.fillStyle = p.color;
+ ctx.beginPath();
+ ctx.arc(p.x, p.y, p.size * p.life, 0, Math.PI * 2);
+ ctx.fill();
+ }
+ ctx.globalAlpha = 1;
+
+ // Station labels
+ for (const [key, s] of Object.entries(STATIONS)) {
+ if (key === 'idle') continue;
+ const pos = stationPos(key);
+ const isActive = key === activeStationKey;
+ ctx.fillStyle = isActive ? s.color : '#334155';
+ ctx.font = `600 ${isActive ? 9 : 8}px Inter, sans-serif`;
+ ctx.textAlign = 'center';
+ const ly = key === 'cohort' || key === 'sequencer' ? pos.y + 45 : pos.y + 42;
+ ctx.fillText(s.label, pos.x, ly);
+ }
+
+ requestAnimationFrame(drawLab);
+ }
+
+ // ---- Draw person (lab coat researcher) ----
+ function drawPerson(x, y, walking, working, facing) {
+ const f = facing;
+ const t = frameCount;
+ // Walking cycle
+ const walkCycle = walking ? Math.sin(t * 0.15) : 0;
+ const bobY = walking ? Math.abs(Math.sin(t * 0.15)) * 2 : 0;
+ // Working arm animation
+ const workArm = working ? Math.sin(t * 0.08) * 0.3 : 0;
+
+ const py = y - bobY; // feet position base
+
+ ctx.save();
+ ctx.translate(x, py);
+
+ // Shadow
+ ctx.fillStyle = 'rgba(0,0,0,0.25)';
+ ctx.beginPath();
+ ctx.ellipse(0, 12, 10, 4, 0, 0, Math.PI * 2);
+ ctx.fill();
+
+ // Legs
+ const legSpread = walking ? walkCycle * 5 : 0;
+ ctx.strokeStyle = '#1e3a5f';
+ ctx.lineWidth = 3;
+ ctx.lineCap = 'round';
+ // Left leg
+ ctx.beginPath();
+ ctx.moveTo(-3, 4);
+ ctx.lineTo(-3 + legSpread, 12);
+ ctx.stroke();
+ // Right leg
+ ctx.beginPath();
+ ctx.moveTo(3, 4);
+ ctx.lineTo(3 - legSpread, 12);
+ ctx.stroke();
+ // Shoes
+ ctx.fillStyle = '#1e293b';
+ ctx.beginPath(); ctx.arc(-3 + legSpread, 12, 2.5, 0, Math.PI * 2); ctx.fill();
+ ctx.beginPath(); ctx.arc(3 - legSpread, 12, 2.5, 0, Math.PI * 2); ctx.fill();
+
+ // Body / lab coat
+ ctx.fillStyle = '#e2e8f0'; // white lab coat
+ ctx.beginPath();
+ ctx.moveTo(-7, -4);
+ ctx.lineTo(-6, 6);
+ ctx.lineTo(6, 6);
+ ctx.lineTo(7, -4);
+ ctx.quadraticCurveTo(7, -10, 0, -10);
+ ctx.quadraticCurveTo(-7, -10, -7, -4);
+ ctx.fill();
+ // Coat outline
+ ctx.strokeStyle = '#94a3b8';
+ ctx.lineWidth = 0.5;
+ ctx.stroke();
+ // Coat split at bottom
+ ctx.beginPath();
+ ctx.moveTo(0, 1);
+ ctx.lineTo(0, 6);
+ ctx.strokeStyle = '#cbd5e1';
+ ctx.lineWidth = 0.5;
+ ctx.stroke();
+ // Pocket
+ ctx.strokeStyle = '#94a3b8';
+ ctx.lineWidth = 0.5;
+ ctx.strokeRect(f > 0 ? 1 : -5, -1, 4, 3);
+
+ // Arms
+ ctx.strokeStyle = '#e2e8f0';
+ ctx.lineWidth = 3.5;
+ ctx.lineCap = 'round';
+ // Back arm
+ const backArmSwing = walking ? -walkCycle * 4 : 0;
+ ctx.beginPath();
+ ctx.moveTo(-f * 6, -6);
+ ctx.lineTo(-f * 6 + backArmSwing, 2);
+ ctx.stroke();
+ // Front arm (active arm)
+ if (working) {
+ // Arm reaching forward/up for work
+ ctx.beginPath();
+ ctx.moveTo(f * 6, -6);
+ ctx.lineTo(f * 10 + workArm * 5, -8 + workArm * 3);
+ ctx.stroke();
+ // Hand/tool
+ ctx.fillStyle = '#fde68a';
+ ctx.beginPath();
+ ctx.arc(f * 10 + workArm * 5, -8 + workArm * 3, 2, 0, Math.PI * 2);
+ ctx.fill();
+ } else {
+ const frontArmSwing = walking ? walkCycle * 4 : 0;
+ ctx.beginPath();
+ ctx.moveTo(f * 6, -6);
+ ctx.lineTo(f * 6 + frontArmSwing, 2);
+ ctx.stroke();
+ }
+ // Skin for hands
+ ctx.fillStyle = '#fde68a';
+ ctx.beginPath(); ctx.arc(-f * 6 + backArmSwing, 2, 1.8, 0, Math.PI * 2); ctx.fill();
+ if (!working) {
+ const fs = walking ? walkCycle * 4 : 0;
+ ctx.beginPath(); ctx.arc(f * 6 + fs, 2, 1.8, 0, Math.PI * 2); ctx.fill();
+ }
+
+ // Head
+ ctx.fillStyle = '#fde68a'; // skin
+ ctx.beginPath();
+ ctx.arc(0, -15, 7, 0, Math.PI * 2);
+ ctx.fill();
+ // Hair
+ ctx.fillStyle = '#1e293b';
+ ctx.beginPath();
+ ctx.arc(0, -17, 7, Math.PI, 0);
+ ctx.fill();
+ // Face details
+ ctx.fillStyle = '#1e293b';
+ // Eyes
+ ctx.beginPath();
+ ctx.arc(f * 2.5, -15.5, 1, 0, Math.PI * 2);
+ ctx.fill();
+ ctx.beginPath();
+ ctx.arc(f * -1.5, -15.5, 1, 0, Math.PI * 2);
+ ctx.fill();
+ // Glasses
+ ctx.strokeStyle = '#475569';
+ ctx.lineWidth = 0.7;
+ ctx.beginPath();
+ ctx.arc(f * 2.5, -15.5, 2.5, 0, Math.PI * 2);
+ ctx.stroke();
+ ctx.beginPath();
+ ctx.arc(f * -1.5, -15.5, 2.5, 0, Math.PI * 2);
+ ctx.stroke();
+ ctx.beginPath();
+ ctx.moveTo(f * 0.5, -15.5);
+ ctx.lineTo(f * -0.5, -15.5);
+ ctx.stroke();
+ // Mouth
+ if (working) {
+ ctx.fillStyle = '#1e293b';
+ ctx.beginPath();
+ ctx.arc(f * 0.5, -12.5, 1, 0, Math.PI);
+ ctx.fill();
+ }
+
+ // ID Badge
+ ctx.fillStyle = '#38bdf8';
+ ctx.fillRect(f > 0 ? -6 : 2, -3, 4, 5);
+ ctx.fillStyle = '#fff';
+ ctx.font = 'bold 3px Inter, sans-serif';
+ ctx.textAlign = 'center';
+ ctx.fillText('AI', f > 0 ? -4 : 4, 0.5);
+
+ ctx.restore();
+ }
+
+ // ---- Draw lab equipment ----
+ function drawEquipment(stationKey, cx, cy, color, active) {
+   ctx.save();
+
+   switch (stationKey) {
+     case 'idle':
+       // Door frame
+       ctx.strokeStyle = '#334155';
+       ctx.lineWidth = 2;
+       ctx.strokeRect(cx - 12, cy - 30, 24, 40);
+       ctx.fillStyle = '#1a2332';
+       ctx.fillRect(cx - 10, cy - 28, 20, 36);
+       ctx.fillStyle = '#475569';
+       ctx.beginPath(); ctx.arc(cx + 6, cy - 10, 2, 0, Math.PI * 2); ctx.fill();
+       break;
+
+     case 'sample':
+       // Lab bench with sample tubes
+       // Bench surface
+       ctx.fillStyle = '#1a2332';
+       ctx.fillRect(cx - 30, cy - 8, 60, 6);
+       // Bench legs
+       ctx.fillStyle = '#253040';
+       ctx.fillRect(cx - 28, cy - 2, 4, 20);
+       ctx.fillRect(cx + 24, cy - 2, 4, 20);
+       // Tube rack
+       ctx.fillStyle = '#253040';
+       ctx.fillRect(cx - 18, cy - 18, 36, 10);
+       // Test tubes
+       const tubeColors = ['#34d399', '#22d3ee', '#fbbf24', '#f472b6', '#34d399', '#22d3ee'];
+       for (let i = 0; i < 6; i++) {
+         const tx = cx - 14 + i * 6;
+         ctx.fillStyle = active ? tubeColors[i] : '#334155';
+         ctx.globalAlpha = active ? 0.7 : 0.4;
+         ctx.fillRect(tx, cy - 28, 4, 12);
+         // Tube caps
+         ctx.globalAlpha = 1;
+         ctx.fillStyle = active ? tubeColors[i] : '#475569';
+         ctx.fillRect(tx - 0.5, cy - 29, 5, 2);
+       }
+       ctx.globalAlpha = 1;
+       // Pipette if active
+       if (active) {
+         const pipY = cy - 32 + Math.sin(frameCount * 0.08) * 4;
+         ctx.strokeStyle = '#94a3b8';
+         ctx.lineWidth = 2;
+         ctx.beginPath();
+         ctx.moveTo(cx + 5, pipY);
+         ctx.lineTo(cx + 5, pipY - 14);
+         ctx.stroke();
+         ctx.fillStyle = '#64748b';
+         ctx.fillRect(cx + 3, pipY - 18, 5, 6);
+         // Droplet
+         if (frameCount % 60 < 20) {
+           ctx.fillStyle = '#34d399';
+           ctx.globalAlpha = 0.6;
+           ctx.beginPath();
+           ctx.arc(cx + 5, pipY + 3, 1.5, 0, Math.PI * 2);
+           ctx.fill();
+           ctx.globalAlpha = 1;
+         }
+       }
+       break;
+
+     case 'cohort':
+       // Filing cabinet / patient records
+       ctx.fillStyle = '#1a2332';
+       ctx.fillRect(cx - 20, cy - 22, 40, 40);
+       ctx.strokeStyle = '#253040';
+       ctx.lineWidth = 1;
+       for (let i = 0; i < 3; i++) {
+         const dy = cy - 18 + i * 13;
+         ctx.strokeRect(cx - 18, dy, 36, 11);
+         ctx.fillStyle = active ? '#475569' : '#253040';
+         ctx.fillRect(cx - 4, dy + 4, 8, 3);
+       }
+       // Clipboard
+       ctx.fillStyle = '#253040';
+       ctx.fillRect(cx + 24, cy - 16, 14, 20);
+       ctx.strokeStyle = '#475569';
+       ctx.lineWidth = 0.5;
+       for (let i = 0; i < 4; i++) {
+         ctx.beginPath();
+         ctx.moveTo(cx + 27, cy - 12 + i * 4);
+         ctx.lineTo(cx + 35, cy - 12 + i * 4);
+         ctx.stroke();
+       }
+       if (active) {
+         ctx.fillStyle = color;
+         ctx.globalAlpha = 0.5;
+         ctx.beginPath(); ctx.arc(cx + 31, cy - 14, 2, 0, Math.PI * 2); ctx.fill();
+         ctx.globalAlpha = 1;
+       }
+       break;
+
+     case 'prep':
+       // Library prep station - PCR machine + bench
+       // Bench
+       ctx.fillStyle = '#1a2332';
+       ctx.fillRect(cx - 28, cy - 6, 56, 6);
+       ctx.fillStyle = '#253040';
+       ctx.fillRect(cx - 26, cy, 4, 18);
+       ctx.fillRect(cx + 22, cy, 4, 18);
+       // PCR/thermocycler machine
+       ctx.fillStyle = active ? '#192535' : '#172030';
+       ctx.strokeStyle = active ? color : '#253040';
+       ctx.lineWidth = 1;
+       roundRect(ctx, cx - 18, cy - 26, 36, 20, 3);
+       ctx.fill(); ctx.stroke();
+       // Display on machine
+       ctx.fillStyle = active ? 'rgba(45,212,191,0.15)' : 'rgba(30,41,59,0.3)';
+       ctx.fillRect(cx - 14, cy - 22, 16, 8);
+       if (active) {
+         ctx.fillStyle = color;
+         ctx.font = '500 6px JetBrains Mono, monospace';
+         ctx.textAlign = 'left';
+         ctx.fillText('72.0°C', cx - 12, cy - 16);
+         // LED
+         ctx.fillStyle = color;
+         ctx.beginPath(); ctx.arc(cx + 12, cy - 18, 2, 0, Math.PI * 2); ctx.fill();
+       }
+       // Microplate
+       ctx.fillStyle = '#1e293b';
+       ctx.fillRect(cx - 20, cy - 3, 18, 12);
+       ctx.strokeStyle = '#334155';
+       ctx.lineWidth = 0.3;
+       for (let r = 0; r < 3; r++) {
+         for (let c = 0; c < 4; c++) {
+           ctx.beginPath();
+           ctx.arc(cx - 17 + c * 4.5, cy + 1 + r * 3.5, 1.2, 0, Math.PI * 2);
+           ctx.stroke();
+         }
+       }
+       break;
+
+     case 'sequencer':
+       // Big sequencing machine (NovaSeq-like)
+       // Machine body
+       ctx.fillStyle = '#172030';
+       ctx.strokeStyle = active ? color : '#253040';
+       ctx.lineWidth = active ? 1.5 : 1;
+       roundRect(ctx, cx - 24, cy - 28, 48, 44, 4);
+       ctx.fill(); ctx.stroke();
+       // Front panel / screen
+       ctx.fillStyle = active ? 'rgba(34,211,238,0.1)' : 'rgba(30,41,59,0.3)';
+       roundRect(ctx, cx - 18, cy - 22, 36, 18, 2);
+       ctx.fill();
+       if (active) {
+         // Progress bar on screen
+         ctx.fillStyle = 'rgba(34,211,238,0.2)';
+         ctx.fillRect(cx - 14, cy - 12, 28, 4);
+         const progress = (frameCount % 120) / 120;
+         ctx.fillStyle = color;
+         ctx.fillRect(cx - 14, cy - 12, 28 * progress, 4);
+         ctx.fillStyle = color;
+         ctx.font = '500 6px JetBrains Mono, monospace';
+         ctx.textAlign = 'center';
+         ctx.fillText('SEQUENCING', cx, cy - 16);
+       }
+       // Slot
+       ctx.fillStyle = '#0f1520';
+       ctx.fillRect(cx - 10, cy, 20, 4);
+       // Status LEDs
+       ctx.fillStyle = active ? '#34d399' : '#334155';
+       ctx.beginPath(); ctx.arc(cx - 14, cy + 10, 2, 0, Math.PI * 2); ctx.fill();
+       if (active && frameCount % 30 < 15) {
+         ctx.fillStyle = '#fbbf24';
+       } else {
+         ctx.fillStyle = '#334155';
+       }
+       ctx.beginPath(); ctx.arc(cx - 8, cy + 10, 2, 0, Math.PI * 2); ctx.fill();
+       break;
+
+     case 'computer':
+       // Computer desk with dual monitors
+       // Desk
+       ctx.fillStyle = '#1a2332';
+       ctx.fillRect(cx - 36, cy + 2, 72, 5);
+       ctx.fillStyle = '#253040';
+       ctx.fillRect(cx - 32, cy + 7, 4, 16);
+       ctx.fillRect(cx + 28, cy + 7, 4, 16);
+       // Chair
+       ctx.fillStyle = '#1e293b';
+       ctx.beginPath();
+       ctx.arc(cx, cy + 28, 8, 0, Math.PI * 2);
+       ctx.fill();
+       ctx.fillStyle = '#253040';
+       ctx.fillRect(cx - 1, cy + 20, 2, 8);
+       // Monitor 1 (main)
+       ctx.fillStyle = active ? '#0c1219' : '#131c28';
+       ctx.strokeStyle = active ? 'rgba(56,189,248,0.4)' : '#253040';
+       ctx.lineWidth = 1;
+       roundRect(ctx, cx - 30, cy - 28, 32, 24, 2);
+       ctx.fill(); ctx.stroke();
+       // Monitor stand
+       ctx.fillStyle = '#334155';
+       ctx.fillRect(cx - 16, cy - 4, 4, 6);
+       ctx.fillRect(cx - 20, cy + 1, 12, 2);
+       // Monitor 2
+       ctx.fillStyle = active ? '#0c1219' : '#131c28';
+       ctx.strokeStyle = active ? 'rgba(56,189,248,0.3)' : '#253040';
+       roundRect(ctx, cx + 2, cy - 24, 26, 20, 2);
+       ctx.fill(); ctx.stroke();
+       ctx.fillStyle = '#334155';
+       ctx.fillRect(cx + 13, cy - 4, 4, 6);
+       ctx.fillRect(cx + 9, cy + 1, 12, 2);
+       // Screen content
+       if (active) {
+         ctx.fillStyle = 'rgba(56,189,248,0.08)';
+         ctx.fillRect(cx - 28, cy - 26, 28, 20);
+         // Code lines
+         for (let i = 0; i < 5; i++) {
+           ctx.fillStyle = `rgba(56,189,248,${0.15 + i * 0.06})`;
+           const w = 8 + Math.sin(i * 2.3 + frameCount * 0.02) * 6;
+           ctx.fillRect(cx - 26, cy - 24 + i * 4, w, 2);
+         }
+         // Second screen - graph
+         ctx.fillStyle = 'rgba(56,189,248,0.06)';
+         ctx.fillRect(cx + 4, cy - 22, 22, 16);
+         ctx.strokeStyle = 'rgba(34,211,238,0.3)';
+         ctx.lineWidth = 1;
+         ctx.beginPath();
+         ctx.moveTo(cx + 6, cy - 8);
+         for (let i = 0; i < 8; i++) {
+           ctx.lineTo(cx + 6 + i * 2.5, cy - 10 - Math.sin(i * 0.8 + frameCount * 0.03) * 5);
+         }
+         ctx.stroke();
+       }
+       // Keyboard
+       ctx.fillStyle = '#1e293b';
+       ctx.fillRect(cx - 14, cy + 4, 28, 6);
+       // Typing effect
+       if (active && agent.working) {
+         const keyX = cx - 12 + (frameCount % 20) * 1.2;
+         ctx.fillStyle = 'rgba(56,189,248,0.4)';
+         ctx.fillRect(keyX, cy + 5, 3, 4);
+       }
+       break;
+
+     case 'whiteboard':
+       // Whiteboard on wall + standing desk
+       // Board on wall
+       ctx.fillStyle = '#1e293b';
+       ctx.strokeStyle = '#334155';
+       ctx.lineWidth = 1;
+       ctx.fillRect(cx - 28, cy - 34, 56, 32);
+       ctx.strokeRect(cx - 28, cy - 34, 56, 32);
+       // Board content
+       if (active) {
+         ctx.fillStyle = 'rgba(167,139,250,0.1)';
+         ctx.fillRect(cx - 26, cy - 32, 52, 28);
+         // Diagram elements
+         ctx.strokeStyle = 'rgba(167,139,250,0.4)';
+         ctx.lineWidth = 0.8;
+         // Boxes
+         ctx.strokeRect(cx - 20, cy - 28, 14, 8);
+         ctx.strokeRect(cx + 6, cy - 28, 14, 8);
+         ctx.strokeRect(cx - 8, cy - 16, 16, 8);
+         // Arrows
+         ctx.beginPath();
+         ctx.moveTo(cx - 6, cy - 24); ctx.lineTo(cx + 6, cy - 24); ctx.stroke();
+         ctx.beginPath();
+         ctx.moveTo(cx, cy - 20); ctx.lineTo(cx, cy - 16); ctx.stroke();
+         // Checkmark
+         ctx.strokeStyle = '#34d399';
+         ctx.lineWidth = 1.5;
+         ctx.beginPath();
+         ctx.moveTo(cx - 4, cy - 12);
+         ctx.lineTo(cx - 1, cy - 9);
+         ctx.lineTo(cx + 5, cy - 15);
+         ctx.stroke();
+       } else {
+         // Faint lines
+         ctx.strokeStyle = '#253040';
+         ctx.lineWidth = 0.5;
+         for (let i = 0; i < 4; i++) {
+           ctx.beginPath();
+           ctx.moveTo(cx - 22, cy - 28 + i * 7);
+           ctx.lineTo(cx + 22, cy - 28 + i * 7);
+           ctx.stroke();
+         }
+       }
+       // Standing desk
+       ctx.fillStyle = '#1a2332';
+       ctx.fillRect(cx - 16, cy + 2, 32, 4);
+       ctx.fillStyle = '#253040';
+       ctx.fillRect(cx - 2, cy + 6, 4, 14);
+       break;
+   }
+
+   ctx.restore();
+ }
+
+ function roundRect(ctx, x, y, w, h, r) {
+   ctx.beginPath();
+   ctx.moveTo(x + r, y);
+   ctx.lineTo(x + w - r, y);
+   ctx.quadraticCurveTo(x + w, y, x + w, y + r);
+   ctx.lineTo(x + w, y + h - r);
+   ctx.quadraticCurveTo(x + w, y + h, x + w - r, y + h);
+   ctx.lineTo(x + r, y + h);
+   ctx.quadraticCurveTo(x, y + h, x, y + h - r);
+   ctx.lineTo(x, y + r);
+   ctx.quadraticCurveTo(x, y, x + r, y);
+   ctx.closePath();
+ }
+
+ drawLab();
+
+ // =====================================================
+ // EPISODE DATA + APP LOGIC
+ // =====================================================
+ const EPISODE = [
+   {
+     action: 'collect_sample', params: 'n_samples=8, tissue="lung"', category: 'wet',
+     budget: 92400, budgetPct: 92.4, time: 165, timePct: 91.7,
+     output: ['Collected 8 lung tissue samples (4 IPF, 4 control)','Tissue quality: excellent | Storage: -80C'],
+     reward: { validity: 0.90, ordering: 1.00, info_gain: 0.10, efficiency: 0.72, novelty: 1.00, penalty: 0.0 },
+     total: 0.45,
+   },
+   {
+     action: 'select_cohort', params: 'criteria="age_matched, sex_balanced"', category: 'wet',
+     budget: 91800, budgetPct: 91.8, time: 162, timePct: 90.0,
+     output: ['Cohort selected: 4 IPF patients (2M/2F, age 58-67)','Controls matched: 4 healthy donors (2M/2F, age 55-65)'],
+     reward: { validity: 0.85, ordering: 0.90, info_gain: 0.15, efficiency: 0.80, novelty: 0.90, penalty: 0.0 },
+     total: 0.38,
+   },
+   {
+     action: 'prepare_library', params: 'protocol="10x_chromium_v3"', category: 'wet',
+     budget: 84200, budgetPct: 84.2, time: 155, timePct: 86.1,
+     output: ['Library prep complete using 10x Chromium v3','Estimated cell capture: ~12,000 cells','cDNA yield: 42ng (good)'],
+     reward: { validity: 0.95, ordering: 1.00, info_gain: 0.20, efficiency: 0.70, novelty: 0.95, penalty: 0.0 },
+     total: 0.52,
+   },
+   {
+     action: 'sequence_cells', params: 'depth="standard", platform="NovaSeq"', category: 'wet',
+     budget: 68500, budgetPct: 68.5, time: 142, timePct: 78.9,
+     output: ['11,847 cells sequenced | 22,438 genes detected','Median reads/cell: 45,200 | Median genes/cell: 3,842','Sequencing saturation: 78.3%'],
+     reward: { validity: 0.95, ordering: 1.00, info_gain: 0.55, efficiency: 0.60, novelty: 0.90, penalty: 0.0 },
+     total: 0.68,
+   },
+   {
+     action: 'run_qc', params: 'tool="scanpy", min_genes=200', category: 'comp',
+     budget: 68100, budgetPct: 68.1, time: 141, timePct: 78.3,
+     output: ['QC complete: 10,234 / 11,847 cells passed (86.4%)','Removed: 382 doublets (3.2%), 1,231 low-quality cells','Mitochondrial threshold: 20% (flagged 847 cells)'],
+     reward: { validity: 0.95, ordering: 1.00, info_gain: 0.35, efficiency: 0.85, novelty: 0.80, penalty: 0.0 },
+     total: 0.55,
+   },
+   {
+     action: 'normalize_data', params: 'method="scran", log_transform=true', category: 'comp',
+     budget: 67900, budgetPct: 67.9, time: 140, timePct: 77.8,
+     output: ['Size-factor normalization (scran) applied','Log1p transform complete | HVG selection: 3,000 genes'],
+     reward: { validity: 0.90, ordering: 1.00, info_gain: 0.25, efficiency: 0.90, novelty: 0.70, penalty: 0.0 },
+     total: 0.42,
+   },
+   {
+     action: 'cluster_cells', params: 'algorithm="leiden", resolution=0.8', category: 'comp',
+     budget: 67500, budgetPct: 67.5, time: 139, timePct: 77.2,
+     output: ['Leiden clustering: 14 clusters identified','AT1 (8.2%), AT2 (12.1%), Fibroblast (15.7%), Macrophage (18.3%)','Endothelial (9.4%), Basal (6.1%), Ciliated (5.8%), NK/T (7.2%)','Smooth Muscle (4.1%), Mast (2.9%), B cell (3.4%), pDC (2.0%)','Mesothelial (2.6%), Aberrant Basaloid (2.2%)'],
+     reward: { validity: 0.95, ordering: 1.00, info_gain: 0.65, efficiency: 0.85, novelty: 0.85, penalty: 0.0 },
+     total: 0.72,
+     discovery: { title: '14 cell populations identified', detail: 'Including Aberrant Basaloid cells (IPF-associated)', color: 'var(--cyan)', bg: 'var(--cyan-dim)' },
+   },
+   {
+     action: 'differential_expression', params: 'method="DESeq2", contrast="IPF_vs_Ctrl"', category: 'comp',
+     budget: 67000, budgetPct: 67.0, time: 137, timePct: 76.1,
+     output: ['1,847 DE genes (|log2FC| > 1, padj < 0.05)','Top upregulated in IPF:',' SPP1 log2FC=3.42 padj=1.2e-18',' MMP7 log2FC=2.89 padj=3.4e-15',' COL1A1 log2FC=2.67 padj=8.7e-14',' TGFB1 log2FC=1.95 padj=2.1e-09','Top downregulated: AGER (-3.1), SFTPC (-2.8), HOPX (-2.3)'],
+     reward: { validity: 0.95, ordering: 1.00, info_gain: 0.78, efficiency: 0.80, novelty: 0.88, penalty: 0.0 },
+     total: 0.82,
+     discovery: { title: 'SPP1 strongly upregulated in IPF', detail: 'log2FC=3.42, padj=1.2e-18', color: 'var(--pink)', bg: 'rgba(244,114,182,0.10)' },
+   },
+   {
+     action: 'pathway_enrichment', params: 'tool="gseapy", gene_sets="KEGG,Reactome"', category: 'comp',
+     budget: 66600, budgetPct: 66.6, time: 136, timePct: 75.6,
+     output: ['Top enriched pathways (IPF vs Control):',' ECM-receptor interaction padj=4.2e-12',' TGF-beta signaling padj=1.8e-09',' PI3K-Akt signaling padj=3.1e-07',' Focal adhesion padj=8.9e-07','SPP1 participates in 3/4 top pathways'],
+     reward: { validity: 0.90, ordering: 1.00, info_gain: 0.60, efficiency: 0.85, novelty: 0.75, penalty: 0.0 },
+     total: 0.58,
+     discovery: { title: 'SPP1 in ECM/TGF-beta/PI3K pathways', detail: 'Core fibrosis signaling axis confirmed', color: 'var(--purple)', bg: 'rgba(167,139,250,0.10)' },
+   },
+   {
+     action: 'marker_selection', params: 'candidates=["SPP1","MMP7","COL1A1"]', category: 'comp',
+     budget: 66200, budgetPct: 66.2, time: 135, timePct: 75.0,
+     output: ['Marker ranking by discriminative power:',' 1. SPP1 - AUROC: 0.94, specificity: 0.89',' 2. MMP7 - AUROC: 0.87, specificity: 0.82',' 3. COL1A1 - AUROC: 0.81, specificity: 0.76','SPP1 selected as primary biomarker candidate'],
+     reward: { validity: 0.90, ordering: 1.00, info_gain: 0.50, efficiency: 0.88, novelty: 0.70, penalty: 0.0 },
+     total: 0.55,
+   },
+   {
+     action: 'validate_marker', params: 'gene="SPP1", method="cross_validation"', category: 'comp',
+     budget: 65200, budgetPct: 65.2, time: 130, timePct: 72.2,
+     output: ['SPP1 Biomarker Validation Report:',' 5-fold CV AUROC: 0.91 (+/- 0.03)',' Sensitivity: 0.88',' Specificity: 0.87',' Positive LR: 6.77',' Expression in Aberrant Basaloid: 94.2% of cells',' Status: VALIDATED as IPF biomarker'],
+     reward: { validity: 0.95, ordering: 1.00, info_gain: 0.72, efficiency: 0.82, novelty: 0.85, penalty: 0.0 },
+     total: 0.76,
+     discovery: { title: 'SPP1 validated as IPF biomarker', detail: 'AUROC=0.91, specificity=0.87', color: 'var(--green)', bg: 'var(--green-dim)' },
+   },
+   {
+     action: 'synthesize_conclusion', params: 'confidence=0.85', category: 'meta',
+     budget: 65000, budgetPct: 65.0, time: 129, timePct: 71.7,
+     output: ['CONCLUSION (confidence: 0.85):','','SPP1 is a validated biomarker for IPF with strong','discriminative power (AUROC=0.91). It is upregulated','3.42-fold in IPF lungs, concentrated in Aberrant Basaloid','cells (94.2%), and participates in ECM-receptor, TGF-beta,','and PI3K-Akt signaling pathways.','','Literature match: 4/5 expected findings confirmed','Calibration: Well-calibrated (no overconfidence penalty)'],
+     reward: { validity: 1.00, ordering: 1.00, info_gain: 0.40, efficiency: 0.90, novelty: 0.50, penalty: 0.0 },
+     total: 0.91, terminal: true,
+   },
+ ];
+
+ // State
+ let running = false;
+ let cumReward = 0;
+
+ // DOM refs
+ const terminalEl = document.getElementById('terminal');
+ const statusDot = document.getElementById('statusDot');
+ const statusText = document.getElementById('statusText');
+ const runBtn = document.getElementById('runBtn');
+ const labActionLabel = document.getElementById('labActionLabel');
+
+ // Helpers
+ function addLine(html) {
+   const div = document.createElement('div');
+   div.className = 't-line';
+   div.innerHTML = html || '&nbsp;';
+   terminalEl.appendChild(div);
+   terminalEl.scrollTop = terminalEl.scrollHeight;
+ }
+
+ function setGauge(id, value, pct, color) {
+   document.getElementById(id + 'Val').textContent = value;
+   const fill = document.getElementById(id + 'Fill');
+   fill.style.width = pct + '%';
+   if (color) fill.style.background = color;
+ }
+
+ function setRewardBars(r) {
+   for (const key of ['validity','ordering','info_gain','efficiency','novelty','penalty']) {
+     const el = document.getElementById('rw-' + key);
+     el.style.width = (r[key] * 100) + '%';
+     el.textContent = r[key] > 0.01 ? r[key].toFixed(2) : '';
+   }
+ }
+
+ function clearRewardBars() {
+   for (const key of ['validity','ordering','info_gain','efficiency','novelty','penalty']) {
+     const el = document.getElementById('rw-' + key);
+     el.style.width = '0%';
+     el.textContent = '';
+   }
+ }
+
+ function addPipeStep(step, index) {
+   const el = document.createElement('div');
+   el.className = 'pipe-step';
+   el.id = 'pipe-' + index;
+   const catColor = step.category === 'wet' ? 'var(--green)' : step.category === 'comp' ? 'var(--accent)' : 'var(--pink)';
+   el.innerHTML = `<div class="step-icon" style="color:${catColor};border-color:${catColor};">${index + 1}</div><span>${step.action}</span>`;
+   document.getElementById('pipelineSteps').appendChild(el);
+   requestAnimationFrame(() => el.classList.add('visible'));
+   return el;
+ }
+
+ function addDiscovery(d) {
+   const c = document.getElementById('discoveries');
+   if (c.querySelector('.empty-state')) c.innerHTML = '';
+   const el = document.createElement('div');
+   el.className = 'discovery';
+   el.innerHTML = `<div class="disc-icon" style="background:${d.bg};color:${d.color};">&#9670;</div><div class="disc-body"><div class="disc-title">${d.title}</div><div class="disc-detail">${d.detail}</div></div>`;
+   c.appendChild(el);
+   requestAnimationFrame(() => el.classList.add('visible'));
+ }
+
+ function addRewardHistory(step, index) {
+   const c = document.getElementById('rewardHistory');
+   if (c.querySelector('.empty-state')) c.innerHTML = '';
+   const el = document.createElement('div');
+   el.className = 'step-reward-mini';
+   el.innerHTML = `<span class="srm-name">${index + 1}. ${step.action}</span><span class="srm-val ${step.total >= 0 ? 'pos' : 'neg'}">${step.total >= 0 ? '+' : ''}${step.total.toFixed(2)}</span>`;
+   c.appendChild(el);
+   requestAnimationFrame(() => el.classList.add('visible'));
+ }
+
+ function selectScenario(el) {
+   if (running) return;
+   document.querySelectorAll('.scenario-opt').forEach(e => e.classList.remove('active'));
+   el.classList.add('active');
+ }
+
+ function wait(ms) { return new Promise(r => setTimeout(r, ms)); }
+
+ // ---- Run ----
+ async function startDemo() {
+   if (running) return;
+   running = true;
+   runBtn.disabled = true;
+   runBtn.textContent = 'Running...';
+   statusDot.classList.add('live');
+   statusText.textContent = 'Running';
+   terminalEl.innerHTML = '';
+   cumReward = 0;
+   document.getElementById('pipelineSteps').innerHTML = '';
+   document.getElementById('discoveries').innerHTML = '<div class="empty-state">No discoveries yet</div>';
+   document.getElementById('rewardHistory').innerHTML = '<div class="empty-state">No steps yet</div>';
+   document.getElementById('violations').innerHTML = '<div class="empty-state">No violations</div>';
+   clearRewardBars();
+   document.getElementById('cumReward').textContent = '0.00';
+   document.getElementById('stepRewardLabel').textContent = '--';
+   initAgent();
+
+   addLine('<span class="t-label">[BioEnv]</span> <span class="t-dim">Initializing environment...</span>');
+   await wait(500);
+   addLine('<span class="t-label">[BioEnv]</span> Scenario: <span class="t-str">biomarker_validation_lung</span> (Hard)');
+   await wait(200);
+   addLine('<span class="t-label">[BioEnv]</span> Organism: <span class="t-str">Homo sapiens</span> | Tissue: <span class="t-str">Lung</span>');
+   await wait(200);
+   addLine('<span class="t-label">[BioEnv]</span> Budget: <span class="t-num">$100,000</span> | Time: <span class="t-num">180 days</span> | Max steps: <span class="t-num">30</span>');
+   await wait(200);
+   addLine('<span class="t-label">[BioEnv]</span> Task: Validate <span class="t-kw">SPP1</span> as biomarker for idiopathic pulmonary fibrosis');
+   await wait(400);
+   addLine('');
+
+   for (let i = 0; i < EPISODE.length; i++) {
+     await runStep(i);
+     await wait(500);
+   }
+
+   // Done
+   moveAgentTo('idle');
+   labActionLabel.classList.remove('visible');
+   addLine('');
+   addLine('<span class="t-label">[BioEnv]</span> <span class="t-ok">Episode complete!</span>');
+   addLine('<span class="t-label">[BioEnv]</span> Total reward: <span class="t-ok">+' + cumReward.toFixed(2) + '</span> | Steps: <span class="t-num">' + EPISODE.length + '</span> | Budget remaining: <span class="t-num">$65,000</span>');
+   addLine('<span class="t-label">[BioEnv]</span> Literature match: <span class="t-ok">4/5 expected findings confirmed</span>');
+   addLine('<span class="t-label">[BioEnv]</span> Calibration: <span class="t-ok">Well-calibrated</span> (no overconfidence penalty)');
+
+   statusDot.classList.remove('live');
+   statusText.textContent = 'Complete';
+   runBtn.textContent = 'Run Episode';
+   runBtn.disabled = false;
+   running = false;
+ }
+
+ async function runStep(i) {
+   const step = EPISODE[i];
+   const station = ACTION_STATION[step.action] || 'computer';
+
+   // Move agent in lab
+   moveAgentTo(station);
+   labActionLabel.textContent = step.action + '()';
+   labActionLabel.classList.add('visible');
+   await wait(800); // wait for agent to travel
+
+   // Start working animation
+   setAgentWorking(step.action);
+   spawnParticles(agent.targetX, agent.targetY, STATIONS[station].color);
+
+   // Pipeline sidebar
+   const pipeEl = addPipeStep(step, i);
+   if (i > 0) {
+     const prev = document.getElementById('pipe-' + (i - 1));
+     prev.classList.remove('active');
+     prev.classList.add('done');
+     prev.querySelector('.step-icon').innerHTML = '&#10003;';
+   }
+   pipeEl.classList.add('active');
+
+   // Gauges
+   setGauge('budget', '$' + step.budget.toLocaleString(), step.budgetPct,
+     step.budgetPct > 50 ? 'var(--green)' : step.budgetPct > 25 ? 'var(--amber)' : 'var(--red)');
+   setGauge('time', step.time + ' / 180 days', step.timePct, 'var(--cyan)');
+   setGauge('step', (i + 1) + ' / 30', ((i + 1) / 30 * 100), 'var(--accent)');
+
+   // Terminal output
+   const catTag = step.category === 'wet' ? '<span class="t-ok">WET</span>'
+     : step.category === 'comp' ? '<span class="t-label">CMP</span>'
+     : '<span class="t-kw">META</span>';
+   addLine(`<span class="t-dim">Step ${i + 1}</span> ${catTag} <span class="t-fn">${step.action}</span>(<span class="t-str">${step.params}</span>)`);
+   await wait(300);
+
+   for (const line of step.output) {
+     addLine('  <span class="t-sub">' + line + '</span>');
+     await wait(80);
+   }
+
+   // Reward
+   cumReward += step.total;
+   document.getElementById('stepRewardLabel').textContent = 'Step ' + (i + 1) + ': ' + step.action;
+   setRewardBars(step.reward);
+   document.getElementById('cumReward').textContent = cumReward.toFixed(2);
+   addRewardHistory(step, i);
+
+   const rewardStr = step.total >= 0
+     ? '<span class="t-ok">+' + step.total.toFixed(2) + '</span>'
+     : '<span class="t-err">' + step.total.toFixed(2) + '</span>';
+   addLine(`  <span class="t-dim">reward: ${rewardStr} <span class="t-dim">(cumulative: ${cumReward.toFixed(2)})</span></span>`);
+   addLine('');
+
+   if (step.discovery) addDiscovery(step.discovery);
+
+   // Done working
+   agent.working = false;
+   spawnParticles(agent.targetX, agent.targetY, '#34d399', 6);
+
+   if (step.terminal) {
+     pipeEl.classList.remove('active');
+     pipeEl.classList.add('done');
+     pipeEl.querySelector('.step-icon').innerHTML = '&#10003;';
+   }
+ }
+
+ function resetDemo() {
+   if (running) return;
+   terminalEl.innerHTML = '';
+   cumReward = 0;
+   document.getElementById('pipelineSteps').innerHTML = '';
+   document.getElementById('discoveries').innerHTML = '<div class="empty-state">No discoveries yet</div>';
+   document.getElementById('rewardHistory').innerHTML = '<div class="empty-state">No steps yet</div>';
+   document.getElementById('violations').innerHTML = '<div class="empty-state">No violations</div>';
+   clearRewardBars();
+   document.getElementById('cumReward').textContent = '0.00';
+   document.getElementById('stepRewardLabel').textContent = '--';
+   setGauge('budget', '$100,000', 100, 'var(--green)');
+   setGauge('time', '180 / 180 days', 100, 'var(--cyan)');
+   setGauge('step', '0 / 30', 0, 'var(--accent)');
+   statusDot.classList.remove('live');
+   statusText.textContent = 'Ready';
+   labActionLabel.classList.remove('visible');
+   initAgent();
+   addLine('<span class="t-dim">Environment reset. Click "Run Episode" to start.</span>');
+ }
+
+ // Init
+ addLine('<span class="t-dim">BioEnv v1.0 | biomarker_validation_lung</span>');
+ addLine('<span class="t-dim">Click "Run Episode" to start the demo.</span>');
+ </script>
+ </body>
+ </html>
models.py ADDED
@@ -0,0 +1,927 @@
+ """
+ Data models for the Drug Target Validation RL Environment.
+
+ Defines the POMDP action and observation contracts for an agent that acts
+ as a computational pharma scientist. Given a proposed drug target and a
+ disease context, the agent issues bioinformatics / clinical / experimental
+ queries one at a time and finally submits a go / no-go validation report.
+ """
+
+ from __future__ import annotations
+
+ from enum import Enum
+ from typing import Any, Dict, List, Optional
+
+ from pydantic import BaseModel, Field
+
+ from openenv.core.env_server.types import Action, Observation
+
+
+ # ── Action vocabulary ───────────────────────────────────────────────────────
+
+
+ class ActionType(str, Enum):
+     # Expression & Omics
+     QUERY_EXPRESSION = "query_expression"
+     DIFFERENTIAL_EXPRESSION = "differential_expression"
+     PATHWAY_ENRICHMENT = "pathway_enrichment"
+     COEXPRESSION_NETWORK = "coexpression_network"
+
+     # Protein & Structure
+     PROTEIN_STRUCTURE_LOOKUP = "protein_structure_lookup"
+     BINDING_SITE_ANALYSIS = "binding_site_analysis"
+     PROTEIN_INTERACTION_NETWORK = "protein_interaction_network"
+     DRUGGABILITY_SCREEN = "druggability_screen"
+
+     # Clinical & Safety
+     CLINICAL_TRIAL_LOOKUP = "clinical_trial_lookup"
+     TOXICITY_PANEL = "toxicity_panel"
+     OFF_TARGET_SCREEN = "off_target_screen"
+     PATIENT_STRATIFICATION = "patient_stratification"
+
+     # Literature & Evidence
+     LITERATURE_SEARCH = "literature_search"
+     EVIDENCE_SYNTHESIS = "evidence_synthesis"
+     COMPETITOR_LANDSCAPE = "competitor_landscape"
+
+     # Experimental (expensive, consume more credits)
+     IN_VITRO_ASSAY = "in_vitro_assay"
+     IN_VIVO_MODEL = "in_vivo_model"
+     CRISPR_KNOCKOUT = "crispr_knockout"
+     BIOMARKER_CORRELATION = "biomarker_correlation"
+
+     # Meta
+     FLAG_RED_FLAG = "flag_red_flag"
+     REQUEST_EXPERT_REVIEW = "request_expert_review"
+     SUBMIT_VALIDATION_REPORT = "submit_validation_report"  # terminal action
+
+
+ OMICS_ACTIONS = frozenset({
+     ActionType.QUERY_EXPRESSION,
+     ActionType.DIFFERENTIAL_EXPRESSION,
+     ActionType.PATHWAY_ENRICHMENT,
+     ActionType.COEXPRESSION_NETWORK,
+ })
+
+ PROTEIN_ACTIONS = frozenset({
+     ActionType.PROTEIN_STRUCTURE_LOOKUP,
+     ActionType.BINDING_SITE_ANALYSIS,
+     ActionType.PROTEIN_INTERACTION_NETWORK,
+     ActionType.DRUGGABILITY_SCREEN,
+ })
+
+ CLINICAL_ACTIONS = frozenset({
+     ActionType.CLINICAL_TRIAL_LOOKUP,
+     ActionType.TOXICITY_PANEL,
+     ActionType.OFF_TARGET_SCREEN,
+     ActionType.PATIENT_STRATIFICATION,
+ })
+
+ LITERATURE_ACTIONS = frozenset({
+     ActionType.LITERATURE_SEARCH,
+     ActionType.EVIDENCE_SYNTHESIS,
+     ActionType.COMPETITOR_LANDSCAPE,
+ })
+
+ EXPERIMENTAL_ACTIONS = frozenset({
+     ActionType.IN_VITRO_ASSAY,
+     ActionType.IN_VIVO_MODEL,
+     ActionType.CRISPR_KNOCKOUT,
+     ActionType.BIOMARKER_CORRELATION,
+ })
+
+ META_ACTIONS = frozenset({
+     ActionType.FLAG_RED_FLAG,
+     ActionType.REQUEST_EXPERT_REVIEW,
+     ActionType.SUBMIT_VALIDATION_REPORT,
97
+ })
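The category sets above are meant to partition the full action vocabulary. A stdlib-only sanity check can catch drift when new actions are added to `ActionType` but not to a group; this sketch uses a trimmed-down stand-in enum rather than the full vocabulary, so the names are illustrative:

```python
from enum import Enum

# Trimmed-down stand-in for the full ActionType vocabulary (illustrative only).
class ActionType(str, Enum):
    QUERY_EXPRESSION = "query_expression"
    LITERATURE_SEARCH = "literature_search"
    IN_VITRO_ASSAY = "in_vitro_assay"
    SUBMIT_VALIDATION_REPORT = "submit_validation_report"

OMICS_ACTIONS = frozenset({ActionType.QUERY_EXPRESSION})
LITERATURE_ACTIONS = frozenset({ActionType.LITERATURE_SEARCH})
EXPERIMENTAL_ACTIONS = frozenset({ActionType.IN_VITRO_ASSAY})
META_ACTIONS = frozenset({ActionType.SUBMIT_VALIDATION_REPORT})

GROUPS = [OMICS_ACTIONS, LITERATURE_ACTIONS, EXPERIMENTAL_ACTIONS, META_ACTIONS]

# Every action belongs to exactly one group: the union covers the enum
# and the group sizes sum to the enum size (so no overlaps).
union = frozenset().union(*GROUPS)
assert union == frozenset(ActionType)
assert sum(len(g) for g in GROUPS) == len(ActionType)
```

Running this as a unit test alongside the module keeps the groups and the enum from drifting apart silently.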
98
+
99
+
100
+ # ── Tool registry (pharma / bioinformatics) ─────────────────────────────────
101
+
102
+
103
+ class ToolCategory(str, Enum):
104
+ EXPRESSION_DB = "expression_db"
105
+ OMICS_ANALYSIS = "omics_analysis"
106
+ PATHWAY_DB = "pathway_db"
107
+ PROTEIN_STRUCTURE = "protein_structure"
108
+ BINDING_SITE = "binding_site"
109
+ INTERACTION_NETWORK = "interaction_network"
110
+ DRUGGABILITY = "druggability"
111
+ CLINICAL_DB = "clinical_db"
112
+ SAFETY_DB = "safety_db"
113
+ OFF_TARGET = "off_target"
114
+ LITERATURE = "literature"
115
+ PATIENT_GENOMICS = "patient_genomics"
116
+ IN_VITRO = "in_vitro"
117
+ IN_VIVO = "in_vivo"
118
+ CRISPR = "crispr"
119
+ BIOMARKER = "biomarker"
120
+
121
+
122
+ class ToolSpec(BaseModel):
123
+ """Registry entry describing a pharma / bioinformatics tool or database."""
124
+
125
+ name: str
126
+ category: ToolCategory
127
+ relevant_actions: List[ActionType] = Field(default_factory=list)
128
+ description: str = ""
129
+ input_types: List[str] = Field(default_factory=list)
130
+ output_types: List[str] = Field(default_factory=list)
131
+ typical_runtime_hours: float = 0.1
132
+ typical_credit_cost: int = 1
133
+ requires_compute: bool = False
134
+ open_source: bool = True
135
+
136
+
137
+ TOOL_REGISTRY: Dict[str, ToolSpec] = {
138
+ # ── Expression & omics databases ──
139
+ "GTEx": ToolSpec(
140
+ name="GTEx",
141
+ category=ToolCategory.EXPRESSION_DB,
142
+ relevant_actions=[ActionType.QUERY_EXPRESSION],
143
+ description="Tissue-level expression atlas across normal human tissues",
144
+ input_types=["gene_symbol"],
145
+ output_types=["tissue_expression"],
146
+ typical_credit_cost=2,
147
+ ),
148
+ "TCGA": ToolSpec(
149
+ name="TCGA",
150
+ category=ToolCategory.EXPRESSION_DB,
151
+ relevant_actions=[
152
+ ActionType.QUERY_EXPRESSION,
153
+ ActionType.DIFFERENTIAL_EXPRESSION,
154
+ ActionType.BIOMARKER_CORRELATION,
155
+ ],
156
+ description="The Cancer Genome Atlas tumor vs normal expression / mutation",
157
+ input_types=["gene_symbol", "indication"],
158
+ output_types=["tumor_expression", "mutation_frequency"],
159
+ typical_credit_cost=2,
160
+ ),
161
+ "Human_Protein_Atlas": ToolSpec(
162
+ name="Human_Protein_Atlas",
163
+ category=ToolCategory.EXPRESSION_DB,
164
+ relevant_actions=[ActionType.QUERY_EXPRESSION],
165
+ description="Antibody-based protein expression across normal and cancer tissues",
166
+ input_types=["gene_symbol"],
167
+ output_types=["protein_expression", "tissue_specificity"],
168
+ ),
169
+ "DepMap": ToolSpec(
170
+ name="DepMap",
171
+ category=ToolCategory.OMICS_ANALYSIS,
172
+ relevant_actions=[
173
+ ActionType.CRISPR_KNOCKOUT,
174
+ ActionType.COEXPRESSION_NETWORK,
175
+ ],
176
+ description="Cancer Dependency Map: genome-scale CRISPR essentiality scores",
177
+ input_types=["gene_symbol", "cell_line_panel"],
178
+ output_types=["essentiality_score", "synthetic_lethality"],
179
+ typical_credit_cost=4,
180
+ ),
181
+ "ARCHS4": ToolSpec(
182
+ name="ARCHS4",
183
+ category=ToolCategory.OMICS_ANALYSIS,
184
+ relevant_actions=[
185
+ ActionType.COEXPRESSION_NETWORK,
186
+ ActionType.QUERY_EXPRESSION,
187
+ ],
188
+ description="Massive RNA-seq compendium for coexpression and tissue queries",
189
+ input_types=["gene_symbol"],
190
+ output_types=["coexpression_partners"],
191
+ ),
192
+ "GEO": ToolSpec(
193
+ name="GEO",
194
+ category=ToolCategory.OMICS_ANALYSIS,
195
+ relevant_actions=[
196
+ ActionType.DIFFERENTIAL_EXPRESSION,
197
+ ActionType.QUERY_EXPRESSION,
198
+ ],
199
+ description="Gene Expression Omnibus: curated bulk and single-cell datasets",
200
+ input_types=["gene_symbol", "indication"],
201
+ output_types=["de_result"],
202
+ ),
203
+ # ── Pathway / annotation databases ──
204
+ "Reactome": ToolSpec(
205
+ name="Reactome",
206
+ category=ToolCategory.PATHWAY_DB,
207
+ relevant_actions=[ActionType.PATHWAY_ENRICHMENT],
208
+ description="Curated human pathway and reaction database",
209
+ input_types=["gene_list"],
210
+ output_types=["pathway_enrichment"],
211
+ ),
212
+ "KEGG": ToolSpec(
213
+ name="KEGG",
214
+ category=ToolCategory.PATHWAY_DB,
215
+ relevant_actions=[ActionType.PATHWAY_ENRICHMENT],
216
+ description="KEGG metabolic and signalling pathways",
217
+ input_types=["gene_list"],
218
+ output_types=["pathway_enrichment"],
219
+ ),
220
+ "MSigDB": ToolSpec(
221
+ name="MSigDB",
222
+ category=ToolCategory.PATHWAY_DB,
223
+ relevant_actions=[ActionType.PATHWAY_ENRICHMENT],
224
+ description="Molecular Signatures Database for GSEA",
225
+ input_types=["ranked_gene_list"],
226
+ output_types=["pathway_enrichment"],
227
+ ),
228
+ # ── Protein structure / binding-site tools ──
229
+ "AlphaFold": ToolSpec(
230
+ name="AlphaFold",
231
+ category=ToolCategory.PROTEIN_STRUCTURE,
232
+ relevant_actions=[
233
+ ActionType.PROTEIN_STRUCTURE_LOOKUP,
234
+ ActionType.BINDING_SITE_ANALYSIS,
235
+ ],
236
+ description="Predicted full-length 3D protein structures",
237
+ input_types=["uniprot_id", "gene_symbol"],
238
+ output_types=["pdb_structure", "plddt_confidence"],
239
+ typical_credit_cost=3,
240
+ ),
241
+ "PDB": ToolSpec(
242
+ name="PDB",
243
+ category=ToolCategory.PROTEIN_STRUCTURE,
244
+ relevant_actions=[ActionType.PROTEIN_STRUCTURE_LOOKUP],
245
+ description="Experimentally determined protein structures",
246
+ input_types=["uniprot_id"],
247
+ output_types=["pdb_structure"],
248
+ ),
249
+ "UniProt": ToolSpec(
250
+ name="UniProt",
251
+ category=ToolCategory.PROTEIN_STRUCTURE,
252
+ relevant_actions=[
253
+ ActionType.PROTEIN_STRUCTURE_LOOKUP,
254
+ ActionType.PROTEIN_INTERACTION_NETWORK,
255
+ ],
256
+ description="Curated protein sequence and functional annotation",
257
+ input_types=["gene_symbol"],
258
+ output_types=["uniprot_entry", "domain_annotation"],
259
+ ),
260
+ "fpocket": ToolSpec(
261
+ name="fpocket",
262
+ category=ToolCategory.BINDING_SITE,
263
+ relevant_actions=[ActionType.BINDING_SITE_ANALYSIS],
264
+ description="Geometric pocket detection on protein structures",
265
+ input_types=["pdb_structure"],
266
+ output_types=["pocket_list", "druggability_score"],
267
+ requires_compute=True,
268
+ ),
269
+ "SiteMap": ToolSpec(
270
+ name="SiteMap",
271
+ category=ToolCategory.BINDING_SITE,
272
+ relevant_actions=[ActionType.BINDING_SITE_ANALYSIS],
273
+ description="Schrödinger binding-site detection and scoring",
274
+ input_types=["pdb_structure"],
275
+ output_types=["pocket_list", "site_score"],
276
+ open_source=False,
277
+ typical_credit_cost=3,
278
+ ),
279
+ # ── Druggability / chemistry ──
280
+ "ChEMBL": ToolSpec(
281
+ name="ChEMBL",
282
+ category=ToolCategory.DRUGGABILITY,
283
+ relevant_actions=[
284
+ ActionType.DRUGGABILITY_SCREEN,
285
+ ActionType.COMPETITOR_LANDSCAPE,
286
+ ],
287
+ description="Bioactivity database of drug-like molecules vs targets",
288
+ input_types=["gene_symbol", "uniprot_id"],
289
+ output_types=["bioactivity", "known_ligands"],
290
+ typical_credit_cost=3,
291
+ ),
292
+ "DrugBank": ToolSpec(
293
+ name="DrugBank",
294
+ category=ToolCategory.DRUGGABILITY,
295
+ relevant_actions=[
296
+ ActionType.DRUGGABILITY_SCREEN,
297
+ ActionType.COMPETITOR_LANDSCAPE,
298
+ ],
299
+ description="Comprehensive drug and target reference",
300
+ input_types=["gene_symbol"],
301
+ output_types=["approved_drugs", "drug_target_pairs"],
302
+ ),
303
+ "OpenTargets": ToolSpec(
304
+ name="OpenTargets",
305
+ category=ToolCategory.DRUGGABILITY,
306
+ relevant_actions=[
307
+ ActionType.DRUGGABILITY_SCREEN,
308
+ ActionType.EVIDENCE_SYNTHESIS,
309
+ ],
310
+ description="Integrated target-disease evidence platform",
311
+ input_types=["gene_symbol", "indication"],
312
+ output_types=["target_score", "evidence_summary"],
313
+ ),
314
+ "canSAR": ToolSpec(
315
+ name="canSAR",
316
+ category=ToolCategory.DRUGGABILITY,
317
+ relevant_actions=[ActionType.DRUGGABILITY_SCREEN],
318
+ description="Cancer translational research and drug discovery knowledgebase",
319
+ input_types=["gene_symbol"],
320
+ output_types=["druggability_score", "ligandability"],
321
+ ),
322
+ # ── Interaction networks ──
323
+ "STRING": ToolSpec(
324
+ name="STRING",
325
+ category=ToolCategory.INTERACTION_NETWORK,
326
+ relevant_actions=[
327
+ ActionType.PROTEIN_INTERACTION_NETWORK,
328
+ ActionType.COEXPRESSION_NETWORK,
329
+ ],
330
+ description="Protein-protein interaction database with confidence scores",
331
+ input_types=["gene_symbol"],
332
+ output_types=["ppi_network"],
333
+ ),
334
+ "BioGRID": ToolSpec(
335
+ name="BioGRID",
336
+ category=ToolCategory.INTERACTION_NETWORK,
337
+ relevant_actions=[ActionType.PROTEIN_INTERACTION_NETWORK],
338
+ description="Curated genetic and protein-protein interactions",
339
+ input_types=["gene_symbol"],
340
+ output_types=["ppi_network", "genetic_interactions"],
341
+ ),
342
+ # ── Clinical & safety ──
343
+ "ClinicalTrials_gov": ToolSpec(
344
+ name="ClinicalTrials_gov",
345
+ category=ToolCategory.CLINICAL_DB,
346
+ relevant_actions=[
347
+ ActionType.CLINICAL_TRIAL_LOOKUP,
348
+ ActionType.COMPETITOR_LANDSCAPE,
349
+ ],
350
+ description="Registry of human clinical trials worldwide",
351
+ input_types=["gene_symbol", "indication"],
352
+ output_types=["trial_list", "phase_status"],
353
+ ),
354
+ "FAERS": ToolSpec(
355
+ name="FAERS",
356
+ category=ToolCategory.SAFETY_DB,
357
+ relevant_actions=[ActionType.TOXICITY_PANEL],
358
+ description="FDA Adverse Event Reporting System",
359
+ input_types=["drug_name", "gene_symbol"],
360
+ output_types=["adverse_events"],
361
+ ),
362
+ "ToxCast": ToolSpec(
363
+ name="ToxCast",
364
+ category=ToolCategory.SAFETY_DB,
365
+ relevant_actions=[ActionType.TOXICITY_PANEL],
366
+ description="EPA high-throughput toxicology assays",
367
+ input_types=["compound", "gene_symbol"],
368
+ output_types=["toxicity_assays"],
369
+ typical_credit_cost=3,
370
+ ),
371
+ "gnomAD": ToolSpec(
372
+ name="gnomAD",
373
+ category=ToolCategory.PATIENT_GENOMICS,
374
+ relevant_actions=[
375
+ ActionType.PATIENT_STRATIFICATION,
376
+ ActionType.OFF_TARGET_SCREEN,
377
+ ],
378
+ description="Population variant frequencies and constraint metrics",
379
+ input_types=["gene_symbol"],
380
+ output_types=["pLI_score", "loftool_score"],
381
+ ),
382
+ "ClinVar": ToolSpec(
383
+ name="ClinVar",
384
+ category=ToolCategory.PATIENT_GENOMICS,
385
+ relevant_actions=[ActionType.PATIENT_STRATIFICATION],
386
+ description="Clinically interpreted germline and somatic variants",
387
+ input_types=["gene_symbol"],
388
+ output_types=["pathogenic_variants"],
389
+ ),
390
+ # ── Off-target / selectivity ──
391
+ "Eurofins_DiscoverX": ToolSpec(
392
+ name="Eurofins_DiscoverX",
393
+ category=ToolCategory.OFF_TARGET,
394
+ relevant_actions=[ActionType.OFF_TARGET_SCREEN],
395
+ description="Kinome-wide selectivity profiling panels",
396
+ input_types=["compound"],
397
+ output_types=["kinase_selectivity"],
398
+ open_source=False,
399
+ typical_credit_cost=3,
400
+ ),
401
+ "SafetyPanel": ToolSpec(
402
+ name="SafetyPanel",
403
+ category=ToolCategory.OFF_TARGET,
404
+ relevant_actions=[
405
+ ActionType.OFF_TARGET_SCREEN,
406
+ ActionType.TOXICITY_PANEL,
407
+ ],
408
+ description="Standard secondary pharmacology / off-target assay panel",
409
+ input_types=["compound"],
410
+ output_types=["off_target_hits"],
411
+ typical_credit_cost=3,
412
+ ),
413
+ # ── Literature ──
414
+ "PubMed": ToolSpec(
415
+ name="PubMed",
416
+ category=ToolCategory.LITERATURE,
417
+ relevant_actions=[
418
+ ActionType.LITERATURE_SEARCH,
419
+ ActionType.EVIDENCE_SYNTHESIS,
420
+ ],
421
+ description="Biomedical literature database",
422
+ input_types=["query"],
423
+ output_types=["abstract_list"],
424
+ typical_credit_cost=1,
425
+ ),
426
+ "Europe_PMC": ToolSpec(
427
+ name="Europe_PMC",
428
+ category=ToolCategory.LITERATURE,
429
+ relevant_actions=[ActionType.LITERATURE_SEARCH],
430
+ description="Open biomedical literature search with full-text mining",
431
+ input_types=["query"],
432
+ output_types=["abstract_list", "fulltext_excerpts"],
433
+ ),
434
+ # ── Experimental wet-lab ──
435
+ "InVitroPanel": ToolSpec(
436
+ name="InVitroPanel",
437
+ category=ToolCategory.IN_VITRO,
438
+ relevant_actions=[
439
+ ActionType.IN_VITRO_ASSAY,
440
+ ActionType.BIOMARKER_CORRELATION,
441
+ ],
442
+ description="Cell-line viability / IC50 panel against the proposed target",
443
+ input_types=["compound", "cell_line_panel"],
444
+ output_types=["IC50", "selectivity_window"],
445
+ typical_runtime_hours=72.0,
446
+ typical_credit_cost=5,
447
+ requires_compute=False,
448
+ ),
449
+ "MouseModel": ToolSpec(
450
+ name="MouseModel",
451
+ category=ToolCategory.IN_VIVO,
452
+ relevant_actions=[ActionType.IN_VIVO_MODEL],
453
+ description="In-vivo efficacy + tolerability in disease-relevant mouse models",
454
+ input_types=["compound", "indication"],
455
+ output_types=["efficacy_endpoint", "tolerability", "PK_PD"],
456
+ typical_runtime_hours=720.0,
457
+ typical_credit_cost=8,
458
+ ),
459
+ "CRISPR_screen": ToolSpec(
460
+ name="CRISPR_screen",
461
+ category=ToolCategory.CRISPR,
462
+ relevant_actions=[ActionType.CRISPR_KNOCKOUT],
463
+ description="Genome- or focused-library CRISPR knockout / dependency screen",
464
+ input_types=["gene_symbol", "cell_line_panel"],
465
+ output_types=["essentiality_score", "synthetic_lethality"],
466
+ typical_credit_cost=4,
467
+ ),
468
+ "BiomarkerPanel": ToolSpec(
469
+ name="BiomarkerPanel",
470
+ category=ToolCategory.BIOMARKER,
471
+ relevant_actions=[
472
+ ActionType.BIOMARKER_CORRELATION,
473
+ ActionType.PATIENT_STRATIFICATION,
474
+ ],
475
+ description="Patient-derived biomarker correlation with target activity",
476
+ input_types=["gene_symbol", "patient_cohort"],
477
+ output_types=["biomarker_correlation"],
478
+ typical_credit_cost=3,
479
+ ),
480
+ }
481
+
482
+
483
+ # ── Registry helper functions ──────────────────────────────────────────────
484
+
485
+
486
+ def tools_by_category(category: ToolCategory) -> List[ToolSpec]:
487
+ """Return all registered tools in a given category."""
488
+ return [t for t in TOOL_REGISTRY.values() if t.category == category]
489
+
490
+
491
+ def tools_for_action(action_type: ActionType) -> List[ToolSpec]:
492
+ """Return all registered tools that are relevant for a given action type."""
493
+ return [t for t in TOOL_REGISTRY.values() if action_type in t.relevant_actions]
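The lookup pattern behind these helpers can be sketched independently of pydantic with a dataclass stand-in and a two-entry registry; the entries here are a small illustrative subset, not the full `TOOL_REGISTRY`:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToolSpec:
    name: str
    category: str
    relevant_actions: List[str] = field(default_factory=list)

TOOL_REGISTRY = {
    "GTEx": ToolSpec("GTEx", "expression_db", ["query_expression"]),
    "PubMed": ToolSpec("PubMed", "literature",
                       ["literature_search", "evidence_synthesis"]),
}

def tools_for_action(action: str) -> List[ToolSpec]:
    # A linear scan is fine at this registry size (a few dozen tools).
    return [t for t in TOOL_REGISTRY.values() if action in t.relevant_actions]

hits = tools_for_action("literature_search")
assert [t.name for t in hits] == ["PubMed"]
```

Because the registry is a plain dict keyed by tool name, point lookups stay O(1) while the action-indexed queries scan all values; an inverted index would only pay off with a much larger registry.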
494
+
495
+
496
+ # ── Action schema ───────────────────────────────────────────────────────────
497
+
498
+
499
+ class DrugTargetAction(Action):
500
+ """Structured action for one drug-target-validation step.
501
+
502
+ Hybrid representation: a discrete ``action_type`` plus typed
503
+ ``parameters``, an optional free-text ``reasoning`` string, and the
504
+ terminal-only ``final_decision`` / ``confidence`` fields used when the
505
+ agent submits its validation report.
506
+ """
507
+
508
+ action_type: ActionType = Field(
509
+ ...,
510
+ description=(
511
+ "Discrete simulator step type. Each action type maps to a "
512
+ "specific class of pharma / bioinformatics query, in-vitro / "
513
+ "in-vivo experiment, or terminal report submission."
514
+ ),
515
+ )
516
+ parameters: Dict[str, Any] = Field(
517
+ default_factory=dict,
518
+ description=(
519
+ "Action-specific arguments such as the database to query, the "
520
+ "compound to profile, or include_allosteric flags. Use only "
521
+ "parameters that materially change the simulated output."
522
+ ),
523
+ )
524
+ reasoning: str = Field(
525
+ "",
526
+ description=(
527
+ "Short scientific rationale explaining why this is the right "
528
+ "next step in the current investigation."
529
+ ),
530
+ )
531
+ final_decision: Optional[str] = Field(
532
+ None,
533
+ description=(
534
+ "'go' or 'no_go' recommendation. Only set on a "
535
+ "SUBMIT_VALIDATION_REPORT action."
536
+ ),
537
+ )
538
+ confidence: Optional[float] = Field(
539
+ None,
540
+ ge=0.0,
541
+ le=1.0,
542
+ description=(
543
+ "Calibrated confidence in the final decision in [0, 1]. Only "
544
+ "set on a SUBMIT_VALIDATION_REPORT action."
545
+ ),
546
+ )
547
+
548
+
549
+ # ── Intermediate outputs ────────────────────────────────────────────────────
550
+
551
+
552
+ class OutputType(str, Enum):
553
+ EXPRESSION_RESULT = "expression_result"
554
+ DE_RESULT = "de_result"
555
+ PATHWAY_RESULT = "pathway_result"
556
+ COEXPRESSION_RESULT = "coexpression_result"
557
+ STRUCTURE_RESULT = "structure_result"
558
+ BINDING_SITE_RESULT = "binding_site_result"
559
+ INTERACTION_RESULT = "interaction_result"
560
+ DRUGGABILITY_RESULT = "druggability_result"
561
+ CLINICAL_RESULT = "clinical_result"
562
+ TOXICITY_RESULT = "toxicity_result"
563
+ OFF_TARGET_RESULT = "off_target_result"
564
+ PATIENT_STRATIFICATION_RESULT = "patient_stratification_result"
565
+ LITERATURE_RESULT = "literature_result"
566
+ EVIDENCE_SYNTHESIS_RESULT = "evidence_synthesis_result"
567
+ COMPETITOR_LANDSCAPE_RESULT = "competitor_landscape_result"
568
+ IN_VITRO_RESULT = "in_vitro_result"
569
+ IN_VIVO_RESULT = "in_vivo_result"
570
+ CRISPR_RESULT = "crispr_result"
571
+ BIOMARKER_RESULT = "biomarker_result"
572
+ RED_FLAG_NOTE = "red_flag_note"
573
+ EXPERT_REVIEW = "expert_review"
574
+ VALIDATION_REPORT = "validation_report"
575
+ FAILURE_REPORT = "failure_report"
576
+
577
+
578
+ class IntermediateOutput(BaseModel):
579
+ """A single simulated output from one validation step."""
580
+
581
+ output_type: OutputType
582
+ step_index: int
583
+ success: bool = True
584
+ quality_score: float = Field(1.0, ge=0.0, le=1.0)
585
+ summary: str = ""
586
+ data: Dict[str, Any] = Field(default_factory=dict)
587
+ uncertainty: float = Field(0.0, ge=0.0, le=1.0)
588
+ warnings: List[str] = Field(default_factory=list)
589
+ artifacts_available: List[str] = Field(default_factory=list)
590
+
591
+
592
+ # ── Observable state components ─────────────────────────────────────────────
593
+
594
+
595
+ class CreditUsage(BaseModel):
596
+ """Agent-visible view of the experimental credit budget."""
597
+
598
+ credits_used: int = 0
599
+ credits_remaining: int = 50
600
+ credits_total: int = 50
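The invariant this model implies — `credits_used + credits_remaining == credits_total` — is worth stating explicitly. A minimal dataclass sketch of the accounting (the `spend` helper is illustrative, not part of the environment API) looks like:

```python
from dataclasses import dataclass

@dataclass
class CreditUsage:
    credits_used: int = 0
    credits_remaining: int = 50
    credits_total: int = 50

    def spend(self, cost: int) -> bool:
        """Deduct a credit cost; return False when the budget cannot cover it."""
        if cost > self.credits_remaining:
            return False
        self.credits_used += cost
        self.credits_remaining -= cost
        return True

budget = CreditUsage()
assert budget.spend(8)   # e.g. an in-vivo model run
assert budget.spend(5)   # e.g. an in-vitro assay
assert budget.credits_remaining == 37
assert budget.credits_used + budget.credits_remaining == budget.credits_total
```

Keeping both `credits_used` and `credits_remaining` in the observation is redundant but agent-friendly: the model never has to do the subtraction itself.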
601
+
602
+
603
+ class ValidationStepRecord(BaseModel):
604
+ """One row of the agent's pipeline history."""
605
+
606
+ step_index: int
607
+ action_type: ActionType
608
+ parameters: Dict[str, Any] = Field(default_factory=dict)
609
+ output_summary: str = ""
610
+ output_type: OutputType
611
+ success: bool = True
612
+ quality_score: float = 1.0
613
+ credit_cost: int = 0
614
+
615
+
616
+ class EvidenceDossier(BaseModel):
617
+ """Structured running dossier of everything the agent has discovered.
618
+
619
+ Maintained on the environment side and surfaced verbatim inside each
620
+ ``ValidationObservation``. It is the primary state the agent should
621
+ consult when deciding what to investigate next.
622
+ """
623
+
624
+ expression_findings: Dict[str, Any] = Field(default_factory=dict)
625
+ protein_findings: Dict[str, Any] = Field(default_factory=dict)
626
+ clinical_findings: Dict[str, Any] = Field(default_factory=dict)
627
+ safety_findings: Dict[str, Any] = Field(default_factory=dict)
628
+ literature_findings: Dict[str, Any] = Field(default_factory=dict)
629
+ experimental_results: List[Dict[str, Any]] = Field(default_factory=list)
630
+ flagged_red_flags: List[str] = Field(default_factory=list)
631
+ credits_used: int = 0
632
+
633
+
634
+ class ValidationTaskSpec(BaseModel):
635
+ """Specification of the drug-target-validation problem to solve."""
636
+
637
+ problem_statement: str = "Unspecified drug target validation problem"
638
+ target_gene: str = "UNKNOWN"
639
+ disease_context: str = "unspecified disease"
640
+ indication: str = "unspecified indication"
641
+ credits_limit: int = 50
642
+ success_criteria: List[str] = Field(default_factory=list)
643
+ prior_observations: List[str] = Field(default_factory=list)
644
+ available_actions: List[str] = Field(
645
+ default_factory=lambda: [a.value for a in ActionType],
646
+ )
647
+ expected_findings: List[Any] = Field(default_factory=list)
648
+ dataset_metadata: Dict[str, Any] = Field(default_factory=dict)
649
+
650
+
651
+ # ── Observation schema ──────────────────────────────────────────────────────
652
+
653
+
654
+ class ValidationObservation(Observation):
655
+ """Full observable state returned to the agent at each timestep.
656
+
657
+ Deliberately excludes the hidden ``TargetProfile``, which the agent
658
+ must infer through investigation.
659
+ """
660
+
661
+ target_gene: str = "UNKNOWN"
662
+ disease_context: str = "unspecified disease"
663
+ indication: str = "unspecified indication"
664
+ credits_remaining: int = 50
665
+ credits_total: int = 50
666
+ dossier: EvidenceDossier = Field(default_factory=EvidenceDossier)
667
+ pipeline_history: List[Dict[str, Any]] = Field(default_factory=list)
668
+ available_actions: List[str] = Field(default_factory=list)
669
+ step_index: int = 0
670
+ done: bool = False
671
+ reward: float = 0.0
672
+ step_reward_breakdown: Dict[str, float] = Field(default_factory=dict)
673
+ rule_violations: List[str] = Field(default_factory=list)
674
+ latest_output: Optional[IntermediateOutput] = None
675
+ metadata: Dict[str, Any] = Field(default_factory=dict)
676
+
677
+
678
+ # ── Agent prompt scaffolding ────────────────────────────────────────────────
679
+
680
+
681
+ AGENT_ACTION_GUIDANCE: Dict[ActionType, str] = {
682
+ ActionType.QUERY_EXPRESSION: (
683
+ "Cheap expression lookup across normal and disease tissues. Run "
684
+ "early to gauge tissue specificity and disease over-expression."
685
+ ),
686
+ ActionType.DIFFERENTIAL_EXPRESSION: (
687
+ "Disease-vs-normal differential expression. Useful to confirm "
688
+ "disease-driven dysregulation of the target."
689
+ ),
690
+ ActionType.PATHWAY_ENRICHMENT: (
691
+ "Find pathways the target participates in. Best after expression / "
692
+ "DE so you have an informative gene context."
693
+ ),
694
+ ActionType.COEXPRESSION_NETWORK: (
695
+ "Identify functionally related genes. Useful for mechanism "
696
+ "hypotheses and synthetic-lethality candidates."
697
+ ),
698
+ ActionType.PROTEIN_STRUCTURE_LOOKUP: (
699
+ "Pull experimental or AlphaFold structures of the target."
700
+ ),
701
+ ActionType.BINDING_SITE_ANALYSIS: (
702
+ "Detect ligandable pockets. Pass include_allosteric=true for "
703
+ "non-classical sites."
704
+ ),
705
+ ActionType.PROTEIN_INTERACTION_NETWORK: (
706
+ "Map first-degree PPI partners. Useful for off-target reasoning."
707
+ ),
708
+ ActionType.DRUGGABILITY_SCREEN: (
709
+ "High-level druggability assessment. Critical for any go/no_go."
710
+ ),
711
+ ActionType.CLINICAL_TRIAL_LOOKUP: (
712
+ "Look up clinical precedent for this target / indication. Often "
713
+ "decisive for borderline scenarios."
714
+ ),
715
+ ActionType.TOXICITY_PANEL: (
716
+ "Probe target-mediated toxicity. Best after expression so on-target "
717
+ "tissue toxicity can be interpreted."
718
+ ),
719
+ ActionType.OFF_TARGET_SCREEN: (
720
+ "Quantify off-target / paralog selectivity. Always run when "
721
+ "selectivity is plausibly limiting."
722
+ ),
723
+ ActionType.PATIENT_STRATIFICATION: (
724
+ "Identify responder subpopulations and biomarker hypotheses."
725
+ ),
726
+ ActionType.LITERATURE_SEARCH: (
727
+ "Quick PubMed / Europe-PMC scan. Cheap to run and often surfaces "
728
+ "recent precedent that overrides historical priors."
729
+ ),
730
+ ActionType.EVIDENCE_SYNTHESIS: (
731
+ "Aggregate prior findings into a coherent picture. Best run after "
732
+ "several queries have populated the dossier."
733
+ ),
734
+ ActionType.COMPETITOR_LANDSCAPE: (
735
+ "Survey other programs against the same target. Useful for "
736
+ "differentiation strategy."
737
+ ),
738
+ ActionType.IN_VITRO_ASSAY: (
739
+ "Expensive cell-line assay (5 credits). Run after computational "
740
+ "evidence justifies wet-lab spend."
741
+ ),
742
+ ActionType.IN_VIVO_MODEL: (
743
+ "Most expensive action (8 credits). Should only follow positive "
744
+ "in-vitro signal."
745
+ ),
746
+ ActionType.CRISPR_KNOCKOUT: (
747
+ "Functional knockout / dependency check (4 credits)."
748
+ ),
749
+ ActionType.BIOMARKER_CORRELATION: (
750
+ "Correlate target activity with patient biomarkers (3 credits)."
751
+ ),
752
+ ActionType.FLAG_RED_FLAG: (
753
+ "Free annotation that records a concern in the dossier without "
754
+ "spending credits."
755
+ ),
756
+ ActionType.REQUEST_EXPERT_REVIEW: (
757
+ "Lightweight critique by a simulated reviewer. Use sparingly."
758
+ ),
759
+ ActionType.SUBMIT_VALIDATION_REPORT: (
760
+ "Terminal action. Must include final_decision ('go' / 'no_go') and "
761
+ "a calibrated confidence score; the episode ends immediately."
762
+ ),
763
+ }
764
+
765
+
766
+ AGENT_ENVIRONMENT_RULES: List[str] = [
767
+ (
768
+ "You start with a fixed pool of experimental credits; every action "
769
+ "deducts a known credit cost; exhausting the budget ends the episode."
770
+ ),
771
+ (
772
+ "Each successful action returns concrete pharma evidence, so "
773
+ "repeated queries of the same type are usually wasteful."
774
+ ),
775
+ (
776
+ "Some prerequisites apply: e.g. interpret toxicity in light of "
777
+ "expression, and run in-vitro work before in-vivo."
778
+ ),
779
+ (
780
+ "Always finish the episode by submitting a calibrated "
781
+ "submit_validation_report — exhausting credits without a report "
782
+ "yields the worst possible reward."
783
+ ),
784
+ ]
785
+
786
+
787
+ _TOOL_CATEGORY_AGENT_NOTES: Dict[ToolCategory, str] = {
788
+ ToolCategory.EXPRESSION_DB: (
789
+ "Use early to characterise expression in normal vs disease tissue."
790
+ ),
791
+ ToolCategory.OMICS_ANALYSIS: (
792
+ "Use to mine bulk / single-cell expression compendia for context."
793
+ ),
794
+ ToolCategory.PATHWAY_DB: (
795
+ "Use after gathering a gene list for enrichment / mechanism."
796
+ ),
797
+ ToolCategory.PROTEIN_STRUCTURE: (
798
+ "Use when reasoning about binding pockets or structure-based design."
799
+ ),
800
+ ToolCategory.BINDING_SITE: (
801
+ "Use to score pocket druggability and detect allosteric sites."
802
+ ),
803
+ ToolCategory.INTERACTION_NETWORK: (
804
+ "Use to reason about partners, paralogs, and pathway context."
805
+ ),
806
+ ToolCategory.DRUGGABILITY: (
807
+ "Use to assess overall ligandability and known chemical matter."
808
+ ),
809
+ ToolCategory.CLINICAL_DB: (
810
+ "Use to gather clinical precedent and competitor activity."
811
+ ),
812
+ ToolCategory.SAFETY_DB: (
813
+ "Use after expression / off-target queries to interpret risk."
814
+ ),
815
+ ToolCategory.OFF_TARGET: (
816
+ "Use whenever paralogs or kinase selectivity could limit the program."
817
+ ),
818
+ ToolCategory.LITERATURE: (
819
+ "Cheap and often decisive — recent literature can flip historical "
820
+ "priors."
821
+ ),
822
+ ToolCategory.PATIENT_GENOMICS: (
823
+ "Use for stratification and human genetics-based de-risking."
824
+ ),
825
+ ToolCategory.IN_VITRO: (
826
+ "Expensive; run only after computational evidence justifies it."
827
+ ),
828
+ ToolCategory.IN_VIVO: (
829
+ "Most expensive; only run after in-vitro / target-engagement data."
830
+ ),
831
+ ToolCategory.CRISPR: (
832
+ "Use to test functional dependency or synthetic lethality."
833
+ ),
834
+ ToolCategory.BIOMARKER: (
835
+ "Use to correlate target activity with patient-level biomarkers."
836
+ ),
837
+ }
838
+
839
+
840
+ def describe_tool_for_agent(tool_name: str) -> str:
841
+ """Return a compact environment-aware tool description for prompts."""
842
+ tool = TOOL_REGISTRY.get(tool_name)
843
+ if tool is None:
844
+ return tool_name
845
+
846
+ parts = [f"{tool.name}: {tool.description}."]
847
+ if tool.input_types or tool.output_types:
848
+ inputs = ", ".join(tool.input_types) or "context"
849
+ outputs = ", ".join(tool.output_types) or "evidence"
850
+ parts.append(f"Consumes {inputs}; yields {outputs}.")
851
+
852
+ category_note = _TOOL_CATEGORY_AGENT_NOTES.get(tool.category)
853
+ if category_note:
854
+ parts.append(category_note)
855
+
856
+ if tool.relevant_actions:
857
+ action_names = ", ".join(a.value for a in tool.relevant_actions[:3])
858
+ parts.append(f"Relevant for: {action_names}.")
859
+
860
+ if tool.typical_credit_cost > 0:
861
+ parts.append(f"Approx cost: {tool.typical_credit_cost} credits.")
862
+
863
+ return " ".join(parts)
864
+
865
+
866
+ def build_agent_system_prompt() -> str:
867
+ """Build the shared agent system prompt for training and inference."""
868
+ lines = [
869
+ "You are a computational drug discovery scientist evaluating a "
870
+ "proposed drug target.",
871
+ "",
872
+ "Each turn, you observe the running evidence dossier and remaining "
873
+ "credits, and you must pick the next investigation step. Your goal "
874
+ "is to gather sufficient evidence to submit a calibrated go / no_go "
875
+ "validation report before credits run out.",
876
+ "",
877
+ "Environment-specific reasoning rules:",
878
+ ]
879
+ lines.extend(f" - {rule}" for rule in AGENT_ENVIRONMENT_RULES)
880
+ lines.append("")
881
+ lines.append("Action guidance:")
882
+ lines.extend(
883
+ f" - {action_type.value}: {AGENT_ACTION_GUIDANCE[action_type]}"
884
+ for action_type in ActionType
885
+ )
886
+ lines.extend([
887
+ "",
888
+ "Respond with ONLY valid JSON, nothing else:",
889
+ '{"action_type": "...", "parameters": {}, "reasoning": "..."}',
890
+ "",
891
+ "When you submit the final report, use this exact shape:",
892
+ '{"action_type": "submit_validation_report", "parameters": {}, '
893
+ '"reasoning": "...", "final_decision": "go", "confidence": 0.8}',
894
+ ])
895
+ return "\n".join(lines)
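Since the prompt demands JSON-only replies in the shape shown above, the trainer side needs a matching parse-and-validate step before dispatching the action. A minimal stdlib sketch (the field names follow the schema in the prompt; the error handling is illustrative):

```python
import json

REQUIRED_KEYS = {"action_type", "parameters", "reasoning"}

def parse_agent_reply(raw: str) -> dict:
    """Parse a JSON-only agent reply and loosely enforce the action schema."""
    action = json.loads(raw)
    missing = REQUIRED_KEYS - action.keys()
    if missing:
        raise ValueError(f"agent reply missing keys: {sorted(missing)}")
    if action["action_type"] == "submit_validation_report":
        # The terminal action must carry a decision and a confidence in [0, 1].
        if action.get("final_decision") not in ("go", "no_go"):
            raise ValueError("terminal action needs final_decision 'go'/'no_go'")
        if not 0.0 <= float(action.get("confidence", -1.0)) <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
    return action

reply = (
    '{"action_type": "submit_validation_report", "parameters": {}, '
    '"reasoning": "strong evidence", "final_decision": "go", "confidence": 0.8}'
)
parsed = parse_agent_reply(reply)
assert parsed["final_decision"] == "go"
```

In practice a failed parse would map to a rule violation in the next observation rather than an exception, but the validation logic is the same.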
896
+
897
+
898
+ def build_agent_observation_context(
899
+ obs: ValidationObservation,
900
+ *,
901
+ max_tools: int = 6,
902
+ ) -> str:
903
+ """Summarize action / tool context for the agent's prompt."""
904
+ sections: List[str] = []
905
+
906
+ sections.append(
907
+ f"Target: {obs.target_gene} | Indication: {obs.indication} | "
908
+ f"Disease: {obs.disease_context}"
909
+ )
910
+ sections.append(
911
+ f"Credits: {obs.credits_remaining}/{obs.credits_total} remaining"
912
+ )
913
+
914
+ by_category: Dict[ToolCategory, List[ToolSpec]] = {}
915
+ for tool in TOOL_REGISTRY.values():
916
+ by_category.setdefault(tool.category, []).append(tool)
917
+
918
+ sections.append("Representative tools available (already filtered):")
919
+ shown = 0
920
+ for category, tools in by_category.items():
921
+ if shown >= max_tools:
922
+ break
923
+ first = tools[0]
924
+ sections.append(f" - {describe_tool_for_agent(first.name)}")
925
+ shown += 1
926
+
927
+ return "\n".join(sections)
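The system prompt above demands a strict JSON reply from the agent. A minimal sketch of the receiving side — parsing and validating such a reply — might look like the following. This is a hypothetical helper, not part of the repository; the key names (`action_type`, `final_decision`, `confidence`) are taken from the prompt text itself.

```python
import json

# Hypothetical helper (not in the repo): validate the agent's raw reply
# against the JSON shape that build_agent_system_prompt() asks for.
def parse_agent_reply(raw: str) -> dict:
    reply = json.loads(raw)
    if not isinstance(reply, dict):
        raise ValueError("reply must be a JSON object")
    if "action_type" not in reply:
        raise ValueError("missing action_type")
    # Optional fields default to the empty shapes shown in the prompt.
    reply.setdefault("parameters", {})
    reply.setdefault("reasoning", "")
    # Final reports additionally carry a decision and a confidence in [0, 1].
    if reply["action_type"] == "submit_validation_report":
        if reply.get("final_decision") not in {"go", "no_go"}:
            raise ValueError("final report needs final_decision go/no_go")
        conf = float(reply.get("confidence", -1.0))
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
    return reply

parsed = parse_agent_reply(
    '{"action_type": "submit_validation_report", "parameters": {}, '
    '"reasoning": "toxicity clean", "final_decision": "go", "confidence": 0.8}'
)
print(parsed["final_decision"])  # go
```

Rejecting malformed replies at parse time (rather than silently coercing them) matches the environment's own behaviour of penalising malformed reports at the reward level.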
openenv.yaml ADDED
@@ -0,0 +1,12 @@
+ spec_version: 1
+ name: drug_target_validation
+ type: space
+ runtime: fastapi
+ app: server.app:app
+ port: 8000
+ description: "RL environment for drug target validation — agent investigates a proposed drug target and makes a go/no-go recommendation"
+ tags:
+ - biology
+ - drug-discovery
+ - pharma
+ - world-modeling
pyproject.toml ADDED
@@ -0,0 +1,55 @@
+ [build-system]
+ requires = ["setuptools>=45", "wheel"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "drugenv"
+ version = "0.1.0"
+ description = "OpenEnv RL environment for teaching LLMs computational drug-target validation"
+ requires-python = ">=3.10,<3.13"
+ dependencies = [
+     "openenv-core[core]>=0.2.3",
+     "numpy>=1.24.0",
+     "scipy>=1.10.0",
+     "pydantic>=2.0.0",
+     "fastapi>=0.110",
+     "uvicorn>=0.27",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+     "pytest>=8.0.0",
+     "pytest-cov>=4.0.0",
+ ]
+ train = [
+     "torch==2.6.0",
+     "torchvision==0.21.0",
+     "torchaudio==2.6.0",
+     "transformers==4.51.3",
+     "trl==0.18.2",
+     "peft==0.13.2",
+     "accelerate==1.5.0",
+     "datasets==3.4.1",
+     "bitsandbytes==0.45.5",
+     "matplotlib>=3.8",
+     "huggingface_hub>=0.26",
+ ]
+
+ [project.scripts]
+ drugenv-server = "server.app:main"
+
+ [tool.setuptools]
+ include-package-data = true
+ packages = [
+     "server",
+     "server.simulator",
+     "server.rules",
+     "server.rewards",
+     "server.tasks",
+     "server.biology",
+     "training",
+     "tests",
+ ]
+
+ [tool.uv]
+ package = false
server/Dockerfile ADDED
@@ -0,0 +1,80 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+ #
+ # This source code is licensed under the BSD-style license found in the
+ # LICENSE file in the root directory of this source tree.
+
+ # Multi-stage build using openenv-base
+ # This Dockerfile is flexible and works for both:
+ #   - In-repo environments (with local OpenEnv sources)
+ #   - Standalone environments (with openenv from PyPI/Git)
+ # The build script (openenv build) handles context detection and sets appropriate build args.
+
+ ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+ FROM ${BASE_IMAGE} AS builder
+
+ WORKDIR /app
+
+ # Ensure git is available (required for installing dependencies from VCS)
+ RUN apt-get update && \
+     apt-get install -y --no-install-recommends git && \
+     rm -rf /var/lib/apt/lists/*
+
+ # Build argument to control whether we're building standalone or in-repo
+ ARG BUILD_MODE=in-repo
+ ARG ENV_NAME=hackathon
+
+ # Copy environment code (always at root of build context)
+ COPY . /app/env
+
+ # For in-repo builds, openenv is already vendored in the build context
+ # For standalone builds, openenv will be installed via pyproject.toml
+ WORKDIR /app/env
+
+ # Ensure uv is available (for local builds where base image lacks it)
+ RUN if ! command -v uv >/dev/null 2>&1; then \
+         curl -LsSf https://astral.sh/uv/install.sh | sh && \
+         mv /root/.local/bin/uv /usr/local/bin/uv && \
+         mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+     fi
+
+ # Install dependencies using uv sync
+ # If uv.lock exists, use it; otherwise resolve on the fly
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-install-project --no-editable; \
+     else \
+         uv sync --no-install-project --no-editable; \
+     fi
+
+ RUN --mount=type=cache,target=/root/.cache/uv \
+     if [ -f uv.lock ]; then \
+         uv sync --frozen --no-editable; \
+     else \
+         uv sync --no-editable; \
+     fi
+
+ # Final runtime stage
+ FROM ${BASE_IMAGE}
+
+ WORKDIR /app
+
+ # Copy the virtual environment from builder
+ COPY --from=builder /app/env/.venv /app/.venv
+
+ # Copy the environment code
+ COPY --from=builder /app/env /app/env
+
+ # Set PATH to use the virtual environment
+ ENV PATH="/app/.venv/bin:$PATH"
+
+ # Set PYTHONPATH so imports work correctly
+ ENV PYTHONPATH="/app/env:$PYTHONPATH"
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ # Run the FastAPI server
+ # The module path is constructed to work with the /app/env structure
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
server/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from .hackathon_environment import DrugTargetEnvironment
+
+ __all__ = ["DrugTargetEnvironment"]
server/app.py ADDED
@@ -0,0 +1,81 @@
+ """FastAPI application for the Drug Target Validation Environment.
+
+ Endpoints:
+ - POST /reset: Reset the environment
+ - POST /step: Execute an action
+ - GET /state: Get current environment state
+ - GET /schema: Get action/observation schemas
+ - WS /ws: WebSocket endpoint for persistent sessions
+ - GET /: Demo UI
+ """
+
+ import os
+ from pathlib import Path
+
+ try:
+     from openenv.core.env_server.http_server import create_app
+ except Exception as e:  # pragma: no cover
+     raise ImportError(
+         "openenv is required for the web interface. "
+         "Install dependencies with 'uv sync'"
+     ) from e
+
+ from fastapi.responses import HTMLResponse
+ from models import DrugTargetAction, ValidationObservation
+ from .hackathon_environment import DrugTargetEnvironment
+
+ app = create_app(
+     DrugTargetEnvironment,
+     DrugTargetAction,
+     ValidationObservation,
+     env_name="drug_target_validation",
+     max_concurrent_envs=int(os.environ.get("MAX_ENVS", "4")),
+ )
+
+ DEMO_HTML = Path(__file__).resolve().parent.parent / "demo.html"
+
+
+ @app.get("/", response_class=HTMLResponse)
+ async def demo_ui():
+     if DEMO_HTML.exists():
+         return HTMLResponse(content=DEMO_HTML.read_text(), status_code=200)
+     return HTMLResponse(
+         content=(
+             "<h1>Drug Target Validation Env API</h1>"
+             "<p>Visit /docs for API documentation.</p>"
+         ),
+         status_code=200,
+     )
+
+
+ # ── Mount the Gradio demo at /demo (env Space landing) ────────────────────
+ # Optional: the env Space ships a small Gradio Blocks UI at
+ # ``space/env/gradio_demo.py``; mounting it here means a single Docker
+ # image can serve both the OpenEnv HTTP API and the human-friendly
+ # demo. Failures are degraded silently so a server-only deploy (no
+ # gradio installed) still boots. Mounting happens at import time, before
+ # the server starts serving.
+ try:  # pragma: no cover - import-time best-effort
+     import gradio as _gr  # type: ignore
+     from space.env.gradio_demo import build_gradio_demo as _build_gradio_demo
+
+     _demo = _build_gradio_demo()
+     if isinstance(_demo, _gr.Blocks):
+         _gr.mount_gradio_app(app, _demo, path="/demo")
+ except Exception:  # pragma: no cover
+     pass
+
+
+ def main(host: str = "0.0.0.0", port: int | None = None):
+     import uvicorn
+     if port is None:
+         port = int(os.environ.get("PORT", "8000"))
+     uvicorn.run(app, host=host, port=port)
+
+
+ if __name__ == "__main__":
+     import argparse
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--host", default="0.0.0.0")
+     parser.add_argument("--port", type=int, default=None)
+     args = parser.parse_args()
+     main(host=args.host, port=args.port)
server/biology/__init__.py ADDED
@@ -0,0 +1,11 @@
+ from .target_index import (
+     score_decision_accuracy,
+     score_evidence_coverage,
+     score_reasoning_coherence,
+ )
+
+ __all__ = [
+     "score_decision_accuracy",
+     "score_evidence_coverage",
+     "score_reasoning_coherence",
+ ]
server/biology/target_index.py ADDED
@@ -0,0 +1,96 @@
+ """Scoring helpers for the drug-target-validation reward function.
+
+ Implements the three core terminal-reward signals:
+
+ * ``score_evidence_coverage`` — what fraction of the scenario's
+   ``key_evidence_dimensions`` did the agent actually investigate?
+ * ``score_decision_accuracy`` — was the final go / no_go correct, scaled
+   by how confidently it was stated?
+ * ``score_reasoning_coherence`` — did the agent's action sequence
+   respect light scientific prerequisites (e.g. expression before
+   toxicity, in-vitro before in-vivo)?
+ """
+
+ from __future__ import annotations
+
+ from typing import Iterable, List, Sequence
+
+
+ # Soft prerequisite map used by ``score_reasoning_coherence``. Each key
+ # *should* be preceded by at least one of the listed prerequisite action
+ # names earlier in the trajectory.
+ _PREREQUISITES: dict = {
+     "toxicity_panel": ["query_expression"],
+     "in_vivo_model": ["in_vitro_assay"],
+     "biomarker_correlation": ["query_expression", "patient_stratification"],
+     "off_target_screen": ["druggability_screen", "query_expression"],
+ }
+
+
+ def score_evidence_coverage(
+     discovered_dimensions: Iterable[str],
+     key_dimensions: Sequence[str],
+ ) -> float:
+     """Fraction of ``key_dimensions`` that appear in
+     ``discovered_dimensions``.
+
+     Returns 1.0 when no key dimensions are required (degenerate scenario)
+     so that the coverage term doesn't punish trivially-easy targets.
+     """
+     if not key_dimensions:
+         return 1.0
+     discovered = {d.lower() for d in discovered_dimensions}
+     hits = sum(1 for d in key_dimensions if d.lower() in discovered)
+     return hits / len(key_dimensions)
+
+
+ def score_decision_accuracy(
+     predicted_decision: str | None,
+     confidence: float | None,
+     correct_decision: str,
+ ) -> float:
+     """Signed decision-accuracy score in [-1, 1], with confidence-aware scaling.
+
+     The base score is 1.0 for a correct decision and 0.0 for an incorrect
+     one. We then multiply by ``2 * |confidence - 0.5|`` so a confidently
+     correct answer is fully rewarded, an uncertain answer is only partly
+     rewarded, and a confidently *wrong* answer is fully penalised.
+     """
+     if predicted_decision is None:
+         return 0.0
+     correct = predicted_decision.strip().lower() == correct_decision.strip().lower()
+     base = 1.0 if correct else 0.0
+     if confidence is None:
+         confidence = 0.5
+     confidence = max(0.0, min(1.0, float(confidence)))
+     confidence_weight = 2.0 * abs(confidence - 0.5)
+     if correct:
+         # Full score 1.0 when confident & correct, 0.0 when uncertain & correct.
+         return base * confidence_weight
+     # When wrong, return a *negative* signal so the caller can penalise
+     # confident wrong answers more than uncertain ones.
+     return -confidence_weight
+
+
+ def score_reasoning_coherence(action_history: List[str]) -> float:
+     """Fraction of prerequisite-bearing actions that respected their soft
+     prerequisites.
+
+     An action with no listed prerequisite is skipped (it only counts as
+     "seen" for later actions); an empty or prerequisite-free history
+     scores a perfect 1.0.
+     """
+     if not action_history:
+         return 1.0
+     seen: set = set()
+     n_checked = 0
+     n_passed = 0
+     for action in action_history:
+         prereqs = _PREREQUISITES.get(action)
+         if prereqs is None:
+             seen.add(action)
+             continue
+         n_checked += 1
+         if any(req in seen for req in prereqs):
+             n_passed += 1
+         seen.add(action)
+     if n_checked == 0:
+         return 1.0
+     return n_passed / n_checked
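The confidence weighting in `score_decision_accuracy` is the subtlest of the three signals. The sketch below is a standalone re-implementation of just that weighting (copied logic for illustration, not an import of the repo module), showing how `2 * |confidence - 0.5|` behaves at the three interesting points:

```python
# Standalone mirror of score_decision_accuracy's confidence weighting.
def decision_score(predicted, confidence, correct):
    if predicted is None:
        return 0.0
    is_correct = predicted.strip().lower() == correct.strip().lower()
    confidence = max(0.0, min(1.0, 0.5 if confidence is None else float(confidence)))
    weight = 2.0 * abs(confidence - 0.5)  # 0 at conf=0.5, 1 at conf=0 or 1
    return weight if is_correct else -weight

print(decision_score("go", 0.9, "go"))     # 0.8  (confident & correct)
print(decision_score("go", 0.5, "go"))     # 0.0  (uncertain & correct)
print(decision_score("no_go", 0.9, "go"))  # -0.8 (confident & wrong)
```

Note the symmetry: a confidently wrong answer loses exactly what a confidently correct answer would have gained, which is what makes the stated confidence worth calibrating rather than always maximising.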
server/hackathon_environment.py ADDED
@@ -0,0 +1,325 @@
+ """Drug Target Validation Environment.
+
+ Implements the OpenEnv ``Environment`` interface as a POMDP where the
+ agent issues one structured pharma / bioinformatics step at a time and
+ ultimately submits a go / no_go validation report.
+ """
+
+ from __future__ import annotations
+
+ from typing import Any, Dict, List, Optional
+ from uuid import uuid4
+
+ from openenv.core.env_server.interfaces import Environment
+ from openenv.core.env_server.types import State
+
+ from models import (
+     ActionType,
+     DrugTargetAction,
+     EvidenceDossier,
+     IntermediateOutput,
+     OutputType,
+     ValidationObservation,
+     ValidationStepRecord,
+     ValidationTaskSpec,
+ )
+
+ from server.rules.engine import RuleEngine
+ from server.rewards.reward import RewardBreakdown, RewardComputer
+ from server.simulator.latent_state import FullLatentState
+ from server.simulator.noise import NoiseModel
+ from server.simulator.transition import (
+     ACTION_COSTS,
+     TransitionEngine,
+     compute_action_cost,
+ )
+ from server.tasks.generator import TaskGenerator
+
+
+ MAX_STEPS = 30
+
+
+ class DrugTargetEnvironment(Environment):
+     """POMDP environment for drug target validation.
+
+     The agent observes ``ValidationObservation`` (partial view) while the
+     environment maintains a ``FullLatentState`` (hidden ``TargetProfile``
+     plus credit / progress state).
+     """
+
+     SUPPORTS_CONCURRENT_SESSIONS: bool = True
+
+     def __init__(
+         self,
+         scenario_name: Optional[str] = None,
+         *,
+         domain_randomise: bool = True,
+     ) -> None:
+         self._state = State(episode_id=str(uuid4()), step_count=0)
+         self._latent: Optional[FullLatentState] = None
+         self._task: Optional[ValidationTaskSpec] = None
+         self._scenario_name = scenario_name
+         self._noise = NoiseModel()
+         self._engine = TransitionEngine(self._noise)
+         self._rules = RuleEngine()
+         self._rewards = RewardComputer()
+         self._task_gen = TaskGenerator(domain_randomise=domain_randomise)
+
+         self._history: List[ValidationStepRecord] = []
+         self._dossier: EvidenceDossier = EvidenceDossier()
+         self._evidence_dimensions_covered: List[str] = []
+         self._action_history: List[str] = []
+         self._submitted_decision: Optional[str] = None
+         self._submitted_confidence: Optional[float] = None
+         self._cumulative_reward: float = 0.0
+
+     # ── Environment interface ───────────────────────────────────────────
+
+     def reset(self, seed: Optional[int] = None) -> ValidationObservation:
+         seed = seed if seed is not None else hash(uuid4()) % (2**31)
+         self._noise.reseed(seed)
+         self._state = State(episode_id=str(uuid4()), step_count=0)
+
+         self._task, self._latent = self._task_gen.generate(
+             seed=seed,
+             scenario_name=self._scenario_name,
+         )
+         self._latent.rng_seed = seed
+
+         self._history.clear()
+         self._dossier = EvidenceDossier(
+             credits_used=0,
+         )
+         self._evidence_dimensions_covered.clear()
+         self._action_history.clear()
+         self._submitted_decision = None
+         self._submitted_confidence = None
+         self._cumulative_reward = 0.0
+
+         return self._build_observation(reward=0.0, done=False)
+
+     def step(  # type: ignore[override]
+         self, action: DrugTargetAction
+     ) -> ValidationObservation:
+         assert self._latent is not None, "Call reset() before step()"
+         assert self._task is not None
+
+         self._state.step_count += 1
+         prev_state = self._latent.model_copy(deep=True)
+         prev_history = list(self._action_history)
+
+         violations = self._rules.check(
+             action,
+             self._latent,
+             evidence_dimensions_covered=self._evidence_dimensions_covered,
+         )
+         hard_v = self._rules.hard_violations(violations)
+         soft_v = self._rules.soft_violations(violations)
+
+         result = self._engine.step(
+             self._latent,
+             action,
+             hard_violations=hard_v,
+             soft_violations=soft_v,
+         )
+         self._latent = result.next_state
+         self._action_history.append(action.action_type.value)
+
+         step_rb = self._rewards.step_reward(
+             action,
+             prev_state,
+             self._latent,
+             result.output,
+             hard_v,
+             soft_v,
+             action_history=prev_history,
+         )
+
+         cost = compute_action_cost(action)
+         self._history.append(ValidationStepRecord(
+             step_index=self._state.step_count,
+             action_type=action.action_type,
+             parameters=action.parameters,
+             output_summary=result.output.summary,
+             output_type=result.output.output_type,
+             success=result.output.success,
+             quality_score=result.output.quality_score,
+             credit_cost=cost,
+         ))
+         self._update_discoveries(action, result.output)
+         self._dossier.credits_used = self._latent.credits.credits_used
+
+         if (
+             action.action_type == ActionType.SUBMIT_VALIDATION_REPORT
+             and result.output.success
+             and not hard_v
+         ):
+             self._submitted_decision = action.final_decision
+             self._submitted_confidence = action.confidence
+
+         done = result.done or self._state.step_count >= MAX_STEPS
+
+         terminal_rb = RewardBreakdown()
+         if done:
+             terminal_rb = self._rewards.terminal_reward(
+                 self._latent,
+                 final_decision=self._submitted_decision,
+                 confidence=self._submitted_confidence,
+                 action_history=list(self._action_history),
+             )
+
+         total_reward = step_rb.total + terminal_rb.total
+         self._cumulative_reward += total_reward
+
+         breakdown = step_rb.to_dict()
+         breakdown.update({f"term_{k}": v for k, v in terminal_rb.to_dict().items()})
+
+         return self._build_observation(
+             reward=total_reward,
+             done=done,
+             latest_output=result.output,
+             rule_violations=hard_v + soft_v,
+             reward_breakdown=breakdown,
+             metadata_extra={"reward_breakdown": breakdown},
+         )
+
+     @property
+     def state(self) -> State:
+         return self._state
+
+     def set_scenario(self, scenario_name: Optional[str]) -> None:
+         """Set the scenario used on the next reset."""
+         self._scenario_name = scenario_name
+
+     # ── internal helpers ────────────────────────────────────────────────
+
+     def _build_observation(
+         self,
+         *,
+         reward: float,
+         done: bool,
+         latest_output: Optional[IntermediateOutput] = None,
+         rule_violations: Optional[List[str]] = None,
+         reward_breakdown: Optional[Dict[str, float]] = None,
+         metadata_extra: Optional[Dict[str, Any]] = None,
+     ) -> ValidationObservation:
+         assert self._task is not None
+         assert self._latent is not None
+         meta: Dict[str, Any] = {
+             "episode_id": self._state.episode_id,
+             "step": self._state.step_count,
+             "cumulative_reward": self._cumulative_reward,
+         }
+         if metadata_extra:
+             meta.update(metadata_extra)
+         return ValidationObservation(
+             target_gene=self._task.target_gene,
+             disease_context=self._task.disease_context,
+             indication=self._task.indication,
+             credits_remaining=self._latent.credits.credits_remaining,
+             credits_total=self._latent.credits.credits_total,
+             dossier=self._dossier.model_copy(deep=True),
+             pipeline_history=[h.model_dump() for h in self._history],
+             available_actions=list(self._task.available_actions),
+             step_index=self._state.step_count,
+             done=done,
+             reward=reward,
+             step_reward_breakdown=reward_breakdown or {},
+             rule_violations=rule_violations or [],
+             latest_output=latest_output,
+             metadata=meta,
+         )
+
+     def _update_discoveries(
+         self,
+         action: DrugTargetAction,
+         output: IntermediateOutput,
+     ) -> None:
+         """Fold the latest output into the running ``EvidenceDossier`` and
+         the per-dimension coverage tracker."""
+         if not output.success:
+             return
+
+         data = dict(output.data or {})
+
+         if output.output_type in {
+             OutputType.EXPRESSION_RESULT,
+             OutputType.DE_RESULT,
+             OutputType.PATHWAY_RESULT,
+             OutputType.COEXPRESSION_RESULT,
+         }:
+             self._dossier.expression_findings[action.action_type.value] = data
+             self._track_dim("expression")
+             if output.output_type == OutputType.PATHWAY_RESULT:
+                 self._track_dim("pathway")
+
+         if output.output_type in {
+             OutputType.STRUCTURE_RESULT,
+             OutputType.BINDING_SITE_RESULT,
+             OutputType.INTERACTION_RESULT,
+             OutputType.DRUGGABILITY_RESULT,
+         }:
+             self._dossier.protein_findings[action.action_type.value] = data
+             if output.output_type in {
+                 OutputType.DRUGGABILITY_RESULT,
+                 OutputType.BINDING_SITE_RESULT,
+             }:
+                 self._track_dim("druggability")
+             if output.output_type == OutputType.STRUCTURE_RESULT:
+                 self._track_dim("structure")
+             if output.output_type == OutputType.INTERACTION_RESULT:
+                 self._track_dim("interactions")
+
+         if output.output_type == OutputType.CLINICAL_RESULT:
+             self._dossier.clinical_findings[action.action_type.value] = data
+             self._track_dim("clinical")
+         if output.output_type == OutputType.PATIENT_STRATIFICATION_RESULT:
+             self._dossier.clinical_findings[action.action_type.value] = data
+             self._track_dim("patient_stratification")
+
+         if output.output_type in {
+             OutputType.TOXICITY_RESULT,
+             OutputType.OFF_TARGET_RESULT,
+         }:
+             self._dossier.safety_findings[action.action_type.value] = data
+             if output.output_type == OutputType.TOXICITY_RESULT:
+                 self._track_dim("toxicity")
+             if output.output_type == OutputType.OFF_TARGET_RESULT:
+                 self._track_dim("off_target")
+
+         if output.output_type in {
+             OutputType.LITERATURE_RESULT,
+             OutputType.EVIDENCE_SYNTHESIS_RESULT,
+             OutputType.COMPETITOR_LANDSCAPE_RESULT,
+         }:
+             self._dossier.literature_findings[action.action_type.value] = data
+             self._track_dim("literature")
+
+         if output.output_type in {
+             OutputType.IN_VITRO_RESULT,
+             OutputType.IN_VIVO_RESULT,
+             OutputType.CRISPR_RESULT,
+             OutputType.BIOMARKER_RESULT,
+         }:
+             entry = {"action": action.action_type.value, **data}
+             self._dossier.experimental_results.append(entry)
+             if output.output_type == OutputType.IN_VITRO_RESULT:
+                 self._track_dim("in_vitro")
+             if output.output_type == OutputType.IN_VIVO_RESULT:
+                 self._track_dim("in_vivo")
+             if output.output_type == OutputType.CRISPR_RESULT:
+                 self._track_dim("crispr")
+             if output.output_type == OutputType.BIOMARKER_RESULT:
+                 self._track_dim("biomarker")
+
+         if output.output_type == OutputType.RED_FLAG_NOTE:
+             note = data.get("note", "(no detail)")
+             if note not in self._dossier.flagged_red_flags:
+                 self._dossier.flagged_red_flags.append(str(note))
+
+     def _track_dim(self, dim: str) -> None:
+         if dim not in self._evidence_dimensions_covered:
+             self._evidence_dimensions_covered.append(dim)
+
+
+ __all__ = ["DrugTargetEnvironment", "MAX_STEPS"]
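The environment's per-step reward includes a potential-based shaping term, φ(s_{t+1}) − φ(s_t), over a coverage potential. A key property of this construction is that the shaping terms telescope: summed over an episode, they reduce to φ at the end minus φ at the start, so the dense signal cannot change which full trajectories are preferred. The toy below illustrates this with a stand-in coverage potential (`phi` here is an illustrative fraction, not the repo's actual `_potential` implementation):

```python
# Toy coverage potential: fraction of 5 required evidence dimensions covered.
def phi(covered_dims: int, required_dims: int = 5) -> float:
    return covered_dims / required_dims

# Dimensions covered after each step of a hypothetical episode.
coverage_trajectory = [0, 1, 1, 2, 4, 5]

# Per-step shaping terms, phi(next) - phi(prev), as in step_reward.
shaping_terms = [
    phi(b) - phi(a)
    for a, b in zip(coverage_trajectory, coverage_trajectory[1:])
]

total = sum(shaping_terms)
telescoped = phi(coverage_trajectory[-1]) - phi(coverage_trajectory[0])
print(abs(total - telescoped) < 1e-9)  # True: the shaping terms telescope
```

This is why the docstring in `server/rewards/reward.py` notes that the dense signal "telescopes correctly": redundant steps that leave coverage unchanged contribute zero shaping, while the episode total is fixed by start and end coverage alone.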
server/requirements.txt ADDED
@@ -0,0 +1 @@
+ -r ../requirements.txt
server/rewards/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from .reward import RewardBreakdown, RewardComputer
+
+ __all__ = ["RewardBreakdown", "RewardComputer"]
server/rewards/reward.py ADDED
@@ -0,0 +1,265 @@
+ """Decomposable reward function for the drug-target-validation POMDP.
+
+ Reward components
+ ─────────────────
+ evidence_coverage   — did the agent investigate the
+                       ``key_evidence_dimensions`` for the scenario?
+ decision_accuracy   — was the final go / no_go correct, weighted by
+                       the agent's stated confidence? (terminal only)
+ credit_efficiency   — did the agent avoid redundant or wasteful calls?
+ reasoning_coherence — did the action sequence respect light scientific
+                       prerequisites?
+ novelty             — bonus for opening a new evidence dimension.
+ penalty             — hard violations / credit-exhaustion / very-low
+                       confidence at submission.
+ shaping             — potential-based shaping over the coverage potential
+                       so the dense signal telescopes correctly.
+ terminal            — composite terminal reward.
+
+ Step reward
+     R_t = evidence_novelty_bonus + reasoning_coherence_bonus
+           + credit_efficiency_penalty + rule_violation_penalty
+           + [φ(s_{t+1}) − φ(s_t)]
+
+ Terminal reward
+     R_T = 0.4  * decision_accuracy
+         + 0.35 * evidence_coverage
+         + 0.15 * credit_efficiency
+         + 0.10 * reasoning_coherence
+ """
+
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from typing import Dict, List, Optional
+
+ from models import (
+     ActionType,
+     DrugTargetAction,
+     IntermediateOutput,
+     META_ACTIONS,
+ )
+
+ from server.biology.target_index import (
+     score_decision_accuracy,
+     score_evidence_coverage,
+     score_reasoning_coherence,
+ )
+ from server.simulator.latent_state import FullLatentState
+ from server.simulator.transition import TransitionEngine
+
+
+ @dataclass
+ class RewardBreakdown:
+     """Decomposed reward components recorded per step / per terminal."""
+
+     evidence_coverage: float = 0.0
+     decision_accuracy: float = 0.0
+     credit_efficiency: float = 0.0
+     reasoning_coherence: float = 0.0
+     novelty: float = 0.0
+     penalty: float = 0.0
+     shaping: float = 0.0
+     terminal: float = 0.0
+     components: Dict[str, float] = field(default_factory=dict)
+
+     @property
+     def total(self) -> float:
+         return (
+             self.evidence_coverage
+             + self.decision_accuracy
+             + self.credit_efficiency
+             + self.reasoning_coherence
+             + self.novelty
+             + self.penalty
+             + self.shaping
+             + self.terminal
+         )
+
+     def to_dict(self) -> Dict[str, float]:
+         d = {
+             "evidence_coverage": self.evidence_coverage,
+             "decision_accuracy": self.decision_accuracy,
+             "credit_efficiency": self.credit_efficiency,
+             "reasoning_coherence": self.reasoning_coherence,
+             "novelty": self.novelty,
+             "penalty": self.penalty,
+             "shaping": self.shaping,
+             "terminal": self.terminal,
+             "total": self.total,
+         }
+         d.update(self.components)
+         return d
+
+
+ class RewardComputer:
+     """Computes step-wise and terminal rewards for the POMDP."""
+
+     def __init__(
+         self,
+         novelty_weight: float = 0.20,
+         coherence_weight: float = 0.10,
+         efficiency_weight: float = 0.10,
+     ):
+         self.w_novelty = novelty_weight
+         self.w_coh = coherence_weight
+         self.w_eff = efficiency_weight
+
+     # ── step reward ─────────────────────────────────────────────────────
+
+     def step_reward(
+         self,
+         action: DrugTargetAction,
+         prev_state: FullLatentState,
+         next_state: FullLatentState,
+         output: IntermediateOutput,
+         hard_violations: List[str],
+         soft_violations: List[str],
+         action_history: Optional[List[str]] = None,
+     ) -> RewardBreakdown:
+         rb = RewardBreakdown()
+
+         # Hard violations short-circuit the step.
+         if hard_violations:
+             rb.penalty = -0.5 * len(hard_violations)
+             rb.components["hard_violations"] = float(len(hard_violations))
+             return rb
+
+         # Novelty bonus: did this action open a new evidence dimension?
+         prev_dims = set(TransitionEngine.covered_evidence_dimensions(prev_state))
+         next_dims = set(TransitionEngine.covered_evidence_dimensions(next_state))
+         new_dims = next_dims - prev_dims
+         if new_dims:
+             rb.novelty = self.w_novelty * len(new_dims)
+             rb.components["new_evidence_dims"] = float(len(new_dims))
+
+         # Reasoning coherence: small bonus / penalty based on the running
+         # action history (including this step).
+         history = list(action_history or []) + [action.action_type.value]
+         coherence = score_reasoning_coherence(history)
+         rb.reasoning_coherence = self.w_coh * (coherence - 0.5)
+         rb.components["coherence_running"] = coherence
+
+         # Credit efficiency penalty: cost relative to total budget.
+         if next_state.credits.credits_total > 0:
+             cost = next_state.credits.credits_used - prev_state.credits.credits_used
+             spent_frac = cost / max(next_state.credits.credits_total, 1)
+             rb.credit_efficiency = -self.w_eff * spent_frac
+             rb.components["credit_spent_frac"] = spent_frac
+
+         # Soft-violation penalty (e.g. redundancy, wrong ordering).
+         if soft_violations:
+             rb.penalty -= 0.15 * len(soft_violations)
+             rb.components["soft_violations"] = float(len(soft_violations))
+
+         # Penalise meta-only churn before any evidence was collected.
+         if (
+             action.action_type in META_ACTIONS
+             and action.action_type != ActionType.SUBMIT_VALIDATION_REPORT
+             and not next_dims
+         ):
+             rb.penalty -= 0.20
+             rb.components["premature_meta_action_penalty"] = -0.20
+
+         # Potential-based shaping over evidence coverage.
+         phi_prev = self._potential(prev_state)
+         phi_next = self._potential(next_state)
+         rb.shaping = phi_next - phi_prev
+
+         return rb
+
+     # ── terminal reward ─────────────────────────────────────────────────
+
+     def terminal_reward(
+         self,
+         state: FullLatentState,
+         final_decision: Optional[str],
+         confidence: Optional[float],
+         action_history: Optional[List[str]] = None,
+     ) -> RewardBreakdown:
+         rb = RewardBreakdown()
+         target = state.target
+
+         discovered_dims = TransitionEngine.covered_evidence_dimensions(state)
+         coverage = score_evidence_coverage(
+             discovered_dims, target.key_evidence_dimensions
+         )
+         rb.evidence_coverage = coverage
+         rb.components["discovered_dims_count"] = float(len(discovered_dims))
+         rb.components["required_dims_count"] = float(
+             len(target.key_evidence_dimensions)
+         )
+
+         decision_signed = score_decision_accuracy(
+             final_decision, confidence, target.correct_decision,
+         )
+         # Map signed score to non-negative ``decision_accuracy`` and route
+         # the negative arm into ``penalty`` so the breakdown is readable.
+         rb.decision_accuracy = max(0.0, decision_signed)
+         if decision_signed < 0:
+             rb.penalty += decision_signed  # negative
+             rb.components["confident_wrong_answer_penalty"] = decision_signed
+
+         # Credit efficiency (how much budget remains).
+         credits_total = max(1, state.credits.credits_total)
+         credits_remaining_frac = state.credits.credits_remaining / credits_total
+         # Penalise running totally redundant calls (count > 2 of same type).
+         redundant_calls = sum(
+             max(0, count - 2) for count in state.action_call_counts.values()
+         )
+         total_calls = max(1, sum(state.action_call_counts.values()))
+         redundancy_frac = redundant_calls / total_calls
+         credit_efficiency = max(0.0, 1.0 - redundancy_frac)
+         rb.credit_efficiency = credit_efficiency
+         rb.components["credits_remaining_frac"] = credits_remaining_frac
+         rb.components["redundancy_frac"] = redundancy_frac
+
+         # Reasoning coherence on the full trajectory.
+         coherence = score_reasoning_coherence(action_history or [])
+         rb.reasoning_coherence = coherence
+         rb.components["final_reasoning_coherence"] = coherence
+
+         # Hard penalties.
+         if not state.progress.report_submitted:
+             rb.penalty -= 1.0
+             rb.components["no_report_submitted_penalty"] = -1.0
+         if (
+             state.progress.report_submitted
+             and (final_decision is None or confidence is None)
+         ):
+             rb.penalty -= 1.0
+             rb.components["malformed_report_penalty"] = -1.0
+         if (
+             state.progress.report_submitted
+             and confidence is not None
+             and confidence < 0.30
+         ):
+             rb.penalty -= 0.30
+             rb.components["low_confidence_submission_penalty"] = -0.30
+
240
+ rb.terminal = (
241
+ 0.40 * rb.decision_accuracy
242
+ + 0.35 * coverage
243
+ + 0.15 * credit_efficiency
244
+ + 0.10 * coherence
245
+ )
246
+ return rb
247
+
248
+ # ── helpers ─────────────────────────────────────────────────────────
249
+
250
+ @staticmethod
251
+ def _potential(state: FullLatentState) -> float:
252
+ """Progress potential φ(s) — fraction of *target* evidence
253
+ dimensions covered. Returns 0.0 once a report has been submitted
254
+ so the shaping signal telescopes correctly.
255
+ """
256
+ if state.progress.report_submitted:
257
+ return 0.0
258
+ target = state.target
259
+ dims = TransitionEngine.covered_evidence_dimensions(state)
260
+ if not target.key_evidence_dimensions:
261
+ return min(1.0, len(dims) / 6.0)
262
+ hits = sum(
263
+ 1 for d in target.key_evidence_dimensions if d in set(dims)
264
+ )
265
+ return hits / len(target.key_evidence_dimensions)
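The `_potential` helper above is standard potential-based shaping: per-step shaping rewards telescope over a trajectory, so they guide exploration without changing which final policy is optimal. A stdlib-only sketch with hypothetical dimension names:

```python
def potential(covered, required):
    # Fraction of required evidence dimensions already covered,
    # mirroring the fallback in `_potential` when none are required.
    if not required:
        return min(1.0, len(covered) / 6.0)
    return sum(1 for d in required if d in covered) / len(required)

required = ["expression", "toxicity", "clinical", "in_vitro"]
trajectory = [
    set(),
    {"expression"},
    {"expression", "toxicity"},
    {"expression", "toxicity", "clinical", "in_vitro"},
]

# Per-step shaping rewards phi(s') - phi(s) telescope to
# phi(final) - phi(initial) over the whole episode.
shaping = [
    potential(trajectory[i + 1], required) - potential(trajectory[i], required)
    for i in range(len(trajectory) - 1)
]
total = sum(shaping)  # equals phi(final) - phi(initial) = 1.0
```

The telescoping sum is why `_potential` must return 0.0 after submission: it pins the final potential so the accumulated shaping bonus cannot be farmed by stalling.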
server/rules/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from .engine import RuleEngine, RuleViolation
+ 
+ __all__ = ["RuleEngine", "RuleViolation"]
server/rules/engine.py ADDED
@@ -0,0 +1,210 @@
+ """Pharma rule engine — hard and soft constraint checking.
+ 
+ Hard violations block action execution entirely (the action deducts
+ no credits and the simulator returns a ``FailureReport``).
+ Soft violations allow execution but degrade output quality and incur
+ penalties.
+ """
+ 
+ from __future__ import annotations
+ 
+ from dataclasses import dataclass
+ from enum import Enum
+ from typing import Iterable, List, Optional
+ 
+ from models import ActionType, DrugTargetAction
+ 
+ from server.simulator.latent_state import FullLatentState
+ 
+ 
+ class Severity(str, Enum):
+     HARD = "hard"
+     SOFT = "soft"
+ 
+ 
+ @dataclass
+ class RuleViolation:
+     rule_id: str
+     severity: Severity
+     message: str
+ 
+ 
+ class RuleEngine:
+     """Evaluates drug-target-validation constraints against the current
+     latent state before each action is applied.
+     """
+ 
+     def check(
+         self,
+         action: DrugTargetAction,
+         state: FullLatentState,
+         *,
+         evidence_dimensions_covered: Optional[Iterable[str]] = None,
+     ) -> List[RuleViolation]:
+         violations: List[RuleViolation] = []
+         violations.extend(self._check_resource_constraints(action, state))
+         violations.extend(self._check_submission(
+             action, state, evidence_dimensions_covered or [],
+         ))
+         violations.extend(self._check_redundancy(action, state))
+         violations.extend(self._check_ordering(action, state))
+         return violations
+ 
+     @staticmethod
+     def hard_violations(violations: List[RuleViolation]) -> List[str]:
+         return [v.message for v in violations if v.severity == Severity.HARD]
+ 
+     @staticmethod
+     def soft_violations(violations: List[RuleViolation]) -> List[str]:
+         return [v.message for v in violations if v.severity == Severity.SOFT]
+ 
+     # ── resource / credit constraints ───────────────────────────────────
+ 
+     def _check_resource_constraints(
+         self, action: DrugTargetAction, s: FullLatentState
+     ) -> List[RuleViolation]:
+         vs: List[RuleViolation] = []
+         from server.simulator.transition import compute_action_cost
+ 
+         cost = compute_action_cost(action)
+         if s.credits.exhausted and action.action_type != ActionType.SUBMIT_VALIDATION_REPORT:
+             vs.append(RuleViolation(
+                 rule_id="credits_exhausted",
+                 severity=Severity.HARD,
+                 message="Credits exhausted - submit validation report or end episode",
+             ))
+         elif cost > s.credits.credits_remaining and cost > 0:
+             vs.append(RuleViolation(
+                 rule_id="credits_insufficient",
+                 severity=Severity.HARD,
+                 message=(
+                     f"Action costs {cost} credits but only "
+                     f"{s.credits.credits_remaining} remain"
+                 ),
+             ))
+         return vs
+ 
+     # ── submission validation ───────────────────────────────────────────
+ 
+     def _check_submission(
+         self,
+         action: DrugTargetAction,
+         s: FullLatentState,
+         evidence_dimensions_covered: Iterable[str],
+     ) -> List[RuleViolation]:
+         vs: List[RuleViolation] = []
+         if action.action_type != ActionType.SUBMIT_VALIDATION_REPORT:
+             return vs
+ 
+         # Hard: report with no evidence at all.
+         if not list(evidence_dimensions_covered):
+             vs.append(RuleViolation(
+                 rule_id="report_without_evidence",
+                 severity=Severity.HARD,
+                 message=(
+                     "Cannot submit validation report without gathering "
+                     "any evidence"
+                 ),
+             ))
+ 
+         # Hard: report missing decision or confidence.
+         if action.final_decision is None:
+             vs.append(RuleViolation(
+                 rule_id="report_missing_decision",
+                 severity=Severity.HARD,
+                 message=(
+                     "Submitting validation report without a final_decision "
+                     "is not allowed"
+                 ),
+             ))
+         elif action.final_decision.lower() not in {"go", "no_go"}:
+             vs.append(RuleViolation(
+                 rule_id="report_invalid_decision",
+                 severity=Severity.HARD,
+                 message=(
+                     f"final_decision must be 'go' or 'no_go', got "
+                     f"{action.final_decision!r}"
+                 ),
+             ))
+ 
+         if action.confidence is None:
+             vs.append(RuleViolation(
+                 rule_id="report_missing_confidence",
+                 severity=Severity.HARD,
+                 message=(
+                     "Submitting validation report without a confidence "
+                     "score is not allowed"
+                 ),
+             ))
+         elif action.confidence < 0.30:
+             vs.append(RuleViolation(
+                 rule_id="report_low_confidence",
+                 severity=Severity.SOFT,
+                 message=(
+                     f"Submitting with very low confidence "
+                     f"({action.confidence:.2f}) — the agent appears "
+                     f"poorly calibrated"
+                 ),
+             ))
+ 
+         return vs
+ 
+     # ── redundancy checks ───────────────────────────────────────────────
+ 
+     def _check_redundancy(
+         self, action: DrugTargetAction, s: FullLatentState
+     ) -> List[RuleViolation]:
+         vs: List[RuleViolation] = []
+         if action.action_type == ActionType.FLAG_RED_FLAG:
+             return vs
+         if action.action_type == ActionType.SUBMIT_VALIDATION_REPORT:
+             if s.progress.report_submitted:
+                 vs.append(RuleViolation(
+                     rule_id="duplicate_report",
+                     severity=Severity.HARD,
+                     message="Validation report has already been submitted",
+                 ))
+             return vs
+         count = s.action_call_counts.get(action.action_type.value, 0)
+         if count >= 2:
+             vs.append(RuleViolation(
+                 rule_id=f"redundant_{action.action_type.value}",
+                 severity=Severity.SOFT,
+                 message=(
+                     f"Action '{action.action_type.value}' has already been "
+                     f"executed {count} time(s); further repeats are "
+                     f"redundant"
+                 ),
+             ))
+         return vs
+ 
+     # ── ordering checks ─────────────────────────────────────────────────
+ 
+     def _check_ordering(
+         self, action: DrugTargetAction, s: FullLatentState
+     ) -> List[RuleViolation]:
+         vs: List[RuleViolation] = []
+         p = s.progress
+ 
+         if action.action_type == ActionType.IN_VIVO_MODEL and not p.in_vitro_done:
+             vs.append(RuleViolation(
+                 rule_id="in_vivo_before_in_vitro",
+                 severity=Severity.SOFT,
+                 message=(
+                     "Running in_vivo_model before in_vitro_assay is "
+                     "scientifically backwards"
+                 ),
+             ))
+         if (
+             action.action_type == ActionType.TOXICITY_PANEL
+             and not p.expression_queried
+         ):
+             vs.append(RuleViolation(
+                 rule_id="toxicity_before_expression",
+                 severity=Severity.SOFT,
+                 message=(
+                     "Toxicity panel before any expression query — "
+                     "tissue-specific toxicity will be hard to interpret"
+                 ),
+             ))
+         return vs
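The `hard_violations` / `soft_violations` helpers above are simple severity filters over one violation list; hard hits block the action while soft hits only degrade it. A stdlib-only sketch of that split, with hypothetical violations:

```python
from dataclasses import dataclass


@dataclass
class Violation:
    rule_id: str
    severity: str  # "hard" blocks execution; "soft" only penalises
    message: str


violations = [
    Violation("credits_exhausted", "hard", "Credits exhausted"),
    Violation("redundant_query", "soft", "Already executed twice"),
]

# Same filtering the RuleEngine static helpers perform.
hard = [v.message for v in violations if v.severity == "hard"]
soft = [v.message for v in violations if v.severity == "soft"]

blocked = bool(hard)  # any hard violation vetoes the action outright
```

Keeping one list with a severity tag, rather than two separate lists, lets each `_check_*` method stay ignorant of how its findings are ultimately consumed.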
server/simulator/__init__.py ADDED
@@ -0,0 +1,21 @@
+ from .latent_state import (
+     CreditState,
+     DataQualityState,
+     FullLatentState,
+     TargetProfile,
+     ValidationProgress,
+ )
+ from .noise import NoiseModel
+ from .output_generator import OutputGenerator
+ from .transition import TransitionEngine
+ 
+ __all__ = [
+     "CreditState",
+     "DataQualityState",
+     "FullLatentState",
+     "NoiseModel",
+     "OutputGenerator",
+     "TargetProfile",
+     "TransitionEngine",
+     "ValidationProgress",
+ ]
server/simulator/latent_state.py ADDED
@@ -0,0 +1,175 @@
+ """Hidden ground-truth target state for the drug-target-validation POMDP.
+ 
+ The agent never directly observes any of these models; it must infer them
+ through investigation. The simulator uses ``FullLatentState`` to generate
+ all simulated outputs and to compute terminal rewards.
+ """
+ 
+ from __future__ import annotations
+ 
+ from typing import List, Optional
+ 
+ from pydantic import BaseModel, Field
+ 
+ 
+ class TargetProfile(BaseModel):
+     """Hidden ground-truth drug target properties."""
+ 
+     # Expression
+     expression_level: str = Field(
+         "moderate",
+         description=(
+             "One of 'high_specific', 'high_nonspecific', 'moderate', 'low'."
+         ),
+     )
+     tissue_specificity: float = Field(0.5, ge=0.0, le=1.0)
+     disease_overexpression: float = Field(
+         1.0, description="Fold change vs. matched normal tissue."
+     )
+ 
+     # Druggability
+     druggability_score: float = Field(0.5, ge=0.0, le=1.0)
+     binding_pocket_quality: str = Field(
+         "good",
+         description=(
+             "One of 'excellent', 'good', 'poor', 'undruggable'."
+         ),
+     )
+     has_known_ligands: bool = False
+     allosteric_site_available: bool = Field(
+         False,
+         description=(
+             "Whether a non-classical (allosteric) druggable site exists. "
+             "Only revealed by binding-site analyses with the appropriate "
+             "parameters."
+         ),
+     )
+ 
+     # Selectivity
+     selectivity_ratio: float = Field(
+         5.0,
+         description="On-target vs off-target activity ratio.",
+     )
+     off_target_count: int = 0
+     off_target_genes: List[str] = Field(default_factory=list)
+ 
+     # Safety
+     toxicity_profile: str = Field(
+         "mild",
+         description="One of 'clean', 'mild', 'moderate', 'severe'.",
+     )
+     toxicity_tissues: List[str] = Field(default_factory=list)
+ 
+     # Clinical
+     clinical_precedent: str = Field(
+         "none",
+         description=(
+             "One of 'positive', 'mixed', 'negative', 'none'."
+         ),
+     )
+     clinical_stage_reached: Optional[str] = Field(
+         None,
+         description=(
+             "Highest clinical stage previously reached: 'phase1' / 'phase2' "
+             "/ 'phase3' / None."
+         ),
+     )
+     competitor_programs: List[str] = Field(default_factory=list)
+ 
+     # Patient stratification / biomarker context
+     requires_patient_stratification: bool = False
+     responder_biomarker: Optional[str] = None
+ 
+     # In-vitro / in-vivo expectations
+     in_vitro_ic50_nM: float = Field(
+         100.0, description="Expected on-target IC50 (nM)."
+     )
+     in_vivo_efficacy: str = Field(
+         "moderate",
+         description=(
+             "Expected pharmacological efficacy in disease-relevant models: "
+             "'strong', 'moderate', 'weak', 'none'."
+         ),
+     )
+     crispr_essentiality: float = Field(
+         -0.3,
+         description=(
+             "DepMap-style essentiality score (more negative = more "
+             "essential)."
+         ),
+     )
+ 
+     # Hidden truth used for terminal reward computation
+     true_viability_score: float = Field(0.5, ge=0.0, le=1.0)
+     correct_decision: str = Field(
+         "no_go", description="Either 'go' or 'no_go'."
+     )
+     misleading_signals: List[str] = Field(default_factory=list)
+     key_evidence_dimensions: List[str] = Field(
+         default_factory=list,
+         description=(
+             "Evidence categories the agent must touch to score well, e.g. "
+             "'expression', 'druggability', 'off_target', 'toxicity', "
+             "'clinical', 'literature', 'in_vitro', 'in_vivo', "
+             "'patient_stratification'."
+         ),
+     )
+ 
+ 
+ class DataQualityState(BaseModel):
+     """Technical noise parameters for simulated experimental outputs."""
+ 
+     noise_level: float = Field(0.1, ge=0.0, le=1.0)
+     false_positive_rate: float = Field(0.05, ge=0.0, le=1.0)
+     false_negative_rate: float = Field(0.05, ge=0.0, le=1.0)
+     database_coverage: float = Field(0.85, ge=0.0, le=1.0)
+ 
+ 
+ class CreditState(BaseModel):
+     """Tracks the single unified experimental-credit budget."""
+ 
+     credits_total: int = 50
+     credits_used: int = 0
+ 
+     @property
+     def credits_remaining(self) -> int:
+         return max(0, self.credits_total - self.credits_used)
+ 
+     @property
+     def exhausted(self) -> bool:
+         return self.credits_used >= self.credits_total
+ 
+ 
+ class ValidationProgress(BaseModel):
+     """Flags tracking which evidence dimensions have been investigated."""
+ 
+     expression_queried: bool = False
+     druggability_assessed: bool = False
+     selectivity_checked: bool = False
+     toxicity_assessed: bool = False
+     clinical_checked: bool = False
+     literature_reviewed: bool = False
+     in_vitro_done: bool = False
+     in_vivo_done: bool = False
+     patient_stratification_done: bool = False
+     pathway_analysed: bool = False
+     structure_resolved: bool = False
+     interactions_mapped: bool = False
+     crispr_done: bool = False
+     biomarker_correlated: bool = False
+     evidence_synthesised: bool = False
+     expert_reviewed: bool = False
+     report_submitted: bool = False
+ 
+ 
+ class FullLatentState(BaseModel):
+     """Complete hidden state of the simulated drug-target world."""
+ 
+     target: TargetProfile = Field(default_factory=TargetProfile)
+     data_quality: DataQualityState = Field(default_factory=DataQualityState)
+     credits: CreditState = Field(default_factory=CreditState)
+     progress: ValidationProgress = Field(default_factory=ValidationProgress)
+ 
+     # Tracking which action types have been executed (used by rules / rewards)
+     action_call_counts: dict = Field(default_factory=dict)
+     rng_seed: int = 0
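The `CreditState` model above is plain budget arithmetic; a stdlib-only sketch (a hypothetical `Credits` dataclass, no pydantic dependency) showing how the clamped `credits_remaining` and the `exhausted` flag behave:

```python
from dataclasses import dataclass


@dataclass
class Credits:
    total: int = 50
    used: int = 0

    @property
    def remaining(self) -> int:
        # Clamp at zero so overspending never reports a negative balance.
        return max(0, self.total - self.used)

    @property
    def exhausted(self) -> bool:
        return self.used >= self.total


c = Credits(total=50, used=45)
# 5 credits remain; a 10-credit action would trip the hard
# "credits_insufficient" rule before execution.
```

Deriving `remaining` and `exhausted` from `used` keeps a single source of truth: the transition engine only ever increments `credits_used`.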
server/simulator/noise.py ADDED
@@ -0,0 +1,128 @@
+ """Stochastic noise models for the biological simulator."""
+ 
+ from __future__ import annotations
+ 
+ from typing import Dict, List
+ 
+ import numpy as np
+ 
+ 
+ class NoiseModel:
+     """Generates calibrated noise for simulated experimental outputs.
+ 
+     All randomness is funnelled through a single ``numpy.Generator``
+     so that episodes are reproducible given the same seed.
+     """
+ 
+     def __init__(self, seed: int = 42):
+         self.rng = np.random.default_rng(seed)
+ 
+     def reseed(self, seed: int) -> None:
+         self.rng = np.random.default_rng(seed)
+ 
+     # ── expression-level noise ──────────────────────────────────────────
+ 
+     def add_expression_noise(
+         self,
+         true_values: Dict[str, float],
+         noise_level: float,
+         dropout_rate: float,
+     ) -> Dict[str, float]:
+         noisy: Dict[str, float] = {}
+         for gene, value in true_values.items():
+             # Dropout probability is inversely proportional to expression
+             # magnitude: lowly expressed genes drop out much more readily,
+             # matching the zero-inflation pattern in real scRNA-seq data.
+             p_drop = dropout_rate / (1.0 + abs(value))
+             if self.rng.random() < p_drop:
+                 noisy[gene] = 0.0
+             else:
+                 sigma = noise_level * abs(value) + 0.1
+                 noisy[gene] = float(value + self.rng.normal(0, sigma))
+         return noisy
+ 
+     # ── effect-size sampling ────────────────────────────────────────────
+ 
+     def sample_effect_sizes(
+         self,
+         true_effects: Dict[str, float],
+         sample_size: int,
+         noise_level: float,
+     ) -> Dict[str, float]:
+         se = noise_level / max(np.sqrt(max(sample_size, 1)), 1e-6)
+         return {
+             gene: float(effect + self.rng.normal(0, se))
+             for gene, effect in true_effects.items()
+         }
+ 
+     def sample_p_values(
+         self,
+         true_effects: Dict[str, float],
+         sample_size: int,
+         noise_level: float,
+     ) -> Dict[str, float]:
+         """Simulate approximate p-values from z-statistics."""
+         from scipy import stats  # type: ignore[import-untyped]
+ 
+         p_values: Dict[str, float] = {}
+         se = noise_level / max(np.sqrt(max(sample_size, 1)), 1e-6)
+         for gene, effect in true_effects.items():
+             z = abs(effect) / max(se, 1e-8)
+             p_values[gene] = float(2 * stats.norm.sf(z))
+         return p_values
+ 
+     # ── false discovery helpers ─────────────────────────────────────────
+ 
+     def generate_false_positives(
+         self, n_background_genes: int, fdr: float
+     ) -> List[str]:
+         n_fp = int(self.rng.binomial(n_background_genes, fdr))
+         return [f"FP_GENE_{i}" for i in range(n_fp)]
+ 
+     def generate_false_negatives(
+         self, true_genes: List[str], fnr: float
+     ) -> List[str]:
+         """Return the subset of *true_genes* that are missed."""
+         return [g for g in true_genes if self.rng.random() < fnr]
+ 
+     # ── quality helpers ─────────────────────────────────────────────────
+ 
+     def quality_degradation(
+         self, base_quality: float, factors: List[float]
+     ) -> float:
+         q = base_quality
+         for f in factors:
+             q *= f
+         return float(np.clip(q + self.rng.normal(0, 0.02), 0.0, 1.0))
+ 
+     def sample_qc_metric(
+         self, mean: float, std: float, clip_lo: float = 0.0, clip_hi: float = 1.0
+     ) -> float:
+         return float(np.clip(self.rng.normal(mean, std), clip_lo, clip_hi))
+ 
+     def sample_count(self, lam: float) -> int:
+         return int(self.rng.poisson(max(lam, 0)))
+ 
+     def coin_flip(self, p: float) -> bool:
+         return bool(self.rng.random() < p)
+ 
+     def sample_cluster_count(
+         self, n_true_populations: int, quality: float
+     ) -> int:
+         """Over- or under-clustering depending on preprocessing quality."""
+         delta = self.rng.integers(-2, 3)
+         noise_clusters = max(0, int(round((1.0 - quality) * 3)))
+         return max(1, n_true_populations + delta + noise_clusters)
+ 
+     def shuffle_ranking(
+         self, items: List[str], noise_level: float
+     ) -> List[str]:
+         """Permute a ranking with Gaussian noise on ordinals."""
+         n = len(items)
+         if n == 0:
+             return []
+         scores = np.arange(n, dtype=float) + self.rng.normal(
+             0, noise_level * n, size=n
+         )
+         order = np.argsort(scores)
+         return [items[int(i)] for i in order]
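`sample_p_values` above converts each effect size into a z-statistic and a two-sided p-value via `scipy.stats.norm.sf`. The same arithmetic can be reproduced with the stdlib alone, since `2 * norm.sf(z)` equals `math.erfc(z / sqrt(2))` for a standard normal; a sketch with a hypothetical effect and sample size:

```python
import math


def two_sided_p(effect: float, sample_size: int, noise_level: float) -> float:
    # Standard error shrinks with sqrt(n), as in NoiseModel.sample_p_values.
    se = noise_level / max(math.sqrt(max(sample_size, 1)), 1e-6)
    z = abs(effect) / max(se, 1e-8)
    # 2 * norm.sf(z) == erfc(z / sqrt(2)) for the standard normal.
    return math.erfc(z / math.sqrt(2))


# effect 0.5 at n=16, unit noise: se = 0.25, z = 2.0, p ≈ 0.0455
p = two_sided_p(effect=0.5, sample_size=16, noise_level=1.0)
```

This makes the environment's noise calibration concrete: quadrupling the sample size halves the standard error, so the same true effect produces a sharply smaller simulated p-value.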
server/simulator/output_generator.py ADDED
@@ -0,0 +1,695 @@
+ """Generate simulated drug-target-validation outputs from latent state."""
+ 
+ from __future__ import annotations
+ 
+ from typing import Any, Dict, List
+ 
+ from models import (
+     ActionType,
+     DrugTargetAction,
+     IntermediateOutput,
+     OutputType,
+ )
+ 
+ from .latent_state import FullLatentState, TargetProfile
+ from .noise import NoiseModel
+ 
+ 
+ # Pool of plausible adverse-event tissues used to inject realistic
+ # false-positive toxicity hits.
+ _NOISE_TISSUES: List[str] = [
+     "liver", "kidney", "GI", "skin", "cardiac", "CNS", "lung",
+ ]
+ 
+ 
+ class OutputGenerator:
+     """Creates structured ``IntermediateOutput`` objects from the hidden
+     ``TargetProfile`` plus a stochastic noise model.
+ 
+     Every action has a dedicated handler that:
+     - reads relevant fields from the ``TargetProfile``
+     - applies ``DataQualityState``-driven noise (false positive / false
+       negative / database coverage)
+     - returns a typed ``IntermediateOutput`` whose ``data`` dict is the
+       evidence the agent reasons over.
+     """
+ 
+     def __init__(self, noise: NoiseModel):
+         self.noise = noise
+ 
+     def generate(
+         self,
+         action: DrugTargetAction,
+         state: FullLatentState,
+         step_index: int,
+     ) -> IntermediateOutput:
+         handler = _HANDLERS.get(action.action_type, self._default)
+         out = handler(self, action, state, step_index)
+         # Database coverage globally reduces quality_score for under-curated
+         # targets.
+         coverage = state.data_quality.database_coverage
+         if coverage < 1.0:
+             out.quality_score = float(
+                 max(0.0, out.quality_score * (0.5 + 0.5 * coverage))
+             )
+         return out
+ 
+     # ── Expression & omics ──────────────────────────────────────────────
+ 
+     def _query_expression(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
+         t = s.target
+         flipped = self.noise.coin_flip(s.data_quality.false_positive_rate)
+         observed_specificity = float(
+             max(0.0, min(1.0, t.tissue_specificity
+                          + self.noise.rng.normal(0, s.data_quality.noise_level)))
+         )
+         observed_overexpr = float(
+             max(0.1, t.disease_overexpression
+                 + self.noise.rng.normal(0, 0.4 * s.data_quality.noise_level))
+         )
+ 
+         specificity_concern = (t.expression_level == "high_nonspecific")
+         # Soft summary that *can* mislead when expression is high but
+         # non-specific.
+         if t.expression_level in {"high_specific", "high_nonspecific"}:
+             summary = (
+                 f"{action.parameters.get('database', 'GTEx')}: "
+                 f"{t.expression_level} expression "
+                 f"({observed_overexpr:.2f}× over normal)"
+             )
+         else:
+             summary = (
+                 f"{action.parameters.get('database', 'GTEx')}: "
+                 f"{t.expression_level} expression"
+             )
+ 
+         return IntermediateOutput(
+             output_type=OutputType.EXPRESSION_RESULT,
+             step_index=idx,
+             quality_score=0.85 if not flipped else 0.55,
+             summary=summary,
+             data={
+                 "expression_level": t.expression_level,
+                 "tissue_specificity": round(observed_specificity, 3),
+                 "disease_overexpression": round(observed_overexpr, 2),
+                 "specificity_concern": specificity_concern,
+                 "database": action.parameters.get("database", "GTEx"),
+             },
+             uncertainty=0.10 + 0.5 * s.data_quality.noise_level,
+             artifacts_available=["expression_table"],
+         )
+ 
+     def _differential_expression(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
+         t = s.target
+         log2fc = float(self.noise.rng.normal(
+             0.0 if t.disease_overexpression < 1.0
+             else max(0.5, 1.5 * (t.disease_overexpression - 1.0)),
+             0.4 + s.data_quality.noise_level,
+         ))
+         n_de_genes = self.noise.sample_count(40 + int(20 * t.disease_overexpression))
+         return IntermediateOutput(
+             output_type=OutputType.DE_RESULT,
+             step_index=idx,
+             quality_score=0.80,
+             summary=(
+                 f"DE in {action.parameters.get('cohort', 'TCGA')}: "
+                 f"target log2FC≈{log2fc:.2f}, "
+                 f"{n_de_genes} co-regulated genes"
+             ),
+             data={
+                 "target_log2fc": round(log2fc, 3),
+                 "n_de_genes": n_de_genes,
+                 "cohort": action.parameters.get("cohort", "TCGA"),
+             },
+             uncertainty=0.15 + s.data_quality.noise_level,
+             artifacts_available=["de_table"],
+         )
+ 
+     def _pathway_enrichment(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
+         # Pathway calls are largely driven by indication-level priors.
+         pathways = [
+             {"pathway": "MAPK_signalling", "score": round(0.6 + self.noise.rng.normal(0, 0.1), 3)},
+             {"pathway": "Cell_cycle", "score": round(0.55 + self.noise.rng.normal(0, 0.1), 3)},
+             {"pathway": "Apoptosis", "score": round(0.45 + self.noise.rng.normal(0, 0.1), 3)},
+             {"pathway": "DNA_damage_response", "score": round(0.40 + self.noise.rng.normal(0, 0.1), 3)},
+         ]
+         return IntermediateOutput(
+             output_type=OutputType.PATHWAY_RESULT,
+             step_index=idx,
+             quality_score=0.70,
+             summary=f"Pathway enrichment: {len(pathways)} top pathways",
+             data={"top_pathways": pathways},
+             uncertainty=0.20,
+             artifacts_available=["enrichment_table"],
+         )
+ 
+     def _coexpression_network(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
+         partners = list(s.target.off_target_genes[:5]) + [
+             f"PARTNER_{i}" for i in range(2)
+         ]
+         return IntermediateOutput(
+             output_type=OutputType.COEXPRESSION_RESULT,
+             step_index=idx,
+             quality_score=0.65,
+             summary=f"{len(partners)} top coexpression partners identified",
+             data={"partners": partners},
+             uncertainty=0.25,
+             artifacts_available=["coexpression_table"],
+         )
+ 
+     # ── Protein & structure ─────────────────────────────────────────────
+ 
+     def _protein_structure_lookup(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
+         method = action.parameters.get("method", "AlphaFold")
+         plddt = float(self.noise.sample_qc_metric(0.78, 0.08, 0.30, 1.0))
+         return IntermediateOutput(
+             output_type=OutputType.STRUCTURE_RESULT,
+             step_index=idx,
+             quality_score=plddt,
+             summary=f"{method} structure resolved (pLDDT={plddt:.2f})",
+             data={
+                 "method": method,
+                 "pLDDT": round(plddt, 3),
+                 "n_residues": int(self.noise.sample_count(420)),
+             },
+             uncertainty=1.0 - plddt,
+             artifacts_available=["pdb_structure"],
+         )
+ 
+     def _binding_site_analysis(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
+         t = s.target
+         include_allosteric = bool(action.parameters.get("include_allosteric", False))
+         classic_score = {
+             "excellent": 0.92,
+             "good": 0.70,
+             "poor": 0.32,
+             "undruggable": 0.10,
+         }[t.binding_pocket_quality]
+         classic_score = float(self.noise.sample_qc_metric(
+             classic_score, 0.05, 0.0, 1.0
+         ))
+         allo_detected = bool(include_allosteric and t.allosteric_site_available)
+         allo_score = (
+             float(self.noise.sample_qc_metric(0.65, 0.08, 0.0, 1.0))
+             if allo_detected else 0.0
+         )
+         return IntermediateOutput(
+             output_type=OutputType.BINDING_SITE_RESULT,
+             step_index=idx,
+             quality_score=max(classic_score, allo_score),
+             summary=(
+                 f"Binding-site analysis: classic_score={classic_score:.2f}"
+                 + (f", allosteric_site_score={allo_score:.2f}" if allo_detected else "")
+             ),
+             data={
+                 "binding_pocket_quality": t.binding_pocket_quality,
+                 "classic_score": round(classic_score, 3),
+                 "allosteric_site_detected": allo_detected,
+                 "allosteric_site_score": round(allo_score, 3) if allo_detected else None,
+                 "include_allosteric": include_allosteric,
+             },
+             uncertainty=0.12,
+             artifacts_available=["pocket_table"],
+         )
+ 
+     def _protein_interaction_network(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
+         partners = list(s.target.off_target_genes[:6])
+         return IntermediateOutput(
+             output_type=OutputType.INTERACTION_RESULT,
+             step_index=idx,
+             quality_score=0.70,
+             summary=f"{len(partners)} high-confidence interactors",
+             data={
+                 "partners": partners,
+                 "source": action.parameters.get("source", "STRING"),
+             },
+             uncertainty=0.20,
+             artifacts_available=["ppi_network"],
+         )
+ 
+     def _druggability_screen(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
+         t = s.target
+         observed_score = float(self.noise.sample_qc_metric(
+             t.druggability_score, 0.06, 0.0, 1.0
+         ))
+         return IntermediateOutput(
+             output_type=OutputType.DRUGGABILITY_RESULT,
+             step_index=idx,
+             quality_score=0.85,
+             summary=(
+                 f"Druggability score={observed_score:.2f}, "
+                 f"pocket={t.binding_pocket_quality}, "
+                 f"known_ligands={t.has_known_ligands}"
+             ),
+             data={
+                 "druggability_score": round(observed_score, 3),
+                 "binding_pocket_quality": t.binding_pocket_quality,
+                 "has_known_ligands": t.has_known_ligands,
+                 "n_known_ligands": int(self.noise.sample_count(
+                     20 if t.has_known_ligands else 1
+                 )),
+             },
+             uncertainty=0.15,
+             artifacts_available=["druggability_report"],
+         )
+ 
+     # ── Clinical & safety ───────────────────────────────────────────────
+ 
+     def _clinical_trial_lookup(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
+         t = s.target
+         positive_signals: List[str] = []
+         negative_signals: List[str] = []
+         if t.clinical_precedent in {"positive", "mixed"}:
+             positive_signals.append(
+                 f"Reached {t.clinical_stage_reached or 'preclinical'} with at "
+                 f"least one program"
+             )
+         if t.clinical_precedent in {"mixed", "negative"}:
+             negative_signals.append("Prior failures or withdrawals on record")
+         if t.clinical_precedent == "negative":
+             negative_signals.append("No active programs progressing")
+         return IntermediateOutput(
+             output_type=OutputType.CLINICAL_RESULT,
+             step_index=idx,
+             quality_score=0.85,
+             summary=(
+                 f"Clinical precedent: {t.clinical_precedent} "
+                 f"(stage={t.clinical_stage_reached})"
+             ),
+             data={
+                 "clinical_precedent": t.clinical_precedent,
+                 "clinical_stage_reached": t.clinical_stage_reached,
+                 "positive_signals": positive_signals,
+                 "negative_signals": negative_signals,
+                 "competitor_programs": list(t.competitor_programs),
+             },
+             uncertainty=0.10,
+             artifacts_available=["trial_table"],
+         )
+ 
+     def _toxicity_panel(
+         self, action: DrugTargetAction, s: FullLatentState, idx: int
+     ) -> IntermediateOutput:
311
+ t = s.target
312
+ # Higher uncertainty if the agent jumps to toxicity before expression
313
+ prereq_met = s.progress.expression_queried
314
+ unc = 0.15 if prereq_met else 0.45
315
+ toxicity_tissues = list(t.toxicity_tissues)
316
+ # False-positive tissue noise
317
+ if self.noise.coin_flip(s.data_quality.false_positive_rate):
318
+ toxicity_tissues = list(toxicity_tissues) + [
319
+ str(self.noise.rng.choice(_NOISE_TISSUES))
320
+ ]
321
+ return IntermediateOutput(
322
+ output_type=OutputType.TOXICITY_RESULT,
323
+ step_index=idx,
324
+ quality_score=0.80 if prereq_met else 0.55,
325
+ summary=(
326
+ f"Toxicity profile: {t.toxicity_profile}, "
327
+ f"flagged tissues: {toxicity_tissues}"
328
+ ),
329
+ data={
330
+ "toxicity_profile": t.toxicity_profile,
331
+ "toxicity_tissues": toxicity_tissues,
332
+ "prerequisite_expression_done": prereq_met,
333
+ },
334
+ uncertainty=unc,
335
+ warnings=[] if prereq_met else [
336
+ "Toxicity called without prior expression context — "
337
+ "interpret with caution"
338
+ ],
339
+ artifacts_available=["toxicity_panel_report"],
340
+ )
341
+
342
+ def _off_target_screen(
343
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
344
+ ) -> IntermediateOutput:
345
+ t = s.target
346
+ observed_count = max(0, int(self.noise.sample_count(t.off_target_count or 1)))
347
+ observed_genes = list(t.off_target_genes[:max(1, observed_count)])
348
+ observed_ratio = float(self.noise.sample_qc_metric(
349
+ t.selectivity_ratio, 0.5, 0.0, 100.0
350
+ ))
351
+ return IntermediateOutput(
352
+ output_type=OutputType.OFF_TARGET_RESULT,
353
+ step_index=idx,
354
+ quality_score=0.80,
355
+ summary=(
356
+ f"Off-target screen: selectivity ratio={observed_ratio:.2f}, "
357
+ f"{len(observed_genes)} hits"
358
+ ),
359
+ data={
360
+ "selectivity_ratio": round(observed_ratio, 3),
361
+ "off_target_count": observed_count,
362
+ "off_target_genes": observed_genes,
363
+ },
364
+ uncertainty=0.15,
365
+ artifacts_available=["off_target_table"],
366
+ )
367
+
368
+ def _patient_stratification(
369
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
370
+ ) -> IntermediateOutput:
371
+ t = s.target
372
+ return IntermediateOutput(
373
+ output_type=OutputType.PATIENT_STRATIFICATION_RESULT,
374
+ step_index=idx,
375
+ quality_score=0.78,
376
+ summary=(
377
+ f"Patient stratification: required={t.requires_patient_stratification}, "
378
+ f"biomarker={t.responder_biomarker}"
379
+ ),
380
+ data={
381
+ "requires_stratification": t.requires_patient_stratification,
382
+ "responder_biomarker": t.responder_biomarker,
383
+ "estimated_responder_fraction": round(float(
384
+ self.noise.sample_qc_metric(
385
+ 0.30 if t.requires_patient_stratification else 0.65,
386
+ 0.10, 0.0, 1.0,
387
+ )
388
+ ), 3),
389
+ },
390
+ uncertainty=0.20,
391
+ artifacts_available=["stratification_report"],
392
+ )
393
+
394
+ # ── Literature & evidence ───────────────────────────────────────────
395
+
396
+ def _literature_search(
397
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
398
+ ) -> IntermediateOutput:
399
+ t = s.target
400
+ n_abstracts = int(self.noise.sample_count(4)) + 3
401
+ abstracts: List[Dict[str, Any]] = []
402
+ for i in range(min(5, n_abstracts)):
403
+ abstracts.append({
404
+ "title": (
405
+ f"Recent perspective on {action.parameters.get('query', 'target')} "
406
+ f"({2020 + i % 6})"
407
+ ),
408
+ "snippet": "...findings consistent with a viable program...",
409
+ })
410
+ # Scenario-specific recent precedent: surface a precedent-changing
411
+ # abstract when the current target has positive recent clinical
412
+ # precedent reached at least phase 2.
413
+ if (
414
+ t.clinical_precedent in {"positive", "mixed"}
415
+ and t.clinical_stage_reached in {"phase2", "phase3"}
416
+ ):
417
+ abstracts.insert(0, {
418
+ "title": (
419
+ "Clinical activity of recent inhibitors against this "
420
+ "target supports renewed interest"
421
+ ),
422
+ "snippet": (
423
+ "...recent programs have demonstrated clinical activity, "
424
+ "overturning prior assumptions of undruggability..."
425
+ ),
426
+ })
427
+ return IntermediateOutput(
428
+ output_type=OutputType.LITERATURE_RESULT,
429
+ step_index=idx,
430
+ quality_score=0.70,
431
+ summary=f"{len(abstracts)} relevant abstracts retrieved",
432
+ data={
433
+ "abstracts": abstracts,
434
+ "query": action.parameters.get("query", ""),
435
+ },
436
+ uncertainty=0.18,
437
+ artifacts_available=["abstract_list"],
438
+ )
439
+
440
+ def _evidence_synthesis(
441
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
442
+ ) -> IntermediateOutput:
443
+ # Quality grows with the number of evidence dimensions already covered.
444
+ flags = s.progress.model_dump()
445
+ covered = sum(1 for k, v in flags.items() if isinstance(v, bool) and v)
446
+ quality = float(min(0.85, 0.20 + 0.06 * covered))
447
+ return IntermediateOutput(
448
+ output_type=OutputType.EVIDENCE_SYNTHESIS_RESULT,
449
+ step_index=idx,
450
+ quality_score=quality,
451
+ summary=f"Evidence synthesis (coverage signal={covered})",
452
+ data={
453
+ "evidence_signal_count": covered,
454
+ "notes": (
455
+ "Synthesis is more reliable once multiple evidence "
456
+ "dimensions have been investigated."
457
+ ),
458
+ },
459
+ uncertainty=max(0.20, 0.80 - 0.06 * covered),
460
+ artifacts_available=["synthesis_report"],
461
+ )
462
+
463
+ def _competitor_landscape(
464
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
465
+ ) -> IntermediateOutput:
466
+ t = s.target
467
+ return IntermediateOutput(
468
+ output_type=OutputType.COMPETITOR_LANDSCAPE_RESULT,
469
+ step_index=idx,
470
+ quality_score=0.75,
471
+ summary=f"{len(t.competitor_programs)} competitor programs identified",
472
+ data={
473
+ "competitor_programs": list(t.competitor_programs),
474
+ "clinical_precedent": t.clinical_precedent,
475
+ },
476
+ uncertainty=0.15,
477
+ artifacts_available=["competitor_report"],
478
+ )
479
+
480
+ # ── Experimental ───────────────────────────────────────────────────
481
+
482
+ def _in_vitro_assay(
483
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
484
+ ) -> IntermediateOutput:
485
+ t = s.target
486
+ ic50 = float(self.noise.sample_qc_metric(
487
+ t.in_vitro_ic50_nM, 0.2 * t.in_vitro_ic50_nM, 0.5, 100_000.0
488
+ ))
489
+ sel_window = float(self.noise.sample_qc_metric(
490
+ t.selectivity_ratio, 0.4, 0.0, 100.0
491
+ ))
492
+ viability_drop = float(self.noise.sample_qc_metric(
493
+ 0.5 if t.in_vivo_efficacy in {"strong", "moderate"} else 0.2,
494
+ 0.1, 0.0, 1.0,
495
+ ))
496
+ return IntermediateOutput(
497
+ output_type=OutputType.IN_VITRO_RESULT,
498
+ step_index=idx,
499
+ quality_score=0.85,
500
+ summary=(
501
+ f"In-vitro: IC50={ic50:.1f} nM, selectivity_window={sel_window:.2f}, "
502
+ f"viability_drop={viability_drop:.2f}"
503
+ ),
504
+ data={
505
+ "IC50_nM": round(ic50, 2),
506
+ "selectivity_window": round(sel_window, 3),
507
+ "viability_drop": round(viability_drop, 3),
508
+ },
509
+ uncertainty=0.18,
510
+ artifacts_available=["in_vitro_report"],
511
+ )
512
+
513
+ def _in_vivo_model(
514
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
515
+ ) -> IntermediateOutput:
516
+ t = s.target
517
+ efficacy_score = {
518
+ "strong": 0.85, "moderate": 0.55, "weak": 0.25, "none": 0.05,
519
+ }.get(t.in_vivo_efficacy, 0.5)
520
+ efficacy = float(self.noise.sample_qc_metric(efficacy_score, 0.08, 0.0, 1.0))
521
+ tolerability = float(self.noise.sample_qc_metric(
522
+ {"clean": 0.9, "mild": 0.75, "moderate": 0.5, "severe": 0.25}
523
+ .get(t.toxicity_profile, 0.6),
524
+ 0.08, 0.0, 1.0,
525
+ ))
526
+ return IntermediateOutput(
527
+ output_type=OutputType.IN_VIVO_RESULT,
528
+ step_index=idx,
529
+ quality_score=0.85,
530
+ summary=(
531
+ f"In-vivo: efficacy={efficacy:.2f}, tolerability={tolerability:.2f}"
532
+ ),
533
+ data={
534
+ "efficacy_endpoint": round(efficacy, 3),
535
+ "tolerability": round(tolerability, 3),
536
+ "PK_PD_summary": {
537
+ "halflife_hours": round(float(
538
+ self.noise.sample_qc_metric(8.0, 2.0, 0.5, 48.0)
539
+ ), 2),
540
+ "Cmax_nM": round(float(
541
+ self.noise.sample_qc_metric(500.0, 150.0, 1.0, 5000.0)
542
+ ), 2),
543
+ },
544
+ },
545
+ uncertainty=0.20,
546
+ artifacts_available=["in_vivo_report"],
547
+ )
548
+
549
+ def _crispr_knockout(
550
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
551
+ ) -> IntermediateOutput:
552
+ t = s.target
553
+ ess = float(self.noise.sample_qc_metric(
554
+ t.crispr_essentiality, 0.15, -3.0, 1.0
555
+ ))
556
+ synthetic_lethal = list(t.off_target_genes[:3])
557
+ return IntermediateOutput(
558
+ output_type=OutputType.CRISPR_RESULT,
559
+ step_index=idx,
560
+ quality_score=0.80,
561
+ summary=(
562
+ f"CRISPR essentiality score={ess:.2f}; "
563
+ f"{len(synthetic_lethal)} synthetic-lethal candidates"
564
+ ),
565
+ data={
566
+ "essentiality_score": round(ess, 3),
567
+ "synthetic_lethal_partners": synthetic_lethal,
568
+ },
569
+ uncertainty=0.18,
570
+ artifacts_available=["crispr_report"],
571
+ )
572
+
573
+ def _biomarker_correlation(
574
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
575
+ ) -> IntermediateOutput:
576
+ t = s.target
577
+ corr = float(self.noise.sample_qc_metric(
578
+ 0.6 if t.responder_biomarker else 0.2, 0.12, -1.0, 1.0,
579
+ ))
580
+ return IntermediateOutput(
581
+ output_type=OutputType.BIOMARKER_RESULT,
582
+ step_index=idx,
583
+ quality_score=0.78,
584
+ summary=(
585
+ f"Biomarker correlation r={corr:.2f} "
586
+ f"({t.responder_biomarker or 'no_biomarker'})"
587
+ ),
588
+ data={
589
+ "biomarker": t.responder_biomarker,
590
+ "correlation": round(corr, 3),
591
+ },
592
+ uncertainty=0.22,
593
+ artifacts_available=["biomarker_report"],
594
+ )
595
+
596
+ # ── Meta ────────────────────────────────────────────────────────────
597
+
598
+ def _flag_red_flag(
599
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
600
+ ) -> IntermediateOutput:
601
+ note = str(action.parameters.get("note", "(no detail)"))
602
+ return IntermediateOutput(
603
+ output_type=OutputType.RED_FLAG_NOTE,
604
+ step_index=idx,
605
+ quality_score=1.0,
606
+ summary=f"Red flag recorded: {note[:80]}",
607
+ data={"note": note},
608
+ uncertainty=0.0,
609
+ artifacts_available=["dossier_red_flag"],
610
+ )
611
+
612
+ def _request_expert_review(
613
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
614
+ ) -> IntermediateOutput:
615
+ flags = s.progress.model_dump()
616
+ covered = sum(1 for k, v in flags.items() if isinstance(v, bool) and v)
617
+ quality = float(min(0.75, 0.20 + 0.05 * covered))
618
+ return IntermediateOutput(
619
+ output_type=OutputType.EXPERT_REVIEW,
620
+ step_index=idx,
621
+ quality_score=quality,
622
+ summary=(
623
+ f"Expert review (coverage signal={covered})"
624
+ ),
625
+ data={
626
+ "evidence_signal_count": covered,
627
+ "review": (
628
+ "Review more meaningful when more evidence dimensions "
629
+ "have been opened."
630
+ ),
631
+ },
632
+ uncertainty=max(0.25, 0.80 - 0.05 * covered),
633
+ artifacts_available=["expert_review_note"],
634
+ )
635
+
636
+ def _submit_validation_report(
637
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
638
+ ) -> IntermediateOutput:
639
+ decision = action.final_decision or "no_decision"
640
+ confidence = float(action.confidence) if action.confidence is not None else 0.0
641
+ return IntermediateOutput(
642
+ output_type=OutputType.VALIDATION_REPORT,
643
+ step_index=idx,
644
+ quality_score=1.0,
645
+ summary=(
646
+ f"Validation report submitted: decision={decision}, "
647
+ f"confidence={confidence:.2f}"
648
+ ),
649
+ data={
650
+ "decision": decision,
651
+ "confidence": confidence,
652
+ "reasoning": action.reasoning or "",
653
+ },
654
+ uncertainty=0.0,
655
+ artifacts_available=["validation_report"],
656
+ )
657
+
658
+ # ── Default ────────────────────────────────────────────────────────
659
+
660
+ def _default(
661
+ self, action: DrugTargetAction, s: FullLatentState, idx: int
662
+ ) -> IntermediateOutput:
663
+ return IntermediateOutput(
664
+ output_type=OutputType.FAILURE_REPORT,
665
+ step_index=idx,
666
+ success=False,
667
+ summary=f"Unhandled action type: {action.action_type}",
668
+ data={},
669
+ )
670
+
671
+
672
+ _HANDLERS = {
673
+ ActionType.QUERY_EXPRESSION: OutputGenerator._query_expression,
674
+ ActionType.DIFFERENTIAL_EXPRESSION: OutputGenerator._differential_expression,
675
+ ActionType.PATHWAY_ENRICHMENT: OutputGenerator._pathway_enrichment,
676
+ ActionType.COEXPRESSION_NETWORK: OutputGenerator._coexpression_network,
677
+ ActionType.PROTEIN_STRUCTURE_LOOKUP: OutputGenerator._protein_structure_lookup,
678
+ ActionType.BINDING_SITE_ANALYSIS: OutputGenerator._binding_site_analysis,
679
+ ActionType.PROTEIN_INTERACTION_NETWORK: OutputGenerator._protein_interaction_network,
680
+ ActionType.DRUGGABILITY_SCREEN: OutputGenerator._druggability_screen,
681
+ ActionType.CLINICAL_TRIAL_LOOKUP: OutputGenerator._clinical_trial_lookup,
682
+ ActionType.TOXICITY_PANEL: OutputGenerator._toxicity_panel,
683
+ ActionType.OFF_TARGET_SCREEN: OutputGenerator._off_target_screen,
684
+ ActionType.PATIENT_STRATIFICATION: OutputGenerator._patient_stratification,
685
+ ActionType.LITERATURE_SEARCH: OutputGenerator._literature_search,
686
+ ActionType.EVIDENCE_SYNTHESIS: OutputGenerator._evidence_synthesis,
687
+ ActionType.COMPETITOR_LANDSCAPE: OutputGenerator._competitor_landscape,
688
+ ActionType.IN_VITRO_ASSAY: OutputGenerator._in_vitro_assay,
689
+ ActionType.IN_VIVO_MODEL: OutputGenerator._in_vivo_model,
690
+ ActionType.CRISPR_KNOCKOUT: OutputGenerator._crispr_knockout,
691
+ ActionType.BIOMARKER_CORRELATION: OutputGenerator._biomarker_correlation,
692
+ ActionType.FLAG_RED_FLAG: OutputGenerator._flag_red_flag,
693
+ ActionType.REQUEST_EXPERT_REVIEW: OutputGenerator._request_expert_review,
694
+ ActionType.SUBMIT_VALIDATION_REPORT: OutputGenerator._submit_validation_report,
695
+ }
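The `_HANDLERS` table stores unbound methods and dispatches on `ActionType`; the dispatcher then calls the looked-up handler with the generator instance passed explicitly. A minimal self-contained sketch of that pattern, using toy stand-ins for `ActionType` and `OutputGenerator` (the names and return values here are illustrative, not the real classes):

```python
from enum import Enum


class ActionType(Enum):
    QUERY_EXPRESSION = "query_expression"
    LITERATURE_SEARCH = "literature_search"
    UNMAPPED = "unmapped"  # deliberately absent from the handler table


class OutputGenerator:
    def _query_expression(self, idx: int) -> str:
        return f"expression result @ step {idx}"

    def _literature_search(self, idx: int) -> str:
        return f"literature result @ step {idx}"

    def _default(self, idx: int) -> str:
        return "unhandled action"

    def generate(self, action_type: ActionType, idx: int) -> str:
        # Table entries are unbound methods, so `self` is passed explicitly.
        handler = _HANDLERS.get(action_type, OutputGenerator._default)
        return handler(self, idx)


_HANDLERS = {
    ActionType.QUERY_EXPRESSION: OutputGenerator._query_expression,
    ActionType.LITERATURE_SEARCH: OutputGenerator._literature_search,
}
```

Defining the table at module level after the class keeps each handler a plain method while avoiding a long `if/elif` chain in `generate`.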
server/simulator/transition.py ADDED
@@ -0,0 +1,201 @@
+ """Transition dynamics engine for the drug-target-validation simulator.
+
+ Orchestrates latent-state updates, output generation, credit accounting,
+ and constraint propagation for every agent action.
+ """
+
+ from __future__ import annotations
+
+ from copy import deepcopy
+ from dataclasses import dataclass, field
+ from typing import Dict, List, Optional, Tuple
+
+ from models import (
+     ActionType,
+     DrugTargetAction,
+     IntermediateOutput,
+     OutputType,
+ )
+
+ from .latent_state import FullLatentState
+ from .noise import NoiseModel
+ from .output_generator import OutputGenerator
+
+
+ # Credit costs per ActionType.
+ _BASE_ACTION_COSTS: Dict[ActionType, int] = {
+     ActionType.QUERY_EXPRESSION: 2,
+     ActionType.DIFFERENTIAL_EXPRESSION: 2,
+     ActionType.PATHWAY_ENRICHMENT: 2,
+     ActionType.COEXPRESSION_NETWORK: 2,
+     ActionType.PROTEIN_STRUCTURE_LOOKUP: 3,
+     ActionType.BINDING_SITE_ANALYSIS: 3,
+     ActionType.PROTEIN_INTERACTION_NETWORK: 2,
+     ActionType.DRUGGABILITY_SCREEN: 3,
+     ActionType.CLINICAL_TRIAL_LOOKUP: 3,
+     ActionType.TOXICITY_PANEL: 3,
+     ActionType.OFF_TARGET_SCREEN: 3,
+     ActionType.PATIENT_STRATIFICATION: 3,
+     ActionType.LITERATURE_SEARCH: 1,
+     ActionType.EVIDENCE_SYNTHESIS: 1,
+     ActionType.COMPETITOR_LANDSCAPE: 1,
+     ActionType.IN_VITRO_ASSAY: 5,
+     ActionType.IN_VIVO_MODEL: 8,
+     ActionType.CRISPR_KNOCKOUT: 4,
+     ActionType.BIOMARKER_CORRELATION: 3,
+     ActionType.FLAG_RED_FLAG: 0,
+     ActionType.REQUEST_EXPERT_REVIEW: 1,
+     ActionType.SUBMIT_VALIDATION_REPORT: 0,
+ }
+
+ # Public alias kept for callers that historically imported ACTION_COSTS.
+ ACTION_COSTS = _BASE_ACTION_COSTS
+
+
+ def compute_action_cost(action: DrugTargetAction) -> int:
+     """Return the credit cost for a single action."""
+     return _BASE_ACTION_COSTS.get(action.action_type, 0)
+
+
+ # Map action type → progress flag that should be set when it succeeds.
+ _PROGRESS_MAP: Dict[ActionType, str] = {
+     ActionType.QUERY_EXPRESSION: "expression_queried",
+     ActionType.DIFFERENTIAL_EXPRESSION: "expression_queried",
+     ActionType.PATHWAY_ENRICHMENT: "pathway_analysed",
+     ActionType.COEXPRESSION_NETWORK: "interactions_mapped",
+     ActionType.PROTEIN_STRUCTURE_LOOKUP: "structure_resolved",
+     ActionType.BINDING_SITE_ANALYSIS: "druggability_assessed",
+     ActionType.PROTEIN_INTERACTION_NETWORK: "interactions_mapped",
+     ActionType.DRUGGABILITY_SCREEN: "druggability_assessed",
+     ActionType.CLINICAL_TRIAL_LOOKUP: "clinical_checked",
+     ActionType.TOXICITY_PANEL: "toxicity_assessed",
+     ActionType.OFF_TARGET_SCREEN: "selectivity_checked",
+     ActionType.PATIENT_STRATIFICATION: "patient_stratification_done",
+     ActionType.LITERATURE_SEARCH: "literature_reviewed",
+     ActionType.EVIDENCE_SYNTHESIS: "evidence_synthesised",
+     ActionType.COMPETITOR_LANDSCAPE: "literature_reviewed",
+     ActionType.IN_VITRO_ASSAY: "in_vitro_done",
+     ActionType.IN_VIVO_MODEL: "in_vivo_done",
+     ActionType.CRISPR_KNOCKOUT: "crispr_done",
+     ActionType.BIOMARKER_CORRELATION: "biomarker_correlated",
+     ActionType.REQUEST_EXPERT_REVIEW: "expert_reviewed",
+     ActionType.SUBMIT_VALIDATION_REPORT: "report_submitted",
+ }
+
+
+ @dataclass
+ class TransitionResult:
+     """Bundle returned by the transition engine after one step."""
+
+     next_state: FullLatentState
+     output: IntermediateOutput
+     reward_components: Dict[str, float] = field(default_factory=dict)
+     hard_violations: List[str] = field(default_factory=list)
+     soft_violations: List[str] = field(default_factory=list)
+     done: bool = False
+
+
+ class TransitionEngine:
+     """Applies one action to the latent state, producing the next state and
+     a simulated intermediate output. Delegates output generation to
+     ``OutputGenerator``.
+     """
+
+     def __init__(self, noise: NoiseModel):
+         self.noise = noise
+         self.output_gen = OutputGenerator(noise)
+
+     def step(
+         self,
+         state: FullLatentState,
+         action: DrugTargetAction,
+         *,
+         hard_violations: Optional[List[str]] = None,
+         soft_violations: Optional[List[str]] = None,
+     ) -> TransitionResult:
+         s = deepcopy(state)
+         step_idx = sum(s.action_call_counts.values()) + 1
+
+         hard_v = hard_violations or []
+         soft_v = soft_violations or []
+
+         if hard_v:
+             output = IntermediateOutput(
+                 output_type=OutputType.FAILURE_REPORT,
+                 step_index=step_idx,
+                 success=False,
+                 summary=f"Action blocked: {'; '.join(hard_v)}",
+             )
+             done = action.action_type == ActionType.SUBMIT_VALIDATION_REPORT
+             return TransitionResult(
+                 next_state=s,
+                 output=output,
+                 hard_violations=hard_v,
+                 soft_violations=soft_v,
+                 done=done,
+             )
+
+         # Track call counts before deduction so the rule engine can use
+         # them when reasoning about redundancy on the next step.
+         key = action.action_type.value
+         s.action_call_counts[key] = s.action_call_counts.get(key, 0) + 1
+
+         # Deduct credits.
+         cost = compute_action_cost(action)
+         s.credits.credits_used += cost
+
+         # If credits exhausted *and* this isn't a terminal report, the
+         # episode ends with a failure-style output (the caller still
+         # records the action).
+         credits_exhausted_after = s.credits.exhausted
+
+         # Generate the simulated output.
+         output = self.output_gen.generate(action, s, step_idx)
+
+         if soft_v:
+             output.quality_score = float(max(0.0, output.quality_score * 0.7))
+             output.warnings = list(output.warnings) + list(soft_v)
+
+         # Update progress flags for successful actions.
+         flag = _PROGRESS_MAP.get(action.action_type)
+         if flag and output.success:
+             setattr(s.progress, flag, True)
+
+         # Determine episode termination.
+         done = (
+             action.action_type == ActionType.SUBMIT_VALIDATION_REPORT
+             or credits_exhausted_after
+         )
+
+         return TransitionResult(
+             next_state=s,
+             output=output,
+             soft_violations=soft_v,
+             done=done,
+         )
+
+     @staticmethod
+     def covered_evidence_dimensions(s: FullLatentState) -> List[str]:
+         """Return the set of *evidence dimensions* the agent has touched.
+
+         Mirrors the keys used in ``TargetProfile.key_evidence_dimensions``
+         so the reward computer can compute coverage directly.
+         """
+         p = s.progress
+         flags: List[Tuple[str, bool]] = [
+             ("expression", p.expression_queried),
+             ("druggability", p.druggability_assessed),
+             ("off_target", p.selectivity_checked),
+             ("toxicity", p.toxicity_assessed),
+             ("clinical", p.clinical_checked),
+             ("literature", p.literature_reviewed),
+             ("in_vitro", p.in_vitro_done),
+             ("in_vivo", p.in_vivo_done),
+             ("patient_stratification", p.patient_stratification_done),
+             ("pathway", p.pathway_analysed),
+             ("structure", p.structure_resolved),
+             ("interactions", p.interactions_mapped),
+             ("crispr", p.crispr_done),
+             ("biomarker", p.biomarker_correlated),
+         ]
+         return [name for name, hit in flags if hit]
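The credit accounting and termination test in `step` reduce to a few lines of bookkeeping. A stripped-down, self-contained sketch of that flow, with a hypothetical `Credits` class standing in for `CreditState` (cost values taken from `_BASE_ACTION_COSTS`; the `exhausted` semantics are an assumption):

```python
from dataclasses import dataclass

# Illustrative subset of the per-action credit costs.
ACTION_COSTS = {
    "in_vivo_model": 8,
    "literature_search": 1,
    "submit_validation_report": 0,
}


@dataclass
class Credits:
    credits_total: int
    credits_used: int = 0

    @property
    def exhausted(self) -> bool:
        # Assumed semantics: the budget is gone once usage reaches the total.
        return self.credits_used >= self.credits_total


def step(credits: Credits, action_type: str) -> bool:
    """Deduct the action's cost; True means the episode terminates."""
    credits.credits_used += ACTION_COSTS.get(action_type, 0)
    return action_type == "submit_validation_report" or credits.exhausted
```

So a second in-vivo study on a 10-credit budget ends the episode by exhaustion, while submitting the report ends it regardless of remaining credits.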
server/tasks/__init__.py ADDED
@@ -0,0 +1,4 @@
+ from .generator import TaskGenerator
+ from .scenarios import SCENARIO_LIBRARY, Scenario
+
+ __all__ = ["SCENARIO_LIBRARY", "Scenario", "TaskGenerator"]
server/tasks/generator.py ADDED
@@ -0,0 +1,132 @@
+ """Task generator — produces (ValidationTaskSpec, FullLatentState) pairs
+ for drug-target-validation episodes.
+
+ Supports two modes:
+ 1. Select from the curated ``SCENARIO_LIBRARY``.
+ 2. Add procedurally-generated scenarios on top.
+ """
+
+ from __future__ import annotations
+
+ from typing import List, Optional, Tuple
+
+ import numpy as np
+
+ from models import ActionType, ValidationTaskSpec
+
+ from server.simulator.latent_state import (
+     CreditState,
+     DataQualityState,
+     FullLatentState,
+     TargetProfile,
+     ValidationProgress,
+ )
+ from .scenarios import SCENARIO_LIBRARY, Scenario
+ from .procedural_generator import generate_procedural_scenarios
+
+
+ class TaskGenerator:
+     """Generates task + latent-state pairs for environment episodes."""
+
+     def __init__(
+         self,
+         scenarios: Optional[List[Scenario]] = None,
+         domain_randomise: bool = True,
+     ):
+         if scenarios is not None:
+             self.scenarios = scenarios
+         else:
+             self.scenarios = list(SCENARIO_LIBRARY) + generate_procedural_scenarios(
+                 n=20, seed=42,
+             )
+         self.domain_randomise = domain_randomise
+
+     def generate(
+         self,
+         *,
+         seed: Optional[int] = None,
+         scenario_name: Optional[str] = None,
+     ) -> Tuple[ValidationTaskSpec, FullLatentState]:
+         rng = np.random.default_rng(seed)
+
+         if scenario_name:
+             scenario = self._find_scenario(scenario_name)
+         else:
+             idx = int(rng.integers(0, len(self.scenarios)))
+             scenario = self.scenarios[idx]
+
+         task = scenario.task.model_copy(deep=True)
+         target = scenario.target.model_copy(deep=True)
+         data_quality = scenario.data_quality.model_copy(deep=True)
+
+         if self.domain_randomise:
+             self._randomise(rng, task, target, data_quality)
+
+         if not task.available_actions:
+             task.available_actions = [a.value for a in ActionType]
+
+         latent = FullLatentState(
+             target=target,
+             data_quality=data_quality,
+             progress=ValidationProgress(),
+             credits=CreditState(credits_total=task.credits_limit),
+             rng_seed=seed or 0,
+         )
+         return task, latent
+
+     def list_scenarios(self) -> List[str]:
+         return [s.name for s in self.scenarios]
+
+     # ── internals ───────────────────────────────────────────────────────
+
+     def _find_scenario(self, name: str) -> Scenario:
+         for s in self.scenarios:
+             if s.name == name:
+                 return s
+         available = ", ".join(self.list_scenarios())
+         raise ValueError(f"Unknown scenario '{name}'. Available: {available}")
+
+     @staticmethod
+     def _randomise(
+         rng: np.random.Generator,
+         task: ValidationTaskSpec,
+         target: TargetProfile,
+         data_quality: DataQualityState,
+     ) -> None:
+         """Light domain randomisation that nudges noise / numerics without
+         flipping ``correct_decision`` or ``key_evidence_dimensions``."""
+         # Credit budget jitter
+         task.credits_limit = int(
+             max(15, round(task.credits_limit * float(rng.uniform(0.9, 1.1))))
+         )
+
+         # Data-quality jitter
+         data_quality.noise_level = float(np.clip(
+             data_quality.noise_level + rng.normal(0, 0.02), 0.02, 0.4
+         ))
+         data_quality.false_positive_rate = float(np.clip(
+             data_quality.false_positive_rate + rng.normal(0, 0.01), 0.0, 0.3
+         ))
+         data_quality.false_negative_rate = float(np.clip(
+             data_quality.false_negative_rate + rng.normal(0, 0.01), 0.0, 0.3
+         ))
+         data_quality.database_coverage = float(np.clip(
+             data_quality.database_coverage + rng.normal(0, 0.03), 0.5, 1.0
+         ))
+
+         # Target profile numerics — keep categorical fields fixed.
+         target.tissue_specificity = float(np.clip(
+             target.tissue_specificity * float(rng.uniform(0.9, 1.1)), 0.0, 1.0
+         ))
+         target.disease_overexpression = float(max(
+             0.1, target.disease_overexpression * float(rng.uniform(0.85, 1.15))
+         ))
+         target.druggability_score = float(np.clip(
+             target.druggability_score * float(rng.uniform(0.9, 1.1)), 0.0, 1.0
+         ))
+         target.selectivity_ratio = float(max(
+             0.0, target.selectivity_ratio * float(rng.uniform(0.85, 1.15))
+         ))
+         target.in_vitro_ic50_nM = float(max(
+             0.5, target.in_vitro_ic50_nM * float(rng.uniform(0.7, 1.3))
+         ))
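`_randomise` applies the same idiom to every numeric field: multiply by a uniform factor near 1, then clip back into the legal range. In isolation the idiom looks like this (the `jitter` helper and field names are illustrative, not part of the repo):

```python
import numpy as np


def jitter(value: float, rng: np.random.Generator,
           lo: float, hi: float, scale: float = 0.1) -> float:
    """Scale by a uniform factor in [1 - scale, 1 + scale], then clip to [lo, hi]."""
    factor = float(rng.uniform(1.0 - scale, 1.0 + scale))
    return float(np.clip(value * factor, lo, hi))


rng = np.random.default_rng(0)
tissue_specificity = jitter(0.8, rng, 0.0, 1.0)   # always lands in [0.72, 0.88]
druggability_score = jitter(0.99, rng, 0.0, 1.0)  # clip keeps it <= 1.0
```

The clip is what keeps the randomisation "light": a score near a boundary can drift toward it but never leave the valid domain, so the scenario's `correct_decision` stays intact.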
server/tasks/procedural_generator.py ADDED
@@ -0,0 +1,232 @@
+ """Procedural drug-target-validation scenario generator.
+
+ Composes coherent ``Scenario`` objects by sampling from a pool of real
+ cancer targets and disease contexts and bundling them with an internally
+ consistent ``TargetProfile`` (viable vs non-viable bundles).
+ """
+
+ from __future__ import annotations
+
+ import logging
+ from typing import List, Optional
+
+ import numpy as np
+
+ from models import ValidationTaskSpec
+
+ from server.simulator.latent_state import (
+     DataQualityState,
+     TargetProfile,
+ )
+
+ from .scenarios import Scenario
+
+ logger = logging.getLogger(__name__)
+
+
+ _TARGET_POOL: List[str] = [
+     "BRAF", "MET", "FGFR1", "PIK3CA", "AKT1", "CDK4", "MDM2", "BCL2",
+     "PARP1", "IDH1", "IDH2", "FLT3", "JAK2", "BTK", "MTOR", "ALK",
+     "ROS1", "KIT", "ERBB2", "ABL1",
+ ]
+
+ _DISEASE_POOL: List[str] = [
+     "non-small cell lung cancer",
+     "colorectal cancer",
+     "melanoma",
+     "acute myeloid leukemia",
+     "chronic myeloid leukemia",
+     "glioblastoma",
+     "breast cancer",
+     "ovarian cancer",
+ ]
+
+
+ _DIFFICULTY_PARAMS = {
+     "easy": {
+         "noise_level": (0.05, 0.10),
+         "false_positive_rate": (0.02, 0.05),
+         "false_negative_rate": (0.02, 0.05),
+         "database_coverage": (0.90, 1.0),
+         "credits_limit": (45, 60),
+         "viable_prob": 0.65,
+         "n_key_evidence": (1, 2),
+         "misleading_prob": 0.0,
+     },
+     "medium": {
+         "noise_level": (0.08, 0.15),
+         "false_positive_rate": (0.04, 0.08),
+         "false_negative_rate": (0.04, 0.08),
+         "database_coverage": (0.80, 0.95),
+         "credits_limit": (40, 55),
+         "viable_prob": 0.50,
+         "n_key_evidence": (2, 3),
+         "misleading_prob": 0.20,
+     },
+     "hard": {
+         "noise_level": (0.12, 0.22),
+         "false_positive_rate": (0.06, 0.12),
+         "false_negative_rate": (0.06, 0.12),
+         "database_coverage": (0.65, 0.90),
+         "credits_limit": (35, 50),
+         "viable_prob": 0.45,
+         "n_key_evidence": (3, 4),
+         "misleading_prob": 0.50,
+     },
+ }
+
+
+ def _build_viable_target(rng: np.random.Generator) -> TargetProfile:
+     return TargetProfile(
+         expression_level=str(rng.choice(["high_specific", "moderate"])),
+         tissue_specificity=float(rng.uniform(0.55, 0.90)),
+         disease_overexpression=float(rng.uniform(2.0, 5.0)),
+         druggability_score=float(rng.uniform(0.55, 0.90)),
+         binding_pocket_quality=str(rng.choice(["excellent", "good"])),
+         has_known_ligands=True,
+         allosteric_site_available=bool(rng.choice([True, False])),
+         selectivity_ratio=float(rng.uniform(5.0, 20.0)),
+         off_target_count=int(rng.integers(0, 4)),
+         off_target_genes=[],
+         toxicity_profile=str(rng.choice(["clean", "mild", "moderate"])),
+         toxicity_tissues=[],
+         clinical_precedent=str(rng.choice(["positive", "mixed"])),
+         clinical_stage_reached=str(rng.choice(["phase1", "phase2", "phase3"])),
+         competitor_programs=[],
+         requires_patient_stratification=bool(rng.choice([True, False])),
+         responder_biomarker=None,
+         in_vitro_ic50_nM=float(rng.uniform(2.0, 100.0)),
+         in_vivo_efficacy=str(rng.choice(["strong", "moderate"])),
+         crispr_essentiality=float(rng.uniform(-1.5, -0.5)),
+         true_viability_score=float(rng.uniform(0.65, 0.90)),
+         correct_decision="go",
+     )
+
+
+ def _build_nonviable_target(rng: np.random.Generator) -> TargetProfile:
+     return TargetProfile(
+         expression_level=str(rng.choice(["high_nonspecific", "low", "moderate"])),
+         tissue_specificity=float(rng.uniform(0.10, 0.45)),
+         disease_overexpression=float(rng.uniform(0.5, 1.8)),
+         druggability_score=float(rng.uniform(0.05, 0.40)),
+         binding_pocket_quality=str(rng.choice(["poor", "undruggable"])),
+         has_known_ligands=False,
+         allosteric_site_available=False,
+         selectivity_ratio=float(rng.uniform(0.5, 3.0)),
+         off_target_count=int(rng.integers(5, 12)),
+         off_target_genes=[f"OFF_{i}" for i in range(int(rng.integers(2, 6)))],
+         toxicity_profile=str(rng.choice(["moderate", "severe"])),
+         toxicity_tissues=[
+             str(rng.choice(["liver", "kidney", "cardiac", "CNS", "GI"]))
+         ],
+         clinical_precedent=str(rng.choice(["negative", "none", "mixed"])),
+         clinical_stage_reached=None,
+         competitor_programs=[],
+         requires_patient_stratification=False,
+         responder_biomarker=None,
+         in_vitro_ic50_nM=float(rng.uniform(500.0, 10_000.0)),
+         in_vivo_efficacy=str(rng.choice(["weak", "none"])),
+         crispr_essentiality=float(rng.uniform(-0.3, 0.3)),
+         true_viability_score=float(rng.uniform(0.05, 0.35)),
+         correct_decision="no_go",
+     )
+
+
+ _DIMENSION_POOL: List[str] = [
+     "expression",
+     "druggability",
+     "off_target",
139
+ "toxicity",
140
+ "clinical",
141
+ "literature",
142
+ "in_vitro",
143
+ "in_vivo",
144
+ "patient_stratification",
145
+ ]
146
+
147
+
148
+ def generate_scenario(
149
+ seed: int,
150
+ difficulty: str = "medium",
151
+ ) -> Scenario:
152
+ """Generate a single procedural scenario with complete latent state."""
153
+ rng = np.random.default_rng(seed)
154
+ params = _DIFFICULTY_PARAMS[difficulty]
155
+
156
+ target_gene = str(rng.choice(_TARGET_POOL))
157
+ disease = str(rng.choice(_DISEASE_POOL))
158
+
159
+ if rng.random() < params["viable_prob"]:
160
+ target = _build_viable_target(rng)
161
+ else:
162
+ target = _build_nonviable_target(rng)
163
+
164
+ n_key = int(rng.integers(*params["n_key_evidence"]))
165
+ target.key_evidence_dimensions = list(
166
+ rng.choice(_DIMENSION_POOL, size=min(n_key, len(_DIMENSION_POOL)),
167
+ replace=False)
168
+ )
169
+
170
+ if rng.random() < params["misleading_prob"]:
171
+ target.misleading_signals = [
172
+ "high_expression_looks_positive"
173
+ if target.correct_decision == "no_go"
174
+ else "historical_undruggability"
175
+ ]
176
+
177
+ data_quality = DataQualityState(
178
+ noise_level=round(float(rng.uniform(*params["noise_level"])), 3),
179
+ false_positive_rate=round(
180
+ float(rng.uniform(*params["false_positive_rate"])), 3
181
+ ),
182
+ false_negative_rate=round(
183
+ float(rng.uniform(*params["false_negative_rate"])), 3
184
+ ),
185
+ database_coverage=round(
186
+ float(rng.uniform(*params["database_coverage"])), 3
187
+ ),
188
+ )
189
+
190
+ credits_limit = int(rng.integers(*params["credits_limit"]))
191
+
192
+ task = ValidationTaskSpec(
193
+ problem_statement=(
194
+ f"Validate {target_gene} as a drug target in {disease}."
195
+ ),
196
+ target_gene=target_gene,
197
+ disease_context=disease,
198
+ indication=f"{target_gene}-driven {disease}",
199
+ credits_limit=credits_limit,
200
+ success_criteria=[
201
+ f"Investigate the key evidence for {target_gene}",
202
+ "Submit a calibrated go / no_go validation report",
203
+ ],
204
+ )
205
+
206
+ name = f"proc_{target_gene}_{difficulty}_{seed}"
207
+ tags = [difficulty, target_gene, disease.replace(" ", "_")]
208
+
209
+ return Scenario(
210
+ name=name,
211
+ task=task,
212
+ target=target,
213
+ data_quality=data_quality,
214
+ difficulty=difficulty,
215
+ tags=tags,
216
+ )
217
+
218
+
219
+ def generate_procedural_scenarios(
220
+ n: int = 20,
221
+ seed: int = 42,
222
+ ) -> List[Scenario]:
223
+ """Pre-generate a pool of procedural scenarios across difficulties."""
224
+ rng = np.random.default_rng(seed)
225
+ scenarios: List[Scenario] = []
226
+ difficulties = ["easy", "medium", "hard"]
227
+ for i in range(n):
228
+ diff = difficulties[i % len(difficulties)]
229
+ child_seed = int(rng.integers(0, 2**31))
230
+ scenarios.append(generate_scenario(seed=child_seed, difficulty=diff))
231
+ logger.info("Generated %d procedural scenarios.", len(scenarios))
232
+ return scenarios
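One subtlety worth calling out for the `(low, high)` ranges in `_DIFFICULTY_PARAMS`: NumPy's `Generator.integers` excludes the upper bound by default, so a draw over `(1, 2)` can only ever return 1 unless `endpoint=True` is passed. A minimal standalone check (pure NumPy, no repo imports):

```python
import numpy as np

rng = np.random.default_rng(0)

# Default: `high` is exclusive, so integers(1, 2) is always 1.
exclusive = {int(rng.integers(1, 2)) for _ in range(100)}

# With endpoint=True the upper bound becomes inclusive, which is what
# a config range like "n_key_evidence": (1, 2) presumably intends.
inclusive = {int(rng.integers(1, 2, endpoint=True)) for _ in range(100)}

print(exclusive)          # {1}
print(sorted(inclusive))  # [1, 2]
```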
server/tasks/scenarios.py ADDED
@@ -0,0 +1,370 @@
+ """Pre-defined drug-target-validation scenarios.
+
+ Each ``Scenario`` bundles a ``ValidationTaskSpec`` together with the
+ matching hidden ``TargetProfile`` so the simulator can instantiate
+ consistent episodes. The library spans the easy → very-hard difficulty
+ range and intentionally includes misleading-signal scenarios where the
+ naive answer disagrees with the correct decision.
+ """
+
+ from __future__ import annotations
+
+ from dataclasses import dataclass, field
+ from typing import List
+
+ from models import ValidationTaskSpec
+
+ from server.simulator.latent_state import (
+     DataQualityState,
+     TargetProfile,
+ )
+
+
+ @dataclass
+ class Scenario:
+     """A reproducible (task, ground-truth) pair."""
+
+     name: str
+     task: ValidationTaskSpec
+     target: TargetProfile
+     data_quality: DataQualityState = field(default_factory=DataQualityState)
+     difficulty: str = "medium"
+     tags: List[str] = field(default_factory=list)
+
+
+ # ── Scenario library ────────────────────────────────────────────────────────
+
+ SCENARIO_LIBRARY: List[Scenario] = [
+     # ── 1. EGFR / NSCLC — easy go ────────────────────────────────────────
+     Scenario(
+         name="egfr_nsclc_viable",
+         difficulty="easy",
+         tags=["oncology", "kinase", "clear_go"],
+         task=ValidationTaskSpec(
+             problem_statement=(
+                 "Validate EGFR as a drug target in EGFR-mutant non-small "
+                 "cell lung cancer."
+             ),
+             target_gene="EGFR",
+             disease_context="EGFR-mutant non-small cell lung cancer (NSCLC)",
+             indication="EGFR-mutant NSCLC",
+             credits_limit=50,
+             success_criteria=[
+                 "Confirm tumor-selective expression",
+                 "Confirm druggable kinase pocket",
+                 "Confirm positive clinical precedent",
+                 "Submit go recommendation with calibrated confidence",
+             ],
+             prior_observations=[
+                 "EGFR mutations are well-established oncogenic drivers in NSCLC",
+             ],
+         ),
+         target=TargetProfile(
+             expression_level="high_specific",
+             tissue_specificity=0.85,
+             disease_overexpression=4.5,
+             druggability_score=0.92,
+             binding_pocket_quality="excellent",
+             has_known_ligands=True,
+             allosteric_site_available=True,
+             selectivity_ratio=15.0,
+             off_target_count=2,
+             off_target_genes=["ERBB2", "ERBB4"],
+             toxicity_profile="mild",
+             toxicity_tissues=["skin", "GI"],
+             clinical_precedent="positive",
+             clinical_stage_reached="phase3",
+             competitor_programs=["erlotinib", "gefitinib", "osimertinib"],
+             requires_patient_stratification=True,
+             responder_biomarker="EGFR_activating_mutation",
+             in_vitro_ic50_nM=2.0,
+             in_vivo_efficacy="strong",
+             crispr_essentiality=-1.4,
+             true_viability_score=0.88,
+             correct_decision="go",
+             misleading_signals=[],
+             key_evidence_dimensions=["expression", "druggability"],
+         ),
+         data_quality=DataQualityState(
+             noise_level=0.08,
+             false_positive_rate=0.04,
+             false_negative_rate=0.04,
+             database_coverage=0.95,
+         ),
+     ),
+
+     # ── 2. KRAS G12C / PDAC — borderline go ──────────────────────────────
+     Scenario(
+         name="kras_pdac_borderline",
+         difficulty="medium",
+         tags=["oncology", "GTPase", "borderline_go", "literature_critical"],
+         task=ValidationTaskSpec(
+             problem_statement=(
+                 "Validate KRAS G12C as a drug target in pancreatic ductal "
+                 "adenocarcinoma (PDAC)."
+             ),
+             target_gene="KRAS_G12C",
+             disease_context="Pancreatic ductal adenocarcinoma (PDAC)",
+             indication="KRAS G12C-mutant PDAC",
+             credits_limit=50,
+             success_criteria=[
+                 "Re-evaluate druggability given recent inhibitor development",
+                 "Check clinical precedent for KRAS G12C inhibitors",
+                 "Submit go recommendation if recent advances support it",
+             ],
+             prior_observations=[
+                 "KRAS was historically considered undruggable",
+                 "Recent G12C-specific inhibitors have entered clinical use",
+             ],
+         ),
+         target=TargetProfile(
+             expression_level="high_specific",
+             tissue_specificity=0.70,
+             disease_overexpression=3.0,
+             druggability_score=0.65,
+             binding_pocket_quality="good",
+             has_known_ligands=True,
+             allosteric_site_available=True,
+             selectivity_ratio=6.0,
+             off_target_count=4,
+             off_target_genes=["HRAS", "NRAS", "RRAS", "MRAS"],
+             toxicity_profile="moderate",
+             toxicity_tissues=["GI", "skin"],
+             clinical_precedent="positive",
+             clinical_stage_reached="phase2",
+             competitor_programs=["sotorasib", "adagrasib"],
+             requires_patient_stratification=True,
+             responder_biomarker="KRAS_G12C_mutation",
+             in_vitro_ic50_nM=15.0,
+             in_vivo_efficacy="moderate",
+             crispr_essentiality=-1.1,
+             true_viability_score=0.62,
+             correct_decision="go",
+             misleading_signals=["historical_undruggability"],
+             key_evidence_dimensions=[
+                 "druggability",
+                 "literature",
+                 "clinical",
+             ],
+         ),
+         data_quality=DataQualityState(
+             noise_level=0.12,
+             false_positive_rate=0.06,
+             false_negative_rate=0.06,
+             database_coverage=0.85,
+         ),
+     ),
+
+     # ── 3. CD33 / AML — misleading no-go ────────────────────────────────
+     Scenario(
+         name="cd33_aml_misleading",
+         difficulty="hard",
+         tags=["oncology", "antibody", "misleading", "selectivity_critical"],
+         task=ValidationTaskSpec(
+             problem_statement=(
+                 "Validate CD33 as a drug target in acute myeloid leukemia "
+                 "(AML)."
+             ),
+             target_gene="CD33",
+             disease_context="Acute myeloid leukemia (AML)",
+             indication="CD33-positive AML",
+             credits_limit=50,
+             success_criteria=[
+                 "Quantify on-target expression in AML blasts vs normal myeloid",
+                 "Run off-target / paralog screen",
+                 "Run toxicity panel and clinical precedent",
+                 "Submit calibrated go/no_go decision",
+             ],
+             prior_observations=[
+                 "CD33 is highly expressed on AML blasts",
+                 "Gemtuzumab ozogamicin had a complicated regulatory history",
+             ],
+         ),
+         target=TargetProfile(
+             expression_level="high_nonspecific",
+             tissue_specificity=0.35,
+             disease_overexpression=2.0,
+             druggability_score=0.55,
+             binding_pocket_quality="good",
+             has_known_ligands=True,
+             allosteric_site_available=False,
+             selectivity_ratio=1.6,
+             off_target_count=8,
+             off_target_genes=[
+                 "CD33L",
+                 "SIGLEC5",
+                 "SIGLEC6",
+                 "SIGLEC7",
+                 "SIGLEC9",
+             ],
+             toxicity_profile="severe",
+             toxicity_tissues=[
+                 "bone_marrow",
+                 "myeloid_progenitors",
+                 "liver",
+             ],
+             clinical_precedent="mixed",
+             clinical_stage_reached="phase3",
+             competitor_programs=["gemtuzumab_ozogamicin"],
+             requires_patient_stratification=False,
+             responder_biomarker=None,
+             in_vitro_ic50_nM=120.0,
+             in_vivo_efficacy="weak",
+             crispr_essentiality=-0.2,
+             true_viability_score=0.22,
+             correct_decision="no_go",
+             misleading_signals=[
+                 "high_expression_looks_positive",
+                 "partial_clinical_precedent",
+             ],
+             key_evidence_dimensions=[
+                 "off_target",
+                 "toxicity",
+                 "clinical",
+             ],
+         ),
+         data_quality=DataQualityState(
+             noise_level=0.15,
+             false_positive_rate=0.08,
+             false_negative_rate=0.08,
+             database_coverage=0.85,
+         ),
+     ),
+
+     # ── 4. TP53 — clear no-go ────────────────────────────────────────────
+     Scenario(
+         name="tp53_solid_tumors_clear_fail",
+         difficulty="easy_medium",
+         tags=["oncology", "transcription_factor", "clear_no_go"],
+         task=ValidationTaskSpec(
+             problem_statement=(
+                 "Validate TP53 (small-molecule restoration approach) as a "
+                 "drug target across solid tumors."
+             ),
+             target_gene="TP53",
+             disease_context="Pan-cancer solid tumors with TP53 loss",
+             indication="TP53-mutant solid tumors",
+             credits_limit=50,
+             success_criteria=[
+                 "Assess druggability honestly",
+                 "Submit no_go if druggability is poor",
+             ],
+             prior_observations=[
+                 "TP53 is the most frequently mutated gene in cancer",
+                 "Direct small-molecule restoration has historically failed",
+             ],
+         ),
+         target=TargetProfile(
+             expression_level="moderate",
+             tissue_specificity=0.20,
+             disease_overexpression=0.6,
+             druggability_score=0.10,
+             binding_pocket_quality="undruggable",
+             has_known_ligands=False,
+             allosteric_site_available=False,
+             selectivity_ratio=1.0,
+             off_target_count=0,
+             off_target_genes=[],
+             toxicity_profile="moderate",
+             toxicity_tissues=["multiple"],
+             clinical_precedent="negative",
+             clinical_stage_reached="phase1",
+             competitor_programs=["APR-246_eprenetapopt"],
+             requires_patient_stratification=False,
+             responder_biomarker=None,
+             in_vitro_ic50_nM=10000.0,
+             in_vivo_efficacy="none",
+             crispr_essentiality=0.1,
+             true_viability_score=0.08,
+             correct_decision="no_go",
+             misleading_signals=[],
+             key_evidence_dimensions=["druggability"],
+         ),
+         data_quality=DataQualityState(
+             noise_level=0.10,
+             false_positive_rate=0.05,
+             false_negative_rate=0.05,
+             database_coverage=0.90,
+         ),
+     ),
+
+     # ── 5. SHP2 / JMML — very hard go ────────────────────────────────────
+     Scenario(
+         name="ptpn11_juvenile_mml_complex",
+         difficulty="very_hard",
+         tags=[
+             "oncology",
+             "phosphatase",
+             "allosteric",
+             "patient_stratification",
+             "complex_go",
+         ],
+         task=ValidationTaskSpec(
+             problem_statement=(
+                 "Validate SHP2 (PTPN11) as a drug target in juvenile "
+                 "myelomonocytic leukemia (JMML)."
+             ),
+             target_gene="PTPN11",
+             disease_context="Juvenile myelomonocytic leukemia (JMML)",
+             indication="PTPN11 GOF-mutant JMML",
+             credits_limit=50,
+             success_criteria=[
+                 "Detect allosteric druggability via dedicated pocket analysis",
+                 "Quantify pan-phosphatase off-target risk",
+                 "Identify GOF-mutation-stratified patient population",
+                 "Run in-vitro confirmation before final go/no_go",
+             ],
+             prior_observations=[
+                 "PTPN11 GOF mutations drive JMML",
+                 "Active site is shallow and considered undruggable; allosteric "
+                 "inhibitors have changed the landscape",
+             ],
+         ),
+         target=TargetProfile(
+             expression_level="moderate",
+             tissue_specificity=0.45,
+             disease_overexpression=1.6,
+             druggability_score=0.40,
+             binding_pocket_quality="poor",
+             has_known_ligands=True,
+             allosteric_site_available=True,
+             selectivity_ratio=2.5,
+             off_target_count=12,
+             off_target_genes=[
+                 "PTPN6",
+                 "PTPN11_paralog",
+                 "PTPN1",
+                 "PTPN2",
+                 "DUSP6",
+             ],
+             toxicity_profile="moderate",
+             toxicity_tissues=["bone_marrow", "GI"],
+             clinical_precedent="mixed",
+             clinical_stage_reached="phase2",
+             competitor_programs=["TNO155", "RMC-4630"],
+             requires_patient_stratification=True,
+             responder_biomarker="PTPN11_GOF_mutation",
+             in_vitro_ic50_nM=45.0,
+             in_vivo_efficacy="moderate",
+             crispr_essentiality=-0.9,
+             true_viability_score=0.58,
+             correct_decision="go",
+             misleading_signals=[
+                 "pan-phosphatase_toxicity_concern",
+                 "low_classic_druggability_score",
+             ],
+             key_evidence_dimensions=[
+                 "druggability",
+                 "off_target",
+                 "patient_stratification",
+                 "in_vitro",
+             ],
+         ),
+         data_quality=DataQualityState(
+             noise_level=0.18,
+             false_positive_rate=0.10,
+             false_negative_rate=0.10,
+             database_coverage=0.80,
+         ),
+     ),
+ ]
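Downstream code typically needs to pull a specific entry out of a library like `SCENARIO_LIBRARY` by name or by difficulty bucket. A minimal lookup sketch — the trimmed-down `Scenario` stub and the helper names (`find_scenario`, `by_difficulty`) are illustrative, not part of the repo:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Scenario:
    # Trimmed-down stand-in for the full Scenario dataclass above.
    name: str
    difficulty: str = "medium"
    tags: List[str] = field(default_factory=list)

LIBRARY: List[Scenario] = [
    Scenario("egfr_nsclc_viable", "easy", ["clear_go"]),
    Scenario("cd33_aml_misleading", "hard", ["misleading"]),
]

def find_scenario(name: str, library: List[Scenario]) -> Optional[Scenario]:
    # First match wins; None if the name is unknown.
    return next((s for s in library if s.name == name), None)

def by_difficulty(difficulty: str, library: List[Scenario]) -> List[Scenario]:
    return [s for s in library if s.difficulty == difficulty]

print(find_scenario("egfr_nsclc_viable", LIBRARY).difficulty)  # easy
```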
space/__init__.py ADDED
File without changes
space/training/Dockerfile ADDED
@@ -0,0 +1,36 @@
+ # DrugEnv trainer Space (Docker, single H200 GPU)
+ # Serves the FastAPI control panel (space.training.app:app) on port 8000,
+ # matched by README YAML app_port: 8000.
+ FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
+
+ ENV DEBIAN_FRONTEND=noninteractive \
+     PYTHONUNBUFFERED=1 \
+     PIP_NO_CACHE_DIR=1 \
+     HF_HOME=/home/user/.cache/huggingface \
+     TRANSFORMERS_CACHE=/home/user/.cache/huggingface/transformers \
+     PYTHONPATH=/home/user/app \
+     PORT=8000
+
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     python3.11 python3.11-venv python3.11-dev python3-pip \
+     git curl ca-certificates build-essential \
+     && rm -rf /var/lib/apt/lists/* \
+     && ln -sf /usr/bin/python3.11 /usr/local/bin/python \
+     && ln -sf /usr/bin/python3.11 /usr/local/bin/python3
+
+ RUN useradd -ms /bin/bash user
+ USER user
+ ENV PATH="/home/user/.local/bin:${PATH}"
+ WORKDIR /home/user/app
+
+ # Copy the entire repo first so relative -r references inside the
+ # trainer requirements file (-r ../../requirements-train.txt etc.)
+ # resolve correctly. Only after the tree is in place do we install.
+ COPY --chown=user:user . /home/user/app
+
+ RUN python -m pip install --upgrade pip && \
+     python -m pip install --user -r /home/user/app/space/training/requirements.txt
+
+ EXPOSE 8000
+
+ CMD ["python", "-m", "uvicorn", "space.training.app:app", "--host", "0.0.0.0", "--port", "8000"]
space/training/README.md ADDED
@@ -0,0 +1,116 @@
+ ---
+ title: DrugEnv Trainer
+ sdk: docker
+ pinned: false
+ app_port: 8000
+ tags:
+ - openenv
+ - reinforcement-learning
+ - drug-discovery
+ - grpo
+ ---
+
+ # 🧬 DrugEnv Trainer
+
+ A self-contained Hugging Face Space that runs **GRPO** (Group-Relative
+ Policy Optimization) inside the **DrugEnv** drug-target-validation
+ environment, with a live dashboard that streams reward curves,
+ mid-training checkpoint evals, and a before/after summary as the run
+ progresses.
+
+ The trainer is designed to be flipped on with a single `POST /train`
+ once the Space has been provisioned and (optionally) given an
+ `HF_TOKEN` for pushing the resulting model and evidence artefacts.
+
+ ## Expected hardware
+
+ | Knob | Value |
+ |---|---|
+ | Hardware target | **`h200x1`** (single H200 GPU) |
+ | Throughput | ~4× A100 on Qwen2.5-3B-class GRPO |
+ | Cost (rough) | ~$0.05–0.10 per GRPO step on Qwen2.5-3B |
+
+ H200 is set via the Space settings page on the Hub — this README and
+ the title bar of the dashboard advertise it; the runtime detects what
+ it actually got via `torch.cuda.device_count()`.
+
+ ## Configuration
+
+ Every knob is an environment variable, so the Space can be reconfigured
+ without a redeploy. Defaults match a sensible single-H200 run.
+
+ | Variable | Default | Description |
+ |---|---|---|
+ | `MODEL_NAME` | `Qwen/Qwen2.5-3B-Instruct` | Base model loaded by GRPO. |
+ | `TRAINING_BACKEND` | `vanilla` | `vanilla` (transformers) or `unsloth`. |
+ | `DIFFICULTY` | `easy` | Default difficulty bucket. |
+ | `TOTAL_EPISODES` | `120` | Prompt budget for GRPO. |
+ | `MAX_STEPS` | `20` | Max env steps per rollout (DrugEnv allows up to 30). |
+ | `NUM_GENERATIONS` | `4` | GRPO group size. |
+ | `CHECKPOINT_EVAL_STEPS` | `50` | Run a held-out eval every N updates. |
+ | `CHECKPOINT_EVAL_EPISODES` | `4` | Episodes per mid-training eval. |
+ | `EVAL_EPISODES` | `8` | Pre/post-training eval size. |
+ | `OUTPUT_DIR` | `runs/grpo-output` | Trained model directory. |
+ | `EVIDENCE_DIR` | `evidence` | Where CSV/PNG artefacts land. |
+ | `PUSH_REPO` | `anugrahteesdollar/drugenv-grpo-qwen3b` | Hub repo to upload to. |
+ | `SFT_WARMSTART` | `true` | Run an oracle-driven SFT phase before GRPO. |
+ | `SFT_NUM_EPISODES` | `200` | Oracle trajectories collected for SFT. |
+ | `SFT_MAX_STEPS` | `25` | Per-episode cap for SFT trajectories. |
+ | `SFT_EPOCHS` | `1` | SFT epochs over the collected dataset. |
+ | `SFT_LR` | `1e-5` | SFT learning rate. |
+ | `AUTOSTART` | `0` | Auto-launch a run on Space startup. |
+
+ ## Endpoints
+
+ The control panel is served at `/` and refreshes every 5 s. The
+ underlying JSON API surface:
+
+ ```
+ GET  /                 status page (HTML)
+ GET  /status           run state
+ GET  /metrics          pre / post evaluation metrics
+ GET  /sft_summary      SFT warm-start summary (404 if not yet run)
+ GET  /evidence         JSON index of evidence artefacts
+ GET  /evidence/<name>  serve an artefact (with on-demand PNG synth fallback)
+ GET  /logs?tail=N      last N lines of training.log
+ POST /train            start a run; body is a JSON object of CONFIG overrides
+ GET  /health           liveness probe
+ ```
+
+ ## Triggering a run
+
+ ```bash
+ # Start with defaults
+ curl -X POST https://<your-space>.hf.space/train
+
+ # Override a few knobs for this run only
+ curl -X POST https://<your-space>.hf.space/train \
+   -H 'Content-Type: application/json' \
+   -d '{"total_episodes": 240, "difficulty": "medium"}'
+ ```
+
+ ## Why this exists
+
+ DrugEnv's reward function decomposes the terminal grade across
+ `decision_accuracy`, `evidence_coverage`, `credit_efficiency`, and
+ `reasoning_coherence`, with potential-based per-step shaping over the
+ evidence-coverage potential and rule-driven redundancy / prerequisite
+ penalties. With those guard-rails in place, the dominant failure mode
+ of a small base model is *not* reward hacking — it's that the policy
+ never sees a positive-reward rollout in the first place, because
+ zero-shot Qwen2.5-3B cannot solve the drug-target-validation pipeline.
+
+ The trainer Space addresses this with an optional SFT warm-start
+ phase (`SFT_WARMSTART=true` by default): a short pass on oracle
+ trajectories gives the policy a non-zero prior over the correct
+ sequence, which GRPO then refines. The control panel surfaces both
+ phases so you can see the warm-start loss, the GRPO reward curve, and
+ the before / after summary in one view.
+
+ ## Evolution note
+
+ The deployment scaffolding here — control panel, on-demand PNG synth,
+ auto-refresh, evidence index — was originally validated against a
+ particle-physics-themed prototype before being carried forward to
+ DrugEnv. The reward shape, action space, and scenario library it
+ targets are entirely drug-domain native.
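Beyond `curl`, the JSON endpoints are easy to poll from the Python standard library alone. A minimal client sketch — `is_run_active` and `fetch_status` are hypothetical helper names, and the base URL is a placeholder for your Space:

```python
import json
from urllib.request import urlopen

def is_run_active(status: dict) -> bool:
    # /status reports one of: idle | running | finished | failed
    return status.get("status") == "running"

def fetch_status(base_url: str) -> dict:
    # e.g. base_url = "https://<your-space>.hf.space"
    with urlopen(f"{base_url}/status", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

print(is_run_active({"status": "running"}))   # True
print(is_run_active({"status": "finished"}))  # False
```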
space/training/__init__.py ADDED
File without changes
space/training/app.py ADDED
@@ -0,0 +1,943 @@
1
+ """FastAPI control panel for the DrugEnv trainer Space.
2
+
3
+ Endpoints
4
+ ---------
5
+ GET / → status page (HTML)
6
+ GET /status → JSON status of the current training run
7
+ GET /metrics → JSON snapshot of pre / post evaluation metrics
8
+ GET /logs → tail of the training log
9
+ GET /sft_summary → SFT warm-start summary if available
10
+ GET /evidence → JSON index of evidence/ directory
11
+ GET /evidence/<name> → individual artefact (PNG/CSV/JSON/MD), with
12
+ on-demand PNG synthesis as a fallback
13
+ POST /train → start (or restart) a training run
14
+ GET /health → liveness probe
15
+
16
+ Designed to run on a Hugging Face Space with ``sdk: docker``. Heavy
17
+ training work runs in a background thread so the HTTP server stays
18
+ responsive.
19
+ """
20
+
21
+ from __future__ import annotations
22
+
23
+ import ast
24
+ import io
25
+ import json
26
+ import logging
27
+ import os
28
+ import re
29
+ import subprocess
30
+ import sys
31
+ import threading
32
+ import time
33
+ from datetime import datetime, timezone
34
+ from pathlib import Path
35
+ from typing import Any, Dict, List, Optional
36
+
37
+ from fastapi import FastAPI, HTTPException, Request
38
+ from fastapi.responses import (
39
+ FileResponse,
40
+ HTMLResponse,
41
+ JSONResponse,
42
+ PlainTextResponse,
43
+ Response,
44
+ )
45
+
46
+
47
+ logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
48
+ logger = logging.getLogger(__name__)
49
+
50
+
51
+ # Expected hardware target for the trainer Space. The actual hardware
52
+ # is set via the Space settings page on the Hub — this constant is just
53
+ # what the dashboard advertises in its title bar so reviewers know what
54
+ # to provision. H200 ≈ 4× A100 throughput and is comfortably the
55
+ # cheapest viable target for Qwen2.5-3B-class GRPO.
56
+ EXPECTED_HARDWARE = "h200x1"
57
+
58
+
59
+ def _resolve_repo_root() -> Path:
60
+ env_root = os.environ.get("DRUGENV_ROOT")
61
+ candidates: List[Path] = []
62
+ if env_root:
63
+ candidates.append(Path(env_root))
64
+ candidates.extend([
65
+ Path("/home/user/app"),
66
+ Path(__file__).resolve().parent.parent.parent,
67
+ ])
68
+ for p in candidates:
69
+ try:
70
+ if p.exists():
71
+ return p.resolve()
72
+ except OSError:
73
+ continue
74
+ return candidates[-1].resolve()
75
+
76
+
77
+ REPO_ROOT = _resolve_repo_root()
78
+ LOG_DIR = REPO_ROOT / "training" / "runs"
79
+ try:
80
+ LOG_DIR.mkdir(parents=True, exist_ok=True)
81
+ except OSError as exc: # pragma: no cover - read-only filesystem fallback
82
+ logger.warning("could not create %s (%s); using /tmp", LOG_DIR, exc)
83
+ LOG_DIR = Path("/tmp/drugenv-runs")
84
+ LOG_DIR.mkdir(parents=True, exist_ok=True)
85
+ LOG_FILE = LOG_DIR / "training.log"
86
+ EVIDENCE_DIR = REPO_ROOT / "evidence"
87
+ try:
88
+ EVIDENCE_DIR.mkdir(parents=True, exist_ok=True)
89
+ except OSError: # pragma: no cover
90
+ EVIDENCE_DIR = Path("/tmp/drugenv-evidence")
91
+ EVIDENCE_DIR.mkdir(parents=True, exist_ok=True)
92
+ METRICS_FILE = EVIDENCE_DIR / "before_after_metrics.json"
93
+
94
+
95
+ def _env(name: str, default: str) -> str:
96
+ return os.environ.get(name, default)
97
+
98
+
99
+ def _detect_gpus() -> int:
100
+ try:
101
+ import torch # type: ignore
102
+ if torch.cuda.is_available():
103
+ return torch.cuda.device_count()
104
+ except Exception:
105
+ pass
106
+ try:
107
+ out = subprocess.run(
108
+ ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
109
+ capture_output=True, text=True, timeout=5,
110
+ )
111
+ return len([l for l in out.stdout.splitlines() if l.strip()])
112
+ except Exception:
113
+ return 0
114
+
115
+
116
+ _NUM_GPUS = _detect_gpus()
117
+
118
+
119
+ def _bool_env(name: str, default: str) -> bool:
120
+ return _env(name, default).strip().lower() in ("1", "true", "yes", "on")
121
+
122
+
123
+ CONFIG = {
124
+ "training_backend": _env("TRAINING_BACKEND", "vanilla"),
125
+ "model_name": _env("MODEL_NAME", "Qwen/Qwen2.5-3B-Instruct"),
126
+ "difficulty": _env("DIFFICULTY", "easy"),
127
+ "total_episodes": int(_env("TOTAL_EPISODES", "120")),
128
+ "max_steps": int(_env("MAX_STEPS", "20")),
129
+ "num_generations": int(_env("NUM_GENERATIONS", "4")),
130
+ "checkpoint_eval_steps": int(_env("CHECKPOINT_EVAL_STEPS", "50")),
131
+ "checkpoint_eval_episodes": int(_env("CHECKPOINT_EVAL_EPISODES", "4")),
132
+ "eval_episodes": int(_env("EVAL_EPISODES", "8")),
133
+ "output_dir": _env("OUTPUT_DIR", "runs/grpo-output"),
134
+ "evidence_dir": _env("EVIDENCE_DIR", "evidence"),
135
+ "num_gpus": int(_env("NUM_GPUS", "1")),
136
+ "hf_username": _env("HF_USERNAME", "anugrahteesdollar"),
137
+ "push_repo": _env(
138
+ "PUSH_REPO",
139
+ f"{_env('HF_USERNAME', 'anugrahteesdollar')}/drugenv-grpo-qwen3b",
140
+ ),
141
+ "autostart": _env("AUTOSTART", "0") == "1",
142
+ # ── SFT warm-start phase (defeats the no-submit avoidance hack
143
+ # by giving GRPO a non-zero prior over correct trajectories) ─────
144
+ "sft_warmstart": _bool_env("SFT_WARMSTART", "true"),
145
+ "sft_num_episodes": int(_env("SFT_NUM_EPISODES", "200")),
146
+ "sft_max_steps": int(_env("SFT_MAX_STEPS", "25")),
147
+ "sft_epochs": int(_env("SFT_EPOCHS", "1")),
148
+ "sft_lr": float(_env("SFT_LR", "1e-5")),
149
+ "sft_difficulty": _env("SFT_DIFFICULTY", "mixed"),
150
+ "sft_out_dir": _env("SFT_OUT_DIR", "runs/sft-warmstart"),
151
+ }
+
+
+ # ── Run state ────────────────────────────────────────────────────────────
+
+
+ class RunState:
+     def __init__(self) -> None:
+         self.lock = threading.Lock()
+         self.thread: Optional[threading.Thread] = None
+         self.process: Optional[subprocess.Popen] = None
+         self.status: str = "idle"  # idle | running | finished | failed
+         self.started_at: Optional[str] = None
+         self.finished_at: Optional[str] = None
+         self.last_error: Optional[str] = None
+         self.last_config: Dict[str, Any] = {}
+
+     def to_dict(self) -> Dict[str, Any]:
+         with self.lock:
+             return {
+                 "status": self.status,
+                 "started_at": self.started_at,
+                 "finished_at": self.finished_at,
+                 "last_error": self.last_error,
+                 "last_config": self.last_config,
+                 "expected_hardware": EXPECTED_HARDWARE,
+             }
+
+
+ STATE = RunState()
+
+
+ # ── Training pipeline ────────────────────────────────────────────────────
+
+
+ def _stream_subprocess(cmd: list[str], log_handle) -> int:
+     log_handle.write(f"\n$ {' '.join(cmd)}\n")
+     log_handle.flush()
+     proc = subprocess.Popen(
+         cmd,
+         cwd=str(REPO_ROOT),
+         stdout=subprocess.PIPE,
+         stderr=subprocess.STDOUT,
+         bufsize=1,
+         universal_newlines=True,
+         env={**os.environ, "PYTHONPATH": str(REPO_ROOT)},
+     )
+     STATE.process = proc
+     assert proc.stdout is not None
+     for line in proc.stdout:
+         log_handle.write(line)
+         log_handle.flush()
+     rc = proc.wait()
+     log_handle.write(f"[exit code {rc}]\n")
+     log_handle.flush()
+     STATE.process = None
+     return rc
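The streaming pattern above — merge stderr into stdout, iterate the pipe line by line, forward each line as it arrives rather than after exit — can be exercised in isolation. A minimal sketch (the `stream` helper is hypothetical; the real `_stream_subprocess` additionally sets `cwd`, `PYTHONPATH`, and `STATE.process`):

```python
import io
import subprocess
import sys

def stream(cmd, log):
    # Text-mode, line-buffered pipe; stderr folded into stdout.
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        bufsize=1, universal_newlines=True,
    )
    for line in proc.stdout:  # yields each line as the child flushes it
        log.write(line)
    return proc.wait()

buf = io.StringIO()
rc = stream([sys.executable, "-c", "print('step 1'); print('step 2')"], buf)
assert rc == 0
assert "step 1" in buf.getvalue() and "step 2" in buf.getvalue()
```

Iterating `proc.stdout` directly is what lets the dashboard's `/logs` endpoint show progress mid-run instead of one burst at process exit.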
+
+
+ def _build_sft_warmstart_cmd(config: Dict[str, Any]) -> list[str]:
+     """Compose the SFT warm-start subprocess command."""
+     python_bin = "/usr/local/bin/python" if Path("/usr/local/bin/python").exists() else sys.executable
+     return [
+         python_bin, "-m", "training.sft_warmstart",
+         "--out_dir", config["sft_out_dir"],
+         "--num_episodes", str(config["sft_num_episodes"]),
+         "--max_steps", str(config["sft_max_steps"]),
+         "--epochs", str(config["sft_epochs"]),
+         "--lr", str(config["sft_lr"]),
+         "--base_model", config["model_name"],
+         "--difficulty", config["sft_difficulty"],
+         "--evidence_dir", config["evidence_dir"],
+     ]
+
+
+ def _build_training_cmd(config: Dict[str, Any]) -> list[str]:
+     """Compose the selected training launcher.
+
+     When ``sft_warmstart`` is on, ``model_name`` is expected to already
+     have been overwritten with the SFT output directory by the caller
+     (``_training_pipeline``), so this function never has to know about
+     the SFT phase explicitly — it just runs GRPO from whatever path is
+     sitting in ``model_name``.
+     """
+     backend = str(config.get("training_backend", "vanilla")).lower()
+     python_bin = "/usr/local/bin/python" if Path("/usr/local/bin/python").exists() else sys.executable
+
+     common = [
+         "--model-id", config["model_name"],
+         "--evidence-dir", config["evidence_dir"],
+         "--output-dir", config["output_dir"],
+         "--checkpoint-eval-steps", str(config["checkpoint_eval_steps"]),
+         "--checkpoint-eval-episodes", str(config["checkpoint_eval_episodes"]),
+     ]
+     if backend == "vanilla":
+         return [python_bin, "-m", "training.training_script", *common]
+
+     if backend != "unsloth":
+         raise ValueError(f"unknown TRAINING_BACKEND={backend!r}")
+
+     cmd = ["-m", "training.training_unsloth", *common]
+     n = max(int(config.get("num_gpus", 1)), 1)
+     if n > 1:
+         return [
+             "accelerate", "launch", "--num_processes", str(n),
+             "--mixed_precision", "bf16",
+         ] + cmd
+     return [python_bin] + cmd
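The launcher-selection logic condenses to a small decision table: vanilla → plain `python -m`, unsloth on one GPU → plain `python -m`, unsloth on several GPUs → wrap in `accelerate launch`. A standalone sketch under those assumptions (`build_cmd` is a hypothetical name; the real function also resolves `python_bin` and threads `common` flags from `CONFIG`):

```python
import sys

def build_cmd(backend: str, num_gpus: int, common: list[str]) -> list[str]:
    if backend == "vanilla":
        return [sys.executable, "-m", "training.training_script", *common]
    if backend != "unsloth":
        raise ValueError(f"unknown backend {backend!r}")
    cmd = ["-m", "training.training_unsloth", *common]
    if num_gpus > 1:
        # Multi-GPU: delegate process spawning to accelerate.
        return ["accelerate", "launch", "--num_processes", str(num_gpus),
                "--mixed_precision", "bf16"] + cmd
    return [sys.executable] + cmd

assert build_cmd("unsloth", 2, ["--model-id", "m"])[:2] == ["accelerate", "launch"]
assert build_cmd("vanilla", 1, [])[1:3] == ["-m", "training.training_script"]
```

Returning an argv list (rather than a shell string) avoids quoting bugs when model IDs or paths contain spaces.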
+
+
+ def _push_model_folder_to_hub(*, output_dir: Path, repo_id: str, base_model: str, log) -> None:
+     """Upload a trained model directory to the Hub (best-effort)."""
+     token = os.environ.get("HF_TOKEN")
+     if not token:
+         log.write("\n[skip] HF_TOKEN not set — model not pushed\n")
+         log.flush()
+         return
+     try:
+         from huggingface_hub import HfApi
+         api = HfApi(token=token)
+         api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
+         api.upload_folder(
+             folder_path=str(output_dir),
+             repo_id=repo_id,
+             repo_type="model",
+             commit_message=f"Upload DrugEnv GRPO model based on {base_model}",
+         )
+         log.write(f"\n[ok] uploaded model → https://huggingface.co/{repo_id}\n")
+         log.flush()
+     except Exception as exc:
+         log.write(f"\n[warn] model push failed: {exc}\n")
+         log.flush()
+
+
+ def _push_evidence_to_hub(*, evidence_dir: Path, repo_id: str, log) -> None:
+     """Upload the entire evidence/ directory to the model repo (best-effort)."""
+     token = os.environ.get("HF_TOKEN")
+     if not token:
+         log.write("\n[skip] HF_TOKEN not set — evidence not pushed\n")
+         log.flush()
+         return
+     try:
+         from huggingface_hub import HfApi
+         api = HfApi(token=token)
+         api.upload_folder(
+             folder_path=str(evidence_dir),
+             repo_id=repo_id,
+             repo_type="model",
+             path_in_repo="evidence",
+             commit_message="Upload DrugEnv training evidence (curves, evals, plots)",
+         )
+         log.write(f"\n[ok] uploaded evidence/ → https://huggingface.co/{repo_id}/tree/main/evidence\n")
+         log.flush()
+     except Exception as exc:
+         log.write(f"\n[warn] evidence push failed: {exc}\n")
+         log.flush()
+
+
+ def _training_pipeline(config: Dict[str, Any]) -> None:
+     started = datetime.now(timezone.utc).isoformat()
+     with STATE.lock:
+         STATE.status = "running"
+         STATE.started_at = started
+         STATE.finished_at = None
+         STATE.last_error = None
+         STATE.last_config = dict(config)
+
+     evidence_dir = Path(config["evidence_dir"]).resolve()
+     evidence_dir.mkdir(parents=True, exist_ok=True)
+
+     LOG_FILE.parent.mkdir(parents=True, exist_ok=True)
+     with open(LOG_FILE, "a") as log:
+         log.write(f"\n=== Training started {started} ===\n")
+         log.write(json.dumps(config, indent=2) + "\n")
+         log.flush()
+         try:
+             output_dir = config["output_dir"]
+             model_name = config["model_name"]
+             push_repo = config["push_repo"]
+
+             if config.get("sft_warmstart"):
+                 # Phase 1 — SFT warm-start. Produces a *full* causal-LM
+                 # checkpoint at config['sft_out_dir'] (LoRA adapters are
+                 # merged in by training/sft_warmstart.py) so we can hand
+                 # it to GRPO as a drop-in --model-id.
+                 sft_out = config["sft_out_dir"]
+                 log.write(
+                     f"\n--- SFT warm-start ({config['sft_num_episodes']} oracle "
+                     f"episodes, epochs={config['sft_epochs']}, → {sft_out}) ---\n"
+                 )
+                 log.flush()
+                 sft_rc = _stream_subprocess(_build_sft_warmstart_cmd(config), log)
+                 if sft_rc != 0:
+                     raise RuntimeError(f"SFT warm-start failed (rc={sft_rc})")
+                 log.write(
+                     f"\n[ok] SFT done; switching GRPO base model "
+                     f"{config['model_name']} → {sft_out}\n"
+                 )
+                 log.flush()
+                 config["model_name"] = sft_out
+
+             backend = str(config.get("training_backend", "vanilla")).lower()
+             log.write(
+                 f"\n--- GRPO training ({backend}, "
+                 f"{config['num_gpus']} GPU process(es), expected hardware "
+                 f"{EXPECTED_HARDWARE}) ---\n"
+             )
+             log.flush()
+             rc = _stream_subprocess(_build_training_cmd(config), log)
+             if rc != 0:
+                 raise RuntimeError(f"training failed (rc={rc})")
+
+             log.write(
+                 "\n--- evidence: training/training_script.save_training_plots already "
+                 "ran on train-end via the LiveTrainingCallback; CSVs + PNGs live in "
+                 f"{config['evidence_dir']} ---\n"
+             )
+             log.flush()
+
+             if os.environ.get("HF_TOKEN"):
+                 log.write("\n--- push trained model to Hub ---\n")
+                 log.flush()
+                 _push_model_folder_to_hub(
+                     output_dir=Path(output_dir),
+                     repo_id=push_repo,
+                     base_model=model_name,
+                     log=log,
+                 )
+                 _push_evidence_to_hub(
+                     evidence_dir=evidence_dir,
+                     repo_id=push_repo,
+                     log=log,
+                 )
+             else:
+                 log.write("\n[skip] HF_TOKEN not set — not pushing to Hub\n")
+                 log.flush()
+             with STATE.lock:
+                 STATE.status = "finished"
+         except Exception as exc:
+             logger.exception("training pipeline failed")
+             with STATE.lock:
+                 STATE.status = "failed"
+                 STATE.last_error = str(exc)
+         finally:
+             finished = datetime.now(timezone.utc).isoformat()
+             log.write(f"\n=== Training ended {finished} ===\n")
+             log.flush()
+             with STATE.lock:
+                 STATE.finished_at = finished
+
+
+ def _start_training(config: Dict[str, Any]) -> None:
+     with STATE.lock:
+         if STATE.status == "running":
+             raise RuntimeError("a training run is already in progress")
+         STATE.thread = threading.Thread(
+             target=_training_pipeline,
+             args=(config,),
+             name="drugenv-trainer",
+             daemon=True,
+         )
+         STATE.thread.start()
+
+
+ # ── On-demand evidence-PNG synthesis ─────────────────────────────────────
+ #
+ # When the GRPO loop hasn't had a chance to write training_log.csv /
+ # reward_components.csv yet, we still want the dashboard to show
+ # *something* meaningful. The on-demand handlers below parse the
+ # captured TRL stdout log (training/runs/training.log) for log dicts
+ # and synthesise the corresponding PNG on the fly, returning it to the
+ # browser with ``Cache-Control: no-store`` so cards refresh smoothly.
+
+ _TQDM_PROGRESS_RE = re.compile(r"\b(\d+)\s*/\s*(\d+)\s*\[")
+
+
+ def _parse_training_log_dicts(text: str) -> List[Dict[str, Any]]:
+     rows: List[Dict[str, Any]] = []
+     last_step: Optional[int] = None
+
+     def _first(d: Dict[str, Any], *keys: str) -> Optional[Any]:
+         # `or`-chaining would discard legitimate 0.0 values, so take the
+         # first key that is present and not None instead.
+         for k in keys:
+             if d.get(k) is not None:
+                 return d[k]
+         return None
+
+     for raw in text.splitlines():
+         m = _TQDM_PROGRESS_RE.search(raw)
+         if m:
+             try:
+                 last_step = int(m.group(1))
+             except ValueError:
+                 pass
+             continue
+         s = raw.strip()
+         if not (s.startswith("{") and s.endswith("}")):
+             continue
+         if "'loss'" not in s and "'reward'" not in s and "'kl'" not in s:
+             continue
+         try:
+             d = ast.literal_eval(s)
+         except (ValueError, SyntaxError):
+             continue
+         if not isinstance(d, dict):
+             continue
+         reward = _first(d, "reward", "rewards/mean", "rewards/reward_fn/mean")
+         reward_std = _first(d, "reward_std", "rewards/std", "rewards/reward_fn/std")
+         rows.append({
+             "step": last_step if last_step is not None else len(rows),
+             "loss": d.get("loss"),
+             "reward": reward,
+             "reward_std": reward_std,
+             "kl": d.get("kl"),
+             "grad_norm": d.get("grad_norm"),
+             "learning_rate": d.get("learning_rate"),
+             "epoch": d.get("epoch"),
+             "frac_reward_zero_std": d.get("frac_reward_zero_std"),
+             "completions_mean_length": d.get("completions/mean_length"),
+             "completions_clipped_ratio": d.get("completions/clipped_ratio"),
+         })
+     return rows
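The parsing idea can be shown end-to-end on a two-line log fragment. This sketch assumes, as the parser above does, that TRL prints its metrics as single-line Python dict literals (hence `ast.literal_eval`, not `json.loads` — the quotes are single, not double) interleaved with tqdm progress lines that carry the step counter; `parse` is a hypothetical, stripped-down name:

```python
import ast
import re

_PROGRESS = re.compile(r"\b(\d+)\s*/\s*(\d+)\s*\[")

def parse(text):
    rows, step = [], None
    for raw in text.splitlines():
        m = _PROGRESS.search(raw)
        if m:
            step = int(m.group(1))  # remember the most recent tqdm step
            continue
        s = raw.strip()
        if s.startswith("{") and s.endswith("}") and "'loss'" in s:
            d = ast.literal_eval(s)  # dict literal, not JSON
            rows.append({"step": step, **d})
    return rows

log = " 12/120 [00:30<04:30]\n{'loss': 0.42, 'reward': 1.3}\n"
assert parse(log) == [{"step": 12, "loss": 0.42, "reward": 1.3}]
```

Because the step comes from the nearest preceding progress line, metrics logged before the first tqdm tick have `step=None`; the real parser falls back to the row index in that case.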
+
+
+ def _try_matplotlib():
+     try:
+         import matplotlib  # type: ignore
+         matplotlib.use("Agg")
+         import matplotlib.pyplot as plt  # type: ignore
+         return plt
+     except Exception as exc:  # pragma: no cover - plotting is best-effort
+         logger.warning("matplotlib unavailable: %s", exc)
+         return None
+
+
+ def _png_bytes(fig) -> bytes:
+     buf = io.BytesIO()
+     fig.savefig(buf, format="png", dpi=140)
+     return buf.getvalue()
+
+
+ def _read_log_text() -> Optional[str]:
+     if not LOG_FILE.exists():
+         return None
+     try:
+         return LOG_FILE.read_text(errors="replace")
+     except OSError:
+         return None
+
+
+ def _synth_training_curve_png() -> Optional[bytes]:
+     text = _read_log_text()
+     if not text:
+         return None
+     rows = _parse_training_log_dicts(text)
+     if not rows:
+         return None
+     plt = _try_matplotlib()
+     if plt is None:
+         return None
+
+     steps = [r["step"] for r in rows]
+     rewards = [(s, r["reward"]) for s, r in zip(steps, rows) if r["reward"] is not None]
+     losses = [(s, r["loss"]) for s, r in zip(steps, rows) if r["loss"] is not None]
+
+     fig, axes = plt.subplots(2, 1, figsize=(8, 6), sharex=True)
+     if rewards:
+         axes[0].plot([x for x, _ in rewards], [y for _, y in rewards],
+                      lw=1.6, color="#1d4ed8")
+         axes[0].set_ylabel("mean reward")
+         axes[0].set_title(
+             "DrugEnv GRPO training — reward over steps "
+             f"(synthesised from {len(rewards)} log events)"
+         )
+         axes[0].grid(alpha=0.25)
+     if losses:
+         axes[1].plot([x for x, _ in losses], [y for _, y in losses],
+                      lw=1.6, color="#c026d3")
+         axes[1].set_ylabel("GRPO loss")
+     axes[1].set_xlabel("training step")
+     axes[1].grid(alpha=0.25)
+     fig.tight_layout()
+     try:
+         return _png_bytes(fig)
+     finally:
+         plt.close(fig)
+
+
+ def _synth_reward_components_png() -> Optional[bytes]:
+     text = _read_log_text()
+     if not text:
+         return None
+     rows = _parse_training_log_dicts(text)
+     if not rows:
+         return None
+     plt = _try_matplotlib()
+     if plt is None:
+         return None
+
+     steps = [r["step"] for r in rows]
+     rmean = [r.get("reward") for r in rows]
+     rstd = [r.get("reward_std") for r in rows]
+     kls = [r.get("kl") for r in rows]
+     fzero = [r.get("frac_reward_zero_std") for r in rows]
+     clen = [r.get("completions_mean_length") for r in rows]
+
+     fig, axes = plt.subplots(2, 1, figsize=(8, 6.5), sharex=True)
+     band = [(s, m, sd) for s, m, sd in zip(steps, rmean, rstd) if m is not None]
+     if band:
+         sx = [b[0] for b in band]
+         rm = [b[1] for b in band]
+         rs = [b[2] if b[2] is not None else 0.0 for b in band]
+         axes[0].plot(sx, rm, lw=2.0, color="#0f172a", label="reward (group mean)")
+         axes[0].fill_between(
+             sx,
+             [m - s for m, s in zip(rm, rs)],
+             [m + s for m, s in zip(rm, rs)],
+             alpha=0.18, color="#1d4ed8", label="±1 std (group dispersion)",
+         )
+         axes[0].set_ylabel("reward at logging step")
+         axes[0].set_title(
+             "DrugEnv reward — group mean ± dispersion "
+             "(stdout-derived; install EvidenceCallback for terminal vs shaping split)"
+         )
+         axes[0].grid(alpha=0.25)
+         axes[0].legend(loc="lower right", fontsize=9)
+
+     kl_pts = [(s, k) for s, k in zip(steps, kls) if k is not None]
+     if kl_pts:
+         axes[1].plot([p[0] for p in kl_pts], [p[1] for p in kl_pts],
+                      lw=1.5, color="#9333ea", label="KL divergence")
+         axes[1].set_ylabel("KL", color="#9333ea")
+     fz_pts = [(s, f) for s, f in zip(steps, fzero) if f is not None]
+     cl_pts = [(s, c) for s, c in zip(steps, clen) if c is not None]
+     if fz_pts or cl_pts:
+         ax2 = axes[1].twinx()
+         if fz_pts:
+             ax2.plot([p[0] for p in fz_pts], [p[1] for p in fz_pts],
+                      "o-", lw=1.0, ms=3, color="#ea580c",
+                      label="frac rollouts with zero-std (saturation)")
+             ax2.set_ylim(-0.02, 1.05)
+         if cl_pts:
+             cmax = max(p[1] for p in cl_pts) or 1.0
+             ax2.plot([p[0] for p in cl_pts], [p[1] / cmax for p in cl_pts],
+                      "x:", lw=1.0, ms=4, color="#16a34a",
+                      label=f"completion mean length / {cmax:.0f}")
+         ax2.set_ylabel("auxiliary (right axis, normalised)", color="#475569")
+         ax2.legend(loc="upper right", fontsize=8)
+     axes[1].set_xlabel("training step")
+     axes[1].grid(alpha=0.25)
+     fig.tight_layout()
+     try:
+         return _png_bytes(fig)
+     finally:
+         plt.close(fig)
+
+
+ _SYNTH_HANDLERS = {
+     "training_curve.png": _synth_training_curve_png,
+     "reward_components.png": _synth_reward_components_png,
+ }
+
+
+ # ── FastAPI app ──────────────────────────────────────────────────────────
+
+
+ app = FastAPI(title="DrugEnv Trainer", version="0.1.0")
+
+
+ _HTML = """\
+ <!doctype html>
+ <html lang=en>
+ <head>
+ <meta charset=utf-8>
+ <title>DrugEnv Trainer</title>
+ <meta http-equiv="refresh" content="60">
+ <style>
+   body { font-family: ui-sans-serif, system-ui, sans-serif; margin: 2rem auto;
+          max-width: 1000px; color:#111; padding: 0 1rem; line-height:1.5 }
+   h1 { margin-bottom: 0 }
+   h2 { margin-top: 2rem; border-bottom:1px solid #eee; padding-bottom:.25rem }
+   .muted { color:#666 }
+   pre { background:#0e1116; color:#e6edf3; padding:1rem; border-radius:6px;
+         overflow-x:auto; max-height:40vh; font-size:.85em }
+   button { font-size:1rem; padding:.6rem 1rem; border-radius:6px; border:1px solid #888;
+            background:#fff; cursor:pointer; margin-right:.4rem }
+   .pill { display:inline-block; padding:.1rem .55rem; border-radius:999px;
+           background:#eef; color:#225; font-size:.85em }
+   .ok { background:#dfd; color:#272 }
+   .fail { background:#fdd; color:#822 }
+   .run { background:#fdf6d8; color:#774 }
+   table { border-collapse:collapse; margin:.5rem 0 }
+   td, th { padding:.25rem .8rem .25rem 0; vertical-align: top; text-align:left }
+   th { color:#444; font-weight:600 }
+   .grid { display:grid; grid-template-columns:1fr 1fr; gap:1rem }
+   .card { border:1px solid #e5e7eb; border-radius:8px; padding:.75rem; background:#fafafa }
+   .card img { max-width:100%; border-radius:4px }
+   .delta-pos { color:#15803d; font-weight:600 }
+   .delta-neg { color:#b91c1c; font-weight:600 }
+   code { background:#f4f4f4; padding:.05rem .35rem; border-radius:4px }
+   a { color:#1d4ed8 }
+ </style>
+ </head>
+ <body>
+ <h1>🧬 DrugEnv Trainer</h1>
+ <p class=muted>GRPO + LoRA on the DrugEnv drug-target-validation environment.
+ Expected hardware: <code>__HW__</code>. H200 ≈ 4× A100 throughput,
+ ~$0.05–0.10 per step on small models like Qwen2.5-3B.</p>
+
+ <h2>Run status</h2>
+ <p>Status: <span id=status class=pill>?</span></p>
+ <table id=meta></table>
+ <p>
+   <button onclick="startRun()">▶ Start training</button>
+   <button onclick="refresh()">↻ Refresh</button>
+   <a href="/evidence" target=_blank><button>📁 Evidence index</button></a>
+   <a href="/docs" target=_blank><button>🛠 API</button></a>
+ </p>
+
+ <h2>Training-progress evidence</h2>
+ <p class=muted>Auto-updated as training runs. All artifacts are also saved to <code>evidence/</code> and pushed to the model repo on the Hub.</p>
+ <div class=grid>
+   <div class=card><b>Per-step training curve</b><br>
+     <img id=curve src="/evidence/training_curve.png" onerror="this.style.display='none'">
+     <div id=curve_missing class=muted style="display:none">(not yet — waiting for first GRPO step)</div>
+   </div>
+   <div class=card><b>Reward components (terminal vs shaping)</b><br>
+     <img id=components src="/evidence/reward_components.png" onerror="this.style.display='none'">
+     <div id=components_missing class=muted style="display:none">(populated after a few rollouts — watches verifier hacks)</div>
+   </div>
+   <div class=card><b>Mid-training checkpoint progression</b><br>
+     <img id=ckpt src="/evidence/checkpoint_progression.png" onerror="this.style.display='none'">
+     <div id=ckpt_missing class=muted style="display:none">(not yet — waiting for first checkpoint eval)</div>
+   </div>
+   <div class=card><b>Before vs after summary</b><br>
+     <img id=summary src="/evidence/before_after_summary.png" onerror="this.style.display='none'">
+     <div id=summary_missing class=muted style="display:none">(generated after post-train eval)</div>
+   </div>
+   <div class=card><b>Reward distribution: pre vs post</b><br>
+     <img id=dist src="/evidence/reward_distribution.png" onerror="this.style.display='none'">
+     <div id=dist_missing class=muted style="display:none">(generated after post-train eval)</div>
+   </div>
+   <div class=card><b>Decision accuracy progression</b><br>
+     <img id=decision src="/evidence/checkpoint_progression.png" onerror="this.style.display='none'">
+     <div id=decision_missing class=muted style="display:none">(progression chart includes decision accuracy line)</div>
+   </div>
+   <div class=card><b>Evidence coverage progression</b><br>
+     <img id=coverage src="/evidence/checkpoint_progression.png" onerror="this.style.display='none'">
+     <div id=coverage_missing class=muted style="display:none">(progression chart includes evidence coverage line)</div>
+   </div>
+   <div class=card><b>Warm-start (SFT)</b><br>
+     <div id=sft_card class=muted>(SFT_WARMSTART disabled — set the env var to enable)</div>
+   </div>
+ </div>
+
+ <h2>Before / after metrics</h2>
+ <table id=metrics_table>
+   <tr><th>metric</th><th>pre</th><th>post</th><th>Δ</th></tr>
+ </table>
+
+ <h2>Live logs (tail)</h2>
+ <pre id=logs>loading…</pre>
+
+ <script>
+ function fmt(v) {
+   if (v == null) return '–';
+   if (typeof v === 'number') return v.toFixed(3);
+   return v;
+ }
+ function fmtDelta(d) {
+   if (d == null || isNaN(d)) return '–';
+   const sign = d >= 0 ? '+' : '';
+   const cls = d >= 0 ? 'delta-pos' : 'delta-neg';
+   return `<span class="${cls}">${sign}${d.toFixed(3)}</span>`;
+ }
+
+ async function refresh() {
+   const s = await fetch('/status').then(r => r.json());
+   const pill = document.getElementById('status');
+   pill.textContent = s.status;
+   pill.className = 'pill ' + ({idle:'',running:'run',finished:'ok',failed:'fail'}[s.status] || '');
+
+   const meta = document.getElementById('meta');
+   meta.innerHTML = '';
+   const obj = {
+     started_at: s.started_at, finished_at: s.finished_at, error: s.last_error,
+     expected_hardware: s.expected_hardware,
+     ...(s.last_config || {}),
+   };
+   for (const [k, v] of Object.entries(obj)) {
+     if (v == null || v === '') continue;
+     const tr = document.createElement('tr');
+     tr.innerHTML = `<td><b>${k}</b></td><td><code>${v}</code></td>`;
+     meta.appendChild(tr);
+   }
+
+   const m = await fetch('/metrics').then(r => r.json()).catch(() => ({pre:null, post:null}));
+   const tbody = document.getElementById('metrics_table');
+   tbody.innerHTML = '<tr><th>metric</th><th>pre</th><th>post</th><th>Δ</th></tr>';
+   const fields = [
+     'mean_reward', 'success_rate', 'decision_accuracy_rate',
+     'evidence_coverage_rate', 'median_reward',
+   ];
+   for (const f of fields) {
+     const pre = m.pre && m.pre[f];
+     const post = m.post && m.post[f];
+     const delta = m.delta && m.delta[f];
+     const tr = document.createElement('tr');
+     tr.innerHTML = `<td><code>${f}</code></td><td>${fmt(pre)}</td><td>${fmt(post)}</td><td>${fmtDelta(delta)}</td>`;
+     tbody.appendChild(tr);
+   }
+
+   const bust = '?t=' + Date.now();
+   for (const [imgId, missingId] of [
+     ['curve', 'curve_missing'],
+     ['components', 'components_missing'],
+     ['ckpt', 'ckpt_missing'],
+     ['summary', 'summary_missing'],
+     ['dist', 'dist_missing'],
+     ['decision', 'decision_missing'],
+     ['coverage', 'coverage_missing'],
+   ]) {
+     const img = document.getElementById(imgId);
+     const miss = document.getElementById(missingId);
+     const baseSrc = img.getAttribute('src').split('?')[0];
+     const probe = new Image();
+     probe.onload = () => { img.src = baseSrc + bust; img.style.display=''; miss.style.display='none'; };
+     probe.onerror = () => { img.style.display='none'; miss.style.display=''; };
+     probe.src = baseSrc + bust;
+   }
+
+   const sft_resp = await fetch('/sft_summary');
+   const sft_card = document.getElementById('sft_card');
+   if (sft_resp.ok) {
+     try {
+       const sft = await sft_resp.json();
+       sft_card.classList.remove('muted');
+       sft_card.innerHTML =
+         `<table>` +
+         `<tr><td><b>final loss</b></td><td><code>${fmt(sft.final_loss)}</code></td></tr>` +
+         `<tr><td><b>oracle success</b></td><td><code>${fmt(sft.oracle_success_rate)}</code></td></tr>` +
+         `<tr><td><b>transitions trained</b></td><td><code>${sft.num_train_rows ?? '–'}</code></td></tr>` +
+         `<tr><td><b>duration</b></td><td><code>${fmt(sft.duration_s)} s</code></td></tr>` +
+         `<tr><td><b>base → SFT dir</b></td><td><code>${sft.base_model} → ${sft.out_dir}</code></td></tr>` +
+         `</table>`;
+     } catch (e) { /* keep placeholder */ }
+   }
+
+   const logs = await fetch('/logs?tail=200').then(r => r.text());
+   document.getElementById('logs').textContent = logs || '(no logs yet)';
+ }
+ async function startRun() {
+   const r = await fetch('/train', {method:'POST'});
+   if (!r.ok) alert((await r.json()).detail || 'failed');
+   setTimeout(refresh, 500);
+ }
+ refresh();
+ setInterval(refresh, 5000);
+ </script>
+ </body>
+ </html>
+ """.replace("__HW__", EXPECTED_HARDWARE)
+
+
+
+ @app.get("/", response_class=HTMLResponse)
+ def index() -> HTMLResponse:
+     return HTMLResponse(_HTML)
+
+
+ @app.get("/health")
+ def health() -> Dict[str, str]:
+     return {"status": "ok"}
+
+
+ @app.get("/status")
+ def status() -> JSONResponse:
+     return JSONResponse(STATE.to_dict())
+
+
+ @app.get("/metrics")
+ def metrics() -> JSONResponse:
+     if METRICS_FILE.exists():
+         try:
+             return JSONResponse(json.loads(METRICS_FILE.read_text()))
+         except Exception:
+             return JSONResponse({"error": "metrics file unreadable"}, status_code=500)
+     return JSONResponse({"pre": None, "post": None, "delta": None})
+
+
+ @app.get("/sft_summary")
+ def sft_summary() -> JSONResponse:
+     """Return the SFT warm-start summary if it exists.
+
+     Powers the dashboard's "Warm-start (SFT)" card.
+     """
+     path = EVIDENCE_DIR / "sft_summary.json"
+     if path.exists():
+         try:
+             return JSONResponse(json.loads(path.read_text()))
+         except Exception:
+             return JSONResponse({"error": "sft_summary unreadable"}, status_code=500)
+     return JSONResponse({}, status_code=404)
+
+
+ @app.get("/evidence")
+ def evidence_index() -> JSONResponse:
+     """List every evidence artifact currently on disk."""
+     files = []
+     if EVIDENCE_DIR.exists():
+         for p in sorted(EVIDENCE_DIR.iterdir()):
+             if p.is_file():
+                 files.append({
+                     "name": p.name,
+                     "size": p.stat().st_size,
+                     "url": f"/evidence/{p.name}",
+                 })
+     return JSONResponse({"dir": str(EVIDENCE_DIR), "files": files})
+
+
+ @app.get("/evidence/{name}")
+ def evidence_file(name: str):
+     """Serve a single evidence artifact (PNG/CSV/JSON/MD) by filename.
+
+     For ``training_curve.png`` and ``reward_components.png`` we fall back
+     to on-demand synthesis from the captured TRL stdout log when the
+     underlying file does not yet exist on disk.
+     """
+     if "/" in name or ".." in name:
+         raise HTTPException(status_code=400, detail="invalid name")
+     target = EVIDENCE_DIR / name
+     if target.exists() and target.is_file():
+         return FileResponse(target)
+
+     handler = _SYNTH_HANDLERS.get(name)
+     if handler is not None:
+         try:
+             png = handler()
+         except Exception as exc:  # pragma: no cover - synthesis is best-effort
+             logger.warning("on-demand synthesis of %s failed: %s", name, exc)
+             png = None
+         if png:
+             return Response(
+                 content=png,
+                 media_type="image/png",
+                 headers={"Cache-Control": "no-store, max-age=0"},
+             )
+     raise HTTPException(status_code=404, detail=f"{name} not found")
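The traversal guard at the top of the handler reduces to one predicate: reject any requested filename containing a path separator or a parent-directory reference, so the served path can never escape the evidence directory. A standalone sketch (`is_safe_name` is a hypothetical name; note that, like the handler above, it does not reject Windows backslashes — fine inside the Linux container this Space runs in):

```python
def is_safe_name(name: str) -> bool:
    # Disallow subdirectories ("/") and parent traversal ("..").
    return "/" not in name and ".." not in name

assert is_safe_name("training_curve.png")
assert not is_safe_name("../secrets.env")
assert not is_safe_name("logs/app.log")
```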
+
+
+ @app.get("/logs", response_class=PlainTextResponse)
+ def logs(tail: int = 400) -> PlainTextResponse:
+     if not LOG_FILE.exists():
+         return PlainTextResponse("")
+     text = LOG_FILE.read_text()
+     lines = text.splitlines()
+     return PlainTextResponse("\n".join(lines[-max(tail, 1):]))
+
+
+ @app.post("/train")
+ async def train(request: Request) -> JSONResponse:
+     """Start a training run.
+
+     The request body (JSON) is merged into the global ``CONFIG`` for
+     *this* run only, so future API-only triggers can flip
+     ``sft_warmstart`` (or any other config key) without redeploying
+     the Space.
+     """
+     overrides: Dict[str, Any] = {}
+     try:
+         body = await request.body()
+         if body:
+             overrides = json.loads(body)
+             if not isinstance(overrides, dict):
+                 raise ValueError("request body must be a JSON object")
+     except (ValueError, json.JSONDecodeError) as exc:
+         raise HTTPException(status_code=400, detail=f"bad request body: {exc}")
+     cfg = dict(CONFIG)
+     cfg.update(overrides)
+     try:
+         _start_training(cfg)
+     except RuntimeError as exc:
+         raise HTTPException(status_code=409, detail=str(exc))
+     return JSONResponse({"status": "started", "config": cfg})
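The per-run override merge is the key detail: the endpoint copies `CONFIG` before applying the JSON body, so one request's overrides never leak into the next run. A self-contained sketch of just that merge (with a local `CONFIG` and a hypothetical `merge_overrides` helper standing in for the endpoint's body handling):

```python
import json

CONFIG = {"sft_warmstart": True, "total_episodes": 120}

def merge_overrides(body: bytes) -> dict:
    overrides = json.loads(body) if body else {}
    if not isinstance(overrides, dict):
        raise ValueError("request body must be a JSON object")
    cfg = dict(CONFIG)   # shallow copy: the global defaults stay intact
    cfg.update(overrides)
    return cfg

cfg = merge_overrides(b'{"sft_warmstart": false}')
assert cfg["sft_warmstart"] is False       # this run only
assert cfg["total_episodes"] == 120        # untouched keys keep defaults
assert CONFIG["sft_warmstart"] is True     # global config unchanged
```

An empty body is valid and simply launches with the env-derived defaults, which is what the dashboard's "Start training" button sends.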
+
+
+ @app.on_event("startup")
+ def _maybe_autostart() -> None:
+     if CONFIG["autostart"]:
+         try:
+             _start_training(dict(CONFIG))
+             logger.info("autostarted training run")
+         except RuntimeError as exc:
+             logger.warning("autostart skipped: %s", exc)
space/training/requirements.txt ADDED
@@ -0,0 +1 @@
+ -r ../../requirements-train.txt