Space: modelbuilderhq/pharma-vigilance

modelbuilderhq committed 29 days ago

Commit

f2beac3

verified ·

1 Parent(s): 2a68468

Upload folder using huggingface_hub

Files changed (25)

Dockerfile +73 -0
README.md +244 -12
__init__.py +23 -0
agent.py +179 -0
client.py +131 -0
data.py +136 -0
env.py +116 -0
graders.py +33 -0
inference.py +213 -0
models.py +52 -0
openenv.yaml +113 -0
pyproject.toml +24 -0
requirements.txt +6 -0
run_demo.py +23 -0
server.py +50 -0
server/__init__.py +11 -0
server/app.py +50 -0
server/graders.py +179 -0
server/pharma_vigilance_env_environment.py +5 -0
server/requirements.txt +6 -0
server/tasks.py +27 -0
tasks.py +222 -0
tests/test_env.py +132 -0
uv.lock +0 -0
validate-submission.sh +185 -0

Dockerfile ADDED

	@@ -0,0 +1,73 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+# Multi-stage build using openenv-base
+# This Dockerfile is flexible and works for both:
+# - In-repo environments (with local OpenEnv sources)
+# - Standalone environments (with openenv from PyPI/Git)
+# The build script (openenv build) handles context detection and sets appropriate build args.
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+# Ensure git is available (required for installing dependencies from VCS)
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+ARG ENV_NAME=pharma_vigilance_env
+# Copy environment code (always at root of build context)
+COPY . /app/env
+# For in-repo builds, openenv is already vendored in the build context
+# For standalone builds, openenv will be installed via pyproject.toml
+WORKDIR /app/env
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+# Install dependencies into a local virtualenv using the repo requirements file.
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv venv .venv && \
+    . .venv/bin/activate && \
+    uv pip install -r requirements.txt
+# Final runtime stage
+FROM ${BASE_IMAGE}
+WORKDIR /app
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+ENV ENABLE_WEB_INTERFACE=true
+EXPOSE 7860
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:7860/health || exit 1
+# Run the FastAPI server
+# The module path is constructed to work with this repo's package layout.
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 7860"]

README.md CHANGED

@@ -1,12 +1,244 @@
----
-title: Pharma Vigilance
-emoji: 🌖
-colorFrom: green
-colorTo: purple
-sdk: docker
-pinned: false
-license: mit
-short_description: OpenEnv pharmacovigilance signal detection environment
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Pharmacovigilance Signal Detector
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 7860
+pinned: false
+license: mit
+short_description: OpenEnv pharmacovigilance signal detection environment
+tags:
+  - openenv
+  - healthcare
+  - pharmacovigilance
+  - safety
+  - real-world
+base_path: /web
+---
+# Pharmacovigilance Signal Detector
+`Pharmacovigilance Signal Detector` is a real-world OpenEnv environment where an agent acts like a drug-safety analyst. The agent reviews synthetic adverse event reports, uses a hardcoded drug interaction knowledge base, and decides whether the case is a new safety signal, a known side effect, or low-value noise. This mirrors pharmacovigilance triage work performed by regulators and pharmaceutical safety teams.
+All case data in this repo is synthetic. No real patient data is used.
+## Why This Environment Matters
+Pharmacovigilance teams are responsible for detecting harmful safety patterns after a drug is already on the market. That work is operationally important, high-stakes, and difficult: analysts must distinguish expected reactions from true emerging risks, recognize confounding from polypharmacy, and escalate only when justified. This makes the domain a strong fit for agent evaluation because it tests causal reasoning, prioritization, and safety-sensitive decision making.
+## Environment Overview
+| Item | Value |
+|---|---|
+| Environment name | `pharma-vigilance` |
+| Domain | Pharmacovigilance / drug safety triage |
+| Episode length | 1 step per task |
+| Task count | 3 |
+| Difficulties | Easy, Medium, Hard |
+| Reward range | `0.0` to `1.0` |
+| API | `reset()`, `step()`, `state()` |
+| Server | FastAPI |
+The agent receives one final-decision task per episode. Each task includes one or more synthetic reports plus a hardcoded drug interaction database. The environment never exposes ground truth to the agent.
+## Action Space
+| Field | Type | Allowed values | Purpose |
+|---|---|---|---|
+| `classification` | `str` | `new_signal`, `known_side_effect`, `noise`, `duplicate` | Overall pharmacovigilance judgment |
+| `suspect_drug` | `str` | Free text | Drug or interaction the agent believes is causal |
+| `severity_assessment` | `str` | `mild`, `moderate`, `severe`, `critical` | Clinical severity assessment |
+| `recommended_action` | `str` | `escalate`, `log_and_monitor`, `dismiss`, `request_more_info` | Operational follow-up |
+| `reasoning` | `str` | Free text | Short explanation; earns the grading bonus on the hard task |
+## Observation Space
+| Field | Type | Description |
+|---|---|---|
+| `task_id` | `str` | Current task identifier |
+| `reports` | `List[AdverseEventReport]` | Synthetic adverse event reports for the task |
+| `drug_interaction_db` | `dict` | Hardcoded safety and interaction hints |
+| `step_number` | `int` | Current step index |
+| `max_steps` | `int` | Maximum number of steps in the episode |
+| `feedback` | `Optional[str]` | Feedback message after the previous action |
+Each `AdverseEventReport` contains:
+| Field | Description |
+|---|---|
+| `report_id` | Unique synthetic report identifier |
+| `patient_age` | Patient age |
+| `patient_sex` | Patient sex |
+| `drugs` | All drugs the patient was taking |
+| `suspect_drug` | Drug named by the original reporter |
+| `reaction` | Observed adverse reaction |
+| `onset_days` | Days after drug start when reaction began |
+| `severity` | Reported severity |
+| `outcome` | Recovery status |
+| `similar_reports_last_30d` | Count of similar recent reports |
+## Tasks
+| Task | Difficulty | Scenario | Ground-truth goal | Expected baseline |
+|---|---|---|---|---|
+| `known_signal_easy` | Easy | Patient on `Lisinopril` develops a persistent dry cough, a labeled reaction with many similar recent reports | Recognize a known side effect and recommend `log_and_monitor` | Around `0.85` |
+| `cluster_signal_medium` | Medium | Four recent `Cardiovexa` cases show symptomatic bradycardia and near-syncope despite no labeled rhythm toxicity | Recognize a plausible emerging signal and `escalate` | Around `0.65` |
+| `confounded_hard` | Hard | Acute kidney injury in a transplant patient is attributed to `Trimethoprim-sulfamethoxazole`, but the underlying cause is a `Voriconazole`-`Tacrolimus` interaction | Detect the interaction, classify as `new_signal`, and `escalate` | Around `0.40` |
+The hard task is intentionally more difficult because the named suspect drug is not the true cause. The agent must reason over interaction evidence and therapeutic drug-monitoring clues in the provided hardcoded drug database.
+## Reward Function
+The environment uses deterministic programmatic graders.
+| Reward component | Value |
+|---|---|
+| Correct `classification` | `+0.25` |
+| Correct `suspect_drug` | `+0.25` |
+| Correct `severity_assessment` | `+0.25` |
+| Correct `recommended_action` | `+0.25` |
+| False alarm penalty: agent says `new_signal` when truth is `noise` | `-0.10` |
+| Missed signal penalty: agent says `noise` when truth is `new_signal` | `-0.20` |
+| Hard-task reasoning bonus if explanation mentions `drug interaction`, `tacrolimus`, `voriconazole`, `azole`, `calcineurin`, or `level monitoring` | `+0.15` |
+Notes:
+- Final reward is clamped to `[0.0, 1.0]`.
+- `suspect_drug` matching is forgiving for the hard task and allows substring matches.
+- The environment is deterministic and reproducible because all tasks and grading logic are hardcoded.
+## Project Structure
+| Path | Purpose |
+|---|---|
+| `env.py` | Main environment class and Pydantic models |
+| `tasks.py` | Task definitions and grader functions |
+| `data.py` | Synthetic reports and drug interaction database |
+| `server.py` | Root FastAPI entrypoint |
+| `server/app.py` | OpenEnv-compatible app entrypoint |
+| `inference.py` | Baseline inference runner |
+| `openenv.yaml` | OpenEnv metadata |
+| `Dockerfile` | Multi-stage OpenEnv-style container build |
+| `tests/test_env.py` | Local tests |
+| `validate-submission.sh` | Pre-submission validation helper |
+## Running Locally
+### Option 1: Local virtual environment
+If you already created the local virtual environment in this repo:
+```powershell
+.\.venv\Scripts\Activate.ps1
+```
+Install dependencies if needed:
+```bash
+pip install -r requirements.txt
+```
+Start the server:
+```bash
+uvicorn server.app:app --host 0.0.0.0 --port 7860
+```
+### Option 2: Docker
+Build the image:
+```bash
+docker build -t pharmacovigilance-env .
+```
+Run the container:
+```bash
+docker run -p 7860:7860 pharmacovigilance-env
+```
+The health endpoint will be available at:
+```text
+http://localhost:7860/health
+```
+## API Endpoints
+| Method | Endpoint | Description |
+|---|---|---|
+| `POST` | `/reset` | Starts a task and returns the initial observation |
+| `POST` | `/step` | Submits the final agent action and returns observation, reward, done, info |
+| `GET` | `/state` | Returns a summary of the internal environment state |
+| `GET` | `/tasks` | Lists available task ids |
+| `GET` | `/health` | Health check endpoint |
+## Baseline Inference Script
+The required baseline runner is `inference.py`.
+It:
+- reads `API_BASE_URL`, `MODEL_NAME`, `HF_TOKEN`, and optional `ENV_URL`
+- uses the OpenAI client for all model calls
+- runs all three tasks sequentially
+- emits the required `[START]`, `[STEP]`, and `[END]` lines
+- keeps stdout restricted to the judge-expected line types
+Required environment variables:
+```bash
+export API_BASE_URL=https://router.huggingface.co/v1
+export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
+export HF_TOKEN=hf_your_token_here
+export ENV_URL=http://localhost:7860
+```
+Run:
+```bash
+python inference.py
+```
+## Testing And Validation
+Run local tests:
+```bash
+pytest tests/test_env.py -q
+```
+Run OpenEnv validation:
+```bash
+openenv validate
+```
+Run the pre-submission helper:
+```bash
+chmod +x validate-submission.sh
+./validate-submission.sh https://your-space.hf.space
+```
+That script checks:
+1. your Hugging Face Space responds to `POST /reset`
+2. the Docker image builds
+3. `openenv validate` passes
+## Submission Checklist
+- `openenv validate` passes
+- `docker build` succeeds
+- `docker run` starts cleanly
+- `POST /reset` returns HTTP `200`
+- `inference.py` runs all 3 tasks successfully
+- your Hugging Face Space responds to `POST /reset`
+- replace the expected baseline values with your measured live baseline values before final submission
+## Notes
+- No external API calls are made by the environment itself.
+- The drug interaction database is hardcoded.
+- Ground truth is never exposed in the observation returned to the agent.
+- The environment is lightweight enough for a 2 vCPU / 8GB RAM target.
+- The expected baseline scores in this README are planning targets until replaced with measured live results.
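
The reward table in the README can be read as a small scoring function. The sketch below only illustrates that table and is not the repo's grader: `sketch_reward`, `FIELDS`, and `HARD_TASK_KEYWORDS` are hypothetical names, the two-way substring rule is one possible reading of "forgiving" matching, and the actual logic in `server/graders.py` (not shown in this excerpt) may differ.

```python
from typing import Dict

# The four graded fields, each worth 0.25 according to the README table.
FIELDS = ("classification", "suspect_drug", "severity_assessment", "recommended_action")

# Keywords the README lists for the hard-task reasoning bonus.
HARD_TASK_KEYWORDS = (
    "drug interaction", "tacrolimus", "voriconazole",
    "azole", "calcineurin", "level monitoring",
)


def sketch_reward(action: Dict[str, str], truth: Dict[str, str], hard_task: bool = False) -> float:
    """Illustrative reading of the README reward table (assumption, not the real grader)."""
    total = 0.0
    for field in FIELDS:
        predicted = action.get(field, "").strip().lower()
        expected = truth[field].strip().lower()
        if field == "suspect_drug" and hard_task:
            # Assumed meaning of "forgiving": either string contains the other.
            matched = bool(predicted) and (predicted in expected or expected in predicted)
        else:
            matched = predicted == expected
        if matched:
            total += 0.25

    predicted_cls = action.get("classification", "").strip().lower()
    true_cls = truth["classification"].strip().lower()
    if predicted_cls == "new_signal" and true_cls == "noise":
        total -= 0.10  # false alarm penalty
    if predicted_cls == "noise" and true_cls == "new_signal":
        total -= 0.20  # missed signal penalty

    reasoning = action.get("reasoning", "").lower()
    if hard_task and any(keyword in reasoning for keyword in HARD_TASK_KEYWORDS):
        total += 0.15  # hard-task reasoning bonus

    # The README states the final reward is clamped to [0.0, 1.0].
    return max(0.0, min(1.0, total))
```

Because of the clamp, the bonus cannot lift a fully correct hard-task answer above `1.0`; under this reading it only compensates for one missed field.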

__init__.py ADDED

	@@ -0,0 +1,23 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Pharmacovigilance Signal Detector Environment."""
+try:
+    from .client import PharmaVigilanceEnvClient
+    from .models import PharmaAction, PharmaObservation, PharmaReward
+except ImportError:
+    PharmaVigilanceEnvClient = None
+    from env import Action as PharmaAction
+    from env import Observation as PharmaObservation
+    from env import Reward as PharmaReward
+__all__ = [
+    "PharmaVigilanceEnvClient",
+    "PharmaAction",
+    "PharmaObservation",
+    "PharmaReward",
+]

agent.py ADDED

	@@ -0,0 +1,179 @@

+import json
+import os
+import sys
+from typing import Optional
+from openai import OpenAI
+try:
+    from .env import Action
+except ImportError:
+    from env import Action
+_cached_client: Optional[OpenAI] = None
+_cached_model = os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
+def _maybe_get_client() -> Optional[OpenAI]:
+    global _cached_client
+    if _cached_client is not None:
+        return _cached_client
+    base_url = os.environ.get("API_BASE_URL", "").strip()
+    api_key = os.environ.get("HF_TOKEN") or os.environ.get("API_KEY") or "hf-missing-token"
+    if not base_url:
+        print(
+            "[WARN] API_BASE_URL is not configured; AnalystAgent will use heuristic mode.",
+            file=sys.stderr,
+        )
+        return None
+    _cached_client = OpenAI(base_url=base_url, api_key=api_key)
+    return _cached_client
+class AnalystAgent:
+    """
+    Lightweight pharmacovigilance agent for demos and smoke testing.
+    The agent can call an OpenAI-compatible chat endpoint when configured, but
+    it also has a deterministic fallback policy for offline or local use.
+    """
+    def __init__(self) -> None:
+        self.review_memory: list[dict] = []
+    def _case_snapshot(self, observation) -> str:
+        report_lines = []
+        for report in observation.reports:
+            report_lines.append(
+                f"- {report.report_id}: suspect={report.suspect_drug}, "
+                f"reaction={report.reaction}, onset_days={report.onset_days}, "
+                f"severity={report.severity}, outcome={report.outcome}, "
+                f"similar_30d={report.similar_reports_last_30d}"
+            )
+        memory_block = ""
+        if self.review_memory:
+            memory_block = "\nRecent mistakes to avoid:\n"
+            for item in self.review_memory[-3:]:
+                memory_block += (
+                    f"- On {item['task_id']} you underperformed after choosing "
+                    f"{item['classification']} / {item['recommended_action']}. "
+                    f"Reason note: {item['note']}\n"
+                )
+        return (
+            f"Task id: {observation.task_id}\n"
+            f"Reports:\n" + "\n".join(report_lines) + "\n"
+            f"Knowledge base:\n{json.dumps(observation.drug_interaction_db, ensure_ascii=True, indent=2)}"
+            f"{memory_block}"
+        )
+    def _build_prompt(self, observation) -> str:
+        return f"""You are a pharmacovigilance case assessor.
+Review the case below and return one JSON object only.
+Return fields:
+- classification: one of new_signal, known_side_effect, noise, duplicate
+- suspect_drug: likely causal drug or interaction
+- severity_assessment: one of mild, moderate, severe, critical
+- recommended_action: one of escalate, log_and_monitor, dismiss, request_more_info
+- reasoning: concise mechanistic explanation
+Decision principles:
+- Repeated known labeled reactions should usually be known_side_effect
+- Small but coherent post-marketing clusters on a newer drug can justify new_signal
+- If the reporter blames the wrong medication, prefer the stronger causal interaction
+- Missing a serious signal is worse than overcalling a weak case
+Case:
+{self._case_snapshot(observation)}
+"""
+    def _llm_decision(self, observation) -> Optional[Action]:
+        client = _maybe_get_client()
+        if client is None:
+            return None
+        try:
+            response = client.chat.completions.create(
+                model=_cached_model,
+                messages=[{"role": "user", "content": self._build_prompt(observation)}],
+                temperature=0.0,
+                max_tokens=220,
+            )
+            raw = (response.choices[0].message.content or "").strip()
+            payload = json.loads(raw)
+            return Action(**payload)
+        except Exception as exc:
+            print(f"[WARN] AnalystAgent LLM path failed: {exc}; falling back to heuristics.", file=sys.stderr)
+            return None
+    def _heuristic_decision(self, observation) -> Action:
+        reports = observation.reports
+        report_count = len(reports)
+        report = reports[0]
+        reaction_blob = " ".join(item.reaction.lower() for item in reports)
+        db_blob = json.dumps(observation.drug_interaction_db).lower()
+        if "dry cough" in reaction_blob and "ace inhibitor" in db_blob:
+            return Action(
+                classification="known_side_effect",
+                suspect_drug="Lisinopril",
+                severity_assessment="mild",
+                recommended_action="log_and_monitor",
+                reasoning="Persistent dry cough is a classic labeled ACE inhibitor effect.",
+            )
+        if report_count >= 3 and ("brady" in reaction_blob or "syncope" in reaction_blob):
+            return Action(
+                classification="new_signal",
+                suspect_drug="Cardiovexa",
+                severity_assessment="severe",
+                recommended_action="escalate",
+                reasoning="A coherent cluster of bradycardia reports on a recently launched drug warrants escalation.",
+            )
+        if "tacrolimus" in db_blob and "voriconazole" in db_blob:
+            return Action(
+                classification="new_signal",
+                suspect_drug="Tacrolimus+Voriconazole",
+                severity_assessment="critical",
+                recommended_action="escalate",
+                reasoning="This looks like a tacrolimus exposure interaction requiring urgent escalation and level review.",
+            )
+        fallback_severity = report.severity if report.severity in {"mild", "moderate", "severe", "critical"} else "moderate"
+        return Action(
+            classification="new_signal",
+            suspect_drug=report.suspect_drug,
+            severity_assessment=fallback_severity,
+            recommended_action="request_more_info",
+            reasoning="The case is ambiguous, so additional information is needed before final triage.",
+        )
+    def act(self, observation) -> Action:
+        llm_action = self._llm_decision(observation)
+        if llm_action is not None:
+            return llm_action
+        return self._heuristic_decision(observation)
+    def learn(self, action: Action, observation) -> None:
+        reward = getattr(observation, "reward", 0.0)
+        if reward is None:
+            reward = 0.0
+        if reward < 0.5:
+            self.review_memory.append(
+                {
+                    "task_id": getattr(observation, "task_id", "unknown"),
+                    "classification": action.classification,
+                    "recommended_action": action.recommended_action,
+                    "note": getattr(observation, "feedback", "") or "weak outcome",
+                }
+            )
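
`_llm_decision` passes the raw completion straight to `json.loads`, so a reply wrapped in a Markdown code fence or surrounded by prose falls through to the heuristic path. A minimal sketch of a more tolerant parser, assuming the model returns at most one JSON object; `extract_json_object` is a hypothetical helper and not part of this commit:

```python
import json
import re
from typing import Optional

# Build the fence marker programmatically so this snippet stays fence-safe.
FENCE = "`" * 3
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_json_object(raw: str) -> Optional[dict]:
    """Return the first JSON object found in an LLM reply, or None if there is none."""
    text = raw.strip()

    # Prefer the contents of a fenced block when the model added one.
    fenced = _FENCE_RE.search(text)
    if fenced:
        text = fenced.group(1).strip()

    # Trim any leading or trailing prose around the outermost braces.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None

    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

If adopted, the result could be passed to `Action(**payload)` in place of `json.loads(raw)`, with `None` treated like any other LLM failure.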

client.py ADDED

	@@ -0,0 +1,131 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Pharmacovigilance Signal Detector Environment Client."""
+from typing import Dict
+from openenv.core import EnvClient
+from openenv.core.client_types import StepResult
+from openenv.core.env_server.types import State
+try:
+    from .env import Action, Observation, AdverseEventReport
+except ImportError:
+    from env import Action, Observation, AdverseEventReport
+class PharmaVigilanceEnvClient(
+    EnvClient[Action, Observation, State]
+):
+    """
+    Client for the Pharmacovigilance Signal Detector environment.
+    This client maintains a persistent connection to the environment server and
+    parses server responses into strongly-typed observation models.
+    Example:
+        >>> with PharmaVigilanceEnvClient(base_url="http://localhost:7860") as env:
+        ...     result = env.reset(task_id="known_signal_easy")
+        ...     print(result.observation.task_id)
+        ...
+        ...     action = Action(
+        ...         classification="known_side_effect",
+        ...         suspect_drug="Lisinopril",
+        ...         severity_assessment="mild",
+        ...         recommended_action="log_and_monitor",
+        ...         reasoning="Persistent dry cough is a labeled ACE inhibitor effect.",
+        ...     )
+        ...     result = env.step(action)
+        ...     print(result.observation.feedback)
+        ...     print(result.reward)
+    Example with Docker:
+        >>> client = PharmaVigilanceEnvClient.from_docker_image("pharmacovigilance-env:latest")
+        >>> try:
+        ...     result = client.reset(task_id="cluster_signal_medium")
+        ...     action = Action(
+        ...         classification="new_signal",
+        ...         suspect_drug="Cardiovexa",
+        ...         severity_assessment="severe",
+        ...         recommended_action="escalate",
+        ...         reasoning="Clustered bradycardia on a recently approved drug warrants escalation.",
+        ...     )
+        ...     result = client.step(action)
+        ... finally:
+        ...     client.close()
+    """
+    def _step_payload(self, action: Action) -> Dict:
+        """
+        Convert an Action model into the JSON payload sent to /step.
+        Args:
+            action: Typed agent action.
+        Returns:
+            Dictionary representation suitable for JSON transport.
+        """
+        return {
+            "classification": action.classification,
+            "suspect_drug": action.suspect_drug,
+            "severity_assessment": action.severity_assessment,
+            "recommended_action": action.recommended_action,
+            "reasoning": action.reasoning,
+        }
+    def _parse_result(self, payload: Dict) -> StepResult[Observation]:
+        """
+        Parse a server /step response into StepResult[Observation].
+        Args:
+            payload: JSON response from the environment server.
+        Returns:
+            StepResult containing the typed observation, reward, and done flag.
+        """
+        obs_data = payload.get("observation", {})
+        reports = [
+            AdverseEventReport(**report)
+            for report in obs_data.get("reports", [])
+        ]
+        observation = Observation(
+            task_id=obs_data.get("task_id", ""),
+            reports=reports,
+            drug_interaction_db=obs_data.get("drug_interaction_db", {}),
+            step_number=obs_data.get("step_number", 0),
+            max_steps=obs_data.get("max_steps", 1),
+            feedback=obs_data.get("feedback"),
+        )
+        reward_payload = payload.get("reward", 0.0)
+        reward_total = (
+            reward_payload.get("total", 0.0)
+            if isinstance(reward_payload, dict)
+            else reward_payload
+        )
+        return StepResult(
+            observation=observation,
+            reward=reward_total,
+            done=payload.get("done", False),
+        )
+    def _parse_state(self, payload: Dict) -> State:
+        """
+        Parse the /state response into an OpenEnv State object.
+        Args:
+            payload: JSON response from the state endpoint.
+        Returns:
+            State with a task-derived episode identifier and current step count.
+        """
+        return State(
+            episode_id=payload.get("task_id", "pharma-vigilance"),
+            step_count=payload.get("step_number", 0),
+        )
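
For callers without the `openenv` client, the same exchange can be sketched with the standard library. The endpoint paths come from the README's API table and the `/step` body mirrors `_step_payload`; the `/reset` body shape (`{"task_id": ...}`) is an assumption, since `server.py` is not shown in this excerpt, and `build_post` is a hypothetical helper.

```python
import json
from urllib import request


def build_post(base_url: str, path: str, body: dict) -> request.Request:
    """Build a JSON POST request without sending it."""
    return request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed request body for /reset; the real server may expect a different shape.
reset_req = build_post("http://localhost:7860", "/reset", {"task_id": "known_signal_easy"})

# Same five fields that PharmaVigilanceEnvClient._step_payload sends.
step_req = build_post(
    "http://localhost:7860",
    "/step",
    {
        "classification": "known_side_effect",
        "suspect_drug": "Lisinopril",
        "severity_assessment": "mild",
        "recommended_action": "log_and_monitor",
        "reasoning": "Persistent dry cough is a labeled ACE inhibitor effect.",
    },
)

# Sending the requests needs a running server, so it is left commented out:
# with request.urlopen(reset_req, timeout=10) as resp:
#     observation = json.load(resp)
```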

data.py ADDED

	@@ -0,0 +1,136 @@

+TASK_DATA = {
+    "known_signal_easy": {
+        "reports": [
+            {
+                "report_id": "PV-EASY-001",
+                "patient_age": 59,
+                "patient_sex": "female",
+                "drugs": ["Lisinopril 20mg"],
+                "suspect_drug": "Lisinopril",
+                "reaction": "Persistent dry cough",
+                "onset_days": 11,
+                "severity": "mild",
+                "outcome": "not_recovered",
+                "similar_reports_last_30d": 1264,
+            }
+        ],
+        "ground_truth": {
+            "classification": "known_side_effect",
+            "suspect_drug": "Lisinopril",
+            "severity_assessment": "mild",
+            "recommended_action": "log_and_monitor",
+        },
+        "drug_interaction_db": {
+            "Lisinopril": {
+                "known_reactions": ["dry cough", "hyperkalemia", "angioedema"],
+                "class_note": "ACE inhibitors frequently cause persistent non-productive cough.",
+            }
+        },
+    },
+    "cluster_signal_medium": {
+        "reports": [
+            {
+                "report_id": "PV-MED-001",
+                "patient_age": 44,
+                "patient_sex": "female",
+                "drugs": ["Cardiovexa"],
+                "suspect_drug": "Cardiovexa",
+                "reaction": "symptomatic bradycardia with dizziness",
+                "onset_days": 9,
+                "severity": "moderate",
+                "outcome": "not_recovered",
+                "similar_reports_last_30d": 5,
+            },
+            {
+                "report_id": "PV-MED-002",
+                "patient_age": 69,
+                "patient_sex": "male",
+                "drugs": ["Cardiovexa"],
+                "suspect_drug": "Cardiovexa",
+                "reaction": "heart rate 32 with near-syncope",
+                "onset_days": 13,
+                "severity": "severe",
+                "outcome": "not_recovered",
+                "similar_reports_last_30d": 5,
+            },
+            {
+                "report_id": "PV-MED-003",
+                "patient_age": 57,
+                "patient_sex": "female",
+                "drugs": ["Cardiovexa"],
+                "suspect_drug": "Cardiovexa",
+                "reaction": "fatigue and sinus bradycardia",
+                "onset_days": 7,
+                "severity": "moderate",
+                "outcome": "not_recovered",
+                "similar_reports_last_30d": 5,
+            },
+            {
+                "report_id": "PV-MED-004",
+                "patient_age": 63,
+                "patient_sex": "male",
+                "drugs": ["Cardiovexa"],
+                "suspect_drug": "Cardiovexa",
+                "reaction": "bradyarrhythmia requiring ER evaluation",
+                "onset_days": 11,
+                "severity": "severe",
+                "outcome": "not_recovered",
+                "similar_reports_last_30d": 5,
+            },
+        ],
+        "ground_truth": {
+            "classification": "new_signal",
+            "suspect_drug": "Cardiovexa",
+            "severity_assessment": "severe",
+            "recommended_action": "escalate",
+        },
+        "drug_interaction_db": {
+            "Cardiovexa": {
+                "known_reactions": ["headache", "fatigue"],
+                "approval_date": "5 months ago",
+                "label_note": "No labeled conduction or rhythm adverse effects.",
+            }
+        },
+    },
+    "confounded_hard": {
+        "reports": [
+            {
+                "report_id": "PV-HARD-001",
+                "patient_age": 63,
+                "patient_sex": "male",
+                "drugs": [
+                    "Tacrolimus",
+                    "Prednisone",
+                    "Amlodipine",
+                    "Magnesium oxide",
+                    "Voriconazole",
+                    "Trimethoprim-sulfamethoxazole",
+                ],
+                "suspect_drug": "Trimethoprim-sulfamethoxazole",
+                "reaction": "Acute kidney injury with tacrolimus trough 4x baseline",
+                "onset_days": 6,
+                "severity": "critical",
+                "outcome": "not_recovered",
+                "similar_reports_last_30d": 1,
+            }
+        ],
+        "ground_truth": {
+            "classification": "new_signal",
+            "suspect_drug": "Tacrolimus+Voriconazole",
+            "severity_assessment": "critical",
+            "recommended_action": "escalate",
+        },
+        "drug_interaction_db": {
+            "Voriconazole": {
+                "strong_metabolic_inhibitor": True,
+                "interacts_with": ["Tacrolimus", "Cyclosporine"],
+                "interaction_note": "Markedly increases tacrolimus exposure; dose reduction and level monitoring required.",
+            },
+            "Tacrolimus": {
+                "narrow_therapeutic_index": True,
+                "known_reactions": ["nephrotoxicity", "tremor"],
+                "requires_level_monitoring": True,
+            },
+        },
+    },
+}

env.py ADDED

	@@ -0,0 +1,116 @@

+from typing import Dict, List, Optional, Tuple
+from pydantic import BaseModel, Field
+from tasks import TaskDefinition, get_task
+class AdverseEventReport(BaseModel):
+    report_id: str
+    patient_age: int
+    patient_sex: str
+    drugs: List[str]
+    suspect_drug: str
+    reaction: str
+    onset_days: int
+    severity: str
+    outcome: str
+    similar_reports_last_30d: int
+class Observation(BaseModel):
+    task_id: str
+    reports: List[AdverseEventReport]
+    drug_interaction_db: dict
+    step_number: int
+    max_steps: int
+    feedback: Optional[str] = None
+class Action(BaseModel):
+    classification: str
+    suspect_drug: str
+    severity_assessment: str
+    recommended_action: str
+    reasoning: str
+class Reward(BaseModel):
+    total: float = Field(..., ge=0.0, le=1.0)
+    breakdown: dict
+class PharmaVigilanceEnv:
+    def __init__(self):
+        self.current_task: Optional[TaskDefinition] = None
+        self.current_task_id: Optional[str] = None
+        self.step_number = 0
+        self.max_steps = 1
+        self.last_action: Optional[dict] = None
+        self.last_reward: Optional[dict] = None
+    def reset(self, task_id: str = "known_signal_easy") -> Observation:
+        self.current_task = get_task(task_id)
+        self.current_task_id = self.current_task.task_id
+        self.step_number = 0
+        self.last_action = None
+        self.last_reward = None
+        return Observation(
+            task_id=self.current_task.task_id,
+            reports=self.current_task.reports,
+            drug_interaction_db=self.current_task.drug_interaction_db,
+            step_number=self.step_number,
+            max_steps=self.max_steps,
+            feedback="Task loaded. Submit one final pharmacovigilance assessment.",
+        )
+    def step(self, action: Action) -> Tuple[Observation, Reward, bool, Dict]:
+        if self.current_task is None:
+            raise RuntimeError("Call reset() before step().")
+        reward = self.current_task.action_grader(action)
+        self.step_number += 1
+        self.last_action = action.model_dump()
+        self.last_reward = reward.model_dump()
+        done = True
+        matched = sum(
+            1
+            for field in (
+                "classification",
+                "suspect_drug",
+                "severity_assessment",
+                "recommended_action",
+            )
+            if reward.breakdown.get(field, 0.0) > 0
+        )
+        if reward.total >= 0.9:
+            feedback = "Strong assessment. The key safety judgment and follow-up were correct."
+        elif reward.total >= 0.5:
+            feedback = "Partially correct assessment. Some causal or operational details were missed."
+        else:
+            feedback = "Weak assessment. This case would need human analyst correction."
+        observation = Observation(
+            task_id=self.current_task.task_id,
+            reports=self.current_task.reports,
+            drug_interaction_db=self.current_task.drug_interaction_db,
+            step_number=self.step_number,
+            max_steps=self.max_steps,
+            feedback=feedback,
+        )
+        info = {
+            "matched_fields": matched,
+            "difficulty": self.current_task.difficulty,
+            "reward_breakdown": reward.breakdown,
+        }
+        return observation, reward, done, info
+    def state(self) -> dict:
+        return {
+            "task_id": self.current_task_id,
+            "step_number": self.step_number,
+            "last_action": self.last_action,
+            "last_reward": self.last_reward,
+        }

graders.py ADDED

	@@ -0,0 +1,33 @@

+"""Public grader entrypoints for OpenEnv validation and judging."""
+from server.graders import (
+    cluster_signal_medium_grader,
+    confounded_hard_grader,
+    easy_grader,
+    hard_grader,
+    known_signal_easy_grader,
+    medium_grader,
+)
+TASK_TO_GRADER = {
+    "known_signal_easy": known_signal_easy_grader,
+    "cluster_signal_medium": cluster_signal_medium_grader,
+    "confounded_hard": confounded_hard_grader,
+}
+TIER_TO_GRADER = {
+    "easy": easy_grader,
+    "medium": medium_grader,
+    "hard": hard_grader,
+}
+__all__ = [
+    "TASK_TO_GRADER",
+    "TIER_TO_GRADER",
+    "easy_grader",
+    "medium_grader",
+    "hard_grader",
+    "known_signal_easy_grader",
+    "cluster_signal_medium_grader",
+    "confounded_hard_grader",
+]

inference.py ADDED

	@@ -0,0 +1,213 @@

+"""
+Baseline runner for the Pharmacovigilance Signal Detector submission.
+This script queries a chat model through the OpenAI client, sends its decision
+to the environment server, and prints the exact machine-readable lines expected
+by the evaluator.
+"""
+import argparse
+import json
+import os
+from typing import Iterable, List
+import requests
+from openai import OpenAI
+from pydantic import ValidationError
+try:
+    from .models import PharmaAction
+except ImportError:
+    from models import PharmaAction
+API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
+HF_TOKEN = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
+ENV_URL = os.getenv("ENV_URL", "http://localhost:7860").rstrip("/")
+TASK_OVERRIDE = os.getenv("TASK_NAME", "").strip()
+BENCHMARK = "pharma-vigilance"
+TASK_SETS = {
+    "easy": ("known_signal_easy",),
+    "medium": ("cluster_signal_medium",),
+    "hard": ("confounded_hard",),
+    "all": ("known_signal_easy", "cluster_signal_medium", "confounded_hard"),
+}
+SYSTEM_MESSAGE = """
+You are acting as a pharmacovigilance triage analyst.
+Read the synthetic case bundle and reply with exactly one JSON object.
+Allowed keys:
+- classification
+- suspect_drug
+- severity_assessment
+- recommended_action
+- reasoning
+Allowed values:
+- classification: new_signal, known_side_effect, noise, duplicate
+- severity_assessment: mild, moderate, severe, critical
+- recommended_action: escalate, log_and_monitor, dismiss, request_more_info
+No markdown. No explanation outside the JSON object.
+""".strip()
+def emit_start(task_name: str) -> None:
+    print(f"[START] task={task_name} env={BENCHMARK} model={MODEL_NAME}", flush=True)
+def emit_step(step_no: int, action_text: str, reward: float, done: bool, error: str | None) -> None:
+    error_text = error if error else "null"
+    print(
+        f"[STEP] step={step_no} action={action_text} reward={reward:.2f} "
+        f"done={str(done).lower()} error={error_text}",
+        flush=True,
+    )
+def emit_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    reward_text = ",".join(f"{reward:.2f}" for reward in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} "
+        f"score={score:.2f} rewards={reward_text}",
+        flush=True,
+    )
+def choose_tasks(selection: str) -> Iterable[str]:
+    if TASK_OVERRIDE:
+        return (TASK_OVERRIDE,)
+    return TASK_SETS[selection]
+def client() -> OpenAI:
+    if not HF_TOKEN:
+        raise EnvironmentError("HF_TOKEN or API_KEY must be set before running inference.py")
+    return OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
+def fetch_reset(task_name: str) -> dict:
+    response = requests.post(
+        f"{ENV_URL}/reset",
+        json={"task_id": task_name},
+        timeout=30,
+    )
+    response.raise_for_status()
+    return response.json()
+def submit_action(action: PharmaAction) -> dict:
+    response = requests.post(
+        f"{ENV_URL}/step",
+        json=action.model_dump(),
+        timeout=30,
+    )
+    response.raise_for_status()
+    return response.json()
+def prompt_for_case(observation: dict) -> str:
+    return (
+        "Assess the following pharmacovigilance case.\n\n"
+        "Return one final structured judgment.\n\n"
+        f"{json.dumps(observation, ensure_ascii=True, indent=2)}\n\n"
+        "Focus on whether the case is novel or known, the most plausible causal "
+        "drug or interaction, the right severity band, and the operational next step."
+    )
+def ask_model(llm: OpenAI, observation: dict) -> PharmaAction:
+    completion = llm.chat.completions.create(
+        model=MODEL_NAME,
+        messages=[
+            {"role": "system", "content": SYSTEM_MESSAGE},
+            {"role": "user", "content": prompt_for_case(observation)},
+        ],
+        temperature=0.0,
+        max_tokens=260,
+        stream=False,
+    )
+    text = (completion.choices[0].message.content or "").strip()
+    payload = json.loads(text)
+    return PharmaAction(**payload)
+def compact_action(action: PharmaAction) -> str:
+    label = action.classification
+    if action.suspect_drug:
+        return f"{label}/{action.suspect_drug}"
+    return label
+def final_score(rewards: List[float]) -> float:
+    if not rewards:
+        return 0.0
+    score = sum(rewards) / len(rewards)
+    return min(max(round(score, 4), 0.0), 1.0)
+def run_one_task(llm: OpenAI, task_name: str) -> None:
+    rewards: List[float] = []
+    steps_taken = 0
+    score = 0.0
+    success = False
+    emit_start(task_name)
+    try:
+        observation = fetch_reset(task_name)
+        action = ask_model(llm, observation)
+        action_text = compact_action(action)
+        result = submit_action(action)
+        reward_payload = result.get("reward", {})
+        reward = (
+            float(reward_payload.get("total", 0.0))
+            if isinstance(reward_payload, dict)
+            else float(reward_payload)
+        )
+        done = bool(result.get("done", False))
+        rewards.append(reward)
+        steps_taken = 1
+        emit_step(1, action_text, reward, done, None)
+        score = final_score(rewards)
+        success = score >= 0.75
+    except json.JSONDecodeError:
+        rewards = [0.0]
+        steps_taken = 1
+        emit_step(1, "parse_error", 0.0, True, "parse_error")
+    except ValidationError:
+        rewards = [0.0]
+        steps_taken = 1
+        emit_step(1, "schema_error", 0.0, True, "schema_error")
+    except Exception as exc:
+        rewards = [0.0]
+        steps_taken = 1
+        emit_step(1, "error", 0.0, True, str(exc))
+    finally:
+        emit_end(success, steps_taken, score, rewards or [0.0])
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run the pharmacovigilance baseline agent")
+    parser.add_argument(
+        "--difficulty",
+        choices=["easy", "medium", "hard", "all"],
+        default="all",
+        help="Which task subset to run",
+    )
+    args = parser.parse_args()
+    llm = client()
+    for task_name in choose_tasks(args.difficulty):
+        run_one_task(llm, task_name)
+if __name__ == "__main__":
+    main()
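
Review note on `ask_model`: the raw completion goes straight to `json.loads`, so a reply wrapped in a markdown fence is reported as `parse_error` even when the JSON inside is valid. The sketch below shows one tolerant parse step, assuming the model sometimes wraps its output despite the system prompt. `extract_json_object` is a hypothetical helper, not part of inference.py.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the span from the first '{' to the last '}' in `text`.

    Hypothetical helper, not part of inference.py.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise json.JSONDecodeError("no JSON object found", text, 0)
    return json.loads(text[start : end + 1])


# Build a fenced reply without writing a literal fence in this example.
fence = "`" * 3
reply = f'{fence}json\n{{"classification": "noise", "suspect_drug": ""}}\n{fence}'
parsed = extract_json_object(reply)
```

Malformed replies still raise `json.JSONDecodeError`, so the existing `parse_error` branch in `run_one_task` would keep working unchanged.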

models.py ADDED

	@@ -0,0 +1,52 @@

+from typing import List, Optional
+from openenv.core.env_server.types import Action, Observation
+from pydantic import BaseModel, ConfigDict, Field
+class AdverseEventReport(BaseModel):
+    model_config = ConfigDict(revalidate_instances="never")
+    report_id: str = Field(..., description="Unique synthetic report identifier")
+    patient_age: int = Field(..., description="Patient age in years")
+    patient_sex: str = Field(..., description="Patient sex")
+    drugs: List[str] = Field(default_factory=list, description="All drugs the patient was taking")
+    suspect_drug: str = Field(..., description="Drug named as suspect by the original reporter")
+    reaction: str = Field(..., description="Observed adverse reaction")
+    onset_days: int = Field(..., description="Days from drug start to reaction onset")
+    severity: str = Field(..., description="Reported case severity")
+    outcome: str = Field(..., description="Clinical outcome status")
+    similar_reports_last_30d: int = Field(..., description="Count of similar reports in the last 30 days")
+class PharmaObservation(Observation):
+    model_config = ConfigDict(
+        extra="forbid",
+        validate_assignment=True,
+        arbitrary_types_allowed=True,
+        revalidate_instances="never",
+    )
+    task_id: str = Field(..., description="Current task identifier")
+    reports: List[AdverseEventReport] = Field(default_factory=list, description="Synthetic adverse event reports")
+    drug_interaction_db: dict = Field(default_factory=dict, description="Hardcoded interaction and safety lookup")
+    step_number: int = Field(default=0, description="Current step number")
+    max_steps: int = Field(default=1, description="Maximum number of steps in the episode")
+    feedback: Optional[str] = Field(default=None, description="Feedback after the previous action")
+    reward: float = Field(default=0.0, description="Reward from the last action")
+    done: bool = Field(default=False, description="Episode termination flag")
+    metadata: dict = Field(default_factory=dict, description="Additional environment metadata")
+class PharmaAction(Action):
+    classification: str = Field(..., description="new_signal | known_side_effect | noise | duplicate")
+    suspect_drug: str = Field(..., description="Drug or interaction believed to be causal")
+    severity_assessment: str = Field(..., description="mild | moderate | severe | critical")
+    recommended_action: str = Field(..., description="escalate | log_and_monitor | dismiss | request_more_info")
+    reasoning: str = Field(default="", description="Short explanation of the decision")
+class PharmaReward(BaseModel):
+    total: float = Field(..., ge=0.0, le=1.0, description="Total reward in the 0.0-1.0 range")
+    breakdown: dict = Field(default_factory=dict, description="Per-component reward breakdown")
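
Review note on `PharmaAction`: the allowed values appear only in the field descriptions, so any string passes validation. The sketch below shows one way to enforce them, using only the standard library so it runs without pydantic. `StrictAction` and `ALLOWED` are hypothetical names; the value sets are copied from the descriptions above.

```python
from dataclasses import dataclass

# Allowed values, copied from the PharmaAction field descriptions.
ALLOWED = {
    "classification": {"new_signal", "known_side_effect", "noise", "duplicate"},
    "severity_assessment": {"mild", "moderate", "severe", "critical"},
    "recommended_action": {"escalate", "log_and_monitor", "dismiss", "request_more_info"},
}


@dataclass(frozen=True)
class StrictAction:
    """Hypothetical stand-in for PharmaAction that rejects unknown labels."""

    classification: str
    suspect_drug: str
    severity_assessment: str
    recommended_action: str
    reasoning: str = ""

    def __post_init__(self) -> None:
        # Check every constrained field against its allowed set.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


ok = StrictAction("noise", "lisinopril", "mild", "dismiss")
```

In the pydantic models the same constraint could likely be expressed with `typing.Literal` field types, which would also surface the allowed values in the generated schema.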

openenv.yaml ADDED

	@@ -0,0 +1,113 @@

+spec_version: 1
+name: pharma_vigilance_env
+display_name: "Pharmacovigilance Signal Detector"
+description: >
+  A real-world OpenEnv environment where an AI agent acts as a pharmacovigilance
+  analyst. The agent reviews synthetic adverse-event cases, decides whether they
+  represent known labeled effects, emerging safety signals, or low-value noise,
+  and recommends the correct operational follow-up. Tasks cover known class
+  effects, clustered post-marketing signal detection, and confounded
+  drug-drug-interaction cases that require causal reasoning rather than surface
+  blame assignment.
+type: space
+runtime: fastapi
+app: server.app:app
+port: 7860
+tags:
+  - openenv
+  - healthcare
+  - pharmacovigilance
+  - drug-safety
+  - reinforcement-learning
+action_space:
+  type: structured
+  fields:
+    - name: classification
+      type: string
+      values: [new_signal, known_side_effect, noise, duplicate]
+      description: "Top-level safety classification chosen by the agent"
+    - name: suspect_drug
+      type: string
+      description: "Drug or drug interaction the agent believes is causally responsible"
+    - name: severity_assessment
+      type: string
+      values: [mild, moderate, severe, critical]
+      description: "Agent-assigned clinical severity for the case"
+    - name: recommended_action
+      type: string
+      values: [escalate, log_and_monitor, dismiss, request_more_info]
+      description: "Operational pharmacovigilance follow-up decision"
+    - name: reasoning
+      type: string
+      description: "Brief free-text rationale; can earn the reasoning bonus on the hard task"
+observation_space:
+  type: structured
+  fields:
+    - name: task_id
+      type: string
+      description: "Identifier of the current pharmacovigilance task"
+    - name: reports
+      type: array
+      description: "One or more synthetic adverse-event reports included in the case"
+    - name: drug_interaction_db
+      type: object
+      description: "Hardcoded safety and interaction reference data visible to the agent"
+    - name: step_number
+      type: integer
+      description: "Current step index within the episode"
+    - name: max_steps
+      type: integer
+      description: "Maximum number of steps allowed in the episode"
+    - name: feedback
+      type: string
+      required: false
+      description: "Human-readable feedback from the previous action"
+reward:
+  min: 0.0
+  max: 1.0
+  description: >
+    Reward is built from four 0.25 components for classification correctness,
+    causal suspect selection, severity assessment, and recommended operational
+    action. A false alarm penalty of -0.10 applies when the agent classifies a
+    case that is truly noise as new_signal, and a larger missed-signal penalty
+    of -0.20 applies when the agent classifies a true new signal as noise. The
+    hard task can earn an additional +0.15 reasoning bonus when the explanation
+    mentions the interaction mechanism or therapeutic drug monitoring clues.
+    The summed total is clamped to the 0.0-1.0 range.
+difficulties:
+  - easy
+  - medium
+  - hard
+max_steps: 1
+tasks:
+  - id: known_signal_easy
+    difficulty: easy
+    description: >
+      Review a synthetic single-patient report in which an ACE inhibitor is
+      followed by persistent dry cough and many similar recent cases already
+      exist. The correct behavior is to recognize this as a known labeled effect
+      and recommend routine logging and monitoring rather than escalation.
+    grader: graders.known_signal_easy_grader
+  - id: cluster_signal_medium
+    difficulty: medium
+    description: >
+      Review a clustered set of recent post-marketing reports tied to a newly
+      launched cardiovascular therapy. The reports show symptomatic bradycardia
+      and near-syncope despite the label lacking rhythm-related warnings. The
+      agent should detect an emerging signal and escalate.
+    grader: graders.cluster_signal_medium_grader
+  - id: confounded_hard
+    difficulty: hard
+    description: >
+      Review a confounded transplant-medicine case in which the reporter blames
+      the wrong drug. The correct judgment requires identifying a tacrolimus and
+      voriconazole interaction, recognizing acute kidney injury risk from toxic
+      exposure, and escalating the case as a clinically serious new signal.
+    grader: graders.confounded_hard_grader
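
Review note on the reward description: the arithmetic can be checked by hand. The sketch below is a minimal illustration under the assumption that the total is the rounded sum of the breakdown clamped to [0.0, 1.0], as `_reward_from_breakdown` in tasks.py does. `total_reward` is a hypothetical helper and the three breakdowns are made-up cases.

```python
def total_reward(breakdown: dict) -> float:
    """Sum the reward components and clamp to [0.0, 1.0] (hypothetical helper)."""
    total = round(sum(breakdown.values()), 4)
    return max(0.0, min(1.0, total))


ZERO = {
    "classification": 0.0,
    "suspect_drug": 0.0,
    "severity_assessment": 0.0,
    "recommended_action": 0.0,
    "false_alarm_penalty": 0.0,
    "missed_signal_penalty": 0.0,
    "reasoning_bonus": 0.0,
}

# Hard task: three of four fields correct, plus the reasoning bonus.
partial = {**ZERO, "classification": 0.25, "suspect_drug": 0.25,
           "recommended_action": 0.25, "reasoning_bonus": 0.15}

# A true new signal classified as noise, with nothing else correct.
missed = {**ZERO, "missed_signal_penalty": -0.20}

# All four fields correct plus the bonus: the raw sum of 1.15 is clamped.
perfect = {**ZERO, "classification": 0.25, "suspect_drug": 0.25,
           "severity_assessment": 0.25, "recommended_action": 0.25,
           "reasoning_bonus": 0.15}

partial_total = total_reward(partial)   # 0.75 + 0.15
missed_total = total_reward(missed)     # negative sum clamps to the floor
perfect_total = total_reward(perfect)   # 1.15 clamps to the ceiling
```

Because of the clamp, the penalties only change the total when at least one component is correct, and the bonus is invisible on an otherwise perfect hard-task answer.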

pyproject.toml ADDED

	@@ -0,0 +1,24 @@

+[build-system]
+requires = ["setuptools>=68", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "pharmacovigilance-env"
+version = "1.0.0"
+description = "Pharmacovigilance Signal Detector"
+requires-python = ">=3.10"
+dependencies = [
+    "fastapi",
+    "uvicorn",
+    "pydantic>=2.0",
+    "openai",
+    "requests",
+    "openenv-core",
+]
+[project.scripts]
+server = "server.app:main"
+[tool.setuptools]
+py-modules = ["env", "tasks", "data", "inference", "models", "graders", "agent", "client"]
+packages = ["server"]

requirements.txt ADDED

	@@ -0,0 +1,6 @@

+fastapi
+uvicorn
+pydantic>=2.0
+openai
+requests
+openenv-core

run_demo.py ADDED

	@@ -0,0 +1,23 @@

+from agent import RuleBasedPharmaAgent
+from env import PharmaVigilanceEnv
+def main() -> None:
+    env = PharmaVigilanceEnv()
+    agent = RuleBasedPharmaAgent()
+    for task_id in ("known_signal_easy", "cluster_signal_medium", "confounded_hard"):
+        observation = env.reset(task_id)
+        action = agent.act(observation)
+        observation, reward, done, info = env.step(action)
+        print(f"\nTask: {task_id}")
+        print(f"Action: {action.classification} / {action.suspect_drug}")
+        print(f"Reward: {reward.total:.2f}")
+        print(f"Done: {done}")
+        print(f"Feedback: {observation.feedback}")
+        print(f"Info: {info}")
+if __name__ == "__main__":
+    main()

server.py ADDED

	@@ -0,0 +1,50 @@

+from fastapi import FastAPI
+from env import Action, PharmaVigilanceEnv
+app = FastAPI()
+env = PharmaVigilanceEnv()
+@app.post("/reset")
+def reset(body: dict | None = None):
+    # Avoid a shared mutable default; an absent body falls back to the easy task.
+    task_id = (body or {}).get("task_id", "known_signal_easy")
+    obs = env.reset(task_id)
+    return obs.model_dump()
+@app.post("/step")
+def step(action: Action):
+    obs, reward, done, info = env.step(action)
+    return {
+        "observation": obs.model_dump(),
+        "reward": reward.model_dump(),
+        "done": done,
+        "info": info,
+    }
+@app.get("/state")
+def state():
+    return env.state()
+@app.get("/tasks")
+def list_tasks():
+    return {"tasks": ["known_signal_easy", "cluster_signal_medium", "confounded_hard"]}
+@app.get("/health")
+def health():
+    return {"status": "ok"}
+def main(host: str = "0.0.0.0", port: int = 7860) -> None:
+    import uvicorn
+    uvicorn.run(app, host=host, port=port)
+if __name__ == "__main__":
+    main()

server/__init__.py ADDED

	@@ -0,0 +1,11 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Pharmacovigilance environment server components."""
+from .pharma_vigilance_env_environment import PharmaVigilanceEnv
+__all__ = ["PharmaVigilanceEnv"]

server/app.py ADDED

	@@ -0,0 +1,50 @@

+from fastapi import FastAPI
+from env import Action, PharmaVigilanceEnv
+app = FastAPI()
+env = PharmaVigilanceEnv()
+@app.post("/reset")
+def reset(body: dict | None = None):
+    # Avoid a shared mutable default; an absent body falls back to the easy task.
+    task_id = (body or {}).get("task_id", "known_signal_easy")
+    obs = env.reset(task_id)
+    return obs.model_dump()
+@app.post("/step")
+def step(action: Action):
+    obs, reward, done, info = env.step(action)
+    return {
+        "observation": obs.model_dump(),
+        "reward": reward.model_dump(),
+        "done": done,
+        "info": info,
+    }
+@app.get("/state")
+def state():
+    return env.state()
+@app.get("/tasks")
+def list_tasks():
+    return {"tasks": ["known_signal_easy", "cluster_signal_medium", "confounded_hard"]}
+@app.get("/health")
+def health():
+    return {"status": "ok"}
+def main(host: str = "0.0.0.0", port: int = 7860) -> None:
+    import uvicorn
+    uvicorn.run(app, host=host, port=port)
+if __name__ == "__main__":
+    main()

server/graders.py ADDED

	@@ -0,0 +1,179 @@

+"""
+Trajectory scorers for the Pharmacovigilance Signal Detector.
+These functions are intentionally pharmacovigilance-specific rather than
+generic "reward bucket" adapters. The scoring rubric emphasizes:
+1. Signal sensitivity: missing a true novel safety signal is costly.
+2. Operational judgment: escalation/log/dismiss choices matter independently.
+3. Causal calibration: high scores should reflect not just suspicion, but
+   identifying the right drug or interaction.
+All public grader outputs are clamped to the judge-safe closed interval [0.01, 0.99].
+"""
+from typing import Iterable, List
+STRICT_MIN = 0.01
+STRICT_MAX = 0.99
+def _bounded(value: float) -> float:
+    return min(max(round(value, 4), STRICT_MIN), STRICT_MAX)
+def _as_reward_list(trajectory: dict | None) -> List[float]:
+    payload = trajectory or {}
+    rewards = payload.get("rewards")
+    if isinstance(rewards, list) and rewards:
+        return [float(item) for item in rewards]
+    if "score" in payload:
+        return [float(payload["score"])]
+    reward = payload.get("reward")
+    if isinstance(reward, dict) and "total" in reward:
+        return [float(reward["total"])]
+    if reward is not None:
+        return [float(reward)]
+    return []
+def _reward_profile(reward: float) -> str:
+    """
+    Translate a step reward into a pharmacovigilance interpretation bucket.
+    This keeps the grader coupled to the meaning of the environment rather than
+    to borrowed labels from a different domain.
+    """
+    if reward <= 0.05:
+        return "unsafe_miss"
+    if reward <= 0.20:
+        return "bad_call"
+    if reward < 0.50:
+        return "weak_triage"
+    if reward < 0.80:
+        return "workable_triage"
+    if reward < 0.95:
+        return "strong_triage"
+    return "expert_triage"
+def _mean(values: Iterable[float]) -> float:
+    items = list(values)
+    if not items:
+        return 0.5
+    return sum(items) / len(items)
+def _score_episode(
+    rewards: List[float],
+    *,
+    miss_cost: float,
+    overcall_cost: float,
+    stability_gain: float,
+    expertise_gain: float,
+) -> float:
+    if not rewards:
+        return 0.5
+    labels = [_reward_profile(reward) for reward in rewards]
+    mean_reward = _mean(rewards)
+    total_steps = len(rewards)
+    unsafe_miss_count = labels.count("unsafe_miss")
+    bad_call_count = labels.count("bad_call")
+    weak_count = labels.count("weak_triage")
+    strong_count = labels.count("strong_triage") + labels.count("expert_triage")
+    expert_count = labels.count("expert_triage")
+    downward_pressure = (
+        min(unsafe_miss_count * miss_cost, 0.35)
+        + min(bad_call_count * overcall_cost, 0.15)
+        + min(weak_count * 0.015, 0.06)
+    )
+    upward_pressure = 0.0
+    if strong_count / total_steps >= 0.80:
+        upward_pressure += stability_gain
+    if expert_count / total_steps >= 0.60:
+        upward_pressure += expertise_gain
+    return _bounded(mean_reward - downward_pressure + upward_pressure)
+def easy_grader(trajectory: dict | None = None) -> float:
+    """
+    Easy tier: obvious known-signal recognition and straightforward handling.
+    The scorer expects high reliability here. Weak or missed judgments are
+    penalized more sharply because these are the least ambiguous cases.
+    """
+    rewards = _as_reward_list(trajectory)
+    return _score_episode(
+        rewards,
+        miss_cost=0.12,
+        overcall_cost=0.03,
+        stability_gain=0.05,
+        expertise_gain=0.01,
+    )
+def medium_grader(trajectory: dict | None = None) -> float:
+    """
+    Medium tier: cluster recognition and escalation readiness.
+    These cases reward agents that can move from single-case thinking to
+    population-level signal interpretation.
+    """
+    rewards = _as_reward_list(trajectory)
+    return _score_episode(
+        rewards,
+        miss_cost=0.09,
+        overcall_cost=0.04,
+        stability_gain=0.03,
+        expertise_gain=0.02,
+    )
+def hard_grader(trajectory: dict | None = None) -> float:
+    """
+    Hard tier: confounding, blame reassignment, and interaction reasoning.
+    The hard scorer gives extra value to near-expert trajectories because this
+    tier is specifically designed to separate shallow pattern matching from
+    mechanistic causal reasoning.
+    """
+    rewards = _as_reward_list(trajectory)
+    return _score_episode(
+        rewards,
+        miss_cost=0.07,
+        overcall_cost=0.03,
+        stability_gain=0.02,
+        expertise_gain=0.04,
+    )
+def known_signal_easy_grader(trajectory: dict | None = None) -> float:
+    return easy_grader(trajectory)
+def cluster_signal_medium_grader(trajectory: dict | None = None) -> float:
+    return medium_grader(trajectory)
+def confounded_hard_grader(trajectory: dict | None = None) -> float:
+    return hard_grader(trajectory)
+__all__ = [
+    "easy_grader",
+    "medium_grader",
+    "hard_grader",
+    "known_signal_easy_grader",
+    "cluster_signal_medium_grader",
+    "confounded_hard_grader",
+]
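
Review note on `_score_episode`: since `max_steps` is 1, each trajectory normally carries a single reward, and the tier adjustments reduce to a few fixed cases. The sketch below is a condensed restatement of the scoring logic for illustration only. `profile`, `score_episode`, and `EASY` are hypothetical names; the thresholds and the easy-tier costs are copied from the file above.

```python
def profile(reward: float) -> str:
    """Bucket a step reward using the same thresholds as _reward_profile."""
    if reward <= 0.05:
        return "unsafe_miss"
    if reward <= 0.20:
        return "bad_call"
    if reward < 0.50:
        return "weak_triage"
    if reward < 0.80:
        return "workable_triage"
    if reward < 0.95:
        return "strong_triage"
    return "expert_triage"


def score_episode(rewards, miss_cost, overcall_cost, stability_gain, expertise_gain):
    """Mean reward, minus capped penalties, plus tier bonuses, clamped to [0.01, 0.99]."""
    if not rewards:
        return 0.5
    labels = [profile(r) for r in rewards]
    n = len(rewards)
    mean_reward = sum(rewards) / n
    strong = labels.count("strong_triage") + labels.count("expert_triage")
    expert = labels.count("expert_triage")
    down = (
        min(labels.count("unsafe_miss") * miss_cost, 0.35)
        + min(labels.count("bad_call") * overcall_cost, 0.15)
        + min(labels.count("weak_triage") * 0.015, 0.06)
    )
    up = 0.0
    if strong / n >= 0.80:
        up += stability_gain
    if expert / n >= 0.60:
        up += expertise_gain
    return min(max(round(mean_reward - down + up, 4), 0.01), 0.99)


# Easy-tier parameters, copied from easy_grader.
EASY = dict(miss_cost=0.12, overcall_cost=0.03, stability_gain=0.05, expertise_gain=0.01)

perfect = score_episode([1.0], **EASY)    # 1.0 + 0.05 + 0.01, clamped to the ceiling
workable = score_episode([0.75], **EASY)  # no penalty or bonus applies
missed = score_episode([0.0], **EASY)     # 0.0 - 0.12, clamped to the floor
```

For single-step episodes the bonuses only matter for rewards between 0.80 and 0.99, because a reward of 1.0 is already clamped to 0.99.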

server/pharma_vigilance_env_environment.py ADDED

	@@ -0,0 +1,5 @@

+"""Compatibility wrapper exposing the main environment class under server/."""
+from env import PharmaVigilanceEnv
+__all__ = ["PharmaVigilanceEnv"]

server/requirements.txt ADDED

	@@ -0,0 +1,6 @@

+fastapi
+uvicorn[standard]
+pydantic>=2.0
+openai
+requests
+openenv-core

server/tasks.py ADDED

	@@ -0,0 +1,27 @@

+"""Server-side task exports for the pharmacovigilance environment."""
+from tasks import (
+    GroundTruth,
+    TaskDefinition,
+    cluster_signal_medium_action_grader,
+    cluster_signal_medium_grader,
+    confounded_hard_action_grader,
+    confounded_hard_grader,
+    get_task,
+    get_tasks,
+    known_signal_easy_action_grader,
+    known_signal_easy_grader,
+)
+__all__ = [
+    "GroundTruth",
+    "TaskDefinition",
+    "get_task",
+    "get_tasks",
+    "known_signal_easy_action_grader",
+    "cluster_signal_medium_action_grader",
+    "confounded_hard_action_grader",
+    "known_signal_easy_grader",
+    "cluster_signal_medium_grader",
+    "confounded_hard_grader",
+]

tasks.py ADDED

	@@ -0,0 +1,222 @@

+import random
+from typing import Any, Callable, Dict, List, Optional
+from pydantic import BaseModel, ConfigDict, Field
+from data import TASK_DATA
+class GroundTruth(BaseModel):
+    classification: str
+    suspect_drug: str
+    severity_assessment: str
+    recommended_action: str
+class TaskDefinition(BaseModel):
+    model_config = ConfigDict(arbitrary_types_allowed=True, revalidate_instances="never")
+    task_id: str = Field(..., description="Unique pharmacovigilance task identifier")
+    difficulty: str = Field(..., description="easy | medium | hard")
+    reports: List[Any] = Field(default_factory=list, description="Synthetic adverse event reports")
+    drug_interaction_db: dict = Field(default_factory=dict, description="Hardcoded interaction and safety context")
+    ground_truth: GroundTruth
+    action_grader: Callable[[Any], Any]
+    description: str = Field(default="", description="Human-readable task summary")
+    @property
+    def id(self) -> str:
+        return self.task_id
+def _base_breakdown(action: Any, ground_truth: GroundTruth) -> dict:
+    action_suspect = action.suspect_drug.strip().lower()
+    truth_suspect = ground_truth.suspect_drug.strip().lower()
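+    # An empty string is a substring of every string, so require a non-empty
+    # answer before applying the lenient substring comparison.
+    suspect_match = bool(action_suspect) and (
+        action_suspect == truth_suspect
+        or action_suspect in truth_suspect
+        or truth_suspect in action_suspect
+    )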
+    breakdown = {
+        "classification": 0.25 if action.classification == ground_truth.classification else 0.0,
+        "suspect_drug": 0.25 if suspect_match else 0.0,
+        "severity_assessment": 0.25 if action.severity_assessment == ground_truth.severity_assessment else 0.0,
+        "recommended_action": 0.25 if action.recommended_action == ground_truth.recommended_action else 0.0,
+        "false_alarm_penalty": 0.0,
+        "missed_signal_penalty": 0.0,
+        "reasoning_bonus": 0.0,
+    }
+    if action.classification == "new_signal" and ground_truth.classification == "noise":
+        breakdown["false_alarm_penalty"] = -0.10
+    if action.classification == "noise" and ground_truth.classification == "new_signal":
+        breakdown["missed_signal_penalty"] = -0.20
+    return breakdown
+def _reward_from_breakdown(breakdown: dict):
+    from env import Reward
+    total = round(sum(breakdown.values()), 4)
+    return Reward(total=max(0.0, min(1.0, total)), breakdown=breakdown)
+def known_signal_easy_action_grader(action: Any):
+    truth = GroundTruth(**TASK_DATA["known_signal_easy"]["ground_truth"])
+    breakdown = _base_breakdown(action, truth)
+    return _reward_from_breakdown(breakdown)
+def cluster_signal_medium_action_grader(action: Any):
+    truth = GroundTruth(**TASK_DATA["cluster_signal_medium"]["ground_truth"])
+    breakdown = _base_breakdown(action, truth)
+    return _reward_from_breakdown(breakdown)
+def confounded_hard_action_grader(action: Any):
+    truth = GroundTruth(**TASK_DATA["confounded_hard"]["ground_truth"])
+    breakdown = _base_breakdown(action, truth)
+    reasoning = action.reasoning.lower()
+    if any(
+        term in reasoning
+        for term in ("drug interaction", "tacrolimus", "voriconazole", "azole", "calcineurin", "level monitoring")
+    ):
+        breakdown["reasoning_bonus"] = 0.15
+    return _reward_from_breakdown(breakdown)
+def _grader_score_from_trajectory(trajectory: Any = None) -> float:
+    trajectory = trajectory or {}
+    raw_score = 0.5
+    if isinstance(trajectory, dict):
+        if "score" in trajectory:
+            raw_score = float(trajectory["score"])
+        elif "rewards" in trajectory and trajectory["rewards"]:
+            raw_score = float(trajectory["rewards"][-1])
+        elif "reward" in trajectory:
+            reward_val = trajectory["reward"]
+            if isinstance(reward_val, dict) and "total" in reward_val:
+                raw_score = float(reward_val["total"])
+            else:
+                raw_score = float(reward_val)
+    return max(0.01, min(0.99, round(raw_score, 4)))
+def known_signal_easy_grader(trajectory: Any = None) -> float:
+    from server.graders import known_signal_easy_grader as _delegate
+    return _delegate(trajectory)
+def cluster_signal_medium_grader(trajectory: Any = None) -> float:
+    from server.graders import cluster_signal_medium_grader as _delegate
+    return _delegate(trajectory)
+def confounded_hard_grader(trajectory: Any = None) -> float:
+    from server.graders import confounded_hard_grader as _delegate
+    return _delegate(trajectory)
+def _task_definition(
+    task_id: str,
+    difficulty: str,
+    description: str,
+    action_grader: Callable[[Any], Any],
+) -> TaskDefinition:
+    from env import AdverseEventReport
+    task_data = TASK_DATA[task_id]
+    return TaskDefinition(
+        task_id=task_id,
+        difficulty=difficulty,
+        reports=[AdverseEventReport(**report) for report in task_data["reports"]],
+        drug_interaction_db=task_data["drug_interaction_db"],
+        ground_truth=GroundTruth(**task_data["ground_truth"]),
+        action_grader=action_grader,
+        description=description,
+    )
+def _build_all_tasks() -> Dict[str, List[TaskDefinition]]:
+    """Build and return the complete task pool grouped by difficulty."""
+    return {
+        "easy": [
+            _task_definition(
+                "known_signal_easy",
+                "easy",
+                "Known ACE-inhibitor cough case that should be logged and monitored rather than escalated.",
+                known_signal_easy_action_grader,
+            ),
+        ],
+        "medium": [
+            _task_definition(
+                "cluster_signal_medium",
+                "medium",
+                "Cluster of bradycardia reports around a newly approved therapy that should be escalated as a signal.",
+                cluster_signal_medium_action_grader,
+            ),
+        ],
+        "hard": [
+            _task_definition(
+                "confounded_hard",
+                "hard",
+                "Confounded transplant case where the blamed drug is wrong and the real problem is a tacrolimus interaction.",
+                confounded_hard_action_grader,
+            ),
+        ],
+    }
+def get_tasks(
+    difficulty: Optional[str] = None,
+    seed: Optional[int] = None,
+    n: int = 5,
+    grouped: bool = False,
+):
+    """
+    Return tasks as a flat task-id map, a difficulty-grouped dict, or a difficulty-filtered list.
+    Args:
+        difficulty: Optional difficulty tier to select from.
+        seed: Optional seed for reproducible shuffling within a difficulty pool.
+        n: Maximum number of tasks to return; when difficulty is None, the cap applies per difficulty tier.
+        grouped: When True and difficulty is None, return the difficulty-grouped dict.
+    Returns:
+        If grouped=True and difficulty is None:
+            Dict[str, List[TaskDefinition]]
+        If difficulty is None:
+            Dict[str, TaskDefinition]
+        Otherwise:
+            List[TaskDefinition]
+    """
+    all_tasks = _build_all_tasks()
+    if difficulty is None:
+        if grouped:
+            return {level: tasks[:n] for level, tasks in all_tasks.items()}
+        return {
+            task.task_id: task
+            for tasks in all_tasks.values()
+            for task in tasks[:n]
+        }
+    pool = list(all_tasks.get(difficulty, []))
+    if seed is not None:
+        rng = random.Random(seed)
+        rng.shuffle(pool)
+    return pool[:n]
+def get_task(task_id: str) -> TaskDefinition:
+    """Return the task registered under task_id, or raise KeyError if it is unknown."""
+    tasks = get_tasks()
+    if task_id not in tasks:
+        raise KeyError(f"Unknown task_id: {task_id}")
+    return tasks[task_id]
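
The seeded selection branch of `get_tasks` can be sketched in isolation. This is a minimal illustration, not the repository's code: `POOL`, `select_tasks`, and the task ids `easy_a`, `easy_b`, `easy_c` are hypothetical stand-ins, and plain strings replace `TaskDefinition` objects.

```python
import random
from typing import Dict, List, Optional

# Hypothetical stand-in pool: strings instead of TaskDefinition objects.
POOL: Dict[str, List[str]] = {
    "easy": ["easy_a", "easy_b", "easy_c"],
}


def select_tasks(difficulty: str, seed: Optional[int] = None, n: int = 5) -> List[str]:
    """Mirror the difficulty branch of get_tasks: copy, optionally shuffle, truncate."""
    # Copy the pool so shuffling never mutates the shared list.
    pool = list(POOL.get(difficulty, []))
    if seed is not None:
        # A private Random instance keeps the global random state untouched.
        random.Random(seed).shuffle(pool)
    return pool[:n]


if __name__ == "__main__":
    print(select_tasks("easy", seed=42))
    print(select_tasks("easy", n=2))
    print(select_tasks("missing"))
```

Because the shuffle uses a dedicated `random.Random(seed)`, two calls with the same seed return the same order, and an unknown difficulty yields an empty list rather than an error.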

tests/test_env.py ADDED Viewed

	@@ -0,0 +1,132 @@

+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+from env import Action, PharmaVigilanceEnv
+from tasks import (
+    cluster_signal_medium_action_grader,
+    cluster_signal_medium_grader,
+    confounded_hard_action_grader,
+    confounded_hard_grader,
+    get_task,
+    get_tasks,
+    known_signal_easy_action_grader,
+    known_signal_easy_grader,
+)
+def test_reset_loads_easy_task():
+    env = PharmaVigilanceEnv()
+    obs = env.reset("known_signal_easy")
+    assert obs.task_id == "known_signal_easy"
+    assert obs.step_number == 0
+    assert len(obs.reports) == 1
+def test_known_signal_grader_full_credit():
+    reward = known_signal_easy_action_grader(
+        Action(
+            classification="known_side_effect",
+            suspect_drug="Lisinopril",
+            severity_assessment="mild",
+            recommended_action="log_and_monitor",
+            reasoning="Known reaction pattern.",
+        )
+    )
+    assert reward.total == 1.0
+def test_medium_cluster_grader_partial_credit():
+    reward = cluster_signal_medium_action_grader(
+        Action(
+            classification="new_signal",
+            suspect_drug="Cardiovexa",
+            severity_assessment="moderate",
+            recommended_action="escalate",
+            reasoning="A cluster is forming.",
+        )
+    )
+    assert reward.total == 0.75
+def test_hard_grader_reasoning_bonus():
+    reward = confounded_hard_action_grader(
+        Action(
+            classification="new_signal",
+            suspect_drug="Tacrolimus+Voriconazole",
+            severity_assessment="critical",
+            recommended_action="escalate",
+            reasoning="This looks like a tacrolimus-voriconazole drug interaction with toxic levels.",
+        )
+    )
+    assert reward.total == 1.0
+    assert reward.breakdown["reasoning_bonus"] == 0.15
+def test_hard_grader_substring_suspect_match():
+    reward = confounded_hard_action_grader(
+        Action(
+            classification="new_signal",
+            suspect_drug="Tacrolimus",
+            severity_assessment="critical",
+            recommended_action="escalate",
+            reasoning="Voriconazole likely raised tacrolimus exposure.",
+        )
+    )
+    assert reward.breakdown["suspect_drug"] == 0.25
+def test_env_step_returns_done():
+    env = PharmaVigilanceEnv()
+    env.reset("confounded_hard")
+    obs, reward, done, info = env.step(
+        Action(
+            classification="new_signal",
+            suspect_drug="Tacrolimus+Voriconazole",
+            severity_assessment="critical",
+            recommended_action="escalate",
+            reasoning="Tacrolimus toxicity from an azole interaction.",
+        )
+    )
+    assert done is True
+    assert obs.step_number == 1
+    assert "reward_breakdown" in info
+    assert reward.total >= 0.85
+def test_state_tracks_last_action():
+    env = PharmaVigilanceEnv()
+    env.reset("known_signal_easy")
+    env.step(
+        Action(
+            classification="known_side_effect",
+            suspect_drug="Lisinopril",
+            severity_assessment="mild",
+            recommended_action="log_and_monitor",
+            reasoning="Known adverse effect.",
+        )
+    )
+    state = env.state()
+    assert state["step_number"] == 1
+    assert state["last_action"]["classification"] == "known_side_effect"
+def test_all_tasks_available():
+    tasks = get_tasks()
+    assert set(tasks.keys()) == {
+        "known_signal_easy",
+        "cluster_signal_medium",
+        "confounded_hard",
+    }
+def test_get_task_returns_hard_truth():
+    task = get_task("confounded_hard")
+    assert task.ground_truth.suspect_drug == "Tacrolimus+Voriconazole"
+def test_public_graders_are_strictly_bounded():
+    assert known_signal_easy_grader({"rewards": [1.0]}) == 0.99
+    assert cluster_signal_medium_grader({"rewards": [0.0]}) == 0.01
+    assert confounded_hard_grader({"score": 1.5}) == 0.99
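
The bounds checked in `test_public_graders_are_strictly_bounded` follow from the final clamp in `tasks.py` (`max(0.01, min(0.99, round(raw_score, 4)))`). A small sketch of that clamp alone, assuming a hypothetical helper name `clamp_score`; how a trajectory is reduced to `raw_score` is not reproduced here.

```python
def clamp_score(raw_score: float) -> float:
    """Round to 4 decimals, then clamp into [0.01, 0.99]."""
    return max(0.01, min(0.99, round(raw_score, 4)))


if __name__ == "__main__":
    for raw in (0.0, 0.5, 1.0, 1.5):
        print(raw, clamp_score(raw))
```

With this clamp a perfect raw reward of 1.0 is reported as 0.99 and a raw reward of 0.0 as 0.01, which matches the values the test asserts for the public graders.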

uv.lock ADDED Viewed

The diff for this file is too large to render. See raw diff

validate-submission.sh ADDED Viewed

	@@ -0,0 +1,185 @@

+#!/usr/bin/env bash
+#
+# validate-submission.sh — OpenEnv Submission Validator
+#
+# Checks that your HF Space is live, that the Docker image builds, and that `openenv validate` passes.
+#
+# Prerequisites:
+#   - Docker:       https://docs.docker.com/get-docker/
+#   - openenv-core: pip install openenv-core
+#   - curl (usually pre-installed)
+#
+# Run:
+#   curl -fsSL https://raw.githubusercontent.com/<owner>/<repo>/main/scripts/validate-submission.sh | bash -s -- <ping_url> [repo_dir]
+#
+#   Or download and run locally:
+#     chmod +x validate-submission.sh
+#     ./validate-submission.sh <ping_url> [repo_dir]
+#
+# Arguments:
+#   ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)
+#   repo_dir   Path to your repo (default: current directory)
+#
+# Examples:
+#   ./validate-submission.sh https://my-team.hf.space
+#   ./validate-submission.sh https://my-team.hf.space ./my-repo
+#
+set -uo pipefail
+DOCKER_BUILD_TIMEOUT=600
+if [ -t 1 ]; then
+  RED='\033[0;31m'
+  GREEN='\033[0;32m'
+  YELLOW='\033[1;33m'
+  BOLD='\033[1m'
+  NC='\033[0m'
+else
+  RED='' GREEN='' YELLOW='' BOLD='' NC=''
+fi
+run_with_timeout() {
+  local secs="$1"; shift
+  if command -v timeout &>/dev/null; then
+    timeout "$secs" "$@"
+  elif command -v gtimeout &>/dev/null; then
+    gtimeout "$secs" "$@"
+  else
+    "$@" &
+    local pid=$!
+    ( sleep "$secs" && kill "$pid" 2>/dev/null ) &
+    local watcher=$!
+    wait "$pid" 2>/dev/null
+    local rc=$?
+    kill "$watcher" 2>/dev/null
+    wait "$watcher" 2>/dev/null
+    return $rc
+  fi
+}
+portable_mktemp() {
+  local prefix="${1:-validate}"
+  mktemp "${TMPDIR:-/tmp}/${prefix}-XXXXXX" 2>/dev/null || mktemp
+}
+CLEANUP_FILES=()
+cleanup() { rm -f "${CLEANUP_FILES[@]+"${CLEANUP_FILES[@]}"}"; }
+trap cleanup EXIT
+PING_URL="${1:-}"
+REPO_DIR="${2:-.}"
+if [ -z "$PING_URL" ]; then
+  printf "Usage: %s <ping_url> [repo_dir]\n" "$0"
+  printf "\n"
+  printf "  ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)\n"
+  printf "  repo_dir   Path to your repo (default: current directory)\n"
+  exit 1
+fi
+if ! REPO_DIR="$(cd "$REPO_DIR" 2>/dev/null && pwd)"; then
+  printf "Error: directory '%s' not found\n" "${2:-.}"
+  exit 1
+fi
+PING_URL="${PING_URL%/}"
+export PING_URL
+PASS=0
+log()  { printf "[%s] %b\n" "$(date -u +%H:%M:%S)" "$*"; }
+pass() { log "${GREEN}PASSED${NC} -- $1"; PASS=$((PASS + 1)); }
+fail() { log "${RED}FAILED${NC} -- $1"; }
+hint() { printf "  ${YELLOW}Hint:${NC} %b\n" "$1"; }
+stop_at() {
+  printf "\n"
+  printf "${RED}${BOLD}Validation stopped at %s.${NC} Fix the above before continuing.\n" "$1"
+  exit 1
+}
+printf "\n"
+printf "${BOLD}========================================${NC}\n"
+printf "${BOLD}  OpenEnv Submission Validator${NC}\n"
+printf "${BOLD}========================================${NC}\n"
+log "Repo:     $REPO_DIR"
+log "Ping URL: $PING_URL"
+printf "\n"
+log "${BOLD}Step 1/3: Pinging HF Space${NC} ($PING_URL/reset) ..."
+CURL_OUTPUT=$(portable_mktemp "validate-curl")
+CLEANUP_FILES+=("$CURL_OUTPUT")
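+# curl already prints 000 via -w when the connection fails, so set the fallback outside the substitution.
+HTTP_CODE=$(curl -s -o "$CURL_OUTPUT" -w "%{http_code}" -X POST \
+  -H "Content-Type: application/json" -d '{}' \
+  "$PING_URL/reset" --max-time 30 2>/dev/null) || HTTP_CODE="000"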
+if [ "$HTTP_CODE" = "200" ]; then
+  pass "HF Space is live and responds to /reset"
+elif [ "$HTTP_CODE" = "000" ]; then
+  fail "HF Space not reachable (connection failed or timed out)"
+  hint "Check your network connection and that the Space is running."
+  hint "Try: curl -s -o /dev/null -w '%{http_code}' -X POST $PING_URL/reset"
+  stop_at "Step 1"
+else
+  fail "HF Space /reset returned HTTP $HTTP_CODE (expected 200)"
+  hint "Make sure your Space is running and the URL is correct."
+  hint "Try opening $PING_URL in your browser first."
+  stop_at "Step 1"
+fi
+log "${BOLD}Step 2/3: Running docker build${NC} ..."
+if ! command -v docker &>/dev/null; then
+  fail "docker command not found"
+  hint "Install Docker: https://docs.docker.com/get-docker/"
+  stop_at "Step 2"
+fi
+if [ -f "$REPO_DIR/Dockerfile" ]; then
+  DOCKER_CONTEXT="$REPO_DIR"
+elif [ -f "$REPO_DIR/server/Dockerfile" ]; then
+  DOCKER_CONTEXT="$REPO_DIR/server"
+else
+  fail "No Dockerfile found in repo root or server/ directory"
+  stop_at "Step 2"
+fi
+log "  Found Dockerfile in $DOCKER_CONTEXT"
+BUILD_OK=false
+BUILD_OUTPUT=$(run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build "$DOCKER_CONTEXT" 2>&1) && BUILD_OK=true
+if [ "$BUILD_OK" = true ]; then
+  pass "Docker build succeeded"
+else
+  fail "Docker build failed (timeout=${DOCKER_BUILD_TIMEOUT}s)"
+  printf "%s\n" "$BUILD_OUTPUT" | tail -20
+  stop_at "Step 2"
+fi
+log "${BOLD}Step 3/3: Running openenv validate${NC} ..."
+if ! command -v openenv &>/dev/null; then
+  fail "openenv command not found"
+  hint "Install it: pip install openenv-core"
+  stop_at "Step 3"
+fi
+VALIDATE_OK=false
+VALIDATE_OUTPUT=$(cd "$REPO_DIR" && openenv validate 2>&1) && VALIDATE_OK=true
+if [ "$VALIDATE_OK" = true ]; then
+  pass "openenv validate passed"
+  [ -n "$VALIDATE_OUTPUT" ] && log "  $VALIDATE_OUTPUT"
+else
+  fail "openenv validate failed"
+  printf "%s\n" "$VALIDATE_OUTPUT"
+  stop_at "Step 3"
+fi
+printf "\n"
+printf "${BOLD}========================================${NC}\n"
+printf "${GREEN}${BOLD}  All 3/3 checks passed!${NC}\n"
+printf "${GREEN}${BOLD}  Your submission is ready to submit.${NC}\n"
+printf "${BOLD}========================================${NC}\n"
+printf "\n"
+exit 0
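
For readers who prefer Python tooling, the `run_with_timeout` helper in the script has a rough analogue built on `subprocess.run`. This is a sketch under assumptions, not part of the submission validator: the function name is reused only for illustration, and returning 124 on expiry is a choice made here to mirror the exit status of GNU `timeout`.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(secs: float, cmd: List[str]) -> int:
    """Run cmd and return its exit code, or 124 if it exceeds secs seconds."""
    try:
        return subprocess.run(cmd, timeout=secs).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return 124


if __name__ == "__main__":
    # Finishes well within the limit and reports the child's own exit code.
    print(run_with_timeout(30, [sys.executable, "-c", "raise SystemExit(3)"]))
    # Sleeps past the limit and is reported as a timeout.
    print(run_with_timeout(0.5, [sys.executable, "-c", "import time; time.sleep(5)"]))
```

Unlike the shell fallback, which backgrounds the command and starts a watcher subshell, this version relies on the standard library to terminate the child when the limit is reached.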