spec_version: 1
name: pharma_vigilance_env
display_name: "Pharmacovigilance Signal Detector"
description: >
  A real-world OpenEnv environment where an AI agent acts as a pharmacovigilance
  analyst. The agent reviews synthetic adverse-event cases, decides whether they
  represent known labeled effects, emerging safety signals, or low-value noise,
  and recommends the correct operational follow-up. Tasks cover known class
  effects, clustered post-marketing signal detection, and confounded
  drug-drug-interaction cases that require causal reasoning rather than surface
  blame assignment.
type: space
runtime: fastapi
app: server.app:app
port: 7860
tags:
  - openenv
  - healthcare
  - pharmacovigilance
  - drug-safety
  - reinforcement-learning
action_space:
  type: structured
  fields:
    - name: classification
      type: string
      values: [new_signal, known_side_effect, noise, duplicate]
      description: "Top-level safety classification chosen by the agent"
    - name: suspect_drug
      type: string
      description: "Drug or drug interaction the agent believes is causally responsible"
    - name: severity_assessment
      type: string
      values: [mild, moderate, severe, critical]
      description: "Agent-assigned clinical severity for the case"
    - name: recommended_action
      type: string
      values: [escalate, log_and_monitor, dismiss, request_more_info]
      description: "Operational pharmacovigilance follow-up decision"
    - name: reasoning
      type: string
      description: "Brief free-text rationale used for partial credit on the hard task"
    - name: confidence
      type: integer
      required: false
      description: "Optional analyst confidence from 0 to 100 used for calibration-aware reward shaping"
observation_space:
  type: structured
  fields:
    - name: task_id
      type: string
      description: "Identifier of the current pharmacovigilance task"
    - name: reports
      type: array
      description: "One or more synthetic adverse-event reports included in the case"
    - name: drug_interaction_db
      type: object
      description: "Hardcoded safety and interaction reference data visible to the agent"
    - name: step_number
      type: integer
      description: "Current step index within the episode"
    - name: max_steps
      type: integer
      description: "Maximum number of steps allowed in the episode"
    - name: feedback
      type: string
      required: false
      description: "Human-readable feedback from the previous action"
reward:
  min: -0.25
  max: 1.0
  description: >
    Reward is computed over a staged pharmacovigilance decision pipeline:
    classification, causal suspect selection, severity assessment, and
    operational action. The environment then adds a consistency bonus when the
    full chain of sub-decisions is internally coherent, and applies a
    calibration-aware confidence adjustment for well-calibrated or dangerously
    overconfident answers. A false alarm penalty of -0.10 applies when the
    agent escalates a case that is truly noise, and a larger missed-signal
    penalty of -0.20 applies when the agent dismisses a true new signal. The
    hard task can earn an additional +0.05 reasoning bonus when the
    explanation explicitly references the interaction mechanism or therapeutic
    drug monitoring clues. Step-level rewards may dip slightly below zero for
    clearly unsafe or suboptimal behavior, while final grader scores remain
    deterministic and normalized for evaluation.
difficulties:
  - easy
  - medium
  - hard
max_steps: 2
tasks:
  - id: known_signal_easy
    difficulty: easy
    description: >
      Review a synthetic single-patient report in which an ACE inhibitor is
      followed by persistent dry cough and many similar recent cases already
      exist. The correct behavior is to recognize this as a known labeled effect
      and recommend routine logging and monitoring rather than escalation.
    grader: graders.known_signal_easy_grader
  - id: cluster_signal_medium
    difficulty: medium
    description: >
      Review a clustered set of recent post-marketing reports tied to a newly
      launched cardiovascular therapy. The reports show symptomatic bradycardia
      and near-syncope despite the label lacking rhythm-related warnings. The
      agent should detect an emerging signal and escalate.
    grader: graders.cluster_signal_medium_grader
  - id: confounded_hard
    difficulty: hard
    description: >
      Review a confounded transplant-medicine case in which the reporter blames
      the wrong drug. The correct judgment requires identifying a tacrolimus and
      voriconazole interaction, recognizing acute kidney injury risk from toxic
      exposure, and escalating the case as a clinically serious new signal.
    grader: graders.confounded_hard_grader
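
The action fields and the fixed reward adjustments described in this manifest can be illustrated with a small Python sketch. It is not the environment's actual implementation: the function names `validate_action` and `apply_adjustments`, the `base_score` input, the treatment of every field except `confidence` as required, and the clamping of the result to the declared `min`/`max` range are all assumptions made for illustration. The consistency bonus and the confidence calibration term are left out because the manifest does not give their values.

```python
# Hypothetical sketch: names and clamping behavior are assumptions, not the
# environment's real code. Only the enum values and the -0.10 / -0.20 / +0.05
# constants come from the manifest.

ENUM_FIELDS = {
    "classification": {"new_signal", "known_side_effect", "noise", "duplicate"},
    "severity_assessment": {"mild", "moderate", "severe", "critical"},
    "recommended_action": {"escalate", "log_and_monitor", "dismiss", "request_more_info"},
}
TEXT_FIELDS = ("suspect_drug", "reasoning")

REWARD_MIN, REWARD_MAX = -0.25, 1.0
FALSE_ALARM_PENALTY = -0.10    # escalating a case that is truly noise
MISSED_SIGNAL_PENALTY = -0.20  # dismissing a true new signal
REASONING_BONUS = 0.05         # hard task only


def validate_action(action: dict) -> list:
    """Return a list of problems found in an action payload (empty if valid)."""
    errors = []
    for field, allowed in ENUM_FIELDS.items():
        value = action.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} is not one of {sorted(allowed)}")
    for field in TEXT_FIELDS:
        if not isinstance(action.get(field), str):
            errors.append(f"{field}: expected a string")
    confidence = action.get("confidence")  # optional per the manifest
    if confidence is not None:
        is_int = isinstance(confidence, int) and not isinstance(confidence, bool)
        if not is_int or not 0 <= confidence <= 100:
            errors.append("confidence: expected an integer from 0 to 100")
    return errors


def apply_adjustments(base_score: float, true_classification: str,
                      recommended_action: str, is_hard_task: bool = False,
                      reasoning_cites_mechanism: bool = False) -> float:
    """Apply the fixed penalties and bonus, then clamp to the declared range."""
    reward = base_score
    if recommended_action == "escalate" and true_classification == "noise":
        reward += FALSE_ALARM_PENALTY
    if recommended_action == "dismiss" and true_classification == "new_signal":
        reward += MISSED_SIGNAL_PENALTY
    if is_hard_task and reasoning_cites_mechanism:
        reward += REASONING_BONUS
    # Clamping is an assumption; the manifest only declares min and max.
    return max(REWARD_MIN, min(REWARD_MAX, reward))
```

For example, under these assumptions `apply_adjustments(0.0, "new_signal", "dismiss")` yields a negative step reward, which matches the manifest's note that step-level rewards may dip below zero for unsafe behavior.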