spec_version: 1
name: pharma_vigilance_env
display_name: "Pharmacovigilance Signal Detector"
description: >
  A real-world OpenEnv environment where an AI agent acts as a pharmacovigilance
  analyst. The agent reviews synthetic adverse-event cases, decides whether they
  represent known labeled effects, emerging safety signals, or low-value noise,
  and recommends the correct operational follow-up. Tasks cover known class
  effects, clustered post-marketing signal detection, and confounded
  drug-drug-interaction cases that require causal reasoning rather than surface
  blame assignment.
type: space
runtime: fastapi
app: server.app:app
port: 7860
tags:
  - openenv
  - healthcare
  - pharmacovigilance
  - drug-safety
  - reinforcement-learning

action_space:
  type: structured
  fields:
    - name: classification
      type: string
      values: [new_signal, known_side_effect, noise, duplicate]
      description: "Top-level safety classification chosen by the agent"
    - name: suspect_drug
      type: string
      description: "Drug or drug interaction the agent believes is causally responsible"
    - name: severity_assessment
      type: string
      values: [mild, moderate, severe, critical]
      description: "Agent-assigned clinical severity for the case"
    - name: recommended_action
      type: string
      values: [escalate, log_and_monitor, dismiss, request_more_info]
      description: "Operational pharmacovigilance follow-up decision"
    - name: reasoning
      type: string
      description: "Brief free-text rationale used for partial credit on the hard task"
    - name: confidence
      type: integer
      required: false
      description: "Optional analyst confidence from 0 to 100 used for calibration-aware reward shaping"
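# Illustrative sketch only (not part of the spec): one action that fills the
# fields above, written as a YAML mapping. The values are hypothetical, drawn
# from the enumerations listed; the exact wire format the server expects is an
# assumption.
#
#   classification: new_signal
#   suspect_drug: "tacrolimus + voriconazole"
#   severity_assessment: severe
#   recommended_action: escalate
#   reasoning: "Voriconazole likely raised tacrolimus exposure via CYP3A4 inhibition."
#   confidence: 80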

observation_space:
  type: structured
  fields:
    - name: task_id
      type: string
      description: "Identifier of the current pharmacovigilance task"
    - name: reports
      type: array
      description: "One or more synthetic adverse-event reports included in the case"
    - name: drug_interaction_db
      type: object
      description: "Hardcoded safety and interaction reference data visible to the agent"
    - name: step_number
      type: integer
      description: "Current step index within the episode"
    - name: max_steps
      type: integer
      description: "Maximum number of steps allowed in the episode"
    - name: feedback
      type: string
      required: false
      description: "Human-readable feedback from the previous action"

reward:
  min: -0.25
  max: 1.0
  description: >
    Reward is computed over a staged pharmacovigilance decision pipeline:
    classification, causal suspect selection, severity assessment, and
    operational action. The environment then adds a consistency bonus when the
    full chain of sub-decisions is internally coherent, and applies a
    calibration-aware confidence adjustment for well-calibrated or dangerously
    overconfident answers. A false alarm penalty of -0.10 applies when the
    agent escalates a case that is truly noise, and a larger missed-signal
    penalty of -0.20 applies when the agent dismisses a true new signal. The
    hard task can earn an additional +0.05 reasoning bonus when the
    explanation explicitly references the interaction mechanism or therapeutic
    drug monitoring clues. Step-level rewards may dip below zero, down to the
    minimum of -0.25, for clearly unsafe or suboptimal behavior, while final
    grader scores remain deterministic and normalized for evaluation.
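# Illustrative arithmetic only, assuming the penalties and bonus are simply
# added to the staged score. The base value 0.40 is hypothetical and is not
# taken from the graders.
#   escalating a true noise case:    0.40 - 0.10 = 0.30
#   dismissing a true new signal:    0.40 - 0.20 = 0.20
#   hard task with reasoning bonus:  0.40 + 0.05 = 0.45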

difficulties:
  - easy
  - medium
  - hard

max_steps: 2

tasks:
  - id: known_signal_easy
    difficulty: easy
    description: >
      Review a synthetic single-patient report in which a persistent dry cough
      follows ACE inhibitor therapy and many similar recent cases already exist.
      The correct behavior is to recognize this as a known labeled effect
      and recommend routine logging and monitoring rather than escalation.
    grader: graders.known_signal_easy_grader

  - id: cluster_signal_medium
    difficulty: medium
    description: >
      Review a clustered set of recent post-marketing reports tied to a newly
      launched cardiovascular therapy. The reports show symptomatic bradycardia
      and near-syncope despite the label lacking rhythm-related warnings. The
      agent should detect an emerging signal and escalate.
    grader: graders.cluster_signal_medium_grader

  - id: confounded_hard
    difficulty: hard
    description: >
      Review a confounded transplant-medicine case in which the reporter blames
      the wrong drug. The correct judgment requires identifying a
      tacrolimus-voriconazole interaction, recognizing acute kidney injury risk
      from toxic exposure, and escalating the case as a clinically serious new
      signal.
    grader: graders.confounded_hard_grader