🫀 SepsisPilot: OpenEnv

Reinforcement learning environment for optimal sepsis treatment sequencing
Meta PyTorch OpenEnv Hackathon 2026 Submission

OpenEnv · HF Spaces · Python 3.11 · FastAPI


Environment Description & Motivation

Sepsis kills ~11 million people per year, yet optimal treatment sequencing remains one of the hardest challenges in critical care. The right antibiotic at the right time, combined with precise vasopressor dosing, can mean the difference between survival and multi-organ failure.

SepsisPilot simulates an ICU sepsis patient at hourly resolution. An AI agent observes real clinical vitals (MAP, lactate, WBC, temperature, heart rate, creatinine) and decides which antibiotic and vasopressor combination to administer each hour. The environment models realistic physiology including:

  • Gram-specific antibiotic efficacy: broad-spectrum covers gram-negative; narrow-spectrum (vancomycin) covers gram-positive
  • Antibiotic resistance accumulation: repeated suboptimal antibiotic use degrades efficacy
  • Haemodynamic-metabolic coupling: low MAP causes tissue ischaemia (rising lactate) and compensatory tachycardia
  • Renal vasoconstriction: high-dose vasopressors raise MAP but risk acute kidney injury

This is a real clinical problem with life-or-death stakes, well-defined physiology, meaningful partial progress signals, and a genuinely hard exploration challenge, which makes it an ideal RL environment.


Action Space

ID  Name                   Description
0   no_treatment           Watchful waiting; no intervention
1   broad_antibiotics      Piperacillin-tazobactam; gram-negative coverage
2   narrow_antibiotics     Vancomycin; gram-positive / MRSA coverage
3   low_vasopressor        Norepinephrine 0.1 mcg/kg/min; raises MAP
4   high_vasopressor       Norepinephrine 0.3 mcg/kg/min; raises MAP more; ⚠ renal risk
5   broad_plus_low_vaso    Broad-spectrum AB + low-dose vasopressor
6   broad_plus_high_vaso   Broad-spectrum AB + high-dose vasopressor
7   narrow_plus_low_vaso   Narrow-spectrum AB + low-dose vasopressor
8   narrow_plus_high_vaso  Narrow-spectrum AB + high-dose vasopressor

Type: Discrete · n: 9
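
For client code it can help to give the nine action IDs names. The sketch below mirrors the table above; the `Action` enum and the `decode` helper are illustrative names, not part of the SepsisPilot package.

```python
from enum import IntEnum
from typing import Optional, Tuple


class Action(IntEnum):
    """Action IDs copied from the Action Space table."""
    NO_TREATMENT = 0
    BROAD_ANTIBIOTICS = 1
    NARROW_ANTIBIOTICS = 2
    LOW_VASOPRESSOR = 3
    HIGH_VASOPRESSOR = 4
    BROAD_PLUS_LOW_VASO = 5
    BROAD_PLUS_HIGH_VASO = 6
    NARROW_PLUS_LOW_VASO = 7
    NARROW_PLUS_HIGH_VASO = 8


# (antibiotic class, vasopressor dose) for each action; None means "not given".
_COMPONENTS = {
    Action.NO_TREATMENT: (None, None),
    Action.BROAD_ANTIBIOTICS: ("broad", None),
    Action.NARROW_ANTIBIOTICS: ("narrow", None),
    Action.LOW_VASOPRESSOR: (None, "low"),
    Action.HIGH_VASOPRESSOR: (None, "high"),
    Action.BROAD_PLUS_LOW_VASO: ("broad", "low"),
    Action.BROAD_PLUS_HIGH_VASO: ("broad", "high"),
    Action.NARROW_PLUS_LOW_VASO: ("narrow", "low"),
    Action.NARROW_PLUS_HIGH_VASO: ("narrow", "high"),
}


def decode(action: int) -> Tuple[Optional[str], Optional[str]]:
    """Split an action ID into its antibiotic and vasopressor components."""
    return _COMPONENTS[Action(action)]
```

`decode(5)` returns `("broad", "low")`, and an ID outside 0–8 raises `ValueError` from the enum lookup.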


Observation Space

Field          Unit / Scale  Normal Range  Clinical Meaning
map_mmhg       mmHg          70–100        Mean Arterial Pressure; sepsis goal ≥ 65
lactate        mmol/L        0.5–2.0       Tissue ischaemia marker; target < 2.0
wbc            k/uL          4–11          White blood cells; infection proxy
temperature    °C            36.5–37.5     Fever indicates active infection
heart_rate     bpm           60–100        Tachycardia in sepsis
creatinine     mg/dL         0.6–1.2       Renal function; rises with AKI
sofa_score     0–24          0–2           Multi-organ failure composite
resistance     0–1           0.0           Antibiotic resistance index (hard task)
step_fraction  0–1           n/a           Fraction of episode elapsed

Type: Continuous · Shape: [9]
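
A minimal sketch of how a client might hold one observation and flag abnormal vitals, using the field order and normal ranges from the table above. The `Observation` dataclass and `out_of_range` helper are hypothetical; the project's own typed models live in `environment/models.py` and may differ. The heart rate, creatinine, SOFA and resistance values in the usage note are placeholders, since the task descriptions do not list them for this task.

```python
from dataclasses import asdict, astuple, dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    """Nine fields, in the order listed in the Observation Space table."""
    map_mmhg: float
    lactate: float
    wbc: float
    temperature: float
    heart_rate: float
    creatinine: float
    sofa_score: float
    resistance: float
    step_fraction: float

    def to_vector(self) -> List[float]:
        """Flatten to a length-9 list, e.g. as input to a policy network."""
        return [float(v) for v in astuple(self)]


# Normal ranges from the table; step_fraction has none.
NORMAL_RANGES = {
    "map_mmhg": (70.0, 100.0),
    "lactate": (0.5, 2.0),
    "wbc": (4.0, 11.0),
    "temperature": (36.5, 37.5),
    "heart_rate": (60.0, 100.0),
    "creatinine": (0.6, 1.2),
    "sofa_score": (0.0, 2.0),
    "resistance": (0.0, 0.0),
}


def out_of_range(obs: Observation) -> List[str]:
    """Names of the fields that fall outside their normal range."""
    values = asdict(obs)
    return [name for name, (lo, hi) in NORMAL_RANGES.items()
            if not lo <= values[name] <= hi]
```

For example, `Observation(65, 2.5, 14, 38.2, 95, 1.0, 2, 0.0, 0.0)` uses the mild_sepsis initial MAP, lactate, WBC and temperature, and `out_of_range` flags exactly those four fields.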


Task Descriptions

Task 1: mild_sepsis · Easy

  • Scenario: Mild sepsis secondary to gram-negative urinary tract infection
  • Initial state: MAP 65, Lactate 2.5, WBC 14, Temp 38.2°C
  • Optimal strategy: Broad-spectrum antibiotics; vasopressors only if MAP drops below 65
  • Max steps: 24 (24 hours)
  • Expected baseline score: 0.55–0.75
  • Key challenge: Learning that broad-spectrum is the right antibiotic class

Task 2: septic_shock · Medium

  • Scenario: Septic shock from gram-positive bacteraemia (MRSA suspected)
  • Initial state: MAP 52 ⚠, Lactate 4.2, WBC 18, Temp 38.9°C
  • Optimal strategy: Immediate vasopressors + narrow-spectrum antibiotics (vancomycin). Every delayed hour increases organ failure risk.
  • Max steps: 48 (48 hours)
  • Expected baseline score: 0.35–0.60
  • Key challenge: Correctly identifying gram-positive infection; mandatory haemodynamic support

Task 3: severe_mods · Hard

  • Scenario: Severe sepsis with Multi-Organ Dysfunction Syndrome (MODS). Mixed drug-resistant infection.
  • Initial state: MAP 42 ⚠⚠, Lactate 7.0, WBC 22, Temp 39.6°C, Creatinine 2.2
  • Optimal strategy: Broad-spectrum first (2 steps) → switch to narrow-spectrum → maintain MAP ≥ 65 with lowest effective vasopressor dose
  • Max steps: 72 (72 hours)
  • Expected baseline score: 0.20–0.45
  • Key challenge: Precise antibiotic sequencing to manage resistance; renal protection; multi-objective optimisation
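
The "Optimal strategy" bullets for the three tasks can be written down as a small rule-based policy, which is handy as a sanity-check baseline. This is only a sketch under stated assumptions: steps are counted from 0, the vasopressor is always given at the low dose (the task text does not say when to escalate), and `heuristic_action` is a hypothetical helper, not something shipped in the repository.

```python
def heuristic_action(task: str, step: int, map_mmhg: float) -> int:
    """Return an action ID (0-8) following the strategy bullets for each task."""
    needs_pressor = map_mmhg < 65  # sepsis MAP goal from the observation table

    if task == "mild_sepsis":
        use_broad = True                 # gram-negative infection
    elif task == "septic_shock":
        use_broad = False                # gram-positive: vancomycin
    elif task == "severe_mods":
        use_broad = step < 2             # broad first (2 steps), then narrow
    else:
        raise ValueError(f"unknown task: {task!r}")

    if use_broad:
        return 5 if needs_pressor else 1  # broad_plus_low_vaso / broad_antibiotics
    return 7 if needs_pressor else 2      # narrow_plus_low_vaso / narrow_antibiotics
```

At the initial states listed above this gives action 1 for mild_sepsis (MAP 65 is not below 65), 7 for septic_shock, and 5 for severe_mods, switching to 7 from the third step onward while MAP stays low.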

Reward Function

Dense rewards at every timestep (not just episode end):

Per step:
  +0.35   MAP ≥ 65 mmHg                  (haemodynamic stability)
  +0.30   Lactate < 2.0 mmol/L           (tissue perfusion restored)
  +0.10   WBC in 4–12 k/uL               (infection controlled)
  +0.08   Temperature 36–38°C            (fever resolved)
  +0.05   Creatinine improving           (renal protection)
  −0.15   Resistance increasing          (wrong antibiotic penalty)
  −0.025  Per step                       (time pressure)

Terminal:
  +5.00   All vitals stable              (full stabilisation bonus)
  −8.00   Patient death                  (MAP < 35 or Lactate > 15)

Range: approximately −8.2 to +5.9 per step (summing the components listed above gives −8.175 to +5.855).
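
To make the arithmetic concrete, here is a minimal sketch that adds up the listed terms for one step. It is not the code in `environment/patient_sim.py`: the function name, the use of per-step deltas for creatinine and resistance, and the inclusive range bounds are all assumptions.

```python
def step_reward(map_mmhg: float, lactate: float, wbc: float, temperature: float,
                creatinine_delta: float, resistance_delta: float,
                stabilised: bool = False, died: bool = False) -> float:
    """Sum the per-step and terminal terms from the Reward Function table."""
    reward = -0.025                       # time pressure, charged every step
    if map_mmhg >= 65:
        reward += 0.35                    # haemodynamic stability
    if lactate < 2.0:
        reward += 0.30                    # tissue perfusion restored
    if 4 <= wbc <= 12:
        reward += 0.10                    # infection controlled
    if 36 <= temperature <= 38:
        reward += 0.08                    # fever resolved
    if creatinine_delta < 0:
        reward += 0.05                    # creatinine improving
    if resistance_delta > 0:
        reward -= 0.15                    # wrong antibiotic penalty
    if died:
        reward -= 8.00                    # terminal: patient death
    elif stabilised:
        reward += 5.00                    # terminal: full stabilisation bonus
    return reward
```

With every positive term earned and no resistance penalty, a non-terminal step sums to 0.855 under this sketch, and the stabilisation bonus lifts the final step to 5.855.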


Grader (0.0 → 1.0)

Each completed episode is scored by a task-specific grader:

Component                    Easy  Medium  Hard
Survival                     40%   30%     25%
MAP normalisation            25%   20%     -
Lactate clearance            20%   15%     -
Vital combo (MAP + lactate)  -     -       20%
WBC / temperature            10%   5%      -
Correct antibiotic class     -     15%     -
Vasopressor usage            -     5%      -
Antibiotic sequencing        -     -       15%
Resistance management        -     -       15%
Renal protection             -     -       15%
Speed bonus                  5%    5%      10%
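
One plausible reading of the table is a weighted sum of component scores, each in [0, 1]. The sketch below assumes exactly that; the component keys and the `grade` function are illustrative and the real logic in `environment/graders.py` may combine components differently. Note that the Medium column as listed adds up to 95%, so a perfect medium episode scores 0.95 under this reading.

```python
WEIGHTS = {
    "mild_sepsis": {
        "survival": 0.40, "map": 0.25, "lactate": 0.20,
        "wbc_temp": 0.10, "speed": 0.05,
    },
    "septic_shock": {
        "survival": 0.30, "map": 0.20, "lactate": 0.15, "wbc_temp": 0.05,
        "antibiotic_class": 0.15, "vasopressor": 0.05, "speed": 0.05,
    },
    "severe_mods": {
        "survival": 0.25, "vital_combo": 0.20, "sequencing": 0.15,
        "resistance": 0.15, "renal": 0.15, "speed": 0.10,
    },
}


def _clamp(x: float) -> float:
    return min(1.0, max(0.0, x))


def grade(task: str, components: dict) -> float:
    """Weighted sum of component scores; missing components count as 0."""
    weights = WEIGHTS[task]
    total = sum(w * _clamp(components.get(name, 0.0)) for name, w in weights.items())
    return round(_clamp(total), 4)
```

For instance, an easy-task episode where the patient survives (`{"survival": 1.0}`) but nothing else is achieved would score 0.4.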

Baseline Scores

Baseline LLM agent (Nemotron 3 Super via NVIDIA API, seed=42):

Task          Baseline Score  Notes
mild_sepsis   ~0.62           LLM correctly identifies broad-spectrum; moderate speed
septic_shock  ~0.44           Often misses narrow-spectrum; vasopressors applied correctly
severe_mods   ~0.31           Sequencing rarely optimal; resistance accumulates

Random agent (action sampled uniformly):

Task          Random Score
mild_sepsis   ~0.35
septic_shock  ~0.18
severe_mods   ~0.08
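
A random agent of the kind described here only needs a seeded generator over the nine action IDs. The sketch below is an assumption about how such a baseline could be written (`make_random_policy` is a hypothetical name); it does not reproduce the scores in the table.

```python
import random

N_ACTIONS = 9  # size of the discrete action space


def make_random_policy(seed: int = 42):
    """Return a policy that ignores the observation and samples uniformly."""
    rng = random.Random(seed)  # private generator, so runs are reproducible

    def policy(_observation=None) -> int:
        return rng.randrange(N_ACTIONS)

    return policy
```

Two policies built with the same seed produce the same action sequence, which keeps random-baseline runs comparable across tasks.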

Setup & Usage

Quick Start (Docker)

# Build
docker build -t sepsispilot .

# Run
docker run -p 7860:7860 sepsispilot

# Test
curl http://localhost:7860/health

Local Development

# Install dependencies
pip install -r requirements.txt

# Start server
uvicorn app:app --host 0.0.0.0 --port 7860 --reload

# Open dashboard (macOS; on Linux use xdg-open, or paste the URL into a browser)
open http://localhost:7860

Running the Baseline Agent

export OPENAI_API_KEY="your-api-key"
export API_BASE_URL="https://integrate.api.nvidia.com/v1"
export MODEL_NAME="nvidia/llama-3.1-nemotron-70b-instruct"
export ENV_BASE_URL="http://localhost:7860"

python inference.py
# Runs all 3 tasks, 1 episode each (seed=42)

python inference.py --episodes 3 --seed 42
# 3 episodes per task

python inference.py --task mild_sepsis
# Single task only

Running Tests

pip install pytest
pytest tests/ -v

Pre-Submission Validation

# With server running:
python validate.py --url http://localhost:7860

API Reference

Method  Endpoint  Description
POST    /reset    Start new episode {"task": "mild_sepsis", "seed": 42}
POST    /step     Take action {"action": 5}
GET     /state    Current patient state
GET     /grade    Score completed episode (0.0–1.0)
GET     /tasks    List all tasks
GET     /health   Server health check
GET     /         Interactive visual dashboard

Interactive API docs: http://localhost:7860/docs
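
A minimal standard-library client for the endpoints above might look like the following. It assumes the JSON shown next to /reset and /step is sent as the request body, and that the /step response carries a boolean `done` field; the response schema is not documented here, so treat that field name as hypothetical and check http://localhost:7860/docs for the actual shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"


def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Build a request, JSON-encoding the payload when one is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def call(method: str, path: str, payload=None):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(method, path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_fixed_policy(task: str = "mild_sepsis", action: int = 1, max_steps: int = 24):
    """Reset, repeat one action until the episode ends, then fetch the grade."""
    call("POST", "/reset", {"task": task, "seed": 42})
    for _ in range(max_steps):
        result = call("POST", "/step", {"action": action})
        if isinstance(result, dict) and result.get("done"):  # assumed field name
            break
    return call("GET", "/grade")
```

With the server running, `run_fixed_policy()` plays one mild_sepsis episode using broad-spectrum antibiotics at every step and returns whatever /grade responds with.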


Project Structure

sepsispilot/
├── openenv.yaml          ← OpenEnv spec config
├── Dockerfile            ← HF Spaces container
├── requirements.txt
├── README.md
├── inference.py          ← Baseline LLM agent (mandatory)
├── validate.py           ← Pre-submission validation
├── app.py                ← FastAPI HTTP server + dashboard
├── environment/
│   ├── __init__.py
│   ├── models.py         ← Pydantic typed models
│   ├── patient_sim.py    ← Physiology simulation engine
│   ├── graders.py        ← Episode scoring (0.0–1.0)
│   └── env.py            ← OpenEnv class (reset/step/state/grade)
└── tests/
    └── test_env.py       ← Unit tests (pytest)

Environment Variables

Variable        Required  Default                                 Description
OPENAI_API_KEY  ✅        -                                       API key for LLM endpoint
API_BASE_URL    ✅        https://integrate.api.nvidia.com/v1     LLM endpoint
MODEL_NAME      ✅        nvidia/llama-3.1-nemotron-70b-instruct  Model name
HF_TOKEN        ✅        -                                       Hugging Face token
ENV_BASE_URL    ❌        http://localhost:7860                   Environment server URL
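
As a sketch of how a script could consume these variables, the snippet below fails fast on the two variables that have no default and falls back to the listed defaults for the rest. `Settings` and `load_settings` are hypothetical names; `inference.py` may read its configuration differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    hf_token: str
    api_base_url: str
    model_name: str
    env_base_url: str


def load_settings(env=None) -> Settings:
    """Read configuration from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Variables without a default in the table must be present.
    missing = [k for k in ("OPENAI_API_KEY", "HF_TOKEN") if not env.get(k)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        api_key=env["OPENAI_API_KEY"],
        hf_token=env["HF_TOKEN"],
        api_base_url=env.get("API_BASE_URL", "https://integrate.api.nvidia.com/v1"),
        model_name=env.get("MODEL_NAME", "nvidia/llama-3.1-nemotron-70b-instruct"),
        env_base_url=env.get("ENV_BASE_URL", "http://localhost:7860"),
    )
```

Passing a plain dict instead of relying on `os.environ` makes the loader easy to exercise in tests.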

Design Decisions

Why sepsis? It's a genuine, high-stakes clinical problem where optimal treatment sequencing has enormous impact. The physiology is well-understood but the decision-making is hard, which makes it a good fit for RL.

Why synthetic simulation instead of direct MIMIC-IV replay? MIMIC-IV access requires credentialing. The simulation is calibrated to match MIMIC-IV population statistics, making it accessible while remaining medically realistic. An RL agent trained here can be evaluated against real MIMIC-IV data in future work.

Why dense rewards? Sepsis treatment spans 24–72 hours. Episode-end-only rewards create too sparse a signal for meaningful learning. Per-step vital sign improvements provide rich learning signal throughout.


Built with ❤️ for the Meta PyTorch OpenEnv Hackathon 2026
