πŸ₯ Clinical Records Processor β€” OpenEnv

An AI agent environment for processing real-world clinical records β€” a genuine daily operational challenge in healthcare. Agents must extract, structure, and reconcile medical data from unstructured clinical notes.



Why this environment?

Every clinic and hospital deals with unstructured clinical notes. Converting these to structured data enables:

  • EMR/EHR integration (ABDM/FHIR compatibility)
  • Clinical decision support
  • Insurance pre-authorization
  • Population health analytics

This environment simulates three core tasks that healthcare AI systems must perform daily.


Tasks

Task 1: Demographics Extraction (Easy)

Objective: Extract structured patient demographics from a free-text clinical note.

Fields to extract: name, age, gender, chief_complaint, duration, pain_score, blood_pressure, known_allergies

Difficulty: Easy β€” the note is well-structured with clear field values; the agent must locate and normalize each field.

Expected baseline score: ~0.75–0.85
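For reference, a correct Task 1 submission could look like the sketch below. The field names come from the schema above; the patient values are invented for illustration, and the exact value types (e.g. whether known_allergies is a list or a string) are an assumption:

```python
import json

# Illustrative demographics extraction matching the Task 1 field list.
# All patient values are invented; value types are assumptions.
demographics = {
    "name": "Jane Doe",
    "age": 42,
    "gender": "female",
    "chief_complaint": "chest pain",
    "duration": "3 days",
    "pain_score": 6,
    "blood_pressure": "138/86",
    "known_allergies": ["penicillin"],
}

# The action's "extraction" field must be a JSON string, not a dict.
action = {"extraction": json.dumps(demographics), "done": True}
```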


Task 2: SOAP Note Structuring (Medium)

Objective: Parse a narrative clinical encounter into the standard SOAP format (Subjective, Objective, Assessment, Plan).

The note intentionally mixes content from all sections, and the agent must correctly classify each piece of information.

Difficulty: Medium β€” requires understanding of clinical terminology to avoid cross-contamination between sections. Grader penalizes content placed in wrong sections.

Expected baseline score: ~0.55–0.70
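A plausible Task 2 submission, sketched below. The four top-level keys follow the Subjective/Objective/Assessment/Plan format named in the objective; whether each section is a single string or a list of items is an assumption, and the clinical content is invented:

```python
import json

# Illustrative SOAP structuring. Keys follow the SOAP acronym;
# section value types and all clinical content are assumptions.
soap = {
    "subjective": "Patient reports worsening cough for two weeks.",
    "objective": "Temp 38.1 C, RR 22, crackles in right lower lobe.",
    "assessment": "Likely community-acquired pneumonia.",
    "plan": "Start antibiotics; order chest X-ray; follow up in 3 days.",
}
action = {"extraction": json.dumps(soap), "done": True}
```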


Task 3: Medication Reconciliation (Hard)

Objective: Reconcile a complex 3-encounter medication history. Identify current medications (with correct doses), discontinued medications, dose changes, drug interactions, and patient non-adherence events.

Difficulty: Hard β€” involves tracking changes across multiple encounters, recognizing drug interactions (Warfarin+Ibuprofen, Glipizide+Ibuprofen), and detecting patient self-discontinuation.

Expected baseline score: ~0.35–0.55
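A Task 3 submission might be shaped like the sketch below. The categories mirror the objective (current medications, discontinued medications, dose changes, interactions, non-adherence) and the interaction pairs come from the difficulty note above, but the exact field names and value shapes are assumptions:

```python
import json

# Illustrative reconciliation output. Category names mirror the task
# objective; exact field names, value shapes, and doses are assumptions.
reconciliation = {
    "current_medications": [{"name": "Metformin", "dose": "1000 mg BID"}],
    "discontinued_medications": ["Ibuprofen"],
    "dose_changes": [{"name": "Warfarin", "from": "5 mg", "to": "2.5 mg"}],
    "drug_interactions": [["Warfarin", "Ibuprofen"], ["Glipizide", "Ibuprofen"]],
    "non_adherence": ["Patient self-discontinued a medication between encounters"],
}
action = {"extraction": json.dumps(reconciliation), "done": True}
```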


Action & Observation Spaces

Observation

{
  "task_id": "string",
  "task_description": "string β€” full extraction schema and instructions",
  "clinical_note": "string β€” the raw clinical note",
  "attempt_number": "integer β€” current step number",
  "last_feedback": "string | null β€” grader feedback from previous step",
  "max_attempts": "integer β€” max steps for this task"
}

Action

{
  "extraction": "string β€” valid JSON matching the task schema",
  "done": "boolean β€” set true to end episode early"
}

Reward Function

Rewards are non-sparse and shaped to encourage iterative improvement:

  • Base reward: Grader score (0.0–1.0) on the submitted extraction
  • Improvement bonus: +0.05 when agent improves on its best score
  • Stagnation penalty: –0.02 per step (after step 2) when agent fails to improve
  • Final score: Best raw grader score achieved across all steps
  • All rewards clamped to [0.0, 1.0]

Each grader uses weighted field scoring with partial credit β€” agents receive signal for each correctly extracted field even if the overall answer is incomplete.
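The shaping rules above can be sketched as a single function (a minimal reimplementation for illustration; the environment's actual logic lives in app/env.py and may differ in details):

```python
def shaped_reward(grader_score: float, best_so_far: float, step: int) -> float:
    """Apply the shaping rules: base grader score, improvement bonus,
    stagnation penalty after step 2, and a final clamp to [0.0, 1.0]."""
    reward = grader_score
    if grader_score > best_so_far:
        reward += 0.05   # improvement bonus: beat the previous best
    elif step > 2:
        reward -= 0.02   # stagnation penalty: no improvement after step 2
    return max(0.0, min(1.0, reward))
```

For example, improving from a best of 0.5 to a grader score of 0.6 yields 0.65, while repeating 0.5 on step 3 yields 0.48.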


API Endpoints

Method  Path      Description
------  --------  --------------------------------------------------------------
POST    /reset    Start a new episode. Body: {"task_id": "..."}
POST    /step     Submit extraction. Body: {"extraction": "...", "done": false}
GET     /state    Get current environment state
GET     /health   Health check
GET     /         Metadata
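A minimal client interaction against these endpoints, assuming the server from the setup below is running on localhost:7860 (a standard-library sketch, not an official client):

```python
import json
from urllib import request

BASE = "http://localhost:7860"  # assumes a locally running environment server

def post_json(path: str, body: dict) -> dict:
    """POST a JSON body to the environment server and decode the JSON reply."""
    req = request.Request(
        f"{BASE}{path}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

def make_action(extraction: dict, done: bool = False) -> dict:
    """Build a /step request body; the extraction must be a JSON string."""
    return {"extraction": json.dumps(extraction), "done": done}

# Typical episode (requires a running server, so left commented out):
# obs = post_json("/reset", {"task_id": "demographics_extraction"})
# result = post_json("/step", make_action({"name": "..."}, done=True))
```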

Setup & Usage

Docker

docker build -t clinical-records-processor .
docker run -p 7860:7860 clinical-records-processor

Local development

pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload

Run baseline inference

export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your-api-key"
export ENV_URL="http://localhost:7860"

python inference.py

Validate submission

openenv validate
./validate-submission.sh https://your-space.hf.space

Baseline Scores (gpt-4o-mini)

Task                       Score   Notes
-------------------------  ------  ----------------------------------------
demographics_extraction    ~0.80   Strong on structured fields
soap_structuring           ~0.62   Occasional section contamination
medication_reconciliation  ~0.45   Struggles with multi-encounter tracking
Average                    ~0.62

Environment Variables (required for inference)

Variable      Description
------------  --------------------------------
API_BASE_URL  LLM API endpoint
MODEL_NAME    Model identifier
HF_TOKEN      API key (Hugging Face or OpenAI)

Project Structure

healthx-openenv/
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ openenv.yaml
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ inference.py          # Baseline agent script
β”œβ”€β”€ README.md
└── app/
    β”œβ”€β”€ main.py           # FastAPI server
    β”œβ”€β”€ models.py         # Pydantic models (OpenEnv spec)
    β”œβ”€β”€ env.py            # Environment logic + reward shaping
    └── tasks/
        β”œβ”€β”€ task1_demographics.py   # Easy task + grader
        β”œβ”€β”€ task2_soap.py           # Medium task + grader
        └── task3_medication.py     # Hard task + grader

Relevance to ABDM / Indian Healthtech

This environment directly addresses challenges in India's ABDM ecosystem:

  • PHR integration requires structured patient demographics
  • FHIR R4 compliance requires SOAP-structured notes
  • Medication management is core to HIP/HIU workflows

Agents trained on this environment can accelerate EMR integration across the 500K+ clinics in India's digital health network.
