🏥 Clinical Records Processor - OpenEnv
An AI agent environment for processing real-world clinical records, a genuine daily operational challenge in healthcare. Agents must extract, structure, and reconcile medical data from unstructured clinical notes.
Why this environment?
Every clinic and hospital deals with unstructured clinical notes. Converting these to structured data enables:
- EMR/EHR integration (ABDM/FHIR compatibility)
- Clinical decision support
- Insurance pre-authorization
- Population health analytics
This environment simulates three core tasks that healthcare AI systems must perform daily.
Tasks
Task 1: Demographics Extraction (Easy)
Objective: Extract structured patient demographics from a free-text clinical note.
Fields to extract: name, age, gender, chief_complaint, duration, pain_score, blood_pressure, known_allergies
Difficulty: Easy. The note is well-structured with clear field values; the agent must locate and normalize each field.
Expected baseline score: ~0.75-0.85
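As a rough illustration of "locate and normalize", the sketch below serializes the eight Task 1 fields and normalizes one of them. The field names come from the list above; the helper names, the blood-pressure text format, and the choice of `None` for missing values are assumptions, and the authoritative schema is whatever `task_description` returns.

```python
import json
import re
from typing import Optional

# Field names taken from the Task 1 description in this README.
FIELDS = ["name", "age", "gender", "chief_complaint", "duration",
          "pain_score", "blood_pressure", "known_allergies"]


def normalize_blood_pressure(raw: str) -> Optional[str]:
    """Return 'systolic/diastolic' from text such as 'BP 140 / 90 mmHg'.

    Hypothetical helper: the real note format and target format may differ.
    """
    match = re.search(r"(\d{2,3})\s*/\s*(\d{2,3})", raw)
    return f"{match.group(1)}/{match.group(2)}" if match else None


def build_extraction(values: dict) -> str:
    """Serialize only the expected fields; missing ones become null (assumed)."""
    return json.dumps({field: values.get(field) for field in FIELDS})
```

Restricting the output to the known field list keeps stray keys out of the submission, which may or may not matter to the actual grader.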
Task 2: SOAP Note Structuring (Medium)
Objective: Parse a narrative clinical encounter into the standard SOAP format (Subjective, Objective, Assessment, Plan).
The note intentionally mixes content from all sections, and the agent must correctly classify each piece of information.
Difficulty: Medium. Requires understanding of clinical terminology to avoid cross-contamination between sections. The grader penalizes content placed in the wrong section.
Expected baseline score: ~0.55-0.70
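Before submitting a Task 2 extraction, an agent could run a cheap completeness check like the one sketched here. The lowercase key names are an assumption based on the SOAP acronym; the real schema is given in `task_description`, and this check says nothing about whether content sits in the correct section.

```python
import json
from typing import List

# Assumed key names; the environment's schema may use different ones.
SOAP_SECTIONS = ("subjective", "objective", "assessment", "plan")


def missing_soap_sections(extraction: str) -> List[str]:
    """Return the SOAP sections that are absent or empty in a JSON extraction."""
    data = json.loads(extraction)
    if not isinstance(data, dict):
        return list(SOAP_SECTIONS)
    return [section for section in SOAP_SECTIONS if not data.get(section)]
```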
Task 3: Medication Reconciliation (Hard)
Objective: Reconcile a complex 3-encounter medication history. Identify current medications (with correct doses), discontinued medications, dose changes, drug interactions, and patient non-adherence events.
Difficulty: Hard. Involves tracking changes across multiple encounters, recognizing drug interactions (Warfarin + Ibuprofen, Glipizide + Ibuprofen), and detecting patient self-discontinuation.
Expected baseline score: ~0.35-0.55
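The interaction-detection part of Task 3 can be pictured as a pairwise lookup over the current medication list. This is a minimal sketch that knows only the two pairs named above; `KNOWN_INTERACTIONS` and `find_interactions` are hypothetical names, and an LLM agent would more likely reason over the note text than consult a fixed table.

```python
from itertools import combinations

# Only the two interacting pairs named in this README; not a clinical reference.
KNOWN_INTERACTIONS = {
    frozenset({"warfarin", "ibuprofen"}),
    frozenset({"glipizide", "ibuprofen"}),
}


def find_interactions(medications):
    """Return alphabetically ordered (drug_a, drug_b) pairs that interact."""
    names = sorted({name.strip().lower() for name in medications})
    return [pair for pair in combinations(names, 2)
            if frozenset(pair) in KNOWN_INTERACTIONS]
```

Using `frozenset` makes the lookup independent of the order in which the two drugs are listed.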
Action & Observation Spaces
Observation
{
"task_id": "string",
"task_description": "string - full extraction schema and instructions",
"clinical_note": "string - the raw clinical note",
"attempt_number": "integer - current step number",
"last_feedback": "string | null - grader feedback from previous step",
"max_attempts": "integer - max steps for this task"
}
Action
{
"extraction": "string - valid JSON matching the task schema",
"done": "boolean - set true to end episode early"
}
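Note that `extraction` is a string containing JSON, not a nested object, so the payload has to be serialized before it is placed in the action. A minimal sketch, with `make_action` as a hypothetical helper name:

```python
import json


def make_action(extraction: dict, done: bool = False) -> dict:
    """Build an action dict; the extraction is embedded as a JSON string."""
    return {"extraction": json.dumps(extraction), "done": done}
```

When the whole action is later sent as a request body, the extraction ends up JSON-encoded twice, which is what the string-typed field implies.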
Reward Function
Rewards are non-sparse and shaped to encourage iterative improvement:
- Base reward: Grader score (0.0-1.0) on the submitted extraction
- Improvement bonus: +0.05 when the agent improves on its best score
- Stagnation penalty: -0.02 per step (after step 2) when the agent fails to improve
- Final score: Best raw grader score achieved across all steps
- All rewards clamped to [0.0, 1.0]
Each grader uses weighted field scoring with partial credit, so agents receive signal for each correctly extracted field even if the overall answer is incomplete.
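One way to read the rules above is sketched below. It is an interpretation, not the code in `app/env.py`: it assumes the best score starts at 0.0, that the bonus and the penalty are mutually exclusive, and that the penalty is a flat 0.02 on each non-improving step after step 2. The exact-match field scorer and its weights are likewise hypothetical; the real graders may use fuzzier matching.

```python
def weighted_field_score(predicted: dict, expected: dict, weights: dict) -> float:
    """Partial credit: sum the weights of fields whose values match exactly."""
    total = sum(weights.values())
    earned = sum(weight for field, weight in weights.items()
                 if field in predicted and predicted[field] == expected.get(field))
    return earned / total if total else 0.0


def shaped_reward(score: float, best_so_far: float, step: int):
    """Return (reward, new_best) for one step; steps are numbered from 1."""
    reward = score
    if score > best_so_far:
        reward += 0.05          # improvement bonus
    elif step > 2:
        reward -= 0.02          # stagnation penalty after step 2
    reward = min(1.0, max(0.0, reward))  # clamp to [0.0, 1.0]
    return reward, max(best_so_far, score)
```

Under this reading the final episode score is simply the last `new_best` value, which is unaffected by bonuses or penalties.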
API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /reset | Start a new episode. Body: {"task_id": "..."} |
| POST | /step | Submit extraction. Body: {"extraction": "...", "done": false} |
| GET | /state | Get current environment state |
| GET | /health | Health check |
| GET | / | Metadata |
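The endpoints above can be exercised with the standard library alone. The sketch below assumes the server is reachable at `http://localhost:7860` (the port used in the Docker and uvicorn commands in this README) and that responses are JSON; the response fields are not documented here, so nothing is assumed about their shape. `build_request` and `call` are hypothetical helper names.

```python
import json
import urllib.request

ENV_URL = "http://localhost:7860"  # assumed local server address


def build_request(path, body=None):
    """Create a GET request, or a JSON POST request when a body is given."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(ENV_URL + path, data=data,
                                  headers=headers, method=method)


def call(path, body=None):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(path, body)) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical episode would start with `call("/reset", {"task_id": "demographics_extraction"})` and continue with `call("/step", {"extraction": "...", "done": False})` until the attempts run out or `done` is set.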
Setup & Usage
Docker
docker build -t clinical-records-processor .
docker run -p 7860:7860 clinical-records-processor
Local development
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload
Run baseline inference
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your-api-key"
export ENV_URL="http://localhost:7860"
python inference.py
Validate submission
openenv validate
./validate-submission.sh https://your-space.hf.space
Baseline Scores (gpt-4o-mini)
| Task | Score | Notes |
|---|---|---|
| demographics_extraction | ~0.80 | Strong on structured fields |
| soap_structuring | ~0.62 | Occasional section contamination |
| medication_reconciliation | ~0.45 | Struggles with multi-encounter tracking |
| Average | ~0.62 | |
Environment Variables (required for inference)
| Variable | Description |
|---|---|
| API_BASE_URL | LLM API endpoint |
| MODEL_NAME | Model identifier |
| HF_TOKEN | API key (Hugging Face or OpenAI) |
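A script that depends on these variables can fail fast when one is missing. This is a minimal sketch, not the logic in `inference.py`; treating `ENV_URL` as optional with a `http://localhost:7860` default is an assumption based on the export example in this README.

```python
import os

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_config(environ=None):
    """Collect the required variables, raising if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    config = {name: environ[name] for name in REQUIRED_VARS}
    # Assumed default; the README only shows this value in an export example.
    config["ENV_URL"] = environ.get("ENV_URL", "http://localhost:7860")
    return config
```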
Project Structure
healthx-openenv/
├── Dockerfile
├── openenv.yaml
├── requirements.txt
├── inference.py                  # Baseline agent script
├── README.md
└── app/
    ├── main.py                   # FastAPI server
    ├── models.py                 # Pydantic models (OpenEnv spec)
    ├── env.py                    # Environment logic + reward shaping
    └── tasks/
        ├── task1_demographics.py # Easy task + grader
        ├── task2_soap.py         # Medium task + grader
        └── task3_medication.py   # Hard task + grader
Relevance to ABDM / Indian Healthtech
This environment directly addresses challenges in India's ABDM ecosystem:
- PHR integration requires structured patient demographics
- FHIR R4 clinical documentation workflows benefit from SOAP-structured notes
- Medication management is core to HIP/HIU workflows
Agents trained on this environment can accelerate EMR integration across the 500K+ clinics in India's digital health network.