---
title: Clinical Note Scribe
emoji: 🏥
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---

# Clinical Note Scribe

An OpenEnv-compliant environment for evaluating AI agents on clinical SOAP-note generation from doctor–patient transcripts.

Built for the Meta × Hugging Face OpenEnv Hackathon.

## Overview

Medical documentation is one of the most time-consuming parts of a doctor's day. After every patient visit, clinicians spend significant time converting spoken conversations into structured clinical notes – time that could be spent on patient care instead. Clinical Note Scribe is a reinforcement learning training environment where an AI agent learns to do exactly that: listen to a doctor–patient conversation and produce a well-structured, accurate, and safe clinical note in SOAP format.

## Interesting Part

A highlight of this project is its frontend, which shows in real time the reward assigned to each submission.

## Three Levels of Difficulty

Tasks range from a routine check-up all the way to a chaotic ER visit with overlapping symptoms and urgent orders, each with its own grader and reward logic.


## Environment Description

A doctor–patient conversation is recorded as a text transcript. The agent's goal is to read the transcript along with structured patient context (demographics, medications, labs) and produce a clinically accurate, concise SOAP note (Subjective, Objective, Assessment, Plan).

The agent interacts through a standard reset() / step() / state() API. Three action types are available: submit a full note, request clarification, or revise a single section. A multi-signal reward function scores each submission on clinical accuracy, conciseness, safe language, and structural validity, with penalties for excessive steps or invalid actions.
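
A minimal client loop over that API might look like the following sketch. Here `post` stands in for whatever HTTP transport you use (e.g. a thin wrapper around an HTTP client's POST call) and `build_action` is a placeholder for the agent's policy; both names are illustrative, not part of the environment:

```python
from typing import Any, Callable, Dict


def run_episode(post: Callable[[str, Dict[str, Any]], Dict[str, Any]],
                build_action: Callable[[Dict[str, Any]], Dict[str, Any]],
                max_steps: int = 10) -> float:
    """Reset the environment, step until done, and return the final reward."""
    obs = post("/reset", {})          # new episode -> initial Observation
    reward = 0.0
    for _ in range(max_steps):
        action = build_action(obs)    # e.g. {"action_type": "submit_note", ...}
        result = post("/step", action)
        obs = result["observation"]
        reward = result["reward"]
        if result["done"]:
            break
    return reward
```

Because the transport is injected, the same loop works against a local server, a Docker container, or a test stub.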


## Observation Space

| Field | Type | Description |
|---|---|---|
| `transcript` | `str` | Full doctor–patient transcript for the current task |
| `task_id` | `str` | Unique identifier for the active task |
| `patient_context` | `dict[str, Any]` | Structured patient demographics, conditions, medications, allergies, and labs |
| `current_draft` | `Optional[str]` | The agent's most recent SOAP-note draft (`null` until first submission or revision) |
| `errors_so_far` | `list[str]` | Accumulated error/feedback messages from prior invalid actions |
| `step_count` | `int` | Number of steps taken so far in the current episode (0 at reset) |
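
For illustration, an observation as it might appear right after `reset()`. The field names follow the table above; every value here is invented:

```python
# Illustrative only: field names match the Observation table, values are made up.
initial_observation = {
    "transcript": "Doctor: What brings you in today?\nPatient: I've had a cold for three days.",
    "task_id": "easy_routine_checkup",
    "patient_context": {"age": 34, "sex": "F", "medications": [], "allergies": []},
    "current_draft": None,    # no draft exists until the first submission/revision
    "errors_so_far": [],
    "step_count": 0,
}
```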

## Action Space

| Field | Type | Description |
|---|---|---|
| `action_type` | `Literal["submit_note", "request_clarify", "revise_section"]` | Required. The kind of action the agent is taking |
| `soap_note` | `Optional[SOAPNote]` | Complete SOAP note; required when `action_type == "submit_note"` |
| `section` | `Optional[Literal["S", "O", "A", "P"]]` | Which SOAP section to revise; required when `action_type == "revise_section"` |
| `revision_text` | `Optional[str]` | Replacement text for the section; required when `action_type == "revise_section"` |
| `clarify_question` | `Optional[str]` | Free-text question; required when `action_type == "request_clarify"` |

## SOAPNote Schema

| Field | Type | Description |
|---|---|---|
| `subjective` | `str` | Patient's self-reported symptoms, history, and concerns |
| `objective` | `str` | Clinician's measurable findings: vitals, exam, labs, imaging |
| `assessment` | `str` | Differential diagnoses and clinical reasoning |
| `plan` | `str` | Treatment plan, medications, follow-ups, referrals |
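
Putting the two schemas together, a `submit_note` action could look like the dictionary below. The small `missing_fields` helper sketches the conditional-requirement rules from the Action Space table; it is an illustration, not the environment's own validator, and the note text is invented:

```python
from typing import Any, Dict, List

# An example submit_note action (contents invented for illustration).
submit_action: Dict[str, Any] = {
    "action_type": "submit_note",
    "soap_note": {
        "subjective": "34-year-old female reports nasal congestion and sore throat for 3 days.",
        "objective": "BP 118/76, afebrile, oropharynx mildly erythematous.",
        "assessment": "Likely viral upper respiratory infection; blood pressure within normal limits.",
        "plan": "Supportive care, fluids, rest; return if symptoms worsen.",
    },
}

# Which optional fields become required for each action_type.
REQUIRED_BY_TYPE = {
    "submit_note": ["soap_note"],
    "revise_section": ["section", "revision_text"],
    "request_clarify": ["clarify_question"],
}


def missing_fields(action: Dict[str, Any]) -> List[str]:
    """Return the conditionally required fields absent from an action dict."""
    required = REQUIRED_BY_TYPE.get(action.get("action_type"), [])
    return [field for field in required if not action.get(field)]
```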

## Tasks

### 🟢 Easy – Routine Check-Up

**Task ID:** `easy_routine_checkup` · **Max steps:** 5

A 6-turn dialogue about a common cold and blood pressure screening for a 34-year-old female. Straightforward clinical picture with no complications.

### 🟡 Medium – Chronic Disease Follow-Up

**Task ID:** `medium_chronic_disease_followup` · **Max steps:** 8

A 14-turn follow-up visit for a 58-year-old male with Type 2 Diabetes and Hypertension. Includes HbA1c lab review (7.2% → 7.8%), medication adjustments (adding glipizide 5 mg, uptitrating lisinopril 20 → 40 mg), a 2-week statin gap, and dietary counselling around restaurant meals.

### 🔴 Hard – Complex ER Visit

**Task ID:** `hard_complex_er_visit` · **Max steps:** 10

A rapid 20-turn emergency-room encounter for a 72-year-old female with CAD, AFib, and CKD Stage 3. Overlapping chest pain and shortness of breath with a dual ACS vs PE differential. Includes a patient self-contradiction (denied then admitted nitroglycerin use at home), contrast dye allergy complicating CT-PA workup (V/Q scan ordered instead), elevated D-dimer (1840 ng/mL), and Cardiac ICU admission.


## Reward Function

```
value = clamp(weighted_sum − deductions, 0.0, 1.0)
```

| Signal | Weight | Criteria |
|---|---|---|
| `grader_score` | × 0.60 | Clinical accuracy from task-specific grader |
| `conciseness_bonus` | × 0.10 | 1.0 if total SOAP note ≤ 400 words |
| `safe_language_score` | × 0.15 | 1.0 if no unsafe-certainty phrases detected |
| `format_valid` | × 0.15 | 1.0 if all four SOAP fields are non-empty |

| Deduction | Rate | Trigger |
|---|---|---|
| Step penalty | −0.05 | Per step beyond 3 (penalises excessive clarification) |
| Error penalty | −0.10 | Per invalid action in `errors_so_far` |
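
As a back-of-the-envelope sketch, the tables above can be re-expressed as a single function. This assumes exactly the weights and rates listed; the authoritative logic lives in `environment/reward.py` and may differ in detail:

```python
def compute_reward(grader_score: float,
                   note_word_count: int,
                   safe_language_score: float,
                   format_valid: bool,
                   step_count: int,
                   error_count: int) -> float:
    """Sketch of the multi-signal reward: weighted sum minus deductions, clamped to [0, 1]."""
    conciseness_bonus = 1.0 if note_word_count <= 400 else 0.0
    weighted_sum = (0.60 * grader_score
                    + 0.10 * conciseness_bonus
                    + 0.15 * safe_language_score
                    + 0.15 * (1.0 if format_valid else 0.0))
    deductions = 0.05 * max(0, step_count - 3) + 0.10 * error_count
    return max(0.0, min(1.0, weighted_sum - deductions))
```

For example, a perfect, concise note submitted on the first step scores 1.0, while the same note at 500 words after 5 steps with one invalid action loses the conciseness bonus plus 0.20 in penalties.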

## Installation

### Prerequisites

- Python 3.11+
- An OpenAI-compatible API key (set as `HF_TOKEN`)

### Local Setup

```bash
# Clone the repository
git clone https://github.com/<your-org>/meta-huggingface-hackathon-team-silver-orca.git
cd meta-huggingface-hackathon-team-silver-orca

# Create a virtual environment (optional but recommended)
python -m venv venv
# On Linux/macOS:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Docker Setup

```bash
docker build -t meta-huggingface-hackathon-team-silver-orca .
```

## Usage

### 1. Start the Environment Server

The environment runs as a REST API. Start the server first before running the agent.

#### Using Python

```bash
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

#### Using Docker

```bash
docker run -p 7860:7860 meta-huggingface-hackathon-team-silver-orca
```

### 2. Run the Agent (Inference)

In another terminal, run the baseline inference script which will interact with the running environment:

**On Linux/macOS:**

```bash
export HF_TOKEN="sk-..."
export MODEL_NAME="gpt-4o-mini"           # or any OpenAI-compatible model
export API_BASE_URL="https://api.openai.com/v1"
python inference.py
```

**On Windows (PowerShell):**

```powershell
$env:HF_TOKEN="sk-..."
$env:MODEL_NAME="gpt-4o-mini"
$env:API_BASE_URL="https://api.openai.com/v1"
python inference.py
```

## API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness probe → `{"status": "ok"}` |
| POST | `/reset` | Start a new episode → `Observation` |
| POST | `/step` | Submit an action → `{observation, reward, done, info}` |
| GET | `/state` | Inspect environment state → `EnvironmentState` |

## Baseline Scores

Scores obtained using `gpt-4o-mini` with `temperature=0.2` via `inference.py`:

| Task | Difficulty | Score |
|---|---|---|
| `easy_routine_checkup` | 🟢 Easy | 0.8520 |
| `medium_chronic_disease_followup` | 🟡 Medium | 0.7450 |
| `hard_complex_er_visit` | 🔴 Hard | 0.5110 |
| **Average** | | **0.7026** |

Note: These baseline scores use dynamic clinical graders that check for explicit diagnoses and strict formatting. Scores will naturally vary depending on the specific LLM used.


## Structured Logging

Every episode emits JSON log lines to stdout, scraped by the OpenEnv validator:

```json
{"event": "START", "task_id": "easy_routine_checkup", "timestamp": 1700000000.0}
{"event": "STEP",  "step": 1, "action_type": "submit_note", "reward": 0.82}
{"event": "END",   "task_id": "easy_routine_checkup", "final_score": 0.82}
```
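
One way such lines can be produced is a small helper like the sketch below; `log_event` is illustrative, not the environment's actual logger, though the event names and keys mirror the examples above:

```python
import json
import sys


def log_event(event: str, **fields) -> str:
    """Serialise one structured log record as a JSON line and write it to stdout."""
    record = {"event": event, **fields}
    line = json.dumps(record)
    sys.stdout.write(line + "\n")  # one JSON object per line, easy to scrape
    return line
```

Keeping one JSON object per line means a validator can recover every event with a plain line-by-line `json.loads`.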

## Project Structure

```
meta-huggingface-hackathon-team-silver-orca/
├── openenv.yaml              ← OpenEnv spec metadata + graders
├── inference.py              ← Baseline inference (OpenAI client, all 3 tasks)
├── Dockerfile                ← Containerised server (port 7860)
├── README.md                 ← This file
├── requirements.txt
│
├── environment/
│   ├── __init__.py
│   ├── models.py             ← Pydantic v2 models (Observation, Action, Reward, …)
│   ├── env.py                ← ClinicalNoteScribeEnv (reset/step/state)
│   ├── reward.py             ← Multi-signal reward function
│   └── tasks/
│       ├── __init__.py       ← Task & grader registries
│       ├── task_easy.py      ← Routine check-up + grader stub
│       ├── task_medium.py    ← Chronic disease follow-up + grader stub
│       └── task_hard.py      ← Complex ER visit + grader stub
│
├── server/
│   ├── __init__.py
│   ├── app.py                ← FastAPI application
│   └── routes.py             ← API route definitions
│
└── data/
    ├── transcripts/
    │   ├── easy.txt           ← 6-turn routine check-up transcript
    │   ├── medium.txt         ← 14-turn chronic disease follow-up transcript
    │   └── hard.txt           ← 20-turn complex ER visit transcript
    └── clarify_answers.json   ← Clarification Q&A lookup (10 entries)
```

## License

MIT