---
license: mit
language:
- en
base_model:
- Qwen/Qwen3.6-27B
- Qwen/Qwen3.6-35B-A3B
pipeline_tag: image-to-text
tags:
- medical
---
MediAgent
Autonomous Multi-Agent Medical Imaging Analysis System
Five specialized AI agents. One radiological verdict. Running entirely on AMD.
AMD Developer Hackathon 2026 · Track: Vision & Multimodal AI
Built by Ramyar, security researcher and full-stack developer, Sulaymaniyah, Iraq
What Is MediAgent?
MediAgent is a production-grade autonomous AI system that analyzes medical images (X-rays, MRI scans, and CT scans) through a five-agent pipeline and generates structured, peer-reviewed clinical radiology reports in real time.
Upload an image. Watch five AI agents execute live. Get a formal radiology report with differential diagnoses, ICD-10 codes, a quality score, and a FHIR R4 export ready for any EMR system.
No cloud APIs. No OpenAI. No Nvidia. Pure AMD MI300X inference. Local. Private. Fast.
The Pipeline
+-----------------------------------------------------------+
|                       IMAGE UPLOAD                        |
|           PNG / JPG / DICOM (.dcm), up to 20 MB           |
+-----------------------------+-----------------------------+
                              |
              +---------------+---------------+
              |        PARALLEL STAGE         |
              v                               v
    +-------------------+           +-------------------+
    |   INTAKE AGENT    |           |   VISION AGENT    |
    |                   |           |                   |
    | • Validates       |           | • Multimodal      |
    |   image payload   |           |   Qwen analysis   |
    | • Normalizes      |           | • Anatomical      |
    |   clinical text   |           |   findings        |
    | • Extracts        |           | • Severity per    |
    |   demographics    |           |   region          |
    | • Safety triage   |           | • Confidence      |
    |   (16 keywords)   |           |   scoring         |
    | • Modality hint   |           | • Anomaly flags   |
    +---------+---------+           +---------+---------+
              +---------------+---------------+
                              |
                              v
                  +-----------------------+
                  |    RESEARCH AGENT     |
                  |                       |
                  | • KB cross-reference  |
                  |   (15 conditions)     |
                  | • Demographic weight  |
                  | • Ranked differentials|
                  | • ICD-10 codes        |
                  | • Match probabilities |
                  +-----------+-----------+
                              |
                              v
                  +-----------------------+
                  |     REPORT AGENT      |
                  |                       |
                  | • ACR/NICE format     |
                  | • Clinical history    |
                  | • Technique section   |
                  | • Findings narrative  |
                  | • Impression + top Dx |
                  | • Recommendations     |
                  +-----------+-----------+
                              |
                              v
                  +-----------------------+
                  |     CRITIC AGENT      |
                  |                       |
                  | • Cross-validates     |
                  |   report vs findings  |
                  | • Quality score 0-100 |
                  | • Uncertainty flags   |
                  | • Disclaimer enforce  |
                  +-----------+-----------+
                              |
                              v
+-----------------------------------------------------------+
|                       FINAL REPORT                        |
|  Structured JSON · PDF Export · FHIR R4 DiagnosticReport  |
+-----------------------------------------------------------+
INTAKE and VISION execute concurrently, which cuts wall-clock latency by running the two most expensive operations in parallel. Every downstream agent runs in sequence after both complete.
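The concurrent stage can be sketched with `asyncio.gather`. This is a minimal illustration, not the project's orchestrator: `run_intake`, `run_vision`, and `run_parallel_stage` are hypothetical names, and the sleeps stand in for real agent work.

```python
import asyncio


async def run_intake(patient: dict) -> dict:
    # Hypothetical stand-in for the Intake Agent (validation + triage).
    await asyncio.sleep(0.05)
    return {"agent": "INTAKE", "status": "DONE"}


async def run_vision(patient: dict) -> dict:
    # Hypothetical stand-in for the Vision Agent (multimodal LLM call).
    await asyncio.sleep(0.05)
    return {"agent": "VISION", "status": "DONE"}


async def run_parallel_stage(patient: dict) -> tuple[dict, dict]:
    # gather() schedules both coroutines at once and waits for both, so the
    # stage takes roughly max(intake, vision) rather than their sum.
    intake, vision = await asyncio.gather(run_intake(patient), run_vision(patient))
    return intake, vision


intake_out, vision_out = asyncio.run(run_parallel_stage({"symptoms": "cough"}))
print(intake_out, vision_out)
```

Downstream agents would then be awaited one after another with both results in hand.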
AMD Hardware Stack
| Component | Technology |
|---|---|
| GPU | AMD Instinct MI300X |
| GPU Software | ROCm (AMD's open-source GPU compute platform) |
| Inference Server | vLLM (ROCm build) at localhost:8000/v1 |
| Model | Qwen multimodal β native vision + text |
| Backend | FastAPI 0.115 + Uvicorn |
| Frontend | Vanilla JS + Tailwind CSS + SSE streaming |
This project is a direct proof of concept that AMD's ROCm stack is production-viable for real-world medical AI. Every inference call (vision analysis, clinical normalization, report synthesis, peer review, post-report chat) runs on AMD MI300X. Zero CUDA dependency. Zero cloud API calls.
Key Features
Real-Time SSE Streaming
Watch the pipeline execute live, agent by agent. Every status transition (WAITING → RUNNING → DONE) streams to the dashboard as it happens via Server-Sent Events. Per-agent runtime counters track exactly how long each step takes.
Multimodal Vision Analysis
Qwen processes the raw medical image natively. It returns structured JSON: detected modality, technical quality assessment, per-region findings with anatomical names, radiological descriptions, severity levels (NORMAL / INCIDENTAL / SIGNIFICANT / CRITICAL), confidence scores (0-100), and anomaly flags.
Medical Knowledge Base + ICD-10 Mapping
The Research Agent cross-references vision findings against 15 curated clinical conditions spanning pulmonary, neurological, abdominal, musculoskeletal, and vascular pathology. Every differential diagnosis comes with an ICD-10 code, match probability, and a sentence explaining exactly why the condition matches the findings.
Critic Agent QA
Every report goes through a peer-review pass before delivery. The Critic checks that all anomalies from the Vision Agent appear in the report, flags low-confidence findings, assigns a quality score (completeness 30% + accuracy 40% + safety 20% + compliance 10%), and hard-caps the score at 40/100 if a core agent failed.
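The weighting and the hard cap can be written out as a small function. This is a sketch under assumptions: it takes four component scores on a 0-100 scale as given (how the Critic derives them is not shown here), and `quality_score`, `WEIGHTS`, and `FAILURE_CAP` are illustrative names.

```python
# Weights as stated above: completeness 30%, accuracy 40%, safety 20%, compliance 10%.
WEIGHTS = {"completeness": 0.30, "accuracy": 0.40, "safety": 0.20, "compliance": 0.10}
FAILURE_CAP = 40.0  # ceiling applied when a core agent failed


def quality_score(components: dict[str, float], core_agent_failed: bool = False) -> float:
    """Combine 0-100 component scores with the stated weights, then apply the cap."""
    score = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    if core_agent_failed:
        score = min(score, FAILURE_CAP)
    return round(score, 1)


parts = {"completeness": 90, "accuracy": 80, "safety": 100, "compliance": 100}
print(quality_score(parts))                          # 27 + 32 + 20 + 10 = 89.0
print(quality_score(parts, core_agent_failed=True))  # capped at 40.0
```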
DICOM Support
Upload real .dcm files. MediAgent extracts 20+ metadata fields (patient name, study date, institution, modality, body part, KVP, slice thickness, pixel spacing, image dimensions) and pre-populates the intake form automatically. MONOCHROME1 inversion and multi-frame handling are included.
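To illustrate what MONOCHROME1 inversion means, here is a dependency-free sketch. The project lists pydicom, numpy, and Pillow for this job; the plain-list version below is only a stand-in, and `to_display_pixels` is a hypothetical name. In DICOM, MONOCHROME1 means the lowest stored value is displayed as white, so the values are flipped after scaling.

```python
def to_display_pixels(pixels: list[list[int]], photometric: str) -> list[list[int]]:
    """Scale raw pixel values to 0-255 and invert MONOCHROME1 images."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on a constant image
    result = []
    for row in pixels:
        scaled = [round((value - lo) * 255 / span) for value in row]
        if photometric == "MONOCHROME1":
            # Low stored values mean white here, so flip to the usual convention.
            scaled = [255 - value for value in scaled]
        result.append(scaled)
    return result


print(to_display_pixels([[0, 51, 255]], "MONOCHROME2"))
print(to_display_pixels([[0, 51, 255]], "MONOCHROME1"))
```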
FHIR R4 Export
Every report can be exported as a fully conformant HL7 FHIR R4 DiagnosticReport resource. Includes an inline Patient resource, Observation resources, LOINC and SNOMED CT codes, severity mapping, full report text in presentedForm, and custom extensions for AI quality score and pipeline status. Ready to import into Epic, Cerner, or any FHIR-capable EMR.
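A heavily reduced sketch of such a resource, built as a plain dict, might look like the following. It is not the project's builder: `build_diagnostic_report` is a hypothetical name, the extension URL is a placeholder, and the `preliminary` status and LOINC code 18748-4 (Diagnostic imaging study) are assumptions chosen for illustration. Observation resources and SNOMED CT codes are omitted.

```python
import base64
import json


def build_diagnostic_report(report_id: str, report_text: str,
                            conclusion: str, quality_score: float) -> dict:
    """Assemble a minimal FHIR R4 DiagnosticReport as a plain dict."""
    return {
        "resourceType": "DiagnosticReport",
        "id": report_id,
        "status": "preliminary",  # assumption: AI output awaiting radiologist review
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "18748-4",  # assumption: generic imaging-study code
                "display": "Diagnostic imaging study",
            }]
        },
        # Inline Patient resource, referenced locally via "#id".
        "contained": [{"resourceType": "Patient", "id": "patient-1"}],
        "subject": {"reference": "#patient-1"},
        "conclusion": conclusion,
        # presentedForm carries the full report text as a base64 attachment.
        "presentedForm": [{
            "contentType": "text/plain",
            "data": base64.b64encode(report_text.encode("utf-8")).decode("ascii"),
        }],
        "extension": [{
            # Hypothetical extension URL, for illustration only.
            "url": "https://example.org/fhir/StructureDefinition/ai-quality-score",
            "valueDecimal": quality_score,
        }],
    }


report = build_diagnostic_report("REP-A3F9C2D1B4E7", "FINDINGS: ...", "No acute findings", 87.5)
print(json.dumps(report, indent=2))
```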
Post-Report Clinical Chat
After the report is delivered, a ClinicalAdvisorAgent is available for follow-up questions. It answers in 2-4 sentences with direct reference to the report findings. Qwen's thinking/reasoning mode is explicitly disabled, so answers are fast, direct, and clinical.
Hard Safety Enforcement
- 16 deterministic safety keywords (chest pain, stroke symptoms, acute trauma, hemoptysis, sepsis, spinal trauma, and more) trigger urgent flags regardless of LLM output.
- Age-based alerts: pediatric (<18) and geriatric (>75) cases are automatically flagged for expert review.
- Mandatory AI disclaimer: enforced at two independent layers (Report Agent + Critic Agent) and cannot be bypassed or modified by the LLM.
- Graceful degradation: the pipeline produces a report even if individual agents fail, always marking what succeeded and what didn't.
Client-Side PDF Export
Full radiology report exported as a formatted PDF directly in the browser using jsPDF: severity color banner, all six report sections, DICOM metadata, QA score. No server round-trip needed.
Agent Architecture
IntakeAgent
Validates the image payload (minimum size, valid base64), applies deterministic safety triage, and normalizes clinical language. For simple inputs under 120 characters it skips the LLM entirely and uses a built-in layman-to-medical term map (22 entries: "can't breathe" → "dyspnea", "lump" → "mass/nodule", "dizzy" → "dizziness/vertigo", etc.). It only calls the LLM for complex clinical narratives with comorbidities or medical history, and falls back cleanly to raw input preservation if the LLM is unavailable.
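The short-input path could look roughly like this. The sketch uses only the three map entries quoted above and a naive case-insensitive substring replacement; `normalize_symptoms` and the returned `needs_llm` flag are hypothetical.

```python
import re

TERM_MAP = {  # three of the 22 entries quoted above
    "can't breathe": "dyspnea",
    "lump": "mass/nodule",
    "dizzy": "dizziness/vertigo",
}
SIMPLE_INPUT_LIMIT = 120  # inputs shorter than this skip the LLM


def normalize_symptoms(text: str) -> tuple[str, bool]:
    """Return (normalized_text, needs_llm)."""
    if len(text) >= SIMPLE_INPUT_LIMIT:
        return text, True  # long narrative: defer to the LLM
    normalized = text
    for lay_term, medical_term in TERM_MAP.items():
        normalized = re.sub(re.escape(lay_term), medical_term, normalized,
                            flags=re.IGNORECASE)
    return normalized, False


print(normalize_symptoms("I feel dizzy and found a lump"))
```

A production version would likely match on word boundaries so that "lump" does not fire inside "slump".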
VisionAgent
Sends the base64 image and clinical context to Qwen at temperature 0.0 with a strict JSON schema enforced via system prompt. Handles malformed enum values from the LLM with safe conversion fallbacks, so a single bad field never drops a finding. Tracks token usage and anomaly counts in the output metadata.
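One way to get that behavior is to coerce each enum field separately and fall back to a default on failure. In this sketch the `Severity` values come from the severity levels listed earlier; `safe_severity`, `parse_findings`, and the choice of `INCIDENTAL` as the fallback are assumptions.

```python
from enum import Enum


class Severity(str, Enum):
    NORMAL = "NORMAL"
    INCIDENTAL = "INCIDENTAL"
    SIGNIFICANT = "SIGNIFICANT"
    CRITICAL = "CRITICAL"


def safe_severity(raw: object, default: Severity = Severity.INCIDENTAL) -> Severity:
    """Coerce an LLM-supplied value to Severity without raising."""
    try:
        return Severity(str(raw).strip().upper())
    except ValueError:
        return default  # assumption: the fallback level is a design choice


def parse_findings(raw_findings: list[dict]) -> list[dict]:
    # A bad severity is replaced with the default; the finding itself is kept.
    return [{**finding, "severity": safe_severity(finding.get("severity"))}
            for finding in raw_findings]


print(parse_findings([{"region": "left lung", "severity": "severe-ish"}]))
```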
ResearchAgent
Pre-filters the knowledge base to only conditions compatible with the detected modality before sending to the LLM, which reduces prompt size and improves accuracy. Enforces strict output rules: only conditions from the KB, 2-4 differentials, 5% minimum probability, exact ICD-10 codes, and evidence sentences that actually explain the match.
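The pre-filter and part of the rule enforcement can be sketched as two small functions. The three entries are taken from the Medical Knowledge Base table in this document; `prefilter_kb`, `enforce_rules`, and the dict layout are hypothetical, and the lower bound of two differentials and the evidence-sentence check are not modeled.

```python
KB = [  # three rows from the Medical Knowledge Base table
    {"name": "Community-Acquired Pneumonia", "icd10": "J18.9", "modalities": {"X-RAY", "CT"}},
    {"name": "Ischemic Stroke", "icd10": "I63.9", "modalities": {"CT", "MRI"}},
    {"name": "Herniated Disc", "icd10": "M51.16", "modalities": {"MRI", "CT"}},
]


def prefilter_kb(modality: str) -> list[dict]:
    """Keep only conditions that can be seen on the detected modality."""
    return [cond for cond in KB if modality.upper() in cond["modalities"]]


def enforce_rules(candidates: list[dict], allowed: list[dict]) -> list[dict]:
    """Keep KB-only conditions at >= 5% probability, fix ICD-10 codes, cap at 4."""
    codes = {cond["name"]: cond["icd10"] for cond in allowed}
    kept = [
        {**cand, "icd10": codes[cand["name"]]}  # overwrite with the KB's exact code
        for cand in candidates
        if cand["name"] in codes and cand["probability"] >= 5
    ]
    kept.sort(key=lambda cand: cand["probability"], reverse=True)
    return kept[:4]


allowed = prefilter_kb("ct")
llm_output = [
    {"name": "Ischemic Stroke", "probability": 60, "icd10": "WRONG"},
    {"name": "Unlisted Condition", "probability": 90},
    {"name": "Herniated Disc", "probability": 20},
]
print(enforce_rules(llm_output, allowed))
```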
ReportAgent
Builds a structured prompt with clearly labeled sections (clinical history, imaging technique, findings block, differentials block) and asks the LLM to synthesize them into a formal ACR/NICE radiology report. The disclaimer is overwritten with the exact regulatory string after LLM generation, unconditionally.
CriticAgent
Operates at temperature 0.0 for fully deterministic QA. Receives the draft report and the full pipeline state including raw vision findings. Checks every anomaly is accounted for, flags low-confidence observations, and appends a [QUALITY ASSESSMENT] block to the recommendations section with score, issues, and uncertainty warnings.
ClinicalAdvisorAgent
Activated only after report delivery, scoped to the specific report's findings. Strips all Qwen thinking output via multi-layer regex before returning the answer. It handles <think> XML blocks, markdown think fences, and plain-text reasoning preambles.
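A layered cleanup of this kind might be sketched as below. The patterns are assumptions rather than the project's actual regexes: they cover paired `<think>` tags, fenced blocks labeled `think` or `thinking`, and a dangling `</think>` with no opening tag. Plain-text preambles are not handled here.

```python
import re

# Paired <think>...</think> blocks.
THINK_XML = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# Markdown fences labeled "think" or "thinking" (`{3} matches three backticks).
THINK_FENCE = re.compile(r"`{3}(?:think|thinking)\b.*?`{3}", re.DOTALL | re.IGNORECASE)
# Closing tag without an opening one: drop everything up to and including it.
DANGLING_CLOSE = re.compile(r"^.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_thinking(text: str) -> str:
    """Remove model reasoning so only the final answer remains."""
    text = THINK_XML.sub("", text)
    text = THINK_FENCE.sub("", text)
    text = DANGLING_CLOSE.sub("", text, count=1)
    return text.strip()


print(strip_thinking("<think>compare with prior...</think>The nodule measures 8 mm."))
```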
LLM Client
The LLMClient wraps the OpenAI Python SDK pointed at the local vLLM endpoint. It handles:
- Text completions with optional JSON mode enforcement
- Multimodal completions with base64 image injection
- Token-level streaming with an on_token callback
- 3-attempt retry loop with 1-second flat backoff
- 90-second timeout per call
- Dual-strategy JSON extraction: direct parse first, then character-by-character brace-matching fallback for responses where the LLM adds conversational padding
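Two of the behaviors in the list above, the flat-backoff retry and the dual-strategy JSON extraction, can be sketched independently of any SDK. `call_with_retry` and `extract_json` are hypothetical names, and the brace matcher below is one plausible reading of "character-by-character brace-matching", not the project's code.

```python
import json
import time
from typing import Any, Callable


def call_with_retry(fn: Callable[[], Any], attempts: int = 3, delay_s: float = 1.0) -> Any:
    """Call fn up to `attempts` times, sleeping a fixed delay between failures."""
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_s)
    raise RuntimeError(f"all {attempts} attempts failed") from last_error


def extract_json(text: str) -> Any:
    """Parse text as JSON; on failure, scan for the first balanced {...} object."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    start = text.find("{")
    while start != -1:
        depth, in_string, escaped = 0, False, False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:  # braces inside string literals must not count
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # balanced but invalid: try the next "{"
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found")


print(extract_json('Sure, here it is: {"modality": "CT", "note": "a } brace"} Hope that helps!'))
```

Tracking string state matters because a closing brace inside a quoted value would otherwise end the match too early.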
Medical Knowledge Base
15 conditions covering the most common radiological findings across all supported modalities:
| Condition | ICD-10 | Modalities | Severity |
|---|---|---|---|
| Community-Acquired Pneumonia | J18.9 | X-RAY, CT | SIGNIFICANT |
| Cardiogenic Pulmonary Edema | J81.0 | X-RAY, CT | CRITICAL |
| Pleural Effusion | J90 | X-RAY, CT, MRI | SIGNIFICANT |
| Spontaneous Pneumothorax | J93.9 | X-RAY, CT | CRITICAL |
| Intracerebral Hemorrhage | I61.9 | CT, MRI | CRITICAL |
| Ischemic Stroke | I63.9 | CT, MRI | CRITICAL |
| Intracranial Neoplasm | C71.9 | MRI, CT | SIGNIFICANT |
| Abdominal Aortic Aneurysm | I71.4 | CT, MRI | CRITICAL |
| Nephrolithiasis | N20.0 | CT, X-RAY | SIGNIFICANT |
| Small Bowel Obstruction | K56.6 | X-RAY, CT | SIGNIFICANT |
| Long Bone Fracture | S82.902 | X-RAY, CT | SIGNIFICANT |
| Degenerative Joint Disease | M19.90 | X-RAY, MRI | INCIDENTAL |
| Hepatic Steatosis | K76.0 | CT, MRI | INCIDENTAL |
| Herniated Disc | M51.16 | MRI, CT | SIGNIFICANT |
| Pulmonary Nodule | R91.1 | X-RAY, CT | SIGNIFICANT |
API Reference
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Clinical dashboard UI |
| GET | /health | System health, version, active sessions |
| GET | /metrics/gpu | Live AMD GPU metrics (util, VRAM, temp, power) |
| POST | /analyze | Synchronous pipeline, returns the full JSON report |
| POST | /analyze/stream | Real-time SSE streaming pipeline |
| GET | /status/{report_id} | Poll live pipeline state |
| POST | /chat/{report_id} | Post-report clinical Q&A |
| GET | /api/docs | Swagger UI |
| GET | /api/redoc | ReDoc UI |
/analyze/stream: SSE Event Types
// Agent status update (emitted on every state transition)
{"agent": "VISION", "status": "RUNNING"}
{"agent": "VISION", "status": "DONE"}
// Final report (emitted when pipeline completes)
{"type": "report", "data": {...}, "report_id": "REP-A3F9C2D1B4E7"}
// Error
{"type": "error", "message": "Pipeline produced no report"}
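A client consuming this stream needs to tell the three payload shapes apart. The sketch below assumes each event arrives as a single `data:` line carrying one of the JSON objects shown above; `iter_sse_events` and `classify` are hypothetical helpers, and the transport (how lines are read from the HTTP response) is left out.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        yield json.loads(line[len("data:"):].strip())


def classify(event: dict) -> str:
    """Map a payload to 'report', 'error', 'status', or 'unknown'."""
    if event.get("type") == "report":
        return "report"
    if event.get("type") == "error":
        return "error"
    if "agent" in event and "status" in event:
        return "status"
    return "unknown"


stream = [
    'data: {"agent": "VISION", "status": "RUNNING"}',
    "",
    'data: {"type": "report", "data": {}, "report_id": "REP-A3F9C2D1B4E7"}',
]
for event in iter_sse_events(stream):
    print(classify(event), event)
```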
Form Fields (/analyze, /analyze/stream)
| Field | Type | Required | Notes |
|---|---|---|---|
| image | File | | PNG, JPG, or DICOM (.dcm), max 20 MB |
| symptoms | string | | Free-text chief complaint |
| age | integer | | 0-120 |
| sex | string | | M, F, or O |
| clinical_context | string | | Medical history, referral details |
Data Models
PatientInput
└── image_base64, symptoms, age, sex, clinical_context
PipelineState
├── agent_statuses: {INTAKE, VISION, RESEARCH, REPORT, CRITIC}
├── intake_output: IntakeOutput
├── vision_output: VisionOutput
│   └── findings: [VisionFinding, ...]
│       └── anatomical_region, description, severity,
│           confidence, confidence_score, is_anomaly
├── research_output: ResearchOutput
│   └── differential_diagnoses: [KnowledgeMatch, ...]
│       └── condition_name, match_probability,
│           supporting_evidence, differential_rank, icd10_code
├── report_draft: ReportSection
│   └── clinical_history, technique, findings, impression,
│       recommendations, disclaimer
└── final_report: FinalReport
    └── report_id, patient_metadata, sections, vision_summary,
        research_summary, overall_severity, agent_pipeline_status,
        generation_timestamp
Project Structure
mediagent/
├── main.py              # FastAPI server, all routes, SSE orchestration
├── core/
│   ├── llm.py           # LLM client (retry, vision, streaming, JSON extraction)
│   ├── models.py        # All Pydantic v2 data models
│   ├── pipeline.py      # Parallel pipeline orchestrator
│   ├── dicom.py         # DICOM parser (pydicom + numpy + Pillow)
│   └── fhir.py          # FHIR R4 DiagnosticReport builder
├── agents/
│   ├── intake.py        # Input validation, normalization, safety triage
│   ├── vision.py        # Multimodal image analysis
│   ├── research.py      # KB matching, ICD-10, differential diagnosis
│   ├── report.py        # ACR/NICE radiology report synthesis
│   ├── critic.py        # QA validation, quality scoring
│   └── advisor.py       # Post-report clinical Q&A
├── static/
│   └── index.html       # Full dashboard (Tailwind + Chart.js + SSE)
├── requirements.txt
└── .env.example
Getting Started
Prerequisites
- Python 3.12+
- vLLM running a Qwen multimodal model on ROCm, accessible at http://localhost:8000/v1
- ROCm-compatible AMD GPU (MI300X recommended)
Installation
# Clone the repository
git clone https://github.com/Ramyar2007/mediagent
cd mediagent
# Install Python dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env and set LLM_BASE_URL to your vLLM endpoint
Environment Variables
LLM_BASE_URL=http://localhost:8000/v1 # vLLM OpenAI-compatible endpoint
LLM_MODEL=/model # Model path served by vLLM
APP_PORT=8090 # Server port
Run
python main.py
Dashboard available at http://localhost:8090
Swagger docs at http://localhost:8090/api/docs
Dependencies
| Package | Version | Purpose |
|---|---|---|
| fastapi | 0.115.6 | Web framework |
| uvicorn[standard] | 0.34.0 | ASGI server |
| openai | 1.58.1 | SDK for vLLM OpenAI-compatible API |
| python-multipart | 0.0.20 | Multipart form / file upload |
| pydantic | 2.10.5 | Data validation and serialization |
| Pillow | 11.1.0 | Image processing for DICOM conversion |
| pydicom | 2.4.4 | DICOM file parsing and metadata extraction |
| numpy | 1.26.4 | Pixel array normalization for DICOM |
Optional: amdsmi Python library, used automatically when available for more accurate GPU metrics than the rocm-smi CLI fallback.
Clinical Safety
MediAgent is built with clinical safety as a first-class concern, not an afterthought.
Mandatory disclaimer (enforced at two independent code layers and cannot be overridden by any LLM output):
"This analysis is AI-generated and must be reviewed by a licensed radiologist before any clinical decisions are made."
Hard safety rules that run deterministically, without LLM involvement:
- 16 urgent clinical keywords trigger immediate flags before any AI processing
- Pediatric and geriatric age thresholds auto-flag for specialist review
- Quality score is hard-capped at 40/100 if core agents (Vision, Report) fail
- Low-confidence findings are always flagged with confirmatory imaging recommendations
- The disclaimer is re-enforced after every LLM call, unconditionally
This system is a decision support tool, not a clinical decision maker. Every output is intended to assist, not replace, a licensed radiologist.
Dashboard Preview
The single-page clinical dashboard provides:
- Live pipeline panel: real-time agent status cards with per-step runtime counters
- Analytics tab: severity distribution donut chart, differential diagnosis confidence bar chart, and agent timing bar chart, all populated from structured model output
- Report panel: severity banner, safety flags, all six report sections, finding cards color-coded by severity
- DICOM metadata card: study date, institution, modality, body part, technical parameters
- PDF export: full formatted report generated client-side
- Clinical chat: slide-up Q&A panel backed by the ClinicalAdvisorAgent
- AMD GPU panel: live util %, VRAM used/total, temperature, and power draw, polled every 3 seconds
Built For
AMD Developer Hackathon 2026 · Track: Vision & Multimodal AI
This project demonstrates that AMD's ROCm ecosystem is a complete, production-viable alternative for serious AI workloads: medical imaging analysis with real multimodal vision, structured clinical reasoning, and standards-compliant output, running fully on AMD MI300X without a single NVIDIA or cloud dependency.
Built by Ramyar · Sulaymaniyah, Iraq
#AMDDevChallenge · AMD Instinct MI300X · ROCm · vLLM · Qwen