Harshit Ghosh
refactor: migrate to modular architecture, add system documentation, and enhance batch processing stability (`c4436fb`)

## System Architecture
Our pipeline is designed for multi-tenant medical data isolation, asynchronous batch processing, and scalable inference.
```mermaid
graph TD
    %% User Interfaces
    User([Clinician/User]) -->|Upload DICOMs or ZIPs| UI[Web Interface]

    %% Backend Pipeline
    subgraph "Ingestion & API Layer"
        UI -->|HTTP POST| Flask[Flask API Gateway]
        Flask -->|Rate Limiting & Auth| Security[Security & Auth Module]
    end

    %% Storage Layer
    subgraph "Storage Layer"
        Security -->|Save Raw Scans| Storage[(Local/Cloud File Storage)]
        Security -->|Log Audit & Metadata| NeonDB[(Neon PostgreSQL DB)]
    end

    %% Processing Layer
    subgraph "Processing Layer (Asynchronous)"
        Flask -->|Spawn Task| Worker[Batch Processing Worker]
        Worker -->|Read DICOM| Storage
        Worker -->|Inference| PyTorch[PyTorch / EffNet B4]
        PyTorch -->|Output Grad-CAM & JSON| Storage
        Worker -->|Save Inference Results| NeonDB
    end

    %% Insights Layer
    subgraph "Insights Layer"
        NeonDB -->|Query Aggregated Stats| Dashboard[Analytics Dashboard]
        Dashboard --> User
    end
```
## Key Code Components
If you are exploring the codebase, here are the key modules that power this pipeline:
- **Multi-Tenant Security & Data Pipeline** (`data_isolation.py`): I built a `UserDataManager` to ensure strict data isolation between medical professionals. This guarantees that users can only access and view their own reports and uploaded DICOMs, maintaining strict privacy.
- **Database Schema & History** (`models.py`): I designed a normalized PostgreSQL schema hosted on Neon. The `ScreeningReport` table tracks everything from raw probabilities to triage urgency, allowing the system to query historical trends and generate dashboard analytics.
- **Asynchronous Batch Processing** (`app_new.py`): Instead of blocking the UI during heavy ML inference, I built an asynchronous worker (see `_start_batch` and `_run_batch_worker`) that processes entire directories or ZIP files of DICOMs in the background, updating the frontend batch status in real time.
- **AI Integration** (`run_interface.py`): This acts as the adapter layer that translates web requests into PyTorch tensor operations, generates the Grad-CAM visual heatmaps, and applies isotonic temperature calibration to the model's output probabilities.
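The isolation pattern behind `data_isolation.py` can be sketched as follows. The real `UserDataManager` works against the PostgreSQL database; this in-memory version and its field names (`owner_id`, `reports_for`) are assumptions chosen only to show the idea.

```python
class UserDataManager:
    """Scopes every read to the owning user (illustrative in-memory version)."""

    def __init__(self):
        self._reports = []  # each stored report carries an "owner_id" field

    def add_report(self, owner_id, report):
        self._reports.append({"owner_id": owner_id, **report})

    def reports_for(self, user_id):
        # The owner filter lives inside the manager, so callers cannot
        # accidentally request another tenant's data.
        return [r for r in self._reports if r["owner_id"] == user_id]
```

Because the tenant filter is applied inside the manager rather than at each call site, cross-tenant reads are impossible by construction instead of by convention.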
## System Screenshots
(Note: Add your actual images here)
- **Analytics Dashboard**: Displays the insights layer, including Total Cases, Positivity Rate, and Average Confidence across the user's history.
- **Batch Processing UI**: Shows the asynchronous pipeline handling a queue of multiple DICOM files without freezing the app.
- **Visual Report**: Displays a specific patient report featuring the generated Grad-CAM heatmap alongside Urgency and Confidence metrics.
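The Confidence metric shown in the reports comes from calibrated probabilities. As a rough sketch of the temperature-scaling half of that calibration (the actual `run_interface.py` also applies isotonic regression, and the temperature value here is a made-up example):

```python
import math

def temperature_scale(logit, temperature=1.5):
    """Divide the raw logit by T before the sigmoid; T > 1 softens
    overconfident predictions toward 0.5 (hypothetical T value)."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))
```

In practice the temperature would be fitted on a held-out validation set so that the reported confidence matches the observed positivity rate.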