V1.2: Expanded document support (.docx), added result download feature, and repository cleanup
metadata
title: Clinical Deidentify
emoji: 🏥
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000

πŸ₯ Clinical-Deidentify: Secure PHI Removal

Fast hybrid (regex + transformer) removal of protected health information (PHI) from clinical text and documents, designed to protect patient privacy in clinical workflows.

🚀 Features

  • Hybrid Pipeline: Combines deterministic regex for structured PHI (dates, IDs, phone numbers) with transformer models for contextual PHI (patient names, locations).
  • Expanded Document Support: De-identify PDFs, Word (.docx), and TXT files with a unified interface.
  • Download Feature: Instantly download de-identified results as a .txt file for safe storage.
  • Premium Dashboard: A sleek, dark-mode web UI for real-time de-identification and file uploads.
  • HIPAA-Oriented Deployment: Docker-native, self-hosted service that keeps all data on your infrastructure. Compliance itself still depends on how you deploy and operate it.
  • Active Learning: Built-in feedback loop for clinical correction storage.
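
The hybrid idea in the first bullet can be illustrated with a minimal sketch. This is not the project's actual code: the patterns, the `Span` type, and `toy_name_tagger` (a regex stand-in for the transformer pass) are all assumptions made for illustration.

```python
import re
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


# Illustrative patterns only; real clinical text needs far broader coverage.
REGEX_RULES = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def regex_spans(text: str) -> List[Span]:
    """Deterministic pass: structured PHI such as dates and phone numbers."""
    return [
        Span(m.start(), m.end(), label)
        for label, pattern in REGEX_RULES.items()
        for m in pattern.finditer(text)
    ]


def toy_name_tagger(text: str) -> List[Span]:
    """Hypothetical placeholder for the transformer pass: tags two
    capitalized words following 'Patient' as a patient name."""
    return [
        Span(m.start(1), m.end(1), "PATIENT")
        for m in re.finditer(r"\bPatient ([A-Z][a-z]+ [A-Z][a-z]+)", text)
    ]


def deidentify(text: str, tagger: Callable[[str], List[Span]] = toy_name_tagger) -> str:
    """Merge both passes, drop overlapping spans, and substitute placeholders."""
    spans = sorted(regex_spans(text) + tagger(text), key=lambda s: (s.start, -s.end))
    kept: List[Span] = []
    for span in spans:
        # Keep a span only if it starts at or after the end of the last kept span.
        if not kept or span.start >= kept[-1].end:
            kept.append(span)
    # Replace right to left so earlier offsets stay valid.
    for span in reversed(kept):
        text = text[:span.start] + f"[{span.label}]" + text[span.end:]
    return text


print(deidentify("Patient John Doe was admitted on 01/01/2023."))
# prints: Patient [PATIENT] was admitted on [DATE].
```

Passing the contextual tagger in as a callable keeps the deterministic rules testable on their own, and a real model could be swapped in without touching the merge logic.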

🚀 Quick Start (Docker)

  1. Build:
    docker build -t clinical-deidentify .
    
  2. Run:
    docker run -d -p 8001:8000 --name clinical-deid-service clinical-deidentify
    
    Dashboard available at: http://localhost:8001

Local Installation

  1. Clone the repository, then set up a virtual environment from its root:
    python -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    
  2. Run Server (uvicorn listens on http://127.0.0.1:8000 by default):
    uvicorn app.main:app --reload
    

Usage

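De-identify Single Note

The request below targets the local server on port 8000. If you started the service with the Docker command from the Quick Start, it is published on port 8001, so use http://localhost:8001 instead.
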
curl -X POST "http://localhost:8000/deidentify" \
     -H "Content-Type: application/json" \
     -d '{"text": "Patient John Doe was admitted on 01/01/2023."}'

Response:

{
  "original": "Patient John Doe was admitted on 01/01/2023.",
  "deidentified": "Patient [PATIENT] was admitted on [DATE].",
  "entities": [...]
}
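
The same request can be sent from Python. The sketch below uses only the standard library and relies solely on the endpoint, payload shape, and `deidentified` response field shown above; the helper names `build_request` and `deidentify_note` are made up for this example.

```python
import json
import urllib.request


def build_request(text: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/deidentify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deidentify_note(text: str, base_url: str = "http://localhost:8000") -> str:
    """Send the note to a running service and return the 'deidentified' field."""
    with urllib.request.urlopen(build_request(text, base_url), timeout=30) as resp:
        return json.load(resp)["deidentified"]
```

With the server running, `deidentify_note("Patient John Doe was admitted on 01/01/2023.")` should return the `deidentified` string from the response above. Pass `base_url="http://localhost:8001"` when using the Docker container.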

Evaluation

Run the mock benchmarking script:

python eval/evaluate.py
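
The internals of eval/evaluate.py are not shown here. For orientation, de-identification benchmarks are commonly scored with entity-level precision, recall, and F1. The sketch below assumes exact-match scoring on `(start, end, label)` tuples, which may differ from what the script actually computes; `span_prf` is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, label)


def span_prf(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Offsets refer to "Patient John Doe was admitted on 01/01/2023."
gold = {(8, 16, "PATIENT"), (33, 43, "DATE")}
predicted = {(8, 16, "PATIENT"), (0, 7, "DOCTOR")}  # one hit, one false positive
print(span_prf(gold, predicted))
# prints: {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

For de-identification, recall is usually the number to watch, since a missed entity means PHI is left in the output.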

Dataset Benchmarking

The pipeline is designed to be compatible with the 2014 i2b2 de-identification shared task format. You can load i2b2 XML files and map them to the EvalRequest schema within eval/evaluate.py.
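
A loader for such files might look like the sketch below. It assumes the usual 2014 i2b2 layout (a `TEXT` element holding the note and a `TAGS` element whose children carry `start`, `end`, `text`, and `TYPE` attributes), so verify it against your copy of the data. The output keys are placeholders, since the EvalRequest fields are not documented here, and `SAMPLE` is a hand-written illustration, not real i2b2 data.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Hand-written sample in the assumed i2b2 2014 layout (not real data).
SAMPLE = """<deIdi2b2>
<TEXT><![CDATA[Patient John Doe was admitted on 01/01/2023.]]></TEXT>
<TAGS>
<NAME id="P0" start="8" end="16" text="John Doe" TYPE="PATIENT" comment="" />
<DATE id="P1" start="33" end="43" text="01/01/2023" TYPE="DATE" comment="" />
</TAGS>
</deIdi2b2>"""


def load_i2b2(xml_string: str) -> Dict[str, Any]:
    """Parse one i2b2-style document into a plain dict.

    The keys 'text', 'entities', 'start', 'end', and 'label' are
    hypothetical; rename them to match the real EvalRequest schema.
    """
    root = ET.fromstring(xml_string)
    text = root.findtext("TEXT", default="")
    tags = root.find("TAGS")
    entities = [
        {
            "start": int(tag.attrib["start"]),
            "end": int(tag.attrib["end"]),
            "label": tag.attrib["TYPE"],
            "text": tag.attrib.get("text", ""),
        }
        for tag in (tags if tags is not None else [])
    ]
    return {"text": text, "entities": entities}


record = load_i2b2(SAMPLE)
for ent in record["entities"]:
    # Sanity check: the offsets should reproduce the annotated surface text.
    print(ent["label"], repr(record["text"][ent["start"]:ent["end"]]))
```

Checking that each offset pair reproduces the tag's `text` attribute is a cheap way to catch encoding or whitespace drift before running a benchmark.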