---
title: Clinical Deidentify
emoji: 🏥
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
---
# 🏥 Clinical-Deidentify: Secure PHI Removal
[![CI](https://github.com/sarvanithin/clinical-deidentify/actions/workflows/ci.yml/badge.svg)](https://github.com/sarvanithin/clinical-deidentify/actions)
Fast hybrid (regex + transformer) removal of protected health information (PHI) from clinical text and documents. Protect patient privacy with clinical-grade accuracy.
![UI Mockup](/Users/nithinsarva/.gemini/antigravity/brain/b50698cd-8087-453a-be22-ce45586e1767/ui_demo_mockup_1773587336810.png)
## 🚀 Features
- **Hybrid Pipeline**: Combines deterministic regex for structured PHI (dates, IDs, phones) with state-of-the-art transformers for contextual PHI (patient names, locations).
- **Expanded Document Support**: De-identify **PDFs**, **Word (.docx)**, and **TXT** files with a unified interface.
- **Download Feature**: Instantly download de-identified results as a `.txt` file for safe storage.
- **Premium Dashboard**: A sleek, dark-mode web UI for real-time de-identification and file uploads.
- **HIPAA-Oriented Deployment**: Runs as a self-hosted Docker service, so clinical text never leaves your infrastructure.
- **Active Learning**: Built-in feedback loop that stores clinician corrections.
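
The hybrid pipeline described in the feature list can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two regex patterns, the span dictionary layout, and the `fake_ner` stub standing in for the transformer stage are not taken from the project's code. The idea is that deterministic regex spans are collected first, contextual spans from a model are added where they do not overlap, and the text is then redacted in one pass.

```python
import re
from typing import Callable, Dict, List

Span = Dict[str, object]  # {"start": int, "end": int, "label": str}

# Illustrative patterns only; the project's real regexes are not shown in this README.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def regex_spans(text: str) -> List[Span]:
    """Deterministic stage: find structured PHI such as dates and phone numbers."""
    spans: List[Span] = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append({"start": match.start(), "end": match.end(), "label": label})
    return spans


def fake_ner(text: str) -> List[Span]:
    """Hypothetical stand-in for the transformer stage: tags one hard-coded name."""
    return [
        {"start": m.start(), "end": m.end(), "label": "PATIENT"}
        for m in re.finditer(r"John Doe", text)
    ]


def merge_spans(primary: List[Span], secondary: List[Span]) -> List[Span]:
    """Keep every primary span; add secondary spans that overlap nothing kept so far."""
    merged = list(primary)
    for span in secondary:
        if all(span["end"] <= kept["start"] or span["start"] >= kept["end"] for kept in merged):
            merged.append(span)
    return sorted(merged, key=lambda s: s["start"])


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with a [LABEL] placeholder; spans must be sorted and disjoint."""
    pieces = []
    cursor = 0
    for span in spans:
        pieces.append(text[cursor:span["start"]])
        pieces.append(f"[{span['label']}]")
        cursor = span["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


def deidentify(text: str, ner: Callable[[str], List[Span]] = fake_ner) -> str:
    """Run the regex stage, then the (stubbed) contextual stage, then redact."""
    return redact(text, merge_spans(regex_spans(text), ner(text)))


print(deidentify("Patient John Doe was admitted on 01/01/2023."))
# prints: Patient [PATIENT] was admitted on [DATE].
```

Giving regex spans priority in `merge_spans` is one possible design choice: structured identifiers are matched exactly, so a model prediction that overlaps them is dropped instead of splitting the placeholder.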
## 🚀 Quick Start (Docker)
1. **Build**:
```bash
docker build -t clinical-deidentify .
```
2. **Run**:
```bash
docker run -d -p 8001:8000 --name clinical-deid-service clinical-deidentify
```
*Dashboard available at: [http://localhost:8001](http://localhost:8001)*
## Local Installation
1. **Clone & Setup**:
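```bash
# Repository URL taken from the CI badge above
git clone https://github.com/sarvanithin/clinical-deidentify.git
cd clinical-deidentify
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```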
2. **Run Server**:
```bash
uvicorn app.main:app --reload
```
## Usage
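### De-identify a Single Note
This example targets the local server on port 8000. If you started the service with the Docker command from the Quick Start, use port 8001 instead.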
```bash
curl -X POST "http://localhost:8000/deidentify" \
-H "Content-Type: application/json" \
-d '{"text": "Patient John Doe was admitted on 01/01/2023."}'
```
**Response**:
```json
{
"original": "Patient John Doe was admitted on 01/01/2023.",
"deidentified": "Patient [PATIENT] was admitted on [DATE].",
"entities": [...]
}
```
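
The same request can be sent from Python. This is a minimal sketch using only the standard library; the `/deidentify` path, the `text` request field, and the `deidentified` response field come from the example above, while the helper names `build_request` and `deidentify` and the 30-second timeout are assumptions chosen for illustration.

```python
import json
import urllib.request

# Port 8000 matches the local uvicorn server; use 8001 for the Docker quick start.
BASE_URL = "http://localhost:8000"


def build_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for the /deidentify endpoint without sending it."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/deidentify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deidentify(text: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the note to a running service and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the server to be running):
# result = deidentify("Patient John Doe was admitted on 01/01/2023.")
# print(result["deidentified"])
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without a live server.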
## Evaluation
Run the mock benchmarking script:
```bash
python eval/evaluate.py
```
## Dataset Benchmarking
The pipeline is designed to be compatible with the **2014 i2b2 de-identification shared task** format. You can load i2b2 XML files and map them to the `EvalRequest` schema within `eval/evaluate.py`.
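
As a rough illustration of that mapping, the sketch below parses an i2b2-style XML record with the standard library and converts it into a plain dictionary. The `EvalRequest` fields are not documented in this README, so the output keys (`text`, `entities`, `start`, `end`, `label`) are hypothetical and would need to be renamed to match the actual schema. The inline `SAMPLE` is a hand-written record in the 2014 i2b2 layout (a `TEXT` element plus standoff annotations under `TAGS`), not real data.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hand-written sample in the i2b2 2014 layout; not taken from the actual corpus.
SAMPLE = """<deIdi2b2>
<TEXT><![CDATA[Patient John Doe was admitted on 01/01/2023.]]></TEXT>
<TAGS>
<NAME id="P0" start="8" end="16" text="John Doe" TYPE="PATIENT" comment="" />
<DATE id="P1" start="33" end="43" text="01/01/2023" TYPE="DATE" comment="" />
</TAGS>
</deIdi2b2>
"""


def load_i2b2(xml_string: str) -> Dict[str, object]:
    """Parse one i2b2-style record into a dict with hypothetical EvalRequest-like keys."""
    root = ET.fromstring(xml_string)
    text = root.findtext("TEXT", default="")
    tags = root.find("TAGS")
    entities: List[Dict[str, object]] = []
    for tag in (tags if tags is not None else []):
        entities.append(
            {
                "start": int(tag.attrib["start"]),
                "end": int(tag.attrib["end"]),
                # Fall back to the element name when no TYPE attribute is present.
                "label": tag.attrib.get("TYPE", tag.tag),
                "text": tag.attrib.get("text", ""),
            }
        )
    return {"text": text, "entities": entities}


record = load_i2b2(SAMPLE)
for entity in record["entities"]:
    # Check that each annotated offset really points at the annotated string.
    assert record["text"][entity["start"]:entity["end"]] == entity["text"]
    print(entity["label"], entity["start"], entity["end"])
```

Validating offsets against the note text while loading is a cheap way to catch encoding or whitespace mismatches before they distort evaluation scores.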