File size: 2,456 Bytes
42a8999
 
 
 
 
 
 
 
 
66f2405
e48117a
66f2405
e48117a
66f2405
 
 
 
 
e48117a
d651fef
 
66f2405
 
 
e48117a
5d9b32b
e48117a
 
 
 
 
 
5d9b32b
e48117a
5d9b32b
e48117a
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
---
title: Clinical Deidentify
emoji: 🏥
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
---

# 🏥 Clinical-Deidentify: Secure PHI Removal

[![CI](https://github.com/sarvanithin/clinical-deidentify/actions/workflows/ci.yml/badge.svg)](https://github.com/sarvanithin/clinical-deidentify/actions)

Fast, regex + transformer hybrid PHI removal for clinical text and documents. Protect patient privacy with clinical-grade accuracy.

![UI Mockup](/Users/nithinsarva/.gemini/antigravity/brain/b50698cd-8087-453a-be22-ce45586e1767/ui_demo_mockup_1773587336810.png)

## 🚀 Features
- **Hybrid Pipeline**: Combines deterministic regex for structured PHI (dates, IDs, phones) with state-of-the-art transformers for contextual PHI (patient names, locations).
- **Expanded Document Support**: De-identify **PDFs**, **Word (.docx)**, and **TXT** files with a unified interface.
- **Download Feature**: Instantly download de-identified results as a `.txt` file for safe storage.
- **Premium Dashboard**: A sleek, dark-mode web UI for real-time de-identification and file uploads.
- **HIPAA Compliant**: Docker-native service ensuring all data stays on your infrastructure.
- **Active Learning**: Built-in feedback loop for clinical correction storage.

## 🚀 Quick Start (Docker)
1. **Build**:
   ```bash
   docker build -t clinical-deidentify .
   ```
2. **Run**:
   ```bash
   docker run -d -p 8001:8000 --name clinical-deid-service clinical-deidentify
   ```
   *Dashboard available at: [http://localhost:8001](http://localhost:8001)*

## Local Installation
1. **Clone & Setup**:
   ```bash
   python -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```
2. **Run Server**:
   ```bash
   uvicorn app.main:app --reload
   ```

## Usage
### De-identify Single Note
```bash
curl -X POST "http://localhost:8000/deidentify" \
     -H "Content-Type: application/json" \
     -d '{"text": "Patient John Doe was admitted on 01/01/2023."}'
```
**Response**:
```json
{
  "original": "Patient John Doe was admitted on 01/01/2023.",
  "deidentified": "Patient [PATIENT] was admitted on [DATE].",
  "entities": [...]
}
```

## Evaluation
Run the mock benchmarking script:
```bash
python eval/evaluate.py
```

## Dataset Benchmarking
The pipeline is designed to be compatible with the **2014 i2b2 de-identification shared task** format. You can load i2b2 XML files and map them to the `EvalRequest` schema within `eval/evaluate.py`.