---
title: Clinical Deidentify
emoji: 🏥
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
---
# 🏥 Clinical-Deidentify: Secure PHI Removal

[GitHub Actions](https://github.com/sarvanithin/clinical-deidentify/actions)

Fast, hybrid regex + transformer PHI removal for clinical text and documents. Protect patient privacy with clinical-grade accuracy.
## Features

- **Hybrid Pipeline**: Combines deterministic regex for structured PHI (dates, IDs, phone numbers) with a transformer model for contextual PHI (patient names, locations).
- **Expanded Document Support**: De-identifies **PDF**, **Word (.docx)**, and **TXT** files through a unified interface.
- **Download Feature**: Download de-identified results as a `.txt` file for safe storage.
- **Premium Dashboard**: A dark-mode web UI for real-time de-identification and file uploads.
- **HIPAA Compliant**: Runs as a self-hosted Docker service, so all data stays on your infrastructure.
- **Active Learning**: Built-in feedback loop that stores clinical corrections.
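
To make the hybrid idea concrete, here is a minimal sketch of how a regex stage and a model stage could feed one redaction step. The patterns, the span dictionary layout, and the helper names `find_structured_phi` and `redact` are illustrative assumptions, not the project's actual code; the hand-written `PATIENT` span stands in for whatever the transformer would return.

```python
import re

# Illustrative patterns only (assumption): the project's real patterns
# are not shown in this README.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def find_structured_phi(text):
    """Return regex matches as span dicts, sorted by start offset."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append({
                "label": label,
                "start": match.start(),
                "end": match.end(),
                "text": match.group(),
            })
    return sorted(spans, key=lambda s: s["start"])


def redact(text, spans):
    """Replace each span with a [LABEL] placeholder, skipping overlaps."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        if span["start"] < cursor:
            continue  # overlaps an earlier span; keep the first one
        pieces.append(text[cursor:span["start"]])
        pieces.append(f"[{span['label']}]")
        cursor = span["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Patient John Doe was admitted on 01/01/2023."
regex_spans = find_structured_phi(note)
# Hypothetical stand-in for the transformer stage's output.
model_spans = [{"label": "PATIENT", "start": 8, "end": 16, "text": "John Doe"}]
print(redact(note, regex_spans + model_spans))
```

Keeping both stages on a shared span format means the redaction step does not need to know which stage found an entity.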
## Quick Start (Docker)

1. **Build**:
   ```bash
   docker build -t clinical-deidentify .
   ```
2. **Run**:
   ```bash
   docker run -d -p 8001:8000 --name clinical-deid-service clinical-deidentify
   ```

*Dashboard available at: [http://localhost:8001](http://localhost:8001)*
## Local Installation

1. **Clone & Setup**:
   ```bash
   git clone https://github.com/sarvanithin/clinical-deidentify.git
   cd clinical-deidentify
   python -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```
2. **Run Server**:
   ```bash
   uvicorn app.main:app --reload
   ```
## Usage

### De-identify Single Note

The example below targets port 8000, the default for the local `uvicorn` server. If you started the service with the Docker quick start command, use port 8001 instead.

```bash
curl -X POST "http://localhost:8000/deidentify" \
  -H "Content-Type: application/json" \
  -d '{"text": "Patient John Doe was admitted on 01/01/2023."}'
```

**Response**:

```json
{
  "original": "Patient John Doe was admitted on 01/01/2023.",
  "deidentified": "Patient [PATIENT] was admitted on [DATE].",
  "entities": [...]
}
```
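
The same call can be made from Python. This is a minimal sketch using only the standard library; the helper names `build_deidentify_request` and `deidentify` are hypothetical, and the endpoint path and payload shape are taken from the curl example. It assumes the response is JSON with a `deidentified` key, as in the sample response.

```python
import json
import urllib.request


def build_deidentify_request(text, base_url="http://localhost:8000"):
    """Build the POST request for the /deidentify endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/deidentify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deidentify(text, base_url="http://localhost:8000"):
    """Send the note to a running service and return the parsed JSON body."""
    request = build_deidentify_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running server):
# result = deidentify("Patient John Doe was admitted on 01/01/2023.")
# print(result["deidentified"])
```

Pass `base_url="http://localhost:8001"` when the service runs in the Docker container with the port mapping from the quick start.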
## Evaluation

Run the mock benchmarking script:

```bash
python eval/evaluate.py
```

## Dataset Benchmarking

The pipeline is designed to be compatible with the **2014 i2b2 de-identification shared task** format. You can load i2b2 XML files and map them to the `EvalRequest` schema within `eval/evaluate.py`.
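
As a rough sketch of that mapping, the snippet below parses one i2b2-style record into a plain dictionary. It assumes the usual layout of the 2014 files (a `TEXT` element holding the note and a `TAGS` element whose children carry `start`, `end`, `text`, and `TYPE` attributes). The `EvalRequest` fields are not documented here, so the output keys and the function name `load_i2b2_record` are hypothetical; rename them to match the real schema. The sample record is synthetic.

```python
import xml.etree.ElementTree as ET

# Synthetic record in the assumed i2b2 2014 layout (not real patient data).
SAMPLE = """<deIdi2b2>
<TEXT><![CDATA[Patient John Doe was admitted on 01/01/2023.]]></TEXT>
<TAGS>
<NAME id="P0" start="8" end="16" text="John Doe" TYPE="PATIENT" comment="" />
<DATE id="P1" start="33" end="43" text="01/01/2023" TYPE="DATE" comment="" />
</TAGS>
</deIdi2b2>"""


def load_i2b2_record(xml_string):
    """Parse one record into {"text": ..., "entities": [...]}.

    The output keys are placeholders for the EvalRequest schema.
    """
    root = ET.fromstring(xml_string)
    text = root.findtext("TEXT") or ""
    entities = []
    tags = root.find("TAGS")
    if tags is not None:
        for tag in tags:
            entities.append({
                "category": tag.tag,       # element name, e.g. NAME or DATE
                "label": tag.get("TYPE"),  # subtype, e.g. PATIENT
                "start": int(tag.get("start")),
                "end": int(tag.get("end")),
                "text": tag.get("text"),
            })
    return {"text": text, "entities": entities}


record = load_i2b2_record(SAMPLE)
for entity in record["entities"]:
    # Offsets index into the note text, so each slice should equal the tag text.
    assert record["text"][entity["start"]:entity["end"]] == entity["text"]
```

Checking that each offset slice reproduces the tagged text is a cheap way to catch encoding or whitespace drift before running an evaluation.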