metadata
title: Clinical Deidentify
emoji: 🏥
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
🏥 Clinical-Deidentify: Secure PHI Removal
Fast, regex + transformer hybrid PHI removal for clinical text and documents. Protect patient privacy with clinical-grade accuracy.
Features
- Hybrid Pipeline: Combines deterministic regex for structured PHI (dates, IDs, phones) with state-of-the-art transformers for contextual PHI (patient names, locations).
- Expanded Document Support: De-identify PDFs, Word (.docx), and TXT files with a unified interface.
- Download Feature: Instantly download de-identified results as a .txt file for safe storage.
- Premium Dashboard: A sleek, dark-mode web UI for real-time de-identification and file uploads.
- HIPAA-Oriented Deployment: Runs as a self-hosted Docker service, so all data stays on your infrastructure.
- Active Learning: Built-in feedback loop for clinical correction storage.
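The hybrid idea in the feature list can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the patterns in `REGEX_RULES`, the `toy_name_tagger` stand-in (a real deployment would call a transformer NER model at that point), and the function names are hypothetical and are not taken from the project's code.

```python
import re
from typing import Callable, List, Tuple

# A span is (start, end, label) over the input text.
Span = Tuple[int, int, str]

# Illustrative patterns only; the project's real patterns are not shown in this README.
REGEX_RULES = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
]


def regex_spans(text: str) -> List[Span]:
    """Deterministic pass: find structured PHI with regular expressions."""
    return [
        (m.start(), m.end(), label)
        for label, pattern in REGEX_RULES
        for m in pattern.finditer(text)
    ]


def toy_name_tagger(text: str) -> List[Span]:
    """Hypothetical stand-in for a transformer NER model.

    It only tags two capitalized words that follow the word 'Patient'.
    """
    return [
        (m.start(1), m.end(1), "PATIENT")
        for m in re.finditer(r"\bPatient ([A-Z][a-z]+ [A-Z][a-z]+)", text)
    ]


def deidentify(text: str, ner: Callable[[str], List[Span]] = toy_name_tagger) -> str:
    """Merge spans from both passes and replace each with a [LABEL] placeholder."""
    spans = sorted(regex_spans(text) + ner(text))
    pieces, cursor = [], 0
    for start, end, label in spans:
        if start < cursor:
            # Skip any span that overlaps one already replaced.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Passing the contextual tagger in as the `ner` argument keeps the deterministic rules testable on their own and lets the model be swapped without touching the replacement logic.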
Quick Start (Docker)
- Build:
docker build -t clinical-deidentify .
- Run:
docker run -d -p 8001:8000 --name clinical-deid-service clinical-deidentify
Dashboard available at: http://localhost:8001
Local Installation
- Clone & Setup:
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Run Server:
uvicorn app.main:app --reload
Usage
De-identify Single Note
The example targets the local server on port 8000. If you started the service with the Docker command from the Quick Start, use port 8001 instead.
curl -X POST "http://localhost:8000/deidentify" \
-H "Content-Type: application/json" \
-d '{"text": "Patient John Doe was admitted on 01/01/2023."}'
Response:
{
  "original": "Patient John Doe was admitted on 01/01/2023.",
  "deidentified": "Patient [PATIENT] was admitted on [DATE].",
  "entities": [...]
}
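The same request can be sent from Python. The sketch below uses only the standard library and assumes the endpoint, request body, and response keys shown above. The helper names `build_request` and `deidentify_note` are hypothetical, and the default `base_url` assumes the local server on port 8000 (the Docker mapping shown earlier would use 8001).

```python
import json
import urllib.request


def build_request(text: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request for /deidentify without sending it."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/deidentify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deidentify_note(text: str, base_url: str = "http://localhost:8000") -> dict:
    """Send one note to the service and return the parsed JSON response."""
    request = build_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example (requires a running server):
#   result = deidentify_note("Patient John Doe was admitted on 01/01/2023.")
#   print(result["deidentified"])
```

Splitting request construction from sending makes the payload easy to inspect or unit-test without a live service.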
Evaluation
Run the mock benchmarking script:
python eval/evaluate.py
Dataset Benchmarking
The pipeline is designed to be compatible with the 2014 i2b2 de-identification shared task format. You can load i2b2 XML files and map them to the EvalRequest schema within eval/evaluate.py.
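As a rough sketch of that mapping, the snippet below parses one i2b2-style XML document into text plus gold spans. It assumes the commonly described 2014 layout (a TEXT element followed by a TAGS element whose children carry start, end, text, and TYPE attributes), so verify it against your actual files. The EvalRequest schema is not shown in this README, so `EvalExample` and `GoldSpan` are hypothetical stand-ins that you would adapt to the real fields.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class GoldSpan:
    start: int
    end: int
    label: str
    text: str


@dataclass
class EvalExample:
    """Hypothetical stand-in for the project's EvalRequest schema."""
    text: str
    gold: List[GoldSpan]


def parse_i2b2(xml_string: str) -> EvalExample:
    """Extract the note text and annotated PHI spans from one XML document."""
    root = ET.fromstring(xml_string)
    text = root.findtext("TEXT", default="")
    tags = root.find("TAGS")
    gold = []
    # Iterating an Element yields its child elements (one per annotation).
    for tag in (tags if tags is not None else []):
        gold.append(
            GoldSpan(
                start=int(tag.get("start")),
                end=int(tag.get("end")),
                label=tag.get("TYPE"),
                text=tag.get("text"),
            )
        )
    return EvalExample(text=text, gold=gold)


# Small hand-written sample in the assumed layout (not real i2b2 data).
SAMPLE = """<deIdi2b2>
<TEXT><![CDATA[Patient John Doe was admitted on 01/01/2023.]]></TEXT>
<TAGS>
<NAME id="P0" start="8" end="16" text="John Doe" TYPE="PATIENT" comment="" />
<DATE id="P1" start="33" end="43" text="01/01/2023" TYPE="DATE" comment="" />
</TAGS>
</deIdi2b2>"""

example = parse_i2b2(SAMPLE)
```

Checking that `example.text[span.start:span.end]` equals `span.text` for every span is a cheap way to catch offset problems before running an evaluation.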
