---
license: cc-by-nc-4.0
language:
- en
library_name: onnxruntime
pipeline_tag: object-detection
tags:
- pii
- privacy
- redaction
- object-detection
- rf-detr
- screen-capture
- accessibility
- computer-use
- agentic
- screenpipe
metrics:
- zero-leak
- oversmash
- precision
- recall
extra_gated_prompt: >-
  This model is licensed CC BY-NC 4.0 (non-commercial). For commercial
  use — production deployment, SaaS / API embedding, agent privacy
  middleware, custom fine-tunes — contact louis@screenpi.pe.
---
# screenpipe-pii-image-redactor

> A [screenpipe](https://screenpi.pe) project. The image-modality
> companion to [`screenpipe/pii-redactor`](https://huggingface.co/screenpipe/pii-redactor).

A fine-tuned **image PII detector** for the same three surfaces an AI
agent sees a user's machine through:

1. **Screen captures** — JPGs / PNGs of the user's screen, rendered
   text and structured chrome (Slack, Outlook, Cursor, Terminal,
   Confluence, GitHub, 1Password, calendars, browsers).
2. **Computer-use traces** — the visual frames an agentic model
   (Claude Computer Use, OpenAI Operator, etc.) reads when it controls
   a desktop.
3. **Accessibility-tree visualizations** — when an agent screenshots
   what it inferred from the AX tree to debug a tool call.

These surfaces are **dense, multi-PII, semi-structured** in ways no
prose-trained PII detector handles well. The model returns pixel-space
bounding boxes for 12 canonical PII categories.

ONNX, ~108 MB. The same `.onnx` ships across macOS / Windows / Linux —
the user's ONNX Runtime selects the Execution Provider at load time
(CoreML, DirectML, CUDA, or CPU baseline).
> **License: CC BY-NC 4.0** (non-commercial). For commercial use —
> production redaction, SaaS / API embedding, AI-agent privacy
> middleware, custom fine-tunes — contact **louis@screenpi.pe**. See
> [`LICENSE`](LICENSE).
## Headline numbers

`rfdetr_v8` on a held-out 221-image validation split (190 PII-bearing,
31 hard negatives) of the [screenpipe-pii-bench-image](https://github.com/screenpipe/screenpipe-pii-bench-image)
corpus, IoU ≥ 0.30:

| metric | this model | regex+OCR floor | Microsoft Presidio (published OSS) |
|---|---:|---:|---:|
| **zero-leak** (every gold span caught) | **95.3%** | 2.6% | 0.5% |
| **oversmash** (false-fire on negatives) | **0.0%** | 3.2% | 48.4% |
| micro-precision | 99% | 87% | 47% |
| micro-recall | 97% | 26% | 42% |
| macro-F1 | 0.871 | 0.318 | 0.190 |

Per-label recall (a few highlights): `private_person` 0.99 ·
`private_company` 1.00 · `private_repo` 1.00 · `private_url` 1.00 ·
`secret` 0.99 · `private_email` 0.98 · `private_phone` 0.92 ·
`private_address` 0.92.
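For reference, the two headline metrics can be sketched in a few lines.
This is an illustrative reimplementation, not the benchmark's official
scorer; the `(x, y, w, h)` box format and the one-to-many matching rule
are assumptions. A frame counts as zero-leak-clean when every gold box
is covered by at least one detection at IoU ≥ 0.30, and a hard-negative
frame counts as oversmashed if the model fires at all.

```python
def iou(a, b):
    """IoU of two boxes given as (x, y, w, h) in pixels."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def covers_all_gold(gold, pred, thr=0.30):
    """Zero-leak test for one frame: every gold span matched by some detection."""
    return all(any(iou(g, p) >= thr for p in pred) for g in gold)

def oversmash(pred):
    """False-fire test for a hard-negative frame: any detection at all counts."""
    return len(pred) > 0
```

Aggregating `covers_all_gold` over PII-bearing frames and `oversmash`
over the hard negatives gives the two percentages in the table above.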

### Latency (rfdetr_v8, 320×320 input, FP32)

| platform | EP | p50 |
|---|---|---:|
| macOS Apple Silicon (M-series) | CoreML | **66 ms** ([real-screen sample](https://github.com/screenpipe/screenpipe-pii-bench-image)) |
| macOS Apple Silicon (M-series) | CPU | 163 ms |
| Windows + DX12 GPU | DirectML | ~30-60 ms (estimated) |
| Linux + NVIDIA | CUDA | ~10-20 ms (estimated) |
| Linux/Windows CPU-only | CPU | ~140 ms |

Same `.onnx` everywhere — the Execution Provider is selected at load
time by the user's ONNX Runtime build. **No CUDA / Vulkan / GPU vendor
SDKs required on the consumer side.**
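If you want to pin the EP order explicitly rather than rely on the
runtime default, a small helper works. The provider strings are the
standard ONNX Runtime identifiers; the preference order here is just a
suggestion, not part of this model:

```python
PREFERRED = [
    "CoreMLExecutionProvider",  # macOS
    "CUDAExecutionProvider",    # Linux + NVIDIA
    "DmlExecutionProvider",     # Windows DirectML
    "CPUExecutionProvider",     # universal fallback
]

def provider_priority(available):
    """Keep only the EPs this ONNX Runtime build actually ships, in preference order."""
    chosen = [ep for ep in PREFERRED if ep in available]
    return chosen or ["CPUExecutionProvider"]
```

Pass the result straight to the session constructor:
`ort.InferenceSession("rfdetr_v8.onnx", providers=provider_priority(ort.get_available_providers()))`.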

## Why this exists (vs Presidio Image Redactor and friends)

The published baselines are trained on prose / generic-document
imagery. A typical screenpipe frame looks nothing like that:

- A Slack channel sidebar with 8 names, 12 channel mentions, 3 emails,
  and 1 pasted AWS key — all in 1440×900 px at 14 px font.
- A 1Password vault entry with structured `[Username | Password |
  Server | One-time password]` rows, half of which are masked dots.
- A Cursor workspace open on `.env.production` with five secret-shaped
  values stacked top-to-bottom.

These images are **dense** (10-20 PII spans per frame), **structured**
(rows / columns / aligned chrome), and **layout-cued** (a thing in the
"Username" cell is a username regardless of its surface text). A
generic NER-on-OCR pipeline misfires by over-redacting UI chrome
(48% false-fire on negatives in our bench, vs. 0% for this model).

If you're building an **agentic system that reads screen state** — a
desktop-control agent, a memory layer for browsing, anything that
streams screen captures into an LLM — this is the redactor designed
for that pipe.

## What it does

Per-image **object detection**. Given a JPG or PNG, it returns
`[(bbox, label, score)]` where each detection is a region the model
thinks is PII, classified into one of the 12 canonical categories
shared with [`screenpipe/pii-redactor`](https://huggingface.co/screenpipe/pii-redactor):

```
private_person, private_email, private_phone, private_address,
private_url, private_company, private_repo, private_handle,
private_channel, private_id, private_date, secret
```

`secret` covers passwords, API keys, JWTs, DB connection strings,
PRIVATE-KEY block markers, etc. — same coverage as the text model.

## Inference

```python
# pip install onnxruntime pillow numpy
import numpy as np
import onnxruntime as ort
from PIL import Image

CLASSES = [
    "private_person", "private_email", "private_phone",
    "private_address", "private_url", "private_company",
    "private_repo", "private_handle", "private_channel",
    "private_id", "private_date", "secret",
]
INPUT_SIZE = 320  # rfdetr_v8 was exported at 320x320
THRESHOLD = 0.30

sess = ort.InferenceSession(
    "rfdetr_v8.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)

img = Image.open("screenshot.png").convert("RGB")
W, H = img.size
resized = img.resize((INPUT_SIZE, INPUT_SIZE), Image.BILINEAR)
arr = np.asarray(resized, dtype=np.float32) / 255.0
arr = (arr - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]  # ImageNet mean/std
arr = arr.transpose(2, 0, 1)[None].astype(np.float32)  # NCHW

boxes, logits = sess.run(None, {sess.get_inputs()[0].name: arr})
boxes = boxes[0]    # (300, 4) cx, cy, w, h, normalized to [0, 1]
logits = logits[0]  # (300, 13) — last channel is "no-object"

probs = 1.0 / (1.0 + np.exp(-logits[:, :12]))  # per-class sigmoid
best_class = probs.argmax(axis=1)
best_score = probs[np.arange(300), best_class]
keep = best_score >= THRESHOLD

for q in np.where(keep)[0]:
    cx, cy, bw, bh = boxes[q]
    x1 = (cx - bw / 2) * W
    y1 = (cy - bh / 2) * H
    print(f"  {CLASSES[best_class[q]]:18} score={best_score[q]:.2f} "
          f"bbox=[{int(x1)}, {int(y1)}, {int(bw*W)}, {int(bh*H)}]")
```

Full example with image overlay → `examples/inference.py`.

For Rust integration via the `ort` crate, see the
[`rust_smoke/`](https://github.com/screenpipe/screenpipe-pii-bench-image/tree/main/rust_smoke)
prototype and the production wiring in PR
[`screenpipe/screenpipe#3188`](https://github.com/screenpipe/screenpipe/pull/3188).

## Redacting the image (vs. just detecting)

This model **detects**. To actually remove the PII, draw a solid
rectangle over each detected bbox. Solid black, **not blur** — blur
is reversible by super-resolution attacks; opaque rectangles aren't.

```python
from PIL import ImageDraw

draw = ImageDraw.Draw(img)
for q in np.where(keep)[0]:  # boxes / keep from the snippet above
    cx, cy, bw, bh = boxes[q]
    x1, y1 = (cx - bw / 2) * W, (cy - bh / 2) * H
    draw.rectangle([x1, y1, x1 + bw * W, y1 + bh * H], fill=(0, 0, 0))
img.save("screenshot_redacted.png")
```

That's the entire redactor wrapper: a handful of lines.

## Architecture

- Base: [RF-DETR-Nano](https://github.com/roboflow/rf-detr) (Roboflow,
  ICLR 2026) — DINOv2-backbone real-time detection transformer, ~25 M
  params; claims to be the first real-time model to break 60 mAP on
  COCO.
- Fine-tuned at 320×320 input on a 2,833-image synthetic + WebPII
  union (synthetic via DOM-truth bbox extraction; WebPII via the
  [arxiv 2603.17357 release](https://arxiv.org/abs/2603.17357)).
- Output head: 300 detection queries × 13 channels (12 PII classes +
  no-object). Per-class sigmoid (NOT softmax — RF-DETR uses
  independent classification per query).
- Trained on a single A100 80 GB; ~100 minutes wall-clock for the
  best-EMA epoch.
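The sigmoid-vs-softmax point matters in practice: softmax would force
every query to commit probability mass to some class, while independent
sigmoids let a background query score low on all 12 and fall under the
threshold. A toy illustration with made-up logits for one query:

```python
import numpy as np

logits = np.array([-4.0, -3.5, -5.0])            # a query that matched nothing

sigmoid = 1.0 / (1.0 + np.exp(-logits))          # independent per-class scores
softmax = np.exp(logits) / np.exp(logits).sum()  # forced to sum to 1

# With sigmoid, every class stays below a 0.30 threshold and the query
# is dropped; softmax renormalizes and would still nominate a "winner".
print(sigmoid.max(), softmax.max())
```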

## Training data

| source | size | labels | notes |
|---|---:|---|---|
| **synthetic bench** | 2,206 imgs | DOM-truth bboxes (pixel-perfect) | 9 templates rendered via headless Chromium with `data-span` attributes — labels come from the same DOM tree the browser laid out. |
| **WebPII** | 500 imgs (balanced sample) | bbox-labeled by the original authors | March 2026 release, e-commerce screenshots. Class imbalance capped at 2× our synthetic frequency. |
| **cascade auto-labels** | 100 imgs | OCR + text-PII model alignment | Old screenshots from this project's own bench, weakly labeled. |

**No real user data was used during fine-tuning.** Membership
inference attacks recover no real-user content because no real-user
content was in the training set. If you discover a failure mode on
your real screens, the project's recipe is to add a new SYNTHETIC
template that reproduces it — the screenshot becomes a bug report,
never a training row.

## Limitations

1. **Hand-curated gold set is small** — bench `data/` has 5
   manually-built cases. Larger-scale held-out evaluation depends on
   the synthetic corpus, which is in-distribution by construction.
2. **`private_handle` and `private_id` recall is 0%** in the
   reference numbers because the val split has only 2 and 1 examples
   respectively. Don't deploy without a domain-specific eval pass.
3. **Synthetic-template ceiling.** 95.3% zero-leak is the bench's
   stable ceiling at this corpus size. Gains beyond that come from
   training on more real-screen failure modes (tracked in the bench's
   backlog).
4. **WebPII is e-commerce-heavy.** Adding the full WebPII split
   actually *hurt* dev-app accuracy in our experiments (rfdetr_v4 at
   90.5% zero-leak vs. v8's 95.3%). The 500-image balanced sample is
   our best-of-both compromise.
5. **CPU-only floors at ~140 ms p50.** INT8 quantization (planned)
   should get that under 100 ms, but the FP32 release is what's on
   this page today.
6. **English-only.** Synthetic templates render Latin-script text;
   the WebPII supplement is English. CJK / Arabic / Cyrillic are not
   evaluated — don't deploy without a locale-specific eval.
7. **Adversarial robustness not tested.** A user who knows the
   detector exists could craft layouts that confuse it (handwritten
   PII, embedded-image PII, partial occlusion). Use this for
   honest-user privacy, not as a security boundary.

## Files

```
rfdetr_v8.onnx    108 MB · the model · sha256 below
README.md         this file
LICENSE           CC BY-NC 4.0
NOTICE            attribution to base model + datasets
examples/
  inference.py    the snippet above, runnable
```

SHA-256 of `rfdetr_v8.onnx`:
`431acc0f0beb22a39572b7a50af4fc446e799840fb71320dc124fbd79a121eb3`
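To verify a download against the hash above, `shasum -a 256
rfdetr_v8.onnx` (macOS) or `sha256sum rfdetr_v8.onnx` (Linux) is
enough. A stdlib-only Python equivalent, streaming so the 108 MB file
never sits fully in memory:

```python
import hashlib

def sha256_of(path, chunk=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()
```

`sha256_of("rfdetr_v8.onnx")` should return the digest listed above.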

## Reproducing inference

```bash
git clone https://huggingface.co/screenpipe/pii-image-redactor
cd pii-image-redactor
git lfs pull
pip install onnxruntime pillow numpy
python examples/inference.py path/to/your_screenshot.png
```

Reproducing the eval scores requires the screenpipe-pii-bench-image
benchmark, which is not redistributed (it's the training corpus).
Contact **louis@screenpi.pe** for benchmark access or commercial
licensing.

## License

[CC BY-NC 4.0](LICENSE) — non-commercial use only. The base model
(RF-DETR) is Apache-2.0; its obligations are preserved (see
[`NOTICE`](NOTICE)).

For commercial licensing (production deployment, redistribution
rights, SaaS / API embedding, custom fine-tunes for your domain):
**louis@screenpi.pe**.

## Citation

```bibtex
@misc{screenpipe-pii-image-redactor-2026,
  title  = {screenpipe-pii-image-redactor: a screen-PII detector for
            accessibility-aware agents},
  author = {{screenpipe}},
  year   = {2026},
  url    = {https://huggingface.co/screenpipe/pii-image-redactor}
}
```