---
license: cc-by-nc-4.0
language:
- en
library_name: onnxruntime
pipeline_tag: object-detection
tags:
- pii
- privacy
- redaction
- object-detection
- rf-detr
- screen-capture
- accessibility
- computer-use
- agentic
- screenpipe
metrics:
- zero-leak
- oversmash
- precision
- recall
extra_gated_prompt: >-
  This model is licensed CC BY-NC 4.0 (non-commercial). For commercial
  use — production deployment, SaaS / API embedding, agent privacy
  middleware, custom fine-tunes — contact louis@screenpi.pe.
---

# screenpipe-pii-image-redactor

> A [screenpipe](https://screenpi.pe) project. The image-modality
> companion to [`screenpipe/pii-redactor`](https://huggingface.co/screenpipe/pii-redactor).

A fine-tuned **image PII detector** for the same three surfaces an AI
agent sees a user's machine through:

1. **Screen captures** — JPGs / PNGs of the user's screen, rendered
   text and structured chrome (Slack, Outlook, Cursor, Terminal,
   Confluence, GitHub, 1Password, calendars, browsers).
2. **Computer-use traces** — the visual frames an agentic model
   (Claude Computer Use, GPT operator, etc.) reads when it controls a
   desktop.
3. **Accessibility-tree visualizations** — when an agent screenshots
   what it inferred from the AX tree to debug a tool call.

These surfaces are **dense, multi-PII, semi-structured** in ways no
prose-trained PII detector handles well. The model returns pixel-space
bounding boxes for 12 canonical PII categories.

ONNX, ~108 MB. The same `.onnx` ships across macOS / Windows / Linux —
the user's ONNX Runtime selects the Execution Provider at load time
(CoreML, DirectML, CUDA, or CPU baseline).

> **License: CC BY-NC 4.0** (non-commercial). For commercial use —
> production redaction, SaaS / API embedding, AI-agent privacy
> middleware, custom fine-tunes — contact **louis@screenpi.pe**. See
> [`LICENSE`](LICENSE).

## Headline numbers

`rfdetr_v8` on a held-out 221-image validation split (190 PII-bearing,
31 hard negatives) of the [screenpipe-pii-bench-image](https://github.com/screenpipe/screenpipe-pii-bench-image)
corpus, IoU ≥ 0.30:

| metric | this model | regex+OCR floor | Microsoft Presidio (published OSS) |
|---|---:|---:|---:|
| **zero-leak** (every gold span caught) | **95.3%** | 2.6% | 0.5% |
| **oversmash** (false-fire on negatives) | **0.0%** | 3.2% | 48.4% |
| micro-precision | 99% | 87% | 47% |
| micro-recall | 97% | 26% | 42% |
| macro-F1 | 0.871 | 0.318 | 0.190 |

Per-label recall (a few highlights): `private_person` 0.99 ·
`private_company` 1.00 · `private_repo` 1.00 · `private_url` 1.00 ·
`secret` 0.99 · `private_email` 0.98 · `private_phone` 0.92 ·
`private_address` 0.92.

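For concreteness, here is a minimal sketch of how the two headline
metrics are defined. This is not the benchmark's actual harness (the
harness is not redistributed); it is an illustration assuming per-image
sets of gold and predicted boxes in `[x1, y1, x2, y2]` form:

```python
# Illustrative only: a hypothetical re-implementation of the zero-leak and
# oversmash definitions, not the bench's own scoring code.

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    return inter / (area(a) + area(b) - inter + 1e-9)

def zero_leak_rate(pii_images, thr=0.30):
    """Fraction of PII-bearing images where EVERY gold box is matched."""
    ok = sum(
        all(any(iou(g, p) >= thr for p in preds) for g in gold)
        for gold, preds in pii_images  # [(gold_boxes, pred_boxes), ...]
    )
    return ok / len(pii_images)

def oversmash_rate(negative_images):
    """Fraction of hard-negative images that drew at least one detection."""
    return sum(bool(preds) for preds in negative_images) / len(negative_images)
```
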
### Latency (rfdetr_v8, 320×320 input, FP32)

| platform | EP | p50 |
|-------------------------------|-----------|----------:|
| macOS Apple Silicon (M-series) | CoreML | **66 ms** ([real-screen sample](https://github.com/screenpipe/screenpipe-pii-bench-image)) |
| macOS Apple Silicon (M-series) | CPU | 163 ms |
| Windows + DX12 GPU | DirectML | ~30-60 ms (estimated) |
| Linux + NVIDIA | CUDA | ~10-20 ms (estimated) |
| Linux/Windows CPU-only | CPU | ~140 ms |

The same `.onnx` runs everywhere — the Execution Provider is selected
at load time by the user's ONNX Runtime build. **No CUDA / Vulkan /
GPU vendor SDKs required on the consumer machine.**

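A minimal sketch of that cross-platform selection, plus a rough p50
check for your own machine. The table numbers came from the bench's own
harness; this is just a quick local measurement. The provider strings
are the standard ONNX Runtime identifiers, and filtering against
`ort.get_available_providers()` keeps the same code working on any build:

```python
import time

import numpy as np
import onnxruntime as ort

# Preference order: filter to what the local ORT build actually ships, so
# the same code works on macOS (CoreML), Windows (DirectML), Linux (CUDA).
PREFERRED = [
    "CoreMLExecutionProvider",
    "DmlExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",  # always present; the baseline fallback
]
providers = [p for p in PREFERRED if p in ort.get_available_providers()]

sess = ort.InferenceSession("rfdetr_v8.onnx", providers=providers)
print("running on:", sess.get_providers()[0])

# Rough p50 over 50 runs on a random 320x320 frame, warm-up excluded.
x = np.random.rand(1, 3, 320, 320).astype(np.float32)
feed = {sess.get_inputs()[0].name: x}
sess.run(None, feed)  # warm-up (EP graph compilation happens here)
times = []
for _ in range(50):
    t0 = time.perf_counter()
    sess.run(None, feed)
    times.append(time.perf_counter() - t0)
print(f"p50 ~ {sorted(times)[len(times) // 2] * 1000:.1f} ms")
```
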
## Why this exists (vs Presidio Image Redactor and friends)

The published baselines are trained on prose / generic-document
imagery. A typical screenpipe frame looks nothing like that:

- A Slack channel sidebar with 8 names, 12 channel mentions, 3 emails,
  and 1 pasted AWS key — all in 1440×900 px at 14 px font.
- A 1Password vault entry with structured `[Username | Password |
  Server | One-time password]` rows, half of which are masked dots.
- A Cursor workspace open on `.env.production` with five secret-shaped
  values stacked top-to-bottom.

These images are **dense** (10-20 PII spans per frame), **structured**
(rows / columns / aligned chrome), and **layout-cued** (a thing in the
"Username" cell is a username regardless of its surface text). A
generic NER-on-OCR pipeline misfires by over-redacting UI chrome
(48% false-fire on negatives in our bench, vs. 0% for this model).

If you're building an **agentic system that reads screen state** — a
desktop-control agent, a memory layer for browsing, anything that
streams screen captures into an LLM — this is the redactor designed
for that pipe.

## What it does

Per-image **object detection**. Given a JPG or PNG, it returns
`[(bbox, label, score)]`, where each detection is a region the model
thinks is PII, classified into one of the 12 canonical categories
shared with [`screenpipe/pii-redactor`](https://huggingface.co/screenpipe/pii-redactor):

```
private_person, private_email, private_phone, private_address,
private_url, private_company, private_repo, private_handle,
private_channel, private_id, private_date, secret
```

`secret` covers passwords, API keys, JWTs, DB connection strings,
PRIVATE-KEY block markers, etc. — same coverage as the text model.

## Inference

```python
# pip install onnxruntime pillow numpy
import numpy as np
import onnxruntime as ort
from PIL import Image

CLASSES = [
    "private_person", "private_email", "private_phone",
    "private_address", "private_url", "private_company",
    "private_repo", "private_handle", "private_channel",
    "private_id", "private_date", "secret",
]
INPUT_SIZE = 320  # rfdetr_v8 was exported at 320x320
THRESHOLD = 0.30

sess = ort.InferenceSession(
    "rfdetr_v8.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)

img = Image.open("screenshot.png").convert("RGB")
W, H = img.size
resized = img.resize((INPUT_SIZE, INPUT_SIZE), Image.BILINEAR)
arr = np.asarray(resized, dtype=np.float32) / 255.0
arr = (arr - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]  # ImageNet norm
arr = arr.transpose(2, 0, 1)[None].astype(np.float32)  # NCHW

boxes, logits = sess.run(None, {sess.get_inputs()[0].name: arr})
boxes = boxes[0]    # (300, 4) cx, cy, w, h normalized
logits = logits[0]  # (300, 13) — last channel is "no-object"

probs = 1.0 / (1.0 + np.exp(-logits[:, :12]))  # per-class sigmoid
best_class = probs.argmax(axis=1)
best_score = probs[np.arange(300), best_class]
keep = best_score >= THRESHOLD

detections = []  # (label, score, x, y, w, h) in pixel space
for q in np.where(keep)[0]:
    cx, cy, bw, bh = boxes[q]
    x1 = (cx - bw / 2) * W
    y1 = (cy - bh / 2) * H
    detections.append((CLASSES[best_class[q]], float(best_score[q]),
                       int(x1), int(y1), int(bw * W), int(bh * H)))
    print(f"  {CLASSES[best_class[q]]:18} score={best_score[q]:.2f} "
          f"bbox=[{int(x1)}, {int(y1)}, {int(bw*W)}, {int(bh*H)}]")
```

Full example with image overlay → `examples/inference.py`.

For Rust integration via the `ort` crate, see the
[`rust_smoke/`](https://github.com/screenpipe/screenpipe-pii-bench-image/tree/main/rust_smoke)
prototype and the production wiring in PR
[`screenpipe/screenpipe#3188`](https://github.com/screenpipe/screenpipe/pull/3188).

## Redacting the image (vs. just detecting)

This model **detects**. To actually remove the PII, draw a solid
rectangle over each detected bbox. Solid black, **not blur** — blur
is reversible by super-resolution attacks; opaque rectangles aren't.

```python
from PIL import ImageDraw

draw = ImageDraw.Draw(img)
for label, score, x, y, w, h in detections:  # from the snippet above
    draw.rectangle([x, y, x + w, y + h], fill=(0, 0, 0))
img.save("screenshot_redacted.png")
```

That's the entire redactor wrapper. ~5 lines.

## Architecture

- Base: [RF-DETR-Nano](https://github.com/roboflow/rf-detr) (Roboflow,
  ICLR 2026) — a DINOv2-backbone real-time detection transformer, ~25 M
  params, claimed to be the first real-time model to break 60 mAP on
  COCO.
- Fine-tuned at 320×320 input on a 2,833-image synthetic + WebPII
  union (synthetic via DOM-truth bbox extraction; WebPII via the
  [arxiv 2603.17357 release](https://arxiv.org/abs/2603.17357)).
- Output head: 300 detection queries × 13 channels (12 PII classes +
  no-object). Per-class sigmoid (NOT softmax — RF-DETR uses
  independent classification per query).
- Trained on a single A100 80 GB; ~100 minutes wall-clock for the
  best-EMA epoch.

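To sanity-check those head dimensions on the shipped file, introspect
the graph's declared I/O (a quick sketch; the exported output names may
differ from `boxes` / `logits`):

```python
import onnxruntime as ort

sess = ort.InferenceSession("rfdetr_v8.onnx",
                            providers=["CPUExecutionProvider"])
for i in sess.get_inputs():
    print("input :", i.name, i.shape)   # expect (1, 3, 320, 320)
for o in sess.get_outputs():
    print("output:", o.name, o.shape)   # expect (1, 300, 4) and (1, 300, 13)
```
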
## Training data

| source | size | labels | notes |
|---|---:|---|---|
| **synthetic bench** | 2,206 imgs | DOM-truth bboxes (pixel-perfect) | 9 templates rendered via headless Chromium with `data-span` attributes — labels come from the same DOM tree the browser laid out. |
| **WebPII** | 500 imgs (balanced sample) | bbox-labeled by the original authors | March 2026 release, e-commerce screenshots. Class imbalance capped at 2× our synthetic frequency. |
| **cascade auto-labels** | 100 imgs | OCR + text-PII model alignment | Old screenshots from this project's own bench, weakly labeled. |

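The DOM-truth labeling in the first row can be pictured like this: a
hypothetical sketch using Playwright, where the template path and the
`data-span` handling are assumptions rather than the project's actual
renderer:

```python
# Hypothetical sketch: pixel-perfect labels from the DOM, assuming each
# PII span in a template carries data-span="<label>". Not the real pipeline.
# pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1440, "height": 900})
    page.goto("file:///path/to/template.html")  # placeholder path
    page.screenshot(path="frame.png")

    rows = []
    for el in page.query_selector_all("[data-span]"):
        box = el.bounding_box()  # {'x', 'y', 'width', 'height'} in CSS px
        if box:
            rows.append((el.get_attribute("data-span"), box))
    browser.close()

# rows now pairs each label with the exact box the browser laid out.
```
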
**No real user data was used during fine-tuning.** Membership
inference attacks recover no real-user content because no real-user
content was in the training set. If you discover a failure mode on
your real screens, the project's recipe is to add a new SYNTHETIC
template that reproduces it — the screenshot becomes a bug report,
never a training row.

## Limitations

1. **Hand-curated gold set is small** — the bench's `data/` has 5
   manually built cases. Larger-scale held-out evaluation depends on
   the synthetic corpus, which is in-distribution by construction.
2. **`private_handle` and `private_id` recall is 0%** in the
   reference numbers because the val split has only 2 and 1 examples
   respectively. Don't deploy without a domain-specific eval pass.
3. **Synthetic-template ceiling.** 95.3% zero-leak is the bench's
   stable ceiling at this corpus size. Gains beyond that come from
   training on more real-screen failure modes (tracked in the bench's
   backlog).
4. **WebPII is e-commerce-heavy.** Adding the full WebPII split
   actually *hurt* dev-app accuracy in our experiments (rfdetr_v4 at
   90.5% zero-leak vs. v8's 95.3%). The 500-image balanced sample is
   our best-of-both compromise.
5. **CPU-only floors at ~140 ms p50.** INT8 quantization (planned;
   see the sketch after this list) gets that under 100 ms, but the
   FP32 release is what's on this page today.
6. **English-only.** Synthetic templates render Latin-script text;
   the WebPII supplement is English. CJK / Arabic / Cyrillic are not
   evaluated — don't deploy without a locale-specific eval.
7. **Adversarial robustness not tested.** A user who knows the
   detector exists could craft layouts that confuse it (handwritten
   PII, embedded-image PII, partial occlusion). Use this for
   honest-user privacy, not as a security boundary.

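On point 5: the planned INT8 path is unreleased, but ONNX Runtime's
stock dynamic quantization is the obvious first experiment. A sketch
under that assumption, not the project's actual recipe:

```python
# Sketch only: dynamic weight-only quantization with onnxruntime's built-in
# tool. The official INT8 release may use a different (static, calibrated)
# recipe, and accuracy must be re-benched before trusting the output.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="rfdetr_v8.onnx",
    model_output="rfdetr_v8.int8.onnx",
    weight_type=QuantType.QInt8,
)
```
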
## Files

```
rfdetr_v8.onnx       108 MB · the model · sha256 below
README.md            this file
LICENSE              CC BY-NC 4.0
NOTICE               attribution to base model + datasets
examples/
  inference.py       the snippet above, runnable
```

SHA-256 of `rfdetr_v8.onnx`:
`431acc0f0beb22a39572b7a50af4fc446e799840fb71320dc124fbd79a121eb3`

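To verify a download against that digest, a standard-library check:

```python
import hashlib

EXPECTED = "431acc0f0beb22a39572b7a50af4fc446e799840fb71320dc124fbd79a121eb3"

h = hashlib.sha256()
with open("rfdetr_v8.onnx", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
        h.update(chunk)
assert h.hexdigest() == EXPECTED, "checksum mismatch: re-download the model"
```
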
## Reproducing inference

```bash
git clone https://huggingface.co/screenpipe/pii-image-redactor
cd pii-image-redactor
git lfs pull
pip install onnxruntime pillow numpy
python examples/inference.py path/to/your_screenshot.png
```

Reproducing the eval scores requires the screenpipe-pii-bench-image
benchmark, which is not redistributed (it's the training corpus).
Contact **louis@screenpi.pe** for benchmark access or commercial
licensing.

## License

[CC BY-NC 4.0](LICENSE) — non-commercial use only. The base model
(RF-DETR) is Apache-2.0; its obligations are preserved (see
[`NOTICE`](NOTICE)).

For commercial licensing (production deployment, redistribution
rights, SaaS / API embedding, custom fine-tunes for your domain):
**louis@screenpi.pe**.

## Citation

```bibtex
@misc{screenpipe-pii-image-redactor-2026,
  title  = {screenpipe-pii-image-redactor: a screen-PII detector for
            accessibility-aware agents},
  author = {{screenpipe}},
  year   = {2026},
  url    = {https://huggingface.co/screenpipe/pii-image-redactor}
}
```