---
license: cc-by-nc-4.0
language:
- en
library_name: onnxruntime
pipeline_tag: object-detection
tags:
- pii
- privacy
- redaction
- object-detection
- rf-detr
- screen-capture
- accessibility
- computer-use
- agentic
- screenpipe
metrics:
- zero-leak
- oversmash
- precision
- recall
extra_gated_prompt: >-
This model is licensed CC BY-NC 4.0 (non-commercial). For commercial
use — production deployment, SaaS / API embedding, agent privacy
middleware, custom fine-tunes — contact louis@screenpi.pe.
---
# screenpipe-pii-image-redactor
> A [screenpipe](https://screenpi.pe) project. The image-modality
> companion to [`screenpipe/pii-redactor`](https://huggingface.co/screenpipe/pii-redactor).
A fine-tuned **image PII detector** for the same three surfaces an AI
agent sees a user's machine through:
1. **Screen captures** — JPGs / PNGs of the user's screen, rendered
text and structured chrome (Slack, Outlook, Cursor, Terminal,
Confluence, GitHub, 1Password, calendars, browsers).
2. **Computer-use traces** — the visual frames an agentic model
(Claude Computer Use, GPT operator, etc.) reads when it controls a
desktop.
3. **Accessibility-tree visualizations** — when an agent screenshots
what it inferred from the AX tree to debug a tool call.
These surfaces are **dense, multi-PII, semi-structured** in ways no
prose-trained PII detector handles well. The model returns pixel-space
bounding boxes for 12 canonical PII categories.
ONNX, ~108 MB. Same `.onnx` ships across macOS / Windows / Linux —
the user's ONNX Runtime selects the Execution Provider at load time
(CoreML, DirectML, CUDA, or CPU baseline).
> **License: CC BY-NC 4.0** (non-commercial). For commercial use —
> production redaction, SaaS / API embedding, AI-agent privacy
> middleware, custom fine-tunes — contact **louis@screenpi.pe**. See
> [`LICENSE`](LICENSE).
## Headline numbers
`rfdetr_v8` on a held-out 221-image validation split (190 PII-bearing,
31 hard negatives) of the [screenpipe-pii-bench-image](https://github.com/screenpipe/screenpipe-pii-bench-image)
corpus, IoU ≥ 0.30:
| metric | this model | regex+OCR floor | Microsoft Presidio (published OSS) |
|---|---:|---:|---:|
| **zero-leak** (every gold span caught) | **95.3%** | 2.6% | 0.5% |
| **oversmash** (false-fire on negatives) | **0.0%** | 3.2% | 48.4% |
| micro-precision | 99% | 87% | 47% |
| micro-recall | 97% | 26% | 42% |
| macro-F1 | 0.871 | 0.318 | 0.190 |
Per-label recall (a few highlights): `private_person` 0.99 ·
`private_company` 1.00 · `private_repo` 1.00 · `private_url` 1.00 ·
`secret` 0.99 · `private_email` 0.98 · `private_phone` 0.92 ·
`private_address` 0.92.
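For concreteness, a minimal sketch of how the two headline metrics
could be scored. This is not the bench harness (that lives in the
bench repo); the box format, the label-agnostic matching rule, and
the function names here are assumptions:
```python
def iou(a, b):
    # IoU of two boxes in (x1, y1, x2, y2) pixel coordinates.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def zero_leak(gold_per_image, preds_per_image, thresh=0.30):
    # Fraction of PII-bearing images where EVERY gold box is matched
    # by at least one prediction at IoU >= thresh.
    caught = sum(
        all(any(iou(g, p) >= thresh for p in preds) for g in gold)
        for gold, preds in zip(gold_per_image, preds_per_image)
    )
    return caught / len(gold_per_image)

def oversmash(preds_on_negatives):
    # Fraction of hard-negative images that drew ANY detection at all.
    fired = sum(1 for preds in preds_on_negatives if preds)
    return fired / len(preds_on_negatives)
```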
### Latency (rfdetr_v8, 320×320 input, FP32)
| platform | EP | p50 |
|-------------------------------|-----------|----------:|
| macOS Apple Silicon (M-series) | CoreML | **66 ms** ([real-screen sample](https://github.com/screenpipe/screenpipe-pii-bench-image)) |
| macOS Apple Silicon (M-series) | CPU | 163 ms |
| Windows + DX12 GPU | DirectML | ~30-60 ms (estimated) |
| Linux + NVIDIA | CUDA | ~10-20 ms (estimated) |
| Linux/Windows CPU-only | CPU | ~140 ms |
Same `.onnx` everywhere — the Execution Provider is selected at load
time by the user's ONNX Runtime build. **No CUDA / Vulkan / GPU vendor
SDKs required on the consumer's machine.**
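A minimal provider-selection sketch with the stock ONNX Runtime Python
API — the preference order below is an assumption for illustration,
not the shipped screenpipe wiring:
```python
import onnxruntime as ort

# Prefer a hardware EP when the local ORT build ships one; the CPU EP
# is always present as the baseline.
PREFERRED = [
    "CoreMLExecutionProvider",  # macOS
    "DmlExecutionProvider",     # Windows (DirectML)
    "CUDAExecutionProvider",    # Linux / Windows + NVIDIA
    "CPUExecutionProvider",
]
available = set(ort.get_available_providers())
providers = [p for p in PREFERRED if p in available]

sess = ort.InferenceSession("rfdetr_v8.onnx", providers=providers)
print("running on:", sess.get_providers()[0])
```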
## Why this exists (vs Presidio Image Redactor and friends)
The published baselines are trained on prose / generic-document
imagery. A typical screenpipe frame looks nothing like that:
- A Slack channel sidebar with 8 names, 12 channel mentions, 3 emails,
and 1 pasted AWS key — all in 1440×900 px at 14 px font.
- A 1Password vault entry with structured `[Username | Password |
Server | One-time password]` rows, half of which are masked dots.
- A Cursor workspace open on `.env.production` with five secret-shaped
values stacked top-to-bottom.
These images are **dense** (10-20 PII spans per frame), **structured**
(rows / columns / aligned chrome), and **layout-cued** (a thing in the
"Username" cell is a username regardless of its surface text). A
generic NER-on-OCR pipeline misfires by over-redacting UI chrome
(48% false-fire on negatives in our bench, vs. 0% for this model).
If you're building an **agentic system that reads screen state** — a
desktop-control agent, a memory layer for browsing, anything that
streams screen captures into an LLM — this is the redactor designed
for that pipe.
## What it does
Per-image **object detection**. Given a JPG or PNG, returns
`[(bbox, label, score)]` where each detection is a region the model
thinks is PII, classified into one of the 12 canonical categories
shared with [`screenpipe/pii-redactor`](https://huggingface.co/screenpipe/pii-redactor):
```
private_person, private_email, private_phone, private_address,
private_url, private_company, private_repo, private_handle,
private_channel, private_id, private_date, secret
```
`secret` covers passwords, API keys, JWTs, DB connection strings,
PRIVATE-KEY block markers, etc. — same coverage as the text model.
## Inference
```python
# pip install onnxruntime pillow numpy
import numpy as np
import onnxruntime as ort
from PIL import Image

CLASSES = [
    "private_person", "private_email", "private_phone",
    "private_address", "private_url", "private_company",
    "private_repo", "private_handle", "private_channel",
    "private_id", "private_date", "secret",
]
INPUT_SIZE = 320   # rfdetr_v8 was exported at 320x320
THRESHOLD = 0.30

sess = ort.InferenceSession(
    "rfdetr_v8.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)

# Preprocess: resize to the export resolution, ImageNet-normalize, NCHW.
img = Image.open("screenshot.png").convert("RGB")
W, H = img.size
resized = img.resize((INPUT_SIZE, INPUT_SIZE), Image.BILINEAR)
arr = np.asarray(resized, dtype=np.float32) / 255.0
arr = (arr - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
arr = arr.transpose(2, 0, 1)[None].astype(np.float32)  # NCHW

boxes, logits = sess.run(None, {sess.get_inputs()[0].name: arr})
boxes = boxes[0]    # (300, 4) cx, cy, w, h — normalized to [0, 1]
logits = logits[0]  # (300, 13) — last channel is "no-object"

probs = 1.0 / (1.0 + np.exp(-logits[:, :12]))  # per-class sigmoid
best_class = probs.argmax(axis=1)
best_score = probs[np.arange(len(probs)), best_class]
keep = best_score >= THRESHOLD

detections = []  # consumed by the redaction snippet below
for q in np.where(keep)[0]:
    cx, cy, bw, bh = boxes[q]
    x1 = (cx - bw / 2) * W
    y1 = (cy - bh / 2) * H
    detections.append({
        "label": CLASSES[best_class[q]],
        "score": float(best_score[q]),
        "bbox": (x1, y1, bw * W, bh * H),  # pixel-space x, y, w, h
    })
    print(f"  {CLASSES[best_class[q]]:18} score={best_score[q]:.2f} "
          f"bbox=[{int(x1)}, {int(y1)}, {int(bw*W)}, {int(bh*H)}]")
```
Full example with image overlay → `examples/inference.py`.
For Rust integration via the `ort` crate, see the
[`rust_smoke/`](https://github.com/screenpipe/screenpipe-pii-bench-image/tree/main/rust_smoke)
prototype and the production wiring in PR
[`screenpipe/screenpipe#3188`](https://github.com/screenpipe/screenpipe/pull/3188).
## Redacting the image (vs. just detecting)
This model **detects**. To actually remove the PII, draw a solid
rectangle over each detected bbox. Solid black, **not blur** — blur
is reversible by super-resolution attacks; opaque rectangles aren't.
```python
from PIL import ImageDraw

draw = ImageDraw.Draw(img)
for det in detections:  # from the inference snippet above
    x, y, w, h = det["bbox"]
    draw.rectangle([x, y, x + w, y + h], fill=(0, 0, 0))
img.save("screenshot_redacted.png")
```
That's the entire redactor wrapper — a handful of lines.
## Architecture
- Base: [RF-DETR-Nano](https://github.com/roboflow/rf-detr) (Roboflow,
  ICLR 2026) — a DINOv2-backbone real-time detection transformer, ~25 M
  params, which Roboflow claims is the first real-time model to break
  60 mAP on COCO.
- Fine-tuned at 320×320 input on a 2,833-image synthetic + WebPII
union (synthetic via DOM-truth bbox extraction; WebPII via the
[arxiv 2603.17357 release](https://arxiv.org/abs/2603.17357)).
- Output head: 300 detection queries × 13 channels (12 PII classes +
  no-object). Per-class sigmoid (NOT softmax — RF-DETR uses
  independent classification per query); a quick shape check follows
  this list.
- Trained on a single A100 80 GB; ~100 minutes wall-clock for the
best-EMA epoch.
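A quick sanity check of those output shapes against your download (a
sketch; the output order is assumed to match the inference snippet
above):
```python
import onnxruntime as ort

sess = ort.InferenceSession("rfdetr_v8.onnx")
for out in sess.get_outputs():
    # expect boxes (N, 300, 4) and logits (N, 300, 13)
    print(out.name, out.shape)
```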
## Training data
| source | size | labels | notes |
|---|---:|---|---|
| **synthetic bench** | 2,206 imgs | DOM-truth bboxes (pixel-perfect) | 9 templates rendered via headless Chromium with `data-span` attributes — labels come from the same DOM tree the browser laid out. |
| **WebPII** | 500 imgs (balanced sample) | bbox-labeled by the original authors | March 2026 release, e-commerce screenshots. Class-imbalance capped at 2× our synthetic frequency. |
| **cascade auto-labels** | 100 imgs | OCR + text-PII model alignment | Old screenshots from this project's own bench, weakly labeled. |
**No real user data was used during fine-tuning.** Membership
inference attacks recover no real-user content because no real-user
content was in the training set. If you discover a failure mode on
your real screens, the project's recipe is to add a new SYNTHETIC
template that reproduces it — the screenshot becomes a bug report,
never a training row.
## Limitations
1. **Hand-curated gold set is small** — bench `data/` has 5
manually-built cases. Larger-scale held-out evaluation depends on
the synthetic corpus, which is in-distribution by construction.
2. **`private_handle` and `private_id` recall are 0%** in the
reference numbers because the val split has only 2 and 1 examples
respectively. Don't deploy without a domain-specific eval pass.
3. **Synthetic-template ceiling.** 95.3% zero-leak is the bench's
   stable ceiling at this corpus size. Gains beyond that will come
   from training on more real-screen failure modes (tracked in the
   bench's backlog).
4. **WebPII is e-commerce-heavy.** Adding the full WebPII split
actually *hurt* dev-app accuracy in our experiments (rfdetr_v4 at
90.5% zero-leak vs. v8's 95.3%). The 500-image balanced sample is
our best-of-both compromise.
5. **CPU-only floors at ~140 ms p50.** INT8 quantization (planned)
   should get that under 100 ms, but the FP32 release is what's on
   this page today; a self-quantization sketch follows this list.
6. **English-only.** Synthetic templates render Latin-script text;
the WebPII supplement is English. CJK / Arabic / Cyrillic not
evaluated — don't deploy without a locale-specific eval.
7. **Adversarial robustness not tested.** A user who knows the
detector exists could craft layouts that confuse it (handwritten
PII, embedded-image PII, partial occlusion). Use this for
honest-user privacy, not as a security boundary.
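On limitation 5: a sketch of what self-quantization could look like
with ONNX Runtime's stock dynamic-quantization tooling. This is NOT
the planned INT8 release — accuracy after quantizing this way is
unvalidated, and the zero-leak numbers above apply only to the FP32
artifact:
```python
# Sketch only: stock dynamic INT8 quantization. Unvalidated — the
# headline metrics on this page were measured on the FP32 model.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="rfdetr_v8.onnx",
    model_output="rfdetr_v8.int8.onnx",
    weight_type=QuantType.QInt8,
)
```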
## Files
```
rfdetr_v8.onnx 108 MB · the model · sha256 below
README.md this file
LICENSE CC BY-NC 4.0
NOTICE attribution to base model + datasets
examples/
inference.py the snippet above, runnable
```
SHA-256 of `rfdetr_v8.onnx`:
`431acc0f0beb22a39572b7a50af4fc446e799840fb71320dc124fbd79a121eb3`
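To verify the download against that hash before loading it (a
standard-library sketch):
```python
import hashlib

EXPECTED = "431acc0f0beb22a39572b7a50af4fc446e799840fb71320dc124fbd79a121eb3"
h = hashlib.sha256()
with open("rfdetr_v8.onnx", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
assert h.hexdigest() == EXPECTED, "hash mismatch — re-pull with git lfs"
```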
## Reproducing inference
```bash
git clone https://huggingface.co/screenpipe/pii-image-redactor
cd pii-image-redactor
git lfs pull
pip install onnxruntime pillow numpy
python examples/inference.py path/to/your_screenshot.png
```
Reproducing the eval scores requires the screenpipe-pii-bench-image
benchmark, which is not redistributed (it's the training corpus).
Contact **louis@screenpi.pe** for benchmark access or commercial
licensing.
## License
[CC BY-NC 4.0](LICENSE) — non-commercial use only. The base model
(RF-DETR) is Apache-2.0; obligations are preserved (see
[`NOTICE`](NOTICE)).
For commercial licensing (production deployment, redistribution
rights, SaaS / API embedding, custom fine-tunes for your domain):
**louis@screenpi.pe**.
## Citation
```bibtex
@misc{screenpipe-pii-image-redactor-2026,
title = {screenpipe-pii-image-redactor: a screen-PII detector for
accessibility-aware agents},
author = {{screenpipe}},
year = {2026},
url = {https://huggingface.co/screenpipe/pii-image-redactor}
}
```