public-ready: vague methodology, headline numbers only
README.md
tags:
- screenpipe
base_model:
- openai/privacy-filter
datasets:
- ai4privacy/pii-masking-300k
metrics:
- f1
- recall
sees a user's machine through**:

screen recordings. Mix of window-title-shaped artifacts, app chrome,
and occasional long-form (emails, docs).
3. **Computer-use traces** — what an agentic model (Claude Computer Use,
   GPT operator, etc.) reads when it controls a desktop: the above plus
   interaction-trace metadata.
These surfaces are short, sparse-context, and full of identifiers that
slip past redactors trained on chat-style prose. This model is fine-tuned
specifically for them — while still handling long-form text at
competitive accuracy.

Built on top of the [OpenAI Privacy Filter](https://github.com/openai/privacy-filter)
(1.5B parameters, 50M active). Fine-tuned on synthetic
accessibility / window-title / OCR data, a slice of
[ai4privacy/pii-masking-300k](https://huggingface.co/datasets/ai4privacy/pii-masking-300k),
and targeted secret-shape augmentation (API keys, JWTs, DB connection
strings, private-key block markers, password prompts).

> **License: CC BY-NC 4.0** (non-commercial). For commercial use —
> production redaction, SaaS / API embedding, AI-agent privacy
> middleware, custom fine-tunes — contact **hi@louis030195.com**. See
> [`LICENSE`](LICENSE).

## Headline numbers

| | base OPF | **this model** |
|---|---:|---:|
| Accessibility / window-title PII zero-leak | 38.6% (33.6–43.8) | **79.1% (74.8–83.5)** |
| Long-form PII zero-leak (English) | 14.0% (11.7–16.2) | **77.5% (74.5–80.3)** |
| Long-form PII macro-F1 (English) | 0.591 | **0.934** |
| Targeted secret-redaction (34 realistic shapes) | not measured | **31/34** |
| p50 inference latency (CUDA) | ~23 ms | ~23 ms |

95% bootstrap CIs in brackets. Zero-leak: % of cases where the model
caught all gold spans (the metric that matters for privacy).

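Zero-leak as defined here is per-example and all-or-nothing: one missed gold span fails the whole example. A minimal sketch of the metric (character-offset `(start, end)` tuples are an assumption; this is not the project's eval harness):

```python
# Sketch of the per-example "zero-leak" metric described above: an
# example counts as clean only if EVERY gold span is covered by some
# predicted span.

def covered(gold, predicted):
    """True if gold span [gs, ge) lies inside some predicted span."""
    gs, ge = gold
    return any(ps <= gs and ge <= pe for ps, pe in predicted)

def zero_leak_rate(examples):
    """examples: list of (gold_spans, predicted_spans) pairs."""
    clean = sum(
        all(covered(g, pred) for g in gold) for gold, pred in examples
    )
    return clean / len(examples)

examples = [
    ([(0, 5), (10, 14)], [(0, 5), (9, 15)]),  # all gold covered -> clean
    ([(0, 5), (10, 14)], [(0, 5)]),           # one miss -> whole example leaks
]
print(zero_leak_rate(examples))  # -> 0.5
```

This strictness is why zero-leak numbers run far below span-level F1 on the same predictions.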
## Why this exists (vs the base Privacy Filter)

The OpenAI Privacy Filter (and most other public PII redactors) is
trained on prose-shaped data. A typical accessibility-tree node, OCR'd
window title, or computer-use log line looks nothing like that:

```
AXButton[Send to marcus@helios-ai.io]
Welcome | Acme Corp | xAI Console
```

These are 30-character strings with one or two PII tokens and almost
no surrounding context. A model trained on chat corpora will conflate
brand names with people, miss `Arc | Marcus Chen` because it expects
sentence context, and tag `Raycast` and `Claude` as people. The base
Privacy Filter scored 38.6 % zero-leak on this surface; this model
scores 79.1 %.

If you're building an **agentic system that reads screen state** — a
desktop-control agent, a memory layer for browsing, anything that
streams accessibility / OCR / screen-capture data into an LLM — this
is the redactor designed for that pipe.

## What it does

```
private_channel, private_id, private_date, secret
```

`secret` covers passwords, API keys, JWTs, DB connection strings,
PRIVATE-KEY block markers, etc. The model catches 31 of 34 realistic
secret shapes — see Limitations for the lone known miss.

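For intuition, the secret shapes above are the kind of thing a regex baseline would target. A rough sketch (patterns are illustrative assumptions, not this model's detection logic; the point of a learned tagger is precisely that fixed patterns miss novel shapes):

```python
import re

# Illustrative regexes for a few of the secret shapes named above
# (JWTs, DB connection strings, private-key block markers, API keys).
# These are NOT the model's training spec or detection logic.
SECRET_SHAPES = {
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+\b"),
    "db_conn": re.compile(r"\b\w+://\w+:[^@\s]+@[\w.-]+(?::\d+)?/\w+"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic_key": re.compile(r"\b(?:sk|pk|api)[-_][A-Za-z0-9]{16,}\b"),
}

def secret_shapes(text: str) -> set:
    """Names of the illustrative shapes that match `text`."""
    return {name for name, pat in SECRET_SHAPES.items() if pat.search(text)}

print(secret_shapes("postgres://admin:hunter2@db.internal:5432/prod"))
# -> {'db_conn'}
```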
## Architecture

Identical to the upstream Privacy Filter. We did not modify the model
architecture. We re-initialized the output head for our 12-label space
(49 output classes after BIOES tagging + O), fine-tuned on a mixed
corpus, with `n_ctx` raised from 128 → 256 to accommodate sentence-level
context.

| | |
|---|---|
| Base | OpenAI Privacy Filter (1.5B params, 50M active) |
| Output head | 49-class (12 × BIOES + O), 29 rows copied exactly from base, 20 fallback (zero-init) |
| Dtype | bfloat16 |
| Encoding | `o200k_base` |
| Training | 3 epochs, batch_size 4, lr 1e-4, n_ctx 256 |
| Hardware | 1 × NVIDIA A100 SXM4 40GB |
| Training time | ~11 minutes |
| Best epoch | 2 (val_loss 0.106) |

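The 49-class arithmetic can be checked directly: 12 entity labels × 4 positional tags (B/I/E/S) plus one shared O. A quick sketch with placeholder label names (the card does not list all 12):

```python
# The 49-class head: 12 entity labels x 4 positional tags (B, I, E, S)
# plus a single shared O ("outside") class. Label names here are
# placeholders, not the card's actual 12-label taxonomy.
LABELS = [f"label_{i}" for i in range(12)]

def bioes_classes(labels):
    classes = ["O"]
    for label in labels:
        classes += [f"{tag}-{label}" for tag in ("B", "I", "E", "S")]
    return classes

print(len(bioes_classes(LABELS)))  # -> 49
```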
## Inference

```
for span in out.detected_spans:
    ...
```

See [`examples/inference.py`](examples/inference.py) for a longer example
covering window titles, long-form text, and secrets.

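Downstream of detection, applying spans to produce redacted text is usually done right-to-left so earlier offsets stay valid. A sketch assuming plain `(start, end, label)` tuples (the actual `detected_spans` objects may expose different fields):

```python
# Redact text given character-offset spans. Assumes (start, end, label)
# tuples; the real detected_spans objects' fields may differ.
def redact(text: str, spans: list) -> str:
    # Replace right-to-left so earlier offsets are not invalidated.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label.upper()}]" + text[end:]
    return text

title = "AXButton[Send to marcus@helios-ai.io]"
spans = [(17, 36, "private_email")]
print(redact(title, spans))  # -> AXButton[Send to [PRIVATE_EMAIL]]
```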
## Benchmarks

| | zero-leak | oversmash | macro-F1 | recall |
|---|---:|---:|---:|---:|
| **this model** | **77.5% (74.5–80.3)** | 16.5% | **0.934** | **0.933** |
| previous internal version | 74.5% (71.8–77.5) | 9.1% | 0.763 | 0.932 |
| OpenAI Privacy Filter (base) | 14.0% (11.7–16.2) | 16.5% | 0.591 | 0.579 |

|
| 185 |
-
> citing the gap). Zero-leak is a strict, taxonomy-coupled metric: a
|
| 186 |
-
> single example counts as "leaked" if the model misses ANY gold span
|
| 187 |
-
> in it under our 12-class label mapping. The published OpenAI Privacy
|
| 188 |
-
> Filter result is **F1 ≈ 96 %** on PII-Masking-300k under THEIR
|
| 189 |
-
> 49-class taxonomy — that's a much more lenient setup. The base scores
|
| 190 |
-
> 14 % zero-leak under our metric for two compounding reasons:
|
| 191 |
-
>
|
| 192 |
-
> 1. **Label-space mismatch** dominates. We map 28 source 300k labels
|
| 193 |
-
> into our 12 classes; the base model can't predict our label names.
|
| 194 |
-
> On categories where the base's native taxonomy DOES align with ours
|
| 195 |
-
> (`private_email`, `private_phone`, `private_url`, `secret`), the
|
| 196 |
-
> base scores **0.90–1.00 recall** — strong. On categories where it
|
| 197 |
-
> doesn't (`private_id` covering IDCARD/SOCIALNUMBER/PASSPORT,
|
| 198 |
-
> `private_handle` covering USERNAME), it scores **0.00** by
|
| 199 |
-
> definition because it never emits the right label.
|
| 200 |
-
> 2. **Zero-leak is all-or-nothing per example.** With ~6 spans per
|
| 201 |
-
> 300k example and any unmappable category present, base fails the
|
| 202 |
-
> whole example. Token-level F1 (0.591 above) is the more honest
|
| 203 |
-
> cross-comparison number.
|
| 204 |
-
>
|
| 205 |
-
> The +63 pp claim **is** real and useful for the deployment context
|
| 206 |
-
> (anyone shipping a system that needs the screenpipe 12-class
|
| 207 |
-
> taxonomy gets +63 pp out of the box vs the base). It would be
|
| 208 |
-
> misleading to read it as "this model is 5× more accurate at PII
|
| 209 |
-
> detection" — that's not what the metric measures.
|
| 210 |
-
|
| 211 |
-
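Point 1's mapping is a many-to-one dictionary plus the convention that an unmappable prediction counts as a miss. A sketch (only the pairs named above are certain; source-side names `EMAIL`, `TEL`, `URL` are assumptions, and the full 28-entry map is not reproduced here):

```python
# Sketch of the 300k -> 12-class label mapping from point 1.
# IDCARD/SOCIALNUMBER/PASSPORT/USERNAME are named in the note above;
# EMAIL/TEL/URL source names are assumptions.
LABEL_MAP = {
    "EMAIL": "private_email",
    "TEL": "private_phone",
    "URL": "private_url",
    "IDCARD": "private_id",
    "SOCIALNUMBER": "private_id",
    "PASSPORT": "private_id",
    "USERNAME": "private_handle",
}

def mapped_recall(gold, predicted):
    """gold: aligned source labels; predicted: 12-class labels.
    A prediction scores only if it equals the mapped gold label."""
    hits = sum(LABEL_MAP.get(g) == p for g, p in zip(gold, predicted))
    return hits / len(gold)

# A model that never emits `private_id` scores 0 on IDCARD by definition:
print(mapped_recall(["IDCARD", "EMAIL"], ["ID", "private_email"]))  # -> 0.5
```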
### Multilingual generalization (n=200 per language)

This model was trained on English-only data. Cross-language transfer:

| Language | this model zero-leak | base zero-leak | Δ vs base |
|---|---:|---:|---:|
| English | 76.8% (70.1–83.1) | 14.0% (11.7–16.2) | +62.8 |
| Spanish | 73.2% (66.5–79.3) | — | — |
| Italian | 70.8% (64.3–77.4) | — | — |
| German | 70.6% (63.5–77.1) | 11.8% (7.6–16.5) | +58.8 |
| French | 68.1% (61.5–75.3) | 14.8% (9.9–20.3) | +53.3 |
| Dutch | 56.1% (48.9–63.3) | — | — |

Romance + Germanic languages drop −3 to −9 pp from English. **Dutch is
the weakest at −20.7 pp** — flagged as a known gap.

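The bracketed ranges here, as elsewhere in this card, are 95% bootstrap confidence intervals over examples. A generic percentile-bootstrap sketch (not the project's eval code):

```python
import random

# Percentile-bootstrap 95% CI for a zero-leak rate over per-example
# 0/1 outcomes. Generic resampling; not the project's eval harness.
def bootstrap_ci(outcomes, n_boot=2000, seed=0):
    rng = random.Random(seed)
    n = len(outcomes)
    rates = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    return rates[int(0.025 * n_boot)], rates[int(0.975 * n_boot)]

# 200 simulated examples at 70% zero-leak, matching the n=200 setup:
outcomes = [1] * 140 + [0] * 60
lo, hi = bootstrap_ci(outcomes)
print(f"{lo:.1%} - {hi:.1%}")  # roughly a +/-6 pp interval around 70%
```

At n=200 the interval is wide, which is why the per-language gaps above should be read against their CIs.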
### Per-category recall (English, n=1000)

| Category | base | this model |
|---|---:|---:|
| `private_address` | 0.65 | 0.93 |
| `private_date` | 0.54 | 0.96 |
| `private_email` | 1.00 | 0.97 |
| `private_handle` | 0.00 | 0.82 |
| `private_id` | 0.00 | 0.95 |
| `private_person` | 0.71 | 0.93 |
| `private_phone` | 0.97 | 0.93 |
| `private_url` | 0.98 | 1.00 |
| `secret` | 0.90 | 0.90 |

## Limitations and known failure modes

1. **Sudo / login password prompts leak.** A pattern like `[sudo]
   password for alice: hunter2` results in the username being redacted
   but the password surviving. This is the lone known hard miss in the
   targeted secret probe; mitigate with an OS-level
   keystroke-suppression policy alongside this model.
2. **Dutch is the weakest language** at −20.7 pp from English. Romance
   and Germanic languages other than Dutch generalize at −3 to −9 pp;
   Indic, Asian, African, Cyrillic scripts NOT evaluated at meaningful
   sample sizes — don't deploy without a locale-specific eval pass.
3. **The long-form eval draws on the training distribution** (a slice
   of PII-Masking-300k was used for fine-tuning, so eval and training
   share the same distribution). The window-title score (79.1 %) is the
   cleaner generalization signal.
4. **Synthetic training data only.** Validated qualitatively on real
   screen captures, but the corpus is fully synthetic. Validate on
   YOUR data before deploying.
5. **Single-annotator gold labels** on the in-bench data. Absolute
   numbers may shift under a 2nd-annotator pass; relative ordering
   between adapters is more stable.
6. **Oversmash is non-trivial.** 7.8 % on window titles, 16.5 % on
   long-form text. The model over-redacts. Acceptable for privacy-first
   deployments; flag if you need clean OCR text downstream.

## Reproducing the inference numbers

Reproducing the eval scores requires our held-out benchmark, which is
not redistributed with this repository. Inference is reproducible from
the artifacts in this repo:

```bash
git clone https://huggingface.co/screenpipe/pii-redactor
cd pii-redactor

# pull the model weights via Git LFS
git lfs pull

# install opf (currently from source)
pip install git+https://github.com/openai/privacy-filter.git

# run the inference example
python examples/inference.py
```

Contact **hi@louis030195.com** for benchmark access or commercial
licensing.

## License

[CC BY-NC 4.0](LICENSE) — non-commercial use only. The base model is
Apache-2.0; obligations are preserved (see [`NOTICE`](NOTICE)).

For commercial licensing (production deployment, redistribution rights,
SaaS / API embedding, custom fine-tunes for your domain): **hi@louis030195.com**.