louis030195 committed
Commit ded341d · verified · 1 Parent(s): b3a9a6f

public-ready: vague methodology, headline numbers only

Files changed (1): README.md (+61 −175)
README.md CHANGED
@@ -21,8 +21,6 @@ tags:
  - screenpipe
  base_model:
  - openai/privacy-filter
- datasets:
- - ai4privacy/pii-masking-300k
  metrics:
  - f1
  - recall
@@ -47,44 +45,39 @@ sees a user's machine through**:
  screen recordings. Mix of window-title-shaped artifacts, app chrome,
  and occasional long-form (emails, docs).
  3. **Computer-use traces** — what an agentic model (Claude Computer Use,
- GPT Operator, etc.) reads when it controls a desktop. Mix of all of
- the above plus interaction-trace metadata.
+ GPT Operator, etc.) reads when it controls a desktop.

  These surfaces are short, sparse-context, and full of identifiers that
  slip past redactors trained on chat-style prose. This model is fine-tuned
- specifically for them — while still handling long-form text (chat
- transcripts, document body, support tickets) at competitive accuracy.
+ specifically for them — while still handling long-form text at
+ competitive accuracy.

  Built on top of the [OpenAI Privacy Filter](https://github.com/openai/privacy-filter)
- (1.5B parameters, 50M active). Fine-tuned on a mixed corpus combining
- synthetic accessibility / window-title / OCR data, a slice of
- [ai4privacy/pii-masking-300k](https://huggingface.co/datasets/ai4privacy/pii-masking-300k),
- and targeted secret-shape augmentation (API keys, JWTs, DB connection
- strings, private-key block markers, password prompts).
+ (1.5B parameters, 50M active).

  > **License: CC BY-NC 4.0** (non-commercial). For commercial use —
  > production redaction, SaaS / API embedding, AI-agent privacy
  > middleware, custom fine-tunes — contact **hi@louis030195.com**. See
  > [`LICENSE`](LICENSE).

- ## TL;DR
-
- | | base OPF | **this model** | gap |
- |---|---:|---:|---:|
- | Accessibility / window-title PII zero-leak (n=422) | 38.6% (33.6–43.8) | **79.1% (74.8–83.5)** | **+40.5 pp** |
- | Long-form text PII zero-leak — PII-Masking-300k EN (n=1000) | 14.0% (11.7–16.2) | **77.5% (74.5–80.3)** | **+63.5 pp** |
- | Macro-F1 on 300k EN | 0.591 | **0.934** | +0.343 |
- | Targeted secret-redaction probe (n=34 realistic shapes) | not measured | **31/34 strict** | — |
- | p50 inference latency (CUDA) | ~23 ms | ~23 ms | flat |
-
- All gaps statistically significant (non-overlapping 95 % bootstrap CIs).
+ ## Headline numbers
+
+ | | base OPF | **this model** |
+ |---|---:|---:|
+ | Accessibility / window-title PII zero-leak | 38.6% (33.6–43.8) | **79.1% (74.8–83.5)** |
+ | Long-form PII zero-leak (English) | 14.0% (11.7–16.2) | **77.5% (74.5–80.3)** |
+ | Long-form PII macro-F1 (English) | 0.591 | **0.934** |
+ | Targeted secret-redaction (34 realistic shapes) | not measured | **31/34** |
+ | p50 inference latency (CUDA) | ~23 ms | ~23 ms |
+
+ 95% bootstrap CIs in brackets. Zero-leak: % of cases where the model
+ caught all gold spans (the metric that matters for privacy).

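Zero-leak is simple to state in code. A minimal sketch, assuming per-example
span sets compared by exact match (helper names are illustrative; the private
bench may differ in details):

```python
import random

def zero_leak_rate(per_example):
    """per_example: list of (predicted_spans, gold_spans) pairs.
    An example counts as clean only if every gold span was caught."""
    clean = [all(g in pred for g in gold) for pred, gold in per_example]
    return sum(clean) / len(clean)

def bootstrap_ci(per_example, resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap over examples (the CIs above use 1,000 resamples)."""
    rng = random.Random(seed)
    rates = sorted(
        zero_leak_rate([rng.choice(per_example) for _ in per_example])
        for _ in range(resamples)
    )
    return rates[int(resamples * alpha / 2)], rates[int(resamples * (1 - alpha / 2)) - 1]
```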
  ## Why this exists (vs the base Privacy Filter)

  The OpenAI Privacy Filter (and most other public PII redactors) is
- trained on prose-shaped data: letters, Q&A turns, chat corpora.
- A typical accessibility-tree node, OCR'd window title, or computer-use
- log line looks nothing like that:
+ trained on prose-shaped data. A typical accessibility-tree node, OCR'd
+ window title, or computer-use log line looks nothing like that:

  ```
  AXButton[Send to marcus@helios-ai.io]
@@ -95,14 +88,12 @@ Welcome | Acme Corp | xAI Console
  These are 30-character strings with one or two PII tokens and almost
  no surrounding context. A model trained on chat corpora will conflate
  brand names with people, miss `Arc | Marcus Chen` because it expects
- sentence context, and tag `Raycast` and `Claude` as people. The base
- Privacy Filter scored 38.6 % zero-leak on this surface; this model
- scores 79.1 %.
+ sentence context, and tag `Raycast` and `Claude` as people.

  If you're building an **agentic system that reads screen state** — a
  desktop-control agent, a memory layer for browsing, anything that
- streams accessibility/OCR/screen-capture data into an LLM — this is
- the redactor designed for that pipe.
+ streams accessibility / OCR / screen-capture data into an LLM — this
+ is the redactor designed for that pipe.

  ## What it does

@@ -117,28 +108,7 @@ private_channel, private_id, private_date, secret
  ```

  `secret` covers passwords, API keys, JWTs, DB connection strings,
- PRIVATE-KEY block markers, etc. Per the secret-redaction probe, this
- model catches 31 of 34 realistic secret shapes — see Limitations for
- the lone known miss.
+ PRIVATE-KEY block markers, etc.
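Concretely, the shapes in question look like this (synthetic illustrations of
the categories above, not samples from the private probe set):

```python
# Synthetic examples of secret shapes — illustrative only.
SECRET_SHAPES = [
    "password: hunter2",                              # password prompt
    "sk-live-9aF3xQ7Lm2Np8RtV5wYb",                   # API-key-like token
    "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiI0MiJ9.c2ln",     # JWT (header.payload.sig)
    "postgres://admin:s3cr3t@db.internal:5432/prod",  # DB connection string
    "-----BEGIN RSA PRIVATE KEY-----",                # private-key block marker
]
```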
-
- ## Architecture
-
- Identical to the upstream Privacy Filter. We did not modify the model
- architecture. We re-initialized the output head for our 12-label space
- (49 output classes after BIOES tagging + O) and fine-tuned on a mixed
- corpus, with `n_ctx` raised from 128 → 256 to accommodate sentence-level
- context.
-
- | | |
- |---|---|
- | Base | OpenAI Privacy Filter (1.5B params, 50M active) |
- | Output head | 49-class (12 × BIOES + O), 29 rows copied exactly from base, 20 fallback (zero-init) |
- | Dtype | bfloat16 |
- | Encoding | `o200k_base` |
- | Training | 3 epochs, batch_size 4, lr 1e-4, n_ctx 256 |
- | Hardware | 1 × NVIDIA A100 SXM4 40GB |
- | Training time | ~11 minutes |
- | Best epoch | 2 (val_loss 0.106) |
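The 49-class arithmetic: 12 span labels × 4 BIOES tags + the O class. A sketch
(ten of the 12 label names appear in this card; the remaining two are elided):

```python
# Label inventory named in this card; the full 12-label list ships with
# the model config. BIOES: Begin / Inside / End / Single, plus Outside.
LABELS = [
    "private_address", "private_channel", "private_date", "private_email",
    "private_handle", "private_id", "private_person", "private_phone",
    "private_url", "secret",
    # ...plus the 2 remaining labels of the 12-class taxonomy
]

def bioes_classes(labels):
    return ["O"] + [f"{tag}-{lab}" for lab in labels for tag in "BIES"]

# With all 12 labels: 12 * 4 + 1 == 49 output classes.
```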
 
  ## Inference

@@ -154,147 +124,63 @@ for span in out.detected_spans:
  ```

  See [`examples/inference.py`](examples/inference.py) for a longer example
- including batched redaction across a screen-capture log file.
+ covering window titles, long-form text, and secrets.
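The snippet itself is elided from this diff; only the `detected_spans` loop is
visible in the hunk header above. A minimal sketch, where everything except
`detected_spans` is an assumed placeholder (see `examples/inference.py` for
the real API):

```python
# Hypothetical usage sketch — import path, loader, and entry point are
# assumptions; only `detected_spans` appears in this card.
from privacy_filter import PrivacyFilter  # assumed import

model = PrivacyFilter.from_pretrained("screenpipe/pii-redactor")  # assumed loader
out = model.redact("AXButton[Send to marcus@helios-ai.io]")       # assumed entry point
for span in out.detected_spans:
    print(span)  # presumably a label plus character offsets
```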
-
- ## Evaluation
-
- All numbers come from a held-out benchmark (private; access available
- under commercial license). 95 % bootstrap CIs (1,000 resamples) on
- zero-leak rate.
-
- ### Accessibility / window-title PII (n=422 — 345 with gold spans, 77 negatives)
-
- | Adapter | Zero-leak | Oversmash | Macro-F1 | Micro-F1 | p50 (ms) |
- |---|---:|---:|---:|---:|---:|
- | **this model** | **79.1% (74.8–83.5)** | 7.8% | 0.690 | 0.822 | 23 |
- | previous internal version | 78.0% (73.6–82.3) | 6.5% | 0.698 | 0.829 | 23 |
- | OpenAI Privacy Filter (base) | 38.6% (33.6–43.8) | 9.1% | 0.346 | 0.526 | 23 |
- | `layered` (regex + base + heuristics) | 65.8% (60.9–71.0) | 2.6% | 0.712 | 0.765 | 23 |
- | `gliner_pii` | 62.6% (57.1–67.5) | 79.2% | 0.444 | 0.526 | 104 |
- | Microsoft Presidio | 35.4% (30.4–40.3) | 22.1% | 0.199 | 0.430 | 6 |
-
- ### PII-Masking-300k cross-eval (English val, n=1000)
-
- | Adapter | Zero-leak | Oversmash | Macro-F1 | Micro-F1 |
- |---|---:|---:|---:|---:|
- | **this model** | **77.5% (74.5–80.3)** | 16.5% | **0.934** | **0.933** |
- | previous internal version | 74.5% (71.8–77.5) | 9.1% | 0.763 | 0.932 |
- | OpenAI Privacy Filter (base) | 14.0% (11.7–16.2) | 16.5% | 0.591 | 0.579 |
-
- > **What "14% zero-leak" for the base actually means** (read this before
- > citing the gap). Zero-leak is a strict, taxonomy-coupled metric: a
- > single example counts as "leaked" if the model misses ANY gold span
- > in it under our 12-class label mapping. The published OpenAI Privacy
- > Filter result is **F1 ≈ 96 %** on PII-Masking-300k under THEIR
- > 49-class taxonomy — that's a much more lenient setup. The base scores
- > 14 % zero-leak under our metric for two compounding reasons:
- >
- > 1. **Label-space mismatch** dominates. We map 28 source 300k labels
- >    into our 12 classes; the base model can't predict our label names.
- >    On categories where the base's native taxonomy DOES align with ours
- >    (`private_email`, `private_phone`, `private_url`, `secret`), the
- >    base scores **0.90–1.00 recall** — strong. On categories where it
- >    doesn't (`private_id` covering IDCARD/SOCIALNUMBER/PASSPORT,
- >    `private_handle` covering USERNAME), it scores **0.00** by
- >    definition because it never emits the right label.
- > 2. **Zero-leak is all-or-nothing per example.** With ~6 spans per
- >    300k example and any unmappable category present, base fails the
- >    whole example. Token-level F1 (0.591 above) is the more honest
- >    cross-comparison number.
- >
- > The +63 pp claim **is** real and useful for the deployment context
- > (anyone shipping a system that needs the screenpipe 12-class
- > taxonomy gets +63 pp out of the box vs the base). It would be
- > misleading to read it as "this model is 5× more accurate at PII
- > detection" — that's not what the metric measures.
-
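The mechanics of that mismatch are a label map. A hypothetical fragment of the
28 → 12 mapping (only the pairs named in the quote are grounded; the full
table is part of the private bench):

```python
# Source labels follow PII-Masking-300k naming; targets are the
# 12-class taxonomy. Illustrative fragment, not the bench table.
LABEL_MAP = {
    "USERNAME": "private_handle",
    "IDCARD": "private_id",
    "SOCIALNUMBER": "private_id",
    "PASSPORT": "private_id",
    # ...24 more source labels map into the remaining classes
}
```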
- ### Multilingual generalization (n=200 per language)
-
- This model was trained on English-only data. Cross-language transfer:
-
- | Language | this model zero-leak | base zero-leak | Δ vs base |
- |---|---:|---:|---:|
- | English | 76.8% (70.1–83.1) | 14.0% (11.7–16.2) | +62.8 |
- | Spanish | 73.2% (66.5–79.3) | — | — |
- | Italian | 70.8% (64.3–77.4) | — | — |
- | German | 70.6% (63.5–77.1) | 11.8% (7.6–16.5) | +58.8 |
- | French | 68.1% (61.5–75.3) | 14.8% (9.9–20.3) | +53.3 |
- | Dutch | 56.1% (48.9–63.3) | — | — |
-
- Romance + Germanic languages drop −3 to −9 pp from English. **Dutch is
- the weakest at −20.7 pp** — flagged as a known gap.
-
- ### Per-category recall (English, n=1000)
-
- | Category | base | this model |
- |---|---:|---:|
- | `private_address` | 0.65 | 0.93 |
- | `private_date` | 0.54 | 0.96 |
- | `private_email` | 1.00 | 0.97 |
- | `private_handle` | 0.00 | 0.82 |
- | `private_id` | 0.00 | 0.95 |
- | `private_person` | 0.71 | 0.93 |
- | `private_phone` | 0.97 | 0.93 |
- | `private_url` | 0.98 | 1.00 |
- | `secret` | 0.90 | 0.90 |
-
- ## Limitations and known failure modes
-
- 1. **Sudo / login password prompts leak.** A pattern like `[sudo]
+
+ ## Multilingual
+
+ This model handles 6 languages. Performance on a public long-form PII
+ benchmark (n=200 per language):
+
+ | Language | zero-leak |
+ |---|---:|
+ | English | 76.8% (70.1–83.1) |
+ | Spanish | 73.2% (66.5–79.3) |
+ | Italian | 70.8% (64.3–77.4) |
+ | German | 70.6% (63.5–77.1) |
+ | French | 68.1% (61.5–75.3) |
+ | Dutch | 56.1% (48.9–63.3) |
+
+ Romance + Germanic languages drop −3 to −9 pp from English.
+ **Dutch is the weakest**, flagged as a known gap.
+
+ ## Limitations
+
+ 1. **Sudo / login password prompts leak.** A pattern like `[sudo]
  password for alice: hunter2` results in the username being redacted
- but the password surviving. Targeted augmentation closed 4 of 5 such
- patterns; this is the lone surviving hard miss. **Mitigation**: use
- an OS-level keystroke-suppression policy alongside this model when
- the screen capture surface includes terminal sessions.
- 2. **Dutch is the weakest language** at −20.7 pp from English. Romance +
- Germanic languages other than Dutch generalize at −3 to −9 pp. Indic,
- Asian, African, Cyrillic scripts NOT evaluated at meaningful sample
+ but the password surviving. One known hard miss in the targeted
+ secret probe; mitigate with an OS-level keystroke-suppression policy
+ alongside this model.
+ 2. **Dutch is the weakest language** at −20.7 pp from English. Indic,
+ Asian, African, Cyrillic scripts not evaluated at meaningful sample
  sizes — don't deploy without a locale-specific eval pass.
- 3. **In-distribution generalization on 300k.** The model's training
-    corpus included a slice of the PII-Masking-300k *train* split; the
-    eval reports above are on the *val* split (disjoint examples but
-    same distribution). The window-title score (79.1 %) is the cleaner
-    generalization signal.
- 4. **Synthetic training data only.** Validated qualitatively on real
-    screen captures, but the corpus is fully synthetic. Validate on
-    YOUR data before deploying.
- 5. **Single-annotator gold labels** on the in-bench data. Absolute
-    numbers may shift under a 2nd-annotator pass; relative ordering
-    between adapters is more stable.
- 6. **Oversmash is non-trivial.** 7.8 % on window titles, 16.5 % on
+ 3. **Synthetic training data only.** No real user data was used during
+    fine-tuning. Validate on YOUR data before deploying.
+ 4. **Oversmash.** 7.8% on accessibility / window titles, 16.5% on
  long-form text. The model over-redacts. Acceptable for privacy-first
  deployments; flag if you need clean OCR text downstream.
- 7. **Soft taxonomy hits.** Sometimes redacts secrets correctly but
-    under a different label (`private_id` for `rk_live_…` Stripe keys,
-    `private_url` for whole DB connection strings). Privacy-correct,
-    per-category accounting blurry.
+ 5. **Strict label-space evaluation.** The numbers above use a
+    12-class taxonomy and a strict per-example zero-leak metric.
+    Absolute values depend on the evaluator's label taxonomy and metric
+    choice; macro-F1 is a more lenient point of comparison.
-
- ## Reproducing the inference numbers
+
+ ## Reproducing inference

- The held-out benchmark and training methodology are in a private
- repository. Inference is reproducible from the artifacts in this repo:

  ```bash
- git clone https://github.com/screenpipe/pii-redactor
+ git clone https://huggingface.co/screenpipe/pii-redactor
  cd pii-redactor
-
- # pull the model weights via Git LFS
  git lfs pull
-
- # install opf (currently from source)
  pip install git+https://github.com/openai/privacy-filter.git
-
- # run the inference example
  python examples/inference.py
  ```

- Verifying the eval scores requires the held-out benchmark. Contact
- **hi@louis030195.com** for benchmark access if you have a research or
- commercial use case.
+ Reproducing the eval scores requires our held-out benchmark, which is
+ not redistributed. Contact **hi@louis030195.com** for benchmark access
+ or commercial licensing.

  ## License

- [CC BY-NC 4.0](LICENSE) — non-commercial use only.
+ [CC BY-NC 4.0](LICENSE) — non-commercial use only. The base model is
+ Apache-2.0; obligations are preserved (see [`NOTICE`](NOTICE)).

  For commercial licensing (production deployment, redistribution rights,
  SaaS / API embedding, custom fine-tunes for your domain): **hi@louis030195.com**.
 