louis030195 committed
Commit 3b6d06c · verified · 1 Parent(s): 2af8f9a

upstream-license compliance + rigor disclaimer + correct email

Files changed (1)
  1. README.md +31 -4
README.md CHANGED
@@ -30,7 +30,7 @@ metrics:
 extra_gated_prompt: >-
   This model is licensed CC BY-NC 4.0 (non-commercial). For commercial
   use — production deployment, SaaS / API embedding, agent privacy
-  middleware, custom fine-tunes — contact hi@screenpi.pe.
+  middleware, custom fine-tunes — contact hi@louis030195.com.
 ---
 
 # screenpipe-pii-redactor
@@ -64,7 +64,7 @@ strings, private-key block markers, password prompts).
 
 > **License: CC BY-NC 4.0** (non-commercial). For commercial use —
 > production redaction, SaaS / API embedding, AI-agent privacy
-> middleware, custom fine-tunes — contact **hi@screenpi.pe**. See
+> middleware, custom fine-tunes — contact **hi@louis030195.com**. See
 > [`LICENSE`](LICENSE).
 
 ## TL;DR
@@ -181,6 +181,33 @@ zero-leak rate.
 | previous internal version | 74.5% (71.8–77.5) | 9.1% | 0.763 | 0.932 |
 | OpenAI Privacy Filter (base) | 14.0% (11.7–16.2) | 16.5% | 0.591 | 0.579 |
 
+> **What "14% zero-leak" for the base actually means** (read this before
+> citing the gap). Zero-leak is a strict, taxonomy-coupled metric: a
+> single example counts as "leaked" if the model misses ANY gold span
+> in it under our 12-class label mapping. The published OpenAI Privacy
+> Filter result is **F1 ≈ 96%** on PII-Masking-300k under THEIR
+> 49-class taxonomy — that's a much more lenient setup. The base scores
+> 14% zero-leak under our metric for two compounding reasons:
+>
+> 1. **Label-space mismatch** dominates. We map 28 source 300k labels
+>    into our 12 classes; the base model can't predict our label names.
+>    On categories where the base's native taxonomy DOES align with ours
+>    (`private_email`, `private_phone`, `private_url`, `secret`), the
+>    base scores **0.90–1.00 recall** — strong. On categories where it
+>    doesn't (`private_id` covering IDCARD/SOCIALNUMBER/PASSPORT,
+>    `private_handle` covering USERNAME), it scores **0.00** by
+>    definition because it never emits the right label.
+> 2. **Zero-leak is all-or-nothing per example.** With ~6 spans per
+>    300k example and any unmappable category present, the base fails
+>    the whole example. Token-level F1 (0.591 above) is the more honest
+>    cross-comparison number.
+>
+> The +63 pp claim **is** real and useful for the deployment context
+> (anyone shipping a system that needs the screenpipe 12-class
+> taxonomy gets +63 pp out of the box vs the base). It would be
+> misleading to read it as "this model is 5× more accurate at PII
+> detection" — that's not what the metric measures.
+
 ### Multilingual generalization (n=200 per language)
 
 This model was trained on English-only data. Cross-language transfer:
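The all-or-nothing semantics of the zero-leak metric described in the added disclaimer can be sketched in a few lines. This is a minimal illustration, not the project's actual eval harness: the source-label map and the toy examples are hypothetical, and only the class names follow the README's taxonomy.

```python
# Sketch of example-level "zero-leak" vs. span-level recall.
# LABEL_MAP and the examples below are illustrative, not the real mapping.
LABEL_MAP = {
    "EMAIL": "private_email",
    "TEL": "private_phone",
    "USERNAME": "private_handle",   # a class the base model never emits
    "SOCIALNUMBER": "private_id",   # likewise unmappable for the base
}

def is_leak_free(gold_spans, pred_spans):
    """An example is leak-free only if EVERY mapped gold span is predicted."""
    gold = {(s, e, LABEL_MAP[label]) for (s, e, label) in gold_spans}
    return gold <= set(pred_spans)

def zero_leak_rate(examples):
    """Fraction of examples with no missed gold span (all-or-nothing)."""
    return sum(is_leak_free(g, p) for g, p in examples) / len(examples)

def span_recall(examples):
    """Span-level recall: partial credit per span, unlike zero-leak."""
    total = found = 0
    for gold_spans, pred_spans in examples:
        preds = set(pred_spans)
        for (s, e, label) in gold_spans:
            total += 1
            found += (s, e, LABEL_MAP[label]) in preds
    return found / total

# Example 1 has 3 gold spans and the model finds 2, so the WHOLE example
# counts as leaked; example 2 is fully clean.
examples = [
    (
        [(0, 5, "EMAIL"), (10, 14, "TEL"), (20, 28, "USERNAME")],
        [(0, 5, "private_email"), (10, 14, "private_phone")],
    ),
    (
        [(3, 9, "EMAIL")],
        [(3, 9, "private_email")],
    ),
]

print(zero_leak_rate(examples))  # 0.5 — example 1 leaks despite 2/3 spans found
print(span_recall(examples))     # 0.75 — 3 of 4 gold spans found overall
```

This is why a model that never emits an unmappable class (e.g. `private_handle`) fails every example containing one, even with high recall on the classes it does cover.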
@@ -262,7 +289,7 @@ python examples/inference.py
 ```
 
 Verifying the eval scores requires the held-out benchmark. Contact
-**hi@screenpi.pe** for benchmark access if you have a research or
+**hi@louis030195.com** for benchmark access if you have a research or
 commercial use case.
 
 ## License
@@ -270,7 +297,7 @@ commercial use case.
 [CC BY-NC 4.0](LICENSE) — non-commercial use only.
 
 For commercial licensing (production deployment, redistribution rights,
-SaaS / API embedding, custom fine-tunes for your domain): **hi@screenpi.pe**.
+SaaS / API embedding, custom fine-tunes for your domain): **hi@louis030195.com**.
 
 ## Citation
 
 