screenpipe-pii-redactor
Copyright 2026 Mediar, Inc.

This product is a derivative work of the OpenAI Privacy Filter
(https://github.com/openai/privacy-filter), licensed under the
Apache License, Version 2.0. The full upstream Apache 2.0 license
text is preserved in LICENSE.upstream-apache2.txt.

The base model architecture, tokenizer, and pretrained weights are the
work of OpenAI and are licensed Apache 2.0. screenpipe-pii-redactor
extends those weights via supervised fine-tuning on a custom corpus and
re-initializes the output head for a 12-label PII taxonomy specific to
desktop activity logs.

Significant modifications introduced by this derivative:
  - Output head re-initialized for a 12-class PII label space
    (29 rows copied from upstream where labels aligned, 20 rows
    randomly initialized for new classes; see model/finetune_summary.json).
  - Fine-tuned for 3 epochs on a mixed corpus of:
      * synthetic accessibility / window-title / OCR data
        (private, not redistributed)
      * a 25% slice of ai4privacy/pii-masking-300k (CC-BY-4.0)
        with labels mapped to the 12-class taxonomy
      * targeted secret-shape augmentation (private)
  - Context window n_ctx raised from 128 to 256.
  - Hyperparameters: batch_size=4, lr=1e-4, weight_decay=0,
    max_grad_norm=1.0, shuffle_seed=1337.

Distribution license:

  - The fine-tuned weights and accompanying materials in this repository
    (the "Derivative Work") are licensed under CC BY-NC 4.0; see LICENSE.
  - The Apache 2.0 obligations on the base model are preserved by:
    (a) shipping LICENSE.upstream-apache2.txt with this repo,
    (b) attributing OpenAI Privacy Filter in README.md,
    (c) declaring significant modifications above.
  - "OpenAI" and "Privacy Filter" are trademarks / brands of OpenAI; this
    derivative does not use those marks to state or suggest that OpenAI
    endorses this work.

Third-party datasets used during fine-tuning:

  - ai4privacy/pii-masking-300k
    https://huggingface.co/datasets/ai4privacy/pii-masking-300k
    Licensed CC-BY-4.0.

Questions about license compatibility or commercial use:
  louis@screenpi.pe