---
license: apache-2.0
language:
  - ko
  - en
tags:
  - privacy-filter
  - pii-detection
  - token-classification
  - korean
  - lora
  - openai-privacy-filter
  - bioes
base_model: openai/privacy-filter
pipeline_tag: token-classification
---

# Privacy Filter – Korean

Korean fine-tune of [OpenAI Privacy Filter](https://huggingface.co/openai/privacy-filter)
for span-level PII detection. Adapted via **LoRA** on the attention projections only;
the base's sparse-MoE backbone (1.5B total / 50M active params) is kept frozen.

**[Open Test Notebook](https://huggingface.co/FrameByFrame/privacy-filter-korean/blob/main/test_privacy_filter_ko.ipynb)**: load the model and run all examples interactively.

## Capabilities

| Category | Description | Example |
|---|---|---|
| `private_person` | Personal name (Korean / Western / handles) | 김민수, John Smith |
| `private_address` | Physical / postal address | 서울특별시 강남구 테헤란로 123 |
| `private_phone` | Phone number | 010-1234-5678 |
| `private_email` | Email address | minsu@example.com |
| `private_date` | Birthday / personally-identifying date | 1985년 3월 12일 |
| `private_url` | Personal URL | github.com/minsu |
| `account_number` | Bank, card, RRN, passport, etc. | 110-234-567890 |
| `personal_handle` | Username / handle | @minsu_dev |
| `ip_address` | IP address | 192.168.1.5 |
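
At the token level these categories carry **BIOES** prefixes (`B-`/`I-`/`E-`/`S-` plus `O`), which is what the decoding helper in Inference below relies on. As a quick sanity check of the tag inventory, using the `model` object loaded in Quick Start (the 37-label count of one `O` plus four prefixes per category is an expectation, not a guarantee; verify against your checkpoint):

```python
# List the tag inventory straight from the checkpoint config
# (run after the Quick Start load below).
from collections import defaultdict

prefixes_by_category = defaultdict(set)
for tag in model.config.id2label.values():
    if tag == "O":
        continue
    prefix, category = tag.split("-", 1)
    prefixes_by_category[category].add(prefix)

print(len(model.config.id2label), dict(prefixes_by_category))
```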

## Benchmark Results

Held-out KDPII Korean PII test set, span-level F1:

| label | base | fine-tuned | Δ |
|---|---|---|---|
| `private_phone` | 0.65 | **1.00** | +0.35 |
| `private_url` | 0.21 | **1.00** | +0.79 |
| `private_email` | 0.86 | **1.00** | +0.14 |
| `account_number` | 0.31 | **0.98** | +0.67 |
| `private_date` | 0.00 | **0.90** | +0.90 |
| `private_address` | 0.00 | **0.78** | +0.78 |
| `private_person` | 0.06 | **0.69** | +0.63 |
| **Overall** | – | – | **+0.58** |
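
For reference, span-level F1 here means exact matches on (label, start, end) triples. A minimal sketch, assuming the usual exact-match convention (the precise matching rules behind the table are not restated here):

```python
def span_f1(pred: set, gold: set) -> float:
    """Exact-match span F1 over (label, start, end) triples.

    A predicted span counts as correct only if its label and both
    character offsets match a gold span exactly.
    """
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```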

## Quick Start

### Install

> ⚠️ **Requires `transformers` 5.x (currently dev / from source).** The
> `openai_privacy_filter` architecture is *not* in any stable 4.x PyPI release.
> If you `pip install transformers` and load this model, you'll see
> `KeyError: 'openai_privacy_filter'`.

```bash
pip install --upgrade "git+https://github.com/huggingface/transformers.git" peft torch safetensors accelerate
```

The `--upgrade` flag is critical: without it, `pip install` is silently a
no-op when an older transformers is already present.

After installing, **restart your Python runtime / kernel** so the new
transformers replaces any version pre-loaded into the process. Sanity-check:

```bash
python -c "from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES; assert 'openai_privacy_filter' in CONFIG_MAPPING_NAMES, 'openai_privacy_filter missing β€” re-install transformers from source and restart runtime'"
```

If you're using Colab, the test notebook handles this automatically (auto-restart).

### Load Model

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

MODEL_ID = "FrameByFrame/privacy-filter-korean"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, trust_remote_code=True, dtype=torch.bfloat16
)
model.eval()
if torch.cuda.is_available():
    model.cuda()
```

`trust_remote_code=True` is required because Privacy Filter ships a custom
`OpenAIPrivacyFilterForTokenClassification` class (gpt-oss-style sparse MoE).

### Inference

The model emits per-token BIOES labels. The helper below decodes them into
character-offset spans with simple constrained logic:

```python
def extract_pii(text: str, max_length: int = 512):
    enc = tokenizer(
        text,
        truncation=True,
        max_length=max_length,
        return_offsets_mapping=True,
        return_tensors="pt",
    )
    offsets = enc.pop("offset_mapping")[0].tolist()
    enc = {k: v.to(model.device) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits
    pred_ids = logits.argmax(-1)[0].tolist()
    id2label = model.config.id2label

    spans = []
    active = None  # open entity: (label, char_start, char_end)
    for tok_idx, lid in enumerate(pred_ids):
        label = id2label[int(lid)]
        if label == "O":
            if active is not None:
                spans.append(active)
                active = None
            continue
        prefix, cat = label.split("-", 1)
        c_start, c_end = offsets[tok_idx]
        if prefix == "S":  # single-token entity
            if active is not None:
                spans.append(active)
                active = None
            spans.append((cat, c_start, c_end))
        elif prefix == "B":  # begin a multi-token entity
            if active is not None:
                spans.append(active)
            active = (cat, c_start, c_end)
        elif prefix in ("I", "E"):  # inside / end of the open entity
            if active and active[0] == cat:
                active = (active[0], active[1], c_end)
                if prefix == "E":  # E closes the entity immediately
                    spans.append(active)
                    active = None
            else:
                if active is not None:
                    spans.append(active)
                    active = None
                if prefix == "E":  # orphan E: treat as a single-token span
                    spans.append((cat, c_start, c_end))
    if active is not None:
        spans.append(active)

    return [
        {"label": cat, "start": s, "end": e, "text": text[s:e].strip()}
        for cat, s, e in spans
        if text[s:e].strip()
    ]
```

### Test

#### Korean: name + phone + email
```python
>>> extract_pii("김민수의 전화번호는 010-1234-5678이고 이메일은 minsu@example.com입니다.")
[
  {"label": "private_person", "start": 0, "end": 3, "text": "김민수"},
  {"label": "private_phone",  "start": 11, "end": 24, "text": "010-1234-5678"},
  {"label": "private_email",  "start": 32, "end": 49, "text": "minsu@example.com"},
]
```

#### Korean: address + name
```python
>>> extract_pii("서울특별시 강남구 테헤란로 123에 사는 박지영씨에게 연락주세요.")
[
  {"label": "private_address", "start": 0, "end": 5, "text": "서울특별시"},
  {"label": "private_address", "start": 6, "end": 9, "text": "강남구"},
  {"label": "private_address", "start": 10, "end": 18, "text": "테헤란로 123"},
  {"label": "private_person",  "start": 23, "end": 26, "text": "박지영"},
]
```

> Note: the model follows KDPII's address convention where each toponym
> component is its own span. Most downstream redaction systems concatenate
> adjacent address spans; a minimal sketch of that post-processing follows.
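
A possible merge pass over `extract_pii` output, as a sketch. The `max_gap=1` whitespace rule and the `private_address` default are illustrative assumptions, not KDPII policy:

```python
def merge_adjacent_spans(spans, text, label="private_address", max_gap=1):
    """Merge same-label spans whose gap is only whitespace of length <= max_gap."""
    merged = []
    for s in sorted(spans, key=lambda x: x["start"]):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["label"] == label == s["label"]
            and s["start"] - prev["end"] <= max_gap
            and not text[prev["end"]:s["start"]].strip()
        ):
            # Extend the previous span across the whitespace gap.
            prev["end"] = s["end"]
            prev["text"] = text[prev["start"]:prev["end"]]
        else:
            merged.append(dict(s))
    return merged
```

Applied to the example above:

```python
>>> text = "서울특별시 강남구 테헤란로 123에 사는 박지영씨에게 연락주세요."
>>> merge_adjacent_spans(extract_pii(text), text)
[
  {"label": "private_address", "start": 0, "end": 18, "text": "서울특별시 강남구 테헤란로 123"},
  {"label": "private_person",  "start": 23, "end": 26, "text": "박지영"},
]
```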

#### Korean: form-style document
```python
>>> extract_pii('''고객 정보
... 이름: 이수진
... 생년월일: 1985년 3월 12일
... 주소: 부산광역시 해운대구 우동 1457
... 연락처: 010-9876-5432''')
[
  {"label": "private_person",  ..., "text": "이수진"},
  {"label": "private_date",    ..., "text": "1985년 3월 12일"},
  {"label": "private_address", ..., "text": "부산광역시"},
  {"label": "private_address", ..., "text": "해운대구"},
  {"label": "private_address", ..., "text": "우동 1457"},
  {"label": "private_phone",   ..., "text": "010-9876-5432"},
]
```

#### English: account + email
```python
>>> extract_pii("Wire to acct 110-234-567890, contact minsu@example.com")
[
  {"label": "account_number", "start": 13, "end": 26, "text": "110-234-567890"},
  {"label": "private_email",  "start": 36, "end": 53, "text": "minsu@example.com"},
]
```

### Redaction

Wrap the spans into a redactor:

```python
def redact(text: str) -> str:
    """Replace each detected span with its upper-cased label."""
    spans = extract_pii(text)
    spans.sort(key=lambda s: s["start"], reverse=True)  # right-to-left keeps offsets valid
    out = text
    for s in spans:
        out = out[: s["start"]] + f"[{s['label'].upper()}]" + out[s["end"]:]
    return out

>>> redact("김민수님의 번호는 010-1234-5678입니다.")
"[PRIVATE_PERSON]님의 번호는 [PRIVATE_PHONE]입니다."
```

## Output Schema

Each detected entity is one dict:

| field | description |
|---|---|
| `label` | One of the 9 categories above |
| `start` | Character offset start (inclusive) |
| `end` | Character offset end (exclusive) |
| `text` | The matched substring |

## Training Details

| | |
|---|---|
| **Base model** | `openai/privacy-filter` (sparse MoE, 1.5B total / 50M active params, 128 experts, top-4 routing) |
| **Method** | LoRA r=16, alpha=32, dropout=0.05 on attention projections (`q/k/v/o_proj`); classifier head fully trainable; everything else frozen |
| **Trainable params** | ~614k (~0.04% of the model) |
| **Datasets** | KDPII (Korean, ~53k records, deterministic 5/5/90 test/val/train), `korean_rrn_synthetic` (train only) |
| **Optimizer** | AdamW, lr=5e-4, cosine schedule, warmup 0.1 |
| **Batch** | 64 per device × 2 GPUs = 128 effective |
| **Epochs** | 10, early stopping on `eval_span_f1` (patience 3) |
| **Sequence length** | 512 |
| **Precision** | bf16 mixed (saved as bf16 safetensors after `merge_and_unload`) |
| **Hardware** | 2Γ— NVIDIA RTX A5000 (24 GB each) |
| **Final eval span F1** | 0.848 (validation) |

For full reproduction details, see [`TRAINING.md`](./TRAINING.md).
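
As a rough reconstruction of the adapter setup in the table, a sketch with `peft` (not the actual training script; the `q/k/v/o_proj` module names assume gpt-oss-style attention naming):

```python
from peft import LoraConfig, TaskType, get_peft_model

# Reconstruction of the config implied by the table above; see TRAINING.md
# for the real script. Module names are an assumption about the backbone.
lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["classifier"],  # classifier head fully trainable
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # expect roughly ~614k trainable
```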

## Known Limitations

- **`private_person` residual error** is dominated by KDPII's `PS_NICKNAME`
  policy. ~40% of remaining person errors are online-handle-style strings
  (e.g., `탕비실맥심킹`, `퍼터요정`) that KDPII labels as `PS_NICKNAME →
  private_person`. Downstream redaction is unaffected; classification systems
  may want to post-classify handles separately (see the sketch after this list).
- **Foreign names** (Western, Japanese, Arabic transliterations) are detected
  at lower rates due to limited training exposure.
- **`private_address` boundaries** follow KDPII's split convention (each
  toponym component is a separate span). Production redactors typically
  concatenate adjacent address spans during post-processing.
- Raw model output may have leading/trailing whitespace in span offsets;
  the `extract_pii` helper above strips them via `text.strip()` on the slice.
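
One possible post-classification pass for the nickname issue above, as a sketch. The handle heuristic (an `@` prefix or a bare ASCII username shape) is an assumption, and it will not catch pure-Hangul nicknames like `탕비실맥심킹`:

```python
import re

# Hypothetical heuristic, not part of the model or KDPII:
# "@name" or a bare ASCII username shape.
HANDLE_RE = re.compile(r"^@[\w.]+$|^[A-Za-z0-9_.]{3,}$")

def post_classify_handles(spans):
    """Relabel handle-like private_person spans as personal_handle."""
    out = []
    for s in spans:
        if s["label"] == "private_person" and HANDLE_RE.match(s["text"]):
            s = dict(s, label="personal_handle")
        out.append(s)
    return out
```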

## License

Apache 2.0 (inherited from base
[OpenAI Privacy Filter](https://huggingface.co/openai/privacy-filter)).

## Citation

If you use this model:

```bibtex
@misc{framebyframe-privacy-filter-korean-2026,
  title  = {Privacy Filter Korean: LoRA fine-tune of OpenAI Privacy Filter for Korean PII},
  author = {FrameByFrame},
  year   = {2026},
  url    = {https://huggingface.co/FrameByFrame/privacy-filter-korean}
}
```