---
license: apache-2.0
language:
- ko
- en
tags:
- privacy-filter
- pii-detection
- token-classification
- korean
- lora
- openai-privacy-filter
- bioes
base_model: openai/privacy-filter
pipeline_tag: token-classification
---
# Privacy Filter (Korean)
Korean fine-tune of [OpenAI Privacy Filter](https://huggingface.co/openai/privacy-filter)
for span-level PII detection. Adapted via **LoRA** on the attention projections only;
the base's sparse-MoE backbone (1.5B total / 50M active params) is kept frozen.
**[Open Test Notebook](https://huggingface.co/FrameByFrame/privacy-filter-korean/blob/main/test_privacy_filter_ko.ipynb)**: load the model and run all examples interactively.
## Capabilities
| Category | Description | Example |
|---|---|---|
| `private_person` | Personal name (Korean / Western / handles) | 김민수, John Smith |
| `private_address` | Physical / postal address | 서울특별시 강남구 테헤란로 123 |
| `private_phone` | Phone number | 010-1234-5678 |
| `private_email` | Email address | minsu@example.com |
| `private_date` | Birthday / personally-identifying date | 1985년 3월 12일 |
| `private_url` | Personal URL | github.com/minsu |
| `account_number` | Bank, card, RRN, passport, etc. | 110-234-567890 |
| `personal_handle` | Username / handle | @minsu_dev |
| `ip_address` | IP address | 192.168.1.5 |
## Benchmark Results
Held-out KDPII Korean PII test set, span-level F1:
| label | base | fine-tuned | Ξ” |
|---|---|---|---|
| `private_phone` | 0.65 | **1.00** | +0.35 |
| `private_url` | 0.21 | **1.00** | +0.79 |
| `private_email` | 0.86 | **1.00** | +0.14 |
| `account_number` | 0.31 | **0.98** | +0.67 |
| `private_date` | 0.00 | **0.90** | +0.90 |
| `private_address` | 0.00 | **0.78** | +0.78 |
| `private_person` | 0.06 | **0.69** | +0.63 |
| **Overall** | – | – | **+0.58** |
## Quick Start
### Install
> ⚠️ **Requires `transformers` 5.x (currently dev / from source).** The
> `openai_privacy_filter` architecture is *not* in any stable 4.x PyPI release.
> If you `pip install transformers` and load this model, you'll see
> `KeyError: 'openai_privacy_filter'`.
```bash
pip install --upgrade "git+https://github.com/huggingface/transformers.git" peft torch safetensors accelerate
```
The `--upgrade` flag is critical: without it, `pip install` is a silent
no-op when an older transformers is already present.
After installing, **restart your Python runtime / kernel** so the new
transformers replaces any version pre-loaded into the process. Sanity-check:
```bash
python -c "from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES; assert 'openai_privacy_filter' in CONFIG_MAPPING_NAMES, 'openai_privacy_filter missing - re-install transformers from source and restart runtime'"
```
If you're using Colab, the test notebook handles this automatically (auto-restart).
### Load Model
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
MODEL_ID = "FrameByFrame/privacy-filter-korean"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForTokenClassification.from_pretrained(
MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16
)
model.eval()
if torch.cuda.is_available():
model.cuda()
```
`trust_remote_code=True` is required because Privacy Filter ships a custom
`OpenAIPrivacyFilterForTokenClassification` class (gpt-oss-style sparse MoE).
### Inference
The model emits per-token BIOES labels. The helper below decodes them into
character-offset spans with simple constrained logic:
```python
def extract_pii(text: str, max_length: int = 512):
    enc = tokenizer(
        text,
        truncation=True,
        max_length=max_length,
        return_offsets_mapping=True,
        return_tensors="pt",
    )
    offsets = enc.pop("offset_mapping")[0].tolist()
    enc = {k: v.to(model.device) for k, v in enc.items()}
    with torch.no_grad():
        logits = model(**enc).logits
    pred_ids = logits.argmax(-1)[0].tolist()
    id2label = model.config.id2label
    spans = []
    active = None  # currently open span: (label, char_start, char_end)
    for tok_idx, lid in enumerate(pred_ids):
        label = id2label[int(lid)]
        if label == "O":
            if active is not None:
                spans.append(active)
                active = None
            continue
        prefix, cat = label.split("-", 1)
        c_start, c_end = offsets[tok_idx]
        if prefix == "S":
            # Single-token entity: close any open span, emit this one alone.
            if active is not None:
                spans.append(active)
                active = None
            spans.append((cat, c_start, c_end))
        elif prefix == "B":
            # Begin: close any open span, start a new one.
            if active is not None:
                spans.append(active)
            active = (cat, c_start, c_end)
        elif prefix in ("I", "E"):
            if active is not None and active[0] == cat:
                # Extend the open span; E also closes it.
                active = (active[0], active[1], c_end)
                if prefix == "E":
                    spans.append(active)
                    active = None
            else:
                # Inconsistent prefix: close whatever is open; a bare E
                # still yields a single-token span.
                if active is not None:
                    spans.append(active)
                    active = None
                if prefix == "E":
                    spans.append((cat, c_start, c_end))
    if active is not None:
        spans.append(active)
    return [
        {"label": cat, "start": s, "end": e, "text": text[s:e].strip()}
        for cat, s, e in spans
        if text[s:e].strip()  # drops empty / special-token spans
    ]
```
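To sanity-check what the decoder consumes, you can print the raw per-token
BIOES stream directly. An illustrative snippet, reusing the `tokenizer` and
`model` loaded above (the example sentence is arbitrary):
```python
# Print each token next to its predicted BIOES tag (no span decoding).
enc = tokenizer("김민수: 010-1234-5678", return_tensors="pt").to(model.device)
with torch.no_grad():
    pred_ids = model(**enc).logits.argmax(-1)[0].tolist()
for tok, lid in zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), pred_ids):
    print(f"{tok:>12}  {model.config.id2label[int(lid)]}")
```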
### Test
#### Korean: name + phone + email
```python
>>> extract_pii("김민수의 전화번호는 010-1234-5678이고 이메일은 minsu@example.com입니다.")
[
{"label": "private_person", "start": 0, "end": 3, "text": "김민수"},
{"label": "private_phone", "start": 11, "end": 24, "text": "010-1234-5678"},
{"label": "private_email", "start": 32, "end": 49, "text": "minsu@example.com"},
]
```
#### Korean: address + name
```python
>>> extract_pii("서울특별시 강남구 테헤란로 123에 사는 박지영씨에게 연락주세요.")
[
{"label": "private_address", "start": 0, "end": 5, "text": "서울특별시"},
{"label": "private_address", "start": 6, "end": 9, "text": "강남구"},
{"label": "private_address", "start": 10, "end": 18, "text": "테헤란로 123"},
{"label": "private_person", "start": 23, "end": 26, "text": "박지영"},
]
```
> Note: the model follows KDPII's address convention, where each toponym
> component is its own span. Most downstream redaction systems concatenate
> adjacent address spans, as sketched below.
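A minimal sketch of such a merge step, assuming the `extract_pii` output
format above (the whitespace-only-gap rule is a downstream simplification,
not part of the model):
```python
def merge_address_spans(text: str, spans: list) -> list:
    """Concatenate private_address spans separated only by whitespace."""
    merged = []
    for s in sorted(spans, key=lambda s: s["start"]):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["label"] == s["label"] == "private_address"
            and not text[prev["end"]:s["start"]].strip()
        ):
            # Extend the previous address span over the whitespace gap.
            prev["end"] = s["end"]
            prev["text"] = text[prev["start"]:prev["end"]]
        else:
            merged.append(dict(s))
    return merged
```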
#### Korean: form-style document
```python
>>> extract_pii('''고객 정보
... 이름: 이수진
... 생년월일: 1985년 3월 12일
... 주소: 부산광역시 해운대구 우동 1457
... 연락처: 010-9876-5432''')
[
{"label": "private_person", ..., "text": "이수진"},
{"label": "private_date", ..., "text": "1985년 3월 12일"},
{"label": "private_address", ..., "text": "부산광역시"},
{"label": "private_address", ..., "text": "해운대구"},
{"label": "private_address", ..., "text": "우동 1457"},
{"label": "private_phone", ..., "text": "010-9876-5432"},
]
```
#### English: account + email
```python
>>> extract_pii("Wire to acct 110-234-567890, contact minsu@example.com")
[
{"label": "account_number", "start": 13, "end": 26, "text": "110-234-567890"},
{"label": "private_email", "start": 36, "end": 53, "text": "minsu@example.com"},
]
```
### Redaction
Wrap the spans into a redactor. Replacements are applied from the end of the
string so that earlier character offsets stay valid:
```python
def redact(text: str) -> str:
    spans = extract_pii(text)
    # Replace from the end so earlier offsets remain valid.
    spans.sort(key=lambda s: s["start"], reverse=True)
    out = text
    for s in spans:
        out = out[: s["start"]] + f"[{s['label'].upper()}]" + out[s["end"]:]
    return out

>>> redact("김민수님의 번호는 010-1234-5678입니다.")
"[PRIVATE_PERSON]님의 번호는 [PRIVATE_PHONE]입니다."
```
## Output Schema
Each detected entity is one dict:
| field | description |
|---|---|
| `label` | One of the 9 categories above |
| `start` | Character offset start (inclusive) |
| `end` | Character offset end (exclusive) |
| `text` | The matched substring |
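The dicts are plain JSON-serializable Python objects. One usage note: pass
`ensure_ascii=False` to `json.dumps` so Hangul stays readable:
```python
import json

# Dump detected entities as UTF-8 JSON for downstream consumers.
print(json.dumps(extract_pii("연락처: 010-9876-5432"), ensure_ascii=False, indent=2))
```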
## Training Details
| | |
|---|---|
| **Base model** | `openai/privacy-filter` (sparse MoE, 1.5B total / 50M active params, 128 experts, top-4 routing) |
| **Method** | LoRA r=16, alpha=32, dropout=0.05 on attention projections (`q/k/v/o_proj`); classifier head fully trainable; everything else frozen |
| **Trainable params** | ~614k (~0.04% of the model) |
| **Datasets** | KDPII (Korean, ~53k records, deterministic 5/5/90 test/val/train), `korean_rrn_synthetic` (train only) |
| **Optimizer** | AdamW, lr=5e-4, cosine schedule, warmup 0.1 |
| **Batch** | 64 per device Γ— 2 GPUs = 128 effective |
| **Epochs** | 10, early stopping on `eval_span_f1` (patience 3) |
| **Sequence length** | 512 |
| **Precision** | bf16 mixed (saved as bf16 safetensors after `merge_and_unload`) |
| **Hardware** | 2Γ— NVIDIA RTX A5000 (24 GB each) |
| **Final eval span F1** | 0.848 (validation) |
For full reproduction details, see [`TRAINING.md`](./TRAINING.md).
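For orientation only, a PEFT setup matching the table might look like the
sketch below; the classifier-head module name (`score`) is an assumption
that depends on the architecture, so defer to the actual script in
`TRAINING.md`:
```python
from peft import LoraConfig, get_peft_model

# LoRA on attention projections only; everything else stays frozen.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["score"],  # assumed classifier-head module name
    task_type="TOKEN_CLS",
)
peft_model = get_peft_model(model, lora_cfg)
peft_model.print_trainable_parameters()  # expect ~614k trainable params
```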
## Known Limitations
- **`private_person` residual error** is dominated by KDPII's `PS_NICKNAME`
  policy. ~40% of remaining person errors are online-handle-style strings
  (e.g., `탕비실맥심킹`, `퍼터요정`) that KDPII labels as `PS_NICKNAME →
  private_person`. Downstream redaction is unaffected; classification systems
  may want to post-classify handles separately.
- **Foreign names** (Western, Japanese, Arabic transliterations) detected at
lower rates due to limited training exposure.
- **`private_address` boundaries** follow KDPII's split convention (each
  toponym component is a separate span). Production redactors typically
  concatenate adjacent address spans during post-processing (see the merge
  sketch above).
- Raw model output may have leading/trailing whitespace in span offsets;
the `extract_pii` helper above strips them via `text.strip()` on the slice.
## License
Apache 2.0 (inherited from base
[OpenAI Privacy Filter](https://huggingface.co/openai/privacy-filter)).
## Citation
If you use this model:
```bibtex
@misc{framebyframe-privacy-filter-korean-2026,
title = {Privacy Filter Korean: LoRA fine-tune of OpenAI Privacy Filter for Korean PII},
author = {FrameByFrame},
year = {2026},
url = {https://huggingface.co/FrameByFrame/privacy-filter-korean}
}
```