vijaym committed · Commit 2c5223b · verified · 1 Parent(s): c14f0b4

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +47 -6
README.md CHANGED
@@ -19,7 +19,8 @@ pipeline_tag: token-classification
 
  Korean fine-tune of [OpenAI Privacy Filter](https://huggingface.co/openai/privacy-filter)
  for span-level PII detection. Adapted via **LoRA** on attention projections only —
- the base's sparse-MoE backbone (1.5B / 50M active params) is kept frozen.
+ the base's sparse-MoE backbone (1.5B / 50M active params) is kept frozen, with
+ just **~614k trainable parameters** (~0.04% of the model).
 
  **[Open Test Notebook](https://huggingface.co/FrameByFrame/privacy-filter-korean/blob/main/test_privacy_filter_ko.ipynb)** — load the model and run all examples interactively.
 
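For orientation, here is what an attention-only adapter setup of this kind looks like in `peft`. This is a minimal sketch, not the repo's actual training config: the projection module names (`q_proj`/`k_proj`/`v_proj`/`o_proj`), the rank/alpha values, and the use of `AutoModelForTokenClassification` are all assumptions.

```python
# Minimal sketch of attention-only LoRA with peft. ASSUMPTIONS: the module
# names q_proj/k_proj/v_proj/o_proj, r=8, lora_alpha=16, and the auto class
# below are illustrative; they are not taken from this repo's training config.
from transformers import AutoModelForTokenClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForTokenClassification.from_pretrained("openai/privacy-filter")

lora = LoraConfig(
    task_type="TOKEN_CLS",
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    modules_to_save=["classifier"],  # the token-classification head also trains
)

model = get_peft_model(base, lora)
# Experts, router, FFN, and embeddings stay frozen; only the adapters and the
# head are trainable (for this model: ~614k params, ~0.04% of the total).
model.print_trainable_parameters()
```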
@@ -56,19 +57,27 @@ Held-out KDPII Korean PII test set, span-level F1:
 
  ### Install
 
- The `openai_privacy_filter` architecture is in `transformers` HEAD but not yet in
- a stable PyPI release, so install from source:
+ > ⚠️ **Requires `transformers` 5.x (currently dev / from source).** The
+ > `openai_privacy_filter` architecture is *not* in any stable 4.x PyPI release.
+ > If you `pip install transformers` and load this model, you'll see
+ > `KeyError: 'openai_privacy_filter'`.
 
  ```bash
- pip install "git+https://github.com/huggingface/transformers.git" peft torch safetensors accelerate
+ pip install --upgrade "git+https://github.com/huggingface/transformers.git" peft torch safetensors accelerate
  ```
 
- If you already have a stable transformers installed, force the upgrade:
+ The `--upgrade` flag is critical — without it, `pip install` is silently a
+ no-op when an older transformers is already present.
+
+ After installing, **restart your Python runtime / kernel** so the new
+ transformers replaces any version pre-loaded into the process. Sanity-check:
 
  ```bash
- pip uninstall -y transformers && pip install "git+https://github.com/huggingface/transformers.git"
+ python -c "from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES; assert 'openai_privacy_filter' in CONFIG_MAPPING_NAMES, 'openai_privacy_filter missing — re-install transformers from source and restart runtime'"
  ```
 
+ If you're using Colab, the test notebook handles this automatically (auto-restart).
+
  ### Load Model
 
  ```python
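The shell one-liner added above can also be run in-process. A Python equivalent of the same check (plus the installed version), suitable for the top of a notebook cell after the kernel restart:

```python
# In-process version of the sanity check above: verify that the source build
# of transformers (with the openai_privacy_filter architecture) is the one
# actually loaded in this runtime.
import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES

print("transformers", transformers.__version__)
if "openai_privacy_filter" not in CONFIG_MAPPING_NAMES:
    raise RuntimeError(
        "openai_privacy_filter missing: re-install transformers from source "
        "and restart the runtime"
    )
```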
@@ -240,6 +249,38 @@ Each detected entity is one dict:
  | **Hardware** | 2× NVIDIA RTX A5000 (24 GB each) |
  | **Final eval span F1** | 0.848 (validation) |
 
+ For full reproduction details, see [`TRAINING.md`](./TRAINING.md).
+
+ ## Why MoE + LoRA
+
+ Fully fine-tuning the privacy-filter base on KDPII consistently *hurt* the
+ weakest labels (`private_person` and `private_address` stuck at F1 ≈ 0.13–0.20).
+ With 128 experts and top-4 routing, Korean tokens hit a small expert subset;
+ across 5–10 epochs each expert receives sparse gradient updates relative to
+ its parameter count, and the optimizer drags those experts away from their
+ pretrained representations faster than it teaches the new task. Net effect:
+ the base's pretrained Korean capability gets corrupted before the new task is
+ learned.
+
+ LoRA on attention only (this model) avoids this entirely — experts, FFN,
+ embeddings, and router stay exactly as the base shipped them; only attention
+ re-routing and the classifier head adapt. Result: F1 0.69 / 0.78 on the
+ previously stuck labels, with every other label at or above ceiling.
+
+ ## Known Limitations
+
+ - **`private_person` residual error** is dominated by KDPII's `PS_NICKNAME`
+   policy. ~40% of remaining person errors are online-handle-style strings
+   (e.g., `탕비실맥심킹`, `퍼터요정`) that KDPII labels as `PS_NICKNAME →
+   private_person`. Downstream redaction is unaffected; classification systems
+   may want to post-classify handles separately.
+ - **Foreign names** (Western, Japanese, Arabic transliterations) are detected at
+   lower rates due to limited training exposure.
+ - **`private_address` boundaries** follow KDPII's split convention (each
+   toponym component is a separate span). Production redactors typically
+   concatenate adjacent address spans during post-processing (see the sketch after this diff).
+ - Raw model output may have leading/trailing whitespace in span offsets;
+   the `extract_pii` helper above strips them via `text.strip()` on the slice.
 
  ## Serving with vLLM
 
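The last two limitations suggest a small post-processing pass. The sketch below is illustrative only: the function name and the entity-dict shape (`start`/`end`/`label` keys, label string `private_address`) are assumptions, not this model's documented output schema; adapt it to whatever `extract_pii` actually returns.

```python
# Post-processing sketch for the two span-level caveats above. ASSUMPTIONS:
# entities are dicts with "start"/"end"/"label" keys sorted by offset, and
# addresses carry the label "private_address"; adjust to the real schema.

def postprocess_spans(text, entities):
    """Trim whitespace at span edges, then merge adjacent address spans."""
    trimmed = []
    for ent in entities:
        start, end = ent["start"], ent["end"]
        # Shrink offsets until the slice has no leading/trailing whitespace.
        while start < end and text[start].isspace():
            start += 1
        while end > start and text[end - 1].isspace():
            end -= 1
        if start < end:
            trimmed.append({**ent, "start": start, "end": end})

    merged = []
    for ent in trimmed:
        if (
            merged
            and merged[-1]["label"] == "private_address"
            and ent["label"] == "private_address"
            # Merge only when nothing but whitespace separates the spans.
            and not text[merged[-1]["end"]:ent["start"]].strip()
        ):
            merged[-1]["end"] = ent["end"]  # concatenate toponym components
        else:
            merged.append(dict(ent))
    return merged


text = "서울특별시 강남구 테헤란로 123"
spans = [  # two KDPII-style split address spans, the first with a trailing space
    {"start": 0, "end": 6, "label": "private_address"},
    {"start": 6, "end": 9, "label": "private_address"},
]
print(postprocess_spans(text, spans))
# -> [{'start': 0, 'end': 9, 'label': 'private_address'}]  i.e. "서울특별시 강남구"
```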
 