---
base_model: openai/privacy-filter
pipeline_tag: token-classification
language:
- ru
tags:
- privacy
- pii
- token-classification
- russian
- opf
model-index:
- name: privacy-filter-ru
  results:
  - task:
      type: token-classification
      name: Token Classification
    dataset:
      name: ru_realistic_eval_v1
      type: local
    metrics:
    - type: f1
      value: 0.9916
      name: Raw span F1
  - task:
      type: token-classification
      name: Token Classification
    dataset:
      name: ru_raw_hard_v3_eval
      type: local
    metrics:
    - type: f1
      value: 1.0
      name: Raw span F1
---

# privacy-filter-ru

Russian PII fine-tune of [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter).

This checkpoint is the raw-model production candidate from the local `raw_hardening_v3` run. It is intended to run without deterministic post-processing.

## Labels

- `private_person`
- `private_phone`
- `private_email`
- `private_address`
- `private_date`
- `private_url`
- `account_number`
- `secret`

## Training

- Base checkpoint: `checkpoints/production_candidate_ru_v2`
- Original base model: `openai/privacy-filter`
- Epochs: 1
- Learning rate: `1e-6`
- Batch size: 1
- Gradient accumulation steps: 16
- Serialization dtype: `bfloat16`
- Train examples: 17,000
- Validation examples: 2,000

The v3 training mix targeted raw-model behavior that previously depended on a deterministic runtime layer: phone/account/secret label separation and person/date boundary cleanup.

## Raw Evaluation

No deterministic post-processing was used for these metrics.

| Eval | v2 raw span F1 | v3 raw span F1 | v2 mismatch rows | v3 mismatch rows |
| --- | ---: | ---: | ---: | ---: |
| synthetic test | 1.0000 | 1.0000 | 0 | 0 |
| ru_realistic_eval_v1 | 0.8787 | 0.9916 | 158 | 11 |
| ru_phone_account_confusion_v1 | 1.0000 | 1.0000 | 0 | 0 |
| ru_date_negative_v1 | 1.0000 | 1.0000 | 0 | 0 |
| ru_raw_hard_v3_eval | 0.8350 | 1.0000 | 297 | 0 |
| ru_person_hard_eval | 0.8074 | 0.8074 | 183 | 183 |
| alexen2 | 0.8644 | 0.8547 | 228 | 241 |
| Rubai heldout | 0.8054 | 0.8036 | 3,131 | 3,136 |

## Usage

```bash
opf --checkpoint apararti/privacy-filter-ru --device cuda "Мой номер 8 999 863 37 84, зовут Андрей Макаров."
```

For a local checkout:

```bash
opf --checkpoint ./privacy-filter-ru --device cuda "Перезвоните Наталье Никитиной на 8 903 914 81 88."
```

## Notes

This is a fine-tuned checkpoint, not the original OpenAI model. It is optimized for Russian PII filtering and should be validated on domain-specific shadow traffic before production rollout.
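
One way to score the model on your own labeled traffic is a span-level F1 with a per-row mismatch count, in the spirit of the columns in the Raw Evaluation table. The sketch below is only an illustration: it assumes exact-match scoring (a predicted span counts only if start, end, and label all equal a gold span), which this card does not confirm as the criterion behind the reported numbers. The `(start, end, label)` tuple format and the `span_f1` helper are hypothetical and not part of `opf`.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical span format: (start_char, end_char, label).
Span = Tuple[int, int, str]


def span_f1(
    gold_rows: Sequence[List[Span]],
    pred_rows: Sequence[List[Span]],
) -> Dict[str, float]:
    """Micro-averaged exact-match span precision/recall/F1 over all rows.

    Assumption: a prediction is correct only if start, end, and label
    all match a gold span exactly.
    """
    if len(gold_rows) != len(pred_rows):
        raise ValueError("gold_rows and pred_rows must have the same length")

    tp = fp = fn = 0
    mismatch_rows = 0
    for gold, pred in zip(gold_rows, pred_rows):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # spans found exactly
        fp += len(pred_set - gold_set)   # spurious or misaligned predictions
        fn += len(gold_set - pred_set)   # gold spans that were missed
        if gold_set != pred_set:
            mismatch_rows += 1           # row differs in at least one span

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mismatch_rows": mismatch_rows,
    }


if __name__ == "__main__":
    # Illustrative offsets only; row 1 has a person span with a wrong boundary.
    gold = [[(10, 26, "private_phone"), (34, 48, "private_person")], []]
    pred = [[(10, 26, "private_phone"), (34, 41, "private_person")], []]
    print(span_f1(gold, pred))
```

Exact matching is deliberately strict: a boundary error counts as both a false positive and a false negative. That suits checks of the person/date boundary behavior described under Training. A more lenient overlap-based criterion would yield higher scores on the same predictions.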