---
base_model: openai/privacy-filter
pipeline_tag: token-classification
language:
- ru
tags:
- privacy
- pii
- token-classification
- russian
- opf
model-index:
- name: privacy-filter-ru
  results:
  - task:
      type: token-classification
      name: Token Classification
    dataset:
      name: ru_realistic_eval_v1
      type: local
    metrics:
    - type: f1
      value: 0.9916
      name: Raw span F1
  - task:
      type: token-classification
      name: Token Classification
    dataset:
      name: ru_raw_hard_v3_eval
      type: local
    metrics:
    - type: f1
      value: 1.0
      name: Raw span F1
---
# privacy-filter-ru
Russian PII fine-tune of [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter).
This checkpoint is the raw-model production candidate from the local `raw_hardening_v3` run. It is intended to run without deterministic post-processing.
## Labels
- `private_person`
- `private_phone`
- `private_email`
- `private_address`
- `private_date`
- `private_url`
- `account_number`
- `secret`
## Training
- Base checkpoint: `checkpoints/production_candidate_ru_v2`
- Original base model: `openai/privacy-filter`
- Epochs: 1
- Learning rate: `1e-6`
- Batch size: 1
- Gradient accumulation steps: 16
- Serialization dtype: `bfloat16`
- Train examples: 17,000
- Validation examples: 2,000
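
The hyperparameters above imply an effective batch size and an optimizer step count that are not stated directly. The arithmetic can be sketched as follows, assuming a single device (no data parallelism) and assuming the trainer still takes an optimizer step on a final partial accumulation window; neither assumption is confirmed by the run description.

```bash
# Values copied from the Training list above.
batch_size=1
grad_accum_steps=16
train_examples=17000
epochs=1

# Effective batch size per optimizer step (single-device assumption).
effective_batch=$(( batch_size * grad_accum_steps ))

# Ceiling division: assumes the last, partial accumulation window still steps.
steps_per_epoch=$(( (train_examples + effective_batch - 1) / effective_batch ))
total_steps=$(( steps_per_epoch * epochs ))

echo "effective batch size: $effective_batch"   # 16
echo "optimizer steps: $total_steps"            # 1063
```
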
The v3 training mix targeted raw-model behavior that previously depended on a deterministic runtime layer: phone/account/secret label separation and person/date boundary cleanup.
## Raw Evaluation
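No deterministic post-processing was used for these metrics. Relative to v2, v3 improves on `ru_realistic_eval_v1` and `ru_raw_hard_v3_eval`, is unchanged on `ru_person_hard_eval`, and regresses slightly on alexen2 and Rubai heldout.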
| Eval | v2 raw span F1 | v3 raw span F1 | v2 mismatch rows | v3 mismatch rows |
| --- | ---: | ---: | ---: | ---: |
| synthetic test | 1.0000 | 1.0000 | 0 | 0 |
| ru_realistic_eval_v1 | 0.8787 | 0.9916 | 158 | 11 |
| ru_phone_account_confusion_v1 | 1.0000 | 1.0000 | 0 | 0 |
| ru_date_negative_v1 | 1.0000 | 1.0000 | 0 | 0 |
| ru_raw_hard_v3_eval | 0.8350 | 1.0000 | 297 | 0 |
| ru_person_hard_eval | 0.8074 | 0.8074 | 183 | 183 |
| alexen2 | 0.8644 | 0.8547 | 228 | 241 |
| Rubai heldout | 0.8054 | 0.8036 | 3,131 | 3,136 |
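
The exact scoring script is not part of this card. As a rough illustration of what a span-level F1 measures, here is a minimal sketch that assumes exact-match scoring (a predicted span counts only if start, end, and label all match a gold span) and a hypothetical one-span-per-line file format, `start end label`. The function name `span_f1` and the file layout are illustrative, not the evaluation code used for the table.

```bash
# Exact-match span F1 between two span files (one "start end label" per line).
# Assumption: spans are unique within each file and compared as whole lines.
span_f1() {
  local gold_file="$1" pred_file="$2"
  awk '
    NR == FNR { gold[$0] = 1; n_gold++; next }   # first file: gold spans
    { n_pred++; if ($0 in gold) tp++ }           # second file: predicted spans
    END {
      if (n_gold == 0 || n_pred == 0 || tp == 0) { printf "0.0000\n"; exit }
      p = tp / n_pred                            # precision
      r = tp / n_gold                            # recall
      printf "%.4f\n", 2 * p * r / (p + r)
    }
  ' "$gold_file" "$pred_file"
}

# Toy example: one of three spans gets the wrong label (phone vs. account).
demo_dir=$(mktemp -d)
printf '0 5 private_person\n10 20 private_phone\n30 40 private_email\n' > "$demo_dir/gold.txt"
printf '0 5 private_person\n10 20 account_number\n30 40 private_email\n' > "$demo_dir/pred.txt"
span_f1 "$demo_dir/gold.txt" "$demo_dir/pred.txt"   # prints 0.6667
```

In the toy example a single label confusion costs both a false positive and a false negative, which is why label separation between phone, account, and secret spans has a visible effect on this kind of metric.
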
## Usage
```bash
opf --checkpoint apararti/privacy-filter-ru --device cuda "Мой номер 8 999 863 37 84, зовут Андрей Макаров."
```
For a local checkout:
```bash
opf --checkpoint ./privacy-filter-ru --device cuda "Перезвоните Наталье Никитиной на 8 903 914 81 88."
```
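
To run the filter over several messages, the single-string invocation above can be wrapped in a loop. This is a minimal sketch that uses only the `--checkpoint` and `--device` flags shown above; the function name `filter_file` and the input file `messages.txt` are hypothetical, and the one-call-per-line approach is an assumption made for simplicity rather than a documented batch mode.

```bash
# Run opf once per non-empty line of a text file.
# The file is read on descriptor 3 so that opf's own stdin is left untouched.
filter_file() {
  local checkpoint="$1" input_file="$2" line
  while IFS= read -r line <&3 || [ -n "$line" ]; do
    [ -z "$line" ] && continue                 # skip blank lines
    opf --checkpoint "$checkpoint" --device cuda "$line"
  done 3< "$input_file"
}

# Example (messages.txt is a hypothetical file with one message per line):
# filter_file apararti/privacy-filter-ru messages.txt
```

Each call starts a separate `opf` process, so this is likely to be slow for large files; it is meant for spot checks rather than bulk processing.
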
## Notes
This is a fine-tuned checkpoint, not the original OpenAI model. It is optimized for Russian PII filtering and should be validated on domain-specific shadow traffic before production rollout.