privacy-filter-ru
A fine-tune of openai/privacy-filter for detecting personally identifiable information (PII) in Russian text.
This checkpoint is the raw-model production candidate from the local raw_hardening_v3 run. It is intended to run without deterministic post-processing.
Labels
- private_person
- private_phone
- private_email
- private_address
- private_date
- private_url
- account_number
- secret
Training
- Base checkpoint: checkpoints/production_candidate_ru_v2
- Original base model: openai/privacy-filter
- Epochs: 1
- Learning rate: 1e-6
- Batch size: 1
- Gradient accumulation steps: 16
- Serialization dtype: bfloat16
- Train examples: 17,000
- Validation examples: 2,000
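The batch size and gradient accumulation settings above combine into an effective batch size, which in turn determines how many optimizer updates the single epoch performs. A minimal sketch of that arithmetic follows. It assumes that a final partial accumulation window still triggers an update, which this card does not state.

```python
import math

# Values taken from the training list above.
batch_size = 1
grad_accum_steps = 16
train_examples = 17_000
epochs = 1

# Effective batch size: examples contributing to each optimizer update.
effective_batch = batch_size * grad_accum_steps

# Assumption: the last, partially filled accumulation window still
# produces an optimizer update (hence the ceiling).
optimizer_steps = math.ceil(train_examples / effective_batch) * epochs

print(effective_batch)  # 16
print(optimizer_steps)  # 1063
```

With a learning rate of 1e-6 and roughly a thousand updates, this amounts to a light-touch adjustment of the v2 checkpoint, not a full retrain.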
The v3 training mix targeted raw-model behavior that previously depended on a deterministic runtime layer: phone/account/secret label separation and person/date boundary cleanup.
Raw Evaluation
No deterministic post-processing was used for these metrics.
| Eval | v2 raw span F1 | v3 raw span F1 | v2 mismatch rows | v3 mismatch rows |
|---|---|---|---|---|
| synthetic test | 1.0000 | 1.0000 | 0 | 0 |
| ru_realistic_eval_v1 | 0.8787 | 0.9916 | 158 | 11 |
| ru_phone_account_confusion_v1 | 1.0000 | 1.0000 | 0 | 0 |
| ru_date_negative_v1 | 1.0000 | 1.0000 | 0 | 0 |
| ru_raw_hard_v3_eval | 0.8350 | 1.0000 | 297 | 0 |
| ru_person_hard_eval | 0.8074 | 0.8074 | 183 | 183 |
| alexen2 | 0.8644 | 0.8547 | 228 | 241 |
| Rubai heldout | 0.8054 | 0.8036 | 3,131 | 3,136 |
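This card does not spell out how "raw span F1" and "mismatch rows" are computed. One plausible reading, sketched below, is micro-averaged exact-match span F1, with a row counted as a mismatch whenever its predicted span set differs from the gold set. The `Span` tuple layout and the `span_f1` function are hypothetical names used for illustration only, not part of the evaluation code behind the table.

```python
from typing import List, Tuple

# Hypothetical span representation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def span_f1(gold_rows: List[List[Span]],
            pred_rows: List[List[Span]]) -> Tuple[float, int]:
    """Return (micro-averaged exact-match F1, number of mismatching rows)."""
    tp = fp = fn = 0
    mismatch_rows = 0
    for gold, pred in zip(gold_rows, pred_rows):
        g, p = set(gold), set(pred)
        tp += len(g & p)   # spans matching in offsets and label
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans the model missed or mis-bounded
        if g != p:
            mismatch_rows += 1
    # Assumption: with nothing predicted (or nothing to find), score 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return f1, mismatch_rows


# Two rows: the first matches exactly, the second has a boundary error.
gold = [[(10, 24, "private_phone")], [(0, 5, "private_person")]]
pred = [[(10, 24, "private_phone")], [(0, 4, "private_person")]]
f1, mismatches = span_f1(gold, pred)
print(f1, mismatches)  # 0.5 1
```

Under exact matching, a single boundary error costs both a false positive and a false negative, which is why boundary cleanup can move span F1 substantially.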
Usage
opf --checkpoint apararti/privacy-filter-ru --device cuda "Мой номер 8 999 863 37 84, зовут Андрей Макаров."
For a local checkout:
opf --checkpoint ./privacy-filter-ru --device cuda "Перезвоните Наталье Никитиной на 8 903 914 81 88."
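To call the same command from Python, a thin `subprocess` wrapper is one option. The sketch below uses only the `--checkpoint` and `--device` flags shown above and assumes `opf` is installed and on `PATH`. The helper names `build_opf_command` and `run_opf` are hypothetical, and because this card does not describe the output format of `opf`, the wrapper returns raw stdout without parsing it.

```python
import subprocess
from typing import List


def build_opf_command(text: str,
                      checkpoint: str = "apararti/privacy-filter-ru",
                      device: str = "cuda") -> List[str]:
    """Build the argument list for the opf invocation shown in this card."""
    # Passing a list (not a shell string) avoids quoting issues with
    # spaces and punctuation in the input text.
    return ["opf", "--checkpoint", checkpoint, "--device", device, text]


def run_opf(text: str, **kwargs: str) -> str:
    """Run opf and return its raw stdout; raises if opf exits non-zero."""
    result = subprocess.run(
        build_opf_command(text, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


cmd = build_opf_command("Перезвоните Наталье Никитиной на 8 903 914 81 88.",
                        checkpoint="./privacy-filter-ru")
print(cmd[:5])  # ['opf', '--checkpoint', './privacy-filter-ru', '--device', 'cuda']
```

Calling `run_opf(...)` starts a new process and reloads the checkpoint on every call, so this pattern suits spot checks more than high-volume filtering.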
Notes
This is a fine-tuned checkpoint, not the original OpenAI model. It is optimized for Russian PII filtering and should be validated on domain-specific shadow traffic before production rollout.