privacy-filter-ru

A fine-tune of openai/privacy-filter for detecting personally identifiable information (PII) in Russian text.

This checkpoint is the raw-model production candidate from the local raw_hardening_v3 run. It is intended to run without deterministic post-processing.

Labels

  • private_person
  • private_phone
  • private_email
  • private_address
  • private_date
  • private_url
  • account_number
  • secret

Training

  • Base checkpoint: checkpoints/production_candidate_ru_v2
  • Original base model: openai/privacy-filter
  • Epochs: 1
  • Learning rate: 1e-6
  • Batch size: 1
  • Gradient accumulation steps: 16
  • Serialization dtype: bfloat16
  • Train examples: 17,000
  • Validation examples: 2,000

The v3 training mix targeted raw-model behavior that previously depended on a deterministic runtime layer: phone/account/secret label separation and person/date boundary cleanup.
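
The hyperparameters listed above imply an effective batch size and an approximate number of optimizer updates. The arithmetic below is a minimal sketch, assuming single-device training and that the final partial accumulation window still triggers an update; neither detail is stated in this card.

```python
import math

# Values copied from the Training list above.
per_device_batch_size = 1
grad_accum_steps = 16
train_examples = 17_000
epochs = 1

# Assumption: one device, so there is no data-parallel multiplier.
effective_batch_size = per_device_batch_size * grad_accum_steps

# Assumption: the last, partially filled accumulation window still
# produces an optimizer update, hence the ceiling.
optimizer_steps = math.ceil(train_examples / effective_batch_size) * epochs

print(effective_batch_size)  # 16
print(optimizer_steps)       # 1063
```

With a learning rate of 1e-6 and roughly a thousand updates, this is a light-touch pass over the v2 checkpoint rather than a full retraining.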

Raw Evaluation

No deterministic post-processing was used for these metrics.

Eval                            v2 raw span F1   v3 raw span F1   v2 mismatch rows   v3 mismatch rows
synthetic test                  1.0000           1.0000           0                  0
ru_realistic_eval_v1            0.8787           0.9916           158                11
ru_phone_account_confusion_v1   1.0000           1.0000           0                  0
ru_date_negative_v1             1.0000           1.0000           0                  0
ru_raw_hard_v3_eval             0.8350           1.0000           297                0
ru_person_hard_eval             0.8074           0.8074           183                183
alexen2                         0.8644           0.8547           228                241
Rubai heldout                   0.8054           0.8036           3,131              3,136
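
This card does not spell out how span F1 and mismatch rows are computed. The sketch below shows one common convention, offered as an assumption rather than the evaluation code behind the numbers above: a predicted span counts as correct only if its start, end, and label all match a gold span exactly, and a row counts as a mismatch if its predicted span set differs from the gold set in any way. The function names and the example offsets are hypothetical.

```python
from typing import List, Set, Tuple

# (start, end, label) with character offsets; an illustrative layout.
Span = Tuple[int, int, str]


def span_f1(gold_rows: List[Set[Span]], pred_rows: List[Set[Span]]) -> float:
    """Micro-averaged exact-match span F1 over all rows."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_rows, pred_rows):
        tp += len(gold & pred)   # spans matching on start, end and label
        fp += len(pred - gold)   # predicted spans with no exact gold match
        fn += len(gold - pred)   # gold spans the model missed
    denom = 2 * tp + fp + fn
    # With no gold and no predicted spans there is nothing to get wrong.
    return 1.0 if denom == 0 else 2 * tp / denom


def mismatch_rows(gold_rows: List[Set[Span]], pred_rows: List[Set[Span]]) -> int:
    """Number of rows whose predicted span set differs from the gold set."""
    return sum(1 for gold, pred in zip(gold_rows, pred_rows) if gold != pred)


# Two illustrative rows: the second person span has a wrong end offset.
gold_rows = [
    {(10, 25, "private_phone"), (33, 47, "private_person")},
    {(0, 5, "private_date")},
]
pred_rows = [
    {(10, 25, "private_phone"), (33, 40, "private_person")},
    {(0, 5, "private_date")},
]

print(round(span_f1(gold_rows, pred_rows), 4))  # 0.6667
print(mismatch_rows(gold_rows, pred_rows))      # 1
```

Under this convention a single boundary error costs both a false positive and a false negative, so boundary-heavy evals such as person names drag F1 down faster than pure label errors do.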

Usage

opf --checkpoint apararti/privacy-filter-ru --device cuda "Мой номер 8 999 863 37 84, зовут Андрей Макаров."

For a local checkout:

opf --checkpoint ./privacy-filter-ru --device cuda "Перезвоните Наталье Никитиной на 8 903 914 81 88."
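
To call the same command from a script, the CLI can be wrapped with the Python standard library. This is a minimal sketch: it uses only the `--checkpoint` and `--device` flags shown above, assumes `opf` is on `PATH`, and assumes results are written to stdout (the output format is not described in this card). The helper names `build_opf_command` and `run_opf` are hypothetical.

```python
import subprocess
from typing import List


def build_opf_command(checkpoint: str, text: str, device: str = "cuda") -> List[str]:
    """Build the argv list for the opf invocation shown in Usage."""
    return ["opf", "--checkpoint", checkpoint, "--device", device, text]


def run_opf(checkpoint: str, text: str, device: str = "cuda") -> str:
    """Run opf and return its stdout (assumes opf is installed and on PATH)."""
    result = subprocess.run(
        build_opf_command(checkpoint, text, device),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Passing an argv list (not a shell string) avoids quoting issues with
# Cyrillic text and spaces inside the input.
cmd = build_opf_command(
    "apararti/privacy-filter-ru",
    "Мой номер 8 999 863 37 84, зовут Андрей Макаров.",
)
print(cmd)
```

Calling `run_opf("./privacy-filter-ru", text)` would then execute the local-checkout variant, provided the CLI is installed.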

Notes

This is a fine-tuned checkpoint, not the original OpenAI model. It is optimized for Russian PII filtering and should be validated on domain-specific shadow traffic before production rollout.
