privacy-filter-ru

Russian PII fine-tune of openai/privacy-filter.

This checkpoint is the production candidate from the local raw_hardening_v3 run. It is intended to be used raw, with no deterministic post-processing applied to the model's predictions.

Labels

private_person
private_phone
private_email
private_address
private_date
private_url
account_number
secret

Training

Base checkpoint: checkpoints/production_candidate_ru_v2
Original base model: openai/privacy-filter
Epochs: 1
Learning rate: 1e-6
Batch size: 1
Gradient accumulation steps: 16
Serialization dtype: bfloat16
Train examples: 17,000
Validation examples: 2,000
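
As a quick sanity check on the hyperparameters above, the effective batch size and the number of optimizer updates in the single epoch can be worked out directly. This is a minimal sketch: the single-device setup (`num_devices = 1`) is an assumption not stated in this card, and whether the trailing partial accumulation window triggers an update depends on the trainer, so both counts are shown.

```python
import math

# Values copied from the Training section.
train_examples = 17_000
per_step_batch_size = 1
grad_accum_steps = 16
epochs = 1

# Assumption: a single device, so there is no data-parallel multiplier.
num_devices = 1

# Number of examples that contribute to each optimizer update.
effective_batch = per_step_batch_size * grad_accum_steps * num_devices

# Forward/backward passes over the whole run.
micro_batches = math.ceil(train_examples / (per_step_batch_size * num_devices)) * epochs

# Updates if the trailing partial accumulation window is dropped or flushed.
full_updates = micro_batches // grad_accum_steps
updates_with_partial = math.ceil(micro_batches / grad_accum_steps)

print(effective_batch, full_updates, updates_with_partial)
```

Under these assumptions the run performs roughly a thousand updates at an effective batch of 16, which is consistent with a light-touch fine-tune at a learning rate of 1e-6.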

The v3 training mix targeted raw-model behavior that previously depended on a deterministic runtime layer: phone/account/secret label separation and person/date boundary cleanup.

Raw Evaluation

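No deterministic post-processing was used for these metrics. Relative to v2, v3 improves on ru_realistic_eval_v1 and ru_raw_hard_v3_eval, stays at 1.0000 on the synthetic, phone/account confusion, and date-negative sets, is unchanged on ru_person_hard_eval, and is slightly lower on alexen2 and Rubai heldout.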

Eval	v2 raw span F1	v3 raw span F1	v2 mismatch rows	v3 mismatch rows
synthetic test	1.0000	1.0000	0	0
ru_realistic_eval_v1	0.8787	0.9916	158	11
ru_phone_account_confusion_v1	1.0000	1.0000	0	0
ru_date_negative_v1	1.0000	1.0000	0	0
ru_raw_hard_v3_eval	0.8350	1.0000	297	0
ru_person_hard_eval	0.8074	0.8074	183	183
alexen2	0.8644	0.8547	228	241
Rubai heldout	0.8054	0.8036	3,131	3,136
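
The two metrics in the table can be illustrated with a small sketch. This card does not state the exact matching criterion, so the code below assumes a common definition: a predicted span counts as correct only if its start, end, and label all match a gold span exactly, F1 is micro-averaged over all rows, and a row is a mismatch when its predicted span set differs from the gold set. The function name `span_f1_and_mismatches`, the span tuple layout, and the toy offsets are hypothetical and are not taken from the evaluation code behind these numbers.

```python
from typing import List, Sequence, Tuple

# Assumed span layout: (start offset, end offset, label).
Span = Tuple[int, int, str]


def span_f1_and_mismatches(
    gold_rows: Sequence[Sequence[Span]],
    pred_rows: Sequence[Sequence[Span]],
) -> Tuple[float, int]:
    """Return (micro-averaged exact-match span F1, number of mismatch rows)."""
    if len(gold_rows) != len(pred_rows):
        raise ValueError("gold and predicted rows must align one to one")

    tp = fp = fn = 0
    mismatch_rows = 0
    for gold, pred in zip(gold_rows, pred_rows):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # exact matches
        fp += len(pred_set - gold_set)   # spurious or mislabeled predictions
        fn += len(gold_set - pred_set)   # missed gold spans
        if gold_set != pred_set:
            mismatch_rows += 1

    # Convention assumed here: no gold spans and no predictions scores 1.0.
    if tp + fp + fn == 0:
        return 1.0, mismatch_rows
    f1 = 2 * tp / (2 * tp + fp + fn)
    return f1, mismatch_rows


# Toy data: the second span of row 0 has the right boundaries but the wrong label.
gold: List[List[Span]] = [
    [(0, 5, "private_person"), (10, 20, "private_phone")],
    [(3, 8, "private_date")],
]
pred: List[List[Span]] = [
    [(0, 5, "private_person"), (10, 20, "account_number")],
    [(3, 8, "private_date")],
]

f1, mismatches = span_f1_and_mismatches(gold, pred)
print(round(f1, 4), mismatches)
```

Under this definition a phone number tagged as account_number costs both a false positive and a false negative, which is why label confusion is penalized as heavily as a boundary error.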

Usage

opf --checkpoint apararti/privacy-filter-ru --device cuda "Мой номер 8 999 863 37 84, зовут Андрей Макаров."

For a local checkout:

opf --checkpoint ./privacy-filter-ru --device cuda "Перезвоните Наталье Никитиной на 8 903 914 81 88."
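
To call the same command from a script, one option is a thin wrapper around the CLI. This is a minimal sketch that uses only the `--checkpoint` and `--device` flags shown above. The helper names `build_opf_command` and `run_opf` are hypothetical, and because the output format of `opf` is not described in this card, the wrapper returns standard output unparsed.

```python
import subprocess
from typing import List


def build_opf_command(checkpoint: str, text: str, device: str = "cuda") -> List[str]:
    """Build the argv list for the opf invocation shown in the Usage section."""
    return ["opf", "--checkpoint", checkpoint, "--device", device, text]


def run_opf(checkpoint: str, text: str, device: str = "cuda") -> str:
    """Run opf and return its standard output unmodified.

    Raises subprocess.CalledProcessError if opf exits with a non-zero status.
    """
    result = subprocess.run(
        build_opf_command(checkpoint, text, device),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example call (requires opf on PATH and a local checkout of the checkpoint):
# print(run_opf("./privacy-filter-ru", "Перезвоните Наталье Никитиной на 8 903 914 81 88."))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with spaces and Cyrillic text in the input.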

Notes

This is a fine-tuned checkpoint, not the original OpenAI model. It is optimized for Russian PII filtering and should be validated on domain-specific shadow traffic before production rollout.
