metadata
base_model: openai/privacy-filter
pipeline_tag: token-classification
language:
  - ru
tags:
  - privacy
  - pii
  - token-classification
  - russian
  - opf
model-index:
  - name: privacy-filter-ru
    results:
      - task:
          type: token-classification
          name: Token Classification
        dataset:
          name: ru_realistic_eval_v1
          type: local
        metrics:
          - type: f1
            value: 0.9916
            name: Raw span F1
      - task:
          type: token-classification
          name: Token Classification
        dataset:
          name: ru_raw_hard_v3_eval
          type: local
        metrics:
          - type: f1
            value: 1
            name: Raw span F1

privacy-filter-ru

Russian PII fine-tune of openai/privacy-filter.

This checkpoint is the raw-model production candidate from the local raw_hardening_v3 run. It is intended to run without deterministic post-processing.

Labels

  • private_person
  • private_phone
  • private_email
  • private_address
  • private_date
  • private_url
  • account_number
  • secret
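
To illustrate how these labels might be consumed downstream, here is a minimal redaction sketch. The `Span` layout (label plus end-exclusive character offsets) and the `redact` helper are hypothetical and exist only for this example; this card does not specify the model's output format, and the spans below are hand-written rather than model predictions.

```python
from typing import NamedTuple

# The eight labels listed above.
LABELS = {
    "private_person", "private_phone", "private_email", "private_address",
    "private_date", "private_url", "account_number", "secret",
}

class Span(NamedTuple):
    # Hypothetical span layout: character offsets, end-exclusive.
    label: str
    start: int
    end: int

def redact(text: str, spans: list[Span]) -> str:
    """Replace each labeled span with a [LABEL] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if span.label not in LABELS:
            raise ValueError(f"unknown label: {span.label}")
        out = out[:span.start] + f"[{span.label.upper()}]" + out[span.end:]
    return out

# Hand-written example spans (not model output).
text = "Call Andrey Makarov at 8 999 863 37 84."
name, phone = "Andrey Makarov", "8 999 863 37 84"
spans = [
    Span("private_person", text.index(name), text.index(name) + len(name)),
    Span("private_phone", text.index(phone), text.index(phone) + len(phone)),
]
redacted = redact(text, spans)
print(redacted)
```

Replacing from the rightmost span first avoids recomputing offsets after each substitution; overlapping spans are not handled in this sketch.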

Training

  • Base checkpoint: checkpoints/production_candidate_ru_v2
  • Original base model: openai/privacy-filter
  • Epochs: 1
  • Learning rate: 1e-6
  • Batch size: 1
  • Gradient accumulation steps: 16
  • Serialization dtype: bfloat16
  • Train examples: 17,000
  • Validation examples: 2,000
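
As a quick check on what these hyperparameters imply, the sketch below works out the effective batch size and the number of optimizer updates. It assumes single-device training and that the final partial accumulation window still triggers an update; neither detail is stated in this card.

```python
import math

# Values taken from the list above.
train_examples = 17_000
batch_size = 1
grad_accum_steps = 16
epochs = 1

# Examples contributing to each optimizer update (single-device assumption).
effective_batch = batch_size * grad_accum_steps

# Optimizer updates per epoch, assuming the last partial accumulation
# window (17,000 is not a multiple of 16) still results in an update.
updates_per_epoch = math.ceil(train_examples / effective_batch)
total_updates = updates_per_epoch * epochs

print(effective_batch, total_updates)
```

Under those assumptions the run performs roughly a thousand updates at a learning rate of 1e-6, which is consistent with a light hardening pass over an existing checkpoint rather than training from scratch.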

The v3 training mix targeted behavior that the raw model previously left to a deterministic runtime layer: separating phone, account, and secret labels, and cleaning up person and date span boundaries.

Raw Evaluation

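No deterministic post-processing was used for these metrics. The gains from v2 to v3 are concentrated in ru_realistic_eval_v1 and ru_raw_hard_v3_eval; ru_person_hard_eval is unchanged, and alexen2 and Rubai heldout are slightly lower.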

| Eval | v2 raw span F1 | v3 raw span F1 | v2 mismatch rows | v3 mismatch rows |
| --- | --- | --- | --- | --- |
| synthetic test | 1.0000 | 1.0000 | 0 | 0 |
| ru_realistic_eval_v1 | 0.8787 | 0.9916 | 158 | 11 |
| ru_phone_account_confusion_v1 | 1.0000 | 1.0000 | 0 | 0 |
| ru_date_negative_v1 | 1.0000 | 1.0000 | 0 | 0 |
| ru_raw_hard_v3_eval | 0.8350 | 1.0000 | 297 | 0 |
| ru_person_hard_eval | 0.8074 | 0.8074 | 183 | 183 |
| alexen2 | 0.8644 | 0.8547 | 228 | 241 |
| Rubai heldout | 0.8054 | 0.8036 | 3,131 | 3,136 |
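
This card does not define how span F1 or mismatch rows are computed. The sketch below shows one common reading, offered as an assumption: micro-averaged F1 over exact matches of `(label, start, end)`, with a row counted as a mismatch when its predicted span set differs from the gold set. The function name `span_f1` and the tuple layout are hypothetical.

```python
def span_f1(gold_rows, pred_rows):
    """Micro-averaged exact-match span F1 and the number of mismatching rows.

    Each row is an iterable of (label, start, end) tuples. This is an assumed
    definition for illustration, not necessarily the one used for the table.
    """
    if len(gold_rows) != len(pred_rows):
        raise ValueError("gold and predicted rows must align one to one")

    tp = fp = fn = mismatch_rows = 0
    for gold, pred in zip(gold_rows, pred_rows):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # spans matching exactly
        fp += len(pred_set - gold_set)   # predicted but not in gold
        fn += len(gold_set - pred_set)   # in gold but not predicted
        if gold_set != pred_set:
            mismatch_rows += 1

    # F1 = 2TP / (2TP + FP + FN); by convention here, 1.0 when there is
    # nothing to find and nothing was predicted.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return f1, mismatch_rows

# Toy data: the second row has a date span whose end offset is off by one.
gold = [
    [("private_phone", 10, 25)],
    [("private_person", 0, 6), ("private_date", 20, 30)],
]
pred = [
    [("private_phone", 10, 25)],
    [("private_person", 0, 6), ("private_date", 20, 29)],
]
f1, mismatches = span_f1(gold, pred)
print(round(f1, 4), mismatches)
```

Under exact matching, a single boundary error counts as both a false positive and a false negative, so a metric of this kind is sensitive to the person and date boundary issues the training mix targeted.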

Usage

opf --checkpoint apararti/privacy-filter-ru --device cuda "Мой номер 8 999 863 37 84, зовут Андрей Макаров."

For a local checkout:

opf --checkpoint ./privacy-filter-ru --device cuda "Перезвоните Наталье Никитиной на 8 903 914 81 88."

Notes

This is a fine-tuned checkpoint, not the original OpenAI model. It is optimized for Russian PII filtering and should be validated on domain-specific shadow traffic before production rollout.