---
license: apache-2.0
pipeline_tag: token-classification
library_name: burn
tags:
- rust
- burn
- privacy
- PII
- NER
- token-classification
- openai
base_model: openai/privacy-filter
---

# OpenAI Privacy Filter: Rust/Burn Weights

Safetensors weights for [openai/privacy-filter](https://huggingface.co/openai/privacy-filter), packaged for inference with [privacy-filter-rs](https://github.com/eugenehp/privacy-filter-rs), a pure-Rust implementation built on the Burn ML framework.

## Contents

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 2.6 GB | Model weights (bfloat16) |
| `config.json` | 3 KB | Hugging Face model configuration |
| `tokenizer.json` | 27 MB | BPE tokenizer (o200k_base) |
| `tokenizer_config.json` | 234 B | Tokenizer metadata |
| `viterbi_calibration.json` | 372 B | Viterbi decoder operating points |

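Before loading the model, it can help to confirm that the weights directory contains all five files listed above. The following is a minimal sketch using only the standard library; the `missing_files` helper is hypothetical and not part of privacy-filter-rs, and the `data` directory name is borrowed from the usage example.

```rust
use std::path::Path;

/// File names taken from the Contents table.
const EXPECTED_FILES: [&str; 5] = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "viterbi_calibration.json",
];

/// Hypothetical helper: returns the expected files that are absent from `dir`.
fn missing_files(dir: &Path) -> Vec<&'static str> {
    EXPECTED_FILES
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}

fn main() {
    let missing = missing_files(Path::new("data"));
    if missing.is_empty() {
        println!("all expected files are present");
    } else {
        eprintln!("missing files: {}", missing.join(", "));
    }
}
```
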
## Model Details

- **Architecture**: Bidirectional transformer encoder with sparse mixture-of-experts (MoE) layers
- **Parameters**: 1.5B total, ~50M active per token (top-4 of 128 experts)
- **Hidden size**: 640, **Layers**: 8, **Heads**: 14 query / 2 key-value (grouped-query attention)
- **Context**: 128,000 tokens (YaRN RoPE, sliding window 257)
- **Output**: 33 BIOES token classes over 8 privacy categories
- **Dtype**: bfloat16 (converted to f32 at load time by the Rust runtime)

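The bfloat16-to-f32 widening mentioned above is lossless, because a bfloat16 value is the upper 16 bits of an IEEE 754 f32. A minimal standard-library sketch of that conversion follows; `bf16_bits_to_f32` and `decode_bf16_le` are hypothetical helpers, and the Rust runtime may implement the conversion differently (for example through a dedicated half-precision crate).

```rust
/// Hypothetical helper: widen one bfloat16 value (given as raw bits) to f32
/// by placing its 16 bits in the upper half of the 32-bit pattern.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Decode a little-endian bfloat16 byte buffer, the byte order safetensors uses.
/// A trailing odd byte, if any, is ignored.
fn decode_bf16_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}

fn main() {
    // 0x3F80 is 1.0 and 0xC000 is -2.0 in bfloat16.
    let raw = [0x80, 0x3F, 0x00, 0xC0];
    println!("{:?}", decode_bf16_le(&raw));
}
```
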
## Privacy Categories

1. `account_number`
2. `private_address`
3. `private_date`
4. `private_email`
5. `private_person`
6. `private_phone`
7. `private_url`
8. `secret`

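The eight categories above account for the 33 output classes: each category gets four position tags (B, I, E, S) and a single `O` class covers non-private tokens, so 8 × 4 + 1 = 33. A minimal sketch of that expansion follows; the `B-`/`I-`/`E-`/`S-` label spelling and the ordering are assumptions for illustration, and the authoritative label map is the one shipped in `config.json`.

```rust
/// The eight privacy categories listed above.
const CATEGORIES: [&str; 8] = [
    "account_number",
    "private_address",
    "private_date",
    "private_email",
    "private_person",
    "private_phone",
    "private_url",
    "secret",
];

/// Build an illustrative BIOES label set: one "O" label plus
/// Begin/Inside/End/Single tags for every category.
/// Label spelling and order are assumptions, not the model's actual id2label map.
fn bioes_labels() -> Vec<String> {
    let mut labels = vec!["O".to_string()];
    for category in CATEGORIES {
        for prefix in ["B", "I", "E", "S"] {
            labels.push(format!("{prefix}-{category}"));
        }
    }
    labels
}

fn main() {
    let labels = bioes_labels();
    println!("{} labels, starting with {:?}", labels.len(), &labels[..5]);
}
```
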
## Usage with privacy-filter-rs

```bash
# Clone the Rust project
git clone https://github.com/eugenehp/privacy-filter-rs
cd privacy-filter-rs

# Download weights into ./data (this repo)
# git clone https://huggingface.co/eugenehp/privacy-filter-rs data

# Run inference
cargo run --release -- -m data "My name is Alice Smith"
```

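The same inference is available as a library call:

```rust
use privacy_filter_rs::{PrivacyFilterInference, backend::{B, Device}};
use std::path::Path;

// Assumes the crate's error type converts into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the weights and tokenizer from ./data on the default device.
    let device = <Device as Default>::default();
    let engine = PrivacyFilterInference::<B>::load(Path::new("data"), device)?;

    // Each span carries its category, the matched text, and a confidence score.
    let spans = engine.predict("My name is Alice Smith")?;
    for s in &spans {
        println!("{}: {} (score: {:.4})", s.entity_group, s.word, s.score);
    }
    // private_person: Alice Smith (score: 1.0000)
    Ok(())
}
```
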
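A common next step is to mask the detected spans in the original text. The sketch below is self-contained: the `Span` struct is a hypothetical stand-in that mirrors only the three fields printed above (`entity_group`, `word`, `score`), and the `redact` helper and its score threshold are illustrative, not part of privacy-filter-rs.

```rust
/// Hypothetical stand-in for the crate's span type, limited to the fields shown above.
struct Span {
    entity_group: String,
    word: String,
    score: f32,
}

/// Replace every span whose score meets `min_score` with a `[category]` placeholder.
/// Matching is by substring, so every occurrence of the span text is masked.
fn redact(text: &str, spans: &[Span], min_score: f32) -> String {
    let mut out = text.to_string();
    for span in spans
        .iter()
        .filter(|s| s.score >= min_score && !s.word.is_empty())
    {
        let placeholder = format!("[{}]", span.entity_group);
        out = out.replace(&span.word, &placeholder);
    }
    out
}

fn main() {
    let spans = vec![Span {
        entity_group: "private_person".to_string(),
        word: "Alice Smith".to_string(),
        score: 1.0,
    }];
    println!("{}", redact("My name is Alice Smith", &spans, 0.5));
    // prints: My name is [private_person]
}
```

Substring replacement keeps the example short; if the crate exposes character offsets for each span, replacing by offset would avoid masking unrelated repeats of the same text.
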
## License

Apache 2.0, the same license as the upstream [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) model.