---
license: apache-2.0
pipeline_tag: token-classification
library_name: burn
tags:
  - rust
  - burn
  - privacy
  - PII
  - NER
  - token-classification
  - openai
base_model: openai/privacy-filter
---

# OpenAI Privacy Filter: Rust/Burn Weights

Safetensors weights for [openai/privacy-filter](https://huggingface.co/openai/privacy-filter), packaged for inference with [privacy-filter-rs](https://github.com/eugenehp/privacy-filter-rs) (pure Rust, built on the Burn ML framework).

## Contents

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 2.6 GB | Model weights (bfloat16) |
| `config.json` | 3 KB | HuggingFace model configuration |
| `tokenizer.json` | 27 MB | BPE tokenizer (o200k_base) |
| `tokenizer_config.json` | 234 B | Tokenizer metadata |
| `viterbi_calibration.json` | 372 B | Viterbi decoder operating points |

## Model Details

- **Architecture**: Bidirectional transformer encoder with Sparse MoE
- **Parameters**: 1.5B total, ~50M active per token (top-4 of 128 experts)
- **Hidden size**: 640, **Layers**: 8, **Heads**: 14 Q / 2 KV (GQA)
- **Context**: 128,000 tokens (YaRN RoPE, sliding window 257)
- **Output**: 33 BIOES token classes over 8 privacy categories
- **Dtype**: bfloat16 (converted to f32 at load time by the Rust runtime)

## Privacy Categories

1. `account_number`
2. `private_address`
3. `private_date`
4. `private_email`
5. `private_person`
6. `private_phone`
7. `private_url`
8. `secret`

## Usage with privacy-filter-rs

```bash
# Clone the Rust project
git clone https://github.com/eugenehp/privacy-filter-rs
cd privacy-filter-rs

# Download weights into ./data (this repo)
# git clone https://huggingface.co/eugenehp/privacy-filter-rs data

# Run inference
cargo run --release -- -m data "My name is Alice Smith"
```

```rust
use privacy_filter_rs::{PrivacyFilterInference, backend::{B, Device}};
use std::path::Path;

// Pick the default device for the compiled-in backend.
let device = Device::default();
// Load the weights, config, and tokenizer from the ./data directory.
let engine = PrivacyFilterInference::<B>::load(Path::new("data"), device)?;
// Run token classification and collect the detected spans.
let spans = engine.predict("My name is Alice Smith")?;
for s in &spans {
    println!("{}: {} (score: {:.4})", s.entity_group, s.word, s.score);
}
// private_person: Alice Smith (score: 1.0000)
```

## License

Apache 2.0, same as the upstream [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) model.
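
## Label Count Sketch

The figure of 33 output classes under Model Details follows from the BIOES scheme: each of the 8 privacy categories gets four position tags (Begin, Inside, End, Single), plus one shared `O` tag for tokens outside any span, giving 4 × 8 + 1 = 33. The sketch below enumerates such a label set. The `B-category` string format, the ordering, and the `bioes_labels` helper are assumptions made for illustration; the authoritative label mapping is whatever `config.json` defines.

```rust
/// The eight privacy categories listed in this README.
const CATEGORIES: [&str; 8] = [
    "account_number",
    "private_address",
    "private_date",
    "private_email",
    "private_person",
    "private_phone",
    "private_url",
    "secret",
];

/// BIOES position prefixes: Begin, Inside, End, Single.
/// The "O" (outside) tag is category-independent and is added once.
const PREFIXES: [&str; 4] = ["B", "I", "E", "S"];

/// Builds a hypothetical label list: "O" first, then one label per
/// (category, prefix) pair. Ordering and string format are assumptions,
/// not read from config.json.
fn bioes_labels() -> Vec<String> {
    let mut labels = vec!["O".to_string()];
    for category in CATEGORIES.iter() {
        for prefix in PREFIXES.iter() {
            labels.push(format!("{}-{}", prefix, category));
        }
    }
    labels
}

fn main() {
    let labels = bioes_labels();
    // 1 outside tag + 4 position tags * 8 categories = 33 classes.
    println!("{} labels", labels.len());
    for label in labels.iter().take(5) {
        println!("{}", label);
    }
}
```

When mapping class indices back to entity names, read the index-to-label table from `config.json` rather than relying on an ordering like the one assumed here.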