---
license: apache-2.0
pipeline_tag: token-classification
library_name: burn
tags:
- rust
- burn
- privacy
- PII
- NER
- token-classification
- openai
base_model: openai/privacy-filter
---
# OpenAI Privacy Filter — Rust/Burn Weights
Safetensors weights for [openai/privacy-filter](https://huggingface.co/openai/privacy-filter), packaged for inference with [privacy-filter-rs](https://github.com/eugenehp/privacy-filter-rs), a pure-Rust implementation built on the Burn ML framework.
## Contents
| File | Size | Description |
|---|---|---|
| `model.safetensors` | 2.6 GB | Model weights (bfloat16) |
| `config.json` | 3 KB | HuggingFace model configuration |
| `tokenizer.json` | 27 MB | BPE tokenizer (o200k_base) |
| `tokenizer_config.json` | 234 B | Tokenizer metadata |
| `viterbi_calibration.json` | 372 B | Viterbi decoder operating points |
## Model Details
- **Architecture**: Bidirectional transformer encoder with Sparse MoE
- **Parameters**: 1.5B total, ~50M active per token (top-4 of 128 experts)
- **Hidden size**: 640, **Layers**: 8, **Heads**: 14 Q / 2 KV (GQA)
- **Context**: 128,000 tokens (YaRN RoPE, sliding window 257)
- **Output**: 33 BIOES token classes over 8 privacy categories (8 categories × 4 span tags `B`/`I`/`E`/`S`, plus the shared `O` tag)
- **Dtype**: bfloat16 (converted to f32 at load time by the Rust runtime)
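The bfloat16-to-f32 widening mentioned above is lossless, because bfloat16 is the upper 16 bits of an IEEE 754 f32. The following is a minimal sketch of that conversion, not the code used by privacy-filter-rs; the function names `bf16_bits_to_f32` and `decode_bf16_le` are hypothetical, and the little-endian byte order is the one the safetensors format uses for tensor data.

```rust
/// Widen one bfloat16 value (given as its raw 16-bit pattern) to f32.
/// bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
/// bits of an f32, so shifting left by 16 restores an exact f32.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Decode a little-endian bfloat16 byte buffer into f32 values.
/// Hypothetical helper: a real loader would read the buffer from
/// `model.safetensors` rather than from a literal.
fn decode_bf16_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}

fn main() {
    // 0x3F80 encodes 1.0 and 0xC000 encodes -2.0 in bfloat16.
    let raw = [0x80, 0x3F, 0x00, 0xC0];
    let values = decode_bf16_le(&raw);
    println!("{:?}", values); // [1.0, -2.0]
}
```

Widening doubles the in-memory size of each tensor relative to the 2-byte-per-value file on disk, which is worth budgeting for when loading the full checkpoint.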
## Privacy Categories
1. `account_number`
2. `private_address`
3. `private_date`
4. `private_email`
5. `private_person`
6. `private_phone`
7. `private_url`
8. `secret`
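To show how the 8 categories above expand into 33 BIOES classes, here is a small sketch that enumerates one possible label set. The label strings (for example `B-private_person`) and their ordering are assumptions for illustration; the authoritative id-to-label mapping is whatever `config.json` specifies.

```rust
/// The eight privacy categories listed in this README.
const CATEGORIES: [&str; 8] = [
    "account_number",
    "private_address",
    "private_date",
    "private_email",
    "private_person",
    "private_phone",
    "private_url",
    "secret",
];

/// Span-position tags: Begin, Inside, End, Single. `O` (outside) is
/// shared by all categories and is added separately.
const SPAN_PREFIXES: [char; 4] = ['B', 'I', 'E', 'S'];

/// Build an illustrative label list: `O` first, then four tags per category.
/// Assumed naming and order; check `config.json` for the real mapping.
fn build_labels() -> Vec<String> {
    let mut labels = vec!["O".to_string()];
    for category in CATEGORIES {
        for prefix in SPAN_PREFIXES {
            labels.push(format!("{prefix}-{category}"));
        }
    }
    labels
}

fn main() {
    let labels = build_labels();
    // 1 + 8 * 4 = 33 classes in total.
    println!("{} labels", labels.len());
    println!("first span label: {}", labels[1]);
}
```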
## Usage with privacy-filter-rs
```bash
# Clone the Rust project
git clone https://github.com/eugenehp/privacy-filter-rs
cd privacy-filter-rs
# Download weights into ./data (this repo); skip if ./data already exists
git clone https://huggingface.co/eugenehp/privacy-filter-rs data
# Run inference
cargo run --release -- -m data "My name is Alice Smith"
```
```rust
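use privacy_filter_rs::{PrivacyFilterInference, backend::{B, Device}};
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Select the default device for the compiled-in backend.
    let device = <Device as Default>::default();
    // Load the model files from ./data.
    let engine = PrivacyFilterInference::<B>::load(Path::new("data"), device)?;

    // Each returned span is one detected entity.
    let spans = engine.predict("My name is Alice Smith")?;
    for s in &spans {
        println!("{}: {} (score: {:.4})", s.entity_group, s.word, s.score);
    }
    // private_person: Alice Smith (score: 1.0000)
    Ok(())
}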
```
## License
Apache 2.0 — same as the upstream [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) model.