---
license: apache-2.0
pipeline_tag: token-classification
library_name: burn
tags:
- rust
- burn
- privacy
- PII
- NER
- token-classification
- openai
base_model: openai/privacy-filter
---

# OpenAI Privacy Filter — Rust/Burn Weights

Safetensors weights for [openai/privacy-filter](https://huggingface.co/openai/privacy-filter), packaged for inference with [privacy-filter-rs](https://github.com/eugenehp/privacy-filter-rs) (pure-Rust, Burn ML framework).

## Contents

| File | Size | Description |
|---|---|---|
| `model.safetensors` | 2.6 GB | Model weights (bfloat16) |
| `config.json` | 3 KB | HuggingFace model configuration |
| `tokenizer.json` | 27 MB | BPE tokenizer (o200k_base) |
| `tokenizer_config.json` | 234 B | Tokenizer metadata |
| `viterbi_calibration.json` | 372 B | Viterbi decoder operating points |

## Model Details

- **Architecture**: Bidirectional transformer encoder with Sparse MoE
- **Parameters**: 1.5B total, ~50M active per token (top-4 of 128 experts)
- **Hidden size**: 640, **Layers**: 8, **Heads**: 14 Q / 2 KV (GQA)
- **Context**: 128,000 tokens (YaRN RoPE, sliding window 257)
- **Output**: 33 BIOES token classes over 8 privacy categories
- **Dtype**: bfloat16 (converted to f32 at load time by the Rust runtime)
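The bfloat16-to-f32 widening mentioned above is lossless, because a bfloat16 value is the upper 16 bits of an IEEE 754 single-precision float. The following is a minimal sketch of that conversion, not the code used by privacy-filter-rs; the function names are hypothetical, and it assumes the tensor bytes are little-endian.

```rust
/// Widen one bfloat16 value (given as raw bits) to f32.
/// bfloat16 is the upper 16 bits of an f32, so shifting left by 16 is exact.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Decode a little-endian bfloat16 byte buffer into f32 values.
/// Hypothetical helper for illustration; a trailing odd byte is ignored.
fn bf16_bytes_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}

fn main() {
    // 0x3F80 encodes 1.0 and 0xC000 encodes -2.0 in bfloat16.
    let raw = [0x80u8, 0x3F, 0x00, 0xC0];
    let values = bf16_bytes_to_f32(&raw);
    println!("{:?}", values);
}
```

Each weight grows from 2 bytes to 4 bytes, so expect the in-memory weights to take roughly twice the on-disk size after loading.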

## Privacy Categories

1. `account_number`
2. `private_address`
3. `private_date`
4. `private_email`
5. `private_person`
6. `private_phone`
7. `private_url`
8. `secret`
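The 33 output classes follow from these 8 categories under BIOES tagging: one `O` (outside) class plus `B`, `I`, `E`, and `S` tags per category, so 1 + 8 × 4 = 33. The sketch below enumerates such a label set. The `B-`/`I-`/`E-`/`S-` string format and the ordering are assumptions made for illustration; the authoritative label mapping is the one in `config.json`.

```rust
/// The eight privacy categories listed above.
const CATEGORIES: [&str; 8] = [
    "account_number",
    "private_address",
    "private_date",
    "private_email",
    "private_person",
    "private_phone",
    "private_url",
    "secret",
];

/// Build "O" plus Begin/Inside/End/Single tags for every category.
/// Label spelling and order here are illustrative, not read from config.json.
fn bioes_labels() -> Vec<String> {
    let mut labels = vec!["O".to_string()];
    for category in CATEGORIES {
        for prefix in ["B", "I", "E", "S"] {
            labels.push(format!("{prefix}-{category}"));
        }
    }
    labels
}

fn main() {
    let labels = bioes_labels();
    println!("{} labels, starting with {:?}", labels.len(), &labels[..5]);
}
```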

## Usage with privacy-filter-rs

```bash
# Clone the Rust project
git clone https://github.com/eugenehp/privacy-filter-rs
cd privacy-filter-rs

# Download the weights (this repo) into ./data
# Git LFS must be installed to fetch model.safetensors
git clone https://huggingface.co/eugenehp/privacy-filter-rs data

# Run inference
cargo run --release -- -m data "My name is Alice Smith"
```

```rust
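use privacy_filter_rs::{PrivacyFilterInference, backend::{B, Device}};
use std::path::Path;

// Assumes the crate's error type converts into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Use the default device for the compiled backend.
    let device = <Device as Default>::default();
    // Load the model from the ./data directory.
    let engine = PrivacyFilterInference::<B>::load(Path::new("data"), device)?;

    // Each span carries its category, the matched text, and a score.
    let spans = engine.predict("My name is Alice Smith")?;
    for s in &spans {
        println!("{}: {} (score: {:.4})", s.entity_group, s.word, s.score);
    }
    // private_person:  Alice Smith (score: 1.0000)
    Ok(())
}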
```
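Spans like the one printed above are typically obtained by merging per-token BIOES tags into contiguous runs. The sketch below shows one common way to do that merge. It is not the privacy-filter-rs implementation: `TagSpan` and `bioes_to_spans` are hypothetical names, the tag strings assume a `B-`/`I-`/`E-`/`S-` prefix format, and malformed runs are simply dropped.

```rust
/// A decoded entity span over token indices (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct TagSpan {
    category: String,
    start: usize, // first token index, inclusive
    end: usize,   // last token index, inclusive
}

/// Merge a BIOES tag sequence into spans.
/// Malformed runs (for example an E- tag with no open B-) produce no span.
fn bioes_to_spans(tags: &[&str]) -> Vec<TagSpan> {
    let mut spans = Vec::new();
    let mut open: Option<(String, usize)> = None; // (category, start index)
    for (i, tag) in tags.iter().enumerate() {
        let (prefix, category) = match tag.split_once('-') {
            Some(parts) => parts,
            None => {
                // "O" or any tag without a prefix closes whatever was open.
                open = None;
                continue;
            }
        };
        match prefix {
            // Single-token entity.
            "S" => {
                spans.push(TagSpan { category: category.to_string(), start: i, end: i });
                open = None;
            }
            // Start of a multi-token entity.
            "B" => open = Some((category.to_string(), i)),
            // Continuation: keep the span open only if the category matches.
            "I" => {
                if !matches!(&open, Some((c, _)) if c == category) {
                    open = None;
                }
            }
            // End: emit the span if it closes a matching open run.
            "E" => {
                if let Some((c, start)) = open.take() {
                    if c == category {
                        spans.push(TagSpan { category: c, start, end: i });
                    }
                }
            }
            _ => open = None,
        }
    }
    spans
}

fn main() {
    // Hypothetical per-token tags for a five-token input.
    let tags = ["O", "O", "O", "B-private_person", "E-private_person"];
    for span in bioes_to_spans(&tags) {
        println!("{}: tokens {}..={}", span.category, span.start, span.end);
    }
}
```

Dropping malformed runs keeps the sketch short; a decoder that enforces valid BIOES transitions, such as a Viterbi pass, avoids producing them in the first place.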

## License

Apache 2.0 — same as the upstream [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) model.