---
license: apache-2.0
base_model: openai/privacy-filter
tags:
- token-classification
- text-classification
- multi-task
- pii-detection
- document-classification
- privacy
datasets:
- ai4privacy/pii-masking-400k
- community-datasets/yahoo_answers_topics
metrics:
- f1
- accuracy
model-index:
- name: privacy-filter-multitask
  results:
  - task:
      type: token-classification
      name: PII Detection (NER)
    dataset:
      name: ai4privacy/pii-masking-400k
      type: ai4privacy/pii-masking-400k
    metrics:
    - type: f1
      value: 0.4925
      name: F1 (strict span-level)
    - type: precision
      value: 0.6968
    - type: recall
      value: 0.3809
  - task:
      type: text-classification
      name: Document Classification (10 classes)
    dataset:
      name: yahoo_answers_topics
      type: community-datasets/yahoo_answers_topics
    metrics:
    - type: accuracy
      value: 0.4776
      name: Test Accuracy
---
# Privacy Filter Multi-Task 🔒📄
A **single model** for simultaneous **PII Detection (NER)** and **Document Classification (10 categories)**.
Adapted from [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) — a 1.4B Sparse MoE transformer with only ~50M active parameters per token.
## Architecture
```
Input → BPE Tokenizer (o200k_base, 200K vocab)
↓
8-layer Sparse MoE Transformer
• 128 experts, top-4 routing (~50M active params/token)
• Banded sliding-window attention (window=128)
• GQA: 14 query heads, 2 KV heads, head_dim=64
• Hidden size: 640
↓ ↓
NER Head (640→33) Doc Head (mean-pool → 640→10)
↓ ↓
BIOES PII tags 10-class document category
```
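The top-4 routing is what keeps only ~50M of the 1.4B parameters active per token: a learned gate scores all 128 experts and only the four highest-scoring experts run for each token. A minimal, illustrative sketch of that dispatch pattern (module and variable names here are hypothetical, not the upstream implementation):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top4MoE(nn.Module):
    """Illustrative sparse MoE layer: 128 experts, top-4 active per token."""
    def __init__(self, hidden=640, num_experts=128, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden, num_experts)  # learned router
        self.experts = nn.ModuleList(
            nn.Linear(hidden, hidden) for _ in range(num_experts)
        )

    def forward(self, x):  # x: [tokens, hidden]
        scores = self.gate(x)                            # [tokens, 128]
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick 4 experts/token
        weights = F.softmax(weights, dim=-1)             # normalize over the 4
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):        # only routed tokens run
            tok, slot = (idx == e).nonzero(as_tuple=True)
            if tok.numel():
                out[tok] += weights[tok, slot].unsqueeze(-1) * expert(x[tok])
        return out
```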
## Results
### PII Detection (NER)
| Metric | Value |
|--------|-------|
| **F1 (strict span-level)** | **0.493** |
| Precision | 0.697 |
| Recall | 0.381 |
| Token Accuracy | 0.944 |
8 entity types: `private_person` · `private_email` · `private_phone` · `private_address` · `private_date` · `private_url` · `account_number` · `secret`
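The 33-way NER head (640→33 above) follows directly from BIOES tagging over these 8 types: each type gets B/I/E/S variants plus one shared `O` tag, so 8 × 4 + 1 = 33. A quick sanity check (the exact label strings and ordering live in `model.config.id2label`; the format below is an assumption):
```python
entity_types = [
    "private_person", "private_email", "private_phone", "private_address",
    "private_date", "private_url", "account_number", "secret",
]
# Assumed "PREFIX-type" label format; check model.config.id2label for the real one.
labels = ["O"] + [f"{prefix}-{etype}" for etype in entity_types for prefix in "BIES"]
assert len(labels) == 33  # matches the 640→33 NER head
```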
### Document Classification (10 classes)
| Split | Accuracy |
|-------|----------|
| Val | 0.470 |
| **Test** | **0.478** |
Per-class test accuracy:
| Category | Accuracy |
|----------|----------|
| Computers & Internet | 0.688 |
| Family & Relationships | 0.615 |
| Science & Mathematics | 0.556 |
| Health | 0.524 |
| Sports | 0.523 |
| Politics & Government | 0.493 |
| Entertainment & Music | 0.444 |
| Society & Culture | 0.363 |
| Education & Reference | 0.310 |
| Business & Finance | 0.263 |
---
## 🚀 Production Inference Guide
All numbers below were measured on real hardware with both task heads (NER + doc classification) executing on every call: the benchmark script runs a single forward pass that produces PII entity tags **and** the document category simultaneously.
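To reproduce these rows on your own hardware, a timing loop along the following lines should land close (a sketch, not the exact benchmark script; `model`, `doc_head`, and the mean-pooling come from the Usage section below). Synchronizing CUDA before reading the clock is essential, otherwise you time kernel launches rather than execution:
```python
import time
import torch

def bench(model, doc_head, inputs, warmup=5, iters=50):
    """Time one forward pass that yields both NER tags and the doc category."""
    timings_ms = []
    with torch.no_grad():
        for i in range(warmup + iters):
            torch.cuda.synchronize()
            t0 = time.perf_counter()
            out = model(**inputs, output_hidden_states=True)
            ner = out.logits.argmax(-1)                        # NER head
            hidden = out.hidden_states[-1]
            mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
            pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)
            doc = doc_head(pooled).argmax(-1)                  # doc head
            torch.cuda.synchronize()
            if i >= warmup:                                    # drop warmup iters
                timings_ms.append((time.perf_counter() - t0) * 1e3)
    t = torch.tensor(timings_ms)
    return t.mean().item(), t.quantile(0.95).item(), t.quantile(0.99).item()
```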
### Resource Requirements
| Resource | Value |
|----------|-------|
| Model weights (bf16) | **2.8 GB** GPU VRAM / RAM |
| Model weights (fp32) | **5.6 GB** RAM |
| ONNX variants available upstream | fp16, int8, q4 (see [openai/privacy-filter](https://huggingface.co/openai/privacy-filter/tree/main/onnx)) |
| Min GPU VRAM (bs=1, seq≤512) | **2.9 GB** |
| Min GPU VRAM (bs=64, seq=512) | **6.2 GB** |
| Fits on | T4 (16 GB), L4 (24 GB), A10G (24 GB), A100, any ≥8 GB GPU |
### GPU — Single-Document Latency (NVIDIA A10G, bf16)
Time from raw text to both NER tags + document category:
| Sequence Length | Latency (mean) | Latency (p95) | Latency (p99) |
|:-:|:-:|:-:|:-:|
| 64 tokens | 113 ms | 117 ms | 122 ms |
| 128 tokens | 106 ms | 110 ms | 115 ms |
| 256 tokens | 106 ms | 111 ms | 113 ms |
| 512 tokens | 106 ms | 113 ms | 116 ms |
> Latency is dominated by a fixed ~105 ms kernel-launch overhead from the Sparse MoE routing — it barely changes with sequence length up to 512 tokens.
### GPU — Batched Throughput (NVIDIA A10G, bf16)
| Batch Size | Seq 64 | Seq 128 | Seq 256 | Seq 512 |
|:-:|:-:|:-:|:-:|:-:|
| **1** | 8.9 docs/s | 9.4 docs/s | 9.4 docs/s | 9.4 docs/s |
| **4** | 36 docs/s | 37 docs/s | 37 docs/s | 32 docs/s |
| **8** | 73 docs/s | 73 docs/s | 69 docs/s | 53 docs/s |
| **16** | 139 docs/s | 138 docs/s | 114 docs/s | 73 docs/s |
| **32** | 265 docs/s | 238 docs/s | 165 docs/s | 89 docs/s |
| **64** | **460 docs/s** | **348 docs/s** | **207 docs/s** | **101 docs/s** |
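> Throughput is simply batch size divided by batch latency from the detail table below: e.g. at bs=64, seq=64 a batch takes 139 ms, so 64 / 0.139 s ≈ 460 docs/s.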
### GPU — Batched Latency Detail (NVIDIA A10G, bf16)
<details>
<summary>Full latency table (click to expand)</summary>
| Batch | Seq Len | Batch Latency (ms) | Per-Doc (ms) | p95 (ms) | p99 (ms) |
|:-:|:-:|:-:|:-:|:-:|:-:|
| 1 | 64 | 113 | 112.7 | 117 | 122 |
| 4 | 64 | 111 | 27.8 | 116 | 118 |
| 8 | 64 | 110 | 13.8 | 114 | 126 |
| 16 | 64 | 115 | 7.2 | 121 | 125 |
| 32 | 64 | 121 | 3.8 | 127 | 135 |
| 64 | 64 | 139 | 2.2 | 144 | 144 |
| 1 | 128 | 106 | 105.9 | 110 | 115 |
| 4 | 128 | 107 | 26.9 | 112 | 115 |
| 8 | 128 | 110 | 13.7 | 115 | 116 |
| 16 | 128 | 116 | 7.3 | 121 | 128 |
| 32 | 128 | 134 | 4.2 | 139 | 143 |
| 64 | 128 | 184 | 2.9 | 189 | 191 |
| 1 | 256 | 106 | 106.1 | 111 | 113 |
| 4 | 256 | 109 | 27.2 | 114 | 115 |
| 8 | 256 | 117 | 14.6 | 123 | 126 |
| 16 | 256 | 140 | 8.8 | 145 | 147 |
| 32 | 256 | 194 | 6.1 | 199 | 202 |
| 64 | 256 | 309 | 4.8 | 314 | 315 |
| 1 | 512 | 106 | 106.5 | 113 | 116 |
| 4 | 512 | 125 | 31.2 | 129 | 130 |
| 8 | 512 | 152 | 19.0 | 158 | 165 |
| 16 | 512 | 219 | 13.7 | 223 | 225 |
| 32 | 512 | 358 | 11.2 | 361 | 364 |
| 64 | 512 | 636 | 9.9 | 639 | 641 |
</details>
### GPU — Peak VRAM Usage (bf16)
| Batch Size | Seq 128 | Seq 256 | Seq 512 |
|:-:|:-:|:-:|:-:|
| 1 | 2,817 MB | 2,824 MB | 2,862 MB |
| 8 | 2,857 MB | 2,936 MB | 3,237 MB |
| 32 | 3,000 MB | 3,309 MB | 4,522 MB |
| 64 | 3,189 MB | 3,809 MB | **6,236 MB** |
> The model is extremely memory-efficient. Even at batch=64, seq=512, it uses only 6.2 GB — comfortably fits on a T4 (16 GB). This is because the Sparse MoE only activates 4 of 128 experts per token.
### CPU — Latency & Throughput (AMD EPYC 7R32, 8 cores, fp32)
| Batch | Seq 64 | Seq 128 | Seq 256 | Seq 512 |
|:-:|:-:|:-:|:-:|:-:|
| **1** | 152 ms (6.6/s) | 193 ms (5.2/s) | 302 ms (3.3/s) | 569 ms (1.8/s) |
| **4** | 278 ms (14.4/s) | 468 ms (8.6/s) | 935 ms (4.3/s) | 2,464 ms (1.6/s) |
| **8** | 467 ms (17.1/s) | 862 ms (9.3/s) | 1,728 ms (4.6/s) | 4,745 ms (1.7/s) |
| **16** | 837 ms (19.1/s) | 1,624 ms (9.9/s) | 3,814 ms (4.2/s) | 9,143 ms (1.7/s) |
> On CPU the model runs at ~152 ms/doc for short texts (seq=64, bs=1) — suitable for low-volume or batch-offline pipelines.
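For CPU runs, thread count matters as much as batch size; the table above used 8 physical cores. A minimal setup sketch (assuming the `model` and `doc_head` from the Usage section below):
```python
import torch

torch.set_num_threads(8)  # match the physical core count used in the table above
model = model.to(device="cpu", dtype=torch.float32)        # fp32, as benchmarked
doc_head = doc_head.to(device="cpu", dtype=torch.float32)
```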
### Daily Throughput Projections
Sustained throughput for a **single device**, running 24/7 at the optimal batch size:
| Sequence Length | GPU (A10G, bf16) | CPU (8-core, fp32) |
|:-:|:-:|:-:|
| 64 tokens | **39.8M docs/day** (460/s, bs=64) | 1.7M docs/day (19/s, bs=16) |
| 128 tokens | **30.1M docs/day** (348/s, bs=64) | 855K docs/day (10/s, bs=16) |
| 256 tokens | **17.9M docs/day** (207/s, bs=64) | 397K docs/day (4.6/s, bs=8) |
| 512 tokens | **8.7M docs/day** (101/s, bs=64) | 156K docs/day (1.8/s, bs=1) |
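These projections are straight arithmetic over the sustained-throughput tables (docs/day = docs/s × 86,400 seconds); small differences come from rounding the docs/s values:
```python
docs_per_sec = 460  # A10G, bf16, bs=64, seq=64 (throughput table above)
print(f"{docs_per_sec * 86_400 / 1e6:.1f}M docs/day")  # 39.7M; table shows 39.8M
```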
#### Multi-GPU Scaling Estimates
| Config | seq=128 | seq=256 | seq=512 |
|--------|:-:|:-:|:-:|
| 1× A10G (24 GB, ~$1/hr) | 30M/day | 18M/day | 8.7M/day |
| 1× A100 (80 GB, ~$3/hr) | ~70M/day¹ | ~42M/day¹ | ~20M/day¹ |
| 4× A10G data-parallel | 120M/day | 72M/day | 35M/day |
| 8× A10G data-parallel | 240M/day | 143M/day | 70M/day |
<sub>¹ A100 estimates are linearly extrapolated from A10G numbers using A100's ~2.3× higher memory bandwidth and larger batch capacity. Actual numbers will vary — benchmark on your target hardware.</sub>
### Serving Recommendations
| Deployment Scenario | Recommended Config | Expected Perf |
|---|---|---|
| **Real-time API** (SLA <200ms) | 1× GPU, bs=1, seq≤512 | ~106 ms p50, ~113 ms p95 |
| **Near-real-time** (SLA <500ms) | 1× GPU, bs=8–16, seq≤512 | 53–73 docs/s, p95 <225 ms |
| **High-throughput batch** | 1× GPU, bs=64, seq=256 | 207 docs/s, 17.9M/day |
| **Max throughput batch** | 1× GPU, bs=64, seq=64² | 460 docs/s, 39.8M/day |
| **CPU offline / dev** | CPU, bs=1, seq≤256 | 3–7 docs/s |
<sub>² At seq=64 most documents will be truncated. Use seq=128–256 for production balance.</sub>
**Key observations:**
- The model has a **fixed ~105 ms overhead** per forward pass regardless of sequence length (MoE routing + expert dispatch). Batching amortizes this cost across documents — the per-doc cost drops from 106 ms (bs=1) to under 10 ms (bs=64).
- **Memory is not the bottleneck** — even at bs=64/seq=512 the model uses only 6.2 GB. You can run this on a T4 (16 GB) with room to spare.
- **Optimal batch size for throughput**: bs=64 for all sequence lengths on A10G.
- **Optimal batch size for latency-constrained**: bs=8–16 gives a good per-doc latency (13–19 ms) while keeping batch latency under 225 ms.
---
## Training Strategy
Two-phase training approach:
1. **Phase 1 — Multi-task fine-tuning**: Unfroze the last 4 MoE layers plus both task heads. Trained on 20K NER examples (ai4privacy) + 20K doc-classification examples (Yahoo Answers) with a weighted multi-task loss (NER×1.0 + Doc×0.5), 2 epochs, LR=2e-5.
2. **Phase 2 — Doc head retraining** (head-only): Froze the entire backbone and NER head, pre-computed 640-dim pooled features for 100K Yahoo Answers examples, then trained a fresh `Linear(640→10)` classifier for 10 epochs with LR=1e-3 and cosine decay (a minimal sketch follows this list). This approach:
- Preserves NER performance exactly (backbone untouched)
- Is extremely fast (~seconds per epoch on cached features)
- Achieves **47.8% test accuracy** (up from 24.8% in phase 1)
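A minimal sketch of that phase-2 recipe under the stated hyperparameters (fresh linear head on cached pooled features, 10 epochs, LR=1e-3, cosine decay); the optimizer choice and batch size below are assumptions, not documented settings:
```python
import torch
import torch.nn as nn

# feats:  [100_000, 640] pooled features precomputed once with the frozen backbone
# labels: [100_000]      Yahoo Answers category ids (0-9)
head = nn.Linear(640, 10)
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)     # optimizer: assumption
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=10)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    perm = torch.randperm(len(feats))                   # reshuffle each epoch
    for i in range(0, len(perm), 1024):                 # batch size: assumption
        batch = perm[i:i + 1024]
        opt.zero_grad()
        loss_fn(head(feats[batch]), labels[batch]).backward()
        opt.step()
    sched.step()                                        # cosine decay per epoch
```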
## Usage
```python
import torch
import torch.nn as nn
from transformers import AutoModelForTokenClassification, AutoTokenizer
from huggingface_hub import hf_hub_download
# Load model + tokenizer
tokenizer = AutoTokenizer.from_pretrained("binga/privacy-filter-multitask")
model = AutoModelForTokenClassification.from_pretrained(
"binga/privacy-filter-multitask", dtype=torch.bfloat16, device_map="auto"
)
# Load document classification head
doc_head = nn.Linear(640, 10)
doc_head.load_state_dict(torch.load(
hf_hub_download("binga/privacy-filter-multitask", "doc_head.pt"),
weights_only=True, map_location=model.device
))
doc_head = doc_head.to(dtype=torch.bfloat16, device=model.device)
doc_head.eval()
# Inference
text = "John Smith (SSN: 123-45-6789) emailed john@corp.com about Q3 earnings."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model(**inputs, output_hidden_states=True)
# === PII Detection ===
print("PII entities:")
for tok, pred in zip(
tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
outputs.logits.argmax(-1)[0]
):
label = model.config.id2label[pred.item()]
if label != "O":
print(f" {tok} → {label}")
# === Document Classification ===
categories = [
"Society & Culture", "Science & Math", "Health", "Education",
"Computers & Internet", "Sports", "Business & Finance",
"Entertainment", "Family", "Politics"
]
hidden = outputs.hidden_states[-1]
mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)
probs = torch.softmax(doc_head(pooled)[0].float(), dim=-1)
top = probs.argmax().item()
print(f"\nCategory: {categories[top]} ({probs[top]:.1%})")
```
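Token-level printing is fine for a demo, but production usually wants whole entities. A hedged decoder that merges BIOES runs back into character spans via the fast tokenizer's offset mapping (assumes `B-`/`I-`/`E-`/`S-` prefixed labels; verify against `model.config.id2label`):
```python
def decode_spans(text, tokenizer, model):
    """Merge BIOES token tags into (entity_type, surface_text) spans."""
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        preds = model(**enc.to(model.device)).logits.argmax(-1)[0].tolist()
    spans, start, etype = [], None, None
    for (s, e), p in zip(offsets, preds):
        prefix, _, typ = model.config.id2label[p].partition("-")
        if prefix in ("B", "S"):                        # span opens
            start, etype = s, typ
        if prefix in ("E", "S") and start is not None:  # span closes
            spans.append((etype, text[start:e]))
            start = None
    return spans

print(decode_spans(text, tokenizer, model))
```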
### Batched Inference (Production)
```python
# Process a batch of documents — both tasks in a single forward pass
texts = ["doc1...", "doc2...", "doc3..."]  # your list of documents
inputs = tokenizer(texts, return_tensors="pt", padding=True,
truncation=True, max_length=256).to(model.device)
with torch.no_grad():
outputs = model(**inputs, output_hidden_states=True)
# NER predictions for all docs: [batch, seq_len]
ner_preds = outputs.logits.argmax(dim=-1)
# Doc class for all docs: [batch]
hidden = outputs.hidden_states[-1]
mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)
doc_preds = doc_head(pooled).argmax(dim=-1)
```
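The resulting ids map back to names with `model.config.id2label` for NER and, for the doc head, the `categories` list from the single-document example above:
```python
doc_labels = [categories[i] for i in doc_preds.tolist()]
```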
## Example Outputs
| Input | PII Detected | Category (confidence) |
|-------|-------------|----------------------|
| "My name is John Smith... email john@example.com" | ✅ John Smith, john@example.com, 123 Main St | Computers & Internet (56%) |
| "Liverpool FC defeated Manchester City 3-1" | ❌ None | **Sports (98%)** |
| "Federal Reserve announced a rate cut" | ❌ None | **Politics (52%)** |
| "health benefits of meditation and yoga" | ❌ None | **Health (38%)** |
| "Patient Jane Doe (SSN: 123-45-6789)" | ✅ Jane Doe, 123-45-6789, jane.doe@hospital.com | Education (41%) |
| "learn programming? I want to learn Python" | ❌ None | **Education (53%)** |
| "legal to record phone calls in California?" | ❌ None | **Politics (64%)** |
## Files
| File | Size | Description |
|------|------|-------------|
| `model.safetensors` | 2.6 GB | Backbone + NER head (1.4B MoE params) |
| `doc_head.pt` | 26 KB | Document classification head (640→10) |
| `config.json` | 3 KB | Model architecture config |
| `tokenizer.json` | 27 MB | BPE tokenizer (o200k_base) |
| `multitask_config.json` | 349 B | Multi-task metadata |