---
license: apache-2.0
pipeline_tag: token-classification
library_name: transformers
tags:
- transformers.js
---
# OpenAI Privacy Filter
OpenAI Privacy Filter is a bidirectional token-classification model for detecting and masking personally identifiable information (PII) in text. It is intended for high-throughput data sanitization workflows where teams need a fast, context-aware, and tunable model they can run on-premises.
OpenAI Privacy Filter is pretrained autoregressively to arrive at a checkpoint with an architecture similar to gpt-oss, albeit smaller. We then converted that checkpoint into a bidirectional token classifier over a privacy label taxonomy and post-trained it with a supervised classification loss. (For architecture details about gpt-oss, please see the gpt-oss model card.) Instead of generating text token-by-token, this model labels an input sequence in a single forward pass, then decodes coherent spans with a constrained Viterbi procedure. For each input token, the model predicts a probability distribution over the label taxonomy, which consists of the 8 output categories described below.
Highlights:
- Permissive Apache 2.0 license: ideal for experimentation, customization, and commercial deployment.
- Small size: runs in a web browser or on a laptop, with 1.5B total parameters and 50M active parameters.
- Fine-tunable: adapt the model to specific data distributions through easy, data-efficient fine-tuning.
- Long-context: a 128,000-token context window enables processing long text with high throughput and no chunking.
- Runtime control: configure precision/recall tradeoffs and detected span lengths through preset operating points (see the sketch below).
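The preset operating points themselves are not documented in this card; as a rough, hypothetical illustration of the precision/recall lever, one simple post-hoc control is a confidence cutoff over aggregated spans (raising the cutoff favors precision, lowering it favors recall):
```py
from transformers import pipeline

classifier = pipeline(task="token-classification", model="openai/privacy-filter")

# Hypothetical post-hoc operating point: keep only spans whose confidence
# clears a threshold. Raise it for precision, lower it for recall.
SCORE_THRESHOLD = 0.9

spans = classifier("My name is Alice Smith", aggregation_strategy="simple")
confident_spans = [span for span in spans if span["score"] >= SCORE_THRESHOLD]
```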
## Usage
### Transformers
1. Using the `pipeline` API:
```py
from transformers import pipeline
classifier = pipeline(
    task="token-classification",
    model="openai/privacy-filter",
)
classifier("My name is Alice Smith")
```
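Because the model is intended for masking as well as detection, here is a minimal sketch of redacting detected spans, assuming `aggregation_strategy="simple"` output with character-level `start`/`end` offsets (available with a fast tokenizer):
```py
from transformers import pipeline

classifier = pipeline(task="token-classification", model="openai/privacy-filter")

text = "My name is Alice Smith"
spans = classifier(text, aggregation_strategy="simple")

# Replace each detected span with its label, right to left so earlier offsets stay valid.
masked = text
for span in sorted(spans, key=lambda s: s["start"], reverse=True):
    masked = masked[: span["start"]] + f"[{span['entity_group']}]" + masked[span["end"] :]

print(masked)  # e.g. "My name is [private_person]"
```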
2. Using the `AutoModelForTokenClassification` class directly:
```py
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/privacy-filter")
model = AutoModelForTokenClassification.from_pretrained("openai/privacy-filter", device_map="auto")

inputs = tokenizer("My name is Alice Smith", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs)

predicted_token_class_ids = outputs.logits.argmax(dim=-1)
predicted_token_classes = [model.config.id2label[token_id.item()] for token_id in predicted_token_class_ids[0]]
print(predicted_token_classes)
```
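The raw per-token labels follow the BIOES scheme described under Model Details below; a minimal sketch of greedily grouping them into character spans with the tokenizer's offset mapping (requires a fast tokenizer; the hosted pipeline performs a more careful constrained decode):
```py
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/privacy-filter")
model = AutoModelForTokenClassification.from_pretrained("openai/privacy-filter")

text = "My name is Alice Smith"
inputs = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
offsets = inputs.pop("offset_mapping")[0].tolist()

with torch.no_grad():
    label_ids = model(**inputs).logits.argmax(dim=-1)[0].tolist()

# Greedy BIOES grouping: a span opens at B-/S- and closes at E-/S-.
spans, current = [], None
for (start, end), label_id in zip(offsets, label_ids):
    tag = model.config.id2label[label_id]
    if tag == "O":
        current = None
        continue
    kind, category = tag[0], tag[2:]
    if kind in "BS":
        current = {"label": category, "start": start, "end": end}
    elif current is not None:  # I- or E- extends the open span
        current["end"] = end
    if kind in "SE" and current is not None:
        spans.append({**current, "text": text[current["start"] : current["end"]]})
        current = None

print(spans)  # e.g. [{'label': 'private_person', ..., 'text': 'Alice Smith'}]
```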
### Transformers.js
1. Using the `pipeline` API:
```js
import { pipeline } from "@huggingface/transformers";
const classifier = await pipeline(
  "token-classification",
  "openai/privacy-filter",
  { device: "webgpu", dtype: "q4" },
);

const input = "My name is Harry Potter and my email is harry.potter@hogwarts.edu.";
const output = await classifier(input, { aggregation_strategy: "simple" });
console.dir(output, { depth: null });
```
Example output:
```js
[
  {
    entity_group: 'private_person',
    score: 0.9999957978725433,
    word: ' Harry Potter'
  },
  {
    entity_group: 'private_email',
    score: 0.9999990728166368,
    word: ' harry.potter@hogwarts.edu'
  }
]
```
## Model Details
### Model Description
Privacy Filter is a bidirectional token-classification model with span decoding. It is trained in phases, beginning with autoregressive pretraining. The pretrained language model is then modified and post-trained as a bidirectional banded-attention token classifier with band size 128: each token attends to 128 tokens on either side plus itself, for an effective attention window of 257 tokens. This means:
* The base model is an autoregressive pretrained checkpoint.
* The language-model output head is replaced with a token-classification head over privacy labels.
* Post-training is supervised token-level classification rather than next-token prediction.
* Inference applies constrained sequence decoding to produce coherent BIOES (Begin, Inside, Outside, End, Single) span labels, as sketched below.
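The exact decoding constraints are not spelled out in this card; the following is a minimal Viterbi sketch under the standard BIOES validity rules (e.g. `I-x`/`E-x` may only follow `B-x`/`I-x` of the same category), where `labels` would come from `model.config.id2label` and `log_probs` from log-softmaxed logits:
```py
import numpy as np

def bioes_viterbi(log_probs: np.ndarray, labels: list[str]) -> list[str]:
    """Decode the best label sequence whose tags form valid BIOES spans.

    log_probs: (seq_len, num_labels) per-token log-probabilities.
    labels:    tag names such as "O", "B-private_person", "I-private_person", ...
    """
    def allowed(prev: str, nxt: str) -> bool:
        # After O, E-x, or S-x, a new span may open (B-/S-) or stay outside (O);
        # after B-x or I-x, only I-x or E-x of the same category may follow.
        if prev == "O" or prev[0] in "ES":
            return nxt == "O" or nxt[0] in "BS"
        return nxt[0] in "IE" and nxt[2:] == prev[2:]

    trans = np.where([[allowed(p, q) for q in labels] for p in labels], 0.0, -np.inf)

    # The first tag must be O or open a span (no bare I-/E- at the start).
    score = log_probs[0] + [0.0 if l == "O" or l[0] in "BS" else -np.inf for l in labels]
    backptr = np.zeros((len(log_probs), len(labels)), dtype=int)
    for t in range(1, len(log_probs)):
        cand = score[:, None] + trans  # (prev_label, next_label)
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_probs[t]

    # The last tag must close any open span (O, E-, or S-).
    score += [0.0 if l == "O" or l[0] in "ES" else -np.inf for l in labels]

    path = [int(score.argmax())]
    for t in range(len(log_probs) - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return [labels[i] for i in reversed(path)]
```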
Architecturally, the implementation in this repo is a pre-norm transformer encoder-style stack with:
* token embeddings
* 8 repeated transformer blocks
* grouped-query attention with rotary positional embeddings, using 14 query heads and 2 KV heads (a group of 7 query heads per KV head)
* sparse mixture-of-experts feed-forward blocks with 128 experts in total and top-4 routing per token (see the routing sketch after this list)
* a final token-classification head over privacy labels (rather than natural language vocabulary tokens), with residual stream width `d_model = 640`.
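As a concrete illustration of the routing described above, here is a minimal PyTorch sketch of top-4-of-128 expert selection; the module and expert hidden width are hypothetical, with only `d_model = 640`, 128 experts, and top-4 routing taken from this card:
```py
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sparse MoE feed-forward: each token is processed by its top-k experts only."""
    def __init__(self, d_model: int = 640, num_experts: int = 128, top_k: int = 4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        # Hypothetical expert shape; the real hidden width is not given in this card.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (num_tokens, d_model)
        weights, indices = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for token, expert_id in enumerate(indices[:, slot].tolist()):
                out[token] += weights[token, slot] * self.experts[expert_id](x[token])
        return out
```
Only the 4 selected experts run per token, which is consistent with the 50M active out of 1.5B total parameters noted in the highlights.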
Relative to iterative autoregressive approaches, this design allows all tokens to be labeled in one pass, which improves throughput. Relative to classical masked-language-model pretraining approaches, this is a post-training conversion of an autoregressive model rather than a native masked-LM setup.
### Output Shape
Privacy Filter can detect 8 privacy span categories:
1. `account_number`
2. `private_address`
3. `private_email`
4. `private_person`
5. `private_phone`
6. `private_url`
7. `private_date`
8. `secret`
To perform token classification, each non-background span category is expanded into boundary-tagged token classes: `B-`, `I-`, `E-`, and `S-` variants (for example, `B-private_person` through `S-private_person`), plus a single shared `O` (outside) background class, giving 8 × 4 + 1 = 33 token classes in total.
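A small sketch of constructing that label set from the categories above (ordering is illustrative; the authoritative mapping is `model.config.id2label`):
```py
CATEGORIES = [
    "account_number", "private_address", "private_email", "private_person",
    "private_phone", "private_url", "private_date", "secret",
]

# One shared background class plus B/I/E/S variants per category: 1 + 8 * 4 = 33.
LABELS = ["O"] + [f"{tag}-{cat}" for cat in CATEGORIES for tag in ("B", "I", "E", "S")]
assert len(LABELS) == 33
```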