
Mihai Maruseac

ZeroGPU Gradio demo for OpenAI Privacy Filter

1d92498 unverified 16 days ago

metadata

title: OpenAI Privacy Filter
emoji: 🛡️
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 6.12.0
python_version: '3.12'
app_file: app.py
pinned: false
license: apache-2.0
short_description: OpenAI Privacy Filter ZeroGPU demo

OpenAI Privacy Filter

OpenAI Privacy Filter is a bidirectional token-classification model for detecting and masking personally identifiable information (PII) in text. It is intended for high-throughput data sanitization workflows where teams need a fast, context-aware, tunable model that they can run on-premises.
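As a rough illustration of the masking step, the sketch below replaces detected character spans with bracketed placeholders. It assumes the detector has already produced spans. The `Span` type, the `mask_spans` helper, and the `PERSON` and `EMAIL` label names are hypothetical and are not taken from the model's actual taxonomy or API.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical category name, not the model's real taxonomy


def mask_spans(text: str, spans: list[Span]) -> str:
    """Replace each span with a "[LABEL]" placeholder, left to right."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            # Skip a span that overlaps one already masked.
            continue
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Contact Jane Doe at jane@example.com."
spans = [Span(8, 16, "PERSON"), Span(20, 36, "EMAIL")]
masked = mask_spans(text, spans)
print(masked)  # Contact [PERSON] at [EMAIL].
```

Working on character offsets rather than tokens keeps the masking step independent of the tokenizer, at the cost of requiring the detector to map token spans back to offsets.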

OpenAI Privacy Filter was first pretrained autoregressively, producing a checkpoint with an architecture similar to gpt-oss but smaller. We then converted that checkpoint into a bidirectional token classifier over a privacy label taxonomy and post-trained it with a supervised classification loss. (For architecture details, see the gpt-oss model card.) Instead of generating text token by token, the model labels an input sequence in a single forward pass, then decodes coherent spans with a constrained Viterbi procedure. For each input token, it predicts a probability distribution over the label taxonomy, which consists of 8 output categories described below.
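To make the decoding step concrete, here is a minimal sketch of constrained Viterbi decoding over per-token label probabilities. It assumes a BIO tagging scheme with hard transition constraints (an `I-X` tag may only follow `B-X` or `I-X`). The README does not state the tagging scheme, the label names, or whether the real decoder also uses transition scores, so all of those are assumptions made for illustration.

```python
import math

# Hypothetical label set; the real taxonomy is not reproduced here.
LABELS = ["O", "B-NAME", "I-NAME", "B-EMAIL", "I-EMAIL"]


def allowed(prev: str, curr: str) -> bool:
    """Assumed BIO constraint: I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        kind = curr[2:]
        return prev in (f"B-{kind}", f"I-{kind}")
    return True


def constrained_viterbi(log_probs: list[list[float]]) -> list[str]:
    """Return the highest-scoring label sequence that obeys `allowed`."""
    if not log_probs:
        return []
    n = len(LABELS)
    # A sequence may not begin inside a span.
    score = [
        -math.inf if LABELS[j].startswith("I-") else lp
        for j, lp in enumerate(log_probs[0])
    ]
    backpointers = []
    for row in log_probs[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i, best = 0, -math.inf
            for i in range(n):
                if allowed(LABELS[i], LABELS[j]) and score[i] > best:
                    best_i, best = i, score[i]
            new_score.append(best + row[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path back from the final token.
    j = max(range(n), key=lambda k: score[k])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    return [LABELS[k] for k in reversed(path)]


probs = [
    [0.55, 0.40, 0.03, 0.01, 0.01],
    [0.10, 0.05, 0.80, 0.03, 0.02],
    [0.90, 0.03, 0.03, 0.02, 0.02],
]
log_probs = [[math.log(p) for p in row] for row in probs]

greedy = [LABELS[max(range(len(row)), key=row.__getitem__)] for row in probs]
decoded = constrained_viterbi(log_probs)
print(greedy)   # ['O', 'I-NAME', 'O']  (invalid: I-NAME follows O)
print(decoded)  # ['B-NAME', 'I-NAME', 'O']
```

In this toy input, per-token argmax yields a span that starts with `I-NAME`, which is not a valid sequence under the assumed constraints. The constrained decode instead picks the best valid path, trading a slightly less likely first label for a coherent span.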

Highlights:

Permissive Apache 2.0 license: Ideal for experimentation, customization, and commercial deployment.
Small size: Runs in a web browser or on a laptop, with 1.5B total parameters and 50M active parameters.
Fine-tunable: Adapt the model to specific data distributions through easy, data-efficient fine-tuning.
Long context: A 128,000-token context window enables processing long text with high throughput and no chunking.
Runtime control: Configure precision/recall tradeoffs and detected span lengths through preset operating points.
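The README does not describe how the preset operating points are implemented. One common way to expose a precision/recall tradeoff in a token classifier is to add a bias to the background ("no PII") label's score before decoding, and the sketch below illustrates that idea only. The preset names, bias values, and the `apply_operating_point` helper are hypothetical and should not be read as the model's actual configuration.

```python
import math

# Hypothetical presets: a positive bias favors the background label
# (fewer detections, higher precision); a negative bias favors PII labels.
OPERATING_POINTS = {"high_recall": -1.0, "balanced": 0.0, "high_precision": 1.0}


def apply_operating_point(
    log_probs: list[list[float]], background_index: int, preset: str
) -> list[list[float]]:
    """Shift the background label's log-score by the preset's bias."""
    bias = OPERATING_POINTS[preset]
    return [
        [lp + bias if j == background_index else lp for j, lp in enumerate(row)]
        for row in log_probs
    ]


def argmax_labels(scores: list[list[float]]) -> list[int]:
    """Pick the highest-scoring label index for each token."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Two tokens, two labels: index 0 is background, index 1 is a PII label.
log_probs = [
    [math.log(0.55), math.log(0.45)],
    [math.log(0.35), math.log(0.65)],
]

results = {
    preset: argmax_labels(apply_operating_point(log_probs, 0, preset))
    for preset in OPERATING_POINTS
}
print(results)
```

With these toy scores, the recall-oriented preset flags both tokens, the balanced preset flags only the second, and the precision-oriented preset flags neither.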

Metadata

Developed by: OpenAI
Funded by: OpenAI
Shared by: OpenAI
Model type: Bidirectional token classification model for privacy span detection
Language(s): Primarily English; selected multilingual robustness evaluation reported
License: Apache 2.0
Source repository: https://github.com/openai/privacy-filter
Model weights: https://huggingface.co/openai/privacy-filter
Model card: OpenAI Privacy Filter Model Card