| title: OpenAI Privacy Filter | |
| emoji: 🛡️ | |
| colorFrom: gray | |
| colorTo: gray | |
| sdk: gradio | |
| sdk_version: 6.12.0 | |
| python_version: '3.12' | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| short_description: OpenAI Privacy Filter ZeroGPU demo | |
| # OpenAI Privacy Filter | |
| OpenAI Privacy Filter is a bidirectional token-classification model for detecting and masking personally identifiable information (PII) in text. It is intended for high-throughput data sanitization workflows where teams need a fast, context-aware, tunable model that they can run on-premises. | |
| We first pretrained OpenAI Privacy Filter autoregressively to arrive at a checkpoint with an architecture similar to gpt-oss, albeit smaller. We then converted that checkpoint into a bidirectional token classifier over a privacy label taxonomy and post-trained it with a supervised classification loss. (For architecture details about gpt-oss, please see the gpt-oss model card.) Instead of generating text token by token, the model labels an input sequence in a single forward pass, then decodes coherent spans with a constrained Viterbi procedure. For each input token, the model predicts a probability distribution over the label taxonomy, which consists of the 8 output categories described below. | |
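To make the decoding step concrete, here is a minimal sketch of constrained Viterbi decoding over per-token label probabilities. It is an illustration under stated assumptions, not the model's actual decoder: the source does not specify the tagging scheme or transition rules, so the sketch assumes a BIO scheme, and the label names, the `allowed` rule, and the example probabilities are all hypothetical.

```python
import math

# Hypothetical BIO label set for illustration; not the model's actual taxonomy.
LABELS = ["O", "B-NAME", "I-NAME", "B-EMAIL", "I-EMAIL"]


def allowed(prev, cur):
    """Assumed BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True


def constrained_viterbi(log_probs):
    """Return the highest-scoring label path that obeys the constraint.

    log_probs: one list per token, holding a log-probability per label.
    """
    if not log_probs:
        return []
    n = len(LABELS)
    # A sequence may not open with an I- label.
    score = [
        -math.inf if LABELS[j].startswith("I-") else lp
        for j, lp in enumerate(log_probs[0])
    ]
    backptrs = []
    for row in log_probs[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Pick the best predecessor among those the constraint permits.
            best_i = max(
                (i for i in range(n) if allowed(LABELS[i], LABELS[j])),
                key=lambda i: score[i],
            )
            new_score.append(score[best_i] + row[j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final label.
    j = max(range(n), key=lambda k: score[k])
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    return [LABELS[k] for k in reversed(path)]


# Made-up per-token probabilities, ordered as in LABELS.
probs = [
    [0.90, 0.04, 0.03, 0.02, 0.01],
    [0.08, 0.30, 0.60, 0.01, 0.01],
    [0.10, 0.05, 0.80, 0.03, 0.02],
]
log_probs = [[math.log(p) for p in row] for row in probs]

# Per-token argmax ignores the constraint and can yield an invalid sequence.
greedy = [LABELS[max(range(len(row)), key=lambda k: row[k])] for row in probs]
decoded = constrained_viterbi(log_probs)
print(greedy)   # ['O', 'I-NAME', 'I-NAME']
print(decoded)  # ['O', 'B-NAME', 'I-NAME']
```

In this toy input, per-token argmax opens a span with `I-NAME`, which the assumed scheme forbids. The constrained decoder instead picks the best valid path, so the span starts with `B-NAME`.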
| Highlights: | |
| - Permissive Apache 2.0 license: ideal for experimentation, customization, and commercial deployment. | |
| - Small size: runs in a web browser or on a laptop, with 1.5B total parameters and 50M active parameters. | |
| - Fine-tunable: adapt the model to specific data distributions through straightforward, data-efficient fine-tuning. | |
| - Long context: a 128,000-token context window enables processing long text with high throughput and no chunking. | |
| - Runtime control: configure precision/recall tradeoffs and detected span lengths through preset operating points. | |
| ## Metadata | |
| - Developed by: OpenAI | |
| - Funded by: OpenAI | |
| - Shared by: OpenAI | |
| - Model type: Bidirectional token classification model for privacy span detection | |
| - Language(s): Primarily English; selected multilingual robustness evaluation reported | |
| - License: [Apache 2.0](LICENSE) | |
| - Source repository: https://github.com/openai/privacy-filter | |
| - Model weights: https://huggingface.co/openai/privacy-filter | |
| - Model card: [OpenAI Privacy Filter Model Card](https://cdn.openai.com/pdf/c66281ed-b638-456a-8ce1-97e9f5264a90/OpenAI-Privacy-Filter-Model-Card.pdf) | |