yasserrmd committed on
Commit 64c0fab · verified · 1 Parent(s): 41149c8

Update README.md

Files changed (1): README.md +119 -46
README.md CHANGED
@@ -1,78 +1,151 @@
 
 
 
  ---
  license: apache-2.0
  base_model: openai/privacy-filter
  tags:
  - token-classification
  - pii-detection
  - onnx
- - browser
  - privacy
- - transformers.js
  library_name: transformers
  pipeline_tag: token-classification
  ---

- # Privacy Filter - ONNX (FP16)

- FP16 ONNX export of [openai/privacy-filter](https://huggingface.co/openai/privacy-filter)
- for in-browser inference via onnxruntime-web. Detects 8 categories of personally
- identifiable information (PII) and returns BIOES token labels.

- ## Files

- - `onnx/model_fp16.onnx` - graph
- - `onnx/model_fp16.onnx.data` - weights (external data, ~2.6 GB)
- - `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json` - tokenizer
- - `config.json` - model config with the 33 BIOES label taxonomy
- - `viterbi_calibration.json` - default operating-point biases for the Viterbi decoder

- ## Label taxonomy (33 classes)

- Background class `O` plus BIOES tags (`B-`, `I-`, `E-`, `S-`) for 8 span categories:

- - `account_number`
- - `private_address`
- - `private_date`
- - `private_email`
- - `private_person`
- - `private_phone`
- - `private_url`
- - `secret`

- ## Usage (browser, onnxruntime-web)

- ```javascript
- import * as ort from 'onnxruntime-web';
-
- const session = await ort.InferenceSession.create(
-   'https://huggingface.co/YOUR_REPO/resolve/main/onnx/model_fp16.onnx',
-   { executionProviders: ['webgpu', 'wasm'] }
- );
-
- // Tokenize with @huggingface/tokenizers using tokenizer.json from this repo.
- // Feed int64 input_ids and attention_mask. Output is logits [batch, seq, 33].
- // Decode with a constrained BIOES Viterbi pass using viterbi_calibration.json.
  ```

- Full browser runner (tokenizer + ONNX + Viterbi decoder in JS) is in the
- conversion project's `web/` folder.

  ## Export notes

- - Exported with `torch.onnx.export(dynamo=True)` from `transformers>=5.6.0.dev0`
- - MoE blocks (128 experts top-4) rewritten to a dense-weighted-sum form for
-   ONNX compatibility while preserving reference math
- - FP16 precision (original is BF16). Keeps int64 inputs/outputs
- - Dynamic axes on batch and sequence length. Practical browser range: 256-4096
-   tokens depending on memory
- - Parity vs reference PyTorch: 100% argmax agreement on seed prompts

  ## License

- Apache 2.0, same as the base model.

- ## Acknowledgements

- Base model by OpenAI. See the
- [original model card](https://huggingface.co/openai/privacy-filter) for
- training details, intended use, and limitations.
  ---
  license: apache-2.0
  base_model: openai/privacy-filter
  tags:
  - token-classification
  - pii-detection
+ - pii-masking
  - onnx
+ - onnxruntime
  - privacy
  library_name: transformers
  pipeline_tag: token-classification
+ language:
+ - en
  ---

+ # Privacy Filter (ONNX, FP16)
+
+ FP16 ONNX export of [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter) for efficient inference with ONNX Runtime. The model detects eight categories of personally identifiable information (PII) in text and returns BIOES-tagged token spans.
+
+ The exported graph has dynamic batch and sequence dimensions, and has been validated against the original PyTorch implementation with 100% argmax agreement on reference prompts.
+
+ ## Model details
+
+ | | |
+ |---|---|
+ | Base model | `openai/privacy-filter` |
+ | Parameters | 1.5 B total, 50 M active (128-expert top-4 MoE) |
+ | Precision | FP16 weights, FP32 router |
+ | Context length | Up to 128k tokens (dynamic) |
+ | Label set | 33 classes (`O` + BIOES × 8 categories) |
+ | License | Apache 2.0 |
+
+ ### Detected categories
+
+ `account_number`, `private_address`, `private_date`, `private_email`, `private_person`, `private_phone`, `private_url`, `secret`

+ ### Repository contents
+
+ ```
+ config.json               Model config including the 33-class id2label map
+ tokenizer.json            o200k tokenizer (tiktoken-compatible)
+ tokenizer_config.json
+ special_tokens_map.json
+ viterbi_calibration.json  Default operating-point biases for Viterbi decoding
+ onnx/
+   model_fp16.onnx         Graph
+   model_fp16.onnx.data    Weights (external data, ~2.6 GB)
+ ```
+
+ ## Installation
+
+ ```bash
+ pip install onnxruntime transformers tiktoken numpy huggingface_hub
+ ```

+ For GPU inference, substitute `onnxruntime-gpu` for `onnxruntime`.

+ ## Usage

+ ### Minimal example

+ ```python
+ from huggingface_hub import snapshot_download
+ from transformers import AutoTokenizer
+ import onnxruntime as ort
+ import numpy as np
+ import json
+
+ repo = "yasserrmd/privacy-filter-ONNX"
+ local = snapshot_download(repo)
+
+ tokenizer = AutoTokenizer.from_pretrained(local)
+ session = ort.InferenceSession(
+     f"{local}/onnx/model_fp16.onnx",
+     providers=["CPUExecutionProvider"],  # or ["CUDAExecutionProvider"] for GPU
+ )
+
+ with open(f"{local}/config.json") as f:
+     id2label = {int(k): v for k, v in json.load(f)["id2label"].items()}
+
+ text = "Hi, I'm Alice Smith, email alice@example.com."
+ enc = tokenizer(text, return_tensors="np", add_special_tokens=False)
+ logits = session.run(None, {
+     "input_ids": enc["input_ids"].astype(np.int64),
+     "attention_mask": enc["attention_mask"].astype(np.int64),
+ })[0]
+
+ labels = [id2label[int(i)] for i in logits[0].argmax(-1)]
+ tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
+ for tok, lbl in zip(tokens, labels):
+     if lbl != "O":
+         print(f"{tok:<20} {lbl}")
  ```

+ ### Complete usage with span decoding
+
+ The raw model output is per-token logits over the 33 BIOES classes. For coherent spans, decode the logits with a constrained Viterbi pass using the biases in `viterbi_calibration.json`. A reference implementation is included in `examples/detect.py` in the export project; the essential steps are:
+
+ 1. Tokenize the input with `return_offsets_mapping=True` to recover character positions.
+ 2. Run the ONNX session to obtain logits of shape `[1, seq_len, 33]`.
+ 3. Run Viterbi decoding over the 33 labels with legal BIOES transitions.
+ 4. Group the resulting label sequence into spans and map token indices back to character spans via the offsets.
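Step 4 can be sketched as a small helper (a minimal illustration, not the reference implementation; the `labels` and `offsets` inputs are assumed to come from steps 1–3, and malformed tag sequences are simply dropped):

```python
def labels_to_spans(labels, offsets):
    """Group a BIOES label sequence into (category, char_start, char_end) spans."""
    spans, start = [], None
    for (s, e), lbl in zip(offsets, labels):
        prefix, _, cat = lbl.partition("-")
        if prefix == "S":                          # single-token span
            spans.append((cat, s, e))
        elif prefix == "B":                        # span opens
            start = s
        elif prefix == "I":                        # span continues
            pass
        elif prefix == "E" and start is not None:  # span closes
            spans.append((cat, start, e))
            start = None
        else:                                      # "O" or an unmatched E-tag
            start = None
    return spans
```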
107
+
108
+ The `viterbi_calibration.json` file holds six transition-bias parameters under `operating_points.default.biases` that control the precision/recall trade-off. The defaults in this file are zeroed and match the reference implementation's `default` operating point.
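The constrained Viterbi pass itself can be sketched in numpy. This illustrates the technique only, not the reference decoder: the label order below is an assumption (real code should take it from `id2label` in `config.json`), and the calibration biases would be added to the transition scores before decoding.

```python
import numpy as np

# BIOES label set: "O" plus B-/I-/E-/S- per category.  NOTE: assumed order;
# the actual column order must come from id2label in config.json.
CATEGORIES = ["account_number", "private_address", "private_date", "private_email",
              "private_person", "private_phone", "private_url", "secret"]
LABELS = ["O"] + [f"{p}-{c}" for c in CATEGORIES for p in "BIES"]

def legal(prev: str, cur: str) -> bool:
    """Legal BIOES transitions: B-x/I-x must continue as I-x or E-x of the
    same category; O, E-x and S-x may be followed by O, B-y or S-y."""
    pp, _, pc = prev.partition("-")
    cp, _, cc = cur.partition("-")
    if pp in ("B", "I"):
        return cp in ("I", "E") and cc == pc
    return cp in ("O", "B", "S")

def viterbi(logits):
    """Best-scoring legal path over per-token logits of shape [seq_len, 33]."""
    trans = np.array([[0.0 if legal(p, c) else -np.inf for c in LABELS]
                      for p in LABELS])
    seq_len, n = logits.shape
    score = logits[0].astype(np.float64).copy()
    score[[i for i, l in enumerate(LABELS) if l[0] in "IE"]] = -np.inf  # legal start
    back = np.zeros((seq_len, n), dtype=int)
    for t in range(1, seq_len):
        cand = score[:, None] + trans            # [prev, cur]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + logits[t]
    score[[i for i, l in enumerate(LABELS) if l[0] in "BI"]] = -np.inf  # legal end
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [LABELS[i] for i in reversed(path)]
```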
+
+ ### Input and output shapes
+
+ | Tensor | Shape | Dtype |
+ |---|---|---|
+ | `input_ids` (input) | `[batch, sequence]` | `int64` |
+ | `attention_mask` (input) | `[batch, sequence]` | `int64` |
+ | `logits` (output) | `[batch, sequence, 33]` | `float32` |
+
+ Both `batch` and `sequence` are dynamic at runtime.
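The dtype requirement matters in practice, since ONNX Runtime rejects feeds whose dtype does not match the graph. A small helper (a sketch; the tensor names are taken from the table above) makes the int64 cast explicit:

```python
import numpy as np

def make_feeds(input_ids, attention_mask):
    """Build the ONNX feed dict with the int64 dtype the graph expects."""
    feeds = {
        "input_ids": np.asarray(input_ids, dtype=np.int64),
        "attention_mask": np.asarray(attention_mask, dtype=np.int64),
    }
    # Both tensors must share the same [batch, sequence] shape.
    assert feeds["input_ids"].shape == feeds["attention_mask"].shape
    return feeds
```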

  ## Export notes

+ - Exported with `torch.onnx.export(dynamo=True)` from `transformers>=5.6.0.dev0` and `torch>=2.6`.
+ - The 128-expert top-4 MoE blocks in each decoder layer were rewritten to a dense weighted-sum form to produce an ONNX-traceable graph while preserving the reference arithmetic, including the clamped-SwiGLU activation (`alpha=1.702`, `limit=7.0`) and the post-experts scaling.
+ - The router linear layer remains in FP32 for numerical stability; all other weights are FP16.
+ - Parity validated against the PyTorch reference: maximum logit difference on the order of 1e-4, with 100% argmax agreement across the standard evaluation prompts.
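The dense weighted-sum rewrite in the second bullet can be sketched in numpy (an illustration of the general technique under assumed shapes, not the exported graph): every expert runs on every token, and routing weights outside the top-k are forced to zero, which avoids the gather/scatter ops that complicate ONNX tracing.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dense_topk_moe(x, w_router, experts, k=4):
    """Top-k MoE as a dense weighted sum over ALL experts.

    Non-selected experts receive weight 0 (their router logit is masked to
    -inf before the softmax), so the result matches a gather-based top-k
    router while keeping the graph trace-friendly."""
    router_logits = x @ w_router                       # [tokens, n_experts]
    kth = np.sort(router_logits, axis=-1)[:, -k][:, None]
    masked = np.where(router_logits >= kth, router_logits, -np.inf)
    weights = softmax(masked)                          # zero outside the top-k
    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        out += weights[:, e:e + 1] * expert(x)         # dense sum, no gather
    return out
```

The trade-off is extra FLOPs (all 128 experts run on every token) in exchange for an exportable graph; per the note above, the reference arithmetic is unchanged.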

+ ## Intended use and limitations
+
+ This export preserves the behavior of the base model. Its intended use, evaluation results, and limitations are documented in the [base model card](https://huggingface.co/openai/privacy-filter) and the accompanying [OpenAI Privacy Filter Model Card (PDF)](https://cdn.openai.com/pdf/c66281ed-b638-456a-8ce1-97e9f5264a90/OpenAI-Privacy-Filter-Model-Card.pdf). In brief:
+
+ - Optimized primarily for English; multilingual performance varies.
+ - Model-based redaction is a data-minimization aid, not an anonymization guarantee or compliance certification.
+ - For high-sensitivity domains (medical, legal, financial, government), pair with human review and organization-specific policies.

  ## License

+ Apache 2.0, inherited from the base model.

+ ## Citation

+ If you use this export, please cite the base model:
+
+ ```
+ @misc{openai_privacy_filter_2026,
+   title        = {OpenAI Privacy Filter},
+   author       = {OpenAI},
+   year         = {2026},
+   howpublished = {\url{https://huggingface.co/openai/privacy-filter}},
+   note         = {Apache-2.0},
+ }
+ ```