---
license: apache-2.0
base_model: OpenMed/privacy-filter-nemotron
datasets:
- nvidia/Nemotron-PII
pipeline_tag: token-classification
library_name: openmed
tags:
- openmed
- mlx
- apple-silicon
- token-classification
- pii
- de-identification
- medical
- clinical
- privacy-filter
- nemotron
language:
- en
---
# OpenMed Privacy Filter (Nemotron): MLX BF16
A native [MLX](https://github.com/ml-explore/mlx) port of
[`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron)
for fast, on-device PII detection on Apple Silicon. This BF16 artifact
preserves the full source precision; for a smaller / faster sibling, see
[`OpenMed/privacy-filter-nemotron-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit).
> **Family at a glance.** Same architecture and training data, three runtimes:
> - **PyTorch**: [`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron), CPU + CUDA.
> - **MLX BF16 (this repo)**: Apple Silicon, full precision (~2.6 GB).
> - **MLX 8-bit**: [`OpenMed/privacy-filter-nemotron-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit), Apple Silicon, ~1.4 GB, ~1.7× faster than BF16.
## What it does
The model is a token classifier built on OpenAI's open Privacy Filter
architecture (the same `openai_privacy_filter` model type used by
[`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)).
It tags each token with a BIOES label across **55 PII span classes**, then
a Viterbi pass over the BIOES grammar yields clean entity spans. Detected
categories include:
- Personal identifiers: `first_name`, `last_name`, `user_name`, `gender`, `age`, `date_of_birth`
- Contact: `email`, `phone_number`, `fax_number`, `street_address`, `city`, `state`, `country`, `county`, `postcode`, `coordinate`
- Government / legal IDs: `ssn`, `national_id`, `tax_id`, `certificate_license_number`
- Financial: `account_number`, `bank_routing_number`, `credit_debit_card`, `cvv`, `pin`, `swift_bic`
- Medical: `medical_record_number`, `health_plan_beneficiary_number`, `blood_type`
- Workplace: `company_name`, `occupation`, `employee_id`, `customer_id`, `employment_status`, `education_level`
- Online: `url`, `ipv4`, `ipv6`, `mac_address`, `http_cookie`, `api_key`, `password`, `device_identifier`
- Demographic: `race_ethnicity`, `religious_belief`, `political_view`, `sexuality`, `language`
- Vehicles: `license_plate`, `vehicle_identifier`
- Time: `date`, `date_time`, `time`
- Misc: `biometric_identifier`, `unique_id`
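
The 55 classes above expand into the 221-label BIOES output space. The sketch below is a minimal illustration that assumes `O` comes first, followed by `B-`/`I-`/`E-`/`S-` for each class in list order. The authoritative ID order is whatever `id2label.json` in this repo says, so `build_bioes_labels` should be read as a hypothetical helper.

```python
# Hypothetical helper: the real label-ID order is defined by id2label.json.
SPAN_CLASSES = [
    # Personal identifiers
    "first_name", "last_name", "user_name", "gender", "age", "date_of_birth",
    # Contact
    "email", "phone_number", "fax_number", "street_address", "city", "state",
    "country", "county", "postcode", "coordinate",
    # Government / legal IDs
    "ssn", "national_id", "tax_id", "certificate_license_number",
    # Financial
    "account_number", "bank_routing_number", "credit_debit_card", "cvv", "pin", "swift_bic",
    # Medical
    "medical_record_number", "health_plan_beneficiary_number", "blood_type",
    # Workplace
    "company_name", "occupation", "employee_id", "customer_id",
    "employment_status", "education_level",
    # Online
    "url", "ipv4", "ipv6", "mac_address", "http_cookie", "api_key", "password",
    "device_identifier",
    # Demographic
    "race_ethnicity", "religious_belief", "political_view", "sexuality", "language",
    # Vehicles
    "license_plate", "vehicle_identifier",
    # Time
    "date", "date_time", "time",
    # Misc
    "biometric_identifier", "unique_id",
]


def build_bioes_labels(classes: list[str]) -> list[str]:
    """Return ["O", "B-c", "I-c", "E-c", "S-c", ...] for each span class c (assumed order)."""
    labels = ["O"]
    for cls in classes:
        labels.extend(f"{prefix}-{cls}" for prefix in ("B", "I", "E", "S"))
    return labels


labels = build_bioes_labels(SPAN_CLASSES)
print(len(SPAN_CLASSES), len(labels))  # 55 221
```

To check against the shipped schema, load `id2label.json` and compare its set of values with `set(labels)`. A set comparison deliberately ignores the ordering assumption.
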

<details>
<summary>Full label schema (221 labels)</summary>

The output space is `O` plus `B-`, `I-`, `E-`, `S-` for each of the 55
span classes (4 × 55 + 1 = 221). The runtime `PrivacyFilterMLXPipeline`
runs Viterbi over this BIOES grammar, so the consumer sees clean grouped
entities rather than raw token tags.

The full `id2label.json` is shipped alongside the weights in this repo.

</details>

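
The following is a minimal sketch of constrained Viterbi decoding over the BIOES grammar. It assumes per-token log-probabilities as input and treats grammar violations as hard constraints with score `-inf`. This is not the `PrivacyFilterMLXPipeline` implementation. The names `allowed`, `viterbi_bioes`, and `tags_to_spans` are hypothetical and are used only for illustration.

```python
import math

NEG_INF = float("-inf")


def allowed(prev: str, curr: str) -> bool:
    """BIOES rule: B-x/I-x must continue with I-x/E-x; O/E/S may be followed by O/B/S."""
    if prev[0] in "BI":
        return curr[0] in "IE" and curr[2:] == prev[2:]
    return curr[0] in "OBS"


def viterbi_bioes(log_probs: list[list[float]], labels: list[str]) -> list[str]:
    """Highest-scoring tag sequence obeying BIOES; log_probs is seq_len x num_labels."""
    if not log_probs:
        return []
    k = len(labels)
    # A sequence may only start with O, B-x or S-x.
    score = [lp if labels[j][0] in "OBS" else NEG_INF for j, lp in enumerate(log_probs[0])]
    backptrs: list[list[int]] = []
    for row in log_probs[1:]:
        new_score, ptr = [], []
        for j in range(k):
            best_i, best = 0, NEG_INF
            for i in range(k):
                if score[i] > best and allowed(labels[i], labels[j]):
                    best_i, best = i, score[i]
            new_score.append(best + row[j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # A sequence may only end with O, E-x or S-x (no dangling B/I).
    last = max(range(k), key=lambda j: score[j] if labels[j][0] in "OES" else NEG_INF)
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return [labels[j] for j in reversed(path)]


def tags_to_spans(tags: list[str]) -> list[tuple[str, int, int]]:
    """Group a valid BIOES sequence into (class, start_token, end_token_exclusive)."""
    spans, start = [], 0
    for t, tag in enumerate(tags):
        if tag[0] == "S":
            spans.append((tag[2:], t, t + 1))
        elif tag[0] == "B":
            start = t
        elif tag[0] == "E":
            spans.append((tag[2:], start, t + 1))
    return spans


labels = ["O", "B-email", "I-email", "E-email", "S-email"]
probs = [
    [0.90, 0.025, 0.025, 0.025, 0.025],
    [0.05, 0.40, 0.50, 0.025, 0.025],  # greedy argmax picks I-email here (invalid after O)
    [0.05, 0.025, 0.025, 0.875, 0.025],
    [0.90, 0.025, 0.025, 0.025, 0.025],
]
log_probs = [[math.log(p) for p in row] for row in probs]
tags = viterbi_bioes(log_probs, labels)
print(tags)                 # ['O', 'B-email', 'E-email', 'O']
print(tags_to_spans(tags))  # [('email', 1, 3)]
```

The nested loops cost O(seq_len × 221²). That is acceptable for a sketch, but a production decoder would vectorize the transition step, for example as a precomputed 221 × 221 mask.
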
For per-label accuracy, training recipe, and dataset details, see the
[base PyTorch checkpoint](https://huggingface.co/OpenMed/privacy-filter-nemotron).
## Architecture
| Field | Value |
| --- | --- |
| Source model type | `openai_privacy_filter` |
| Source architecture | `OpenAIPrivacyFilterForTokenClassification` |
| Hidden size | 640 |
| Transformer layers | 8 |
| Attention | Grouped-Query (14 query heads / 2 KV heads, head_dim=64) with attention sinks |
| FFN | Sparse Mixture-of-Experts: 128 experts, top-4 routing, SwiGLU |
| Position encoding | YaRN-scaled RoPE (`rope_theta=150_000`, factor=32) |
| Context length | 131,072 tokens (4,096 original context × YaRN factor 32) |
| Tokenizer | `o200k_base` (tiktoken), vocab 200,064 |
| Output head | Linear(640 → 221) with bias |
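
The FFN row is the least obvious part of the table: each token activates only 4 of the 128 expert FFNs. Below is a minimal NumPy sketch of that routing for a single token. It assumes the router applies a softmax over only the four selected logits. That is one common convention, and the upstream modeling code is authoritative on normalization, router bias, and any clamping. The expert width `D_FF`, the weight shapes, and `moe_forward` are illustrative assumptions, not the checkpoint's tensor layout.

```python
import numpy as np

HIDDEN = 640      # hidden size, from the table above
N_EXPERTS = 128   # from the table above
TOP_K = 4         # from the table above
D_FF = 32         # assumption: tiny expert width, for illustration only


def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))


def moe_forward(x, router_w, w_gate, w_up, w_down, top_k=TOP_K):
    """Route one token vector x of shape (HIDDEN,) through its top-k SwiGLU experts."""
    logits = router_w @ x                          # (N_EXPERTS,) router scores
    top = np.argsort(logits)[-top_k:][::-1]        # k best experts, highest first
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                       # softmax over the selected experts only
    out = np.zeros_like(x)
    for w, e in zip(weights, top):
        h = silu(w_gate[e] @ x) * (w_up[e] @ x)    # SwiGLU hidden, shape (D_FF,)
        out += w * (w_down[e] @ h)                 # project back to (HIDDEN,)
    return out, top, weights


rng = np.random.default_rng(0)
x = rng.standard_normal(HIDDEN).astype(np.float32)
router_w = (rng.standard_normal((N_EXPERTS, HIDDEN)) * 0.02).astype(np.float32)
w_gate = (rng.standard_normal((N_EXPERTS, D_FF, HIDDEN)) * 0.02).astype(np.float32)
w_up = (rng.standard_normal((N_EXPERTS, D_FF, HIDDEN)) * 0.02).astype(np.float32)
w_down = (rng.standard_normal((N_EXPERTS, HIDDEN, D_FF)) * 0.02).astype(np.float32)

y, experts, gate = moe_forward(x, router_w, w_gate, w_up, w_down)
print(y.shape, experts, gate.round(3))
```
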
## File set
| File | Size | Purpose |
| --- | --- | --- |
| `weights.safetensors` | 2.6 GB | BF16 model weights in OpenMed-MLX layout |
| `config.json` | 19 KB | Model + MLX runtime config |
| `id2label.json` | 5.4 KB | Numeric ID → BIOES label string |
| `openmed-mlx.json` | 0.7 KB | OpenMed MLX manifest (task, family, runtime hints) |
| `tokenizer.json`, `tokenizer_config.json` | 27 MB | Source tokenizer files (kept for reference) |

The MLX runtime uses `tiktoken` `o200k_base` directly for tokenization;
the `tokenizer.json` is kept so consumers can inspect or re-tokenize via
`transformers` if desired.
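
`o200k_base` is a byte-level BPE, so a single character (an accented letter, for example) can be split across two tokens. Turning token-level predictions into the character offsets that `PrivacyFilterMLXPipeline` reports therefore needs a byte-to-character map. The sketch below assumes you already have each token's raw bytes. `token_char_offsets` is a hypothetical helper, not part of OpenMed, and the pipeline may compute offsets differently.

```python
def token_char_offsets(pieces: list[bytes]) -> list[tuple[int, int]]:
    """Map each token's UTF-8 byte piece to (start, end) character offsets.

    Assumes the concatenated pieces form valid UTF-8. A token covering only part
    of a multi-byte character is mapped to that whole character.
    """
    data = b"".join(pieces)
    text = data.decode("utf-8")
    # byte_to_char[i] is the index of the character that contains byte i.
    byte_to_char: list[int] = []
    for ci, ch in enumerate(text):
        byte_to_char.extend([ci] * len(ch.encode("utf-8")))

    offsets, pos = [], 0
    for piece in pieces:
        start = byte_to_char[pos]
        end = byte_to_char[pos + len(piece) - 1] + 1
        offsets.append((start, end))
        pos += len(piece)
    return offsets


pieces = [b"caf", b"\xc3", b"\xa9", b" au", b" lait"]  # "café au lait", with "é" split in two
print(token_char_offsets(pieces))
# [(0, 3), (3, 4), (3, 4), (4, 7), (7, 12)]
```

With `tiktoken` installed, real pieces can be obtained with `enc = tiktoken.get_encoding("o200k_base")` followed by `[enc.decode_single_token_bytes(t) for t in enc.encode(text)]`.
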
## Quick start
### With [OpenMed](https://github.com/maziyarpanahi/openmed) (recommended)
OpenMed gives you a single `extract_pii()` / `deidentify()` API that
auto-selects MLX on Apple Silicon and PyTorch elsewhere, so the same code
runs on every host.
```bash
pip install -U "openmed[mlx]"
```
```python
from openmed import extract_pii, deidentify
text = (
"Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, "
"phone 415-555-0123, email sarah.johnson@example.com."
)
# Extract grouped entity spans (runs on MLX here, PyTorch fallback elsewhere)
result = extract_pii(text, model_name="OpenMed/privacy-filter-nemotron-mlx")
for ent in result.entities:
print(f"{ent.label:30s} {ent.text!r} conf={ent.confidence:.2f}")
# De-identify
masked = deidentify(text, method="mask",
model_name="OpenMed/privacy-filter-nemotron-mlx")
fake = deidentify(
text,
method="replace",
model_name="OpenMed/privacy-filter-nemotron-mlx",
consistent=True,
seed=42, # deterministic locale-aware Faker surrogates
)
```
When MLX isn't available (Linux, Windows, Intel Mac, missing `mlx` package),
this exact same call automatically falls back to the PyTorch checkpoint
[`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron)
with a one-time warning. The fallback is family-aware: a Nemotron MLX
request never substitutes the unrelated `openai/privacy-filter` baseline.
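
The fallback rule described above can be illustrated with a small resolver. This is a hypothetical sketch, not OpenMed's actual code. It assumes MLX counts as usable only on arm64 macOS with the `mlx` package importable. It also assumes the PyTorch sibling of an `-mlx` or `-mlx-8bit` repo is found by stripping that suffix, which matches the repo names on this card but is not a documented OpenMed rule.

```python
import importlib.util
import platform

# Longest suffix first, so "-mlx-8bit" is not mistaken for a repo ending in "-mlx".
MLX_SUFFIXES = ("-mlx-8bit", "-mlx")


def mlx_available() -> bool:
    """True on Apple Silicon macOS with the `mlx` package importable (assumed criterion)."""
    return (
        platform.system() == "Darwin"
        and platform.machine() == "arm64"
        and importlib.util.find_spec("mlx") is not None
    )


def resolve_backend(model_name: str, have_mlx: bool) -> tuple[str, str]:
    """Return (backend, repo_id), falling back only within the same model family."""
    for suffix in MLX_SUFFIXES:
        if model_name.endswith(suffix):
            if have_mlx:
                return "mlx", model_name
            # Same family: strip the MLX suffix; never switch to openai/privacy-filter.
            return "pytorch", model_name[: -len(suffix)]
    return "pytorch", model_name


print(resolve_backend("OpenMed/privacy-filter-nemotron-mlx", have_mlx=mlx_available()))
```

Passing `have_mlx` explicitly keeps the resolution logic testable on any host, separate from platform detection.
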
### Direct MLX usage (lower-level)
```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline
model_path = snapshot_download("OpenMed/privacy-filter-nemotron-mlx")
pipe = PrivacyFilterMLXPipeline(model_path)
print(pipe("Email me at alice.smith@example.com after 5pm."))
# [{'entity_group': 'email',
# 'score': 0.92,
# 'word': 'alice.smith@example.com',
# 'start': 12,
# 'end': 35}]
```
The pipeline returns a list of dicts with `entity_group`, `score`, `word`,
`start`, and `end` (character offsets into the input string).
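
These offsets are enough to redact text without re-tokenizing. The sketch below is minimal and assumes the dict shape shown above and non-overlapping spans. The `[LABEL]` placeholder format and `mask_entities` are hypothetical choices for illustration; OpenMed's `deidentify(method="mask")` may format masks differently.

```python
def mask_entities(text: str, entities: list[dict], min_score: float = 0.0) -> str:
    """Replace each entity's [start, end) span with an [ENTITY_GROUP] placeholder."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace right-to-left so earlier offsets stay valid after each substitution.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


text = "Email me at alice.smith@example.com after 5pm."
entities = [{"entity_group": "email", "score": 0.92,
             "word": "alice.smith@example.com", "start": 12, "end": 35}]
print(mask_entities(text, entities))
# Email me at [EMAIL] after 5pm.
```
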
### Loading from a local snapshot
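```python
from openmed.mlx.models import load_model
import mlx.core as mx

model = load_model("/path/to/privacy-filter-nemotron-mlx")

# Placeholder o200k_base token IDs, shape (batch=1, seq_len=4); real inputs
# come from the tiktoken o200k_base encoder.
ids = mx.array([[1, 100, 200, 300]], dtype=mx.int32)
mask = mx.ones((1, 4), dtype=mx.bool_)    # all ones: every position is a real token
logits = model(ids, attention_mask=mask)  # shape (1, 4, 221): one score per BIOES label
```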
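
The raw logits can be turned into per-token labels with `id2label.json`. The sketch below is minimal and assumes the file maps string IDs to label strings (e.g. `{"0": "O", ...}`). `load_id2label`, `log_softmax`, and `greedy_labels` are hypothetical helpers. Greedy argmax can emit sequences that break BIOES, such as `I-` right after `O`, which is why the pipeline decodes with Viterbi instead.

```python
import json

import numpy as np


def load_id2label(path: str) -> dict[int, str]:
    """Read id2label.json, assuming it maps string IDs to labels: {"0": "O", ...}."""
    with open(path, encoding="utf-8") as f:
        return {int(k): v for k, v in json.load(f).items()}


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (label) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def greedy_labels(logits: np.ndarray, id2label: dict[int, str]) -> list[list[tuple[str, float]]]:
    """Per-token argmax label and its probability; logits shape (batch, seq_len, num_labels)."""
    logp = log_softmax(logits)
    best_ids = logp.argmax(axis=-1)
    best_probs = np.exp(logp.max(axis=-1))
    return [
        [(id2label[int(i)], float(p)) for i, p in zip(row_ids, row_probs)]
        for row_ids, row_probs in zip(best_ids, best_probs)
    ]


# Toy example with 3 labels instead of 221.
toy_id2label = {0: "O", 1: "B-email", 2: "I-email"}
toy_logits = np.array([[[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]])
# The second token comes out as I-email right after O, which BIOES forbids.
print(greedy_labels(toy_logits, toy_id2label))
```

With the real model, the logits may come back as `bfloat16`, which NumPy cannot represent. Casting first avoids this, for example `np.array(logits.astype(mx.float32))`.
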
## Hardware notes
- Designed for Apple Silicon (M-series GPUs); CPU inference works but is slower.
- Tested on macOS with `mlx>=0.18`. The MLX runtime in this repo is
independent of `mlx_lm` (token classification, not causal LM).
- A forward pass on a typical PII sentence (~10 tokens) takes ~14 ms on an
M-series GPU after warmup. For lower latency or a smaller memory footprint,
use the [`-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit)
sibling instead.
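
To reproduce a latency figure like the ~14 ms above on your own machine, time several calls after a warmup. MLX evaluates lazily, so the timed function must force computation (for example with `mx.eval` on the logits). Otherwise the measurement mostly captures graph construction. The sketch below is minimal and framework-agnostic. `benchmark` is a hypothetical helper, and its numbers depend on your hardware and input length.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> dict[str, float]:
    """Call fn() `warmup` times untimed, then `iters` times timed; return stats in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


# With MLX (assumes `model`, `ids`, `mask` from the local-snapshot example):
#   import mlx.core as mx
#   stats = benchmark(lambda: mx.eval(model(ids, attention_mask=mask)))
print(benchmark(lambda: sum(range(10_000))))
```
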
## Credits & Acknowledgements
This model wouldn't exist without two open-source releases; sincere
thanks to both teams:
- **OpenAI** for [open-sourcing the Privacy Filter](https://huggingface.co/openai/privacy-filter)
(architecture, modeling code, and `opf` training/eval CLI). The MLX port
in this repo runs that same architecture under Apple's MLX framework.
- **NVIDIA** for releasing the [Nemotron-PII dataset](https://huggingface.co/datasets/nvidia/Nemotron-PII)
used to fine-tune the source PyTorch checkpoint.

Additional thanks to **Apple** for [MLX](https://github.com/ml-explore/mlx)
and the **Hugging Face** team for the model-distribution ecosystem.
## License
Apache 2.0 (matches the source checkpoint).