---
license: apache-2.0
base_model: OpenMed/privacy-filter-nemotron
datasets:
  - nvidia/Nemotron-PII
pipeline_tag: token-classification
library_name: openmed
tags:
  - openmed
  - mlx
  - apple-silicon
  - token-classification
  - pii
  - de-identification
  - medical
  - clinical
  - privacy-filter
  - nemotron
language:
  - en
---

# OpenMed Privacy Filter (Nemotron) – MLX BF16

A native [MLX](https://github.com/ml-explore/mlx) port of
[`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron)
for fast, on-device PII detection on Apple Silicon. This BF16 artifact
preserves the full source precision; for a smaller / faster sibling, see
[`OpenMed/privacy-filter-nemotron-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit).

> **Family at a glance.** Same architecture and training data, three runtimes:
> - **PyTorch** – [`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron) – CPU + CUDA.
> - **MLX BF16 (this repo)** – Apple Silicon, full precision (~2.6 GB).
> - **MLX 8-bit** – [`OpenMed/privacy-filter-nemotron-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit) – Apple Silicon, ~1.4 GB, ~1.7× faster.

## What it does

The model is a token classifier built on OpenAI's open Privacy Filter
architecture (the same `openai_privacy_filter` model type used by
[`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)).
It tags each token with a BIOES label across **55 PII span classes**, then
a Viterbi pass over the BIOES grammar yields clean entity spans. Detected
categories include:

- Personal identifiers – `first_name`, `last_name`, `user_name`, `gender`, `age`, `date_of_birth`
- Contact – `email`, `phone_number`, `fax_number`, `street_address`, `city`, `state`, `country`, `county`, `postcode`, `coordinate`
- Government / legal IDs – `ssn`, `national_id`, `tax_id`, `certificate_license_number`
- Financial – `account_number`, `bank_routing_number`, `credit_debit_card`, `cvv`, `pin`, `swift_bic`
- Medical – `medical_record_number`, `health_plan_beneficiary_number`, `blood_type`
- Workplace – `company_name`, `occupation`, `employee_id`, `customer_id`, `employment_status`, `education_level`
- Online – `url`, `ipv4`, `ipv6`, `mac_address`, `http_cookie`, `api_key`, `password`, `device_identifier`
- Demographic – `race_ethnicity`, `religious_belief`, `political_view`, `sexuality`, `language`
- Vehicles – `license_plate`, `vehicle_identifier`
- Time – `date`, `date_time`, `time`
- Misc – `biometric_identifier`, `unique_id`

<details>
<summary>Full label schema (221 labels)</summary>

The output space is `O` plus `B-`, `I-`, `E-`, `S-` for each of the 55
span classes (4 × 55 + 1 = 221). The runtime `PrivacyFilterMLXPipeline`
runs Viterbi over this BIOES grammar, so the consumer sees clean grouped
entities rather than raw token tags.

The full `id2label.json` is shipped alongside the weights in this repo.
</details>

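As an illustration of what the BIOES decoding step produces, here is a minimal pure-Python sketch that collapses a tag sequence into grouped spans. The helper name is hypothetical; the shipped pipeline additionally runs Viterbi so that invalid transitions (e.g. `I-` directly after `O`) can never be emitted in the first place.

```python
def bioes_to_spans(tags):
    """Collapse a BIOES tag sequence into (label, start, end) token spans.

    Illustrative sketch only -- the real runtime decodes with Viterbi
    over the BIOES grammar, so it never sees malformed sequences.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "S":          # single-token entity
            spans.append((label, i, i))
            start = None
        elif prefix == "B":        # entity begins
            start = i
        elif prefix == "E" and start is not None:  # entity ends
            spans.append((label, start, i))
            start = None
    return spans
```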
For per-label accuracy, training recipe, and dataset details, see the
[base PyTorch checkpoint](https://huggingface.co/OpenMed/privacy-filter-nemotron).

## Architecture

| Field | Value |
| --- | --- |
| Source model type | `openai_privacy_filter` |
| Source architecture | `OpenAIPrivacyFilterForTokenClassification` |
| Hidden size | 640 |
| Transformer layers | 8 |
| Attention | Grouped-Query (14 query heads / 2 KV heads, head_dim=64) with attention sinks |
| FFN | Sparse Mixture-of-Experts – 128 experts, top-4 routing, SwiGLU |
| Position encoding | YARN-scaled RoPE (`rope_theta=150_000`, factor=32) |
| Context length | 131,072 tokens (initial 4,096) |
| Tokenizer | `o200k_base` (tiktoken) – vocab 200,064 |
| Output head | Linear(640 → 221) with bias |

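To make the MoE row concrete, the sketch below shows top-k expert routing in the generic form used by sparse-MoE layers: take the router logits for one token, keep the k highest-scoring experts, and renormalize their softmax weights. This is a hypothetical illustration of the technique, not the model's actual routing code.

```python
import math

def top_k_route(router_logits, k=4):
    """Pick the k highest-scoring experts for one token and renormalize
    their softmax weights so they sum to 1.  Generic sketch of top-k
    MoE routing; the model routes each token over 128 experts with k=4.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)           # subtract max for stability
    weights = [math.exp(router_logits[i] - m) for i in top]
    total = sum(weights)
    return [(i, w / total) for i, w in zip(top, weights)]
```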
## File set

| File | Size | Purpose |
| --- | --- | --- |
| `weights.safetensors` | 2.6 GB | BF16 model weights in OpenMed-MLX layout |
| `config.json` | 19 KB | Model + MLX runtime config |
| `id2label.json` | 5.4 KB | Numeric ID → BIOES label string |
| `openmed-mlx.json` | 0.7 KB | OpenMed MLX manifest (task, family, runtime hints) |
| `tokenizer.json`, `tokenizer_config.json` | 27 MB | Source tokenizer files (kept for reference) |

The MLX runtime uses `tiktoken` `o200k_base` directly for tokenization;
the `tokenizer.json` is kept so consumers can inspect or re-tokenize via
`transformers` if desired.

## Quick start

### With [OpenMed](https://github.com/maziyarpanahi/openmed) – recommended

OpenMed gives you a single `extract_pii()` / `deidentify()` API that
auto-selects MLX on Apple Silicon and PyTorch elsewhere, so the same
code runs on every host.

```bash
pip install -U "openmed[mlx]"
```

```python
from openmed import extract_pii, deidentify

text = (
    "Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, "
    "phone 415-555-0123, email sarah.johnson@example.com."
)

# Extract grouped entity spans (runs on MLX here, PyTorch fallback elsewhere)
result = extract_pii(text, model_name="OpenMed/privacy-filter-nemotron-mlx")
for ent in result.entities:
    print(f"{ent.label:30s} {ent.text!r}  conf={ent.confidence:.2f}")

# De-identify
masked = deidentify(text, method="mask",
                    model_name="OpenMed/privacy-filter-nemotron-mlx")
fake   = deidentify(
    text,
    method="replace",
    model_name="OpenMed/privacy-filter-nemotron-mlx",
    consistent=True,
    seed=42,   # deterministic locale-aware Faker surrogates
)
```

When MLX isn't available (Linux, Windows, Intel Mac, missing `mlx` package),
this exact same call automatically falls back to the PyTorch checkpoint
[`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron)
with a one-time warning. Family-aware fallback: a Nemotron MLX request never
substitutes the unrelated `openai/privacy-filter` baseline.

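The platform check behind that fallback can be sketched as follows. `mlx_available` is a hypothetical helper shown for illustration only; openmed's internal selection logic may differ.

```python
import importlib.util
import platform

def mlx_available():
    """Heuristic runtime check: Apple Silicon macOS with mlx installed.

    Hypothetical helper for illustration -- openmed performs its own
    runtime selection internally.
    """
    return (
        platform.system() == "Darwin"        # macOS
        and platform.machine() == "arm64"    # Apple Silicon, not Intel
        and importlib.util.find_spec("mlx") is not None  # mlx importable
    )
```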
### Direct MLX usage (lower-level)

```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline

model_path = snapshot_download("OpenMed/privacy-filter-nemotron-mlx")
pipe = PrivacyFilterMLXPipeline(model_path)

print(pipe("Email me at alice.smith@example.com after 5pm."))
# [{'entity_group': 'email',
#   'score': 0.92,
#   'word': 'alice.smith@example.com',
#   'start': 12,
#   'end': 35}]
```

The pipeline returns a list of dicts with `entity_group`, `score`, `word`,
`start`, and `end` (character offsets into the input string).

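Because `start` and `end` are plain character positions, redacting the input is a short loop over the pipeline output. A minimal sketch (hypothetical helper, not part of the openmed API):

```python
def mask_entities(text, entities):
    """Redact detected spans using their start/end character offsets.

    Spans are applied right-to-left so that earlier offsets stay valid
    as the string changes length.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = ent["entity_group"].upper()
        text = text[:ent["start"]] + f"[{label}]" + text[ent["end"]:]
    return text
```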
### Loading from a local snapshot

```python
from openmed.mlx.models import load_model
import mlx.core as mx

model = load_model("/path/to/privacy-filter-nemotron-mlx")
ids = mx.array([[1, 100, 200, 300]], dtype=mx.int32)
mask = mx.ones((1, 4), dtype=mx.bool_)
logits = model(ids, attention_mask=mask)   # shape (1, 4, 221)
```

## Hardware notes

- Designed for Apple Silicon (M-series GPUs); CPU inference works but is slower.
- Tested on macOS with `mlx>=0.18`. The MLX runtime in this repo is
  independent of `mlx_lm` (token classification, not causal LM).
- Forward pass on a typical PII sentence (~10 tokens) takes ~14 ms on
  M-series GPU after warmup. For lower latency or smaller memory footprint,
  use the [`-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit)
  sibling instead.

## Credits & Acknowledgements

This model wouldn't exist without two open-source releases; sincere
thanks to both teams:

- **OpenAI** for [open-sourcing the Privacy Filter](https://huggingface.co/openai/privacy-filter)
  (architecture, modeling code, and `opf` training/eval CLI). The MLX port
  in this repo runs that same architecture under Apple's MLX framework.
- **NVIDIA** for releasing the [Nemotron-PII dataset](https://huggingface.co/datasets/nvidia/Nemotron-PII)
  used to fine-tune the source PyTorch checkpoint.

Additional thanks to **Apple** for [MLX](https://github.com/ml-explore/mlx)
and the **HuggingFace** team for the model-distribution ecosystem.

## License

Apache 2.0 (matches the source checkpoint).