---
license: apache-2.0
base_model: OpenMed/privacy-filter-nemotron
datasets:
- nvidia/Nemotron-PII
pipeline_tag: token-classification
library_name: openmed
tags:
- openmed
- mlx
- apple-silicon
- token-classification
- pii
- de-identification
- medical
- clinical
- privacy-filter
- nemotron
language:
- en
---
# OpenMed Privacy Filter (Nemotron) — MLX BF16
A native [MLX](https://github.com/ml-explore/mlx) port of
[`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron)
for fast, on-device PII detection on Apple Silicon. This BF16 artifact
preserves the full source precision; for a smaller / faster sibling, see
[`OpenMed/privacy-filter-nemotron-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit).
> **Family at a glance.** Same architecture and training data, three runtimes:
> - **PyTorch** — [`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron) — CPU + CUDA.
> - **MLX BF16 (this repo)** — Apple Silicon, full precision (~2.6 GB).
> - **MLX 8-bit** — [`OpenMed/privacy-filter-nemotron-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit) — Apple Silicon, ~1.4 GB, ~1.7× faster.
## What it does
The model is a token classifier built on OpenAI's open Privacy Filter
architecture (the same `openai_privacy_filter` model type used by
[`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)).
It tags each token with a BIOES label across **55 PII span classes**, then
a Viterbi pass over the BIOES grammar yields clean entity spans. Detected
categories include:
- Personal identifiers — `first_name`, `last_name`, `user_name`, `gender`, `age`, `date_of_birth`
- Contact — `email`, `phone_number`, `fax_number`, `street_address`, `city`, `state`, `country`, `county`, `postcode`, `coordinate`
- Government / legal IDs — `ssn`, `national_id`, `tax_id`, `certificate_license_number`
- Financial — `account_number`, `bank_routing_number`, `credit_debit_card`, `cvv`, `pin`, `swift_bic`
- Medical — `medical_record_number`, `health_plan_beneficiary_number`, `blood_type`
- Workplace — `company_name`, `occupation`, `employee_id`, `customer_id`, `employment_status`, `education_level`
- Online — `url`, `ipv4`, `ipv6`, `mac_address`, `http_cookie`, `api_key`, `password`, `device_identifier`
- Demographic — `race_ethnicity`, `religious_belief`, `political_view`, `sexuality`, `language`
- Vehicles — `license_plate`, `vehicle_identifier`
- Time — `date`, `date_time`, `time`
- Misc — `biometric_identifier`, `unique_id`
<details>
<summary>Full label schema (221 labels)</summary>
The output space is `O` plus `B-`, `I-`, `E-`, `S-` for each of the 55
span classes (4 × 55 + 1 = 221). The runtime `PrivacyFilterMLXPipeline`
runs Viterbi over this BIOES grammar, so the consumer sees clean grouped
entities rather than raw token tags.
The full `id2label.json` is shipped alongside the weights in this repo.
</details>
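As a quick illustration of this schema (not the actual decoder, which runs Viterbi over the full BIOES grammar), the sketch below builds the label space from a few of the span classes listed above and greedily groups a toy tag sequence into spans:

```python
# Simplified illustration only: the real pipeline decodes with Viterbi;
# this greedy pass just shows how BIOES tags become spans.
span_classes = ["first_name", "last_name", "email"]  # 3 of the 55 classes

labels = ["O"] + [f"{p}-{c}" for c in span_classes for p in ("B", "I", "E", "S")]
# With all 55 classes this yields 4 * 55 + 1 = 221 labels.

def group_bioes(tokens, tags):
    """Collapse per-token BIOES tags into (class, text) spans."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        prefix, _, cls = tag.partition("-")
        if prefix == "S":
            spans.append((cls, [tok]))
        elif prefix == "B":
            current = (cls, [tok])
        elif prefix in ("I", "E") and current and current[0] == cls:
            current[1].append(tok)
            if prefix == "E":
                spans.append(current)
                current = None
        else:
            current = None
    return [(cls, " ".join(toks)) for cls, toks in spans]

print(group_bioes(
    ["Contact", "Sarah", "Johnson", "at", "sarah@example.com"],
    ["O", "S-first_name", "S-last_name", "O", "S-email"],
))
# [('first_name', 'Sarah'), ('last_name', 'Johnson'), ('email', 'sarah@example.com')]
```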
For per-label accuracy, training recipe, and dataset details, see the
[base PyTorch checkpoint](https://huggingface.co/OpenMed/privacy-filter-nemotron).
## Architecture
| Field | Value |
| --- | --- |
| Source model type | `openai_privacy_filter` |
| Source architecture | `OpenAIPrivacyFilterForTokenClassification` |
| Hidden size | 640 |
| Transformer layers | 8 |
| Attention | Grouped-Query (14 query heads / 2 KV heads, head_dim=64) with attention sinks |
| FFN | Sparse Mixture-of-Experts — 128 experts, top-4 routing, SwiGLU |
| Position encoding | YARN-scaled RoPE (`rope_theta=150_000`, factor=32) |
| Context length | 131,072 tokens (initial 4,096) |
| Tokenizer | `o200k_base` (tiktoken) — vocab 200,064 |
| Output head | Linear(640 → 221) with bias |
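To sanity-check these values against the shipped files, you can print the snapshot's `config.json` directly (key names are defined by the repo, so dumping the raw config avoids guessing them):

```python
import json
from pathlib import Path

from huggingface_hub import snapshot_download

# Print the shipped config to confirm the architecture values in the table above.
repo = Path(snapshot_download("OpenMed/privacy-filter-nemotron-mlx"))
cfg = json.loads((repo / "config.json").read_text())
print(json.dumps(cfg, indent=2))
```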
## File set
| File | Size | Purpose |
| --- | --- | --- |
| `weights.safetensors` | 2.6 GB | BF16 model weights in OpenMed-MLX layout |
| `config.json` | 19 KB | Model + MLX runtime config |
| `id2label.json` | 5.4 KB | Numeric ID → BIOES label string |
| `openmed-mlx.json` | 0.7 KB | OpenMed MLX manifest (task, family, runtime hints) |
| `tokenizer.json`, `tokenizer_config.json` | 27 MB | Source tokenizer files (kept for reference) |
The MLX runtime uses `tiktoken` `o200k_base` directly for tokenization;
the `tokenizer.json` is kept so consumers can inspect or re-tokenize via
`transformers` if desired.
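For example, the same encoding can be loaded directly through `tiktoken` to see how a snippet tokenizes:

```python
import tiktoken

# o200k_base is the encoding the MLX runtime uses for tokenization.
enc = tiktoken.get_encoding("o200k_base")
ids = enc.encode("Patient Sarah Johnson, MRN 4872910.")
print(len(ids), ids)
print(enc.decode(ids))
```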
## Quick start
### With [OpenMed](https://github.com/maziyarpanahi/openmed) — recommended
OpenMed gives you a single `extract_pii()` / `deidentify()` API that
auto-selects MLX on Apple Silicon and PyTorch elsewhere — same code on
every host.
```bash
pip install -U "openmed[mlx]"
```
```python
from openmed import extract_pii, deidentify
text = (
    "Patient Sarah Johnson (DOB 03/15/1985), MRN 4872910, "
    "phone 415-555-0123, email sarah.johnson@example.com."
)

# Extract grouped entity spans (runs on MLX here, PyTorch fallback elsewhere)
result = extract_pii(text, model_name="OpenMed/privacy-filter-nemotron-mlx")
for ent in result.entities:
    print(f"{ent.label:30s} {ent.text!r} conf={ent.confidence:.2f}")

# De-identify
masked = deidentify(text, method="mask",
                    model_name="OpenMed/privacy-filter-nemotron-mlx")
fake = deidentify(
    text,
    method="replace",
    model_name="OpenMed/privacy-filter-nemotron-mlx",
    consistent=True,
    seed=42,  # deterministic locale-aware Faker surrogates
)
```
When MLX isn't available (Linux, Windows, Intel Macs, or a missing `mlx` package),
the same call automatically falls back to the PyTorch checkpoint
[`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron)
with a one-time warning. The fallback is family-aware: a Nemotron MLX request never
substitutes the unrelated `openai/privacy-filter` baseline.
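If you want a rough preview of which backend that call will pick on a given host, the conditions boil down to platform plus package availability (illustrative check only; openmed performs this selection itself):

```python
import importlib.util
import platform

# Informational check only: openmed does its own backend selection internally.
has_mlx = importlib.util.find_spec("mlx") is not None
apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
print("mlx" if (has_mlx and apple_silicon) else "pytorch")
```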
### Direct MLX usage (lower-level)
```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline
model_path = snapshot_download("OpenMed/privacy-filter-nemotron-mlx")
pipe = PrivacyFilterMLXPipeline(model_path)
print(pipe("Email me at alice.smith@example.com after 5pm."))
# [{'entity_group': 'email',
#   'score': 0.92,
#   'word': 'alice.smith@example.com',
#   'start': 12,
#   'end': 35}]
```
The pipeline returns a list of dicts with `entity_group`, `score`, `word`,
`start`, and `end` (character offsets into the input string).
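Because `start`/`end` are plain character offsets, the output is easy to post-process yourself; for instance, a minimal masking helper (separate from openmed's own `deidentify`) might look like this:

```python
def mask_entities(text, entities, placeholder="[REDACTED]"):
    """Replace detected spans with a placeholder, right-to-left so offsets stay valid."""
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text

sample = "Email me at alice.smith@example.com after 5pm."
print(mask_entities(sample, pipe(sample)))
# Email me at [REDACTED] after 5pm.
```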
### Loading from a local snapshot
```python
from openmed.mlx.models import load_model
import mlx.core as mx
model = load_model("/path/to/privacy-filter-nemotron-mlx")
ids = mx.array([[1, 100, 200, 300]], dtype=mx.int32)
mask = mx.ones((1, 4), dtype=mx.bool_)
logits = model(ids, attention_mask=mask) # shape (1, 4, 221)
```
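Continuing from the `logits` above, the shipped `id2label.json` maps each class index back to its BIOES tag. A minimal argmax decode (assuming the JSON maps string IDs to label strings) looks like this; note that the full `PrivacyFilterMLXPipeline` additionally runs Viterbi over the BIOES grammar:

```python
import json
from pathlib import Path

# Raw per-token view only; the pipeline's Viterbi pass produces the grouped entities.
id2label = json.loads(Path("/path/to/privacy-filter-nemotron-mlx/id2label.json").read_text())
pred_ids = mx.argmax(logits, axis=-1)          # shape (1, 4)
tags = [id2label[str(i)] for i in pred_ids[0].tolist()]
print(tags)
```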
## Hardware notes
- Designed for Apple Silicon (M-series GPUs); CPU inference works but is slower.
- Tested on macOS with `mlx>=0.18`. The MLX runtime in this repo is
independent of `mlx_lm` (token classification, not causal LM).
- Forward pass on a typical PII sentence (~10 tokens) takes ~14 ms on an
M-series GPU after warmup; a quick way to measure this on your own machine is
sketched below. For lower latency or a smaller memory footprint, use the
[`-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit)
sibling instead.
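To reproduce a rough latency number, reuse the `pipe` from the Direct MLX usage section above (figures vary with hardware, input length, and `mlx` version):

```python
import time

pipe("warmup call to load weights and trigger compilation")

t0 = time.perf_counter()
n = 20
for _ in range(n):
    pipe("Patient Sarah Johnson, MRN 4872910, phone 415-555-0123.")
print(f"avg per call: {(time.perf_counter() - t0) / n * 1000:.1f} ms")
```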
## Credits & Acknowledgements
This model wouldn't exist without two open-source releases — sincere
thanks to both teams:
- **OpenAI** for [open-sourcing the Privacy Filter](https://huggingface.co/openai/privacy-filter)
(architecture, modeling code, and `opf` training/eval CLI). The MLX port
in this repo runs that same architecture under Apple's MLX framework.
- **NVIDIA** for releasing the [Nemotron-PII dataset](https://huggingface.co/datasets/nvidia/Nemotron-PII)
used to fine-tune the source PyTorch checkpoint.
Additional thanks to **Apple** for [MLX](https://github.com/ml-explore/mlx)
and the **HuggingFace** team for the model-distribution ecosystem.
## License
Apache 2.0 (matches the source checkpoint).