---
license: apache-2.0
base_model: openai/privacy-filter
pipeline_tag: token-classification
library_name: openmed
tags:
- openmed
- mlx
- apple-silicon
- token-classification
- pii
- privacy
- de-identification
- redaction
- quantized
- int8
- q8
- medical
- clinical
---
# OpenAI Privacy Filter MLX 8-bit
This repository contains an 8-bit OpenMed MLX artifact for [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter), packaged for local PII detection on Apple Silicon with [OpenMed](https://github.com/maziyarpanahi/openmed).
OpenAI Privacy Filter is a bidirectional token-classification model for detecting personally identifiable information in text. This OpenMed MLX build keeps the original BIOES token-label head, uses the `o200k_base` tokenizer assets, and runs with OpenMed's Python and Swift MLX runtimes.
After the model is downloaded once, inference runs locally. No document text is sent to a server.
## Model Details
- Source checkpoint: [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)
- OpenMed MLX family: `openai-privacy-filter`
- Task: token classification for privacy span detection
- Weight format: `weights.safetensors`
- Quantization: 8-bit affine quantization, group size 64
- Runtime: OpenMed + MLX on Apple Silicon
- Tokenizer: `o200k_base` / tiktoken-style BPE
- Labels: `account_number`, `private_address`, `private_date`, `private_email`, `private_person`, `private_phone`, `private_url`, `secret`
This artifact uses expert-aware MLX quantization: embeddings, attention projections, MoE gates, sparse-MoE expert tensors, and the token-classification head are all stored in 8-bit packed form. The resulting `weights.safetensors` file is about 1.39 GiB, compared with about 2.61 GiB for the BF16 OpenMed MLX artifact.
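The group-size-64 affine scheme can be sketched in plain NumPy. This is an illustration of the arithmetic only, under assumed helper names that are not part of the OpenMed codebase; the real artifact additionally bit-packs values and stores companion `.scales`/`.biases` tensors.

```python
import numpy as np

def affine_quantize_q8(w, group_size=64):
    """Quantize a float weight vector to uint8 with per-group scale/bias.

    Illustrative sketch of 8-bit affine quantization at group size 64,
    not the exact packed layout used by the MLX artifact.
    """
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = np.maximum((w_max - w_min) / 255.0, 1e-8)  # guard constant groups
    q = np.round((groups - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def affine_dequantize_q8(q, scale, w_min):
    """Reconstruct approximate float weights from the quantized form."""
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(4 * 64).astype(np.float32)
q, scale, bias = affine_quantize_q8(w)
w_hat = affine_dequantize_q8(q, scale, bias).reshape(-1)
max_err = float(np.abs(w - w_hat).max())  # bounded by ~scale/2 per group
```

Because each group of 64 weights carries its own scale and bias, the worst-case rounding error stays proportional to that group's local dynamic range rather than the tensor's global range.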
## Quick Start: Python
```bash
pip install -U "openmed[mlx]"
```
```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import create_mlx_pipeline
model_path = snapshot_download("OpenMed/privacy-filter-mlx-8bit")
pipe = create_mlx_pipeline(model_path)
text = "My name is Alice Smith and my email is alice.smith@example.com."
entities = pipe(text)
for entity in entities:
    print(entity)
```
Example output:
```python
{
    "entity_group": "private_person",
    "word": "Alice Smith",
    "start": 11,
    "end": 22,
    "score": 0.9999,
}
{
    "entity_group": "private_email",
    "word": "alice.smith@example.com",
    "start": 39,
    "end": 62,
    "score": 0.9998,
}
```
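Because the pipeline returns character offsets, redaction is a small post-processing step. A minimal sketch, assuming entity dicts shaped like the example output above (the `redact` helper is hypothetical, not part of the OpenMed API); spans are applied right-to-left so earlier offsets stay valid:

```python
def redact(text, entities):
    """Replace each detected span with a [LABEL] placeholder.

    Assumes each entity carries "entity_group", "start", and "end"
    keys as in the pipeline output shown above.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group'].upper()}]"
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text

text = "My name is Alice Smith and my email is alice.smith@example.com."
entities = [
    {"entity_group": "private_person", "start": 11, "end": 22},
    {"entity_group": "private_email", "start": 39, "end": 62},
]
print(redact(text, entities))
# My name is [PRIVATE_PERSON] and my email is [PRIVATE_EMAIL].
```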
## Quick Start: Swift and Apple Apps
Add OpenMedKit to your Xcode project:
1. Open Xcode and choose File > Add Package Dependencies.
2. Paste `https://github.com/maziyarpanahi/openmed`.
3. Select the `OpenMedKit` package product.
4. Download and cache the MLX model once, then run inference locally.
```swift
import OpenMedKit
let modelURL = try await OpenMedModelStore.downloadMLXModel(
    repoID: "OpenMed/privacy-filter-mlx-8bit"
)
let openmed = try OpenMed(backend: .mlx(modelDirectoryURL: modelURL))
let entities = try openmed.extractPII(
    "My name is Alice Smith and my email is alice.smith@example.com."
)
for entity in entities {
    print(entity.text, entity.label, entity.score)
}
```
For iOS, validate on physical Apple Silicon hardware; the iOS Simulator is not the recommended acceptance target for MLX inference.
## Validation
The 8-bit artifact was validated against the unquantized OpenMed MLX artifact with fixed text samples. BF16 and Q8 returned identical grouped spans for person, date, phone, email, address, and account-number examples.
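A span-parity check of this kind can be reproduced with a few lines of Python. This is an illustrative comparison, not OpenMed's own test harness; it assumes entity dicts shaped like the Quick Start output and ignores scores, which may differ slightly between BF16 and Q8:

```python
def spans_match(reference_entities, quantized_entities):
    """Check that two artifacts produced identical grouped spans.

    Compares label, character offsets, and surface text while ignoring
    confidence scores.
    """
    def key(e):
        return (e["entity_group"], e["start"], e["end"], e["word"])
    return sorted(map(key, reference_entities)) == sorted(map(key, quantized_entities))

bf16 = [{"entity_group": "private_person", "word": "Alice Smith",
         "start": 11, "end": 22, "score": 0.9999}]
q8 = [{"entity_group": "private_person", "word": "Alice Smith",
       "start": 11, "end": 22, "score": 0.9981}]
ok = spans_match(bf16, q8)  # True: same spans, only scores differ
```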
OpenMed also includes unit tests for:
- q8 artifact loading
- quantization metadata decoding
- expert tensor packing and `.scales` coverage
- finite logits from the q8 runtime
- bf16/q8 shape and argmax-label coherence
- BIOES/Viterbi span decoding
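The BIOES span decoding mentioned above groups per-token tags into entity spans. A minimal greedy sketch (not the library's Viterbi implementation; `decode_bioes` is a hypothetical name for illustration):

```python
def decode_bioes(tags):
    """Group BIOES tags into (label, start, end) token spans.

    S-X emits a single-token span; B-X ... E-X emits a multi-token
    span; O and malformed sequences reset the running state.
    """
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "S":
            spans.append((tag_label, i, i + 1))
            start, label = None, None
        elif prefix == "B":
            start, label = i, tag_label
        elif prefix == "E" and label == tag_label:
            spans.append((label, start, i + 1))
            start, label = None, None
        elif prefix == "I" and label == tag_label:
            continue  # extend the open span
        else:  # "O" or a malformed transition
            start, label = None, None
    return spans

tags = ["O", "B-private_person", "E-private_person", "O", "S-private_email"]
print(decode_bioes(tags))
# [('private_person', 1, 3), ('private_email', 4, 5)]
```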
## Intended Use
Use this model for local privacy filtering, PII detection, redaction workflows, and evaluation on Apple devices. For high-risk domains such as healthcare, legal, finance, education, and government, evaluate against your own data and policy requirements before production use.
## Credits
- Base checkpoint: [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)
- MLX conversion and runtime support: [OpenMed](https://github.com/maziyarpanahi/openmed)
- OpenMed website: [https://openmed.life](https://openmed.life)