Add README with PII token-classification usage example

	---
	license: apache-2.0
	pipeline_tag: token-classification
	library_name: mlx
	base_model: openai/privacy-filter
	tags:
	- transformers.js
	- mlx
	- mlx-embeddings
	---

	# mlx-community/openai-privacy-filter-bf16

	The model [mlx-community/openai-privacy-filter-bf16](https://huggingface.co/mlx-community/openai-privacy-filter-bf16) was converted to MLX format from [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) using mlx-embeddings version 0.1.1.

	`openai/privacy-filter` is a bidirectional 1.5B-parameter / 50M-active sparse-MoE token classifier that tags personally identifiable information (PII) with BIOES spans over 8 categories (person, email, phone, URL, address, date, account number, secret).
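
For readers unfamiliar with BIOES tagging, the sketch below shows how a sequence of per-token tags can be turned into entity spans. It is a minimal illustration in plain Python, not the model's own post-processing: the function name `bioes_to_spans` is hypothetical, and the tag strings (`B-person`, `S-email`, and so on) are assumed for the example, so check `model.config.id2label` for the real label names.

```python
from typing import Iterator, List, Tuple


def bioes_to_spans(tags: List[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (category, start, end) per entity; `end` is exclusive."""
    start, current = None, None
    for i, tag in enumerate(tags):
        # "B-person" -> ("B", "person"); "O" -> ("O", "")
        prefix, _, category = tag.partition("-")
        if prefix == "S":
            # Single-token entity
            yield category, i, i + 1
            start, current = None, None
        elif prefix == "B":
            # Begin a multi-token entity
            start, current = i, category
        elif prefix == "I" and current == category:
            # Inside the entity opened by the last "B"
            continue
        elif prefix == "E" and current == category and start is not None:
            # End of the multi-token entity
            yield category, start, i + 1
            start, current = None, None
        else:
            # "O" or an inconsistent sequence: discard any open span
            start, current = None, None


# Assumed tag strings, for illustration only
tags = ["O", "B-person", "E-person", "O", "S-email"]
spans = list(bioes_to_spans(tags))
```

Unlike grouping tokens by category alone, using the B/E/S boundaries keeps two adjacent entities of the same category apart.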

	## Use with mlx

	```bash
	pip install mlx-embeddings
	```

	```python
	from itertools import groupby
	import mlx.core as mx
	from mlx_embeddings.utils import load

	model, tokenizer = load("mlx-community/openai-privacy-filter-bf16")
	id2label = model.config.id2label

	text = "My name is Alice Smith and my email is alice@example.com. Phone: 555-1234."
	inputs = tokenizer(text, return_tensors="mlx")

	outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
	preds = mx.argmax(outputs.logits, axis=-1)[0].tolist()

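	def entity(p):
	    # Drop the BIOES prefix ("B-", "I-", "E-", "S-"); "O" means no entity
	    label = id2label[str(p)]
	    return label.split("-", 1)[-1] if label != "O" else None

	# Group consecutive tokens that share an entity category and print each span
	pairs = zip(inputs["input_ids"][0].tolist(), preds)
	for ent, group in groupby(pairs, key=lambda x: entity(x[1])):
	    if ent:
	        span = tokenizer.decode([tid for tid, _ in group]).strip()
	        print(f"{ent:18s} -> {span!r}")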
	```
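
Once each token has a category, the same grouping idea can be used to mask the detected PII instead of printing it. The following is a rough sketch under stated assumptions: `redact` is a hypothetical helper, and the `(text, category)` pairs are hand-written stand-ins for decoded token pieces and their predicted categories. How you obtain per-token text from the tokenizer is left open here.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def redact(pieces: List[Tuple[str, Optional[str]]]) -> str:
    """Replace each run of same-category pieces with a [CATEGORY] placeholder."""
    out = []
    for category, run in groupby(pieces, key=lambda p: p[1]):
        text = "".join(piece for piece, _ in run)
        if category is None:
            # Not PII: keep the text unchanged
            out.append(text)
        else:
            # Preserve the run's leading whitespace so words do not fuse
            lead = text[: len(text) - len(text.lstrip())]
            out.append(f"{lead}[{category.upper()}]")
    return "".join(out)


# Hypothetical decoded pieces with their predicted categories
pieces = [
    ("My name is", None),
    (" Alice", "person"),
    (" Smith", "person"),
    (".", None),
]
masked = redact(pieces)
```

Like the loop above, this merges adjacent tokens of the same category into a single placeholder, which is usually acceptable for masking.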