---
license: apache-2.0
pipeline_tag: token-classification
library_name: mlx
base_model: openai/privacy-filter
tags:
- transformers.js
- mlx
- mlx-embeddings
---

# mlx-community/openai-privacy-filter-4bit

The model [mlx-community/openai-privacy-filter-4bit](https://huggingface.co/mlx-community/openai-privacy-filter-4bit) was converted to MLX format from [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) using mlx-embeddings version **0.1.1**.

`openai/privacy-filter` is a bidirectional sparse mixture-of-experts (MoE) token classifier with 1.5B total parameters, of which 50M are active. It tags personally identifiable information (PII) with BIOES spans over 8 categories: person, email, phone, URL, address, date, account number, and secret.

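
To illustrate how BIOES tags map to entity spans, here is a minimal sketch in plain Python. It assumes labels are formatted as `<prefix>-<category>` (`B` begin, `I` inside, `E` end, `S` single-token) with `O` for non-PII tokens. The helper `decode_bioes` and the label names `person` and `email` are hypothetical and used only for illustration; the model's actual label strings may differ.

```python
def decode_bioes(tags):
    """Convert a BIOES tag sequence into (category, start, end) spans; end is exclusive."""
    spans = []
    start, current = None, None  # state of the currently open B/I run
    for i, tag in enumerate(tags):
        prefix, _, category = tag.partition("-")
        if prefix == "S":  # single-token entity
            spans.append((category, i, i + 1))
            start, current = None, None
        elif prefix == "B":  # begin a multi-token entity
            start, current = i, category
        elif prefix == "E" and current == category:  # close the open entity
            spans.append((category, start, i + 1))
            start, current = None, None
        elif prefix == "I" and current == category:  # continue the open entity
            continue
        else:  # "O" or an inconsistent tag: drop any open run
            start, current = None, None
    return spans


# Hypothetical label names; the model's actual label strings may differ.
tags = ["O", "B-person", "E-person", "O", "S-email", "O"]
print(decode_bioes(tags))
# [('person', 1, 3), ('email', 4, 5)]
```

Unlike simply grouping adjacent tokens by category, using the `B`/`E`/`S` boundaries keeps two back-to-back entities of the same category separate.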
## Use with mlx

```bash
pip install mlx-embeddings
```

```python
from itertools import groupby

import mlx.core as mx
from mlx_embeddings.utils import load

model, tokenizer = load("mlx-community/openai-privacy-filter-4bit")
id2label = model.config.id2label

text = "My name is Alice Smith and my email is alice@example.com. Phone: 555-1234."
inputs = tokenizer(text, return_tensors="mlx")

outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
# Pick the highest-scoring label ID for each token of the first (only) sequence.
preds = mx.argmax(outputs.logits, axis=-1)[0].tolist()


def entity(pred_id):
    """Return the category of a label ID without its BIOES prefix, or None for "O"."""
    label = id2label[str(pred_id)]
    return None if label == "O" else label.split("-", 1)[-1]


# Group consecutive tokens that share a category and decode each group to text.
token_ids = inputs["input_ids"][0].tolist()
for ent, group in groupby(zip(token_ids, preds), key=lambda pair: entity(pair[1])):
    if ent:
        span = tokenizer.decode([tid for tid, _ in group]).strip()
        print(f"{ent:18s} -> {span!r}")
```
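
A common follow-up is to mask the detected spans in the original text. The sketch below assumes the `(category, span)` pairs from the loop above have been collected into a list. `redact` is a hypothetical helper, not part of mlx-embeddings, and the category names are illustrative. It uses plain string replacement, so it masks every occurrence of each span text rather than tracking token offsets.

```python
def redact(text, detections):
    """Replace each detected span in `text` with a [CATEGORY] placeholder."""
    # Mask longer spans first so a short span cannot split a longer one.
    for category, span in sorted(detections, key=lambda d: len(d[1]), reverse=True):
        if span:  # skip empty strings, which str.replace would insert everywhere
            text = text.replace(span, f"[{category.upper()}]")
    return text


# Hypothetical detections; real categories and spans come from the model output.
detections = [("person", "Alice Smith"), ("email", "alice@example.com")]
print(redact("My name is Alice Smith and my email is alice@example.com.", detections))
# My name is [PERSON] and my email is [EMAIL].
```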