---
license: apache-2.0
pipeline_tag: token-classification
library_name: mlx
base_model: openai/privacy-filter
tags:
- transformers.js
- mlx
- mlx-embeddings
---

# mlx-community/openai-privacy-filter-6bit

The model [mlx-community/openai-privacy-filter-6bit](https://huggingface.co/mlx-community/openai-privacy-filter-6bit) was converted to MLX format from [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) using mlx-embeddings version **0.1.1**.

`openai/privacy-filter` is a bidirectional sparse mixture-of-experts (MoE) token classifier with 1.5B total parameters and 50M active per token. It tags personally identifiable information (PII) with BIOES spans over 8 categories: person, email, phone, URL, address, date, account number, and secret.
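Under a BIOES scheme, each category expands into Begin/Inside/End/Single tags plus one shared `O` (outside) tag, so 8 categories yield 8 × 4 + 1 = 33 labels. A minimal sketch of such a label set — the category strings below are illustrative, and the authoritative mapping lives in `model.config.id2label`:

```python
# Illustrative BIOES label set over 8 PII categories.
# The exact label strings come from model.config.id2label.
categories = ["person", "email", "phone", "url", "address", "date", "account_number", "secret"]
labels = ["O"] + [f"{prefix}-{cat}" for cat in categories for prefix in ("B", "I", "E", "S")]
print(len(labels))  # 33
```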

## Use with mlx

```bash
pip install mlx-embeddings
```

```python
from itertools import groupby

import mlx.core as mx
from mlx_embeddings.utils import load

model, tokenizer = load("mlx-community/openai-privacy-filter-6bit")
id2label = model.config.id2label

text = "My name is Alice Smith and my email is alice@example.com. Phone: 555-1234."
inputs = tokenizer(text, return_tensors="mlx")

outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
preds = mx.argmax(outputs.logits, axis=-1)[0].tolist()

def entity(pred_id):
    """Map a predicted label id to its entity category, or None for 'O'."""
    label = id2label[str(pred_id)]
    return None if label == "O" else label.split("-", 1)[-1]

# Group consecutive tokens that share an entity category and decode each span.
for ent, group in groupby(zip(inputs["input_ids"][0].tolist(), preds), key=lambda x: entity(x[1])):
    if ent:
        span = tokenizer.decode([tid for tid, _ in group]).strip()
        print(f"{ent:18s} -> {span!r}")
```
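The per-token predictions can also drive redaction, replacing each detected span with a category placeholder. A simplified sketch using plain word tokens and category tags (with the real model you would group the subword ids and decoded spans as above):

```python
from itertools import groupby

def redact(tokens, tags):
    """Replace each contiguous tagged span with a [CATEGORY] placeholder.

    tokens: list of word strings; tags: parallel list of entity categories,
    with None marking non-PII tokens. Hypothetical helper for illustration;
    it is not part of mlx-embeddings or the model.
    """
    out = []
    for ent, group in groupby(zip(tokens, tags), key=lambda x: x[1]):
        if ent:
            out.append(f"[{ent.upper()}]")  # one placeholder per span
        else:
            out.extend(tok for tok, _ in group)
    return " ".join(out)

print(redact(
    ["Call", "Alice", "Smith", "at", "555-1234"],
    [None, "person", "person", None, "phone"],
))
# -> Call [PERSON] at [PHONE]
```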