Commit 73372ca (verified) by prince-canuma · 1 parent: 1f5742e

Add README with PII token-classification usage example

Files changed (1)
  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: token-classification
+ library_name: mlx
+ base_model: openai/privacy-filter
+ tags:
+ - transformers.js
+ - mlx
+ - mlx-embeddings
+ ---
+
+ # mlx-community/openai-privacy-filter-mxfp8
+
+ The model [mlx-community/openai-privacy-filter-mxfp8](https://huggingface.co/mlx-community/openai-privacy-filter-mxfp8) was converted to MLX format from [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) using mlx-embeddings version **0.1.1**.
+
+ `openai/privacy-filter` is a bidirectional sparse mixture-of-experts (MoE) token classifier with 1.5B total parameters, of which 50M are active. It tags personally identifiable information (PII) with BIOES spans over 8 categories (person, email, phone, URL, address, date, account number, secret).
+
+ ## Use with mlx
+
+ ```bash
+ pip install mlx-embeddings
+ ```
+
+ ```python
+ from itertools import groupby
+ import mlx.core as mx
+ from mlx_embeddings.utils import load
+
+ model, tokenizer = load("mlx-community/openai-privacy-filter-mxfp8")
+ id2label = model.config.id2label  # class index (looked up as a string key below) -> BIOES label
+
+ text = "My name is Alice Smith and my email is alice@example.com. Phone: 555-1234."
+ inputs = tokenizer(text, return_tensors="mlx")
+
+ outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
+ preds = mx.argmax(outputs.logits, axis=-1)[0].tolist()  # one predicted class index per token
+
+ entity = lambda p: id2label[str(p)].split("-", 1)[-1] if id2label[str(p)] != "O" else None  # drop the BIOES prefix
+
+ for ent, group in groupby(zip(inputs["input_ids"][0].tolist(), preds), key=lambda x: entity(x[1])):
+     if ent:  # skip runs of non-PII ("O") tokens
+         span = tokenizer.decode([tid for tid, _ in group]).strip()
+         print(f"{ent:18s} -> {span!r}")
+ ```
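
The README example groups consecutive tokens by entity type alone, so two adjacent entities of the same category would be printed as one span. Since the model is described as emitting BIOES tags, a boundary-aware decoder is one possible refinement. The following is a minimal sketch, not part of the commit: `bioes_to_spans` is a hypothetical helper, and the label strings (`"B-person"`, `"S-email"`, and so on) are assumed to follow a `PREFIX-category` pattern, which should be checked against the model's actual `id2label` values.

```python
from typing import List, Tuple


def bioes_to_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Return (category, start, end_exclusive) for each well-formed BIOES span.

    Hypothetical helper: assumes labels look like "B-person", "I-person",
    "E-person", "S-email", or "O". Malformed sequences are dropped.
    """
    spans = []
    start = None  # token index where the currently open span began
    kind = None   # category of the currently open span
    for i, label in enumerate(labels):
        prefix, _, ent = label.partition("-")
        if prefix == "S":
            # Single-token entity: emit immediately.
            spans.append((ent, i, i + 1))
            start, kind = None, None
        elif prefix == "B":
            # Begin a new multi-token entity.
            start, kind = i, ent
        elif prefix == "I" and kind == ent:
            # Inside the open entity: nothing to emit yet.
            continue
        elif prefix == "E" and kind == ent:
            # End of the open entity: emit it.
            spans.append((ent, start, i + 1))
            start, kind = None, None
        else:
            # "O", or an I-/E- tag that does not match an open span.
            start, kind = None, None
    return spans


# Illustrative labels only (not model output).
example = ["O", "B-person", "E-person", "O", "S-email", "S-email"]
print(bioes_to_spans(example))
# [('person', 1, 3), ('email', 4, 5), ('email', 5, 6)]
```

With per-token labels obtained as `[id2label[str(p)] for p in preds]`, each returned `(start, end)` pair could be used to slice the token ids before calling `tokenizer.decode`. Note that the two adjacent `S-email` tags stay separate here, whereas grouping by category would merge them.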