license: apache-2.0
pipeline_tag: token-classification
library_name: mlx
base_model: openai/privacy-filter
tags:
  - transformers.js
  - mlx
  - mlx-embeddings

mlx-community/openai-privacy-filter-nvfp4

The model mlx-community/openai-privacy-filter-nvfp4 was converted to MLX format from openai/privacy-filter using mlx-embeddings version 0.1.1.

openai/privacy-filter is a bidirectional sparse mixture-of-experts (MoE) token classifier with 1.5B total parameters, of which 50M are active per token. It tags personally identifiable information (PII) as BIOES spans (Begin, Inside, Outside, End, Single) over 8 categories: person, email, phone, URL, address, date, account number, and secret.
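Turning per-token BIOES labels into entity spans is the main post-processing step for a classifier like this. The sketch below is a minimal pure-Python decoder. It assumes labels of the form `B-<type>`, `I-<type>`, `E-<type>`, `S-<type>` and `O`. The entity type strings used here (`person`, `email`) are illustrative and may not match the exact names in `model.config.id2label`. Ill-formed fragments, such as an `I-` tag with no preceding `B-`, are dropped, which is one reasonable policy among several.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (entity type, first token index, last token index), inclusive


def bioes_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIOES tag sequence into entity spans.

    Tags are assumed to look like "B-email", "I-email", "E-email", "S-email" or "O".
    Malformed fragments (e.g. "I-" without an opening "B-") are discarded.
    """
    spans: List[Span] = []
    open_type: Optional[str] = None  # entity type of the span currently being built
    open_start = 0
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            # Single-token entity: emit immediately
            spans.append((etype, i, i))
            open_type = None
        elif prefix == "B":
            # Start a new span (an unfinished previous span is abandoned)
            open_type, open_start = etype, i
        elif prefix == "I" and open_type == etype:
            continue  # the open span continues
        elif prefix == "E" and open_type == etype:
            # Close the open span and emit it
            spans.append((etype, open_start, i))
            open_type = None
        else:
            # "O", or an I-/E- tag that does not continue the open span
            open_type = None
    return spans


tags = ["O", "O", "O", "B-person", "E-person", "O", "O", "O", "O", "S-email", "O"]
print(bioes_to_spans(tags))
# -> [('person', 3, 4), ('email', 9, 9)]
```

Because a span closes on every `E-` or `S-` tag, two adjacent entities of the same type (for example two `S-phone` tokens) stay separate, whereas grouping consecutive tokens by entity type alone would merge them.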

Use with mlx

pip install mlx-embeddings

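from itertools import groupby

import mlx.core as mx
from mlx_embeddings.utils import load

# Load the converted weights and the matching tokenizer from the Hub
model, tokenizer = load("mlx-community/openai-privacy-filter-nvfp4")
id2label = model.config.id2label

text = "My name is Alice Smith and my email is alice@example.com. Phone: 555-1234."
inputs = tokenizer(text, return_tensors="mlx")

# One logit vector per token; keep the highest-scoring label id for each token
outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
preds = mx.argmax(outputs.logits, axis=-1)[0].tolist()


def entity(pred_id):
    """Map a label id to its entity type without the BIOES prefix; "O" maps to None."""
    label = id2label[str(pred_id)]
    return None if label == "O" else label.split("-", 1)[-1]


# Group consecutive tokens that share an entity type and decode each group to text.
# Note: adjacent entities of the same type are merged into one printed span.
token_ids = inputs["input_ids"][0].tolist()
for ent, group in groupby(zip(token_ids, preds), key=lambda x: entity(x[1])):
    if ent:
        span = tokenizer.decode([tid for tid, _ in group]).strip()
        print(f"{ent:18s} -> {span!r}")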
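For redaction, character offsets into the original string are usually more reliable than `tokenizer.decode`, which need not reproduce the input exactly. If the tokenizer is a fast (Rust-backed) Hugging Face tokenizer, passing `return_offsets_mapping=True` returns one `(start, end)` character pair per token. The sketch below assumes you already have token-level spans `(type, first_token, last_token)` and those offsets. `token_spans_to_char_spans` and `redact` are hypothetical helpers, and the offsets in the example are written by hand at word level rather than produced by the model's tokenizer.

```python
from typing import List, Sequence, Tuple

TokenSpan = Tuple[str, int, int]  # (entity type, first token, last token), inclusive
CharSpan = Tuple[str, int, int]   # (entity type, start char, end char), end exclusive


def token_spans_to_char_spans(
    spans: Sequence[TokenSpan], offsets: Sequence[Tuple[int, int]]
) -> List[CharSpan]:
    """Map inclusive token spans to character spans using per-token (start, end) offsets."""
    return [(etype, offsets[first][0], offsets[last][1]) for etype, first, last in spans]


def redact(text: str, spans: Sequence[CharSpan]) -> str:
    """Replace each character span with a placeholder such as "[EMAIL]".

    Spans are applied from right to left so that earlier offsets remain valid
    after each replacement. Overlapping spans are assumed not to occur.
    """
    out = text
    for etype, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        out = out[:start] + f"[{etype.upper()}]" + out[end:]
    return out


text = "My name is Alice Smith and my email is alice@example.com."
# Hand-written, word-level offsets for illustration only; a real subword
# tokenizer would split this text into different tokens.
offsets = [(0, 2), (3, 7), (8, 10), (11, 16), (17, 22), (23, 26),
           (27, 29), (30, 35), (36, 38), (39, 56), (56, 57)]
char_spans = token_spans_to_char_spans([("person", 3, 4), ("email", 9, 9)], offsets)
print(redact(text, char_spans))
# -> My name is [PERSON] and my email is [EMAIL].
```

Working right to left avoids recomputing offsets after each substitution, since a placeholder is usually a different length from the text it replaces.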