---
license: apache-2.0
base_model: openai/privacy-filter
pipeline_tag: token-classification
library_name: openmed
tags:
  - openmed
  - mlx
  - apple-silicon
  - token-classification
  - pii
  - privacy
  - de-identification
  - redaction
  - quantized
  - int8
  - q8
  - medical
  - clinical
---

# OpenAI Privacy Filter MLX 8-bit

This repository contains an 8-bit OpenMed MLX artifact for [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter), packaged for local PII detection on Apple Silicon with [OpenMed](https://github.com/maziyarpanahi/openmed).

OpenAI Privacy Filter is a bidirectional token-classification model for detecting personally identifiable information in text. This OpenMed MLX build keeps the original BIOES token-label head, uses the `o200k_base` tokenizer assets, and runs with OpenMed's Python and Swift MLX runtimes.

After the model is downloaded once, inference runs locally. No document text is sent to a server.

## Model Details

- Source checkpoint: [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)
- OpenMed MLX family: `openai-privacy-filter`
- Task: token classification for privacy span detection
- Weight format: `weights.safetensors`
- Quantization: 8-bit affine quantization, group size 64
- Runtime: OpenMed + MLX on Apple Silicon
- Tokenizer: `o200k_base` / tiktoken-style BPE
- Labels: `account_number`, `private_address`, `private_date`, `private_email`, `private_person`, `private_phone`, `private_url`, `secret`

This artifact uses expert-aware MLX quantization: embeddings, attention projections, MoE gates, sparse-MoE expert tensors, and the token-classification head are all stored in 8-bit packed form. The resulting `weights.safetensors` file is about 1.39 GiB, compared with about 2.61 GiB for the BF16 OpenMed MLX artifact.
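For intuition only, here is a toy sketch of per-group 8-bit affine quantization with group size 64: each group of 64 weights stores one scale and one offset alongside the packed `uint8` values. This is an illustration of the scheme, not OpenMed's actual quantization kernels.

```python
import numpy as np

def quantize_q8_affine(weights: np.ndarray, group_size: int = 64):
    """Toy per-group 8-bit affine quantization (illustrative only)."""
    w = weights.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    # One scale per group; guard against flat groups to avoid division by zero.
    scale = np.maximum((hi - lo) / 255.0, 1e-12)
    q = np.clip(np.round((w - lo) / scale), 0, 255).astype(np.uint8)
    return q, scale, lo

def dequantize_q8_affine(q, scale, lo, shape):
    """Reconstruct approximate float weights from packed values."""
    return (q.astype(np.float32) * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, scale, lo = quantize_q8_affine(w)
w_hat = dequantize_q8_affine(q, scale, lo, w.shape)
max_err = float(np.abs(w - w_hat).max())  # bounded by scale / 2 per group
```

With round-to-nearest, the per-weight reconstruction error is bounded by half the group's scale, which is why larger groups trade a little accuracy for less metadata overhead.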

## Quick Start: Python

```bash
pip install -U "openmed[mlx]"
```

```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import create_mlx_pipeline

model_path = snapshot_download("OpenMed/privacy-filter-mlx-8bit")
pipe = create_mlx_pipeline(model_path)

text = "My name is Alice Smith and my email is alice.smith@example.com."
entities = pipe(text)

for entity in entities:
    print(entity)
```

Example output:

```python
{
    "entity_group": "private_person",
    "word": "Alice Smith",
    "start": 11,
    "end": 22,
    "score": 0.9999,
}
{
    "entity_group": "private_email",
    "word": "alice.smith@example.com",
    "start": 39,
    "end": 62,
    "score": 0.9998,
}
```

## Quick Start: Swift and Apple Apps

Add OpenMedKit to your Xcode project:

1. Open Xcode and choose File > Add Package Dependencies.
2. Paste `https://github.com/maziyarpanahi/openmed`.
3. Select the `OpenMedKit` package product.
4. Download and cache the MLX model once, then run inference locally.

```swift
import OpenMedKit

let modelURL = try await OpenMedModelStore.downloadMLXModel(
    repoID: "OpenMed/privacy-filter-mlx-8bit"
)

let openmed = try OpenMed(backend: .mlx(modelDirectoryURL: modelURL))
let entities = try openmed.extractPII(
    "My name is Alice Smith and my email is alice.smith@example.com."
)

for entity in entities {
    print(entity.text, entity.label, entity.score)
}
```

For iOS, test on physical Apple Silicon devices; the iOS Simulator is not a recommended acceptance target for MLX inference.

## Validation

The 8-bit artifact was validated against the unquantized OpenMed MLX artifact with fixed text samples. BF16 and Q8 returned identical grouped spans for person, date, phone, email, address, and account-number examples.

OpenMed also includes unit tests for:

- q8 artifact loading
- quantization metadata decoding
- expert tensor packing and `.scales` coverage
- finite logits from the q8 runtime
- bf16/q8 shape and argmax-label coherence
- BIOES/Viterbi span decoding
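For intuition, BIOES decoding merges per-token labels into typed spans. The toy greedy decoder below is for illustration only (OpenMed's runtime uses Viterbi decoding, which additionally rules out invalid label transitions):

```python
def bioes_spans(labels: list[str]) -> list[tuple[str, int, int]]:
    """Greedy BIOES decoding: returns (entity_type, start, end) with
    inclusive token indices. Inconsistent sequences reset the open span."""
    spans, start, kind = [], None, None
    for i, lab in enumerate(labels):
        tag, _, typ = lab.partition("-")
        if tag == "S":                 # single-token entity
            spans.append((typ, i, i))
            start, kind = None, None
        elif tag == "B":               # begin a multi-token entity
            start, kind = i, typ
        elif tag == "E" and start is not None and typ == kind:
            spans.append((kind, start, i))   # close the open span
            start, kind = None, None
        elif tag == "I" and start is not None and typ == kind:
            continue                   # inside the open span
        else:                          # "O" or an inconsistent label
            start, kind = None, None
    return spans

labels = ["O", "B-private_person", "E-private_person", "O", "S-private_email"]
spans = bioes_spans(labels)
# [("private_person", 1, 2), ("private_email", 4, 4)]
```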

## Intended Use

Use this model for local privacy filtering, PII detection, redaction workflows, and evaluation on Apple devices. For high-risk domains such as healthcare, legal, finance, education, and government, evaluate against your own data and policy requirements before production use.

## Credits

- Base checkpoint: [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)
- MLX conversion and runtime support: [OpenMed](https://github.com/maziyarpanahi/openmed)
- OpenMed website: [https://openmed.life](https://openmed.life)