MaziyarPanahi committed
Commit 00b3f2d · verified · Parent: 7dbf1f6

Upload MLX packaging for privacy-filter-nemotron-mlx

Files changed (8):
  1. .gitattributes +1 -0
  2. README.md +136 -0
  3. config.json +809 -0
  4. id2label.json +223 -0
  5. openmed-mlx.json +31 -0
  6. tokenizer.json +3 -0
  7. tokenizer_config.json +12 -0
  8. weights.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,136 @@
---
license: apache-2.0
base_model: OpenMed/privacy-filter-nemotron
pipeline_tag: token-classification
library_name: openmed
tags:
- openmed
- mlx
- apple-silicon
- token-classification
- pii
- de-identification
- medical
- clinical
- privacy-filter
- nemotron
---

# OpenMed Privacy Filter (Nemotron) — MLX BF16

A native [MLX](https://github.com/ml-explore/mlx) port of [`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron) for fast, on-device PII detection on Apple Silicon. This BF16 artifact preserves the full source precision; for a smaller, faster sibling, see [`OpenMed/privacy-filter-nemotron-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit).

> **Private / early access.** Both this repo and the source checkpoint are private. You need explicit access to download.

## What it does

The model is a token classifier built on OpenAI's open Privacy Filter architecture (the same `openai_privacy_filter` model type used by `openai/privacy-filter`). It tags each token in a string with a BIOES label across **55 PII span classes**, and a Viterbi pass over the BIOES grammar then yields clean entity spans. Detected categories include:

- Personal identifiers — `first_name`, `last_name`, `user_name`, `gender`, `age`, `date_of_birth`
- Contact — `email`, `phone_number`, `fax_number`, `street_address`, `city`, `state`, `country`, `county`, `postcode`, `coordinate`
- Government / legal IDs — `ssn`, `national_id`, `tax_id`, `certificate_license_number`
- Financial — `account_number`, `bank_routing_number`, `credit_debit_card`, `cvv`, `pin`, `swift_bic`
- Medical — `medical_record_number`, `health_plan_beneficiary_number`, `blood_type`
- Workplace — `company_name`, `occupation`, `employee_id`, `customer_id`, `employment_status`, `education_level`
- Online — `url`, `ipv4`, `ipv6`, `mac_address`, `http_cookie`, `api_key`, `password`, `device_identifier`
- Demographic — `race_ethnicity`, `religious_belief`, `political_view`, `sexuality`, `language`
- Vehicles — `license_plate`, `vehicle_identifier`
- Time — `date`, `date_time`, `time`
- Misc — `biometric_identifier`, `unique_id`

<details>
<summary>Full label schema (221 labels)</summary>

The output space is `O` plus `B-`, `I-`, `E-`, `S-` variants for each of the 55 span classes (4 × 55 + 1 = 221). The runtime `PrivacyFilterMLXPipeline` runs Viterbi over this BIOES grammar, so the consumer sees clean grouped entities rather than raw token tags.

The full `id2label.json` is shipped alongside the weights in this repo.
</details>
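The label-space arithmetic is easy to check offline; a minimal sketch (class names copied from `opf_metadata.span_class_names` in this repo's `config.json`; `build_id2label` is an illustrative helper, not part of `openmed`):

```python
# Reconstruct the 221-label BIOES output space from the 55 span classes.
SPAN_CLASSES = [
    "account_number", "age", "api_key", "bank_routing_number",
    "biometric_identifier", "blood_type", "certificate_license_number",
    "city", "company_name", "coordinate", "country", "county",
    "credit_debit_card", "customer_id", "cvv", "date", "date_of_birth",
    "date_time", "device_identifier", "education_level", "email",
    "employee_id", "employment_status", "fax_number", "first_name",
    "gender", "health_plan_beneficiary_number", "http_cookie", "ipv4",
    "ipv6", "language", "last_name", "license_plate", "mac_address",
    "medical_record_number", "national_id", "occupation", "password",
    "phone_number", "pin", "political_view", "postcode", "race_ethnicity",
    "religious_belief", "sexuality", "ssn", "state", "street_address",
    "swift_bic", "tax_id", "time", "unique_id", "url", "user_name",
    "vehicle_identifier",
]

def build_id2label(classes):
    # "O" is id 0; each class then contributes B-, I-, E-, S- in order.
    labels = ["O"]
    for name in classes:
        labels += [f"{prefix}-{name}" for prefix in "BIES"]
    return dict(enumerate(labels))

id2label = build_id2label(SPAN_CLASSES)
assert len(id2label) == 4 * 55 + 1 == 221
assert id2label[1] == "B-account_number"
assert id2label[220] == "S-vehicle_identifier"
```

The resulting mapping matches the `id2label` shipped in `config.json` and `id2label.json`.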

## Architecture

| Field | Value |
| --- | --- |
| Source model type | `openai_privacy_filter` |
| Source architecture | `OpenAIPrivacyFilterForTokenClassification` |
| Hidden size | 640 |
| Transformer layers | 8 |
| Attention | Grouped-Query (14 query heads / 2 KV heads, head_dim=64) with attention sinks |
| FFN | Sparse Mixture-of-Experts — 128 experts, top-4 routing, SwiGLU |
| Position encoding | YaRN-scaled RoPE (`rope_theta=150_000`, factor=32) |
| Context length | 131,072 tokens (initial 4,096) |
| Tokenizer | `o200k_base` (tiktoken) — vocab 200,064 |
| Output head | Linear(640 → 221) with bias |
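The top-4 MoE routing in the table can be sketched in a few lines of numpy (random logits stand in for a real router; softmax over the selected top-k is one common convention, and the checkpoint's exact normalization may differ):

```python
import numpy as np

def top_k_route(router_logits, k=4):
    """Pick the top-k experts per token and softmax-normalize their weights,
    mirroring the 128-expert / top-4 MoE layer described above."""
    topk = np.argsort(router_logits, axis=-1)[..., -k:]          # (tokens, k)
    gathered = np.take_along_axis(router_logits, topk, axis=-1)  # (tokens, k)
    w = np.exp(gathered - gathered.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return topk, w

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 128))      # 3 tokens, 128 experts
experts, weights = top_k_route(logits)
assert experts.shape == (3, 4)
assert np.allclose(weights.sum(axis=-1), 1.0)
```

Each token's hidden state is then sent only to its four selected experts, whose outputs are combined with these weights.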

## File set

| File | Size | Purpose |
| --- | --- | --- |
| `weights.safetensors` | 2.6 GB | BF16 model weights in OpenMed-MLX layout |
| `config.json` | 19 KB | Model + MLX runtime config |
| `id2label.json` | 5.4 KB | Numeric ID → BIOES label string |
| `openmed-mlx.json` | 0.7 KB | OpenMed MLX manifest (task, family, runtime hints) |
| `tokenizer.json`, `tokenizer_config.json` | 27 MB | Source tokenizer files (kept for reference) |

The MLX runtime tokenizes with `tiktoken`'s `o200k_base` encoding directly; `tokenizer.json` is kept so consumers can inspect or re-tokenize via `transformers` if desired.

## Usage

```bash
pip install "openmed[mlx]"
```

```python
from huggingface_hub import snapshot_download
from openmed.mlx.inference import PrivacyFilterMLXPipeline

model_path = snapshot_download("OpenMed/privacy-filter-nemotron-mlx")
pipe = PrivacyFilterMLXPipeline(model_path)

print(pipe("Email me at alice.smith@example.com after 5pm."))
# [{'entity_group': 'email',
#   'score': 0.92,
#   'word': 'alice.smith@example.com',
#   'start': 12,
#   'end': 35}]
```

The pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` (character offsets into the input string).
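Because `start`/`end` are character offsets, redaction is a pure string operation; a minimal sketch with the entity list hardcoded in the documented output shape (`redact` is an illustrative helper, not an `openmed` API):

```python
def redact(text, entities, mask="[{label}]"):
    """Replace each detected span with a placeholder, working right-to-left
    so earlier character offsets stay valid after each substitution."""
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = mask.format(label=ent["entity_group"])
        text = text[: ent["start"]] + placeholder + text[ent["end"] :]
    return text

text = "Email me at alice.smith@example.com after 5pm."
entities = [{"entity_group": "email", "score": 0.92,
             "word": "alice.smith@example.com", "start": 12, "end": 35}]
print(redact(text, entities))
# Email me at [email] after 5pm.
```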

### Loading from a local snapshot

```python
from openmed.mlx.models import load_model
import mlx.core as mx

model = load_model("/path/to/privacy-filter-nemotron-mlx")
ids = mx.array([[1, 100, 200, 300]], dtype=mx.int32)
mask = mx.ones((1, 4), dtype=mx.bool_)
logits = model(ids, attention_mask=mask)  # shape (1, 4, 221)
```

## Hardware notes

- Designed for Apple Silicon (M-series GPUs); CPU inference works but is slower.
- Tested on macOS with `mlx>=0.18`, `mlx_lm>=0.31`. The MLX runtime in this repo is independent of `mlx_lm` (token classification, not causal LM).
- A forward pass on a typical PII sentence (~10 tokens) takes ~14 ms on an M-series GPU after warmup. For lower latency or a smaller memory footprint, use the [`-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit) sibling instead.

## Conversion

This repo was produced by [`openmed.mlx.convert`](https://github.com/maziyarpanahi/openmed) from the source HuggingFace checkpoint. The converter:

1. Downloads `model.safetensors` from the source repo.
2. Remaps the standard transformers state dict (`model.embed_tokens`, `self_attn.q_proj/k_proj/v_proj`, `mlp.experts.gate_up_proj`, `score`, …) into the OpenMed-MLX layout (`embedding.weight`, fused `block.X.attn.qkv.weight`, `block.X.mlp.swiglu.weight`, `unembedding.weight`, …).
3. Writes an `openmed-mlx.json` manifest that pins the runtime contract: `task=token-classification`, `family=openai-privacy-filter`, `runtime.decode=bioes-viterbi`, `runtime.tokenizer=tiktoken`.
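The q/k/v fusion in step 2 can be sketched with numpy (source key names follow the mapping above; shapes use the architecture table's 14 query / 2 KV heads at head_dim=64, and `fuse_qkv` is an illustrative helper, not the converter's actual code):

```python
import numpy as np

def fuse_qkv(state, layer, hidden=640, head_dim=64, n_q=14, n_kv=2):
    """Concatenate separate q/k/v projection weights into one fused qkv
    matrix, as in the OpenMed-MLX `block.X.attn.qkv.weight` layout."""
    prefix = f"model.layers.{layer}.self_attn"
    q = state[f"{prefix}.q_proj.weight"]  # (n_q * head_dim, hidden)
    k = state[f"{prefix}.k_proj.weight"]  # (n_kv * head_dim, hidden)
    v = state[f"{prefix}.v_proj.weight"]  # (n_kv * head_dim, hidden)
    return np.concatenate([q, k, v], axis=0)

state = {
    "model.layers.0.self_attn.q_proj.weight": np.zeros((14 * 64, 640)),
    "model.layers.0.self_attn.k_proj.weight": np.zeros((2 * 64, 640)),
    "model.layers.0.self_attn.v_proj.weight": np.zeros((2 * 64, 640)),
}
qkv = fuse_qkv(state, layer=0)
assert qkv.shape == ((14 + 2 + 2) * 64, 640)  # (1152, 640)
```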

For the quantized variant, the same path runs an additional `mlx.nn.quantize(..., bits=8, group_size=64, mode="affine")` pass over `embedding`, attention `qkv`/`out`, MoE `gate`, expert `swiglu`/`out`, and `unembedding`.
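Group-wise 8-bit affine quantization, the scheme the 8-bit sibling uses, can be illustrated in plain numpy (this mirrors the idea only; MLX's actual kernels and packed storage format differ):

```python
import numpy as np

def affine_quantize(w, bits=8, group_size=64):
    """Per group of `group_size` weights along the flattened last axis,
    store uint8 codes plus a float scale and offset (zero-point)."""
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero for flat groups
    q = np.round((groups - lo) / scale).astype(np.uint8)
    return q, scale, lo

def affine_dequantize(q, scale, lo, shape):
    return (q.astype(np.float32) * scale + lo).reshape(shape)

w = np.random.default_rng(0).normal(size=(640, 640)).astype(np.float32)
q, scale, lo = affine_quantize(w)
w_hat = affine_dequantize(q, scale, lo, w.shape)
assert np.abs(w - w_hat).max() < 0.05  # roundtrip error bounded by ~scale/2
```

The memory win comes from storing one byte per weight plus two small floats per 64-weight group, instead of two bytes per BF16 weight.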

## Credits

- Source checkpoint: [`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron)
- Architecture: OpenAI Privacy Filter (open architecture, custom modeling code) adapted by OpenMed.
- MLX conversion + runtime: OpenMed
- Apple Silicon SDK: [Apple MLX](https://github.com/ml-explore/mlx)

## License

Apache 2.0 (matches the source checkpoint).
config.json ADDED
@@ -0,0 +1,809 @@
{
  "architectures": [
    "OpenAIPrivacyFilterForTokenClassification"
  ],
  "attention_bias": true,
  "attention_dropout": 0.0,
  "bos_token_id": null,
  "classifier_dropout": 0.0,
  "default_n_ctx": 128000,
  "dtype": "bfloat16",
  "eos_token_id": 199999,
  "head_dim": 64,
  "hidden_act": "silu",
  "hidden_size": 640,
  "id2label": {
    "0": "O",
    "1": "B-account_number",
    "2": "I-account_number",
    "3": "E-account_number",
    "4": "S-account_number",
    "5": "B-age",
    "6": "I-age",
    "7": "E-age",
    "8": "S-age",
    "9": "B-api_key",
    "10": "I-api_key",
    "11": "E-api_key",
    "12": "S-api_key",
    "13": "B-bank_routing_number",
    "14": "I-bank_routing_number",
    "15": "E-bank_routing_number",
    "16": "S-bank_routing_number",
    "17": "B-biometric_identifier",
    "18": "I-biometric_identifier",
    "19": "E-biometric_identifier",
    "20": "S-biometric_identifier",
    "21": "B-blood_type",
    "22": "I-blood_type",
    "23": "E-blood_type",
    "24": "S-blood_type",
    "25": "B-certificate_license_number",
    "26": "I-certificate_license_number",
    "27": "E-certificate_license_number",
    "28": "S-certificate_license_number",
    "29": "B-city",
    "30": "I-city",
    "31": "E-city",
    "32": "S-city",
    "33": "B-company_name",
    "34": "I-company_name",
    "35": "E-company_name",
    "36": "S-company_name",
    "37": "B-coordinate",
    "38": "I-coordinate",
    "39": "E-coordinate",
    "40": "S-coordinate",
    "41": "B-country",
    "42": "I-country",
    "43": "E-country",
    "44": "S-country",
    "45": "B-county",
    "46": "I-county",
    "47": "E-county",
    "48": "S-county",
    "49": "B-credit_debit_card",
    "50": "I-credit_debit_card",
    "51": "E-credit_debit_card",
    "52": "S-credit_debit_card",
    "53": "B-customer_id",
    "54": "I-customer_id",
    "55": "E-customer_id",
    "56": "S-customer_id",
    "57": "B-cvv",
    "58": "I-cvv",
    "59": "E-cvv",
    "60": "S-cvv",
    "61": "B-date",
    "62": "I-date",
    "63": "E-date",
    "64": "S-date",
    "65": "B-date_of_birth",
    "66": "I-date_of_birth",
    "67": "E-date_of_birth",
    "68": "S-date_of_birth",
    "69": "B-date_time",
    "70": "I-date_time",
    "71": "E-date_time",
    "72": "S-date_time",
    "73": "B-device_identifier",
    "74": "I-device_identifier",
    "75": "E-device_identifier",
    "76": "S-device_identifier",
    "77": "B-education_level",
    "78": "I-education_level",
    "79": "E-education_level",
    "80": "S-education_level",
    "81": "B-email",
    "82": "I-email",
    "83": "E-email",
    "84": "S-email",
    "85": "B-employee_id",
    "86": "I-employee_id",
    "87": "E-employee_id",
    "88": "S-employee_id",
    "89": "B-employment_status",
    "90": "I-employment_status",
    "91": "E-employment_status",
    "92": "S-employment_status",
    "93": "B-fax_number",
    "94": "I-fax_number",
    "95": "E-fax_number",
    "96": "S-fax_number",
    "97": "B-first_name",
    "98": "I-first_name",
    "99": "E-first_name",
    "100": "S-first_name",
    "101": "B-gender",
    "102": "I-gender",
    "103": "E-gender",
    "104": "S-gender",
    "105": "B-health_plan_beneficiary_number",
    "106": "I-health_plan_beneficiary_number",
    "107": "E-health_plan_beneficiary_number",
    "108": "S-health_plan_beneficiary_number",
    "109": "B-http_cookie",
    "110": "I-http_cookie",
    "111": "E-http_cookie",
    "112": "S-http_cookie",
    "113": "B-ipv4",
    "114": "I-ipv4",
    "115": "E-ipv4",
    "116": "S-ipv4",
    "117": "B-ipv6",
    "118": "I-ipv6",
    "119": "E-ipv6",
    "120": "S-ipv6",
    "121": "B-language",
    "122": "I-language",
    "123": "E-language",
    "124": "S-language",
    "125": "B-last_name",
    "126": "I-last_name",
    "127": "E-last_name",
    "128": "S-last_name",
    "129": "B-license_plate",
    "130": "I-license_plate",
    "131": "E-license_plate",
    "132": "S-license_plate",
    "133": "B-mac_address",
    "134": "I-mac_address",
    "135": "E-mac_address",
    "136": "S-mac_address",
    "137": "B-medical_record_number",
    "138": "I-medical_record_number",
    "139": "E-medical_record_number",
    "140": "S-medical_record_number",
    "141": "B-national_id",
    "142": "I-national_id",
    "143": "E-national_id",
    "144": "S-national_id",
    "145": "B-occupation",
    "146": "I-occupation",
    "147": "E-occupation",
    "148": "S-occupation",
    "149": "B-password",
    "150": "I-password",
    "151": "E-password",
    "152": "S-password",
    "153": "B-phone_number",
    "154": "I-phone_number",
    "155": "E-phone_number",
    "156": "S-phone_number",
    "157": "B-pin",
    "158": "I-pin",
    "159": "E-pin",
    "160": "S-pin",
    "161": "B-political_view",
    "162": "I-political_view",
    "163": "E-political_view",
    "164": "S-political_view",
    "165": "B-postcode",
    "166": "I-postcode",
    "167": "E-postcode",
    "168": "S-postcode",
    "169": "B-race_ethnicity",
    "170": "I-race_ethnicity",
    "171": "E-race_ethnicity",
    "172": "S-race_ethnicity",
    "173": "B-religious_belief",
    "174": "I-religious_belief",
    "175": "E-religious_belief",
    "176": "S-religious_belief",
    "177": "B-sexuality",
    "178": "I-sexuality",
    "179": "E-sexuality",
    "180": "S-sexuality",
    "181": "B-ssn",
    "182": "I-ssn",
    "183": "E-ssn",
    "184": "S-ssn",
    "185": "B-state",
    "186": "I-state",
    "187": "E-state",
    "188": "S-state",
    "189": "B-street_address",
    "190": "I-street_address",
    "191": "E-street_address",
    "192": "S-street_address",
    "193": "B-swift_bic",
    "194": "I-swift_bic",
    "195": "E-swift_bic",
    "196": "S-swift_bic",
    "197": "B-tax_id",
    "198": "I-tax_id",
    "199": "E-tax_id",
    "200": "S-tax_id",
    "201": "B-time",
    "202": "I-time",
    "203": "E-time",
    "204": "S-time",
    "205": "B-unique_id",
    "206": "I-unique_id",
    "207": "E-unique_id",
    "208": "S-unique_id",
    "209": "B-url",
    "210": "I-url",
    "211": "E-url",
    "212": "S-url",
    "213": "B-user_name",
    "214": "I-user_name",
    "215": "E-user_name",
    "216": "S-user_name",
    "217": "B-vehicle_identifier",
    "218": "I-vehicle_identifier",
    "219": "E-vehicle_identifier",
    "220": "S-vehicle_identifier"
  },
  "initial_context_length": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 640,
  "label2id": {
    "O": 0,
    "B-account_number": 1,
    "I-account_number": 2,
    "E-account_number": 3,
    "S-account_number": 4,
    "B-age": 5,
    "I-age": 6,
    "E-age": 7,
    "S-age": 8,
    "B-api_key": 9,
    "I-api_key": 10,
    "E-api_key": 11,
    "S-api_key": 12,
    "B-bank_routing_number": 13,
    "I-bank_routing_number": 14,
    "E-bank_routing_number": 15,
    "S-bank_routing_number": 16,
    "B-biometric_identifier": 17,
    "I-biometric_identifier": 18,
    "E-biometric_identifier": 19,
    "S-biometric_identifier": 20,
    "B-blood_type": 21,
    "I-blood_type": 22,
    "E-blood_type": 23,
    "S-blood_type": 24,
    "B-certificate_license_number": 25,
    "I-certificate_license_number": 26,
    "E-certificate_license_number": 27,
    "S-certificate_license_number": 28,
    "B-city": 29,
    "I-city": 30,
    "E-city": 31,
    "S-city": 32,
    "B-company_name": 33,
    "I-company_name": 34,
    "E-company_name": 35,
    "S-company_name": 36,
    "B-coordinate": 37,
    "I-coordinate": 38,
    "E-coordinate": 39,
    "S-coordinate": 40,
    "B-country": 41,
    "I-country": 42,
    "E-country": 43,
    "S-country": 44,
    "B-county": 45,
    "I-county": 46,
    "E-county": 47,
    "S-county": 48,
    "B-credit_debit_card": 49,
    "I-credit_debit_card": 50,
    "E-credit_debit_card": 51,
    "S-credit_debit_card": 52,
    "B-customer_id": 53,
    "I-customer_id": 54,
    "E-customer_id": 55,
    "S-customer_id": 56,
    "B-cvv": 57,
    "I-cvv": 58,
    "E-cvv": 59,
    "S-cvv": 60,
    "B-date": 61,
    "I-date": 62,
    "E-date": 63,
    "S-date": 64,
    "B-date_of_birth": 65,
    "I-date_of_birth": 66,
    "E-date_of_birth": 67,
    "S-date_of_birth": 68,
    "B-date_time": 69,
    "I-date_time": 70,
    "E-date_time": 71,
    "S-date_time": 72,
    "B-device_identifier": 73,
    "I-device_identifier": 74,
    "E-device_identifier": 75,
    "S-device_identifier": 76,
    "B-education_level": 77,
    "I-education_level": 78,
    "E-education_level": 79,
    "S-education_level": 80,
    "B-email": 81,
    "I-email": 82,
    "E-email": 83,
    "S-email": 84,
    "B-employee_id": 85,
    "I-employee_id": 86,
    "E-employee_id": 87,
    "S-employee_id": 88,
    "B-employment_status": 89,
    "I-employment_status": 90,
    "E-employment_status": 91,
    "S-employment_status": 92,
    "B-fax_number": 93,
    "I-fax_number": 94,
    "E-fax_number": 95,
    "S-fax_number": 96,
    "B-first_name": 97,
    "I-first_name": 98,
    "E-first_name": 99,
    "S-first_name": 100,
    "B-gender": 101,
    "I-gender": 102,
    "E-gender": 103,
    "S-gender": 104,
    "B-health_plan_beneficiary_number": 105,
    "I-health_plan_beneficiary_number": 106,
    "E-health_plan_beneficiary_number": 107,
    "S-health_plan_beneficiary_number": 108,
    "B-http_cookie": 109,
    "I-http_cookie": 110,
    "E-http_cookie": 111,
    "S-http_cookie": 112,
    "B-ipv4": 113,
    "I-ipv4": 114,
    "E-ipv4": 115,
    "S-ipv4": 116,
    "B-ipv6": 117,
    "I-ipv6": 118,
    "E-ipv6": 119,
    "S-ipv6": 120,
    "B-language": 121,
    "I-language": 122,
    "E-language": 123,
    "S-language": 124,
    "B-last_name": 125,
    "I-last_name": 126,
    "E-last_name": 127,
    "S-last_name": 128,
    "B-license_plate": 129,
    "I-license_plate": 130,
    "E-license_plate": 131,
    "S-license_plate": 132,
    "B-mac_address": 133,
    "I-mac_address": 134,
    "E-mac_address": 135,
    "S-mac_address": 136,
    "B-medical_record_number": 137,
    "I-medical_record_number": 138,
    "E-medical_record_number": 139,
    "S-medical_record_number": 140,
    "B-national_id": 141,
    "I-national_id": 142,
    "E-national_id": 143,
    "S-national_id": 144,
    "B-occupation": 145,
    "I-occupation": 146,
    "E-occupation": 147,
    "S-occupation": 148,
    "B-password": 149,
    "I-password": 150,
    "E-password": 151,
    "S-password": 152,
    "B-phone_number": 153,
    "I-phone_number": 154,
    "E-phone_number": 155,
    "S-phone_number": 156,
    "B-pin": 157,
    "I-pin": 158,
    "E-pin": 159,
    "S-pin": 160,
    "B-political_view": 161,
    "I-political_view": 162,
    "E-political_view": 163,
    "S-political_view": 164,
    "B-postcode": 165,
    "I-postcode": 166,
    "E-postcode": 167,
    "S-postcode": 168,
    "B-race_ethnicity": 169,
    "I-race_ethnicity": 170,
    "E-race_ethnicity": 171,
    "S-race_ethnicity": 172,
    "B-religious_belief": 173,
    "I-religious_belief": 174,
    "E-religious_belief": 175,
    "S-religious_belief": 176,
    "B-sexuality": 177,
    "I-sexuality": 178,
    "E-sexuality": 179,
    "S-sexuality": 180,
    "B-ssn": 181,
    "I-ssn": 182,
    "E-ssn": 183,
    "S-ssn": 184,
    "B-state": 185,
    "I-state": 186,
    "E-state": 187,
    "S-state": 188,
    "B-street_address": 189,
    "I-street_address": 190,
    "E-street_address": 191,
    "S-street_address": 192,
    "B-swift_bic": 193,
    "I-swift_bic": 194,
    "E-swift_bic": 195,
    "S-swift_bic": 196,
    "B-tax_id": 197,
    "I-tax_id": 198,
    "E-tax_id": 199,
    "S-tax_id": 200,
    "B-time": 201,
    "I-time": 202,
    "E-time": 203,
    "S-time": 204,
    "B-unique_id": 205,
    "I-unique_id": 206,
    "E-unique_id": 207,
    "S-unique_id": 208,
    "B-url": 209,
    "I-url": 210,
    "E-url": 211,
    "S-url": 212,
    "B-user_name": 213,
    "I-user_name": 214,
    "E-user_name": 215,
    "S-user_name": 216,
    "B-vehicle_identifier": 217,
    "I-vehicle_identifier": 218,
    "E-vehicle_identifier": 219,
    "S-vehicle_identifier": 220
  },
  "max_position_embeddings": 131072,
  "model_type": "openai_privacy_filter",
  "num_attention_heads": 14,
  "num_experts_per_tok": 4,
  "num_hidden_layers": 8,
  "num_key_value_heads": 2,
  "num_local_experts": 128,
  "output_router_logits": false,
  "pad_token_id": 199999,
  "rms_norm_eps": 1e-05,
  "rope_parameters": {
    "beta_fast": 32.0,
    "beta_slow": 1.0,
    "factor": 32.0,
    "original_max_position_embeddings": 4096,
    "rope_theta": 150000.0,
    "rope_type": "yarn",
    "truncate": false
  },
  "router_aux_loss_coef": 0.001,
  "sliding_window": 128,
  "tie_word_embeddings": false,
  "transformers_version": "5.6.0.dev0",
  "use_cache": true,
  "vocab_size": 200064,
  "transformers.js_config": {
    "use_external_data_format": {
      "model.onnx": 3,
      "model_fp16.onnx": 2,
      "model": 1
    }
  },
  "num_labels": 221,
  "opf_metadata": {
    "category_version": "nemotron_fine_v1",
    "encoding": "o200k_base",
    "span_class_names": [
      "O",
      "account_number",
      "age",
      "api_key",
      "bank_routing_number",
      "biometric_identifier",
      "blood_type",
      "certificate_license_number",
      "city",
      "company_name",
      "coordinate",
      "country",
      "county",
      "credit_debit_card",
      "customer_id",
      "cvv",
      "date",
      "date_of_birth",
      "date_time",
      "device_identifier",
      "education_level",
      "email",
      "employee_id",
      "employment_status",
      "fax_number",
      "first_name",
      "gender",
      "health_plan_beneficiary_number",
      "http_cookie",
      "ipv4",
      "ipv6",
      "language",
      "last_name",
      "license_plate",
      "mac_address",
      "medical_record_number",
      "national_id",
      "occupation",
      "password",
      "phone_number",
      "pin",
      "political_view",
      "postcode",
      "race_ethnicity",
      "religious_belief",
      "sexuality",
      "ssn",
      "state",
      "street_address",
      "swift_bic",
      "tax_id",
      "time",
      "unique_id",
      "url",
      "user_name",
      "vehicle_identifier"
    ],
    "ner_class_names": [
      "O",
      "B-account_number",
      "I-account_number",
      "E-account_number",
      "S-account_number",
      "B-age",
      "I-age",
      "E-age",
      "S-age",
      "B-api_key",
      "I-api_key",
      "E-api_key",
      "S-api_key",
      "B-bank_routing_number",
      "I-bank_routing_number",
      "E-bank_routing_number",
      "S-bank_routing_number",
      "B-biometric_identifier",
      "I-biometric_identifier",
      "E-biometric_identifier",
      "S-biometric_identifier",
      "B-blood_type",
      "I-blood_type",
      "E-blood_type",
      "S-blood_type",
      "B-certificate_license_number",
      "I-certificate_license_number",
      "E-certificate_license_number",
      "S-certificate_license_number",
      "B-city",
      "I-city",
      "E-city",
      "S-city",
      "B-company_name",
      "I-company_name",
      "E-company_name",
      "S-company_name",
      "B-coordinate",
      "I-coordinate",
      "E-coordinate",
      "S-coordinate",
      "B-country",
      "I-country",
      "E-country",
      "S-country",
      "B-county",
      "I-county",
      "E-county",
      "S-county",
      "B-credit_debit_card",
      "I-credit_debit_card",
      "E-credit_debit_card",
      "S-credit_debit_card",
      "B-customer_id",
      "I-customer_id",
      "E-customer_id",
      "S-customer_id",
      "B-cvv",
      "I-cvv",
      "E-cvv",
      "S-cvv",
      "B-date",
      "I-date",
      "E-date",
      "S-date",
      "B-date_of_birth",
      "I-date_of_birth",
      "E-date_of_birth",
      "S-date_of_birth",
      "B-date_time",
      "I-date_time",
      "E-date_time",
      "S-date_time",
      "B-device_identifier",
      "I-device_identifier",
      "E-device_identifier",
      "S-device_identifier",
      "B-education_level",
      "I-education_level",
      "E-education_level",
      "S-education_level",
      "B-email",
      "I-email",
      "E-email",
      "S-email",
      "B-employee_id",
      "I-employee_id",
      "E-employee_id",
      "S-employee_id",
      "B-employment_status",
      "I-employment_status",
      "E-employment_status",
      "S-employment_status",
      "B-fax_number",
      "I-fax_number",
      "E-fax_number",
      "S-fax_number",
      "B-first_name",
      "I-first_name",
      "E-first_name",
      "S-first_name",
      "B-gender",
      "I-gender",
      "E-gender",
      "S-gender",
      "B-health_plan_beneficiary_number",
      "I-health_plan_beneficiary_number",
      "E-health_plan_beneficiary_number",
      "S-health_plan_beneficiary_number",
      "B-http_cookie",
      "I-http_cookie",
      "E-http_cookie",
      "S-http_cookie",
      "B-ipv4",
      "I-ipv4",
      "E-ipv4",
      "S-ipv4",
      "B-ipv6",
      "I-ipv6",
      "E-ipv6",
      "S-ipv6",
      "B-language",
      "I-language",
      "E-language",
      "S-language",
      "B-last_name",
      "I-last_name",
      "E-last_name",
      "S-last_name",
      "B-license_plate",
      "I-license_plate",
      "E-license_plate",
      "S-license_plate",
      "B-mac_address",
      "I-mac_address",
      "E-mac_address",
      "S-mac_address",
      "B-medical_record_number",
      "I-medical_record_number",
      "E-medical_record_number",
      "S-medical_record_number",
      "B-national_id",
      "I-national_id",
      "E-national_id",
      "S-national_id",
      "B-occupation",
      "I-occupation",
      "E-occupation",
      "S-occupation",
      "B-password",
      "I-password",
      "E-password",
      "S-password",
      "B-phone_number",
      "I-phone_number",
      "E-phone_number",
      "S-phone_number",
      "B-pin",
      "I-pin",
      "E-pin",
      "S-pin",
      "B-political_view",
      "I-political_view",
      "E-political_view",
      "S-political_view",
      "B-postcode",
      "I-postcode",
      "E-postcode",
      "S-postcode",
      "B-race_ethnicity",
      "I-race_ethnicity",
      "E-race_ethnicity",
      "S-race_ethnicity",
      "B-religious_belief",
      "I-religious_belief",
      "E-religious_belief",
      "S-religious_belief",
      "B-sexuality",
      "I-sexuality",
      "E-sexuality",
      "S-sexuality",
      "B-ssn",
      "I-ssn",
      "E-ssn",
      "S-ssn",
      "B-state",
      "I-state",
      "E-state",
      "S-state",
      "B-street_address",
      "I-street_address",
      "E-street_address",
      "S-street_address",
      "B-swift_bic",
      "I-swift_bic",
      "E-swift_bic",
      "S-swift_bic",
      "B-tax_id",
      "I-tax_id",
      "E-tax_id",
      "S-tax_id",
      "B-time",
      "I-time",
      "E-time",
      "S-time",
      "B-unique_id",
      "I-unique_id",
      "E-unique_id",
      "S-unique_id",
      "B-url",
      "I-url",
      "E-url",
      "S-url",
      "B-user_name",
      "I-user_name",
      "E-user_name",
      "S-user_name",
      "B-vehicle_identifier",
      "I-vehicle_identifier",
      "E-vehicle_identifier",
      "S-vehicle_identifier"
    ],
    "inference_contract_version": 1
  },
  "classifier_bias": true,
  "_name_or_path": "OpenMed/privacy-filter-nemotron",
  "_mlx_task": "token-classification",
  "_mlx_family": "openai-privacy-filter",
  "_mlx_model_type": "openai-privacy-filter",
  "_mlx_runtime": {
    "experimental": true,
    "decode": "bioes-viterbi",
    "tokenizer": "tiktoken"
  },
  "num_experts": 128,
  "experts_per_token": 4,
  "_mlx_viterbi_biases": {},
  "rope_theta": 150000.0,
  "rope_scaling_factor": 32.0,
  "rope_ntk_alpha": 1.0,
  "rope_ntk_beta": 32.0,
  "encoding": "o200k_base",
  "bidirectional_context": true,
  "bidirectional_left_context": 64,
  "bidirectional_right_context": 63,
  "hidden_dropout_prob": 0.1,
  "attention_probs_dropout_prob": 0.0,
  "layer_norm_eps": 1e-12,
  "swiglu_limit": 7.0,
  "_mlx_weights_format": "safetensors"
}
id2label.json ADDED
@@ -0,0 +1,223 @@
{
  "0": "O",
  "1": "B-account_number",
  "2": "I-account_number",
  "3": "E-account_number",
  "4": "S-account_number",
  "5": "B-age",
  "6": "I-age",
  "7": "E-age",
  "8": "S-age",
  "9": "B-api_key",
  "10": "I-api_key",
  "11": "E-api_key",
  "12": "S-api_key",
  "13": "B-bank_routing_number",
  "14": "I-bank_routing_number",
  "15": "E-bank_routing_number",
  "16": "S-bank_routing_number",
  "17": "B-biometric_identifier",
  "18": "I-biometric_identifier",
  "19": "E-biometric_identifier",
  "20": "S-biometric_identifier",
  "21": "B-blood_type",
  "22": "I-blood_type",
  "23": "E-blood_type",
  "24": "S-blood_type",
  "25": "B-certificate_license_number",
  "26": "I-certificate_license_number",
  "27": "E-certificate_license_number",
  "28": "S-certificate_license_number",
  "29": "B-city",
  "30": "I-city",
  "31": "E-city",
  "32": "S-city",
  "33": "B-company_name",
  "34": "I-company_name",
  "35": "E-company_name",
  "36": "S-company_name",
  "37": "B-coordinate",
  "38": "I-coordinate",
  "39": "E-coordinate",
  "40": "S-coordinate",
  "41": "B-country",
  "42": "I-country",
  "43": "E-country",
  "44": "S-country",
  "45": "B-county",
  "46": "I-county",
  "47": "E-county",
  "48": "S-county",
  "49": "B-credit_debit_card",
  "50": "I-credit_debit_card",
  "51": "E-credit_debit_card",
  "52": "S-credit_debit_card",
  "53": "B-customer_id",
  "54": "I-customer_id",
  "55": "E-customer_id",
  "56": "S-customer_id",
  "57": "B-cvv",
  "58": "I-cvv",
  "59": "E-cvv",
  "60": "S-cvv",
  "61": "B-date",
  "62": "I-date",
  "63": "E-date",
  "64": "S-date",
  "65": "B-date_of_birth",
  "66": "I-date_of_birth",
  "67": "E-date_of_birth",
  "68": "S-date_of_birth",
  "69": "B-date_time",
  "70": "I-date_time",
  "71": "E-date_time",
  "72": "S-date_time",
  "73": "B-device_identifier",
  "74": "I-device_identifier",
  "75": "E-device_identifier",
  "76": "S-device_identifier",
  "77": "B-education_level",
  "78": "I-education_level",
  "79": "E-education_level",
  "80": "S-education_level",
  "81": "B-email",
  "82": "I-email",
  "83": "E-email",
  "84": "S-email",
  "85": "B-employee_id",
  "86": "I-employee_id",
  "87": "E-employee_id",
  "88": "S-employee_id",
  "89": "B-employment_status",
  "90": "I-employment_status",
93
+ "91": "E-employment_status",
94
+ "92": "S-employment_status",
95
+ "93": "B-fax_number",
96
+ "94": "I-fax_number",
97
+ "95": "E-fax_number",
98
+ "96": "S-fax_number",
99
+ "97": "B-first_name",
100
+ "98": "I-first_name",
101
+ "99": "E-first_name",
102
+ "100": "S-first_name",
103
+ "101": "B-gender",
104
+ "102": "I-gender",
105
+ "103": "E-gender",
106
+ "104": "S-gender",
107
+ "105": "B-health_plan_beneficiary_number",
108
+ "106": "I-health_plan_beneficiary_number",
109
+ "107": "E-health_plan_beneficiary_number",
110
+ "108": "S-health_plan_beneficiary_number",
111
+ "109": "B-http_cookie",
112
+ "110": "I-http_cookie",
113
+ "111": "E-http_cookie",
114
+ "112": "S-http_cookie",
115
+ "113": "B-ipv4",
116
+ "114": "I-ipv4",
117
+ "115": "E-ipv4",
118
+ "116": "S-ipv4",
119
+ "117": "B-ipv6",
120
+ "118": "I-ipv6",
121
+ "119": "E-ipv6",
122
+ "120": "S-ipv6",
123
+ "121": "B-language",
124
+ "122": "I-language",
125
+ "123": "E-language",
126
+ "124": "S-language",
127
+ "125": "B-last_name",
128
+ "126": "I-last_name",
129
+ "127": "E-last_name",
130
+ "128": "S-last_name",
131
+ "129": "B-license_plate",
132
+ "130": "I-license_plate",
133
+ "131": "E-license_plate",
134
+ "132": "S-license_plate",
135
+ "133": "B-mac_address",
136
+ "134": "I-mac_address",
137
+ "135": "E-mac_address",
138
+ "136": "S-mac_address",
139
+ "137": "B-medical_record_number",
140
+ "138": "I-medical_record_number",
141
+ "139": "E-medical_record_number",
142
+ "140": "S-medical_record_number",
143
+ "141": "B-national_id",
144
+ "142": "I-national_id",
145
+ "143": "E-national_id",
146
+ "144": "S-national_id",
147
+ "145": "B-occupation",
148
+ "146": "I-occupation",
149
+ "147": "E-occupation",
150
+ "148": "S-occupation",
151
+ "149": "B-password",
152
+ "150": "I-password",
153
+ "151": "E-password",
154
+ "152": "S-password",
155
+ "153": "B-phone_number",
156
+ "154": "I-phone_number",
157
+ "155": "E-phone_number",
158
+ "156": "S-phone_number",
159
+ "157": "B-pin",
160
+ "158": "I-pin",
161
+ "159": "E-pin",
162
+ "160": "S-pin",
163
+ "161": "B-political_view",
164
+ "162": "I-political_view",
165
+ "163": "E-political_view",
166
+ "164": "S-political_view",
167
+ "165": "B-postcode",
168
+ "166": "I-postcode",
169
+ "167": "E-postcode",
170
+ "168": "S-postcode",
171
+ "169": "B-race_ethnicity",
172
+ "170": "I-race_ethnicity",
173
+ "171": "E-race_ethnicity",
174
+ "172": "S-race_ethnicity",
175
+ "173": "B-religious_belief",
176
+ "174": "I-religious_belief",
177
+ "175": "E-religious_belief",
178
+ "176": "S-religious_belief",
179
+ "177": "B-sexuality",
180
+ "178": "I-sexuality",
181
+ "179": "E-sexuality",
182
+ "180": "S-sexuality",
183
+ "181": "B-ssn",
184
+ "182": "I-ssn",
185
+ "183": "E-ssn",
186
+ "184": "S-ssn",
187
+ "185": "B-state",
188
+ "186": "I-state",
189
+ "187": "E-state",
190
+ "188": "S-state",
191
+ "189": "B-street_address",
192
+ "190": "I-street_address",
193
+ "191": "E-street_address",
194
+ "192": "S-street_address",
195
+ "193": "B-swift_bic",
196
+ "194": "I-swift_bic",
197
+ "195": "E-swift_bic",
198
+ "196": "S-swift_bic",
199
+ "197": "B-tax_id",
200
+ "198": "I-tax_id",
201
+ "199": "E-tax_id",
202
+ "200": "S-tax_id",
203
+ "201": "B-time",
204
+ "202": "I-time",
205
+ "203": "E-time",
206
+ "204": "S-time",
207
+ "205": "B-unique_id",
208
+ "206": "I-unique_id",
209
+ "207": "E-unique_id",
210
+ "208": "S-unique_id",
211
+ "209": "B-url",
212
+ "210": "I-url",
213
+ "211": "E-url",
214
+ "212": "S-url",
215
+ "213": "B-user_name",
216
+ "214": "I-user_name",
217
+ "215": "E-user_name",
218
+ "216": "S-user_name",
219
+ "217": "B-vehicle_identifier",
220
+ "218": "I-vehicle_identifier",
221
+ "219": "E-vehicle_identifier",
222
+ "220": "S-vehicle_identifier"
223
+ }
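`id2label.json` encodes a BIOES tagging scheme: 55 PII entity types × 4 chunk positions, plus `O`, for 221 tags. A minimal sketch (helper names are ours, not part of the repo) of loading the map and recovering the bare entity-type set:

```python
import json

def load_id2label(path: str) -> dict[int, str]:
    """Read the id -> BIOES tag map shipped as id2label.json."""
    with open(path) as f:
        return {int(k): v for k, v in json.load(f).items()}

def entity_types(id2label: dict[int, str]) -> set[str]:
    """Strip the B-/I-/E-/S- prefixes to recover the entity types."""
    return {tag.split("-", 1)[1] for tag in id2label.values() if tag != "O"}
```

With the full file, `entity_types(load_id2label("id2label.json"))` should yield the 55 types, from `account_number` through `vehicle_identifier`.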
openmed-mlx.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "format": "openmed-mlx",
+ "format_version": 2,
+ "task": "token-classification",
+ "family": "openai-privacy-filter",
+ "source_model_id": "OpenMed/privacy-filter-nemotron",
+ "config_path": "config.json",
+ "label_map_path": "id2label.json",
+ "preferred_weights": "weights.safetensors",
+ "fallback_weights": [
+ "weights.npz"
+ ],
+ "available_weights": [
+ "weights.safetensors"
+ ],
+ "weights_format": "safetensors",
+ "quantization": null,
+ "max_sequence_length": 131072,
+ "tokenizer": {
+ "path": ".",
+ "files": [
+ "tokenizer.json",
+ "tokenizer_config.json"
+ ]
+ },
+ "runtime": {
+ "experimental": true,
+ "decode": "bioes-viterbi",
+ "tokenizer": "tiktoken"
+ }
+ }
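The manifest names a preferred weights file (`weights.safetensors`) plus fallbacks (`weights.npz`, not shipped in this repo). A minimal sketch, with a resolver name of our choosing, of how a loader might pick the first candidate that exists on disk:

```python
import json
from pathlib import Path

def resolve_weights(manifest_path: str) -> Path:
    """Return the first weights file from the openmed-mlx manifest
    that exists next to it: preferred_weights, then fallback_weights."""
    manifest = json.loads(Path(manifest_path).read_text())
    repo_dir = Path(manifest_path).parent
    candidates = [manifest["preferred_weights"],
                  *manifest.get("fallback_weights", [])]
    for name in candidates:
        path = repo_dir / name
        if path.exists():
            return path
    raise FileNotFoundError(f"none of {candidates} found in {repo_dir}")
```

For this repo the preferred safetensors file is always present, so the fallback list only matters for locally converted copies.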
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0614fe83cadab421296e664e1f48f4261fa8fef6e03e63bb75c20f38e37d07d3
+ size 27868174
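`tokenizer.json` (and `weights.safetensors` below) are stored as Git LFS pointers: three key/value lines giving the spec version, the blob's SHA-256, and its byte size. A minimal sketch (helpers are illustrative, not repo code) of checking a downloaded blob against such a pointer:

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {"oid": fields["oid"].removeprefix("sha256:"),
            "size": fields["size"]}

def blob_matches_pointer(blob: bytes, pointer_text: str) -> bool:
    """True when the blob's byte size and SHA-256 match the pointer."""
    p = parse_lfs_pointer(pointer_text)
    return (len(blob) == int(p["size"])
            and hashlib.sha256(blob).hexdigest() == p["oid"])
```

After `git lfs pull` (or a `huggingface_hub` download), the resolved `tokenizer.json` should be 27,868,174 bytes with the digest shown above.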
tokenizer_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "backend": "tokenizers",
+ "eos_token": "<|endoftext|>",
+ "is_local": false,
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 128000,
+ "pad_token": "<|endoftext|>",
+ "tokenizer_class": "TokenizersBackend"
+ }
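The tokenizer config pins both `pad_token` and `eos_token` to `<|endoftext|>` and names `input_ids` / `attention_mask` as the model inputs. A minimal sketch (function and default are ours; the pad token id depends on the tokenizer) of batching token id sequences accordingly:

```python
def pad_batch(sequences: list[list[int]], pad_id: int,
              max_length: int = 128000) -> dict[str, list[list[int]]]:
    """Right-pad id sequences to a common width and build the
    attention_mask that model_input_names calls for."""
    clipped = [seq[:max_length] for seq in sequences]  # honor model_max_length
    width = max(len(seq) for seq in clipped)
    return {
        "input_ids": [seq + [pad_id] * (width - len(seq)) for seq in clipped],
        "attention_mask": [[1] * len(seq) + [0] * (width - len(seq))
                           for seq in clipped],
    }
```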
weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d081f4849fd0d3d288b4ee21459960c9348bc9f2ae0bec7a997f61723a8fb30a
+ size 2799225281