dariofinardi committed on
Commit a255827 · verified · 1 Parent(s): e1383b2

Initial commit — ONNX 8-fragment export (FP32 + FP16 + FP16_IOBinding) of fastino/gliner2-privacy-filter-PII-multi

.gitattributes CHANGED
@@ -14,6 +14,7 @@
  *.npy filter=lfs diff=lfs merge=lfs -text
  *.npz filter=lfs diff=lfs merge=lfs -text
  *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.onnx.data filter=lfs diff=lfs merge=lfs -text
  *.ot filter=lfs diff=lfs merge=lfs -text
  *.parquet filter=lfs diff=lfs merge=lfs -text
  *.pb filter=lfs diff=lfs merge=lfs -text
@@ -33,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,193 @@
  ---
+ library_name: gliner2
  license: apache-2.0
+ base_model: fastino/gliner2-privacy-filter-PII-multi
+ pipeline_tag: token-classification
+ tags:
+ - token-classification
+ - gliner2
+ - gliner
+ - onnx
+ - rust
+ - pii
+ - ner
+ - privacy
+ - redaction
+ - information-extraction
+ - span-extraction
+ - iobinding
+ language:
+ - en
+ - fr
+ - es
+ - de
+ - it
+ - pt
+ - nl
  ---
+
+ # GLiNER2 Privacy-Filter PII Multi (ONNX Fragmented & IOBinding)
+
+ This repository contains the **ONNX-exported weights** of [`fastino/gliner2-privacy-filter-PII-multi`](https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi),
+ the multilingual **PII detection model** built on GLiNER2 by Fastino AI.
+
+ The model is exported in a **fragmented format** (encoder, token_gather, span_rep, schema_gather, count_pred_argmax, count_lstm_fixed, scorer, classifier) for direct compatibility with [gliner2-rs](https://github.com/SemplificaAI/gliner2-rs), the official **Zero-Python Native Rust inference engine** for GLiNER2.
+
+ It supports detection of **42 PII entity types** across **7 languages** (EN, FR, ES, DE, IT, PT, NL).
+
+ ---
+
+ ## 🆕 V2 Zero-Copy IOBinding Models
+
+ Like the [`gliner2-multi-v1-onnx`](https://huggingface.co/SemplificaAI/gliner2-multi-v1-onnx) base release, this repo ships the **V2 fused IOBinding** variant. The `Gather`, `ArgMax`, and `MatMul` operations are fused directly into the ONNX graphs so that tensors **never leave the GPU/NPU VRAM**, bypassing the PCIe bus and cutting inference latency by ~30% on discrete GPUs.
+
+ ## 📂 Available Variants
+
+ | Variant | Use case | Notes |
+ |---|---|---|
+ | **`fp16_v2`** *(recommended)* | NVIDIA CUDA · AMD ROCm · Apple CoreML · Qualcomm QNN | Zero-Copy VRAM (IOBinding), full FP16 IO, fused ops |
+ | **`fp32_v2`** | CPU (AVX2 / XNNPACK / ARM NEON) | High-precision V2 fusions for CPU |
+ | **`fp16`** *(standard)* | Legacy compatible, all EPs | FP32 IO (CoreML-compatible), slower on CUDA due to PCIe round-trips |
+ | **`fp32`** *(standard)* | Universal fallback | Legacy Float32 |
+
+ Each variant ships 8 fragments:
+
+ ```
+ encoder_{precision}.onnx             ~530–1060 MB
+ token_gather_{precision}.onnx        <1 MB
+ span_rep_{precision}.onnx            ~32–63 MB
+ schema_gather_{precision}.onnx       <1 MB
+ count_pred_argmax_{precision}.onnx   ~2–5 MB
+ count_lstm_fixed_{precision}.onnx    ~20–41 MB
+ scorer_{precision}.onnx              <1 MB
+ classifier_{precision}.onnx          ~2–5 MB
+ ```
+
+ Total: **~590 MB (FP16)** or **~1.17 GB (FP32)** per variant.
+
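As a cross-check on the totals above, the per-variant size can be recomputed from the `size` fields of the Git LFS pointer files added in this commit. The sketch below is illustrative only: `POINTER_SIZES`, `total_bytes`, and `to_mib` are hypothetical names, and it assumes the README's "MB" figures are binary megabytes (MiB), which is an inference from the numbers rather than something the README states.

```python
# Byte sizes copied from the Git LFS pointer files in this commit.
# All names here (POINTER_SIZES, total_bytes, to_mib) are illustrative.
POINTER_SIZES = {
    "fp16": {
        "encoder": 556_324_340,
        "token_gather": 524,
        "span_rep": 33_074_792,
        "schema_gather": 2_145,
        "count_pred_argmax": 2_425_035,
        "count_lstm_fixed": 21_288_145,
        "scorer": 5_876,
        "classifier": 2_366_948,
    },
    "fp16_iobinding": {
        "encoder": 556_324_203,
        "token_gather": 252,
        "span_rep": 33_074_494,
        "schema_gather": 1_743,
        "count_pred_argmax": 2_424_913,
        "count_lstm_fixed": 21_287_867,
        "scorer": 5_445,
        "classifier": 2_366_665,
    },
    "fp32": {
        "encoder": 1_111_055_954,
        "token_gather": 252,
        "span_rep": 66_121_780,
        "schema_gather": 1_334,
        "count_pred_argmax": 4_848_570,
        "count_lstm_fixed": 42_567_350,
        "scorer": 3_831,
        "classifier": 4_731_777,
    },
}

MIB = 1024 * 1024  # assumption: the README's "MB" means MiB


def total_bytes(suffix: str) -> int:
    """Sum the eight fragment sizes for one file suffix."""
    return sum(POINTER_SIZES[suffix].values())


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to binary megabytes."""
    return n_bytes / MIB


if __name__ == "__main__":
    for suffix in POINTER_SIZES:
        n = total_bytes(suffix)
        print(f"{suffix:15s} {n:>13,d} bytes  ~{to_mib(n):.0f} MiB")
```

Under that assumption the sums come to roughly 587 MiB for each FP16 set and roughly 1172 MiB for FP32, in line with the ~590 MB and ~1.17 GB quoted above.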
+ ---
+
+ ## 🎯 Supported PII Labels (42 types)
+
+ ### Person / Names (6 labels)
+ `person`, `full_name`, `first_name`, `middle_name`, `last_name`, `date_of_birth`
+
+ ### Contact / Address (8 labels)
+ `email`, `phone_number`, `address`, `street_address`, `city`, `state_or_region`, `postal_code`, `country`
+
+ ### Government / Tax IDs (7 labels)
+ `government_id`, `national_id_number`, `passport_number`, `drivers_license_number`, `license_number`, `tax_id`, `tax_number`
+
+ ### Banking / Payment (8 labels)
+ `bank_account`, `account_number`, `routing_number`, `iban`, `payment_card`, `card_number`, `card_expiry`, `card_cvv`
+
+ ### Digital Identity (4 labels)
+ `username`, `ip_address`, `account_id`, `sensitive_account_id`
+
+ ### Secrets / Credentials (5 labels)
+ `password`, `secret`, `api_key`, `access_token`, `recovery_code`
+
+ ### Sensitive Dates (4 labels)
+ `sensitive_date`, `document_date`, `expiration_date`, `transaction_date`
+
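For code that needs to request a whole category at once, the groups above can be kept in a small lookup table. This is a sketch only: `PII_LABEL_GROUPS`, `labels_for`, and `ALL_LABELS` are hypothetical names, and the group keys are shorthand invented here, not identifiers used by the model. Only the label strings themselves come from the README.

```python
# Label strings are taken from the README; the group keys are illustrative.
PII_LABEL_GROUPS = {
    "person": ["person", "full_name", "first_name", "middle_name",
               "last_name", "date_of_birth"],
    "contact": ["email", "phone_number", "address", "street_address",
                "city", "state_or_region", "postal_code", "country"],
    "government": ["government_id", "national_id_number", "passport_number",
                   "drivers_license_number", "license_number", "tax_id",
                   "tax_number"],
    "banking": ["bank_account", "account_number", "routing_number", "iban",
                "payment_card", "card_number", "card_expiry", "card_cvv"],
    "digital_identity": ["username", "ip_address", "account_id",
                         "sensitive_account_id"],
    "secrets": ["password", "secret", "api_key", "access_token",
                "recovery_code"],
    "dates": ["sensitive_date", "document_date", "expiration_date",
              "transaction_date"],
}


def labels_for(*groups: str) -> list:
    """Return the labels of the requested groups, in order, without duplicates."""
    ordered = {}
    for group in groups:
        for label in PII_LABEL_GROUPS[group]:
            ordered.setdefault(label, None)  # dicts preserve insertion order
    return list(ordered)


# Every label from every group, e.g. for a "detect everything" schema.
ALL_LABELS = labels_for(*PII_LABEL_GROUPS)
```

A caller interested only in credentials and payment data could pass `labels_for("secrets", "banking")` as the entity list instead of spelling out thirteen strings.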
+ ---
+
+ ## 🚀 Usage in Rust (`gliner2-rs`)
+
+ ```rust
+ use gliner2_inference::{Gliner2Engine, ModelType, SchemaTask};
+
+ // Auto-downloads the V2 FP16 fragments from this HuggingFace repo
+ // and switches to the high-performance IOBinding engine.
+ let engine = Gliner2Engine::from_pretrained(
+     "SemplificaAI/gliner2-privacy-filter-PII-multi",
+     Some("fp16_v2"),
+     ModelType::HuggingFace,
+ )?;
+
+ let text = "Please contact Maria Jensen at maria.jensen@example.dk or +45 20 12 34 56.";
+ let tasks = vec![
+     SchemaTask::Entities(vec![
+         "person".into(), "email".into(), "phone_number".into(),
+     ])
+ ];
+
+ let (entities, _, _) = engine.extract(text, &tasks)?;
+ ```
+
+ Requires **`gliner2-rs >= 0.4.1`** for automatic V2 detection / IOBinding routing.
+
+ ## 🐍 Usage in Python (`onnxruntime`)
+
+ Run the 8-fragment pipeline manually (no Python `gliner2` dependency needed):
+
+ ```python
+ import onnxruntime as ort
+
+ # Per fragment (example for the encoder, CUDA backend)
+ encoder = ort.InferenceSession(
+     "encoder_fp16_iobinding.onnx",
+     providers=["CUDAExecutionProvider"],
+ )
+ # ...load the other 7 fragments analogously...
+
+ # Chain them via IOBinding (see validate_onnx_v2.py for a full reference impl)
+ ```
+
+ For a simpler entry point you can keep using the original PyTorch model via the `gliner2` Python package on `fastino/gliner2-privacy-filter-PII-multi`; this ONNX repo is optimised for **production deployment without Python**.
+
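Loading the remaining fragments is mostly a matter of building eight file names. The sketch below assumes the naming pattern visible in this commit's file list (`<fragment>_<suffix>.onnx`, with suffix `fp16`, `fp16_iobinding`, or `fp32`). `fragment_paths` and `load_sessions` are hypothetical helpers, not part of `onnxruntime` or `gliner2`, and the CPU fallback provider is an addition made here. The sketch stops short of wiring the fragments together, because the tensor names of each graph are not documented in this README.

```python
from pathlib import Path

# Fragment names and suffixes as they appear in this commit's file list.
FRAGMENTS = (
    "encoder", "token_gather", "span_rep", "schema_gather",
    "count_pred_argmax", "count_lstm_fixed", "scorer", "classifier",
)
SUFFIXES = ("fp16", "fp16_iobinding", "fp32")


def fragment_paths(model_dir, suffix="fp16_iobinding"):
    """Map each fragment name to its expected .onnx path (hypothetical helper)."""
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix {suffix!r}; expected one of {SUFFIXES}")
    base = Path(model_dir)
    return {name: base / f"{name}_{suffix}.onnx" for name in FRAGMENTS}


def load_sessions(model_dir, suffix="fp16_iobinding", providers=None):
    """Open one InferenceSession per fragment (hypothetical helper)."""
    import onnxruntime as ort  # imported lazily so path building works without it

    if providers is None:
        # Assumption: fall back to CPU when CUDA is not available.
        providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return {
        name: ort.InferenceSession(str(path), providers=providers)
        for name, path in fragment_paths(model_dir, suffix).items()
    }
```

Keeping path construction separate from session creation makes it possible to check that all eight files exist before any of them is loaded.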
+ ---
+
+ ## 🛠 Pipeline Wiring (IOBinding chain)
+
+ ```
+ encoder_fp16_iobinding.onnx
+
+   ├─ token_gather_fp16_iobinding.onnx
+   │    └─ span_rep_fp16_iobinding.onnx
+
+   └─ schema_gather_fp16_iobinding.onnx
+        ├─ count_pred_argmax_fp16_iobinding.onnx   → pred_count (int64)
+        └─ count_lstm_fixed_fp16_iobinding.onnx
+             └─ scorer_fp16_iobinding.onnx         → entity_scores
+
+ classifier_fp16_iobinding.onnx   (only for classification tasks)
+ ```
+
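One way to turn the diagram into an execution order is a topological sort over its edges. The sketch below encodes only the parent-child links drawn above; the real graphs may take additional inputs that the diagram does not show, so `DEPENDS_ON` should be read as an assumption. The classifier is left out because the diagram lists it separately, for classification tasks only. `execution_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each fragment maps to the fragments it waits for, as drawn in the diagram.
DEPENDS_ON = {
    "encoder": set(),
    "token_gather": {"encoder"},
    "span_rep": {"token_gather"},
    "schema_gather": {"encoder"},
    "count_pred_argmax": {"schema_gather"},
    "count_lstm_fixed": {"schema_gather"},
    "scorer": {"count_lstm_fixed"},
}


def execution_order(deps=DEPENDS_ON):
    """Return one valid run order in which every fragment follows its inputs."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(), start=1):
        print(step, name)
```

Several orders are valid (the two branches under the encoder are independent), so the helper guarantees only that each fragment runs after the ones it depends on.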
+ ---
+
+ ## ⚙️ Technical Notes
+
+ - **opset 17** (ONNX 1.14+) for maximum execution-provider compatibility.
+ - `count_lstm_fixed` exports the GRU **unrolled to 20 fixed steps** at tracing time, which keeps it compatible with execution providers that don't support dynamic loops (Apple CoreML, Qualcomm QNN).
+ - `scorer` uses **fused Reshape + MatMul + Transpose** instead of `Einsum` for compatibility with QNN/CoreML FP16.
+ - **INT8 not supported**: the DeBERTa-v3 disentangled-attention activations contain extreme outliers that saturate 8-bit ranges (the same limitation called out by the GLiNER2 maintainers). FP16 remains the optimal compression target.
+ - **Encoder size**: ~1.06 GB FP32 → ~530 MB FP16. Larger than the multi-v1 base because of the wider classification head (42 PII labels) and per-language fine-tuning.
+
+ ## 🪪 License
+
+ Apache 2.0, same as the upstream model.
+
+ ## 🙏 Acknowledgements
+
+ - Upstream model: [`fastino/gliner2-privacy-filter-PII-multi`](https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi) by Fastino AI.
+ - GLiNER2 paper: Zaratiana et al., *GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction*, EMNLP 2025.
+ - ONNX fragmentation + IOBinding strategy: Semplifica s.r.l., as used in [`gliner2-multi-v1-onnx`](https://huggingface.co/SemplificaAI/gliner2-multi-v1-onnx).
+
+ ## 📚 Citation
+
+ ```bibtex
+ @misc{fastino2026gliner2pii,
+   title  = {GLiNER2-PII: Multilingual PII Extraction via Synthetic Fine-Tuning},
+   author = {{Fastino AI Team}},
+   year   = {2026},
+   url    = {https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi}
+ }
+
+ @inproceedings{zaratiana-etal-2025-gliner2,
+   title     = {GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction},
+   author    = {Zaratiana, Urchade and Pasternak, Gil and Boyd, Oliver and Hurn-Maloney, George and Lewis, Ash},
+   booktitle = {Proceedings of EMNLP 2025: System Demonstrations},
+   year      = {2025}
+ }
+ ```
classifier_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05496dbc57d06ae19a41ee740419880baf00c704ce78ced483732703ce336a6c
+ size 2366948
classifier_fp16_iobinding.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c3aad6fa0380622969453a0577a1f93f3df27f401500268407066a8cb2569545
+ size 2366665
classifier_fp32.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a59406b58c7bb448174c8df32b46d82e1b77b64bcb497da5b481836600ff3ad9
+ size 4731777
count_lstm_fixed_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83d4f991fbdb65226ef1dfdf6095f08b960547c3a1a6c1e3d45cd539ff92a1b3
+ size 21288145
count_lstm_fixed_fp16_iobinding.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ac960cd67c202214138419b2fcd5ff8c5f7d59b2f7fcd09dfe0285d91f7e338
+ size 21287867
count_lstm_fixed_fp32.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17710c32f639f28f3193ea286226b92757dbfa4a67d86432cf252d479ed86f7d
+ size 42567350
count_pred_argmax_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a353e6b967a09f78d520e28c237b52ccfda5113de6e4345158d1e5e5cb963c9b
+ size 2425035
count_pred_argmax_fp16_iobinding.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10a09754c9843949eb47d284399fce1689cad3b2a3b4e01e84ad38b6b01f23ef
+ size 2424913
count_pred_argmax_fp32.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c67fe0b5b1d1d6b1832a82ccb88153e571033e39f7d46434e661e7befd38294c
+ size 4848570
encoder_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd75d6cd79523848e788d473a806de918f7fcc9b20b26ef3091238234eedee6c
+ size 556324340
encoder_fp16_iobinding.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1225912da466852b8731443e7830851bd2fd0be45d5da67a09a54c2db2cb7502
+ size 556324203
encoder_fp32.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7cd07f04235f5c9e879f69e361088be3b07bb6930bf05340050088f7994f638e
+ size 1111055954
schema_gather_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:255437e8b4a2af91951421309208fc188859b74062b88359ea565bdb0309ccdf
+ size 2145
schema_gather_fp16_iobinding.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1733420ac986cf5b1bc2ef0891f0011d461920504dfd8da06ebf3493944fa7d7
+ size 1743
schema_gather_fp32.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24ceb4ae477e36a4291f8a29636e56cfe435d857c71b9248245d4eed62d67ab7
+ size 1334
scorer_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ebbdf71ab00f6382f1effe04c586207ab85ac0b941688063005c384fa2e0458e
+ size 5876
scorer_fp16_iobinding.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d725406cdfaab03cdb1a7f6271180d456807096e6f5380ad32c97c69748693e
+ size 5445
scorer_fp32.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8fd9305a1d393d0eb9551c2dabaf7707aeea80910d246588b6ff8576bcab8cce
+ size 3831
span_rep_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:df0d2e318daf16e0b85c1c5ed0b223d529f26a09865aea29e805e0191b8041c2
+ size 33074792
span_rep_fp16_iobinding.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f06716a2fb0f82fe591fea051353024ea4fc48467a8c6f234378b7a534c3b715
+ size 33074494
span_rep_fp32.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a526d00a3ef794223b8683f2ede9b15bda6a5190485539ecec661ef78194d4a1
+ size 66121780
token_gather_fp16.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f69059ca4db687f07d6a11d1c8f3c59780aff374978aec6fc51798beb13313a
+ size 524
token_gather_fp16_iobinding.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7b9bd8ed770f2312ed5762b1209034e3c438fdea395eea7ecd9234a4ac72bd5d
+ size 252
token_gather_fp32.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:739b66e91492ba6cdcf779497bdc9c322eaa141a9a1e2200c1db1c02ef63133b
+ size 252
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f6df10ec83bea993035b2dd7c39345a3d4fcf23421c2adb6cb4ffc1e6d1bc4b5
+ size 16020604