dariofinardi committed 4 days ago

Commit a255827 (verified) · 1 parent: e1383b2

Initial commit — ONNX 8-fragment export (FP32 + FP16 + FP16_IOBinding) of fastino/gliner2-privacy-filter-PII-multi

Files changed (27)

.gitattributes +2 -0
README.md +190 -0
classifier_fp16.onnx +3 -0
classifier_fp16_iobinding.onnx +3 -0
classifier_fp32.onnx +3 -0
count_lstm_fixed_fp16.onnx +3 -0
count_lstm_fixed_fp16_iobinding.onnx +3 -0
count_lstm_fixed_fp32.onnx +3 -0
count_pred_argmax_fp16.onnx +3 -0
count_pred_argmax_fp16_iobinding.onnx +3 -0
count_pred_argmax_fp32.onnx +3 -0
encoder_fp16.onnx +3 -0
encoder_fp16_iobinding.onnx +3 -0
encoder_fp32.onnx +3 -0
schema_gather_fp16.onnx +3 -0
schema_gather_fp16_iobinding.onnx +3 -0
schema_gather_fp32.onnx +3 -0
scorer_fp16.onnx +3 -0
scorer_fp16_iobinding.onnx +3 -0
scorer_fp32.onnx +3 -0
span_rep_fp16.onnx +3 -0
span_rep_fp16_iobinding.onnx +3 -0
span_rep_fp32.onnx +3 -0
token_gather_fp16.onnx +3 -0
token_gather_fp16_iobinding.onnx +3 -0
token_gather_fp32.onnx +3 -0
tokenizer.json +3 -0

.gitattributes CHANGED

@@ -14,6 +14,7 @@
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
+*.onnx.data filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
@@ -33,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
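To see which of the rules touched by this change routes a given file to Git LFS, here is a minimal sketch. It assumes that Python's `fnmatchcase` is a close enough stand-in for gitattributes matching on simple basename patterns (it does not handle patterns containing `/`, such as `saved_model/**/*`). The helper name `lfs_rule_for` is hypothetical.

```python
from fnmatch import fnmatchcase

# Only the patterns relevant to this commit: the existing *.onnx rule
# and the two rules the commit adds.
LFS_PATTERNS = ["*.onnx", "*.onnx.data", "tokenizer.json"]


def lfs_rule_for(filename, patterns=LFS_PATTERNS):
    """Return the last pattern matching `filename`, or None.

    In .gitattributes a later matching line overrides an earlier one,
    so the last match is the one that counts.
    """
    match = None
    for pattern in patterns:
        if fnmatchcase(filename, pattern):
            match = pattern
    return match


if __name__ == "__main__":
    for name in ("encoder_fp16.onnx", "tokenizer.json", "README.md"):
        print(name, "->", lfs_rule_for(name))
```

Under these assumptions, every `.onnx` fragment in the file list is covered by the pre-existing `*.onnx` rule, while `tokenizer.json` is only tracked by LFS because of the new line.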

README.md CHANGED

@@ -1,3 +1,193 @@
 ---
+library_name: gliner2
 license: apache-2.0
+base_model: fastino/gliner2-privacy-filter-PII-multi
+pipeline_tag: token-classification
+tags:
+  - token-classification
+  - gliner2
+  - gliner
+  - onnx
+  - rust
+  - pii
+  - ner
+  - privacy
+  - redaction
+  - information-extraction
+  - span-extraction
+  - iobinding
+language:
+  - en
+  - fr
+  - es
+  - de
+  - it
+  - pt
+  - nl
 ---
+# GLiNER2 Privacy-Filter PII Multi (ONNX Fragmented & IOBinding)
+This repository contains the **ONNX-exported weights** of [`fastino/gliner2-privacy-filter-PII-multi`](https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi),
+the multilingual **PII detection model** built on GLiNER2 by Fastino AI.
+The model is exported in a **fragmented format** (encoder, token_gather, span_rep, schema_gather, count_pred_argmax, count_lstm_fixed, scorer, classifier) for direct compatibility with [gliner2-rs](https://github.com/SemplificaAI/gliner2-rs), the official **Zero-Python Native Rust inference engine** for GLiNER2.
+It supports detection of **42 PII entity types** across **7 languages** (EN, FR, ES, DE, IT, PT, NL).
+---
+## 🆕 V2 Zero-Copy IOBinding Models
+Like the [`gliner2-multi-v1-onnx`](https://huggingface.co/SemplificaAI/gliner2-multi-v1-onnx) base release, this repo ships the **V2 fused IOBinding** variant. The `Gather`, `ArgMax`, and `MatMul` operations are fused directly into the ONNX graphs so that intermediate tensors **stay in GPU/NPU device memory** between fragments instead of round-tripping to the host over the PCIe bus, cutting inference latency by roughly 30% on discrete GPUs.
+## 📂 Available Variants
+| Variant | Use case | Notes |
+|---|---|---|
+| **`fp16_v2`** *(recommended)* | NVIDIA CUDA · AMD ROCm · Apple CoreML · Qualcomm QNN | Zero-Copy VRAM (IOBinding), full FP16 IO, fused ops |
+| **`fp32_v2`** | CPU (AVX2 / XNNPACK / ARM NEON) | High precision V2 fusions for CPU |
+| **`fp16`** *(standard)* | Legacy compatible, all EPs | FP32 IO (CoreML-compatible), slower on CUDA due to PCIe round-trips |
+| **`fp32`** *(standard)* | Universal fallback | Legacy Float32 |
+Each variant ships 8 fragments:
+```
+encoder_{precision}.onnx            ~530–1060 MiB
+token_gather_{precision}.onnx       <1 MiB
+span_rep_{precision}.onnx           ~32–63 MiB
+schema_gather_{precision}.onnx      <1 MiB
+count_pred_argmax_{precision}.onnx  ~2–5 MiB
+count_lstm_fixed_{precision}.onnx   ~20–41 MiB
+scorer_{precision}.onnx             <1 MiB
+classifier_{precision}.onnx         ~2–5 MiB
+```
+Total: **~587 MiB (FP16)** or **~1.14 GiB (FP32)** per variant.
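The per-variant totals can be cross-checked against the Git LFS pointer sizes listed further down in this commit. The sketch below hard-codes those byte counts and sums them; the names `SIZES`, `total_bytes`, and `to_mib` are illustrative only.

```python
# Sizes in bytes, copied from the Git LFS pointers in this commit.
SIZES = {
    "fp16": {
        "encoder": 556324340, "token_gather": 524, "span_rep": 33074792,
        "schema_gather": 2145, "count_pred_argmax": 2425035,
        "count_lstm_fixed": 21288145, "scorer": 5876, "classifier": 2366948,
    },
    "fp16_iobinding": {
        "encoder": 556324203, "token_gather": 252, "span_rep": 33074494,
        "schema_gather": 1743, "count_pred_argmax": 2424913,
        "count_lstm_fixed": 21287867, "scorer": 5445, "classifier": 2366665,
    },
    "fp32": {
        "encoder": 1111055954, "token_gather": 252, "span_rep": 66121780,
        "schema_gather": 1334, "count_pred_argmax": 4848570,
        "count_lstm_fixed": 42567350, "scorer": 3831, "classifier": 4731777,
    },
}


def total_bytes(variant):
    """Sum the sizes of the eight fragments of one variant."""
    return sum(SIZES[variant].values())


def to_mib(num_bytes):
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20


if __name__ == "__main__":
    for variant in SIZES:
        total = total_bytes(variant)
        print(f"{variant}: {total} bytes = {to_mib(total):.1f} MiB")
```

Both FP16 sets come to about 587 MiB (about 615 MB in decimal units) and the FP32 set to about 1172 MiB (about 1.23 GB), so the figures quoted above are best read as binary units.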
+---
+## 🎯 Supported PII Labels (42 types)
+### Person / Names (6 labels)
+`person`, `full_name`, `first_name`, `middle_name`, `last_name`, `date_of_birth`
+### Contact / Address (8 labels)
+`email`, `phone_number`, `address`, `street_address`, `city`, `state_or_region`, `postal_code`, `country`
+### Government / Tax IDs (7 labels)
+`government_id`, `national_id_number`, `passport_number`, `drivers_license_number`, `license_number`, `tax_id`, `tax_number`
+### Banking / Payment (8 labels)
+`bank_account`, `account_number`, `routing_number`, `iban`, `payment_card`, `card_number`, `card_expiry`, `card_cvv`
+### Digital Identity (4 labels)
+`username`, `ip_address`, `account_id`, `sensitive_account_id`
+### Secrets / Credentials (5 labels)
+`password`, `secret`, `api_key`, `access_token`, `recovery_code`
+### Sensitive Dates (4 labels)
+`sensitive_date`, `document_date`, `expiration_date`, `transaction_date`
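When a schema is built from user input, it can help to check the requested labels against the documented list first. A minimal sketch follows; the grouping mirrors the headings above, and the names `PII_LABELS`, `ALL_LABELS`, and `unknown_labels` are hypothetical. Whether the model also accepts labels outside this list is not stated here, so the check only reports them and does not reject anything.

```python
# Label groups transcribed from the README section above.
PII_LABELS = {
    "person": ["person", "full_name", "first_name", "middle_name",
               "last_name", "date_of_birth"],
    "contact": ["email", "phone_number", "address", "street_address",
                "city", "state_or_region", "postal_code", "country"],
    "government": ["government_id", "national_id_number", "passport_number",
                   "drivers_license_number", "license_number", "tax_id",
                   "tax_number"],
    "banking": ["bank_account", "account_number", "routing_number", "iban",
                "payment_card", "card_number", "card_expiry", "card_cvv"],
    "digital_identity": ["username", "ip_address", "account_id",
                         "sensitive_account_id"],
    "secrets": ["password", "secret", "api_key", "access_token",
                "recovery_code"],
    "dates": ["sensitive_date", "document_date", "expiration_date",
              "transaction_date"],
}

ALL_LABELS = frozenset(
    label for group in PII_LABELS.values() for label in group
)


def unknown_labels(requested):
    """Return the requested labels that are not in the documented list."""
    return sorted(set(requested) - ALL_LABELS)


if __name__ == "__main__":
    print(len(ALL_LABELS), "documented labels")
    print("not documented:", unknown_labels(["person", "email", "ssn"]))
```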
+---
+## 🚀 Usage in Rust (`gliner2-rs`)
+```rust
+use gliner2_inference::{Gliner2Engine, ModelType, SchemaTask};
+// Auto-downloads the V2 FP16 fragments from this HuggingFace repo
+// and switches to the high-performance IOBinding engine.
+let engine = Gliner2Engine::from_pretrained(
+    "SemplificaAI/gliner2-privacy-filter-PII-multi",
+    Some("fp16_v2"),
+    ModelType::HuggingFace,
+)?;
+let text = "Please contact Maria Jensen at maria.jensen@example.dk or +45 20 12 34 56.";
+let tasks = vec![
+    SchemaTask::Entities(vec![
+        "person".into(), "email".into(), "phone_number".into(),
+    ])
+];
+let (entities, _, _) = engine.extract(text, &tasks)?;
+```
+Requires **`gliner2-rs >= 0.4.1`** for automatic V2 detection / IOBinding routing.
+## 🐍 Usage in Python (`onnxruntime`)
+Run the 8-fragment pipeline manually (no Python `gliner2` dependency needed):
+```python
+import onnxruntime as ort
+# Per fragment (example for the encoder, CUDA backend)
+encoder = ort.InferenceSession(
+    "encoder_fp16_iobinding.onnx",
+    providers=["CUDAExecutionProvider"],
+)
+# ...load the other 7 fragments analogously...
+# Chain them via IOBinding (see validate_onnx_v2.py for a full reference impl)
+```
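As a sketch of what "load the other 7 fragments analogously" could look like, the helper below builds the eight file paths from the `{fragment}_{suffix}.onnx` naming used in this commit and opens one session per fragment. The helper names `fragment_paths` and `load_sessions` are hypothetical, and the CPU fallback provider is an addition that is not in the snippet above. Chaining the sessions through IOBinding is not shown.

```python
from pathlib import Path

# Fragment names as they appear in this commit's file list.
FRAGMENTS = (
    "encoder", "token_gather", "span_rep", "schema_gather",
    "count_pred_argmax", "count_lstm_fixed", "scorer", "classifier",
)


def fragment_paths(model_dir, suffix="fp16_iobinding"):
    """Map each fragment name to its expected ONNX file path."""
    return {
        name: Path(model_dir) / f"{name}_{suffix}.onnx" for name in FRAGMENTS
    }


def load_sessions(model_dir, suffix="fp16_iobinding",
                  providers=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Open one onnxruntime session per fragment (assumed local files)."""
    import onnxruntime as ort  # imported lazily so path logic works without it

    paths = fragment_paths(model_dir, suffix)
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing fragments: {missing}")
    return {
        name: ort.InferenceSession(str(path), providers=list(providers))
        for name, path in paths.items()
    }
```

Calling `load_sessions("path/to/download")` would then return a dict keyed by fragment name, which keeps the later wiring code independent of file names.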
+For a simpler entry point you can keep using the original PyTorch model via the `gliner2` Python package on `fastino/gliner2-privacy-filter-PII-multi`; this ONNX repo is optimised for **production deployment without Python**.
+---
+## 🛠 Pipeline Wiring (IOBinding chain)
+```
+encoder_fp16_iobinding.onnx
+    │
+    ├─ token_gather_fp16_iobinding.onnx
+    │       └─ span_rep_fp16_iobinding.onnx
+    │
+    └─ schema_gather_fp16_iobinding.onnx
+            ├─ count_pred_argmax_fp16_iobinding.onnx  →  pred_count (int64)
+            └─ count_lstm_fixed_fp16_iobinding.onnx
+                    └─ scorer_fp16_iobinding.onnx     →  entity_scores
+classifier_fp16_iobinding.onnx (only for classification tasks)
+```
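The diagram above can be written down as a dependency graph, from which a valid execution order follows mechanically. This is a minimal sketch using the standard-library `graphlib` module (Python 3.9+). The edges are transcribed from the diagram only: it does not draw how the span representations reach the scorer, so no such edge is added, and the classifier is left out because it runs only for classification tasks. `DEPENDS_ON` and `execution_order` are illustrative names.

```python
from graphlib import TopologicalSorter

# Each fragment maps to the set of fragments whose output it consumes,
# as drawn in the wiring diagram.
DEPENDS_ON = {
    "encoder": set(),
    "token_gather": {"encoder"},
    "span_rep": {"token_gather"},
    "schema_gather": {"encoder"},
    "count_pred_argmax": {"schema_gather"},
    "count_lstm_fixed": {"schema_gather"},
    "scorer": {"count_lstm_fixed"},
}


def execution_order(graph=DEPENDS_ON):
    """Return one order in which every fragment runs after its inputs."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order()))
```

Several orders are valid, since the token branch and the schema branch are independent after the encoder; a runtime could overlap them.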
+---
+## ⚙️ Technical Notes
+- **opset 17** (ONNX 1.14+) for maximum execution-provider compatibility.
+- `count_lstm_fixed` exports the GRU **unrolled to 20 fixed steps** at tracing time → compatible with execution providers that don't support dynamic loops (Apple CoreML, Qualcomm QNN).
+- `scorer` uses **fused Reshape + MatMul + Transpose** instead of `Einsum` for compatibility with QNN/CoreML FP16.
+- **INT8 not supported**: the DeBERTa-v3 disentangled-attention activations contain extreme outliers that saturate 8-bit ranges (the same limitation called out by the GLiNER2 maintainers). FP16 remains the optimal compression target.
+- **Encoder size**: ~1060 MiB FP32 → ~530 MiB FP16. Larger than the multi-v1 base because of the wider classification head (42 PII labels) and per-language fine-tuning.
+## 🪪 License
+Apache 2.0 — same as the upstream model.
+## 🙏 Acknowledgements
+- Upstream model: [`fastino/gliner2-privacy-filter-PII-multi`](https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi) by Fastino AI.
+- GLiNER2 paper: Zaratiana et al., *GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction*, EMNLP 2025.
+- ONNX fragmentation + IOBinding strategy: Semplifica s.r.l., as used in [`gliner2-multi-v1-onnx`](https://huggingface.co/SemplificaAI/gliner2-multi-v1-onnx).
+## 📚 Citation
+```bibtex
+@misc{fastino2026gliner2pii,
+  title   = {GLiNER2-PII: Multilingual PII Extraction via Synthetic Fine-Tuning},
+  author  = {{Fastino AI Team}},
+  year    = {2026},
+  url     = {https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi}
+}
+@inproceedings{zaratiana-etal-2025-gliner2,
+  title     = {GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction},
+  author    = {Zaratiana, Urchade and Pasternak, Gil and Boyd, Oliver and Hurn-Maloney, George and Lewis, Ash},
+  booktitle = {Proceedings of EMNLP 2025: System Demonstrations},
+  year      = {2025}
+}
+```

classifier_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05496dbc57d06ae19a41ee740419880baf00c704ce78ced483732703ce336a6c
+size 2366948

classifier_fp16_iobinding.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c3aad6fa0380622969453a0577a1f93f3df27f401500268407066a8cb2569545
+size 2366665

classifier_fp32.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a59406b58c7bb448174c8df32b46d82e1b77b64bcb497da5b481836600ff3ad9
+size 4731777

count_lstm_fixed_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:83d4f991fbdb65226ef1dfdf6095f08b960547c3a1a6c1e3d45cd539ff92a1b3
+size 21288145

count_lstm_fixed_fp16_iobinding.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3ac960cd67c202214138419b2fcd5ff8c5f7d59b2f7fcd09dfe0285d91f7e338
+size 21287867

count_lstm_fixed_fp32.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:17710c32f639f28f3193ea286226b92757dbfa4a67d86432cf252d479ed86f7d
+size 42567350

count_pred_argmax_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a353e6b967a09f78d520e28c237b52ccfda5113de6e4345158d1e5e5cb963c9b
+size 2425035

count_pred_argmax_fp16_iobinding.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:10a09754c9843949eb47d284399fce1689cad3b2a3b4e01e84ad38b6b01f23ef
+size 2424913

count_pred_argmax_fp32.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c67fe0b5b1d1d6b1832a82ccb88153e571033e39f7d46434e661e7befd38294c
+size 4848570

encoder_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fd75d6cd79523848e788d473a806de918f7fcc9b20b26ef3091238234eedee6c
+size 556324340

encoder_fp16_iobinding.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1225912da466852b8731443e7830851bd2fd0be45d5da67a09a54c2db2cb7502
+size 556324203

encoder_fp32.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7cd07f04235f5c9e879f69e361088be3b07bb6930bf05340050088f7994f638e
+size 1111055954

schema_gather_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:255437e8b4a2af91951421309208fc188859b74062b88359ea565bdb0309ccdf
+size 2145

schema_gather_fp16_iobinding.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1733420ac986cf5b1bc2ef0891f0011d461920504dfd8da06ebf3493944fa7d7
+size 1743

schema_gather_fp32.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:24ceb4ae477e36a4291f8a29636e56cfe435d857c71b9248245d4eed62d67ab7
+size 1334

scorer_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ebbdf71ab00f6382f1effe04c586207ab85ac0b941688063005c384fa2e0458e
+size 5876

scorer_fp16_iobinding.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5d725406cdfaab03cdb1a7f6271180d456807096e6f5380ad32c97c69748693e
+size 5445

scorer_fp32.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8fd9305a1d393d0eb9551c2dabaf7707aeea80910d246588b6ff8576bcab8cce
+size 3831

span_rep_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:df0d2e318daf16e0b85c1c5ed0b223d529f26a09865aea29e805e0191b8041c2
+size 33074792

span_rep_fp16_iobinding.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f06716a2fb0f82fe591fea051353024ea4fc48467a8c6f234378b7a534c3b715
+size 33074494

span_rep_fp32.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a526d00a3ef794223b8683f2ede9b15bda6a5190485539ecec661ef78194d4a1
+size 66121780

token_gather_fp16.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9f69059ca4db687f07d6a11d1c8f3c59780aff374978aec6fc51798beb13313a
+size 524

token_gather_fp16_iobinding.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7b9bd8ed770f2312ed5762b1209034e3c438fdea395eea7ecd9234a4ac72bd5d
+size 252

token_gather_fp32.onnx ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:739b66e91492ba6cdcf779497bdc9c322eaa141a9a1e2200c1db1c02ef63133b
+size 252

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f6df10ec83bea993035b2dd7c39345a3d4fcf23421c2adb6cb4ffc1e6d1bc4b5
+size 16020604
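
Each added file above is a three-line Git LFS pointer (`version`, `oid sha256:…`, `size`). After downloading the real files, a pointer can be used to confirm that a download is complete and uncorrupted. The sketch below is one way to do that with the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is expected without the leading `+` of the diff.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(
        line.split(" ", 1) for line in text.splitlines() if line.strip()
    )
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    return {
        "version": fields["version"],
        "sha256": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file's size and SHA-256 against a parsed pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated file
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so the ~1 GB encoder is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["sha256"]
```

For example, parsing the `token_gather_fp32.onnx` pointer yields a size of 252 bytes, which `matches_pointer` compares before hashing the file.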