---
library_name: gliner2
license: apache-2.0
base_model: fastino/gliner2-privacy-filter-PII-multi
pipeline_tag: token-classification
tags:
- token-classification
- gliner2
- gliner
- onnx
- rust
- pii
- ner
- privacy
- redaction
- information-extraction
- span-extraction
- iobinding
language:
- en
- fr
- es
- de
- it
- pt
- nl
---
# GLiNER2 Privacy-Filter PII Multi (ONNX Fragmented & IOBinding)
This repository contains the **ONNX-exported weights** of [`fastino/gliner2-privacy-filter-PII-multi`](https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi),
the multilingual **PII detection model** built on GLiNER2 by Fastino AI.
The model is exported in a **fragmented format** (encoder, token_gather, span_rep, schema_gather, count_pred_argmax, count_lstm_fixed, scorer, classifier) for direct compatibility with [gliner2-rs](https://github.com/SemplificaAI/gliner2-rs), the official **Zero-Python Native Rust inference engine** for GLiNER2.
It supports detection of **42 PII entity types** across **7 languages** (EN, FR, ES, DE, IT, PT, NL).
---
## 🆕 V2 Zero-Copy IOBinding Models
Like the [`gliner2-multi-v1-onnx`](https://huggingface.co/SemplificaAI/gliner2-multi-v1-onnx) base release, this repo ships the **V2 fused IOBinding** variant. The `Gather`, `ArgMax`, and `MatMul` operations are fused directly into the ONNX graphs, so tensors **never leave GPU/NPU VRAM** between fragments. This avoids round-trips over the PCIe bus and cuts inference latency by ~30% on discrete GPUs.
## 📂 Available Variants
| Variant | Use case | Notes |
|---|---|---|
| **`fp16_v2`** *(recommended)* | NVIDIA CUDA · AMD ROCm · Apple CoreML · Qualcomm QNN | Zero-Copy VRAM (IOBinding), full FP16 IO, fused ops |
| **`fp32_v2`** | CPU (AVX2 / XNNPACK / ARM NEON) | High-precision V2 fusions for CPU |
| **`fp16`** *(standard)* | Legacy-compatible, all execution providers (EPs) | FP32 IO (CoreML-compatible), slower on CUDA due to PCIe round-trips |
| **`fp32`** *(standard)* | Universal fallback | Legacy Float32 |
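
As a rough illustration of how the table above might be applied, the sketch below picks a variant from a list of ONNX Runtime execution-provider names. The provider strings are ONNX Runtime's standard names. The mapping itself is only one reading of the table, and `pick_variant` is a hypothetical helper, not part of `gliner2-rs` or this repository.

```python
# Execution providers that the variants table lists for the fused FP16 build.
# Assumption: any of these being available means `fp16_v2` is the better choice.
ACCELERATED_PROVIDERS = {
    "CUDAExecutionProvider",
    "ROCMExecutionProvider",
    "CoreMLExecutionProvider",
    "QNNExecutionProvider",
}


def pick_variant(available_providers):
    """Return 'fp16_v2' when an accelerator EP is present, else 'fp32_v2'."""
    if ACCELERATED_PROVIDERS.intersection(available_providers):
        return "fp16_v2"
    return "fp32_v2"


if __name__ == "__main__":
    print(pick_variant(["CUDAExecutionProvider", "CPUExecutionProvider"]))
    print(pick_variant(["CPUExecutionProvider"]))
```

In practice the argument would typically come from `onnxruntime.get_available_providers()`.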
Each variant ships 8 fragments:
```
encoder_{precision}.onnx             ~530–1060 MB
token_gather_{precision}.onnx        <1 MB
span_rep_{precision}.onnx            ~32–63 MB
schema_gather_{precision}.onnx       <1 MB
count_pred_argmax_{precision}.onnx   ~2–5 MB
count_lstm_fixed_{precision}.onnx    ~20–41 MB
scorer_{precision}.onnx              <1 MB
classifier_{precision}.onnx          ~2–5 MB
```
Total: **~590 MB (FP16)** or **~1.17 GB (FP32)** per variant.
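
Before loading a variant it can help to confirm that all eight fragments are present locally. The sketch below builds the expected file names from the `{fragment}_{precision}.onnx` pattern listed above. `fragment_paths` and `missing_fragments` are hypothetical helpers, and the exact precision suffix should be taken from the file names actually present in the repo (the wiring diagram later in this card uses `fp16_iobinding`, for example).

```python
from pathlib import Path

# The eight fragment names, in the order listed in this model card.
FRAGMENTS = (
    "encoder",
    "token_gather",
    "span_rep",
    "schema_gather",
    "count_pred_argmax",
    "count_lstm_fixed",
    "scorer",
    "classifier",
)


def fragment_paths(model_dir, precision):
    """Expected path of each fragment for one precision suffix."""
    return [Path(model_dir) / f"{name}_{precision}.onnx" for name in FRAGMENTS]


def missing_fragments(model_dir, precision):
    """File names that are expected but not found in `model_dir`."""
    return [p.name for p in fragment_paths(model_dir, precision) if not p.is_file()]
```

An empty list from `missing_fragments(...)` means the variant is complete on disk.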
---
## 🎯 Supported PII Labels (42 types)
### Person / Names (6 labels)
`person`, `full_name`, `first_name`, `middle_name`, `last_name`, `date_of_birth`
### Contact / Address (8 labels)
`email`, `phone_number`, `address`, `street_address`, `city`, `state_or_region`, `postal_code`, `country`
### Government / Tax IDs (7 labels)
`government_id`, `national_id_number`, `passport_number`, `drivers_license_number`, `license_number`, `tax_id`, `tax_number`
### Banking / Payment (8 labels)
`bank_account`, `account_number`, `routing_number`, `iban`, `payment_card`, `card_number`, `card_expiry`, `card_cvv`
### Digital Identity (4 labels)
`username`, `ip_address`, `account_id`, `sensitive_account_id`
### Secrets / Credentials (5 labels)
`password`, `secret`, `api_key`, `access_token`, `recovery_code`
### Sensitive Dates (4 labels)
`sensitive_date`, `document_date`, `expiration_date`, `transaction_date`
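
For callers that want to request labels by category, the lists above can be kept in a small lookup table. This is only a convenience sketch: the group keys (`person_names`, `contact_address`, and so on) are names chosen here for illustration, while the label strings are copied from the lists above.

```python
# Label strings copied from this model card; group keys are illustrative.
PII_LABELS = {
    "person_names": ["person", "full_name", "first_name", "middle_name",
                     "last_name", "date_of_birth"],
    "contact_address": ["email", "phone_number", "address", "street_address",
                        "city", "state_or_region", "postal_code", "country"],
    "government_tax_ids": ["government_id", "national_id_number",
                           "passport_number", "drivers_license_number",
                           "license_number", "tax_id", "tax_number"],
    "banking_payment": ["bank_account", "account_number", "routing_number",
                        "iban", "payment_card", "card_number", "card_expiry",
                        "card_cvv"],
    "digital_identity": ["username", "ip_address", "account_id",
                         "sensitive_account_id"],
    "secrets_credentials": ["password", "secret", "api_key", "access_token",
                            "recovery_code"],
    "sensitive_dates": ["sensitive_date", "document_date", "expiration_date",
                        "transaction_date"],
}

ALL_LABELS = [label for group in PII_LABELS.values() for label in group]


def labels_for(*groups):
    """Flatten the labels of the requested groups, preserving order."""
    return [label for group in groups for label in PII_LABELS[group]]
```

For example, `labels_for("contact_address", "banking_payment")` yields the 16 labels of those two categories, which can then be passed as the entity schema.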
---
## 🚀 Usage in Rust (`gliner2-rs`)
```rust
use gliner2_inference::{Gliner2Engine, ModelType, SchemaTask};

// Auto-downloads the V2 FP16 fragments from this HuggingFace repo
// and switches to the high-performance IOBinding engine.
let engine = Gliner2Engine::from_pretrained(
    "SemplificaAI/gliner2-privacy-filter-PII-multi",
    Some("fp16_v2"),
    ModelType::HuggingFace,
)?;

let text = "Please contact Maria Jensen at maria.jensen@example.dk or +45 20 12 34 56.";

// Request only the entity types of interest.
let tasks = vec![
    SchemaTask::Entities(vec![
        "person".into(), "email".into(), "phone_number".into(),
    ])
];

let (entities, _, _) = engine.extract(text, &tasks)?;
```
Requires **`gliner2-rs >= 0.4.1`** for automatic V2 detection / IOBinding routing.
## 🐍 Usage in Python (`onnxruntime`)
Run the 8-fragment pipeline manually (no Python `gliner2` dependency needed):
```python
import onnxruntime as ort

# Per fragment (example for the encoder, CUDA backend)
encoder = ort.InferenceSession(
    "encoder_fp16_iobinding.onnx",
    providers=["CUDAExecutionProvider"],
)

# ...load the other 7 fragments analogously...
# Chain them via IOBinding (see validate_onnx_v2.py for a full reference impl)
```
For a simpler entry point you can keep using the original PyTorch model via the `gliner2` Python package on `fastino/gliner2-privacy-filter-PII-multi`; this ONNX repo is optimised for **production deployment without Python**.
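
The chaining step mentioned in the Python snippet above could look roughly like the helper below. It is a minimal sketch built on ONNX Runtime's `io_binding()` API, not the logic of `validate_onnx_v2.py`. `run_on_device` is a hypothetical name, and the tensor names that connect one fragment to the next are not documented in this card, so they must be read from each session's `get_inputs()` and `get_outputs()`.

```python
def run_on_device(session, feeds, device="cuda", device_id=0):
    """Run one fragment and keep its outputs on `device`.

    `feeds` maps input names to either host arrays (NumPy) or device-resident
    OrtValues returned by a previous call. Outputs come back as OrtValues, so
    they can be fed to the next fragment without a copy to host memory.
    """
    binding = session.io_binding()
    for name, value in feeds.items():
        # Duck-typing assumption: OrtValue exposes data_ptr(), NumPy arrays do not.
        if hasattr(value, "data_ptr"):
            binding.bind_ortvalue_input(name, value)
        else:
            binding.bind_cpu_input(name, value)

    output_names = [out.name for out in session.get_outputs()]
    for name in output_names:
        # Without an explicit shape, ONNX Runtime allocates the output on the device.
        binding.bind_output(name, device_type=device, device_id=device_id)

    session.run_with_iobinding(binding)
    # Assumption: outputs are returned in the order in which they were bound.
    return dict(zip(output_names, binding.get_outputs()))


# Usage sketch (requires onnxruntime with a GPU provider and the fragments):
#   import onnxruntime as ort
#   encoder = ort.InferenceSession("encoder_fp16_iobinding.onnx",
#                                  providers=["CUDAExecutionProvider"])
#   enc_out = run_on_device(encoder, host_inputs)   # keys from encoder.get_inputs()
#   next_out = run_on_device(next_session, {input_name: enc_out[output_name]})
```

Only the final scores need to be copied back to the host, for example with `OrtValue.numpy()`.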
---
## 🛠 Pipeline Wiring (IOBinding chain)
```
encoder_fp16_iobinding.onnx
  │
  ├─ token_gather_fp16_iobinding.onnx
  │     └─ span_rep_fp16_iobinding.onnx
  │
  └─ schema_gather_fp16_iobinding.onnx
        ├─ count_pred_argmax_fp16_iobinding.onnx → pred_count (int64)
        └─ count_lstm_fixed_fp16_iobinding.onnx
              └─ scorer_fp16_iobinding.onnx → entity_scores

classifier_fp16_iobinding.onnx (only for classification tasks)
```
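
The wiring above can also be written down as a dependency graph, which gives a valid execution order for the fragments. The sketch below encodes only the edges drawn in the diagram. The classifier is left out because the diagram does not show which fragment feeds it, and any further inputs a fragment may take (beyond its parent in the tree) are not captured here. `DEPENDS_ON` and `execution_order` are illustrative names.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each fragment maps to the fragments it follows in the diagram.
DEPENDS_ON = {
    "encoder": set(),
    "token_gather": {"encoder"},
    "span_rep": {"token_gather"},
    "schema_gather": {"encoder"},
    "count_pred_argmax": {"schema_gather"},
    "count_lstm_fixed": {"schema_gather"},
    "scorer": {"count_lstm_fixed"},
}


def execution_order():
    """Return one order in which every fragment runs after its parents."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order()))
```

Several orders are valid. The two branches under the encoder are independent in this graph, so a runtime is free to interleave them.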
---
## ⚙️ Technical Notes
- **opset 17** (ONNX 1.14+) for maximum execution-provider compatibility.
- `count_lstm_fixed` exports the GRU **unrolled to 20 fixed steps** at tracing time, which keeps it compatible with execution providers that don't support dynamic loops (Apple CoreML, Qualcomm QNN).
- `scorer` uses **fused Reshape + MatMul + Transpose** instead of `Einsum` for compatibility with QNN/CoreML FP16.
- **INT8 not supported**: the DeBERTa-v3 disentangled-attention activations contain extreme outliers that saturate 8-bit ranges (the same limitation called out by the GLiNER2 maintainers). FP16 remains the optimal compression target.
- **Encoder size**: ~1.06 GB FP32 → ~530 MB FP16. Larger than the multi-v1 base because of the wider classification head (42 PII labels) and per-language fine-tuning.
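
To make the `Einsum` replacement in the notes above concrete, the NumPy sketch below shows that a span-versus-label contraction can be computed with only Reshape, MatMul, and Transpose. The tensor shapes and the einsum expression `"lkd,cd->clk"` are hypothetical and chosen for illustration. The actual contraction inside `scorer` is not specified in this card.

```python
import numpy as np


def scores_with_einsum(spans, labels):
    """Reference: spans (L, K, D) against labels (C, D) -> scores (C, L, K)."""
    return np.einsum("lkd,cd->clk", spans, labels)


def scores_without_einsum(spans, labels):
    """Same contraction using only Reshape, MatMul and Transpose."""
    L, K, D = spans.shape
    C = labels.shape[0]
    flat = spans.reshape(L * K, D)     # Reshape:   (L*K, D)
    scores = flat @ labels.T           # MatMul:    (L*K, C)
    return scores.T.reshape(C, L, K)   # Transpose, then Reshape: (C, L, K)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spans = rng.standard_normal((5, 3, 8)).astype(np.float32)
    labels = rng.standard_normal((4, 8)).astype(np.float32)
    same = np.allclose(scores_with_einsum(spans, labels),
                       scores_without_einsum(spans, labels), atol=1e-5)
    print("equivalent:", same)
```

Both functions compute the same values. The second form uses only operators that are widely supported across execution providers.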
## 🪪 License
Apache 2.0, same as the upstream model.
## 🙏 Acknowledgements
- Upstream model: [`fastino/gliner2-privacy-filter-PII-multi`](https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi) by Fastino AI.
- GLiNER2 paper: Zaratiana et al., *GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction*, EMNLP 2025.
- ONNX fragmentation + IOBinding strategy: Semplifica s.r.l., as used in [`gliner2-multi-v1-onnx`](https://huggingface.co/SemplificaAI/gliner2-multi-v1-onnx).
## 📚 Citation
```bibtex
@misc{fastino2026gliner2pii,
  title  = {GLiNER2-PII: Multilingual PII Extraction via Synthetic Fine-Tuning},
  author = {{Fastino AI Team}},
  year   = {2026},
  url    = {https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi}
}

@inproceedings{zaratiana-etal-2025-gliner2,
  title     = {GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction},
  author    = {Zaratiana, Urchade and Pasternak, Gil and Boyd, Oliver and Hurn-Maloney, George and Lewis, Ash},
  booktitle = {Proceedings of EMNLP 2025: System Demonstrations},
  year      = {2025}
}
```