---
library_name: gliner2
license: apache-2.0
base_model: fastino/gliner2-privacy-filter-PII-multi
pipeline_tag: token-classification
tags:
- token-classification
- gliner2
- gliner
- onnx
- rust
- pii
- ner
- privacy
- redaction
- information-extraction
- span-extraction
- iobinding
language:
- en
- fr
- es
- de
- it
- pt
- nl
---

# GLiNER2 Privacy-Filter PII Multi (ONNX Fragmented & IOBinding)

This repository contains the **ONNX-exported weights** of [`fastino/gliner2-privacy-filter-PII-multi`](https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi), the multilingual **PII detection model** built on GLiNER2 by Fastino AI.

The model is exported in a **fragmented format** (`encoder`, `token_gather`, `span_rep`, `schema_gather`, `count_pred_argmax`, `count_lstm_fixed`, `scorer`, `classifier`) for direct compatibility with [gliner2-rs](https://github.com/SemplificaAI/gliner2-rs), the official **Zero-Python Native Rust inference engine** for GLiNER2.

It supports detection of **42 PII entity types** across **7 languages** (EN, FR, ES, DE, IT, PT, NL).

---

## V2 Zero-Copy IOBinding Models

Like the [`gliner2-multi-v1-onnx`](https://huggingface.co/SemplificaAI/gliner2-multi-v1-onnx) base release, this repo ships the **V2 fused IOBinding** variant. The `Gather`, `ArgMax`, and `MatMul` operations are fused directly into the ONNX graphs so that tensors **never leave GPU/NPU VRAM** between fragments. This avoids round-trips over the PCIe bus and cuts inference latency by ~30% on discrete GPUs.

## Available Variants

| Variant | Use case | Notes |
|---|---|---|
| **`fp16_v2`** *(recommended)* | NVIDIA CUDA · AMD ROCm · Apple CoreML · Qualcomm QNN | Zero-Copy VRAM (IOBinding), full FP16 IO, fused ops |
| **`fp32_v2`** | CPU (AVX2 / XNNPACK / ARM NEON) | High-precision V2 fusions for CPU |
| **`fp16`** *(standard)* | Legacy-compatible, all EPs | FP32 IO (CoreML-compatible), slower on CUDA due to PCIe round-trips |
| **`fp32`** *(standard)* | Universal fallback | Legacy Float32 |

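
The table can be read as a small lookup from execution provider to variant. The sketch below is only an illustration of that reading: `pick_variant` is a hypothetical helper, not part of `gliner2-rs` or this repo, and the provider strings are the names ONNX Runtime uses for the backends listed in the table.

```python
# Hypothetical helper: maps an ONNX Runtime execution provider name to the
# variant that the table suggests for it. Not part of any shipped API.
ACCELERATOR_PROVIDERS = {
    "CUDAExecutionProvider",    # NVIDIA CUDA
    "ROCMExecutionProvider",    # AMD ROCm
    "CoreMLExecutionProvider",  # Apple CoreML
    "QNNExecutionProvider",     # Qualcomm QNN
}
CPU_PROVIDERS = {
    "CPUExecutionProvider",      # default CPU backend
    "XnnpackExecutionProvider",  # XNNPACK
}


def pick_variant(provider: str) -> str:
    """Return the variant name the table recommends for a provider."""
    if provider in ACCELERATOR_PROVIDERS:
        return "fp16_v2"
    if provider in CPU_PROVIDERS:
        return "fp32_v2"
    # Anything else: fall back to the universal FP32 export.
    return "fp32"


print(pick_variant("CUDAExecutionProvider"))
print(pick_variant("CPUExecutionProvider"))
```

The standard `fp16` variant is left out of the lookup on purpose: the table describes it as a legacy-compatible option rather than a first choice for any backend.
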
Each variant ships 8 fragments, where `{precision}` stands for the variant's file suffix (for example, `fp16_iobinding` in the Python and wiring examples below):

```
encoder_{precision}.onnx             ~530–1060 MB
token_gather_{precision}.onnx        <1 MB
span_rep_{precision}.onnx            ~32–63 MB
schema_gather_{precision}.onnx       <1 MB
count_pred_argmax_{precision}.onnx   ~2–5 MB
count_lstm_fixed_{precision}.onnx    ~20–41 MB
scorer_{precision}.onnx              <1 MB
classifier_{precision}.onnx          ~2–5 MB
```

Total: **~590 MB (FP16)** or **~1.17 GB (FP32)** per variant.
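
As a quick sanity check on the listing, the sketch below expands the naming pattern into concrete file names and adds up the approximate sizes. It assumes that the lower end of each size range is the FP16 export and the upper end is FP32 (the card states this explicitly only for the encoder), and it counts every "<1 MB" fragment as 1 MB, so the totals are upper bounds.

```python
FRAGMENTS = [
    "encoder", "token_gather", "span_rep", "schema_gather",
    "count_pred_argmax", "count_lstm_fixed", "scorer", "classifier",
]

# Approximate sizes in MB as (FP16, FP32), read off the listing.
# Assumption: range minimum = FP16, range maximum = FP32; "<1 MB" counted as 1.
APPROX_MB = {
    "encoder": (530, 1060),
    "token_gather": (1, 1),
    "span_rep": (32, 63),
    "schema_gather": (1, 1),
    "count_pred_argmax": (2, 5),
    "count_lstm_fixed": (20, 41),
    "scorer": (1, 1),
    "classifier": (2, 5),
}


def fragment_files(precision: str) -> list[str]:
    """Expand the `{name}_{precision}.onnx` pattern for all 8 fragments."""
    return [f"{name}_{precision}.onnx" for name in FRAGMENTS]


fp16_total = sum(fp16 for fp16, _ in APPROX_MB.values())
fp32_total = sum(fp32 for _, fp32 in APPROX_MB.values())

print(fragment_files("fp16_iobinding"))
print(f"FP16 total: about {fp16_total} MB")
print(f"FP32 total: about {fp32_total / 1000:.2f} GB")
```

Under these assumptions the sums come to 589 MB and 1177 MB, which agrees with the rounded totals quoted above.
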

---

## Supported PII Labels (42 types)

### Person / Names (6 labels)

`person`, `full_name`, `first_name`, `middle_name`, `last_name`, `date_of_birth`

### Contact / Address (8 labels)

`email`, `phone_number`, `address`, `street_address`, `city`, `state_or_region`, `postal_code`, `country`

### Government / Tax IDs (7 labels)

`government_id`, `national_id_number`, `passport_number`, `drivers_license_number`, `license_number`, `tax_id`, `tax_number`

### Banking / Payment (8 labels)

`bank_account`, `account_number`, `routing_number`, `iban`, `payment_card`, `card_number`, `card_expiry`, `card_cvv`

### Digital Identity (4 labels)

`username`, `ip_address`, `account_id`, `sensitive_account_id`

### Secrets / Credentials (5 labels)

`password`, `secret`, `api_key`, `access_token`, `recovery_code`

### Sensitive Dates (4 labels)

`sensitive_date`, `document_date`, `expiration_date`, `transaction_date`
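
Because labels are passed to the engine as plain strings, a typo such as `phone` instead of `phone_number` is easy to miss. The sketch below keeps the catalogue above as a dictionary and flags requested labels that are not in it. `unknown_labels` is a hypothetical helper written for this illustration; the inference engine may well accept other label strings, so treat the check as a lint, not a hard constraint.

```python
PII_LABELS = {
    "Person / Names": [
        "person", "full_name", "first_name", "middle_name", "last_name",
        "date_of_birth",
    ],
    "Contact / Address": [
        "email", "phone_number", "address", "street_address", "city",
        "state_or_region", "postal_code", "country",
    ],
    "Government / Tax IDs": [
        "government_id", "national_id_number", "passport_number",
        "drivers_license_number", "license_number", "tax_id", "tax_number",
    ],
    "Banking / Payment": [
        "bank_account", "account_number", "routing_number", "iban",
        "payment_card", "card_number", "card_expiry", "card_cvv",
    ],
    "Digital Identity": [
        "username", "ip_address", "account_id", "sensitive_account_id",
    ],
    "Secrets / Credentials": [
        "password", "secret", "api_key", "access_token", "recovery_code",
    ],
    "Sensitive Dates": [
        "sensitive_date", "document_date", "expiration_date",
        "transaction_date",
    ],
}

# Flat set of every documented label, used for membership checks.
ALL_LABELS = {label for group in PII_LABELS.values() for label in group}


def unknown_labels(requested):
    """Return the requested labels that are not in the catalogue, in order."""
    return [label for label in requested if label not in ALL_LABELS]


print(len(ALL_LABELS))
print(unknown_labels(["person", "email", "phone"]))
```
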

---

## Usage in Rust (`gliner2-rs`)

```rust
use gliner2_inference::{Gliner2Engine, ModelType, SchemaTask};

// Note: `?` requires an enclosing function that returns `Result`.

// Auto-downloads the V2 FP16 fragments from this HuggingFace repo
// and switches to the high-performance IOBinding engine.
let engine = Gliner2Engine::from_pretrained(
    "SemplificaAI/gliner2-privacy-filter-PII-multi",
    Some("fp16_v2"),
    ModelType::HuggingFace,
)?;

let text = "Please contact Maria Jensen at maria.jensen@example.dk or +45 20 12 34 56.";
let tasks = vec![
    SchemaTask::Entities(vec![
        "person".into(), "email".into(), "phone_number".into(),
    ])
];

// The first tuple element holds the extracted entities; the other two are ignored here.
let (entities, _, _) = engine.extract(text, &tasks)?;
```

Requires **`gliner2-rs >= 0.4.1`** for automatic V2 detection / IOBinding routing.

## Usage in Python (`onnxruntime`)

Run the 8-fragment pipeline manually (no Python `gliner2` dependency needed):

```python
import onnxruntime as ort

# Per fragment (example for the encoder, CUDA backend)
encoder = ort.InferenceSession(
    "encoder_fp16_iobinding.onnx",
    providers=["CUDAExecutionProvider"],
)
# ...load the other 7 fragments analogously...
# Chain them via IOBinding (see validate_onnx_v2.py for a full reference impl)
```

For a simpler entry point, you can keep using the original PyTorch model via the `gliner2` Python package on `fastino/gliner2-privacy-filter-PII-multi`; this ONNX repo is optimised for **production deployment without Python**.
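
Whichever runtime produces the entity spans, redaction itself is a small post-processing step. The sketch below assumes each detection is available as a `(start, end, label)` character span and that spans do not overlap. The `Span` type and the `redact` and `locate` helpers are hypothetical and only serve this illustration; the actual output structure of `gliner2-rs` or the `gliner2` package may differ.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def redact(text: str, spans: list[Span]) -> str:
    """Replace each (non-overlapping) span with an upper-cased [LABEL] tag."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        out = out[:span.start] + f"[{span.label.upper()}]" + out[span.end:]
    return out


text = "Please contact Maria Jensen at maria.jensen@example.dk or +45 20 12 34 56."


def locate(needle: str, label: str) -> Span:
    """Stand-in for model output: find a known substring and label it."""
    start = text.index(needle)
    return Span(start, start + len(needle), label)


spans = [
    locate("Maria Jensen", "person"),
    locate("maria.jensen@example.dk", "email"),
    locate("+45 20 12 34 56", "phone_number"),
]

print(redact(text, spans))
```

For the sample sentence this prints `Please contact [PERSON] at [EMAIL] or [PHONE_NUMBER].`
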

---

## Pipeline Wiring (IOBinding chain)

```
encoder_fp16_iobinding.onnx
│
├─ token_gather_fp16_iobinding.onnx
│   └─ span_rep_fp16_iobinding.onnx
│
└─ schema_gather_fp16_iobinding.onnx
    ├─ count_pred_argmax_fp16_iobinding.onnx → pred_count (int64)
    ├─ count_lstm_fixed_fp16_iobinding.onnx
    └─ scorer_fp16_iobinding.onnx → entity_scores
       classifier_fp16_iobinding.onnx (only for classification tasks)
```
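
One way to turn the diagram into a run order is a topological sort. The sketch below encodes only the parent/child edges drawn in the diagram, which is an assumption about how to read it; the real tensor-level dependencies (for example, which fragments feed the scorer) are defined by the reference implementation and are not reproduced here. The optional `classifier` fragment is left out.

```python
from graphlib import TopologicalSorter

# fragment -> fragments it depends on, read off the diagram.
# Assumption: each child in the tree depends only on its drawn parent.
DEPENDS_ON = {
    "encoder": set(),
    "token_gather": {"encoder"},
    "span_rep": {"token_gather"},
    "schema_gather": {"encoder"},
    "count_pred_argmax": {"schema_gather"},
    "count_lstm_fixed": {"schema_gather"},
    "scorer": {"schema_gather"},
}

# static_order() yields every fragment after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Any order produced this way starts with the encoder and runs each gather fragment before the fragments listed beneath it.
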

---

## Technical Notes

- **opset 17** (ONNX 1.14+) for maximum execution-provider compatibility.
- `count_lstm_fixed` exports the GRU **unrolled to 20 fixed steps** at tracing time, which keeps it compatible with execution providers that don't support dynamic loops (Apple CoreML, Qualcomm QNN).
- `scorer` uses **fused Reshape + MatMul + Transpose** instead of `Einsum` for compatibility with QNN/CoreML FP16.
- **INT8 not supported**: the DeBERTa-v3 disentangled-attention activations contain extreme outliers that saturate 8-bit ranges (the same limitation called out by the GLiNER2 maintainers). FP16 remains the optimal compression target.
- **Encoder size**: ~1.06 GB FP32 → ~530 MB FP16. Larger than the multi-v1 base because of the wider classification head (42 PII labels) and per-language fine-tuning.
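
The INT8 note can be made concrete with a toy example. The sketch below applies plain symmetric per-tensor INT8 quantization to a handful of made-up activation values, once without and once with a single large outlier. The numbers are invented for illustration and the scheme is a generic one, not the quantizer of any particular toolchain.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: returns (codes, scale)."""
    # The largest magnitude in the tensor decides the step size.
    scale = max(abs(v) for v in values) / 127
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


typical = [0.02, -0.05, 0.11, 0.07]   # made-up "normal" activations
with_outlier = typical + [250.0]      # same values plus one extreme outlier

codes_typical, _ = quantize_int8(typical)
codes_outlier, _ = quantize_int8(with_outlier)

print(codes_typical)
print(codes_outlier)
```

Without the outlier the four values map to distinct codes (`[23, -58, 127, 81]`). With it, the outlier takes the top code and every ordinary value rounds to zero (`[0, 0, 0, 0, 127]`), so the information in the normal activations is lost. FP16 keeps a floating exponent per value and does not have this problem at these magnitudes.
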

## License

Apache 2.0, same as the upstream model.

## Acknowledgements

- Upstream model: [`fastino/gliner2-privacy-filter-PII-multi`](https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi) by Fastino AI.
- GLiNER2 paper: Zaratiana et al., *GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction*, EMNLP 2025.
- ONNX fragmentation + IOBinding strategy: Semplifica s.r.l., as used in [`gliner2-multi-v1-onnx`](https://huggingface.co/SemplificaAI/gliner2-multi-v1-onnx).

## Citation

```bibtex
@misc{fastino2026gliner2pii,
  title  = {GLiNER2-PII: Multilingual PII Extraction via Synthetic Fine-Tuning},
  author = {{Fastino AI Team}},
  year   = {2026},
  url    = {https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi}
}

@inproceedings{zaratiana-etal-2025-gliner2,
  title     = {GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction},
  author    = {Zaratiana, Urchade and Pasternak, Gil and Boyd, Oliver and Hurn-Maloney, George and Lewis, Ash},
  booktitle = {Proceedings of EMNLP 2025: System Demonstrations},
  year      = {2025}
}
```