+---
+language:
+- en
+- fr
+- it
+- es
+- de
+- pt
+tags:
+- gliner
+- ner
+- information-extraction
+- onnx
+- rust
+pipeline_tag: token-classification
+library_name: gliner2-rs
+---
+# GLiNER2 Multi V1 (ONNX)
+This repository contains the ONNX-exported weights for the official [fastino/gliner2-multi-v1](https://huggingface.co/fastino/gliner2-multi-v1) model.
+The model has been exported, split into fragments, and optimized for native use in **Rust** with the [SemplificaAI/gliner2-rs](https://github.com/SemplificaAI/gliner2-rs) inference engine, which runs on ONNX Runtime.
+## Model Formats Available
+To overcome ONNX static graph limitations with GLiNER2's dynamic routing, the model is split into 5 fragments (`encoder`, `span_rep`, `count_pred`, `count_lstm`, `classifier`).
+- **`fp16/`**: Half-precision ONNX weights (~580 MB total). Highly recommended for edge devices, NPUs (Qualcomm Snapdragon X Elite, Apple Neural Engine), and GPUs.
+- **`fp32/`**: Full-precision ONNX weights (~1.2 GB total). Recommended for standard CPU execution where FP16 is not natively accelerated.
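Because the engine needs all five fragments, it can help to confirm that a downloaded folder is complete before loading it. The following is a minimal sketch using only the standard library. The fragment names come from the list above, but the `<name>.onnx` file naming and the helper `missing_fragments` are assumptions made for illustration, not part of `gliner2-rs`.

```rust
use std::path::Path;

/// Fragment names listed in this model card.
const FRAGMENTS: [&str; 5] = ["encoder", "span_rep", "count_pred", "count_lstm", "classifier"];

/// Returns the fragments whose file is missing from `models_dir`.
/// Assumption: each fragment is stored as `<name>.onnx`; adjust this to the
/// actual file names in the repository.
fn missing_fragments(models_dir: &Path) -> Vec<&'static str> {
    FRAGMENTS
        .iter()
        .copied()
        .filter(|name| !models_dir.join(format!("{name}.onnx")).is_file())
        .collect()
}
```

Calling `missing_fragments(Path::new("./models/gliner2_multi_v1_fp16"))` returns an empty vector when every assumed file is present.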
+## 🚀 Usage with Rust (`gliner2-rs`)
+This model is designed to be used with the zero-Python Rust inference engine, allowing you to run complex multi-task NLP pipelines with native performance and hardware acceleration.
+### 1. Installation
+Add the engine to your `Cargo.toml`:
+```toml
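+[dependencies]
+gliner2_inference = { git = "https://github.com/SemplificaAI/gliner2-rs" }
+# Replace "cuda" with the feature for your execution provider (for example "coreml" or "openvino").
+ort = { version = "2.0.0-rc.9", features = ["cuda", "half"] }
+# The inference example below returns `anyhow::Result`.
+anyhow = "1"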
+```
+### 2. Download Weights
+Download the contents of the `fp16/` folder to a local directory, for example `./models/gliner2_multi_v1_fp16/`.
+### 3. Rust Inference Example
+```rust
+use gliner2_inference::{Gliner2Engine, Gliner2Config, SchemaTask, ModelType};
+fn main() -> anyhow::Result<()> {
+    // 1. Initialize the ONNX Runtime environment. This call only names the environment;
+    //    it does not itself register an execution provider (CUDA, QNN, CoreML, etc.).
+    ort::init().with_name("GLiNER2_Engine").commit()?;
+    // 2. Configure engine pointing to the downloaded FP16 fragments
+    let config = Gliner2Config {
+        models_dir: "./models/gliner2_multi_v1_fp16".to_string(),
+        max_width: 8, // Max tokens per span
+        model_type: ModelType::HuggingFace, // Automatically routes tensors correctly
+    };
+    // 3. Load Session
+    let engine = Gliner2Engine::new(config)?;
+    let text = "Apple Inc. announced its quarterly earnings report on January 15, 2024, showing a revenue of $119.6 billion.";
+    // 4. Define dynamic Schema Tasks
+    let tasks = vec![
+        SchemaTask::Entities(vec![
+            "person_name".to_string(),
+            "organization_name".to_string(),
+            "date".to_string(),
+            "amount".to_string()
+        ])
+    ];
+    // 5. Run every task in a single forward pass (relations and classifications are not used here)
+    let (entities, _relations, _classifications) = engine.extract(text, &tasks)?;
+    for entity in entities {
+        println!("Found: {} (Label: {} - Score: {:.2}%)", entity.text, entity.label, entity.score * 100.0);
+    }
+    Ok(())
+}
+```
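The example above prints every entity the engine returns. In practice you may want to drop low-confidence results first. The sketch below shows one way to do that with the standard library. `Entity` is a stand-in type: it assumes only the three fields printed above (`text`, `label`, `score`), and their concrete types are guesses. `confident_entities` is a hypothetical helper, not part of `gliner2-rs`.

```rust
/// Stand-in for the engine's entity type. Only the three fields used in the
/// example above are assumed, and the concrete field types are guesses.
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    text: String,
    label: String,
    score: f32,
}

/// Keeps entities scoring at least `min_score`, sorted from most to least confident.
fn confident_entities(mut entities: Vec<Entity>, min_score: f32) -> Vec<Entity> {
    entities.retain(|e| e.score >= min_score);
    entities.sort_by(|a, b| b.score.total_cmp(&a.score));
    entities
}
```

A suitable threshold depends on the labels and the text domain, so treat any fixed value as a starting point to tune.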
+## Supported Execution Providers
+Because the model is split into separate ONNX fragments, `gliner2-rs` can route computation to specialized hardware automatically. Each provider must also be enabled through the matching `ort` Cargo feature:
+- **Qualcomm NPU** (`QNNExecutionProvider`)
+- **Apple Silicon** (`CoreMLExecutionProvider`)
+- **Intel / AMD AI** (`OpenVINOExecutionProvider`)
+- **NVIDIA GPU** (`CUDAExecutionProvider`)
+- **ARM64 CPU** (`XNNPACKExecutionProvider`)
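The provider you target also affects which weight folder makes sense. The sketch below turns the guidance from "Model Formats Available" into a small lookup: FP16 for the NPU and GPU providers, FP32 otherwise. `weights_folder` is a hypothetical helper. Sending every other provider (including OpenVINO and XNNPACK) to FP32 is a conservative assumption, not documented engine behavior.

```rust
/// Picks the weight folder ("fp16" or "fp32") for an execution provider name.
/// The mapping follows the guidance in "Model Formats Available". Treating every
/// other provider as FP32 is a conservative assumption, not engine behavior.
fn weights_folder(provider: &str) -> &'static str {
    match provider {
        // NPU and GPU providers: half precision is recommended.
        "QNNExecutionProvider" | "CoreMLExecutionProvider" | "CUDAExecutionProvider" => "fp16",
        // CPU providers and anything not covered above (assumption).
        _ => "fp32",
    }
}
```

If FP16 turns out to be accelerated on your device under another provider, extend the first match arm accordingly.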
+## Acknowledgments
+Original model architecture and weights by [Urchade / Fastino](https://huggingface.co/fastino/gliner2-multi-v1).
+ONNX export pipeline and native Rust engine by [Semplifica s.r.l.](https://semplifica.ai).