GLiNER2 Multi v1 - ONNX

ONNX export of fastino/gliner2-multi-v1 for zero-shot NER + Relation Extraction.

Multilingual (100+ languages via mDeBERTa-v3-base). Tested on English, German, French, Spanish.

Variants

| Variant | Encoder | Total | Quality | Recommended |
|---------|---------|-------|---------|-------------|
| fp16 | 530 MB | ~645 MB | 0.999999 cosine vs FP32 | Yes (default) |
| fp32 | 1059 MB | ~1.2 GB | Baseline | When every bit matters |

FP16 uses a hybrid approach: weights are stored in Float16 on disk, and Cast nodes convert them back to Float32 at runtime. Inference therefore runs in FP32 arithmetic (on weights rounded once to FP16) at half the download and disk size.
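The effect is easy to sanity-check: round-tripping a Float32 tensor through Float16 storage perturbs each value by at most ~0.05%, so cosine similarity against the original stays essentially 1. A minimal sketch with a synthetic weight row (not the real model's tensors):

```python
import numpy as np

# Synthetic 768-dim row standing in for a real encoder weight tensor.
rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal(768).astype(np.float32)

# What the hybrid export does: store as Float16 on disk,
# Cast back to Float32 when the graph runs.
w_hybrid = w_fp32.astype(np.float16).astype(np.float32)

cos = float(w_fp32 @ w_hybrid / (np.linalg.norm(w_fp32) * np.linalg.norm(w_hybrid)))
print(cos)  # extremely close to 1.0
```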

Why not INT8?

INT8 quantization is not recommended for this model. Our testing revealed that INT8 destroys precision at special token positions ([R], [E]) used for relation extraction:

| Token | FP32 vs INT8 cosine | FP32 vs FP16 cosine |
|-------|---------------------|---------------------|
| `[R]` head | 0.805 | 0.999999 |
| `[R]` tail | 0.787 | 0.999999 |
| Regular text | 0.87-0.95 | 0.999999 |

This causes RE scores to flip (e.g., "Tim Cook works_at Apple" becomes "Tim Cook works_at Cupertino"). INT8 is acceptable for NER-only use cases but will produce wrong relation extraction results. The encoder_int8.onnx file is included for NER-only scenarios but not recommended.
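The failure mode is intuitive: per-tensor INT8 must stretch its scale to cover outlier dimensions, which crushes the resolution of every other dimension, while FP16 storage keeps per-element relative error tiny. A toy demonstration with one synthetic outlier (this illustrates the mechanism, not the exact table numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
h = rng.standard_normal(768).astype(np.float32)
h[0] = 50.0  # a single outlier dimension, as activations around special tokens tend to have

def int8_roundtrip(x):
    scale = np.abs(x).max() / 127.0          # symmetric per-tensor scale
    q = np.clip(np.round(x / scale), -127, 127)
    return (q * scale).astype(np.float32)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cos_int8 = cosine(h, int8_roundtrip(h))
cos_fp16 = cosine(h, h.astype(np.float16).astype(np.float32))
print(cos_int8, cos_fp16)  # INT8 lands visibly below FP16
```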

Architecture

5 ONNX models (mDeBERTa-v3-base encoder):

| Model | Inputs | Outputs |
|-------|--------|---------|
| `encoder.onnx` / `encoder_fp16.onnx` | input_ids, attention_mask | hidden_state (batch, seq, 768) |
| `span_rep.onnx` | hidden_states, span_start_idx, span_end_idx | span_representations (batch, spans, 768) |
| `count_embed.onnx` | label_embeddings | transformed_embeddings |
| `count_pred.onnx` | schema_embedding | count_logits |
| `classifier.onnx` | hidden_state | logits |

Special tokens: `[P]`=250104, `[E]`=250106, `[R]`=250107, `[SEP_TEXT]`=250103

  • NER schema: `( [P] entities ( [E] person [E] company ) ) [SEP_TEXT] <text>`
  • RE schema: `( [P] works_at ( [R] head [R] tail ) ) [SEP_TEXT] <text>`

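An illustrative sketch of assembling the RE schema prefix from the special-token IDs above. The parentheses, relation name, and text must be tokenized by the real mDeBERTa tokenizer; here `tok` is a toy stand-in (one fake ID per string) used only to show the layout:

```python
P, E, R, SEP_TEXT = 250104, 250106, 250107, 250103

def re_schema_ids(tok, relation, text):
    # Mirrors: ( [P] <relation> ( [R] head [R] tail ) ) [SEP_TEXT] <text>
    return (tok("(") + [P] + tok(relation) + tok("(")
            + [R] + tok("head") + [R] + tok("tail")
            + tok(")") + tok(")") + [SEP_TEXT] + tok(text))

# Toy tokenizer for demonstration only: assigns one fake ID per string.
fake = {}
def tok(s):
    return [fake.setdefault(s, 1000 + len(fake))]

ids = re_schema_ids(tok, "works_at", "Tim Cook works at Apple.")
print(ids)  # two [R] slots: one for the head argument, one for the tail
```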
Usage with engram (Rust, in-process, no sidecar)

```rust
let mut backend = Gliner2Backend::load(&model_dir, "fp16")?;

// NER
let entities = backend.extract_entities(text, &["person", "company", "city"], 0.3)?;

// Relation Extraction
let relations = backend.extract_relations(text, &["works_at", "headquartered_in"], 0.3)?;

// Combined
let (entities, relations) = backend.extract_all(text, &ner_labels, &rel_types, 0.3, 0.3)?;
```

Test Results (10/10 pass, FP16 hybrid)

| Task | Lang | Result | Score |
|------|------|--------|-------|
| NER: Bill Gates (person), Microsoft (company) | EN | PASS | 100% |
| NER: Tim Cook, Apple, Cupertino | DE | PASS | 100% |
| NER: Emmanuel Macron, France, Elysee | FR | PASS | 99% |
| NER: Elon Musk, Tesla, Austin | ES | PASS | 100% |
| RE: Bill Gates founded Microsoft | EN | PASS | h:98% t:97% |
| RE: Tim Cook works_at Apple | DE | PASS | h:100% t:100% |
| RE: Apple headquartered_in Cupertino | DE | PASS | h:100% t:100% |
| RE: NATO supports Ukraine | DE | PASS | h:100% t:99% |
| RE: Macron leads France | FR | PASS | h:100% t:95% |
| RE: Elon Musk works_at Tesla | ES | PASS | detected |

Re-export from PyTorch

Use the included `export_gliner2_onnx.py` to export any GLiNER2 model:

```shell
python export_gliner2_onnx.py fastino/gliner2-multi-v1 output_dir/ --quantize
python export_gliner2_onnx.py fastino/gliner2-large-v1 output_dir/ --quantize
```

Requirements: `pip install gliner2 torch onnx onnxscript onnxruntime`

License

Apache-2.0 (same as base model)
