# GLiNER2 Multi v1 - ONNX

ONNX export of fastino/gliner2-multi-v1 for zero-shot NER and relation extraction. Multilingual (100+ languages via mDeBERTa-v3-base); tested on English, German, French, and Spanish.
## Variants
| Variant | Encoder size | Total size | Quality | Recommended |
|---|---|---|---|---|
| fp16 | 530 MB | ~645 MB | 99.9999% cosine vs FP32 | Yes (default) |
| fp32 | 1059 MB | ~1.2 GB | Baseline | When every bit matters |
FP16 uses a hybrid approach: weights are stored in Float16 on disk, and Cast nodes convert them to Float32 at runtime. Inference therefore runs entirely in FP32 arithmetic (the only loss is the FP16 rounding of the stored weights) at half the download and disk size.
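The effect can be mimicked with plain numpy to see why the quality loss is negligible: round-trip a weight matrix through float16 (the on-disk format), cast it back, and compare an FP32 matmul against the FP32 baseline. This is an illustration only, not code from the export pipeline.

```python
import numpy as np

# Toy illustration of the fp16-on-disk / fp32-at-runtime scheme (not the
# actual export code): weights round-trip through float16, compute stays fp32.
rng = np.random.default_rng(0)
w32 = rng.standard_normal((768, 768)).astype(np.float32)  # fp32 baseline weight
w16 = w32.astype(np.float16)                              # what lands on disk
w_runtime = w16.astype(np.float32)                        # what the Cast node restores

x = rng.standard_normal(768).astype(np.float32)
y32 = w32 @ x
y16 = w_runtime @ x
cos = float(y32 @ y16 / (np.linalg.norm(y32) * np.linalg.norm(y16)))
print(w16.nbytes / w32.nbytes)  # half the storage
print(cos)                      # effectively 1.0
```

The weight rounding error of float16 is about 5e-4 relative, which is why the output cosine stays at the 0.999999 level reported in the table above.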
## Why not INT8?
INT8 quantization is not recommended for this model. Our testing revealed that INT8 destroys
precision at special token positions ([R], [E]) used for relation extraction:
| Token | FP32 vs INT8 cosine | FP32 vs FP16 cosine |
|---|---|---|
| [R] head | 0.805 | 0.999999 |
| [R] tail | 0.787 | 0.999999 |
| Regular text | 0.87-0.95 | 0.999999 |
This causes RE scores to flip (e.g., "Tim Cook works_at Apple" becomes "Tim Cook works_at Cupertino").
INT8 is acceptable for NER-only use cases but will produce wrong relation extraction results.
The `encoder_int8.onnx` file is included for NER-only scenarios but is not recommended.
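As a toy numpy illustration (not the real model embeddings) of what those cosine numbers mean: a heavy per-dimension distortion of a 768-d vector, comparable to the INT8 error at the special tokens, lands around 0.8 cosine, while an FP16-scale perturbation is indistinguishable from 1.0.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic 768-d "token embedding" plus two noise levels (illustrative only).
rng = np.random.default_rng(42)
clean = rng.standard_normal(768)
noise = rng.standard_normal(768)

cos_int8_like = cosine(clean, clean + 0.8 * noise)   # ~0.8, like [R] under INT8
cos_fp16_like = cosine(clean, clean + 5e-4 * noise)  # ~1.0, like FP16
print(cos_int8_like, cos_fp16_like)
```

At 0.8 cosine the classifier's dot-product scores can reorder candidate spans, which is the head/tail flip described above; at 0.999999 they cannot.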
## Architecture
5 ONNX models (mDeBERTa-v3-base encoder):
| Model | Inputs | Outputs |
|---|---|---|
| encoder.onnx / encoder_fp16.onnx | input_ids, attention_mask | hidden_state (batch, seq, 768) |
| span_rep.onnx | hidden_states, span_start_idx, span_end_idx | span_representations (batch, spans, 768) |
| count_embed.onnx | label_embeddings | transformed_embeddings |
| count_pred.onnx | schema_embedding | count_logits |
| classifier.onnx | hidden_state | logits |
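Downstream of classifier.onnx, a typical GLiNER-style decoding step applies a sigmoid to the logits and keeps span/label pairs that clear the threshold (0.3 in the usage examples below). A minimal sketch of that step; `filter_spans` is an illustrative helper, not a function shipped with this repo, and the real pipeline's decoding may differ in detail.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def filter_spans(logits, labels, threshold=0.3):
    """Keep (span_index, label, score) triples whose sigmoid score clears
    the threshold. Sketch of GLiNER-style postprocessing, not the shipped code."""
    scores = sigmoid(np.asarray(logits, dtype=np.float32))
    hits = []
    for span_idx in range(scores.shape[0]):
        for label_idx in range(scores.shape[1]):
            if scores[span_idx, label_idx] >= threshold:
                hits.append((span_idx, labels[label_idx], float(scores[span_idx, label_idx])))
    return hits

# Toy logits: 2 candidate spans x 2 labels.
hits = filter_spans([[2.0, -3.0], [-2.0, 1.0]], ["person", "company"])
print(hits)
```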
Special tokens: `[P]`=250104, `[E]`=250106, `[R]`=250107, `[SEP_TEXT]`=250103

- NER schema: `( [P] entities ( [E] person [E] company ) ) [SEP_TEXT] <text>`
- RE schema: `( [P] works_at ( [R] head [R] tail ) ) [SEP_TEXT] <text>`
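These schemas are plain strings prepended to the text, with the bracketed markers corresponding to the single special-token IDs listed above. A small sketch of assembling them; `ner_schema`/`re_schema` are illustrative helper names, not part of the released runtime.

```python
def ner_schema(entity_labels):
    """Build the NER schema prefix, e.g. ( [P] entities ( [E] person [E] company ) )."""
    labels = " ".join(f"[E] {label}" for label in entity_labels)
    return f"( [P] entities ( {labels} ) )"

def re_schema(relation_type):
    """Build the RE schema prefix for one relation type."""
    return f"( [P] {relation_type} ( [R] head [R] tail ) )"

# Full encoder input: schema prefix, separator token, then the raw text.
text = "Bill Gates founded Microsoft."
prompt = ner_schema(["person", "company"]) + " [SEP_TEXT] " + text
print(prompt)
```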
## Usage with engram (Rust, in-process, no sidecar)
```rust
let mut backend = Gliner2Backend::load(&model_dir, "fp16")?;

// NER
let entities = backend.extract_entities(text, &["person", "company", "city"], 0.3)?;

// Relation extraction
let relations = backend.extract_relations(text, &["works_at", "headquartered_in"], 0.3)?;

// Combined
let (entities, relations) = backend.extract_all(text, &ner_labels, &rel_types, 0.3, 0.3)?;
```
## Test Results (10/10 pass, FP16 hybrid)
| Task | Lang | Result | Score |
|---|---|---|---|
| NER: Bill Gates (person), Microsoft (company) | EN | PASS | 100% |
| NER: Tim Cook, Apple, Cupertino | DE | PASS | 100% |
| NER: Emmanuel Macron, France, Élysée | FR | PASS | 99% |
| NER: Elon Musk, Tesla, Austin | ES | PASS | 100% |
| RE: Bill Gates founded Microsoft | EN | PASS | h:98% t:97% |
| RE: Tim Cook works_at Apple | DE | PASS | h:100% t:100% |
| RE: Apple headquartered_in Cupertino | DE | PASS | h:100% t:100% |
| RE: NATO supports Ukraine | DE | PASS | h:100% t:99% |
| RE: Macron leads France | FR | PASS | h:100% t:95% |
| RE: Elon Musk works_at Tesla | ES | PASS | detected |
## Re-export from PyTorch
Use the included `export_gliner2_onnx.py` to export any GLiNER2 model:

```bash
python export_gliner2_onnx.py fastino/gliner2-multi-v1 output_dir/ --quantize
python export_gliner2_onnx.py fastino/gliner2-large-v1 output_dir/ --quantize
```

Requirements:

```bash
pip install gliner2 torch onnx onnxscript onnxruntime
```
## Links
- Base model: fastino/gliner2-multi-v1
- Rust integration: engram
- ONNX runtime (Python): gliner2-onnx
## License
Apache-2.0 (same as base model)