GLiNER2 Large ONNX QInt8 for CPU

This repository contains a dynamically quantized QInt8 ONNX derivative of lmo3/gliner2-large-v1-onnx, intended for CPU inference with ONNX Runtime.

The repository keeps the same lightweight config and tokenizer layout as the upstream ONNX release and replaces the ONNX weights with dynamically quantized CPU-oriented variants.

Base model

  • lmo3/gliner2-large-v1-onnx

Quantization

  • Runtime: ONNX Runtime 1.22.1
  • Method: dynamic quantization
  • Weight type: QInt8
  • Intended provider: CPUExecutionProvider

Important compatibility note

This repository now uses onnx_files.qint8 as the canonical config key for the quantized CPU weights.

To avoid breaking older deployments immediately, onnx_files.fp32 is kept as a deprecated compatibility alias that points to the same QInt8 files. Despite its name, it does not indicate float32 weights in this repository.

The deprecated alias will remain for backward compatibility, but new consumers should select qint8 explicitly.

Files

  • onnx/encoder.onnx
  • onnx/classifier.onnx
  • canonical config key: onnx_files.qint8
  • deprecated compatibility alias: onnx_files.fp32
  • config.json
  • gliner2_config.json
  • tokenizer.json
  • tokenizer_config.json
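A consumer can resolve the canonical config key with a fallback to the deprecated alias and open the graphs on the intended provider. This is a hedged sketch: the exact shape of the "onnx_files" entry is an assumption here (a mapping from variant key to per-graph file paths), so check config.json before relying on it:

```python
import json


def pick_variant(onnx_files):
    """Prefer the canonical qint8 key; fall back to the deprecated alias.

    In this repository both keys point at the same QInt8 files.
    """
    return onnx_files.get("qint8") or onnx_files["fp32"]


def load_sessions(config_path="config.json"):
    # Assumed config shape: {"onnx_files": {"qint8": {"encoder": <path>,
    # "classifier": <path>}, "fp32": {...}}}.
    import onnxruntime as ort  # lazy: only needed when actually loading

    with open(config_path) as f:
        entry = pick_variant(json.load(f)["onnx_files"])
    return {
        part: ort.InferenceSession(path, providers=["CPUExecutionProvider"])
        for part, path in entry.items()
    }
```

Selecting qint8 explicitly keeps the code working if the fp32 alias is eventually dropped.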

Benchmark

Benchmarked locally on Intel Core i7-9750H (AVX2) with CPUExecutionProvider.

variant            size (MB)   avg latency (ms)   p95 latency (ms)   throughput (req/s)
upstream ONNX        1673.28             211.79             263.74                 4.72
this QInt8 repo       622.16             134.98             152.22                 7.41

Relative to the upstream ONNX release on this host:

  • throughput: +57.0%
  • average latency: -36.3%
  • artifact size: -62.8%
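The relative figures follow directly from the table rows; a quick check of the arithmetic:

```python
def pct(new, old):
    """Signed percentage change from old to new, one decimal place."""
    return round(100.0 * (new - old) / old, 1)


upstream = {"size_mb": 1673.28, "avg_ms": 211.79, "thr": 4.72}
qint8 = {"size_mb": 622.16, "avg_ms": 134.98, "thr": 7.41}

print(pct(qint8["thr"], upstream["thr"]))          # 57.0  (throughput)
print(pct(qint8["avg_ms"], upstream["avg_ms"]))    # -36.3 (average latency)
print(pct(qint8["size_mb"], upstream["size_mb"]))  # -62.8 (artifact size)
```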

Validation notes

  • Label agreement with the upstream ONNX release on a 30-sample local benchmark: 30/30
  • Benchmarks are hardware-dependent and should be treated as directional rather than universal.
  • This repository is aimed at CPU inference. For GPU inference, prefer the non-quantized upstream release.
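The label-agreement figure above was a straightforward per-sample comparison. A generic sketch of such a check (the prediction lists are assumed to come from running the upstream and quantized models on the same inputs; how those predictions are produced is model-specific and not shown):

```python
def label_agreement(reference, candidate):
    """Count per-sample label matches between two prediction runs."""
    if len(reference) != len(candidate):
        raise ValueError("prediction lists must be the same length")
    matches = sum(r == c for r, c in zip(reference, candidate))
    return matches, len(reference)
```

On the 30-sample local set described above, this kind of check yielded 30/30 agreement with the upstream release.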

Compatibility

This repository is intended as an ONNX-format CPU-oriented derivative of the upstream model. For task framing, labels, and broader model documentation, refer to the upstream model card.
