# GLiNER2 Large ONNX QInt8 for CPU

This repository contains a QInt8 ONNX Runtime derivative of lmo3/gliner2-large-v1-onnx. It keeps the same lightweight config and tokenizer layout as the upstream ONNX release and replaces the ONNX weights with dynamically quantized, CPU-oriented variants.
## Base model

- Upstream model: lmo3/gliner2-large-v1-onnx
- Upstream revision: 6adb78ae8098685d239dda324cc124d948962c21
- Upstream license: MIT
## Quantization
### Important compatibility note

This repository now uses onnx_files.qint8 as the canonical config key for the quantized CPU weights. To avoid immediately breaking older deployments, onnx_files.fp32 remains as a deprecated compatibility alias that points to the same QInt8 files; it does not indicate float32 weights in this repository. The alias will be kept for compatibility, but new consumers should select qint8 explicitly.
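A consumer can honor this key scheme with a small resolution helper. This is a minimal sketch: only the key names onnx_files.qint8 (canonical) and onnx_files.fp32 (deprecated alias) come from this card; the surrounding structure of config.json and the entry names used below are illustrative assumptions.

```python
# Sketch of selecting the quantized weights from the repo config.
# Only the key names "onnx_files.qint8" / "onnx_files.fp32" are from the
# model card; the rest of the config layout here is a hypothetical example.
def resolve_onnx_files(config: dict) -> dict:
    onnx_files = config.get("onnx_files", {})
    if "qint8" in onnx_files:   # canonical key: always prefer it
        return onnx_files["qint8"]
    if "fp32" in onnx_files:    # deprecated alias; same QInt8 files in this repo
        return onnx_files["fp32"]
    raise KeyError("no ONNX file entry found in config")

config = {
    "onnx_files": {
        "qint8": {
            "encoder": "onnx/encoder.onnx",
            "classifier": "onnx/classifier.onnx",
        },
        # "fp32" may also be present, pointing at the same QInt8 files.
    }
}
files = resolve_onnx_files(config)
print(files["encoder"])  # onnx/encoder.onnx
```

New code should read qint8 first and treat fp32 purely as a fallback, so it keeps working after the alias is eventually dropped.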
- Runtime: ONNX Runtime 1.22.1
- Method: dynamic quantization
- Weight type: QInt8
- Intended provider: CPUExecutionProvider
## Files

- onnx/encoder.onnx
- onnx/classifier.onnx
- canonical config key: onnx_files.qint8
- deprecated compatibility alias: onnx_files.fp32
- config.json
- gliner2_config.json
- tokenizer.json
- tokenizer_config.json
## Benchmark
Benchmarked locally on Intel Core i7-9750H (AVX2) with CPUExecutionProvider.
| variant | size (MB) | avg latency (ms) | p95 latency (ms) | throughput (req/s) |
|---|---|---|---|---|
| upstream ONNX | 1673.28 | 211.79 | 263.74 | 4.72 |
| this QInt8 repo | 622.16 | 134.98 | 152.22 | 7.41 |
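The per-variant statistics in the table can be derived from a list of sequential single-request timings. This is a sketch of that aggregation, not the benchmark harness actually used; the sample latencies below are made up for illustration.

```python
import statistics

def summarize(latencies_ms: list[float]) -> tuple[float, float, float]:
    """Average latency, p95 latency, and throughput (req/s) for
    back-to-back single-request runs."""
    avg = statistics.fmean(latencies_ms)
    p95 = statistics.quantiles(latencies_ms, n=100)[94]  # 95th percentile
    throughput = 1000.0 / avg  # requests per second when runs are sequential
    return avg, p95, throughput

# Hypothetical sample of 30 timings (ms); not the real measurement data.
avg, p95, rps = summarize([120.0, 130.0, 135.0, 140.0, 150.0] * 6)
```

Note that throughput here is simply the reciprocal of average latency, which only holds when requests are issued one at a time on a single worker, as in this benchmark.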
Relative to the upstream ONNX release on this host:

- throughput: +57.0%
- average latency: -36.3%
- artifact size: -62.8%
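The relative figures follow directly from the benchmark table. As a quick check, recomputing them from the table values:

```python
# Recomputing the relative numbers from the benchmark table above.
upstream = {"size_mb": 1673.28, "avg_ms": 211.79, "rps": 4.72}
qint8 = {"size_mb": 622.16, "avg_ms": 134.98, "rps": 7.41}

def pct_change(new: float, old: float) -> float:
    return 100.0 * (new - old) / old

throughput_delta = pct_change(qint8["rps"], upstream["rps"])          # ~ +57.0
latency_delta = pct_change(qint8["avg_ms"], upstream["avg_ms"])       # ~ -36.3
size_delta = pct_change(qint8["size_mb"], upstream["size_mb"])        # ~ -62.8
```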
## Validation notes

- Label agreement with the upstream ONNX release on a 30-sample local benchmark: 30/30
- Benchmarks are hardware-dependent and should be treated as directional rather than universal.
- This repository is aimed at CPU inference; for GPU inference, prefer the non-quantized upstream release.
## Compatibility
This repository is intended as an ONNX-format, CPU-oriented derivative of the upstream model. For task framing, labels, and broader model documentation, refer to the upstream model card.