# GLiNER2 Large ONNX Dynamic Q8 for CPU
This repository contains a `dynamic_q8` ONNX Runtime derivative of `lmo3/gliner2-large-v1-onnx`. It is intended for CPU inference and for runtimes that select the `dynamic_q8` precision entry from `gliner2_config.json`.
## Base model

- Upstream model: `lmo3/gliner2-large-v1-onnx`
- Upstream revision: `6adb78ae8098685d239dda324cc124d948962c21`
- Upstream license: MIT
## Quantization

- Runtime: ONNX Runtime 1.24.2
- Method: dynamic quantization
- Weight type: `QInt8`
- Precision key: `dynamic_q8`
- Intended provider: `CPUExecutionProvider`
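Dynamic quantization stores each weight tensor as signed 8-bit integers plus a floating-point scale, while activations are quantized on the fly at run time. A minimal pure-Python sketch of the per-tensor symmetric int8 scheme (illustrative only; the actual export uses ONNX Runtime's quantization tooling, and per-operator details differ):

```python
def quantize_int8(weights):
    """Per-tensor symmetric int8 quantization: map the float range
    [-max|w|, +max|w|] onto [-127, 127] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)  # close to the original weights
```

At inference time the runtime computes matmuls in int8 and rescales the results, which is why this format targets `CPUExecutionProvider`.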
## Files

- `onnx/classifier.dynamic_q8.onnx`
- `onnx/count_embed.dynamic_q8.onnx`
- `onnx/encoder.dynamic_q8.onnx`
- `onnx/span_rep.dynamic_q8.onnx`
- `config.json`
- `gliner2_config.json`
- `tokenizer.json`
- `tokenizer_config.json`
## Notes

- The ONNX file names carry the precision suffix, for example `onnx/encoder.dynamic_q8.onnx`.
- `gliner2_config.json` exposes a `dynamic_q8` precision entry pointing at those files.
- This export is generated from the upstream `fp32` ONNX bundle and is intended as a CPU-oriented derivative.
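The exact schema of `gliner2_config.json` is not reproduced here; the sketch below assumes a hypothetical layout in which each precision key maps to the ONNX files listed above, just to illustrate how a runtime might resolve the `dynamic_q8` entry:

```python
# Hypothetical stand-in for gliner2_config.json (assumed schema;
# the real file's layout may differ).
config = {
    "precisions": {
        "dynamic_q8": {
            "encoder": "onnx/encoder.dynamic_q8.onnx",
            "classifier": "onnx/classifier.dynamic_q8.onnx",
            "count_embed": "onnx/count_embed.dynamic_q8.onnx",
            "span_rep": "onnx/span_rep.dynamic_q8.onnx",
        },
    },
}

def resolve_precision(cfg: dict, precision: str) -> dict:
    """Return the ONNX file paths registered under one precision key."""
    try:
        return cfg["precisions"][precision]
    except KeyError:
        raise ValueError(f"precision {precision!r} not found in config")

files = resolve_precision(config, "dynamic_q8")
```

A runtime selecting `dynamic_q8` would then open each resolved file (for example the encoder) with `CPUExecutionProvider`.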
## Analysis server

Example environment settings:

```
ANALYSIS_SERVER_MODEL_ID=temsa/gliner2-large-v1-onnx-cpu-dynamic-q8
ANALYSIS_SERVER_MODEL_PRECISION=dynamic_q8
```