GLiNER2 Large ONNX Dynamic Q8 for CPU

This repository contains a dynamically quantized (dynamic_q8) ONNX Runtime derivative of lmo3/gliner2-large-v1-onnx.

It is intended for CPU inference and for runtimes that select the dynamic_q8 precision entry from gliner2_config.json.

Base model

  • lmo3/gliner2-large-v1-onnx

Quantization

  • Runtime: ONNX Runtime 1.24.2
  • Method: dynamic quantization
  • Weight type: QInt8
  • Precision key: dynamic_q8
  • Intended provider: CPUExecutionProvider
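
As a sketch of how such an export can be produced (the exact upstream export script is not part of this repository), ONNX Runtime's `quantize_dynamic` converts each fp32 module to QInt8 weights. The file layout and helper names below are illustrative assumptions:

```python
from pathlib import Path


def dynamic_q8_name(fp32_name: str) -> str:
    """Map an fp32 ONNX file name to its dynamic_q8 counterpart,
    e.g. 'encoder.onnx' -> 'encoder.dynamic_q8.onnx'."""
    stem, suffix = fp32_name.rsplit(".", 1)
    return f"{stem}.dynamic_q8.{suffix}"


def quantize_module(fp32_path: str) -> str:
    """Dynamically quantize one ONNX module to QInt8 weights.

    onnxruntime is imported lazily so the naming helper above can be
    used without onnxruntime installed.
    """
    from onnxruntime.quantization import QuantType, quantize_dynamic

    path = Path(fp32_path)
    out_path = str(path.with_name(dynamic_q8_name(path.name)))
    quantize_dynamic(fp32_path, out_path, weight_type=QuantType.QInt8)
    return out_path
```

Running `quantize_module` over each of the four fp32 modules (classifier, count_embed, encoder, span_rep) would yield files named like those listed below.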

Files

  • onnx/classifier.dynamic_q8.onnx
  • onnx/count_embed.dynamic_q8.onnx
  • onnx/encoder.dynamic_q8.onnx
  • onnx/span_rep.dynamic_q8.onnx
  • config.json
  • gliner2_config.json
  • tokenizer.json
  • tokenizer_config.json
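
A minimal loading sketch for CPU inference with ONNX Runtime; the paths follow the file list above, and onnxruntime is assumed to be installed only when `load_session` is actually called:

```python
# The four dynamic_q8 modules listed above.
MODULES = ("classifier", "count_embed", "encoder", "span_rep")
MODULE_PATHS = {m: f"onnx/{m}.dynamic_q8.onnx" for m in MODULES}


def load_session(path: str):
    """Create an ONNX Runtime session pinned to the CPU provider."""
    import onnxruntime as ort  # lazy import: only needed at load time

    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
```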

Notes

  • The ONNX file names carry the precision suffix, for example onnx/encoder.dynamic_q8.onnx.
  • gliner2_config.json exposes a dynamic_q8 precision entry pointing at those files.
  • This export is generated from the upstream fp32 ONNX bundle and is intended as a CPU-oriented derivative.
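
The actual schema of gliner2_config.json is defined by the GLiNER2 runtime; purely as an illustration, a precision entry of this kind might map the dynamic_q8 key to the four module files (field names here are hypothetical):

```json
{
  "precisions": {
    "dynamic_q8": {
      "classifier": "onnx/classifier.dynamic_q8.onnx",
      "count_embed": "onnx/count_embed.dynamic_q8.onnx",
      "encoder": "onnx/encoder.dynamic_q8.onnx",
      "span_rep": "onnx/span_rep.dynamic_q8.onnx"
    }
  }
}
```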

Analysis server

Example environment settings:

ANALYSIS_SERVER_MODEL_ID=temsa/gliner2-large-v1-onnx-cpu-dynamic-q8
ANALYSIS_SERVER_MODEL_PRECISION=dynamic_q8
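
A minimal sketch of how a server process might read these settings (the function name and fallback defaults are illustrative, not part of the analysis server's actual API):

```python
import os


def model_settings() -> tuple[str, str]:
    """Read the model id and precision key from the environment,
    falling back to this repository's values."""
    model_id = os.environ.get(
        "ANALYSIS_SERVER_MODEL_ID",
        "temsa/gliner2-large-v1-onnx-cpu-dynamic-q8",
    )
    precision = os.environ.get("ANALYSIS_SERVER_MODEL_PRECISION", "dynamic_q8")
    return model_id, precision
```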