GLiNER2 Large ONNX Dynamic Q8 for CPU

This repository contains a dynamically quantized (dynamic_q8) ONNX Runtime derivative of lmo3/gliner2-large-v1-onnx.

It is intended for CPU inference and for runtimes that select the dynamic_q8 precision entry from gliner2_config.json.

Base model

  • lmo3/gliner2-large-v1-onnx

Quantization

  • Runtime: ONNX Runtime 1.24.2
  • Method: dynamic quantization
  • Weight type: QInt8
  • Precision key: dynamic_q8
  • Intended provider: CPUExecutionProvider
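
As a sketch of how such an export can be produced (the exact upstream export script is not part of this repository), ONNX Runtime's `quantize_dynamic` converts each fp32 module to QInt8 weights. The file layout and helper names below are illustrative assumptions:

```python
from pathlib import Path


def dynamic_q8_name(fp32_name: str) -> str:
    """Map an fp32 ONNX file name to its dynamic_q8 counterpart,
    e.g. 'encoder.onnx' -> 'encoder.dynamic_q8.onnx'."""
    stem, suffix = fp32_name.rsplit(".", 1)
    return f"{stem}.dynamic_q8.{suffix}"


def quantize_module(fp32_path: str) -> str:
    """Dynamically quantize one ONNX module to QInt8 weights.

    onnxruntime is imported lazily so the naming helper above can be
    used without onnxruntime installed.
    """
    from onnxruntime.quantization import QuantType, quantize_dynamic

    path = Path(fp32_path)
    out_path = str(path.with_name(dynamic_q8_name(path.name)))
    quantize_dynamic(fp32_path, out_path, weight_type=QuantType.QInt8)
    return out_path
```

Running `quantize_module` over each of the four fp32 modules (classifier, count_embed, encoder, span_rep) would yield files named like those listed below.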

Files

  • onnx/classifier.dynamic_q8.onnx
  • onnx/count_embed.dynamic_q8.onnx
  • onnx/encoder.dynamic_q8.onnx
  • onnx/span_rep.dynamic_q8.onnx
  • config.json
  • gliner2_config.json
  • tokenizer.json
  • tokenizer_config.json
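
A minimal loading sketch for CPU inference with ONNX Runtime; the paths follow the file list above, and onnxruntime is assumed to be installed only when `load_session` is actually called:

```python
# The four dynamic_q8 modules listed above.
MODULES = ("classifier", "count_embed", "encoder", "span_rep")
MODULE_PATHS = {m: f"onnx/{m}.dynamic_q8.onnx" for m in MODULES}


def load_session(path: str):
    """Create an ONNX Runtime session pinned to the CPU provider."""
    import onnxruntime as ort  # lazy import: only needed at load time

    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
```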

Notes

  • The ONNX file names carry the precision suffix, for example onnx/encoder.dynamic_q8.onnx.
  • gliner2_config.json exposes a dynamic_q8 precision entry pointing at those files.
  • This export is generated from the upstream fp32 ONNX bundle and is intended as a CPU-oriented derivative.
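
The actual schema of gliner2_config.json is defined by the GLiNER2 runtime; purely as an illustration, a precision entry of this kind might map the dynamic_q8 key to the four module files (field names here are hypothetical):

```json
{
  "precisions": {
    "dynamic_q8": {
      "classifier": "onnx/classifier.dynamic_q8.onnx",
      "count_embed": "onnx/count_embed.dynamic_q8.onnx",
      "encoder": "onnx/encoder.dynamic_q8.onnx",
      "span_rep": "onnx/span_rep.dynamic_q8.onnx"
    }
  }
}
```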

Analysis server

Example environment settings:

ANALYSIS_SERVER_MODEL_ID=temsa/gliner2-large-v1-onnx-cpu-dynamic-q8
ANALYSIS_SERVER_MODEL_PRECISION=dynamic_q8
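
A minimal sketch of how a server process might read these settings (the function name and fallback defaults are illustrative, not part of the analysis server's actual API):

```python
import os


def model_settings() -> tuple[str, str]:
    """Read the model id and precision key from the environment,
    falling back to this repository's values."""
    model_id = os.environ.get(
        "ANALYSIS_SERVER_MODEL_ID",
        "temsa/gliner2-large-v1-onnx-cpu-dynamic-q8",
    )
    precision = os.environ.get("ANALYSIS_SERVER_MODEL_PRECISION", "dynamic_q8")
    return model_id, precision
```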