# GLiNER4j ONNX

ONNX export of fastino-ai/gliner2-base for Java inference via ONNX Runtime. Part of the gliner4j project.
## Repository Structure

```
├── gliner4j_config.json   # Shared model configuration
├── tokenizer.json         # Shared HuggingFace tokenizer
├── tokenizer_config.json
├── onnx/                  # Base FP32 (~830 MB)
│   ├── encoder.onnx
│   ├── span_rep.onnx
│   └── scoring_head.onnx
├── onnx_fp16/             # FP16 (~416 MB, ~50% smaller)
│   ├── encoder.onnx
│   ├── span_rep.onnx
│   └── scoring_head.onnx
└── onnx_quantized/        # INT8 dynamic quantization (~208 MB, ~75% smaller)
    ├── encoder.onnx
    ├── span_rep.onnx
    └── scoring_head.onnx
```
## Model Architecture

The model is split into three ONNX modules for modular inference:

| Module | Description |
|---|---|
| `encoder.onnx` | DebertaV2 transformer encoder |
| `span_rep.onnx` | Span representation layer |
| `scoring_head.onnx` | Count-aware scoring head |
## Variants

| Variant | Folder | Precision | Size | Use case |
|---|---|---|---|---|
| Base | `onnx/` | FP32 | ~830 MB | Maximum accuracy |
| FP16 | `onnx_fp16/` | FP16 | ~416 MB | Good accuracy/size trade-off |
| Quantized | `onnx_quantized/` | INT8 | ~208 MB | Smallest footprint, fastest on CPU |

To download only a specific variant:

```
huggingface-cli download <repo> --include "onnx_fp16/*" "*.json"
```
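Once a variant is downloaded, each module lives at `<model dir>/<variant folder>/<module>.onnx`. A minimal sketch of that path mapping in Java (note: `VariantPaths` and `modulePaths` are hypothetical helpers for illustration, not part of gliner4j):

```java
import java.nio.file.Path;

public class VariantPaths {
    // Resolve the three module files for a chosen variant folder
    // ("onnx", "onnx_fp16", or "onnx_quantized").
    static Path[] modulePaths(Path modelDir, String variant) {
        String[] modules = {"encoder.onnx", "span_rep.onnx", "scoring_head.onnx"};
        Path[] paths = new Path[modules.length];
        for (int i = 0; i < modules.length; i++) {
            paths[i] = modelDir.resolve(variant).resolve(modules[i]);
        }
        return paths;
    }

    public static void main(String[] args) {
        for (Path p : modulePaths(Path.of("model"), "onnx_fp16")) {
            System.out.println(p);
        }
    }
}
```

The shared JSON files (`gliner4j_config.json`, `tokenizer.json`, `tokenizer_config.json`) stay at the repository root and are used by every variant.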
## Configuration
| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Max span width | 8 |
| Max count | 20 |
| Span mode | SpanMarkerV0 |
| Token pooling | first |
| ONNX opset | 17 |
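The max span width of 8 bounds how many candidate spans the span representation layer must score: for a sequence of `n` tokens, every contiguous span of 1 to 8 tokens is a candidate. A small sketch of that count (the `candidateSpans` helper is illustrative, not a gliner4j API):

```java
public class SpanCount {
    // Number of candidate spans in a sequence of n tokens,
    // counting every contiguous span of width 1..maxWidth.
    static int candidateSpans(int n, int maxWidth) {
        int count = 0;
        for (int width = 1; width <= Math.min(maxWidth, n); width++) {
            count += n - width + 1; // spans of this width that fit in n tokens
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(candidateSpans(10, 8)); // 52
        System.out.println(candidateSpans(4, 8));  // 10
    }
}
```

Because the count grows roughly linearly in sequence length (at most `8 * n` spans), capping the span width keeps scoring tractable for long inputs.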
## Usage

Use with gliner4j, a Java library for GLiNER2 inference via ONNX Runtime.

```java
// Base FP32 (default)
var gliner = GLiNER4j.load(modelDir, entities);

// FP16 variant
var gliner = GLiNER4j.load(modelDir, entities, "onnx_fp16");

// Quantized variant
var gliner = GLiNER4j.load(modelDir, entities, "onnx_quantized");
```
## Benchmarks

Measured with JMH (average time, 15 iterations):
### 4 Entity Types
| Batch Size | Avg Latency (ms/op) | Per-Text (ms) | Throughput (texts/s) |
|---|---|---|---|
| 1 | 26.5 | 26.5 | ~37.7 |
| 4 | 143.5 | 35.9 | ~27.9 |
| 8 | 286.6 | 35.8 | ~27.9 |
### 8 Entity Types
| Batch Size | Avg Latency (ms/op) | Per-Text (ms) | Throughput (texts/s) |
|---|---|---|---|
| 1 | 34.1 | 34.1 | ~29.3 |
| 4 | 174.6 | 43.7 | ~22.9 |
| 8 | 339.2 | 42.4 | ~23.6 |
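The per-text and throughput columns are derived directly from the batch latency: per-text latency is batch latency divided by batch size, and throughput is texts per second at that rate. A quick sketch of the arithmetic (helper names are illustrative):

```java
public class BenchMath {
    // Per-text latency in ms: batch latency spread over the texts in the batch.
    static double perTextMs(double batchLatencyMs, int batchSize) {
        return batchLatencyMs / batchSize;
    }

    // Throughput in texts/s: texts completed per batch, per second.
    static double throughput(double batchLatencyMs, int batchSize) {
        return 1000.0 * batchSize / batchLatencyMs;
    }

    public static void main(String[] args) {
        // 4-entity-type run, batch size 4: 143.5 ms/op from the table above
        System.out.printf("%.1f ms/text, %.1f texts/s%n",
                perTextMs(143.5, 4), throughput(143.5, 4));
    }
}
```

The tables show per-text latency flattening around batch size 4, so larger batches mainly amortize per-call overhead rather than adding throughput.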
## License

Apache License 2.0