GLiNER4j ONNX

ONNX export of fastino-ai/gliner2-base for Java inference via ONNX Runtime.

Part of the GLiNER4j project.

Supported Tasks

| Task | Description |
|---|---|
| Named Entity Recognition | Extract typed entity spans from text with confidence scores |
| Text Classification | Assign labels to text with multi-label support and confidence scores |

Both tasks support entity/label descriptions for improved accuracy and per-call overrides without model reloading.

Repository Structure

```
├── gliner4j_config.json        # Shared model configuration
├── tokenizer.json              # Shared HuggingFace tokenizer
├── tokenizer_config.json
├── onnx/                       # Base FP32 (~830 MB)
│   ├── encoder.onnx
│   ├── span_rep.onnx
│   ├── scoring_head.onnx
│   └── classifier_head.onnx
├── onnx_fp16/                  # FP16 (~416 MB, ~50% smaller)
│   ├── encoder.onnx
│   ├── span_rep.onnx
│   ├── scoring_head.onnx
│   └── classifier_head.onnx
└── onnx_quantized/             # INT8 dynamic quantization (~208 MB, ~75% smaller)
    ├── encoder.onnx
    ├── span_rep.onnx
    ├── scoring_head.onnx
    └── classifier_head.onnx
```

Model Architecture

The model is split into four ONNX modules, with the encoder shared between the NER and classification tasks:

| Module | Description |
|---|---|
| `encoder.onnx` | DeBERTaV2 transformer encoder (shared) |
| `span_rep.onnx` | Span representation layer (NER) |
| `scoring_head.onnx` | Count-aware scoring head (NER) |
| `classifier_head.onnx` | Classifier head MLP (classification) |

Variants

| Variant | Folder | Precision | Size | Use case |
|---|---|---|---|---|
| Base | `onnx/` | FP32 | ~830 MB | Maximum accuracy |
| FP16 | `onnx_fp16/` | FP16 | ~416 MB | Good accuracy/size trade-off |
| Quantized | `onnx_quantized/` | INT8 | ~208 MB | Smallest footprint, fastest on CPU |
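Since every variant folder contains the same four module files, resolving a module for a chosen variant is just a path join. The folder and file names below come from the repository layout above; the `modulePath` helper itself is a hypothetical illustration, not part of the GLiNER4j API.

```java
import java.nio.file.Path;

public class VariantPaths {
    // Resolve one ONNX module file within a variant folder,
    // e.g. modelDir/onnx_fp16/encoder.onnx. Hypothetical helper for illustration.
    static Path modulePath(Path modelDir, String variant, String module) {
        return modelDir.resolve(variant).resolve(module + ".onnx");
    }

    public static void main(String[] args) {
        Path p = modulePath(Path.of("gliner2-base"), "onnx_fp16", "encoder");
        System.out.println(p); // gliner2-base/onnx_fp16/encoder.onnx on POSIX systems
    }
}
```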

To download a specific variant only:

```shell
huggingface-cli download <repo> --include "onnx_fp16/*" "*.json"
```

Configuration

| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Max span width | 8 |
| Max count | 20 |
| Span mode | SpanMarkerV0 |
| Token pooling | first |
| ONNX opset | 17 |
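The max span width bounds how many candidate spans the NER head must score. Assuming exhaustive enumeration of spans up to that width (the usual GLiNER-style setup; a sketch, not taken from this repository), a text of n tokens yields Σ(n − w + 1) candidates for widths w = 1..8, i.e. 8n − 28 once n ≥ 8:

```java
public class SpanCount {
    // Number of candidate spans of width 1..maxWidth in a sequence of n tokens,
    // assuming exhaustive span enumeration (a sketch of span-based NER, not the
    // library's actual implementation).
    static long candidateSpans(int n, int maxWidth) {
        long total = 0;
        for (int w = 1; w <= Math.min(maxWidth, n); w++) {
            total += n - w + 1; // spans of width w that fit in n tokens
        }
        return total;
    }

    public static void main(String[] args) {
        // With max span width 8, a 100-token text yields 8*100 - 28 = 772 candidates.
        System.out.println(candidateSpans(100, 8)); // 772
    }
}
```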

Usage

Use with GLiNER4j, a Java library for GLiNER2 inference via ONNX Runtime.

Named Entity Recognition

```java
var entities = List.of(
    new EntityDefinition("person", "Names of individuals"),
    new EntityDefinition("organization", "Company or institution names")
);
var gliner = GLiNER4jNER.load(modelDir, entities);
Map<String, List<EntitySpan>> results = gliner.extract("John works at Google.");
```
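Since each span carries a confidence score, results can be thresholded downstream. The `EntitySpan` record below (text, offsets, score) is a guessed shape for illustration only; consult the GLiNER4j documentation for the actual fields.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Thresholding {
    // Hypothetical result shape; the real EntitySpan may differ.
    record EntitySpan(String text, int start, int end, double score) {}

    // Keep only spans whose confidence meets the threshold, per entity type.
    static Map<String, List<EntitySpan>> filter(Map<String, List<EntitySpan>> results,
                                                double minScore) {
        return results.entrySet().stream()
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                e -> e.getValue().stream()
                      .filter(s -> s.score() >= minScore)
                      .toList()));
    }

    public static void main(String[] args) {
        var results = Map.of(
            "person", List.of(new EntitySpan("John", 0, 4, 0.97)),
            "organization", List.of(new EntitySpan("Google", 14, 20, 0.42)));
        System.out.println(Thresholding.filter(results, 0.5).get("organization").size()); // 0
    }
}
```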

Text Classification

```java
var labels = List.of(
    new ClassificationLabel("positive", "Expresses positive sentiment"),
    new ClassificationLabel("negative", "Expresses negative sentiment")
);
var classifier = GLiNER4jClassifier.load(modelDir, labels);
List<ClassificationResult> results = classifier.classify("Great product!");
```
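Multi-label classification typically scores each label independently with a sigmoid and keeps every label above a threshold, rather than picking a single softmax winner. The snippet below sketches that selection step under that assumption; the library's actual activation and threshold may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MultiLabel {
    // Sigmoid turns a raw logit into an independent per-label probability.
    static double sigmoid(double logit) {
        return 1.0 / (1.0 + Math.exp(-logit));
    }

    // Select every label whose score clears the threshold (multi-label:
    // zero, one, or several labels may be returned).
    static List<String> select(Map<String, Double> logits, double threshold) {
        List<String> chosen = new ArrayList<>();
        for (var e : logits.entrySet()) {
            if (sigmoid(e.getValue()) >= threshold) chosen.add(e.getKey());
        }
        return chosen;
    }

    public static void main(String[] args) {
        var logits = Map.of("positive", 2.2, "negative", -1.5);
        System.out.println(MultiLabel.select(logits, 0.5)); // [positive]
    }
}
```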

Model Variants

```java
// FP16 variant
var gliner = GLiNER4jNER.load(modelDir, entities, "onnx_fp16");
var classifier = GLiNER4jClassifier.load(modelDir, labels, "onnx_fp16");

// Quantized variant
var gliner = GLiNER4jNER.load(modelDir, entities, "onnx_quantized");
var classifier = GLiNER4jClassifier.load(modelDir, labels, "onnx_quantized");
```

Features

  • Entity/Label Descriptions: Provide natural language descriptions alongside entity types or classification labels to improve model accuracy
  • Per-call Overrides: Change entities or labels at inference time without reloading the model
  • Batch Processing: Batched encoder calls with virtual thread parallelism for scoring
  • OpenTelemetry: Built-in instrumentation for duration, text count, and result count metrics (zero overhead when no OTel SDK is present)
  • Runtime Configuration: Control thread pools, graph optimization level, and model caching

Benchmarks

Measured with JMH (average time, 15 iterations):

4 Entity Types

| Batch Size | Avg Latency (ms/op) | Per-Text (ms) | Throughput (texts/s) |
|---|---|---|---|
| 1 | 26.5 | 26.5 | ~37.7 |
| 4 | 143.5 | 35.9 | ~27.9 |
| 8 | 286.6 | 35.8 | ~27.9 |

8 Entity Types

| Batch Size | Avg Latency (ms/op) | Per-Text (ms) | Throughput (texts/s) |
|---|---|---|---|
| 1 | 34.1 | 34.1 | ~29.3 |
| 4 | 174.6 | 43.7 | ~22.9 |
| 8 | 339.2 | 42.4 | ~23.6 |
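The throughput column follows directly from the latency column: a batch of b texts completed in t ms sustains b / (t / 1000) texts per second. A quick check against the numbers above:

```java
public class Throughput {
    // Texts per second for a batch of `batchSize` texts taking `latencyMs` per op.
    static double textsPerSecond(int batchSize, double latencyMs) {
        return batchSize * 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f%n", textsPerSecond(1, 26.5));  // batch 1, 4 entity types -> ~37.7
        System.out.printf("%.1f%n", textsPerSecond(8, 286.6)); // batch 8, 4 entity types -> ~27.9
    }
}
```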

License

Apache License 2.0
