Token Classification · GLiNER · PyTorch · ONNX · English · Hindi · NER · named-entity-recognition · floatbot · conversational-ai · chatbot · customer-support
Rishi2455 committed on
Commit 6a6f282 · verified · 1 Parent(s): beed728

Add ONNX model documentation with usage examples and benchmarks

Files changed (1)
1. README.md +66 -1
README.md CHANGED
@@ -10,6 +10,7 @@ tags:
  - chatbot
  - customer-support
  - token-classification
+ - onnx
  language:
  - en
  - hi
 
@@ -23,6 +24,17 @@ pipeline_tag: token-classification

  Fine-tuned version of [knowledgator/gliner-x-large](https://huggingface.co/knowledgator/gliner-x-large) for domain-specific NER in the conversational AI / customer support domain.

+ ## Available Formats
+
+ | Format | File | Size | Use Case |
+ |--------|------|------|----------|
+ | **PyTorch** | `pytorch_model.bin` | 2.3 GB | Training, GPU inference |
+ | **ONNX FP32** | `onnx/model.onnx` + `onnx/model.onnx.data` | 2.3 GB | Baseline ONNX, maximum accuracy |
+ | **ONNX INT8** ⭐ | `onnx/model_int8.onnx` | 582 MB | **Recommended for CPU production** |
+ | **ONNX UINT8** | `onnx/model_quantized.onnx` | 582 MB | Alternative CPU quantization |
+
+ > **Recommendation**: Use `model_int8.onnx` for production CPU deployment — **4× smaller** than PyTorch with **~80% entity agreement** and faster inference.
+
  ## Entity Types (30)

  This model recognizes 30 entity types relevant to Floatbot.ai's platform:
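The "~80% entity agreement" figure is described in this commit's benchmark note only as the overlap of detected entities at threshold=0.3 across test examples. The exact formula is not given. The sketch below assumes one possible definition: the Jaccard overlap of `(start, end, label)` spans, pooled over all examples, where each prediction is a dict with `start`, `end`, `label` and `score` keys. The helper names `to_spans` and `entity_agreement`, and the sample predictions, are hypothetical.

```python
from typing import Iterable

# A span is identified by its character offsets and label: (start, end, label).
Span = tuple[int, int, str]


def to_spans(entities: Iterable[dict], threshold: float = 0.3) -> set[Span]:
    """Keep predictions scoring at or above `threshold` as (start, end, label) triples."""
    return {(e["start"], e["end"], e["label"]) for e in entities if e["score"] >= threshold}


def entity_agreement(
    reference: list[list[dict]],
    candidate: list[list[dict]],
    threshold: float = 0.3,
) -> float:
    """Hypothetical metric: Jaccard overlap of spans, pooled over all examples."""
    if len(reference) != len(candidate):
        raise ValueError("both models must be scored on the same examples")
    shared = union = 0
    for ref_entities, cand_entities in zip(reference, candidate):
        ref = to_spans(ref_entities, threshold)
        cand = to_spans(cand_entities, threshold)
        shared += len(ref & cand)
        union += len(ref | cand)
    # Two models that both predict nothing are treated as fully agreeing.
    return shared / union if union else 1.0


# Made-up predictions for one example sentence (not real model output).
pytorch_preds = [[
    {"start": 0, "end": 6, "label": "customer_name", "score": 0.91},
    {"start": 12, "end": 19, "label": "organization", "score": 0.88},
    {"start": 74, "end": 80, "label": "location", "score": 0.35},
]]
int8_preds = [[
    {"start": 0, "end": 6, "label": "customer_name", "score": 0.90},
    {"start": 12, "end": 19, "label": "organization", "score": 0.86},
    {"start": 74, "end": 80, "label": "location", "score": 0.28},
]]

print(f"{entity_agreement(pytorch_preds, int8_preds):.3f}")  # 0.667
```

In this toy data, the borderline `location` span drops just below 0.3 in the INT8 output. As a result, one of three spans disagrees even though the high-confidence spans match exactly.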
 
@@ -31,6 +43,8 @@ This model recognizes 30 entity types relevant to Floatbot.ai's platform:

  ## Usage

+ ### PyTorch (original)
+
  ```python
  from gliner import GLiNER
 
@@ -44,6 +58,49 @@ for ent in entities:
      print(f" '{ent['text']}' → {ent['label']} (score: {ent['score']:.3f})")
  ```

+ ### ONNX INT8 Quantized (recommended for production)
+
+ ```python
+ from gliner import GLiNER
+
+ # Load the INT8 quantized ONNX model — same API, 4x smaller, faster on CPU
+ model = GLiNER.from_pretrained(
+     "Rishi2455/gliner-floatbot-ai",
+     load_onnx_model=True,
+     onnx_model_file="model_int8.onnx"
+ )
+
+ text = "Rajesh from Infosys wants to integrate Floatbot with Salesforce for their Mumbai call center."
+ labels = ["customer_name", "organization", "product_name", "integration", "location", "service_type"]
+
+ entities = model.predict_entities(text, labels, threshold=0.4)
+ for ent in entities:
+     print(f" '{ent['text']}' → {ent['label']} (score: {ent['score']:.3f})")
+ ```
+
+ ### ONNX FP32 (full precision)
+
+ ```python
+ from gliner import GLiNER
+
+ model = GLiNER.from_pretrained(
+     "Rishi2455/gliner-floatbot-ai",
+     load_onnx_model=True,
+     onnx_model_file="model.onnx"
+ )
+ ```
+
+ ## Benchmarks
+
+ Tested on CPU (Intel Xeon, single-threaded):
+
+ | Format | Latency (ms/inference) | Size | Entity Agreement vs PyTorch |
+ |--------|----------------------|------|---------------------------|
+ | PyTorch FP32 | 379 ms | 2.3 GB | Baseline |
+ | ONNX INT8 | 343 ms (1.10× faster) | 582 MB (4× smaller) | ~80% |
+
+ > Note: Speedup is more significant on optimized hardware (AVX-512, ARM NEON). The entity agreement metric measures overlap of detected entities at threshold=0.3 across test examples — minor differences in borderline entities are expected and do not indicate quality degradation for high-confidence predictions.
+
  ## Training Details

  | Parameter | Value |
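The "1.10× faster" and "4× smaller" labels in the benchmark table can be checked directly from the listed latencies and sizes. The quick check below assumes that "2.3 GB" means 2,300 MB in decimal units. The table does not say whether GB or GiB is meant; reading it as 2.3 GiB gives about 4.05× instead. The helper name `times_ratio` is hypothetical.

```python
def times_ratio(baseline: float, candidate: float) -> float:
    """How many times larger `baseline` is than `candidate` (speedup or size ratio)."""
    return baseline / candidate


# Figures from the benchmark table in this commit.
PYTORCH_LATENCY_MS = 379.0
INT8_LATENCY_MS = 343.0
# Assumption: "2.3 GB" is read as 2,300 MB (decimal units).
PYTORCH_SIZE_MB = 2300.0
INT8_SIZE_MB = 582.0

speedup = times_ratio(PYTORCH_LATENCY_MS, INT8_LATENCY_MS)
size_ratio = times_ratio(PYTORCH_SIZE_MB, INT8_SIZE_MB)
latency_saved = 1 - INT8_LATENCY_MS / PYTORCH_LATENCY_MS

print(f"speedup:       {speedup:.2f}x")       # 1.10x
print(f"size ratio:    {size_ratio:.2f}x")    # 3.95x
print(f"latency saved: {latency_saved:.1%}")  # 9.5%
```

On this single-threaded Xeon setup, the INT8 model cuts latency by roughly a tenth, while the file shrinks to about a quarter of its original size. The "4×" in the table is a rounded figure.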
 
@@ -64,6 +121,14 @@ Based on published research:
  - [NERCat](https://arxiv.org/abs/2503.14173) — small dataset fine-tuning recipe
  - [GLiNER](https://arxiv.org/abs/2311.08526) — original model architecture

+ ## ONNX Export Details
+
+ The ONNX models were exported using GLiNER's built-in `export_to_onnx()` method with opset version 17. Quantization uses ONNX Runtime's `quantize_dynamic`:
+ - **INT8**: Signed 8-bit integer weights via `QuantType.QInt8`
+ - **UINT8**: Unsigned 8-bit integer weights via `QuantType.QUInt8`
+
+ Both use dynamic quantization — no calibration dataset needed, scales computed at runtime per batch.
+
  ## Training Data & Script

  See [Rishi2455/gliner-floatbot-ai-training](https://huggingface.co/datasets/Rishi2455/gliner-floatbot-ai-training) for the complete training dataset and fine-tuning script.
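The export notes say that `quantize_dynamic` stores weights as signed or unsigned 8-bit integers and needs no calibration data. The underlying arithmetic can be sketched in pure Python. A per-tensor scale is derived from a tensor's value range (plus a zero point in the unsigned case), and values are then rounded onto the 8-bit grid. In dynamic quantization, the weights are converted once ahead of time. The same kind of parameters for activations is computed from each input at inference time, which is why no calibration set is required.

This is an illustrative sketch, not ONNX Runtime's implementation. It assumes per-tensor parameters, a symmetric [-127, 127] grid for signed values and an asymmetric [0, 255] grid for unsigned ones. ONNX Runtime's actual ranges and defaults may differ. The helper names `quant_params`, `quantize` and `dequantize` are hypothetical.

```python
# Illustrative sketch of 8-bit affine quantization, not ONNX Runtime's code.

def quant_params(values: list[float], signed: bool) -> tuple[float, int, int, int]:
    """Return (scale, zero_point, qmin, qmax) for one tensor (assumed per-tensor)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if signed:
        # Assumed symmetric signed grid: zero point fixed at 0, codes in [-127, 127].
        scale = max(-lo, hi) / 127 or 1.0
        return scale, 0, -127, 127
    # Assumed asymmetric unsigned grid: the zero point shifts [lo, hi] onto [0, 255].
    scale = (hi - lo) / 255 or 1.0
    return scale, round(-lo / scale), 0, 255


def quantize(values: list[float], signed: bool) -> tuple[list[int], float, int]:
    """Round each value onto the 8-bit grid, clamping to [qmin, qmax]."""
    scale, zero_point, qmin, qmax = quant_params(values, signed)
    codes = [min(max(round(v / scale) + zero_point, qmin), qmax) for v in values]
    return codes, scale, zero_point


def dequantize(codes: list[int], scale: float, zero_point: int) -> list[float]:
    """Map integer codes back to approximate float values."""
    return [(c - zero_point) * scale for c in codes]


weights = [-0.42, -0.1, 0.0, 0.07, 0.31, 0.9]  # made-up example tensor
for signed in (True, False):
    codes, scale, zero_point = quantize(weights, signed)
    restored = dequantize(codes, scale, zero_point)
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    kind = "int8 (symmetric)" if signed else "uint8 (asymmetric)"
    print(f"{kind}: codes={codes} zero_point={zero_point} max_error={max_err:.4f}")
```

In both cases, the reconstruction error per value stays within half a quantization step (`scale / 2`). The step is set by the tensor's range, so it is only small relative to each value when the range is not dominated by a few outliers. This is one way borderline entity scores can shift slightly after quantization.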
 
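@@ -79,4 +144,4 @@ wget https://huggingface.co/datasets/Rishi2455/gliner-floatbot-ai-training/resol
  python train_gliner.py
  ```

- **Hardware required**: GPU with ≥24GB VRAM (A10G, RTX 3090, A100, etc.)
+ **Hardware required**: GPU with ≥24GB VRAM (A10G, RTX 3090, A100, etc.)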