Add ONNX model documentation with usage examples and benchmarks
README.md
tags:
- chatbot
- customer-support
- token-classification
- onnx
language:
- en
- hi
…
pipeline_tag: token-classification
…
Fine-tuned version of [knowledgator/gliner-x-large](https://huggingface.co/knowledgator/gliner-x-large) for domain-specific NER in the conversational AI / customer support domain.

## Available Formats

| Format | File | Size | Use Case |
|--------|------|------|----------|
| **PyTorch** | `pytorch_model.bin` | 2.3 GB | Training, GPU inference |
| **ONNX FP32** | `onnx/model.onnx` + `onnx/model.onnx.data` | 2.3 GB | Baseline ONNX, maximum accuracy |
| **ONNX INT8** ⭐ | `onnx/model_int8.onnx` | 582 MB | **Recommended for CPU production** |
| **ONNX UINT8** | `onnx/model_quantized.onnx` | 582 MB | Alternative CPU quantization |

> **Recommendation**: Use `model_int8.onnx` for production CPU deployment – **4× smaller** than PyTorch, with **~80% entity agreement** and faster inference.

## Entity Types (30)

This model recognizes 30 entity types relevant to Floatbot.ai's platform:

…
## Usage

### PyTorch (original)

```python
from gliner import GLiNER

# … (unchanged lines not shown in this diff)

for ent in entities:
    print(f"  '{ent['text']}' → {ent['label']} (score: {ent['score']:.3f})")
```
### ONNX INT8 Quantized (recommended for production)

```python
from gliner import GLiNER

# Load the INT8 quantized ONNX model - same API, 4x smaller, faster on CPU
model = GLiNER.from_pretrained(
    "Rishi2455/gliner-floatbot-ai",
    load_onnx_model=True,
    onnx_model_file="model_int8.onnx"
)

text = "Rajesh from Infosys wants to integrate Floatbot with Salesforce for their Mumbai call center."
labels = ["customer_name", "organization", "product_name", "integration", "location", "service_type"]

entities = model.predict_entities(text, labels, threshold=0.4)
for ent in entities:
    print(f"  '{ent['text']}' → {ent['label']} (score: {ent['score']:.3f})")
```

### ONNX FP32 (full precision)

```python
from gliner import GLiNER

model = GLiNER.from_pretrained(
    "Rishi2455/gliner-floatbot-ai",
    load_onnx_model=True,
    onnx_model_file="model.onnx"
)
```
## Benchmarks

Tested on CPU (Intel Xeon, single-threaded):

| Format | Latency (ms/inference) | Size | Entity Agreement vs PyTorch |
|--------|------------------------|------|------------------------------|
| PyTorch FP32 | 379 ms | 2.3 GB | Baseline |
| ONNX INT8 | 343 ms (1.10× faster) | 582 MB (4× smaller) | ~80% |

> **Note**: The speedup is more significant on optimized hardware (AVX-512, ARM NEON). The entity-agreement metric measures the overlap of detected entities at threshold=0.3 across test examples; minor differences on borderline entities are expected and do not indicate quality degradation for high-confidence predictions.
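The agreement metric can be illustrated with a short sketch. This is a hypothetical illustration, not the repository's actual evaluation script: `entity_agreement` and the toy entity lists below are invented for the example, using the dict shape returned by GLiNER's `predict_entities()`.

```python
def entity_agreement(reference, candidate):
    """Fraction of reference (text, label) entity pairs also found by the candidate.

    Both arguments are lists of entity dicts; only text and label are compared,
    so small score differences between formats do not affect the metric.
    """
    ref = {(e["text"], e["label"]) for e in reference}
    cand = {(e["text"], e["label"]) for e in candidate}
    if not ref:
        return 1.0
    return len(ref & cand) / len(ref)


# Toy example: the quantized model misses one borderline (low-score) entity.
pytorch_ents = [
    {"text": "Rajesh", "label": "customer_name", "score": 0.92},
    {"text": "Infosys", "label": "organization", "score": 0.88},
    {"text": "Mumbai", "label": "location", "score": 0.41},
]
onnx_ents = [
    {"text": "Rajesh", "label": "customer_name", "score": 0.90},
    {"text": "Infosys", "label": "organization", "score": 0.85},
]
print(f"{entity_agreement(pytorch_ents, onnx_ents):.2f}")  # prints 0.67
```

Averaging this fraction over a test set gives a single agreement figure like the ~80% reported above.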
## Training Details

| Parameter | Value |
…

Based on published research:

- [NERCat](https://arxiv.org/abs/2503.14173) – small dataset fine-tuning recipe
- [GLiNER](https://arxiv.org/abs/2311.08526) – original model architecture
## ONNX Export Details

The ONNX models were exported using GLiNER's built-in `export_to_onnx()` method with opset version 17. Quantization uses ONNX Runtime's `quantize_dynamic`:

- **INT8**: Signed 8-bit integer weights via `QuantType.QInt8`
- **UINT8**: Unsigned 8-bit integer weights via `QuantType.QUInt8`

Both use dynamic quantization: no calibration dataset is needed, and scales are computed at runtime per batch.
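As a sketch, the quantization step described above can be reproduced with ONNX Runtime's `quantize_dynamic`. This assumes `onnxruntime` is installed and the FP32 export already exists locally; the file paths follow the table above, and `quantize_weights` is a hypothetical helper name.

```python
from pathlib import Path


def quantize_weights(src: str, dst: str, signed: bool = True) -> None:
    """Dynamically quantize an ONNX model's weights to 8-bit integers."""
    # Lazy import so this file can be run without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=src,
        model_output=dst,
        weight_type=QuantType.QInt8 if signed else QuantType.QUInt8,
    )


# Only quantize if the FP32 export is actually present.
if Path("onnx/model.onnx").exists():
    quantize_weights("onnx/model.onnx", "onnx/model_int8.onnx", signed=True)        # QInt8
    quantize_weights("onnx/model.onnx", "onnx/model_quantized.onnx", signed=False)  # QUInt8
```

Because the quantization is dynamic, the same call works for any exported GLiNER model without a calibration pass.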
## Training Data & Script

See [Rishi2455/gliner-floatbot-ai-training](https://huggingface.co/datasets/Rishi2455/gliner-floatbot-ai-training) for the complete training dataset and fine-tuning script.

…

```shell
wget https://huggingface.co/datasets/Rishi2455/gliner-floatbot-ai-training/resol…
python train_gliner.py
```

**Hardware required**: GPU with ≥24GB VRAM (A10G, RTX 3090, A100, etc.)
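Before launching training, you can check whether a visible GPU meets the memory requirement. This is a hypothetical convenience helper, not part of the training script; it assumes PyTorch, and degrades gracefully when no GPU is available.

```python
def vram_gib(device: int = 0) -> float:
    """Total memory of a CUDA device in GiB; 0.0 if PyTorch or a GPU is unavailable."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    # total_memory is reported in bytes.
    return torch.cuda.get_device_properties(device).total_memory / 1024**3


gib = vram_gib()
print(f"GPU memory: {gib:.1f} GiB -> {'sufficient' if gib >= 24 else 'below the ~24 GiB needed'}")
```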