---
license: apache-2.0
base_model: knowledgator/gliner-x-large
tags:
- gliner
- NER
- named-entity-recognition
- floatbot
- conversational-ai
- chatbot
- customer-support
- token-classification
- onnx
language:
- en
- hi
datasets:
- Rishi2455/gliner-floatbot-ai-training
library_name: gliner
pipeline_tag: token-classification
---
# GLiNER Fine-Tuned for Floatbot.ai
Fine-tuned version of [knowledgator/gliner-x-large](https://huggingface.co/knowledgator/gliner-x-large) for domain-specific NER in the conversational AI / customer support domain.
## Available Formats
| Format | File | Size | Use Case |
|--------|------|------|----------|
| **PyTorch** | `pytorch_model.bin` | 2.3 GB | Training, GPU inference |
| **ONNX FP32** | `onnx/model.onnx` + `onnx/model.onnx.data` | 2.3 GB | Baseline ONNX, maximum accuracy |
| **ONNX INT8** ⭐ | `onnx/model_int8.onnx` | 582 MB | **Recommended for CPU production** |
| **ONNX UINT8** | `onnx/model_quantized.onnx` | 582 MB | Alternative CPU quantization |
> **Recommendation**: Use `model_int8.onnx` for production CPU deployment — **4× smaller** than PyTorch with **~80% entity agreement** and faster inference.
## Entity Types (30)
This model recognizes 30 entity types relevant to Floatbot.ai's platform:
`customer_name` · `organization` · `product_name` · `service_type` · `channel` · `date` · `time` · `monetary_amount` · `order_id` · `ticket_id` · `account_number` · `phone_number` · `email_address` · `complaint_category` · `intent_keyword` · `department` · `plan_name` · `feature_name` · `api_endpoint` · `bot_name` · `language` · `platform` · `integration` · `metric_name` · `percentage` · `duration` · `location` · `priority_level` · `status` · `error_type`
## Usage
### PyTorch (original)
```python
from gliner import GLiNER
model = GLiNER.from_pretrained("Rishi2455/gliner-floatbot-ai")
text = "Rajesh from Infosys wants to integrate Floatbot with Salesforce for their Mumbai call center."
labels = ["customer_name", "organization", "product_name", "integration", "location", "service_type"]
entities = model.predict_entities(text, labels, threshold=0.4)
for ent in entities:
    print(f"  '{ent['text']}' → {ent['label']} (score: {ent['score']:.3f})")
```
### ONNX INT8 Quantized (recommended for production)
```python
from gliner import GLiNER
# Load the INT8 quantized ONNX model — same API, 4x smaller, faster on CPU
model = GLiNER.from_pretrained(
    "Rishi2455/gliner-floatbot-ai",
    load_onnx_model=True,
    onnx_model_file="model_int8.onnx",
)
text = "Rajesh from Infosys wants to integrate Floatbot with Salesforce for their Mumbai call center."
labels = ["customer_name", "organization", "product_name", "integration", "location", "service_type"]
entities = model.predict_entities(text, labels, threshold=0.4)
for ent in entities:
    print(f"  '{ent['text']}' → {ent['label']} (score: {ent['score']:.3f})")
```
### ONNX FP32 (full precision)
```python
from gliner import GLiNER
model = GLiNER.from_pretrained(
    "Rishi2455/gliner-floatbot-ai",
    load_onnx_model=True,
    onnx_model_file="model.onnx",
)
```
## Benchmarks
Tested on CPU (Intel Xeon, single-threaded):
| Format | Latency (ms/inference) | Size | Entity Agreement vs PyTorch |
|--------|----------------------|------|---------------------------|
| PyTorch FP32 | 379 ms | 2.3 GB | Baseline |
| ONNX INT8 | 343 ms (1.10× faster) | 582 MB (4× smaller) | ~80% |
> Note: Speedup is more significant on optimized hardware (AVX-512, ARM NEON). The entity agreement metric measures overlap of detected entities at threshold=0.3 across test examples — minor differences in borderline entities are expected and do not indicate quality degradation for high-confidence predictions.
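The latency figures above can be reproduced with a simple timing harness. This is a sketch: the callable and its arguments are placeholders, and you would pass the loaded model's `predict_entities` with the text and labels from the Usage section.

```python
import time
import statistics

def median_latency_ms(fn, *args, warmup=3, runs=20):
    """Median wall-clock latency of fn(*args) in milliseconds.

    A few warmup calls are made first so one-time costs (JIT, caches,
    lazy initialization) do not distort the measurement.
    """
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Example usage against a loaded GLiNER model: `median_latency_ms(model.predict_entities, text, labels)`. The median is reported rather than the mean because single-run CPU timings are noisy.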
## Training Details
| Parameter | Value |
|-----------|-------|
| Base model | knowledgator/gliner-x-large (1.3B params) |
| Training samples | 86 |
| Entity types | 30 |
| Learning rate (encoder) | 5e-6 |
| Learning rate (others) | 1e-5 |
| Loss | Focal loss (α=0.75, γ=2) |
| Epochs | 12 |
| Effective batch size | 8 |
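For reference, the focal loss used during training (α=0.75, γ=2) down-weights easy, high-confidence examples so the small 86-sample dataset's hard cases dominate the gradient. A minimal per-prediction sketch of the binary form (the actual training script linked below is authoritative):

```python
import math

def focal_loss(p, y, alpha=0.75, gamma=2.0):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class, in (0, 1).
    y: ground-truth label, 0 or 1.
    The (1 - p_t)^gamma factor shrinks the loss of well-classified
    examples; alpha rebalances positives vs. negatives.
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With γ=2, a confident correct prediction (p_t=0.9) contributes roughly 100× less loss than plain cross-entropy would, while a badly wrong one (p_t=0.1) is barely discounted.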
## Training Recipe
Based on published research:
- [GLiNER-BioMed](https://arxiv.org/abs/2504.00676) — domain adaptation blueprint
- [NERCat](https://arxiv.org/abs/2503.14173) — small dataset fine-tuning recipe
- [GLiNER](https://arxiv.org/abs/2311.08526) — original model architecture
## ONNX Export Details
The ONNX models were exported using GLiNER's built-in `export_to_onnx()` method with opset version 17. Quantization uses ONNX Runtime's `quantize_dynamic`:
- **INT8**: Signed 8-bit integer weights via `QuantType.QInt8`
- **UINT8**: Unsigned 8-bit integer weights via `QuantType.QUInt8`
Both use dynamic quantization: no calibration dataset is needed. Weights are quantized once at export time; activation scales are computed on the fly at runtime, per batch.
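To illustrate what the INT8 weight quantization does, here is a sketch of symmetric per-tensor int8 quantization in NumPy. This mimics the basic scheme behind `QuantType.QInt8` weights (ONNX Runtime's actual implementation has more options, e.g. per-channel scales, so treat this as conceptual only):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization.

    The scale is derived from the tensor's own dynamic range,
    so no calibration data is required.
    """
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 values and a scale."""
    return q.astype(np.float32) * scale
```

Each float weight is stored as one signed byte plus a shared scale, which is where the roughly 4× size reduction over FP32 comes from; the round-off introduced here is the source of the ~80% entity-agreement gap reported above.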
## Training Data & Script
See [Rishi2455/gliner-floatbot-ai-training](https://huggingface.co/datasets/Rishi2455/gliner-floatbot-ai-training) for the complete training dataset and fine-tuning script.
## How to Run Training
```bash
pip install gliner torch transformers accelerate trackio huggingface_hub
huggingface-cli login
# Download and run the training script
wget https://huggingface.co/datasets/Rishi2455/gliner-floatbot-ai-training/resolve/main/train_gliner.py
python train_gliner.py
```
**Hardware required**: GPU with ≥24GB VRAM (A10G, RTX 3090, A100, etc.)