---
license: apache-2.0
base_model: knowledgator/gliner-x-large
tags:
  - gliner
  - NER
  - named-entity-recognition
  - floatbot
  - conversational-ai
  - chatbot
  - customer-support
  - token-classification
  - onnx
language:
  - en
  - hi
datasets:
  - Rishi2455/gliner-floatbot-ai-training
library_name: gliner
pipeline_tag: token-classification
---

# GLiNER Fine-Tuned for Floatbot.ai

A fine-tuned version of [knowledgator/gliner-x-large](https://huggingface.co/knowledgator/gliner-x-large) for domain-specific named entity recognition (NER) in conversational AI and customer support.

## Available Formats

| Format | File | Size | Use Case |
|--------|------|------|----------|
| **PyTorch** | `pytorch_model.bin` | 2.3 GB | Training, GPU inference |
| **ONNX FP32** | `onnx/model.onnx` + `onnx/model.onnx.data` | 2.3 GB | Baseline ONNX, maximum accuracy |
| **ONNX INT8** ⭐ | `onnx/model_int8.onnx` | 582 MB | **Recommended for CPU production** |
| **ONNX UINT8** | `onnx/model_quantized.onnx` | 582 MB | Alternative CPU quantization |

> **Recommendation**: Use `model_int8.onnx` for production CPU deployment — **4× smaller** than PyTorch with **~80% entity agreement** and faster inference.

## Entity Types (30)

This model recognizes 30 entity types relevant to Floatbot.ai's platform:

`customer_name` · `organization` · `product_name` · `service_type` · `channel` · `date` · `time` · `monetary_amount` · `order_id` · `ticket_id` · `account_number` · `phone_number` · `email_address` · `complaint_category` · `intent_keyword` · `department` · `plan_name` · `feature_name` · `api_endpoint` · `bot_name` · `language` · `platform` · `integration` · `metric_name` · `percentage` · `duration` · `location` · `priority_level` · `status` · `error_type`

## Usage

### PyTorch (original)

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("Rishi2455/gliner-floatbot-ai")

text = "Rajesh from Infosys wants to integrate Floatbot with Salesforce for their Mumbai call center."
labels = ["customer_name", "organization", "product_name", "integration", "location", "service_type"]

entities = model.predict_entities(text, labels, threshold=0.4)
for ent in entities:
    print(f"  '{ent['text']}' → {ent['label']} (score: {ent['score']:.3f})")
```

### ONNX INT8 Quantized (recommended for production)

```python
from gliner import GLiNER

# Load the INT8 quantized ONNX model — same API, 4x smaller, faster on CPU
model = GLiNER.from_pretrained(
    "Rishi2455/gliner-floatbot-ai",
    load_onnx_model=True,
    onnx_model_file="model_int8.onnx"
)

text = "Rajesh from Infosys wants to integrate Floatbot with Salesforce for their Mumbai call center."
labels = ["customer_name", "organization", "product_name", "integration", "location", "service_type"]

entities = model.predict_entities(text, labels, threshold=0.4)
for ent in entities:
    print(f"  '{ent['text']}' → {ent['label']} (score: {ent['score']:.3f})")
```

### ONNX FP32 (full precision)

```python
from gliner import GLiNER

model = GLiNER.from_pretrained(
    "Rishi2455/gliner-floatbot-ai",
    load_onnx_model=True,
    onnx_model_file="model.onnx"
)
```

## Benchmarks

Tested on CPU (Intel Xeon, single-threaded):

| Format | Latency (ms/inference) | Size | Entity Agreement vs PyTorch |
|--------|----------------------|------|---------------------------|
| PyTorch FP32 | 379 ms | 2.3 GB | Baseline |
| ONNX INT8 | 343 ms (1.10× faster) | 582 MB (4× smaller) | ~80% |

> Note: Speedup is more significant on optimized hardware (AVX-512, ARM NEON). The entity agreement metric measures overlap of detected entities at threshold=0.3 across test examples — minor differences in borderline entities are expected and do not indicate quality degradation for high-confidence predictions.
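
To reproduce the latency column, a small timing harness can wrap `model.predict_entities` from the usage examples above. The `bench_ms` helper below is illustrative and not part of this repo; it reports the median of several runs after a warmup, which is more robust to CPU jitter than a single measurement:

```python
import statistics
import time

def bench_ms(fn, warmup=3, runs=20):
    """Median wall-clock latency of fn() in milliseconds.

    Warmup iterations are discarded so one-time costs (lazy
    initialization, caches) do not skew the measurement.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Example (assumes `model`, `text`, `labels` from the usage section):
# latency = bench_ms(lambda: model.predict_entities(text, labels, threshold=0.4))
# print(f"{latency:.0f} ms/inference")
```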

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | knowledgator/gliner-x-large (1.3B params) |
| Training samples | 86 |
| Entity types | 30 |
| Learning rate (encoder) | 5e-6 |
| Learning rate (others) | 1e-5 |
| Loss | Focal loss (α=0.75, γ=2) |
| Epochs | 12 |
| Effective batch size | 8 |
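
For reference, the focal loss row above corresponds to the standard binary focal loss formulation. The sketch below is illustrative only (the actual GLiNER training code operates on batched span-score tensors, not single scalars), but it shows how α=0.75 re-weights positives and γ=2 down-weights easy examples:

```python
import math

def focal_loss(p, target, alpha=0.75, gamma=2.0):
    """Binary focal loss for a single predicted span score p in (0, 1).

    FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t), where p_t is the
    probability assigned to the true class. With gamma=2, confidently
    correct predictions contribute almost nothing, so training focuses
    on hard spans — useful with only 86 training samples.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```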

## Training Recipe

Based on published research:
- [GLiNER-BioMed](https://arxiv.org/abs/2504.00676) — domain adaptation blueprint
- [NERCat](https://arxiv.org/abs/2503.14173) — small dataset fine-tuning recipe
- [GLiNER](https://arxiv.org/abs/2311.08526) — original model architecture

## ONNX Export Details

The ONNX models were exported using GLiNER's built-in `export_to_onnx()` method with opset version 17. Quantization uses ONNX Runtime's `quantize_dynamic`:
- **INT8**: Signed 8-bit integer weights via `QuantType.QInt8`
- **UINT8**: Unsigned 8-bit integer weights via `QuantType.QUInt8`

Both use dynamic quantization — no calibration dataset needed, scales computed at runtime per batch.

## Training Data & Script

See [Rishi2455/gliner-floatbot-ai-training](https://huggingface.co/datasets/Rishi2455/gliner-floatbot-ai-training) for the complete training dataset and fine-tuning script.

## How to Run Training

```bash
pip install gliner torch transformers accelerate trackio huggingface_hub
huggingface-cli login

# Download and run the training script
wget https://huggingface.co/datasets/Rishi2455/gliner-floatbot-ai-training/resolve/main/train_gliner.py
python train_gliner.py
```

**Hardware required**: GPU with ≥24GB VRAM (A10G, RTX 3090, A100, etc.)