Add AgentIntentRouter model: DistilBERT (distilbert-base-uncased) fine-tuned for agent intent classification

Files changed (6)

README.md +157 -0
config.json +48 -0
label_mapping.json +32 -0
model.safetensors +3 -0
tokenizer.json +0 -0
tokenizer_config.json +14 -0

README.md ADDED

	@@ -0,0 +1,157 @@

+---
+license: apache-2.0
+base_model: distilbert-base-uncased
+tags:
+  - text-classification
+  - intent-detection
+  - agent-routing
+  - mcp
+  - ai-agents
+  - distilbert
+  - tool-use
+datasets:
+  - custom
+language:
+  - en
+metrics:
+  - accuracy
+  - f1
+pipeline_tag: text-classification
+library_name: transformers
+---
+# AgentIntentRouter
+A fast, lightweight intent classifier for AI agent and MCP tool routing. Given a user message, it predicts which tool or capability the agent should invoke, in under 50 ms on CPU.
+Built on DistilBERT (66M parameters), fine-tuned on about 11K synthetic examples (8,987 train / 1,123 validation / 1,124 test) across 8 intent categories.
+## Why This Exists
+Agents built with frameworks such as LangChain, LangGraph, CrewAI, or AutoGen often spend an entire LLM call just to figure out *what the user wants*. That can cost roughly 1-3 seconds and ~$0.01 per request for routing alone.
+AgentIntentRouter replaces that first LLM call with a 66M-parameter classifier that runs in **~10ms on CPU** and **~2ms on GPU**. Use it as the first step in your agent pipeline to route to the right tool, and fall back to an LLM when the confidence score is low.
+## Intent Categories
+| Label | Description | Example |
+|-------|-------------|---------|
+| `code_generation` | User wants code written, debugged, or refactored | "Write a Python function to parse CSV" |
+| `web_search` | User wants to find information online | "What's the latest news on AI regulation" |
+| `math_calculation` | User needs computation or conversion | "Calculate 15% of 4500" |
+| `file_operation` | User wants to read, write, or manage files | "Read the config.json file" |
+| `api_call` | User wants to interact with an external API | "Send a Slack message to the team" |
+| `creative_writing` | User wants text composed or drafted | "Write a professional email to the client" |
+| `data_analysis` | User wants data interpreted or compared | "Compare React vs Vue performance" |
+| `general_chat` | Casual conversation, greetings, feedback | "Hey, how are you?" |
+## Quick Start
+```python
+from transformers import pipeline
+router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
+# Single prediction
+result = router("Write a Python function to sort a list")
+print(result)
+# [{'label': 'code_generation', 'score': 0.98}]
+# Batch prediction
+messages = [
+    "Search for the latest AI papers",
+    "What's 25% of 1200?",
+    "Draft an email to my boss about the deadline",
+    "Hello!",
+]
+results = router(messages)
+for msg, res in zip(messages, results):
+    print(f"  {res['label']:>20} ({res['score']:.2f}) — {msg}")
+```
+## Use as Agent Router
+```python
+from transformers import pipeline
+router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
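+# NOTE: the handle_* functions and fallback_llm_route are placeholders for
+# your own application code; define them before building TOOL_MAP.
+TOOL_MAP = {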
+    "code_generation": handle_code_request,
+    "web_search": handle_search,
+    "math_calculation": handle_calculation,
+    "file_operation": handle_file_ops,
+    "api_call": handle_api_call,
+    "creative_writing": handle_writing,
+    "data_analysis": handle_analysis,
+    "general_chat": handle_chat,
+}
+def route(user_message: str):
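+    # For a single string the pipeline returns a one-element list of
+    # {"label": ..., "score": ...} dicts; take the top prediction.
+    intent = router(user_message)[0]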
+    if intent["score"] < 0.5:
+        # Low confidence — fall back to LLM for routing
+        return fallback_llm_route(user_message)
+    handler = TOOL_MAP[intent["label"]]
+    return handler(user_message)
+```
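
Review note: the snippet above depends on handlers that live in the caller's code. The sketch below shows the same dispatch logic in a self-contained form, assuming a stub classifier in place of the real pipeline so it runs without downloading the model. `make_route`, `stub_classifier`, and the lambda handlers are hypothetical names for illustration, not part of this repo.

```python
from typing import Any, Callable, Dict, List

Prediction = Dict[str, Any]  # {"label": str, "score": float}, the pipeline's output shape


def make_route(
    classify: Callable[[str], List[Prediction]],
    tool_map: Dict[str, Callable[[str], str]],
    fallback: Callable[[str], str],
    threshold: float = 0.5,
) -> Callable[[str], str]:
    """Build a router that dispatches on the top predicted intent."""

    def route(user_message: str) -> str:
        intent = classify(user_message)[0]
        handler = tool_map.get(intent["label"])
        # Fall back when confidence is low or the label has no handler.
        if handler is None or intent["score"] < threshold:
            return fallback(user_message)
        return handler(user_message)

    return route


def stub_classifier(text: str) -> List[Prediction]:
    """Hypothetical stand-in for pipeline("text-classification", ...)."""
    if "python" in text.lower():
        return [{"label": "code_generation", "score": 0.98}]
    return [{"label": "general_chat", "score": 0.30}]


route = make_route(
    classify=stub_classifier,
    tool_map={"code_generation": lambda msg: f"code handler got: {msg}"},
    fallback=lambda msg: f"LLM fallback got: {msg}",
)
print(route("Write a Python function to sort a list"))
print(route("hmm"))
```

Passing the classifier in as an argument keeps the routing logic testable without the model; in production the `classify` argument would be the `router` pipeline itself.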
+## Performance
+- **Inference speed:** ~10ms on CPU, ~2ms on GPU
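+- **Model size:** ~268 MB on disk (float32 `model.safetensors`, DistilBERT-base)
+- **Accuracy:** 100% on the held-out test set, which is synthetic and drawn from the same distribution as the training data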
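
Review note: the latency figures above depend on hardware and input length, so it may be worth measuring them locally. A minimal timing harness is sketched below; `measure_latency_ms` is a hypothetical helper, and the lambda is only a stand-in workload so the block runs without the model.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(
    fn: Callable[[str], object], text: str, warmup: int = 3, runs: int = 20
) -> Dict[str, float]:
    """Time repeated calls to fn(text) and report median and max in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn(text)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}


# Stand-in workload; replace the lambda with the `router` pipeline from Quick Start.
stats = measure_latency_ms(lambda s: s.lower().split(), "Read the config.json file")
print(stats)
```

Reporting the median alongside the maximum helps separate typical latency from one-off stalls such as the first call after loading.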
+### Evaluation Results
+*Results on held-out test set (1,124 examples):*
+| Metric | Score |
+|--------|-------|
+| Accuracy | 1.000 |
+| F1 (weighted) | 1.000 |
+*Per-class performance:*
+| Intent | Precision | Recall | F1 | Support |
+|--------|-----------|--------|-----|---------|
+| code_generation | 1.000 | 1.000 | 1.000 | 130 |
+| web_search | 1.000 | 1.000 | 1.000 | 151 |
+| math_calculation | 1.000 | 1.000 | 1.000 | 153 |
+| file_operation | 1.000 | 1.000 | 1.000 | 154 |
+| api_call | 1.000 | 1.000 | 1.000 | 133 |
+| creative_writing | 1.000 | 1.000 | 1.000 | 160 |
+| data_analysis | 1.000 | 1.000 | 1.000 | 168 |
+| general_chat | 1.000 | 1.000 | 1.000 | 75 |
+> **Note:** These results are on synthetic test data from the same distribution as training. Real-world performance will vary. Use the confidence score threshold to handle ambiguous inputs gracefully.
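
Review note: since the test data is synthetic, the 0.5 threshold used in the routing example is best treated as a starting point. One way to pick a threshold is to hand-label a small sample of real traffic and compare coverage against accuracy at several cut-offs. The sketch below assumes such a sample; the four rows are invented for illustration and are not results from this model.

```python
from typing import Dict, List, Tuple

# (predicted_label, score, true_label): hypothetical hand-labeled samples.
Sample = Tuple[str, float, str]


def sweep_threshold(samples: List[Sample], thresholds: List[float]) -> List[Dict[str, float]]:
    """For each threshold, report the share of inputs routed and their accuracy."""
    rows = []
    for t in thresholds:
        accepted = [s for s in samples if s[1] >= t]
        correct = sum(1 for pred, _, true in accepted if pred == true)
        rows.append({
            "threshold": t,
            "coverage": len(accepted) / len(samples),
            "accuracy": correct / len(accepted) if accepted else float("nan"),
        })
    return rows


samples = [
    ("code_generation", 0.98, "code_generation"),
    ("api_call", 0.55, "code_generation"),
    ("web_search", 0.91, "web_search"),
    ("general_chat", 0.40, "data_analysis"),
]
rows = sweep_threshold(samples, [0.5, 0.9])
for row in rows:
    print(row)
```

Inputs below the chosen threshold would go to the LLM fallback, so lower coverage means more LLM calls but fewer misroutes.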
+## Training Details
+- **Base model:** distilbert-base-uncased
+- **Training data:** 8,987 examples (synthetic, template-generated with natural language variation)
+- **Validation:** 1,123 examples
+- **Test:** 1,124 examples
+- **Epochs:** 3 (with early stopping, patience=2)
+- **Learning rate:** 2e-5
+- **Batch size:** 32
+- **Max sequence length:** 128
+- **Training time:** ~100 seconds on NVIDIA RTX 4070
+- **Loss:** 0.0015 (training) / 0.0017 (validation)
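
Review note: the split sizes and batch size above imply the following arithmetic. The step counts assume a single device, no gradient accumulation, and that the last partial batch is kept; none of these are stated in the README, and early stopping could end training before the maximum.

```python
import math

train, val, test = 8987, 1123, 1124
batch_size, epochs = 32, 3

total = train + val + test
fractions = {
    name: round(n / total, 3)
    for name, n in [("train", train), ("val", val), ("test", test)]
}
steps_per_epoch = math.ceil(train / batch_size)  # last partial batch counted
max_steps = steps_per_epoch * epochs             # upper bound if all epochs run

print(total)            # 11234
print(fractions)        # {'train': 0.8, 'val': 0.1, 'test': 0.1}
print(steps_per_epoch)  # 281
print(max_steps)        # 843
```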
+## Limitations
+- Trained on English text only
+- Template-generated training data may not cover all edge cases
+- Ambiguous messages (e.g., "help me with the API code") may get lower confidence scores — use the confidence threshold to fall back to an LLM
+- Not designed for multi-intent messages (e.g., "search for X and write code for Y")
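
Review note: for the ambiguous and multi-intent cases listed above, looking at more than the top label may help. The sketch below flags a message as ambiguous when the top score is low or the gap to the runner-up is small. It assumes the full score list is available, for example from `router(text, top_k=None)` (an assumption to verify against the installed transformers version). `is_ambiguous` and its default cut-offs are hypothetical, and the scores are made up.

```python
from typing import Any, Dict, List


def is_ambiguous(
    scores: List[Dict[str, Any]], min_top: float = 0.5, min_margin: float = 0.2
) -> bool:
    """Return True if the top score is weak or too close to the second-best."""
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    top = ranked[0]["score"]
    runner_up = ranked[1]["score"] if len(ranked) > 1 else 0.0
    return top < min_top or (top - runner_up) < min_margin


clear = [
    {"label": "code_generation", "score": 0.95},
    {"label": "api_call", "score": 0.03},
    {"label": "general_chat", "score": 0.02},
]
mixed = [
    {"label": "code_generation", "score": 0.48},
    {"label": "api_call", "score": 0.45},
    {"label": "general_chat", "score": 0.07},
]
print(is_ambiguous(clear))
print(is_ambiguous(mixed))
```

A message flagged this way could be sent to the LLM fallback instead of a single handler.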
+## License
+Apache 2.0. Commercial use is permitted under the terms of the license.
+## Citation
+If you use this model, a star on the repo is appreciated!

config.json ADDED

	@@ -0,0 +1,48 @@

+{
+  "activation": "gelu",
+  "architectures": [
+    "DistilBertForSequenceClassification"
+  ],
+  "attention_dropout": 0.1,
+  "bos_token_id": null,
+  "dim": 768,
+  "dropout": 0.1,
+  "dtype": "float32",
+  "eos_token_id": null,
+  "hidden_dim": 3072,
+  "id2label": {
+    "0": "code_generation",
+    "1": "web_search",
+    "2": "math_calculation",
+    "3": "file_operation",
+    "4": "api_call",
+    "5": "creative_writing",
+    "6": "data_analysis",
+    "7": "general_chat"
+  },
+  "initializer_range": 0.02,
+  "label2id": {
+    "api_call": 4,
+    "code_generation": 0,
+    "creative_writing": 5,
+    "data_analysis": 6,
+    "file_operation": 3,
+    "general_chat": 7,
+    "math_calculation": 2,
+    "web_search": 1
+  },
+  "max_position_embeddings": 512,
+  "model_type": "distilbert",
+  "n_heads": 12,
+  "n_layers": 6,
+  "pad_token_id": 0,
+  "problem_type": "single_label_classification",
+  "qa_dropout": 0.1,
+  "seq_classif_dropout": 0.2,
+  "sinusoidal_pos_embds": false,
+  "tie_weights_": true,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.5.3",
+  "use_cache": false,
+  "vocab_size": 30522
+}
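
Review note: `id2label` in this file is what turns a raw logit index into an intent name. Because JSON object keys are strings, the keys need converting to integers before indexing by argmax. The sketch below decodes a logits vector by hand under that assumption; the logits are invented for illustration and `decode` is a hypothetical helper.

```python
import json
import math
from typing import Dict, List

config_snippet = """
{"id2label": {"0": "code_generation", "1": "web_search", "2": "math_calculation",
              "3": "file_operation", "4": "api_call", "5": "creative_writing",
              "6": "data_analysis", "7": "general_chat"}}
"""
# JSON keys are strings, so convert them to ints for index lookup.
id2label: Dict[int, str] = {
    int(k): v for k, v in json.loads(config_snippet)["id2label"].items()
}


def decode(logits: List[float]) -> Dict[str, object]:
    """Softmax the logits and return the best label with its probability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


result = decode([0.1, 0.2, 4.0, 0.0, 0.3, -1.0, 0.5, 0.2])  # hypothetical logits
print(result["label"])
```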

label_mapping.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "labels": [
+    "code_generation",
+    "web_search",
+    "math_calculation",
+    "file_operation",
+    "api_call",
+    "creative_writing",
+    "data_analysis",
+    "general_chat"
+  ],
+  "label2id": {
+    "code_generation": 0,
+    "web_search": 1,
+    "math_calculation": 2,
+    "file_operation": 3,
+    "api_call": 4,
+    "creative_writing": 5,
+    "data_analysis": 6,
+    "general_chat": 7
+  },
+  "id2label": {
+    "0": "code_generation",
+    "1": "web_search",
+    "2": "math_calculation",
+    "3": "file_operation",
+    "4": "api_call",
+    "5": "creative_writing",
+    "6": "data_analysis",
+    "7": "general_chat"
+  }
+}
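
Review note: this file repeats the mapping already stored in `config.json`, so the two can drift apart in later commits. A small consistency check is sketched below, with the mapping copied inline so the block runs on its own; `check_label_mapping` is a hypothetical helper.

```python
from typing import Any, Dict, List

LABELS = [
    "code_generation", "web_search", "math_calculation", "file_operation",
    "api_call", "creative_writing", "data_analysis", "general_chat",
]
mapping: Dict[str, Any] = {
    "labels": LABELS,
    "label2id": {
        "code_generation": 0, "web_search": 1, "math_calculation": 2,
        "file_operation": 3, "api_call": 4, "creative_writing": 5,
        "data_analysis": 6, "general_chat": 7,
    },
    "id2label": {
        "0": "code_generation", "1": "web_search", "2": "math_calculation",
        "3": "file_operation", "4": "api_call", "5": "creative_writing",
        "6": "data_analysis", "7": "general_chat",
    },
}


def check_label_mapping(m: Dict[str, Any]) -> List[str]:
    """Return a list of inconsistencies between labels, label2id and id2label."""
    problems = []
    id2label = {int(k): v for k, v in m["id2label"].items()}
    for idx, name in enumerate(m["labels"]):
        if m["label2id"].get(name) != idx:
            problems.append(f"label2id[{name!r}] is not {idx}")
        if id2label.get(idx) != name:
            problems.append(f"id2label[{idx}] is not {name!r}")
    if not (len(m["labels"]) == len(m["label2id"]) == len(id2label)):
        problems.append("mapping sizes differ")
    return problems


problems = check_label_mapping(mapping)
print(problems)
```

The same function could be run against the `id2label` and `label2id` blocks loaded from `config.json` to confirm both files agree.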

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2860c1008d96d52d06405b41a62cd0327f4a40828f06b2987c0caec1ec161292
+size 267851024
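
Review note: the three lines above are a Git LFS pointer, not the weights themselves. The sketch below parses the pointer and converts the size into more readable units. The parameter estimate assumes every tensor is float32 (4 bytes) and ignores the safetensors header, so it is approximate. `parse_lfs_pointer` is a hypothetical helper.

```python
from typing import Any, Dict

pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:2860c1008d96d52d06405b41a62cd0327f4a40828f06b2987c0caec1ec161292
size 267851024
"""


def parse_lfs_pointer(text: str) -> Dict[str, Any]:
    """Split each 'key value' line of an LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(pointer_text)
size_mb = pointer["size"] / 1e6           # decimal megabytes
size_mib = pointer["size"] / 2**20        # binary mebibytes
approx_params_m = pointer["size"] / 4 / 1e6  # float32 assumption

print(round(size_mb, 1), round(size_mib, 1), round(approx_params_m, 1))
```

After downloading the real file, its SHA-256 digest (for example via `hashlib.sha256`) can be compared with `pointer["digest"]` to confirm integrity.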

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "backend": "tokenizers",
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "is_local": false,
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}