MiniLM-L12-Grape-Route: Semantic Router for SysAdmin Voice Commands

Model Description

Grape-Route is a specialized text classification model designed to act as a semantic router for a system administration voice assistant. It is fine-tuned from microsoft/Multilingual-MiniLM-L12-H384.

The model's primary goal is to interpret natural language commands (Spanish input with technical English terms) and route them to the appropriate technical subsystem. It has been specifically trained to be robust against phonetic errors and typos typical of Speech-to-Text (STT) engines like Vosk (e.g., interpreting "docar" as "docker", "pin" as "ping", or "ese ese ache" as "ssh").

Intended Use

This model is intended to be the first layer of an intent recognition pipeline. It takes a raw string (transcribed from voice) and returns a categorical label with a confidence score.

Supported Categories (Labels)

The model classifies inputs into 8 distinct intents, code-named after wines:

| Label | Domain | Description |
|-------|--------|-------------|
| `malbec` | Docker Management | Containers, images, volumes, logs (e.g., "run nginx", "stop db"). |
| `syrah` | Networking | Connectivity, ping, ports, IP, DNS (e.g., "check internet", "my ip"). |
| `tempranillo` | SysAdmin | System processes, users, services, resources (e.g., "kill process", "create user", "check ram"). |
| `pinot` | Search | File search, grep, find, locating files (e.g., "find logs", "where is python"). |
| `chardonnay` | File Management | Local file manipulation (e.g., "create folder", "delete file", "list directory"). |
| `cabernet` | Remote Access | SSH connections, SCP transfers, tunneling (e.g., "connect to server", "send file to vps"). |
| `gemma` | General / Chat | General knowledge questions, trivia, chit-chat (e.g., "tell me a joke", "capital of France"). |
| `null` | Out of Domain | Irrelevant queries, personal questions, or noise (e.g., "order pizza", "call mom"). |
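Downstream, each label typically maps to a subsystem handler. A minimal dispatch sketch is below; the handler names are illustrative assumptions, not part of the model:

```python
# Hypothetical dispatch table mapping router labels to subsystem handlers.
# The handler names here are placeholders for your own implementation.
HANDLERS = {
    "malbec": "docker_subsystem",
    "syrah": "network_subsystem",
    "tempranillo": "sysadmin_subsystem",
    "pinot": "search_subsystem",
    "chardonnay": "file_subsystem",
    "cabernet": "remote_subsystem",
    "gemma": "chat_subsystem",
    "null": None,  # out of domain: safely ignore
}

def route(label: str):
    """Return the handler for a predicted label, or None to ignore it."""
    return HANDLERS.get(label)
```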

Training Data

The model was trained on a custom dataset of approximately 1,500 samples, consisting of:

  1. Real CLI Commands: Translated from natural language (based on datasets like nl2bash).
  2. Synthetic Variations: Grammatical variations of common sysadmin requests.
  3. Adversarial STT Noise: The dataset was heavily augmented with phonetic corruptions to simulate Vosk errors in Spanish (e.g., "doquer", "pines", "rut", "suda").
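The augmentation step can be sketched as a phonetic-substitution pass. The substitution table below is an assumption built from the examples in this card, not the actual training data:

```python
import random

# Illustrative phonetic corruptions of technical terms as produced by
# Vosk STT in Spanish (assumed examples, based on the card above).
PHONETIC_MAP = {
    "docker": ["docar", "doquer"],
    "ping": ["pin", "pines"],
    "ssh": ["ese ese ache"],
    "root": ["rut"],
    "sudo": ["suda"],
}

def corrupt(sentence: str, rng: random.Random) -> str:
    """Replace known technical terms with a random phonetic corruption."""
    out = []
    for word in sentence.split():
        variants = PHONETIC_MAP.get(word.lower())
        out.append(rng.choice(variants) if variants else word)
    return " ".join(out)
```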

Performance and Reliability

Based on inference tests, the model exhibits the following behavior:

  • High Confidence (>85%): Technical commands for Docker (malbec), Files (chardonnay), and SysAdmin (tempranillo) are detected with high precision, even with misspellings.
  • Medium Confidence (~55-65%): Remote file transfers (SCP) expressed as long sentences may overlap with local file management. Implementing confirmation logic is recommended when confidence falls below 75% for destructive or remote actions.
  • OOD Rejection: Non-technical inputs are reliably classified as gemma (General) or null (Noise), typically with low confidence scores, allowing the system to safely ignore them.
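The recommended confirmation gate can be expressed in a few lines. The threshold and the set of sensitive labels below follow the reliability notes above; treat both as tunable assumptions:

```python
CONFIRM_THRESHOLD = 0.75          # per the reliability notes above
SENSITIVE_LABELS = {"chardonnay", "cabernet"}  # destructive or remote actions

def needs_confirmation(label: str, score: float) -> bool:
    """True when a low-confidence destructive/remote prediction
    should be confirmed with the user before executing."""
    return label in SENSITIVE_LABELS and score < CONFIRM_THRESHOLD
```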

How to Get Started

You can use this model with the Hugging Face pipeline API:

```python
from transformers import pipeline

# Load the model
router = pipeline("text-classification", model="jrodriiguezg/minilm-l12-grape-route")

# Inference examples
commands = [
    "levanta un contenedor de nginx",       # Standard Docker command
    "haz un pin a google",                  # Network command with STT noise ("pin" instead of "ping")
    "borra el archivo de configuracion",    # File management
    "cuentame un chiste"                    # General chat
]

for cmd in commands:
    result = router(cmd)
    print(f"Command: {cmd} -> {result}")
```

Limitations

Language: The model is optimized for Spanish inputs containing English technical jargon. It may not perform well on pure English sentences or other languages.

Context Window: As a BERT-based model, it analyzes single sentences. It does not retain conversational context (history).

SCP Ambiguity: Complex sentences requesting file transfers to remote servers ("move this file to the server") may occasionally be misclassified as local file management (chardonnay) instead of remote (cabernet).
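One possible mitigation (not part of the model) is a keyword heuristic that reroutes low-confidence local-file predictions when the text mentions a remote host. The hint list and threshold below are illustrative assumptions:

```python
# Assumed keywords suggesting a remote target (Spanish and English).
REMOTE_HINTS = {"server", "servidor", "vps", "scp", "ssh", "remoto"}

def resolve_scp_ambiguity(text: str, label: str, score: float) -> str:
    """Reroute a low-confidence 'chardonnay' (local file) prediction to
    'cabernet' (remote access) when the text mentions a remote host."""
    if label == "chardonnay" and score < 0.75:
        if any(word in REMOTE_HINTS for word in text.lower().split()):
            return "cabernet"
    return label
```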

Model size: 0.1B parameters (F32, Safetensors).