MiniLM-L12-Grape-Route: Semantic Router for SysAdmin Voice Commands

Model Description

Grape-Route is a specialized text classification model designed to act as a semantic router for a system administration voice assistant. It is fine-tuned from microsoft/Multilingual-MiniLM-L12-H384.

The model's primary goal is to interpret natural language commands (Spanish input with technical English terms) and route them to the appropriate technical subsystem. It has been specifically trained to be robust against phonetic errors and typos typical of Speech-to-Text (STT) engines like Vosk (e.g., interpreting "docar" as "docker", "pin" as "ping", or "ese ese ache" as "ssh").

Intended Use

This model is intended to be the first layer of an intent recognition pipeline. It takes a raw string (transcribed from voice) and returns a categorical label with a confidence score.

Supported Categories (Labels)

The model classifies inputs into 8 distinct intents, code-named after wines:

| Label | Domain | Description |
|-------|--------|-------------|
| `malbec` | Docker Management | Containers, images, volumes, logs (e.g., "run nginx", "stop db"). |
| `syrah` | Networking | Connectivity, ping, ports, IP, DNS (e.g., "check internet", "my ip"). |
| `tempranillo` | SysAdmin | System processes, users, services, resources (e.g., "kill process", "create user", "check ram"). |
| `pinot` | Search | File search, grep, find, locating files (e.g., "find logs", "where is python"). |
| `chardonnay` | File Management | Local file manipulation (e.g., "create folder", "delete file", "list directory"). |
| `cabernet` | Remote Access | SSH connections, SCP transfers, tunneling (e.g., "connect to server", "send file to vps"). |
| `gemma` | General / Chat | General knowledge questions, trivia, chit-chat (e.g., "tell me a joke", "capital of France"). |
| `null` | Out of Domain | Irrelevant queries, personal questions, or noise (e.g., "order pizza", "call mom"). |
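Downstream, each label typically maps to a subsystem handler. A minimal dispatch sketch is below; the handler names are illustrative assumptions, not part of the model:

```python
# Hypothetical dispatch table mapping router labels to subsystem handlers.
# The handler names here are placeholders for your own implementation.
HANDLERS = {
    "malbec": "docker_subsystem",
    "syrah": "network_subsystem",
    "tempranillo": "sysadmin_subsystem",
    "pinot": "search_subsystem",
    "chardonnay": "file_subsystem",
    "cabernet": "remote_subsystem",
    "gemma": "chat_subsystem",
    "null": None,  # out of domain: safely ignore
}

def route(label: str):
    """Return the handler for a predicted label, or None to ignore it."""
    return HANDLERS.get(label)
```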

Training Data

The model was trained on a custom dataset of approximately 1,500 samples, consisting of:

  1. Real CLI Commands: Translated from natural language (based on datasets like nl2bash).
  2. Synthetic Variations: Grammatical variations of common sysadmin requests.
  3. Adversarial STT Noise: The dataset was heavily augmented with phonetic corruptions to simulate Vosk errors in Spanish (e.g., "doquer", "pines", "rut", "suda").
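The augmentation step can be sketched as a phonetic-substitution pass. The substitution table below is an assumption built from the examples in this card, not the actual training data:

```python
import random

# Illustrative phonetic corruptions of technical terms as produced by
# Vosk STT in Spanish (assumed examples, based on the card above).
PHONETIC_MAP = {
    "docker": ["docar", "doquer"],
    "ping": ["pin", "pines"],
    "ssh": ["ese ese ache"],
    "root": ["rut"],
    "sudo": ["suda"],
}

def corrupt(sentence: str, rng: random.Random) -> str:
    """Replace known technical terms with a random phonetic corruption."""
    out = []
    for word in sentence.split():
        variants = PHONETIC_MAP.get(word.lower())
        out.append(rng.choice(variants) if variants else word)
    return " ".join(out)
```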

Performance and Reliability

Based on inference tests, the model exhibits the following behavior:

  • High Confidence (>85%): Technical commands for Docker (malbec), Files (chardonnay), and SysAdmin (tempranillo) are detected with high precision, even with misspellings.
  • Medium Confidence (~55-65%): Remote file transfers (SCP) expressed as long sentences may overlap with local file management. Implementing confirmation logic is recommended when confidence falls below 75% for destructive or remote actions.
  • OOD Rejection: Non-technical inputs are reliably classified as gemma (General) or null (Noise), typically with low confidence scores, allowing the system to safely ignore them.
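The recommended confirmation gate can be expressed in a few lines. The threshold and the set of sensitive labels below follow the reliability notes above; treat both as tunable assumptions:

```python
CONFIRM_THRESHOLD = 0.75          # per the reliability notes above
SENSITIVE_LABELS = {"chardonnay", "cabernet"}  # destructive or remote actions

def needs_confirmation(label: str, score: float) -> bool:
    """True when a low-confidence destructive/remote prediction
    should be confirmed with the user before executing."""
    return label in SENSITIVE_LABELS and score < CONFIRM_THRESHOLD
```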

How to Get Started

You can use this model with the Hugging Face pipeline API:

```python
from transformers import pipeline

# Load the model
router = pipeline("text-classification", model="jrodriiguezg/minilm-l12-grape-route")

# Inference examples
commands = [
    "levanta un contenedor de nginx",       # Standard Docker command
    "haz un pin a google",                  # Network command with STT noise ("pin" instead of "ping")
    "borra el archivo de configuracion",    # File management
    "cuentame un chiste"                    # General chat
]

for cmd in commands:
    result = router(cmd)
    print(f"Command: {cmd} -> {result}")
```

Limitations

Language: The model is optimized for Spanish inputs containing English technical jargon. It may not perform well on pure English sentences or other languages.

Context Window: As a BERT-based model, it analyzes single sentences. It does not retain conversational context (history).

SCP Ambiguity: Complex sentences requesting file transfers to remote servers ("move this file to the server") may occasionally be misclassified as local file management (chardonnay) instead of remote (cabernet).
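One possible mitigation (not part of the model) is a keyword heuristic that reroutes low-confidence local-file predictions when the text mentions a remote host. The hint list and threshold below are illustrative assumptions:

```python
# Assumed keywords suggesting a remote target (Spanish and English).
REMOTE_HINTS = {"server", "servidor", "vps", "scp", "ssh", "remoto"}

def resolve_scp_ambiguity(text: str, label: str, score: float) -> str:
    """Reroute a low-confidence 'chardonnay' (local file) prediction to
    'cabernet' (remote access) when the text mentions a remote host."""
    if label == "chardonnay" and score < 0.75:
        if any(word in REMOTE_HINTS for word in text.lower().split()):
            return "cabernet"
    return label
```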

Model size: 0.1B parameters (F32, Safetensors).