Distil-Qwen3-0.6B-Voice-Assistant-Banking

A fine-tuned Qwen3-0.6B model for multi-turn intent classification and slot extraction in a banking voice assistant. Trained using knowledge distillation from a 120B teacher model, this 0.6B model delivers 90.9% tool call accuracy — exceeding the teacher — while running at ~40ms inference, enabling real-time voice pipelines under 400ms total latency.

For the GGUF version (for llama.cpp deployment), see distil-labs/distil-qwen3-0.6b-voice-assistant-banking-gguf.

Results

| Model | Parameters | Tool Call Accuracy | ROUGE |
|---|---|---|---|
| GPT-oss-120B (teacher) | 120B | 87.5% | 94.4% |
| This model (tuned) | 0.6B | 90.9% | 97.8% |
| Qwen3-0.6B (base) | 0.6B | 48.7% | 66.3% |

The fine-tuned model exceeds the 120B teacher on tool call accuracy while being 200x smaller. The base Qwen3-0.6B achieves only 48.7% — fine-tuning is essential for reliable multi-turn conversations.

Quick Start

Using Transformers

import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("distil-labs/distil-qwen3-0.6b-voice-assistant-banking")
tokenizer = AutoTokenizer.from_pretrained("distil-labs/distil-qwen3-0.6b-voice-assistant-banking")

TOOLS = [
    {"type": "function", "function": {"name": "check_balance", "description": "Check the balance of a bank account", "parameters": {"type": "object", "properties": {"account_type": {"type": "string", "enum": ["checking", "savings", "credit"]}}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "transfer_money", "description": "Transfer money between accounts", "parameters": {"type": "object", "properties": {"amount": {"type": "number"}, "from_account": {"type": "string", "enum": ["checking", "savings"]}, "to_account": {"type": "string", "enum": ["checking", "savings"]}}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "cancel_card", "description": "Cancel a bank card", "parameters": {"type": "object", "properties": {"card_type": {"type": "string", "enum": ["credit", "debit"]}, "card_last_four": {"type": "string"}, "reason": {"type": "string", "enum": ["lost", "stolen", "damaged", "other"]}}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "intent_unclear", "description": "Use when the user's intent cannot be determined", "parameters": {"type": "object", "properties": {}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "greeting", "description": "User is greeting", "parameters": {"type": "object", "properties": {}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "goodbye", "description": "User is ending the conversation", "parameters": {"type": "object", "properties": {}, "required": [], "additionalProperties": False}}},
]

messages = [
    {"role": "system", "content": "You are a tool-calling model working on:\n<task_description>You are a voice assistant for BankCo, a retail bank. The user input is automatically transcribed speech from an ASR system, so it may contain transcription errors, homophones, filler words, or unusual phrasings. Parse the user's request and return the appropriate function call despite any transcription artifacts. If you can identify the intent, call the matching function. Extract any mentioned argument values; omit arguments not mentioned. If you cannot understand what the user wants, call intent_unclear(). Use conversation history to understand context from previous turns.</task_description>\n\nRespond to the conversation history by generating an appropriate tool call that satisfies the user request. Generate only the tool call according to the provided tool schema, do not generate anything else. Always respond with a tool call.\n\n"},
    {"role": "user", "content": "I need to cancel my credit card ending in 1234"},
]

text = tokenizer.apply_chat_template(
    messages, tools=TOOLS, tokenize=False, add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy decoding; temperature is ignored unless do_sample=True
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
# <tool_call>
# {"name": "cancel_card", "arguments": {"card_type": "credit", "card_last_four": "1234"}}
# </tool_call>
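The model wraps its output in `<tool_call>` tags. A small helper for extracting the JSON payload (our own utility, not part of Transformers; the function name is illustrative) might look like:

```python
import json
import re

def parse_tool_call(text: str) -> dict:
    """Extract the JSON payload from a <tool_call>...</tool_call> block.

    Raises ValueError when no well-formed tool call is found, so the
    caller can fall back to an intent_unclear() path.
    """
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if match is None:
        raise ValueError("no tool call found in model output")
    return json.loads(match.group(1))
```

Parsing the decoded text rather than trusting it verbatim also guards against occasional malformed generations.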

Using with the Demo App

This model powers the BankCo Voice Assistant demo — a full ASR -> SLM -> TTS voice pipeline that runs locally.

Using llama.cpp

Download the GGUF version and serve it with llama.cpp:

huggingface-cli download distil-labs/distil-qwen3-0.6b-voice-assistant-banking-gguf \
    --local-dir distil-model

llama-server \
    --model distil-model/Qwen3-voice-assistant-slm-0.6B.gguf \
    --port 8000 \
    --jinja

Then query via the OpenAI-compatible API at http://127.0.0.1:8000/v1.
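A minimal stdlib-only sketch of a client request against that endpoint (the `model` field is arbitrary for a single-model llama-server, and the base URL should match your deployment):

```python
import json
import urllib.request

def build_chat_request(messages, tools, base_url="http://127.0.0.1:8000/v1"):
    """Build an OpenAI-compatible /chat/completions request for llama-server."""
    payload = {
        "model": "distil-qwen3-0.6b-voice-assistant-banking",
        "messages": messages,
        "tools": tools,
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To send (requires a running llama-server):
# with urllib.request.urlopen(build_chat_request(messages, TOOLS)) as resp:
#     print(json.load(resp))
```

Any OpenAI-compatible client (e.g. the `openai` Python SDK pointed at this base URL) works the same way.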

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-0.6B |
| Parameters | 0.6 billion |
| Architecture | Qwen3ForCausalLM |
| Context Length | 40,960 tokens |
| Precision | bfloat16 |
| Training Data | 77 seed conversations, synthetically expanded |
| Teacher Model | GPT-oss-120B |
| Task | Multi-turn tool calling (closed book) |

Training

This model was trained using the Distil Labs platform:

  1. Seed Data: 77 hand-written multi-turn conversations covering 14 banking functions, including ASR transcription artifacts (filler words, homophones, word splits)
  2. Synthetic Expansion: Expanded to thousands of examples using a 120B teacher model
  3. Fine-tuning: Multi-turn tool calling distillation on Qwen3-0.6B

What the Model Does

The model acts as a function caller for a banking voice assistant. Given a user utterance (potentially with ASR errors) and conversation history, it outputs a structured tool call:

User: "Trans fur 500 from my savin to checkin"
Model: {"name": "transfer_money", "arguments": {"amount": 500, "from_account": "savings", "to_account": "checking"}}

User: "I wanna cancel my card"
Model: {"name": "cancel_card", "arguments": {}}

User: "What about that thing from last week"
Model: {"name": "intent_unclear", "arguments": {}}
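In a multi-turn flow, the application re-sends the full conversation history so the model can resolve references to earlier turns. If the model emits only the changed slot on a follow-up turn (e.g. "actually make it 600"), the application can carry earlier slot values forward; a hypothetical application-side merge (not part of the model) could be:

```python
def merge_slots(previous_args: dict, new_args: dict) -> dict:
    """Carry slot values across turns.

    Values from the latest call override earlier ones; slots the user
    did not re-mention persist from previous turns.
    """
    merged = dict(previous_args)
    merged.update(new_args)
    return merged

# Turn 1: "transfer 500 from savings to checking"
turn1 = {"amount": 500, "from_account": "savings", "to_account": "checking"}
# Turn 2: "actually make it 600" -- only the changed slot arrives
turn2 = {"amount": 600}
```

Whether a follow-up call repeats all slots or only the changed one depends on the conversation, so treat this merge as one possible policy.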

Supported Functions

The model handles 14 banking operations:

| Function | Description |
|---|---|
| check_balance | Check account balance |
| get_statement | Request account statement |
| transfer_money | Transfer between accounts |
| pay_bill | Pay a bill |
| cancel_card | Cancel a card |
| replace_card | Request replacement card |
| activate_card | Activate a new card |
| report_fraud | Report fraudulent transaction |
| reset_pin | Reset card PIN |
| speak_to_human | Connect to a human agent |
| greeting | Conversation start |
| goodbye | Conversation end |
| thank_you | Express gratitude |
| intent_unclear | Cannot determine intent |
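Once a tool call is parsed, the application routes it to backend logic. A simple dispatch sketch (handler names and return values here are illustrative, not part of the model):

```python
def dispatch(call: dict, handlers: dict):
    """Route a parsed tool call to its handler.

    Unknown function names fall back to the intent_unclear handler,
    which should ask the user to rephrase. The fallback is called with
    no arguments, since unknown calls carry untrusted argument names.
    """
    name = call.get("name")
    if name not in handlers:
        return handlers["intent_unclear"]()
    return handlers[name](**call.get("arguments", {}))

# Illustrative handlers matching the model's function taxonomy:
HANDLERS = {
    "check_balance": lambda account_type="checking": f"Your {account_type} balance is ...",
    "intent_unclear": lambda: "Sorry, could you rephrase that?",
}
```

In a voice pipeline, the handler's string response would be passed on to the TTS stage.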

Use Cases

  • Real-time voice banking assistants (ASR -> SLM -> TTS pipeline)
  • Text-based banking chatbots with structured intent routing
  • Edge deployment for on-device voice processing
  • Any multi-turn tool calling task with bounded intent taxonomy

Limitations

  • Trained on English banking intents only
  • Covers 14 specific banking functions — not a general-purpose tool caller
  • ASR artifact handling is tuned for common speech-to-text errors, not all possible transcription formats
  • At 90.9% tool call accuracy, roughly 1 in 10 calls will still be incorrect — confirm destructive actions (transfers, card cancellations) with the user before executing
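Given that error rate, it is worth checking a predicted call's arguments against the function's JSON-schema `parameters` block before executing it. A minimal stdlib-only sketch (enum and required-key checks only, not full JSON Schema validation):

```python
def validate_arguments(schema: dict, args: dict) -> list:
    """Return a list of problems with a tool call's arguments,
    checked against the function's 'parameters' schema block."""
    problems = []
    props = schema.get("properties", {})
    for key, value in args.items():
        if key not in props:
            problems.append(f"unexpected argument: {key}")
            continue
        enum = props[key].get("enum")
        if enum is not None and value not in enum:
            problems.append(f"{key}={value!r} not in {enum}")
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    return problems

# The cancel_card parameters schema from the Quick Start example:
CANCEL_CARD_PARAMS = {
    "type": "object",
    "properties": {
        "card_type": {"type": "string", "enum": ["credit", "debit"]},
        "card_last_four": {"type": "string"},
        "reason": {"type": "string", "enum": ["lost", "stolen", "damaged", "other"]},
    },
    "required": [],
}
```

Calls that fail validation can be rerouted to the same clarification path as intent_unclear().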

License

This model is released under the Apache 2.0 license.

Citation

@misc{distil-qwen3-0.6b-voice-assistant-banking,
  author = {Distil Labs},
  title = {Distil-Qwen3-0.6B-Voice-Assistant-Banking: A Fine-tuned SLM for Banking Voice Assistants},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/distil-labs/distil-qwen3-0.6b-voice-assistant-banking}
}