# Distil-Qwen3-0.6B-Voice-Assistant-Banking
A fine-tuned Qwen3-0.6B model for multi-turn intent classification and slot extraction in a banking voice assistant. Trained via knowledge distillation from a 120B teacher model, this 0.6B model delivers 90.9% tool-call accuracy (exceeding its teacher) while running at ~40 ms per inference, enabling real-time voice pipelines with under 400 ms total latency.
For the GGUF version (for llama.cpp deployment), see distil-labs/distil-qwen3-0.6b-voice-assistant-banking-gguf.
## Results
| Model | Parameters | Tool Call Accuracy | ROUGE |
|---|---|---|---|
| GPT-oss-120B (teacher) | 120B | 87.5% | 94.4% |
| This model (tuned) | 0.6B | 90.9% | 97.8% |
| Qwen3-0.6B (base) | 0.6B | 48.7% | 66.3% |
The fine-tuned model exceeds the 120B teacher on tool-call accuracy while being 200x smaller. The base Qwen3-0.6B reaches only 48.7%, so fine-tuning is essential for reliable multi-turn tool calling.
## Quick Start

### Using Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("distil-labs/distil-qwen3-0.6b-voice-assistant-banking")
tokenizer = AutoTokenizer.from_pretrained("distil-labs/distil-qwen3-0.6b-voice-assistant-banking")

# Tool schemas passed to the chat template (subset shown; the model supports 14 functions).
TOOLS = [
    {"type": "function", "function": {"name": "check_balance", "description": "Check the balance of a bank account", "parameters": {"type": "object", "properties": {"account_type": {"type": "string", "enum": ["checking", "savings", "credit"]}}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "transfer_money", "description": "Transfer money between accounts", "parameters": {"type": "object", "properties": {"amount": {"type": "number"}, "from_account": {"type": "string", "enum": ["checking", "savings"]}, "to_account": {"type": "string", "enum": ["checking", "savings"]}}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "cancel_card", "description": "Cancel a bank card", "parameters": {"type": "object", "properties": {"card_type": {"type": "string", "enum": ["credit", "debit"]}, "card_last_four": {"type": "string"}, "reason": {"type": "string", "enum": ["lost", "stolen", "damaged", "other"]}}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "intent_unclear", "description": "Use when the user's intent cannot be determined", "parameters": {"type": "object", "properties": {}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "greeting", "description": "User is greeting", "parameters": {"type": "object", "properties": {}, "required": [], "additionalProperties": False}}},
    {"type": "function", "function": {"name": "goodbye", "description": "User is ending the conversation", "parameters": {"type": "object", "properties": {}, "required": [], "additionalProperties": False}}},
]

messages = [
    {"role": "system", "content": "You are a tool-calling model working on:\n<task_description>You are a voice assistant for BankCo, a retail bank. The user input is automatically transcribed speech from an ASR system, so it may contain transcription errors, homophones, filler words, or unusual phrasings. Parse the user's request and return the appropriate function call despite any transcription artifacts. If you can identify the intent, call the matching function. Extract any mentioned argument values; omit arguments not mentioned. If you cannot understand what the user wants, call intent_unclear(). Use conversation history to understand context from previous turns.</task_description>\n\nRespond to the conversation history by generating an appropriate tool call that satisfies the user request. Generate only the tool call according to the provided tool schema, do not generate anything else. Always respond with a tool call.\n\n"},
    {"role": "user", "content": "I need to cancel my credit card ending in 1234"},
]

text = tokenizer.apply_chat_template(
    messages, tools=TOOLS, tokenize=False, add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(text, return_tensors="pt")
# Greedy decoding (do_sample=False); transformers rejects temperature=0 with sampling enabled.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
# <tool_call>
# {"name": "cancel_card", "arguments": {"card_type": "credit", "card_last_four": "1234"}}
# </tool_call>
```
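The generated text wraps the call in `<tool_call>` tags. A small helper (a sketch, not part of the model's API) can extract the JSON for downstream routing, falling back to the closed-set `intent_unclear` intent when the output is malformed:

```python
import json
import re

def parse_tool_call(text: str) -> dict:
    """Extract the JSON payload from a <tool_call>...</tool_call> block."""
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL)
    if match is None:
        # Malformed output: fall back to the in-taxonomy "unclear" intent.
        return {"name": "intent_unclear", "arguments": {}}
    return json.loads(match.group(1))

raw = ('<tool_call>\n'
       '{"name": "cancel_card", "arguments": {"card_type": "credit", "card_last_four": "1234"}}\n'
       '</tool_call>')
call = parse_tool_call(raw)
print(call["name"], call["arguments"])
```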
### Using with the Demo App
This model powers the BankCo Voice Assistant demo — a full ASR -> SLM -> TTS voice pipeline that runs locally.
### Using llama.cpp

Download the GGUF version and serve it with llama.cpp:

```bash
huggingface-cli download distil-labs/distil-qwen3-0.6b-voice-assistant-banking-gguf \
  --local-dir distil-model
llama-server \
  --model distil-model/Qwen3-voice-assistant-slm-0.6B.gguf \
  --port 8000 \
  --jinja
```
Then query via the OpenAI-compatible API at http://127.0.0.1:8000/v1.
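The request body follows the standard chat-completions shape. A minimal sketch of building it (the `model` field value is a placeholder; llama-server accepts whatever name it reports at `/v1/models`):

```python
import json

def build_chat_request(user_text: str, tools: list) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload for llama-server."""
    return {
        "model": "distil-qwen3-0.6b-voice-assistant-banking",  # placeholder model name
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
        "temperature": 0,
    }

payload = build_chat_request("check my savings balance", tools=[])
# POST json.dumps(payload) to http://127.0.0.1:8000/v1/chat/completions
print(json.dumps(payload, indent=2))
```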
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-0.6B |
| Parameters | 0.6 billion |
| Architecture | Qwen3ForCausalLM |
| Context Length | 40,960 tokens |
| Precision | bfloat16 |
| Training Data | 77 seed conversations, synthetically expanded |
| Teacher Model | GPT-oss-120B |
| Task | Multi-turn tool calling (closed book) |
## Training
This model was trained using the Distil Labs platform:
- Seed Data: 77 hand-written multi-turn conversations covering 14 banking functions, including ASR transcription artifacts (filler words, homophones, word splits)
- Synthetic Expansion: Expanded to thousands of examples using a 120B teacher model
- Fine-tuning: Multi-turn tool calling distillation on Qwen3-0.6B
## What the Model Does
The model acts as a function caller for a banking voice assistant. Given a user utterance (potentially with ASR errors) and conversation history, it outputs a structured tool call:
```
User:  "Trans fur 500 from my savin to checkin"
Model: {"name": "transfer_money", "arguments": {"amount": 500, "from_account": "savings", "to_account": "checking"}}

User:  "I wanna cancel my card"
Model: {"name": "cancel_card", "arguments": {}}

User:  "What about that thing from last week"
Model: {"name": "intent_unclear", "arguments": {}}
```
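Downstream, the parsed call can be routed through a simple dispatch table. The handler bodies below are hypothetical stubs, shown only to illustrate the pattern of mapping closed-set function names to application code:

```python
def check_balance(account_type: str = "checking") -> str:
    return f"Balance lookup for {account_type}"  # hypothetical backend stub

def transfer_money(amount: float = 0, from_account: str = "", to_account: str = "") -> str:
    return f"Transfer {amount} from {from_account} to {to_account}"

def intent_unclear() -> str:
    return "Sorry, could you rephrase that?"

HANDLERS = {
    "check_balance": check_balance,
    "transfer_money": transfer_money,
    "intent_unclear": intent_unclear,
}

def dispatch(call: dict) -> str:
    # Unknown function names fall back to intent_unclear, keeping the pipeline closed-set.
    handler = HANDLERS.get(call["name"], intent_unclear)
    try:
        return handler(**call.get("arguments", {}))
    except TypeError:  # unexpected argument keys from a bad generation
        return intent_unclear()

print(dispatch({"name": "transfer_money",
                "arguments": {"amount": 500, "from_account": "savings", "to_account": "checking"}}))
# Transfer 500 from savings to checking
```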
## Supported Functions
The model handles 14 banking operations:
| Function | Description |
|---|---|
| `check_balance` | Check account balance |
| `get_statement` | Request account statement |
| `transfer_money` | Transfer between accounts |
| `pay_bill` | Pay a bill |
| `cancel_card` | Cancel a card |
| `replace_card` | Request replacement card |
| `activate_card` | Activate a new card |
| `report_fraud` | Report fraudulent transaction |
| `reset_pin` | Reset card PIN |
| `speak_to_human` | Connect to a human agent |
| `greeting` | Conversation start |
| `goodbye` | Conversation end |
| `thank_you` | Express gratitude |
| `intent_unclear` | Cannot determine intent |
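Because the taxonomy is closed, a caller can guard against out-of-set names before executing anything. A minimal sketch (not shipped with the model) that coerces unknown names to `intent_unclear`:

```python
# The 14 functions the model was trained on.
SUPPORTED_FUNCTIONS = {
    "check_balance", "get_statement", "transfer_money", "pay_bill",
    "cancel_card", "replace_card", "activate_card", "report_fraud",
    "reset_pin", "speak_to_human", "greeting", "goodbye",
    "thank_you", "intent_unclear",
}

def validate_call(call: dict) -> dict:
    """Coerce any out-of-taxonomy call to intent_unclear before executing it."""
    if call.get("name") in SUPPORTED_FUNCTIONS:
        return call
    return {"name": "intent_unclear", "arguments": {}}

print(validate_call({"name": "wire_money", "arguments": {}}))
# {'name': 'intent_unclear', 'arguments': {}}
```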
## Use Cases
- Real-time voice banking assistants (ASR -> SLM -> TTS pipeline)
- Text-based banking chatbots with structured intent routing
- Edge deployment for on-device voice processing
- Any multi-turn tool calling task with bounded intent taxonomy
## Limitations

- Trained on English banking intents only
- Covers 14 specific banking functions; not a general-purpose tool caller
- ASR artifact handling is tuned for common speech-to-text errors, not every possible transcription format
- 90.9% accuracy still means roughly 1 in 10 function calls will be incorrect
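Given the residual error rate, one common mitigation is to require explicit user confirmation before executing irreversible operations. A hypothetical sketch (the confirmation policy below is illustrative, not part of the model):

```python
# Hypothetical policy: which functions need a spoken "yes" before execution.
REQUIRES_CONFIRMATION = {"transfer_money", "cancel_card", "replace_card", "reset_pin"}

def needs_confirmation(call: dict) -> bool:
    return call["name"] in REQUIRES_CONFIRMATION

def confirmation_prompt(call: dict) -> str:
    """Render the parsed call back to the user for read-back confirmation."""
    args = ", ".join(f"{k}={v}" for k, v in call.get("arguments", {}).items())
    return f"Please confirm: {call['name']}({args})?"

call = {"name": "transfer_money",
        "arguments": {"amount": 500, "from_account": "savings", "to_account": "checking"}}
if needs_confirmation(call):
    print(confirmation_prompt(call))
# Please confirm: transfer_money(amount=500, from_account=savings, to_account=checking)?
```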
## License
This model is released under the Apache 2.0 license.
## Citation

```bibtex
@misc{distil-qwen3-0.6b-voice-assistant-banking,
  author    = {Distil Labs},
  title     = {Distil-Qwen3-0.6B-Voice-Assistant-Banking: A Fine-tuned SLM for Banking Voice Assistants},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/distil-labs/distil-qwen3-0.6b-voice-assistant-banking}
}
```