Malaysian-Turn-Detector-Qwen3-0.6B

Fine-tuned Qwen3-0.6B for real-time turn-end detection in Malaysian multilingual call center conversations.

The model predicts P(<|im_end|>), the probability that a speaker has finished their turn. It is designed for low-latency voice agent pipelines (e.g. LiveKit) that need to decide when to respond.

How It Works

Given a conversation so far, the model outputs the probability of <|im_end|> as the next token:

  • P(im_end) > 0.5 → speaker is done talking (turn complete)
  • P(im_end) < 0.5 → speaker is still talking (turn incomplete)
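The decision rule above can be sketched in plain Python. This is a minimal illustration, not the model's actual code path: the toy logits and the 4-token vocabulary are made up, and `softmax_prob` and `is_turn_complete` are hypothetical helper names.

```python
import math

def softmax_prob(logits, index):
    """Probability of the token at `index` under a softmax over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[index] / sum(exps)

def is_turn_complete(p_im_end, threshold=0.5):
    """Apply the decision rule: True when the turn is judged complete."""
    return p_im_end > threshold

# Toy next-token logits over a 4-token vocabulary (assumption for illustration);
# index 3 plays the role of <|im_end|>.
toy_logits = [0.0, 0.0, 0.0, 2.0]
p = softmax_prob(toy_logits, 3)
print(f"P(im_end) = {p:.4f}, turn complete: {is_turn_complete(p)}")
```

In a real pipeline the logits would come from the model's last position, as in the Usage section.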

Usage

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Scicom-intl/Malaysian-Turn-Detector-Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda().eval()

IM_END_ID = tokenizer.convert_tokens_to_ids("<|im_end|>")

def get_turn_end_prob(text):
    """Returns probability that the speaker's turn is complete."""
    # Strip trailing <|im_end|> so the model predicts whether to emit it
    if text.endswith("<|im_end|>"):
        text = text[:-len("<|im_end|>")]
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Softmax in float32 to limit bfloat16 rounding over the large vocabulary
    prob = F.softmax(logits[0, -1].float(), dim=-1)[IM_END_ID].item()
    return prob

# Complete turn - should be high probability
text = "<|im_start|>user\nHello, saya nak tanya pasal bil saya.<|im_end|>\n<|im_start|>assistant\nBoleh, sila berikan nombor akaun anda."
prob = get_turn_end_prob(text)
print(f"P(turn complete) = {prob:.4f}")  # ~0.74

# Incomplete turn - should be low probability
text = "<|im_start|>user\nHello, saya nak tanya pasal bil saya.<|im_end|>\n<|im_start|>assistant\nBoleh, sila berikan nombor"
prob = get_turn_end_prob(text)
print(f"P(turn complete) = {prob:.4f}")  # ~0.00
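The example strings above follow the ChatML layout: earlier turns are closed with <|im_end|> and the turn being scored is left open. A small helper can build such strings from structured turns. This is a sketch only; `build_prompt` and its arguments are hypothetical names, not part of the released code.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def build_prompt(history, current_role, partial_text):
    """Render closed past turns plus the current, still-open turn.

    history: list of (role, content) pairs for turns that are already finished.
    partial_text: what the current speaker has said so far (no <|im_end|>).
    """
    parts = [f"{IM_START}{role}\n{content}{IM_END}\n" for role, content in history]
    parts.append(f"{IM_START}{current_role}\n{partial_text}")
    return "".join(parts)

history = [("user", "Hello, saya nak tanya pasal bil saya.")]
prompt = build_prompt(history, "assistant", "Boleh, sila berikan nombor")
print(repr(prompt))
```

The resulting string can be passed directly to `get_turn_end_prob`, for example each time a new partial transcript arrives from the speech recognizer.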

Eval Results

Test set: 1200 samples (600 positive + 600 negative), 50 conversations per language pair.

Overall (threshold = 0.5)

Metric     Score
Accuracy   97.00%
Precision  100.00%
Recall     94.00%
F1         96.91%
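As a consistency check, the overall scores can be recomputed from confusion counts. The counts below are not reported in this card; they are inferred from the percentages (94% recall on 600 positives gives 564 true positives, and 100% precision implies no false positives).

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Counts inferred from the reported percentages at threshold 0.5 (assumption).
metrics = binary_metrics(tp=564, fp=0, tn=600, fn=36)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 97.00%, precision: 100.00%, recall: 94.00%, f1: 96.91%
```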

Per Language

Language Pair    Overall  Positive  Negative
chinese-english  97.00%   94.00%    100.00%
chinese-malay    97.00%   94.00%    100.00%
chinese-tamil    97.00%   94.00%    100.00%
english-chinese  100.00%  100.00%   100.00%
english-malay    97.00%   94.00%    100.00%
english-tamil    94.00%   88.00%    100.00%
malay-chinese    97.00%   94.00%    100.00%
malay-english    93.00%   86.00%    100.00%
malay-tamil      95.00%   90.00%    100.00%
tamil-chinese    100.00%  100.00%   100.00%
tamil-english    98.00%   96.00%    100.00%
tamil-malay      99.00%   98.00%    100.00%

Threshold Sweep

Threshold  Accuracy  Precision  Recall  F1
0.1        99.33%    99.66%     99.00%  99.33%
0.2        99.00%    99.66%     98.33%  98.99%
0.3        98.83%    100.00%    97.67%  98.82%
0.4        98.50%    100.00%    97.00%  98.48%
0.5        97.00%    100.00%    94.00%  96.91%
0.6        96.08%    100.00%    92.17%  95.92%
0.7        94.67%    100.00%    89.33%  94.37%
0.8        90.92%    100.00%    81.83%  90.01%
0.9        83.92%    100.00%    67.83%  80.83%
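The sweep suggests that 0.5 is not necessarily the best operating point on this test set. The sketch below picks a threshold from the rows above under two possible criteria; the helper names are hypothetical, and the right criterion depends on whether a false "turn complete" (the agent interrupting) or a missed one (a slower response) costs more in a given deployment.

```python
# (threshold, accuracy, precision, recall, f1) in percent, copied from the table.
SWEEP = [
    (0.1, 99.33, 99.66, 99.00, 99.33),
    (0.2, 99.00, 99.66, 98.33, 98.99),
    (0.3, 98.83, 100.00, 97.67, 98.82),
    (0.4, 98.50, 100.00, 97.00, 98.48),
    (0.5, 97.00, 100.00, 94.00, 96.91),
    (0.6, 96.08, 100.00, 92.17, 95.92),
    (0.7, 94.67, 100.00, 89.33, 94.37),
    (0.8, 90.92, 100.00, 81.83, 90.01),
    (0.9, 83.92, 100.00, 67.83, 80.83),
]

def best_f1_threshold(rows):
    """Threshold whose row has the highest F1."""
    return max(rows, key=lambda r: r[4])[0]

def lowest_threshold_with_precision(rows, min_precision):
    """Smallest threshold whose precision meets the target, or None."""
    ok = [r[0] for r in rows if r[2] >= min_precision]
    return min(ok) if ok else None

print(best_f1_threshold(SWEEP))                       # 0.1
print(lowest_threshold_with_precision(SWEEP, 100.0))  # 0.3
```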

Probability Distribution

Class                       Mean    Median  Min     Max
Positive (turn complete)    0.8817  0.9569  0.0046  0.9997
Negative (turn incomplete)  0.0010  0.0000  0.0000  0.2509

Training

  • Base model: Qwen/Qwen3-0.6B
  • Training data: Positive samples only (complete conversations ending with <|im_end|>)
  • Loss: Liger Fused Linear Cross Entropy
  • Attention: Flash Attention 2
  • Precision: bfloat16
  • Block size: 8192 (multipacked)
  • Batch size: 4 x 8 gradient accumulation
  • Learning rate: 2e-5 (constant)
  • Epochs: 1
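"Multipacked" with a block size of 8192 means several tokenized conversations share one 8192-token training block. The sketch below shows one common way to do this, greedy first-fit packing on sequence lengths. It is an illustration under that assumption; the training code in the repository may pack differently, and `pack_lengths` is a hypothetical name.

```python
def pack_lengths(lengths, block_size=8192):
    """Greedy first-fit-decreasing packing.

    Returns a list of blocks, each a list of sample indices whose token
    lengths sum to at most `block_size`.
    """
    blocks = []  # each entry: [remaining_capacity, [sample indices]]
    for idx in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        n = lengths[idx]
        if n > block_size:
            raise ValueError(f"sample {idx} has {n} tokens, more than {block_size}")
        for block in blocks:
            if block[0] >= n:  # first block with enough room
                block[0] -= n
                block[1].append(idx)
                break
        else:
            blocks.append([block_size - n, [idx]])  # open a new block
    return [indices for _, indices in blocks]

# Made-up token lengths for five conversations.
lengths = [5000, 3000, 4000, 4000, 192]
packed = pack_lengths(lengths)
print(packed)  # [[0, 1, 4], [2, 3]]
```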

Training Data Sources

Dataset                             Source
Call Center Language Switching      Scicom-intl/Call-Center-Language-Switching
Function Call                       Scicom-intl/Function-Call
Malaysian Multiturn Chat Assistant  mesolitica/Malaysian-Multiturn-Chat-Assistant
Malaysian Speech Instructions       mesolitica/Malaysian-Speech-Instructions

WandB

Source code

The source code is available at https://github.com/Scicom-AI-Enterprise-Organization/small-ablation/tree/main/turn-detector
