Turn Detector Qwen3-1.7B

Fine-tuned Qwen3-1.7B for real-time turn-end detection in multilingual call center conversations.

The model predicts P(<|im_end|>), the probability that the speaker has finished their turn. It is designed for low-latency voice agent pipelines (e.g. LiveKit) to decide when the agent should respond.

How It Works

Given a conversation so far, the model outputs the probability of <|im_end|> as the next token:

  • P(im_end) > 0.5 → speaker is done talking (turn complete)
  • P(im_end) < 0.5 → speaker is still talking (turn incomplete)

Usage

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Scicom-intl/Malaysian-Turn-Detector-Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda().eval()

IM_END_ID = tokenizer.convert_tokens_to_ids("<|im_end|>")

def get_turn_end_prob(text):
    # Strip a trailing <|im_end|> so we always score the token *after* the visible text.
    if text.endswith("<|im_end|>"):
        text = text[:-len("<|im_end|>")]
    inputs = tokenizer(text, return_tensors="pt").to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Probability that the next token is <|im_end|>, i.e. that the turn is complete.
    prob = F.softmax(logits[0, -1], dim=-1)[IM_END_ID].item()
    return prob
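
A quick check of the decision rule at the 0.5 threshold used in the evaluation below. The utterances and the <|im_start|> chat-template formatting are illustrative assumptions; adjust the input to match how your transcripts are serialized for this model.

# Hypothetical inputs: a partial and a complete user turn.
partial  = "<|im_start|>user\nHi, I'd like to check my account"
complete = "<|im_start|>user\nHi, I'd like to check my account balance, please."

for text in (partial, complete):
    prob = get_turn_end_prob(text)
    decision = "respond" if prob > 0.5 else "keep listening"
    print(f"{prob:.4f} -> {decision}")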

Eval Results

Test set: 1,200 samples (600 positive + 600 negative), 50 conversations per language pair across 12 language pairs.

Overall (threshold = 0.5)

Metric      Score
Accuracy    96.67%
Precision   99.82%
Recall      93.50%
F1          96.56%

Per Language

Language Pair     Overall    Positive   Negative
chinese-english   95.00%     90.00%     100.00%
chinese-malay     97.00%     94.00%     100.00%
chinese-tamil     97.00%     94.00%     100.00%
english-chinese   97.00%     96.00%     98.00%
english-malay     94.00%     88.00%     100.00%
english-tamil     95.00%     90.00%     100.00%
malay-chinese     97.00%     94.00%     100.00%
malay-english     96.00%     92.00%     100.00%
malay-tamil       97.00%     94.00%     100.00%
tamil-chinese     100.00%    100.00%    100.00%
tamil-english     97.00%     94.00%     100.00%
tamil-malay       98.00%     96.00%     100.00%

Threshold Sweep

Threshold   Accuracy   Precision   Recall    F1
0.1         99.00%     99.66%      98.33%    98.99%
0.2         98.67%     99.66%      97.67%    98.65%
0.3         98.00%     99.66%      96.33%    97.97%
0.4         97.58%     99.65%      95.50%    97.53%
0.5         96.67%     99.82%      93.50%    96.56%
0.6         95.50%     99.82%      91.17%    95.30%
0.7         93.67%     99.81%      87.50%    93.25%
0.8         91.17%     100.00%     82.33%    90.31%
0.9         83.83%     100.00%     67.67%    80.72%
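
Picking a threshold is a latency/interruption trade-off: in this sweep, lower thresholds recover more true turn-ends (higher recall, faster responses), while higher thresholds push precision to 100% at the cost of missed turn-ends. To run the same sweep on your own labeled data, a minimal sketch is below; probs and labels are placeholders for your predicted P(<|im_end|>) values and ground-truth turn-complete labels.

import numpy as np

def sweep(probs, labels, thresholds=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    # probs: predicted P(<|im_end|>) per sample; labels: 1 = turn complete, 0 = incomplete
    probs, labels = np.asarray(probs), np.asarray(labels)
    for t in thresholds:
        pred = probs > t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        accuracy = np.mean(pred == labels)
        print(f"{t:.1f}  acc={accuracy:.4f}  p={precision:.4f}  r={recall:.4f}  f1={f1:.4f}")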

Confusion Matrix (threshold = 0.5)

             Pred Pos   Pred Neg
Actual Pos   561        39
Actual Neg   1          599

Probability Distribution

Class                        Mean     Median   Min      Max
Positive (turn complete)     0.8813   0.9673   0.0063   1.0000
Negative (turn incomplete)   0.0020   0.0000   0.0000   0.7022

Dataset

Tokenized parquet datasets (chinidataset format) are available at Scicom-intl/turn-detector-Qwen3-0.6B-dataset.

turn-detector-Qwen3-0.6B-dataset/
├── train-merged/
├── train/
└── test/
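
A sketch for pulling the test split with the Hugging Face datasets library. Whether the parquet shards under test/ load cleanly with this data_files mapping, and what columns the chinidataset format exposes, are assumptions to verify against the repository.

from datasets import load_dataset

# Assumption: the parquet files can be read directly from the dataset repo.
ds = load_dataset(
    "Scicom-intl/turn-detector-Qwen3-0.6B-dataset",
    data_files={"test": "test/*.parquet"},
)
print(ds["test"][0])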

Training

  • Base model: Qwen/Qwen3-1.7B
  • Training data: Positive samples only (complete conversations ending with <|im_end|>)
  • Loss: Liger Fused Linear Cross Entropy
  • Attention: Flash Attention 3
  • Precision: bfloat16
  • Block size: 8192 (multipacked)
  • Batch size: 2, with 16 gradient accumulation steps
  • Learning rate: 2e-5 (constant)
  • Epochs: 1
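
For reference, a minimal sketch of how these hyperparameters map onto transformers TrainingArguments. This is not the author's training script: the Liger fused linear cross entropy loss, Flash Attention 3, and 8192-token multipacking require their own setup and are not shown here.

from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="turn-detector-qwen3-1.7b",  # hypothetical output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="constant",
    num_train_epochs=1,
    bf16=True,
)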

Training Data Sources

