RAS1981/qwen3-0.6b-turn-detection-v3

This model is a fine-tuned version of RAS1981/qwen3-0.6b-turn-detection-v1. It has been trained on an expanded dataset, incorporating approximately 50,000 additional examples to improve robustness and generalization in conversational turn detection.

Model Details

  • Base Model: RAS1981/qwen3-0.6b-turn-detection-v1
  • Training Data: Original V1 dataset + ~50k new examples.
  • Task: Turn Detection (Binary Classification via Next-Token Prediction).
  • Language: Russian (primary evaluation context), English.
  • Architecture: Qwen3-0.6B (Transformer).

Intended Use

This model is designed for real-time voice bots to detect when a user has finished speaking. It predicts the probability of the <|im_end|> token at the end of a text segment.

  • Input: ASR transcript of the user's speech.
  • Output: Probability of turn completion.
  • Threshold: 0.5 (EOS Probability > 0.5 indicates "Turn Finished").
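The thresholding step above can be sketched as a tiny helper. This is a minimal illustration, assuming a function (here called `is_turn_finished`) that receives the EOS probability produced by the model; the 0.5 cutoff is the one stated in this card.

```python
def is_turn_finished(eos_prob: float, threshold: float = 0.5) -> bool:
    """Return True when the EOS probability indicates a completed turn."""
    return eos_prob > threshold

# A strict > comparison: exactly 0.5 still counts as "keep waiting".
print(is_turn_finished(0.99))  # True
print(is_turn_finished(0.30))  # False
```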

Evaluation Results

The model was evaluated on the same 75-sample test set used for V2, categorized into:

  • G1 (FINISHED): Completed sentences (Expected: END).
  • G2 (UNFINISHED): Incomplete sentences (Expected: WAIT).
  • G3 (PAUSE): Pauses/fillers (Expected: WAIT).

Summary Metrics

  • Total Samples: 75
  • Correct Predictions: 43 (57.3%)
  • Failures: 32 (42.7%)
  • Threshold: 0.5
Metric          Count  Percentage  Description
True Negative    20     26.7%      Correctly identified an incomplete turn (WAIT)
False Positive   32     42.7%      Incomplete turn misclassified as finished (interruption)
False Negative    0      0.0%      Finished turn misclassified as incomplete (added latency)
True Positive    23     30.7%      Correctly identified a finished turn (END)
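As a sanity check, the summary percentages can be re-derived from the raw confusion-matrix counts reported above (TN=20, FP=32, FN=0, TP=23):

```python
# Confusion-matrix counts from the evaluation table above.
tn, fp, fn, tp = 20, 32, 0, 23
total = tn + fp + fn + tp

accuracy = (tp + tn) / total                        # correct predictions / all samples
print(f"Total: {total}")                            # 75
print(f"Accuracy: {accuracy:.1%}")                  # 57.3%
print(f"False positives: {fp / total:.1%}")         # 42.7%
```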

Performance by Group

Group            Total  Correct  Incorrect  Accuracy  Precision  Recall  F1
G1 (Finished)     23      23        0       100.0%      1.00      1.00   1.00
G2 (Unfinished)   42      20       22        47.6%      0.00      0.00   0.00
G3 (Pause)        10       0       10         0.0%      0.00      0.00   0.00

Analysis & Comparison to V2

Despite the addition of 50k training examples, V3 shows slightly lower accuracy (57.3% vs 60.0%) compared to V2 on this specific test set.

  • G1 (Finished): Maintains perfect performance (100%). The model never misses a true end-of-turn.
  • G2 (Unfinished): Accuracy dropped slightly (47.6% vs 50.0%). The model remains overly aggressive in predicting completion.
  • G3 (Pause): Performance dropped to 0%. The model now misclassifies all pauses/fillers as completed turns.

Key Observations

The increased dataset size seems to have biased the model further towards predicting "Complete." This might be due to:

  1. Data Imbalance: The new 50k examples likely contain a high proportion of completed turns or "clean" text, reinforcing the bias against incomplete/messy speech.
  2. Overfitting to Completeness: The model has become extremely confident in predicting EOS, often assigning >99% probability to incomplete sentences (e.g., "Поэтому для начала очень важно, чтобы там находилось это.", roughly "So, to begin with, it is very important that this is there.", scored 99.27%).

Failure Patterns

The failures are identical in nature to V2 but often with higher confidence:

  • Text: "...чтобы там находилось это." ("...that this is there.") (EOS: 0.99)
  • Text: "...какие варианты у вас." ("...what options you have.") (EOS: 0.99)
  • Text: "...для меня слишком." ("...too much for me.") (EOS: 0.99)

The model ignores semantic cues such as "для начала" ("to begin with") and "чтобы" ("so that") that signal continuation, treating almost any syntactically plausible clause end as a turn end.

Recommendations for V4

To correct this "interruption bias":

  1. Resample Training Data: Drastically reduce the number of "Complete" examples or upsample "Incomplete" examples.
  2. Hard Negative Mining: Generate synthetic "Incomplete" examples by cutting off valid sentences at high-probability points (conjunctions, prepositions) and labeling them as WAIT.
  3. Pause-Specific Training: Explicitly fine-tune on a dataset of fillers ("э-э" / "uh", "ну" / "well", "м-м" / "hm") labeled as incomplete.
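Recommendation 2 (hard negative mining) can be sketched as follows. This is a minimal illustration, not the actual data pipeline: the cue list, the `make_hard_negatives` helper, and the WAIT label string are all assumptions for the example.

```python
# Cut complete sentences right after continuation cues (conjunctions,
# prepositions) and label the truncated prefix as an unfinished turn.
CUT_CUES = ["чтобы", "для", "и", "но", "если"]  # illustrative cue list

def make_hard_negatives(sentence: str) -> list[tuple[str, str]]:
    """Truncate a sentence after each cue word, labelling each prefix WAIT."""
    examples = []
    tokens = sentence.rstrip(".").split()
    for i, tok in enumerate(tokens[:-1]):  # never cut at the final word
        if tok.lower() in CUT_CUES:
            prefix = " ".join(tokens[: i + 1])
            examples.append((prefix, "WAIT"))
    return examples

pairs = make_hard_negatives("Я позвонил, чтобы уточнить детали заказа.")
# Produces the hard negative ("Я позвонил, чтобы", "WAIT")
```

Because the prefix ends on a subordinating conjunction, it is syntactically cut off exactly where V3 currently over-predicts completion, which is what makes it a hard negative.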

How to Use (Inference)

from unsloth import FastLanguageModel
import torch

model_name = "RAS1981/qwen3-0.6b-turn-detection-v3"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    load_in_4bit=True,
    max_seq_length=2048,
)
EOS_ID = 151645 # <|im_end|>

def get_turn_probability(text):
    messages = [
        {"role": "system", "content": "Ты определяешь конец реплики пользователя по смыслу."},
        {"role": "user", "content": text}
    ]
    # Important: Disable thinking and strip trailing EOS for prediction
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False, enable_thinking=False)
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
    
    # Strip the auto-added EOS (and its attention entry) so the model must predict it
    if inputs.input_ids[0][-1] == EOS_ID:
        inputs["input_ids"] = inputs.input_ids[:, :-1]
        inputs["attention_mask"] = inputs.attention_mask[:, :-1]
        
    with torch.no_grad():
        logits = model(**inputs).logits[:, -1, :]
        probs = torch.softmax(logits, dim=-1)
        eos_prob = probs[0, EOS_ID].item()
        
    return eos_prob

text = "Алло, здравствуйте"
print(f"Turn Probability: {get_turn_probability(text):.4f}")
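Given the interruption bias documented above, a voice bot consuming these probabilities may want to debounce the decision rather than act on a single reading. This is a hedged sketch, not part of the model card: the consecutive-readings rule and window size are assumptions, and in practice the probabilities would come from `get_turn_probability` on successive ASR updates.

```python
def should_end_turn(eos_probs: list[float], threshold: float = 0.5,
                    consecutive: int = 2) -> bool:
    """End the turn only after `consecutive` probabilities exceed the threshold."""
    if len(eos_probs) < consecutive:
        return False
    return all(p > threshold for p in eos_probs[-consecutive:])

# A single high spike is ignored; two in a row end the turn.
print(should_end_turn([0.2, 0.9]))         # False
print(should_end_turn([0.2, 0.9, 0.95]))   # True
```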