RAS1981/qwen3-0.6b-turn-detection-v3
This model is a fine-tuned version of RAS1981/qwen3-0.6b-turn-detection-v1, trained on an expanded dataset with approximately 50,000 additional examples to improve robustness and generalization in conversational turn detection.
Model Details
- Base Model: RAS1981/qwen3-0.6b-turn-detection-v1
- Training Data: Original V1 dataset + ~50k new examples.
- Task: Turn Detection (Binary Classification via Next-Token Prediction).
- Language: Russian (primary evaluation context), English.
- Architecture: Qwen3-0.6B (Transformer).
Intended Use
This model is designed for real-time voice bots to detect when a user has finished speaking. It predicts the probability of the `<|im_end|>` token at the end of a text segment.
- Input: ASR transcript of the user's speech.
- Output: Probability of turn completion.
- Threshold: 0.5 (EOS Probability > 0.5 indicates "Turn Finished").
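The thresholding decision above can be sketched as a minimal endpointing check. `get_turn_probability` here is a stub standing in for the real inference helper shown in "How to Use (Inference)" below, so the decision logic is runnable on its own:

```python
# Minimal endpointing decision on top of the model's EOS probability.
THRESHOLD = 0.5

def get_turn_probability(text: str) -> float:
    # Stub: a real implementation queries the model (see "How to Use" below).
    return 0.93 if text.rstrip().endswith(".") else 0.12

def decide(transcript: str) -> str:
    """Return "END" when the EOS probability exceeds the threshold, else "WAIT"."""
    return "END" if get_turn_probability(transcript) > THRESHOLD else "WAIT"

print(decide("Алло, здравствуйте."))  # stub -> END
print(decide("Поэтому для начала"))   # stub -> WAIT
```

In a live voice bot, this check would run on each incremental ASR transcript, with "WAIT" meaning keep listening.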
Evaluation Results
The model was evaluated on the same 75-sample test set used for V2, categorized into:
- G1 (FINISHED): Completed sentences (Expected: END).
- G2 (UNFINISHED): Incomplete sentences (Expected: WAIT).
- G3 (PAUSE): Pauses/fillers (Expected: WAIT).
Summary Metrics
- Total Samples: 75
- Correct Predictions: 43 (57.3%)
- Failures: 32 (42.7%)
- Threshold: 0.5
| Metric | Count | Percentage | Description |
|---|---|---|---|
| True Negative | 20 | 26.7% | Correctly identified incomplete turn (WAIT) |
| False Positive | 32 | 42.7% | Incorrectly identified incomplete turn as finished (Interruption) |
| False Negative | 0 | 0.0% | Incorrectly identified finished turn as incomplete (Latency) |
| True Positive | 23 | 30.7% | Correctly identified finished turn (END) |
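The summary metrics can be recomputed from the confusion counts in this table, treating "turn finished" (END) as the positive class:

```python
# Confusion counts from the evaluation table above.
tp, fp, tn, fn = 23, 32, 20, 0

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.1%}")   # 57.3%
print(f"Precision: {precision:.2f}")  # 0.42 -> many interruptions
print(f"Recall:    {recall:.2f}")     # 1.00 -> never misses a true end
print(f"F1:        {f1:.2f}")
```

The recall of 1.00 with precision of only 0.42 quantifies the "interruption bias" discussed below: the model never keeps a finished user waiting, but frequently cuts off one who has not finished.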
Performance by Group
| Group | Total | Correct | Incorrect | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| G1 (Finished) | 23 | 23 | 0 | 100.0% | 1.00 | 1.00 | 1.00 |
| G2 (Unfinished) | 42 | 20 | 22 | 47.6% | 0.00 | 0.00 | 0.00 |
| G3 (Pause) | 10 | 0 | 10 | 0.0% | 0.00 | 0.00 | 0.00 |
Analysis & Comparison to V2
Despite the addition of 50k training examples, V3 shows slightly lower accuracy on this test set than V2 (57.3% vs. 60.0%).
- G1 (Finished): Maintains perfect performance (100%). The model never misses a true end-of-turn.
- G2 (Unfinished): Accuracy dropped slightly (47.6% vs 50.0%). The model remains overly aggressive in predicting completion.
- G3 (Pause): Performance dropped to 0%. The model now misclassifies all pauses/fillers as completed turns.
Key Observations
The increased dataset size seems to have biased the model further towards predicting "Complete." This might be due to:
- Data Imbalance: The new 50k examples likely contain a high proportion of completed turns or "clean" text, reinforcing the bias against incomplete/messy speech.
- Overfitting to Completeness: The model has become extremely confident in predicting EOS, often assigning >99% probability to incomplete sentences (e.g., "Поэтому для начала очень важно, чтобы там находилось это." ("So to begin with, it's very important that this is located there.") -> 99.27%).
Failure Patterns
The failures are identical in nature to V2 but often with higher confidence:
- Text: "...чтобы там находилось это." ("...that this is located there.") (EOS: 0.99)
- Text: "...какие варианты у вас." ("...what options do you have.") (EOS: 0.99)
- Text: "...для меня слишком." ("...it's too much for me.") (EOS: 0.99)
The model ignores semantic cues such as "для начала" ("to begin with") and "чтобы" ("so that") that signal continuation, treating almost any syntactically plausible clause end as a turn end.
Recommendations for V4
To correct this "interruption bias":
- Resample Training Data: Drastically reduce the number of "Complete" examples or upsample "Incomplete" examples.
- Hard Negative Mining: Generate synthetic "Incomplete" examples by cutting off valid sentences at high-probability points (conjunctions, prepositions) and labeling them as WAIT.
- Pause-Specific Training: Explicitly fine-tune on a dataset of fillers ("э-э" / "uh", "ну" / "well", "м-м" / "mm") labeled as incomplete.
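The hard-negative-mining idea above could be sketched as follows. The cue list and the output label format are illustrative assumptions, not the actual training recipe:

```python
# Sketch: truncate complete sentences just after continuation cues
# (conjunctions, prepositions) to create synthetic WAIT examples.
CUES = {"чтобы", "для", "какие", "и", "но", "потому"}

def mine_hard_negatives(sentence: str):
    """Yield prefixes of a complete sentence cut just after a cue word."""
    words = sentence.rstrip(".?!").split()
    for i, w in enumerate(words):
        if w.lower() in CUES and i > 0:
            yield {"text": " ".join(words[: i + 1]), "label": "WAIT"}

sent = "Поэтому для начала очень важно, чтобы там находилось это."
for example in mine_hard_negatives(sent):
    print(example)
```

Each mined prefix ends at a point where the current model assigns a high EOS probability, which is exactly where corrective WAIT supervision is most valuable.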
How to Use (Inference)
```python
from unsloth import FastLanguageModel
import torch

model_name = "RAS1981/qwen3-0.6b-turn-detection-v3"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    load_in_4bit=True,
    max_seq_length=2048,
)

EOS_ID = 151645  # <|im_end|>

def get_turn_probability(text):
    messages = [
        # System prompt: "You determine the end of the user's utterance by meaning."
        {"role": "system", "content": "Ты определяешь конец реплики пользователя по смыслу."},
        {"role": "user", "content": text},
    ]
    # Important: disable thinking so the template adds no reasoning block
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False, enable_thinking=False
    )
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
    # Strip the auto-added EOS (and its attention-mask entry) so the model predicts it
    if inputs.input_ids[0, -1] == EOS_ID:
        inputs["input_ids"] = inputs.input_ids[:, :-1]
        inputs["attention_mask"] = inputs.attention_mask[:, :-1]
    with torch.no_grad():
        logits = model(**inputs).logits[:, -1, :]
    probs = torch.softmax(logits, dim=-1)
    return probs[0, EOS_ID].item()

text = "Алло, здравствуйте"
print(f"Turn Probability: {get_turn_probability(text):.4f}")
```