Qwen3-1.7B does not reliably enter the assistant role without an explicit generation prompt, unlike Qwen3-4B-Instruct-2507

#17
by Ki-Seki - opened

Description

Hi Qwen team, thanks for releasing Qwen3 models.

I would like to report an inconsistency in generation behavior between Qwen3-1.7B and Qwen3-4B-Instruct-2507: whether the model enters the assistant role on its own when the chat template is applied without an explicit generation prompt.


Minimal Reproduction

from transformers import AutoModelForCausalLM, AutoTokenizer

def run(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto"  # load in the checkpoint's native dtype
    ).to("cuda")

    query = "who are you"
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": query}],
        tokenize=False,
        # add_generation_prompt is intentionally NOT set
    )

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=False))


run("Qwen/Qwen3-4B-Instruct-2507")
run("Qwen/Qwen3-1.7B")

Observed Behavior

  • Qwen3-4B-Instruct-2507

    • The generated output includes <|im_start|>assistant
    • The model correctly enters the assistant role even without an explicit generation prompt
  • Qwen3-1.7B

    • The generated output does NOT include <|im_start|>assistant
    • The model often fails to properly start an assistant response unless add_generation_prompt=True is used

Expected Behavior

Consistent behavior across Qwen3 models, or at least documentation making the expectations explicit:

  • Smaller models (e.g., Qwen3-1.7B) require an explicit generation prompt
  • Larger models can recover the assistant role boundary implicitly
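For comparison, this is roughly what the ChatML-style layout used by Qwen3 looks like with and without the generation prompt. This is a hand-written sketch to illustrate the difference, not the tokenizer's actual Jinja template (which also handles system messages and thinking blocks):

```python
# Minimal sketch of a ChatML-style prompt, approximating what
# apply_chat_template produces for Qwen3 models (illustrative only).
def chatml_prompt(messages, add_generation_prompt=False):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # This suffix is what forces the model into the assistant role.
        # Without it, the model must emit "<|im_start|>assistant" itself,
        # which is exactly where Qwen3-1.7B is unreliable.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


msgs = [{"role": "user", "content": "who are you"}]
print(chatml_prompt(msgs))                              # ends after <|im_end|>
print(chatml_prompt(msgs, add_generation_prompt=True))  # ends with the assistant header
```

With `add_generation_prompt=True`, the prompt already ends inside the assistant turn, so the model only has to continue the response rather than open it.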

Additional Context

This issue becomes particularly noticeable when:

  • Training or fine-tuning uses train_on_responses_only
  • Prompt tokens are masked with -100
  • The model is never trained to enter the assistant role, only to continue within it

The issue appears much more severe for Qwen3-1.7B, while larger models seem more robust.
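The masking setup described above can be sketched as follows. This is a simplified illustration of response-only loss masking, not the actual implementation in any particular training library: every token up to and including the assistant header gets label -100, so the loss never teaches the model to produce the header itself.

```python
# Sketch of response-only label masking: tokens belonging to the prompt
# (everything up to and including "<|im_start|>assistant\n") are assigned
# label -100, which the cross-entropy loss ignores. The model is therefore
# trained to continue an assistant turn, never to open one.
def mask_prompt_labels(input_ids, prompt_len):
    labels = list(input_ids)
    for i in range(prompt_len):
        labels[i] = -100  # ignored by the loss
    return labels


# Toy example: first 3 tokens are the prompt, last 2 are the response.
print(mask_prompt_labels([101, 102, 103, 7, 8], prompt_len=3))
# → [-100, -100, -100, 7, 8]
```

If inference then omits the generation prompt, the model is asked to produce exactly the tokens it was never trained on, which matches the failure mode observed with Qwen3-1.7B.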


Suggested Clarification / Fix

  • Document that add_generation_prompt=True is required for Qwen3-1.7B at inference time
    or
  • Consider improving assistant role entry robustness for smaller Qwen3 variants
