Qwen3-1.7B does not reliably enter the assistant role without an explicit generation prompt, unlike Qwen3-4B-Instruct-2507

#17
by Ki-Seki - opened

Description

Hi Qwen team, thanks for releasing Qwen3 models.

I would like to report an inconsistency in generation behavior between Qwen3-1.7B and Qwen3-4B-Instruct-2507: whether the model enters the assistant role on its own when the chat template is applied without an explicit generation prompt.


Minimal Reproduction

from transformers import AutoModelForCausalLM, AutoTokenizer

def run(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto"  # load in the checkpoint's native dtype
    ).to("cuda")

    query = "who are you"
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": query}],
        tokenize=False,
        # add_generation_prompt is intentionally NOT set
    )

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=False))


run("Qwen/Qwen3-4B-Instruct-2507")
run("Qwen/Qwen3-1.7B")

Observed Behavior

  • Qwen3-4B-Instruct-2507

    • The generated output includes <|im_start|>assistant
    • The model correctly enters the assistant role even without an explicit generation prompt
  • Qwen3-1.7B

    • The generated output does NOT include <|im_start|>assistant
    • The model often fails to properly start an assistant response unless add_generation_prompt=True is used

Expected Behavior

Consistent behavior across Qwen3 models, or at least documentation making the expectations explicit:

  • Smaller models (e.g., Qwen3-1.7B) require an explicit generation prompt
  • Larger models can recover the assistant role boundary implicitly
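For comparison, this is roughly what the ChatML-style layout used by Qwen3 looks like with and without the generation prompt. This is a hand-written sketch to illustrate the difference, not the tokenizer's actual Jinja template (which also handles system messages and thinking blocks):

```python
# Minimal sketch of a ChatML-style prompt, approximating what
# apply_chat_template produces for Qwen3 models (illustrative only).
def chatml_prompt(messages, add_generation_prompt=False):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # This suffix is what forces the model into the assistant role.
        # Without it, the model must emit "<|im_start|>assistant" itself,
        # which is exactly where Qwen3-1.7B is unreliable.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


msgs = [{"role": "user", "content": "who are you"}]
print(chatml_prompt(msgs))                              # ends after <|im_end|>
print(chatml_prompt(msgs, add_generation_prompt=True))  # ends with the assistant header
```

With `add_generation_prompt=True`, the prompt already ends inside the assistant turn, so the model only has to continue the response rather than open it.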

Additional Context

This issue becomes particularly noticeable when:

  • Training or fine-tuning uses train_on_responses_only
  • Prompt tokens are masked with -100
  • The model is never trained to enter the assistant role, only to continue within it

The issue appears much more severe for Qwen3-1.7B, while larger models seem more robust.
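The masking setup described above can be sketched as follows. This is a simplified illustration of response-only loss masking, not the actual implementation in any particular training library: every token up to and including the assistant header gets label -100, so the loss never teaches the model to produce the header itself.

```python
# Sketch of response-only label masking: tokens belonging to the prompt
# (everything up to and including "<|im_start|>assistant\n") are assigned
# label -100, which the cross-entropy loss ignores. The model is therefore
# trained to continue an assistant turn, never to open one.
def mask_prompt_labels(input_ids, prompt_len):
    labels = list(input_ids)
    for i in range(prompt_len):
        labels[i] = -100  # ignored by the loss
    return labels


# Toy example: first 3 tokens are the prompt, last 2 are the response.
print(mask_prompt_labels([101, 102, 103, 7, 8], prompt_len=3))
# → [-100, -100, -100, 7, 8]
```

If inference then omits the generation prompt, the model is asked to produce exactly the tokens it was never trained on, which matches the failure mode observed with Qwen3-1.7B.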


Suggested Clarification / Fix

  • Document that add_generation_prompt=True is required for Qwen3-1.7B at inference time
    or
  • Consider improving assistant role entry robustness for smaller Qwen3 variants
