Qwen3.5-9B Claude-Distill

A fine-tuned version of Qwen/Qwen3.5-9B, trained via knowledge distillation from Claude. The model was produced by full-parameter fine-tuning on curated Claude reasoning traces.

Model Highlights

  • Claude-Distilled Reasoning: Trained on high-quality chain-of-thought reasoning traces distilled from Claude Opus
  • Multi-Domain Coverage: Math, logic, coding, creative writing, STEM, and multi-turn reasoning
  • Dense Architecture: Based on Qwen/Qwen3.5-9B with 9B parameters
  • Multimodal Capable: Inherits vision-language capabilities from Qwen3.5

Model Description

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-9B |
| Model Type | Causal Language Model with Vision Encoder |
| Parameters | 9B |
| Languages | English, Chinese |
| License | Apache 2.0 |
| Developer | Kassadin88 |

Training Data

The model was trained on the following datasets distilled from Claude:

| Dataset | Samples | Description |
|---|---|---|
| Claude Opus 4.5 High Reasoning | 250 | High reasoning depth samples |
| Claude Opus 4.6 Reasoning | 9,633 | Math, logic puzzles, multi-step instructions with CoT |
| Claude Opus 4.6 High Reasoning | 757 | Coding and creative writing with adaptive reasoning |
| Claude Opus 4.6 Extended Reasoning | 500 | Extended reasoning across STEM and practical domains |
| Claude Opus 4.6 Extended Reasoning 887x | 887 | Tool calling, bullshit detection, multi-turn traces |
| Claude Sonnet & Opus 4.6 Reasoning | 524 | Natural human-written prompts from Reddit & Stack Overflow |
| Opus 4.6 Reasoning Filtered | 2,326 | Filtered reasoning traces (refusals removed) |

Total: ~14.9K samples
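As a quick sanity check, the per-dataset counts above sum to the stated total (a minimal sketch; the dictionary keys are abbreviations of the dataset names, not their real identifiers):

```python
# Per-dataset sample counts copied from the training-data table above.
counts = {
    "opus_4.5_high_reasoning": 250,
    "opus_4.6_reasoning": 9633,
    "opus_4.6_high_reasoning": 757,
    "opus_4.6_extended_reasoning": 500,
    "opus_4.6_extended_reasoning_887x": 887,
    "sonnet_and_opus_4.6_reasoning": 524,
    "opus_4.6_reasoning_filtered": 2326,
}

total = sum(counts.values())
print(total)  # 14877, i.e. ~14.9K samples
```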

Data Composition

| Domain | Percentage | Description |
|---|---|---|
| Math & Logic | ~40% | Multi-step problem solving with chain-of-thought |
| Coding | ~25% | Code generation, debugging, and algorithm design |
| STEM | ~15% | Science, engineering, and extended reasoning |
| Creative Writing | ~10% | Adaptive reasoning for creative tasks |
| Multi-turn / Tool Use | ~10% | Tool calling, clarification, and dialogue |

Benchmark Results

For detailed benchmark results and model architecture, please refer to the original Qwen/Qwen3.5-9B model card.

Quickstart

For the full usage guide, please refer to the original Qwen/Qwen3.5-9B model card.

Using with vLLM

vllm serve Kassadin88/Qwen3.5-9B-Claude-distill \
    --port 8000 \
    --tensor-parallel-size 2 \
    --max-model-len 32768 \
    --trust-remote-code \
    --reasoning-parser qwen3
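The command above starts an OpenAI-compatible server. A minimal sketch of the request body a client would POST to `http://localhost:8000/v1/chat/completions` (the payload is only constructed here, not sent, so nothing below assumes a running server; `max_tokens` and `temperature` are illustrative values):

```python
import json

# Build an OpenAI-style chat completion request for the vLLM server above.
# Actually sending it (e.g. with requests.post) requires the server to be
# running locally on port 8000.
payload = {
    "model": "Kassadin88/Qwen3.5-9B-Claude-distill",
    "messages": [
        {"role": "user", "content": "Solve step by step: 17 * 24 = ?"}
    ],
    "max_tokens": 512,
    "temperature": 0.7,
}

body = json.dumps(payload)
print(body[:60])
```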

Using with SGLang

python -m sglang.launch_server \
    --model-path Kassadin88/Qwen3.5-9B-Claude-distill \
    --port 8000 \
    --tp-size 2 \
    --mem-fraction-static 0.8 \
    --context-length 32768 \
    --reasoning-parser qwen3

Using with Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Kassadin88/Qwen3.5-9B-Claude-distill"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Hello, how are you?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
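With thinking enabled, Qwen3-style models emit their chain of thought between `<think>` and `</think>` tags before the final answer. A minimal sketch for separating the two parts of a decoded response (assumes the tags appear at most once, as in typical outputs):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a generated response into (reasoning, answer).

    Assumes Qwen3-style output where the reasoning trace is wrapped
    in <think>...</think> ahead of the final answer.
    """
    open_tag, close_tag = "<think>", "</think>"
    if close_tag in text:
        head, _, tail = text.partition(close_tag)
        reasoning = head.replace(open_tag, "").strip()
        return reasoning, tail.strip()
    # No thinking block present: the whole text is the answer.
    return "", text.strip()


sample = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_thinking(sample)
print(answer)  # The answer is 4.
```

In practice you would call `split_thinking(response)` on the decoded output from the Transformers example above.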

Usage Tips

For Reasoning Tasks

messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of all prime numbers less than 100?"}
]
# Model will use chain-of-thought reasoning from Claude distillation

For Coding Tasks

messages = [
    {"role": "user", "content": "Implement a binary search tree with insert, delete, and find operations in Python."}
]
# Model benefits from Claude's coding reasoning traces

Enabling / Disabling Thinking

# Enable thinking mode (recommended for reasoning tasks)
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)

# Disable thinking mode (for simple tasks, faster inference)
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)

Limitations

  • This model is distilled from Claude and may inherit biases from the training data
  • The distillation dataset is relatively small (~14.9K samples), which may limit generalization
  • Should not be used for medical, legal, or financial advice without verification
  • The model's reasoning capabilities are constrained by the quality and diversity of the distillation data

Citation

@misc{qwen3.5-9b-claude-distill,
    author = {Kassadin88},
    title = {Qwen3.5-9B Claude-Distill: A Claude-Distilled Fine-Tuned Model},
    year = {2026},
    publisher = {HuggingFace},
    url = {https://huggingface.co/Kassadin88/Qwen3.5-9B-Claude-distill}
}

Acknowledgments

  • Base Model: Qwen Team for Qwen3.5
  • Training Data: Various Claude Opus reasoning datasets on HuggingFace
  • Training Framework: DeepSpeed

Note: This model is intended for research and educational purposes. Please use responsibly.
