Qwen3.5-9B Opus 4.6 Reasoning Distill

A reasoning-focused fine-tune of Qwen/Qwen3.5-9B by Empero AI, trained to produce detailed chain-of-thought reasoning inside `<think>` tags before giving its final answer.

The model was trained on reasoning traces distilled from Claude Opus 4.6 and Qwen3.5, teaching it to think step by step through a problem before responding.

GGUF quantizations available at: empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill-GGUF

How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "empero-ai/qwen3.5-9b-opus-4.6-distill"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a deep reasoning AI. When given a problem, you think through it carefully and methodically inside <think> tags before providing your final answer."},
    {"role": "user", "content": "What is the probability of rolling at least one six in four dice rolls?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
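The decoded text contains the reasoning block followed by the final answer. A minimal helper to split the two, assuming at most one well-formed `<think>…</think>` block per response:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    If no <think>...</think> block is found, the whole text is
    treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

sample = "<think>\n(5/6)^4 is about 0.48.\n</think>\nThe probability is about 51.77%."
reasoning, answer = split_reasoning(sample)
print(answer)  # The probability is about 51.77%.
```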

Output Format

The model outputs its reasoning process inside `<think>` tags, followed by a clear final answer:

```
<think>
Let me work through this step by step.

The probability of NOT rolling a six on a single die is 5/6.
The probability of NOT rolling a six on any of four rolls is (5/6)^4.

(5/6)^4 = 625/1296 ≈ 0.4823

So the probability of rolling at least one six is:
1 - 625/1296 = 671/1296 ≈ 0.5177

That's just over 50% — slightly more likely than not.
</think>

The probability of rolling at least one six in four dice rolls is **671/1296**, or approximately **51.77%**.

This is calculated by finding the complement: the chance of getting no sixes at all is (5/6)⁴ ≈ 48.23%, so the chance of at least one six is 1 − (5/6)⁴ ≈ 51.77%.
```
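The complement calculation in the sample output can be verified exactly with rational arithmetic:

```python
from fractions import Fraction

# P(at least one six in four rolls) = 1 - P(no six in any of the four rolls)
p_no_six = Fraction(5, 6) ** 4   # 625/1296
p_at_least_one = 1 - p_no_six    # 671/1296

print(p_at_least_one)            # 671/1296
print(float(p_at_least_one))     # ≈ 0.5177
```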

Training Details

Method

  • Base model: Qwen/Qwen3.5-9B
  • Technique: 4-bit QLoRA supervised fine-tuning (LoRA rank 64, alpha 128)
  • Epochs: 3 (best checkpoint selected at epoch 2)
  • Hardware: RTX 5090 (32GB VRAM)
  • Attention: SDPA
  • Optimizer: Paged AdamW 8-bit
  • Learning rate: 1e-4 with cosine schedule
  • Effective batch size: 8
  • Max sequence length: 4096
  • Training time: ~4.5 hours
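The hyperparameters above can be sketched with peft and transformers. The target modules, the per-device batch / gradient-accumulation split (which multiplies out to the listed effective batch size of 8), and the precision flag are assumptions; everything else mirrors the list:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter matching the listed hyperparameters (rank 64, alpha 128).
# target_modules is an assumption; adjust to the actual layer names.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Trainer settings from the list: paged 8-bit AdamW, lr 1e-4,
# cosine schedule, 3 epochs, max sequence length handled at
# tokenization time (4096).
training_args = TrainingArguments(
    output_dir="qwen3.5-9b-opus-4.6-distill",
    num_train_epochs=3,
    per_device_train_batch_size=2,     # 2 x 4 accumulation = effective 8
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    bf16=True,
)
```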

Results

| Metric | Epoch 1 | Epoch 2 (best) | Epoch 3 |
|---|---|---|---|
| Eval Loss | 0.5205 | 0.4809 | 0.4915 |
| Eval Token Accuracy | 0.8494 | 0.8615 | 0.8617 |
| Eval Entropy | 0.508 | 0.434 | 0.394 |

Datasets

| Dataset | Examples | Format |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Problem → thinking → solution |
| Jackrong/Qwen3.5-reasoning-700x | 633 | ShareGPT with `<think>` tags |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Messages with `<think>` tags |
| Roman1111111/claude-opus-4.6-10000x | 9,631 | Messages with reasoning traces |
| **Total** | **12,840** | |
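Since the datasets mix ShareGPT-style turns with plain message lists, a normalization pass is needed before applying the chat template. A minimal sketch, assuming the conventional ShareGPT `from`/`value` keys and role names:

```python
# Map ShareGPT speaker tags to chat-template roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_messages(conversations: list[dict]) -> list[dict]:
    """Convert ShareGPT turns ({"from", "value"}) to {"role", "content"}."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversations
    ]

example = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "<think>\n2 + 2 = 4.\n</think>\nThe answer is 4."},
]
print(sharegpt_to_messages(example)[1]["role"])  # assistant
```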

About Empero AI

This model was developed by Empero AI. We build open-source AI tools and models focused on advancing reasoning capabilities in smaller, efficient language models.

License

This model inherits the Apache 2.0 license from Qwen3.5-9B.

