Qwen3.5-9B Opus 4.6 Reasoning Distill

A reasoning-focused fine-tune of Qwen/Qwen3.5-9B by Empero AI, trained to produce detailed chain-of-thought reasoning inside `<think>` tags before giving its final answer.

The model was trained on reasoning traces distilled from Claude Opus 4.6 and Qwen3.5, teaching it to think step by step through a problem before responding.

GGUF quantizations available at: empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill-GGUF

How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "empero-ai/qwen3.5-9b-opus-4.6-distill"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a deep reasoning AI. When given a problem, you think through it carefully and methodically inside <think> tags before providing your final answer."},
    {"role": "user", "content": "What is the probability of rolling at least one six in four dice rolls?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
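The decoded text contains the reasoning block followed by the final answer. A minimal helper to split the two, assuming at most one well-formed `<think>…</think>` block per response:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    If no <think>...</think> block is found, the whole text is
    treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

sample = "<think>\n(5/6)^4 is about 0.48.\n</think>\nThe probability is about 51.77%."
reasoning, answer = split_reasoning(sample)
print(answer)  # The probability is about 51.77%.
```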

Output Format

The model outputs its reasoning process inside `<think>` tags, followed by a clear final answer:

```
<think>
Let me work through this step by step.

The probability of NOT rolling a six on a single die is 5/6.
The probability of NOT rolling a six on any of four rolls is (5/6)^4.

(5/6)^4 = 625/1296 ≈ 0.4823

So the probability of rolling at least one six is:
1 - 625/1296 = 671/1296 ≈ 0.5177

That's just over 50% — slightly more likely than not.
</think>

The probability of rolling at least one six in four dice rolls is **671/1296**, or approximately **51.77%**.

This is calculated by finding the complement: the chance of getting no sixes at all is (5/6)⁴ ≈ 48.23%, so the chance of at least one six is 1 − (5/6)⁴ ≈ 51.77%.
```
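The complement calculation in the sample output can be verified exactly with rational arithmetic:

```python
from fractions import Fraction

# P(at least one six in four rolls) = 1 - P(no six in any of the four rolls)
p_no_six = Fraction(5, 6) ** 4   # 625/1296
p_at_least_one = 1 - p_no_six    # 671/1296

print(p_at_least_one)            # 671/1296
print(float(p_at_least_one))     # ≈ 0.5177
```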

Training Details

Method

  • Base model: Qwen/Qwen3.5-9B
  • Technique: 4-bit QLoRA supervised fine-tuning (LoRA rank 64, alpha 128)
  • Epochs: 3 (best checkpoint selected at epoch 2)
  • Hardware: RTX 5090 (32GB VRAM)
  • Attention: SDPA
  • Optimizer: Paged AdamW 8-bit
  • Learning rate: 1e-4 with cosine schedule
  • Effective batch size: 8
  • Max sequence length: 4096
  • Training time: ~4.5 hours
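The hyperparameters above can be sketched with peft and transformers. The target modules, the per-device batch / gradient-accumulation split (which multiplies out to the listed effective batch size of 8), and the precision flag are assumptions; everything else mirrors the list:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter matching the listed hyperparameters (rank 64, alpha 128).
# target_modules is an assumption; adjust to the actual layer names.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Trainer settings from the list: paged 8-bit AdamW, lr 1e-4,
# cosine schedule, 3 epochs, max sequence length handled at
# tokenization time (4096).
training_args = TrainingArguments(
    output_dir="qwen3.5-9b-opus-4.6-distill",
    num_train_epochs=3,
    per_device_train_batch_size=2,     # 2 x 4 accumulation = effective 8
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    bf16=True,
)
```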

Results

| Metric | Epoch 1 | Epoch 2 (best) | Epoch 3 |
|---|---|---|---|
| Eval Loss | 0.5205 | 0.4809 | 0.4915 |
| Eval Token Accuracy | 0.8494 | 0.8615 | 0.8617 |
| Eval Entropy | 0.508 | 0.434 | 0.394 |

Datasets

| Dataset | Examples | Format |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Problem → thinking → solution |
| Jackrong/Qwen3.5-reasoning-700x | 633 | ShareGPT with `<think>` tags |
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Messages with `<think>` tags |
| Roman1111111/claude-opus-4.6-10000x | 9,631 | Messages with reasoning traces |
| **Total** | **12,840** | |
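Since the datasets mix ShareGPT-style turns with plain message lists, a normalization pass is needed before applying the chat template. A minimal sketch, assuming the conventional ShareGPT `from`/`value` keys and role names:

```python
# Map ShareGPT speaker tags to chat-template roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_messages(conversations: list[dict]) -> list[dict]:
    """Convert ShareGPT turns ({"from", "value"}) to {"role", "content"}."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversations
    ]

example = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "<think>\n2 + 2 = 4.\n</think>\nThe answer is 4."},
]
print(sharegpt_to_messages(example)[1]["role"])  # assistant
```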

About Empero AI

This model was developed by Empero AI. We build open-source AI tools and models focused on advancing reasoning capabilities in smaller, efficient language models.

License

This model inherits the Apache 2.0 license from Qwen3.5-9B.

