# Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled
## Build Environment
- Training Method: Supervised Fine-Tuning (SFT)
- Base Model: Qwen3.5-0.8B
- Training Libraries: Hugging Face Transformers + TRL
## Model Introduction
Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled is a reasoning-focused model fine-tuned from Qwen3.5-0.8B using structured reasoning traces derived from Claude-4.6 Opus style reasoning datasets.
The model is trained to produce structured chain-of-thought reasoning, enabling it to:
- Break complex problems into logical steps
- Produce internal reasoning inside `<think>` blocks
- Deliver accurate final answers after reasoning
The training dataset contains curated reasoning examples designed to teach the model step-by-step analytical thinking.
## How to Run the Model
You can run the model with the Hugging Face `transformers` library.
### 1️⃣ Install Dependencies

```bash
pip install transformers torch accelerate
```
### 2️⃣ Run the Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Ishant06/Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "Explain why the sky is blue."
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": prompt}
]

# Apply chat template (important for Qwen models)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,  # required for temperature/top_p sampling
        temperature=0.7,
        top_p=0.9
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
Tested with:
- transformers >= 4.40
- torch >= 2.0
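Since the model emits its reasoning inside `<think>` tags, you may want to separate the trace from the final answer before showing output to a user. Below is a minimal sketch; `split_reasoning` is an illustrative helper (not part of the model's API) and assumes the generation contains at most one `<think>…</think>` block.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a generation into (reasoning, final_answer).

    Assumes at most one <think>...</think> block; if none is
    found, the whole text is treated as the final answer.
    """
    start = text.find("<think>")
    end = text.find("</think>")
    if start == -1 or end == -1:
        return "", text.strip()
    reasoning = text[start + len("<think>"):end].strip()
    answer = text[end + len("</think>"):].strip()
    return reasoning, answer


sample = (
    "<think>\nRayleigh scattering favors short wavelengths.\n</think>\n"
    "The sky looks blue because air scatters blue light more strongly."
)
reasoning, answer = split_reasoning(sample)
print(reasoning)  # the chain-of-thought trace
print(answer)     # the user-facing answer
```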
## Example Reasoning Pattern
The model follows a structured reasoning scaffold such as:

```text
Let me analyze this request carefully:
1. Understand the problem.
2. Break it into smaller steps.
3. Analyze each step logically.
4. Combine the reasoning.
5. Produce the final answer.
```
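One way to encourage this scaffold at inference time is to restate it in the system message. A minimal sketch; the wording below is illustrative, not the exact prompt used during training.

```python
# Restate the reasoning scaffold in the system message (illustrative wording).
scaffold = (
    "Let me analyze this request carefully:\n"
    "1. Understand the problem.\n"
    "2. Break it into smaller steps.\n"
    "3. Analyze each step logically.\n"
    "4. Combine the reasoning.\n"
    "5. Produce the final answer."
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant. " + scaffold},
    {"role": "user", "content": "What is 17 * 24?"},
]
print(messages[0]["content"])
```

These `messages` can be passed to `tokenizer.apply_chat_template` exactly as in the run example above.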
## Training Pipeline Overview

```text
Base Model (Qwen3.5-0.8B)
        │
        ▼
Supervised Fine-Tuning (SFT)
        │
        ▼
Final Model (Claude-4.6-Opus-Reasoning-Distilled)
```
## Training Details
### Supervised Fine-Tuning (SFT)
- Framework: Hugging Face Transformers + TRL
- Training Strategy: Instruction → Response SFT
- Goal: Teach the model structured reasoning and step-by-step problem solving.
Format Used During Training:

```text
<think>
internal reasoning
</think>
final answer
```
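A target string in this format can be assembled from a reasoning trace and an answer. Below is a minimal sketch; `build_target` is an illustrative helper, not part of the released training code.

```python
def build_target(reasoning: str, answer: str) -> str:
    """Assemble an SFT target: reasoning in <think> tags, then the final answer."""
    return f"<think>\n{reasoning}\n</think>\n{answer}"


target = build_target(
    "The problem asks for 2 + 2; adding the operands gives 4.",
    "2 + 2 = 4",
)
print(target)
```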
## Dataset Used
| Dataset | Description |
|---|---|
| crownelius/Opus-4.6-Reasoning-3300x | Claude-4.6 Opus style reasoning dataset containing structured chain-of-thought examples |
## Capabilities
The model performs well on tasks that require reasoning, such as:
- Logical problem solving
- Mathematical reasoning
- Coding explanations
- Step-by-step analysis
- Instruction following
## ⚠️ Limitations
- The model may still hallucinate factual information.
- Performance is limited by the relatively small 0.8B parameter size.
- Best suited for experimentation, lightweight reasoning tasks, and research.
## Acknowledgements
- Qwen Team for the base model.
- The open-source community for providing reasoning datasets.
## Citation
```bibtex
@misc{ishant_qwen35_opus_reasoning,
  title        = {Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled},
  author       = {Ishant Dere},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ishant06/Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled}}
}
```