🌟 Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled


📒 Release Notes & Build Environment:

  • Training Method: Supervised Fine-Tuning (SFT)
  • Base Model: Qwen3.5-0.8B
  • Training Libraries: Hugging Face Transformers + TRL



💡 Model Introduction

Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled is a reasoning-focused model fine-tuned from Qwen3.5-0.8B using structured reasoning traces derived from Claude-4.6 Opus style reasoning datasets.

The model is trained to produce structured chain-of-thought reasoning, enabling it to:

  • Break complex problems into logical steps
  • Produce internal reasoning inside <think> blocks
  • Deliver accurate final answers after reasoning

The training dataset contains curated reasoning examples designed to teach the model step-by-step analytical thinking.
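Because the model emits its reasoning inside <think> tags, downstream code usually needs to separate the trace from the final answer. A minimal sketch of such a parser (the helper name and regex are illustrative, not part of the model's API):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a model response into (reasoning, final_answer).

    Assumes at most one <think>...</think> block; if no block is
    present, the whole response is treated as the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    final_answer = response[match.end():].strip()
    return reasoning, final_answer
```

This keeps the chain-of-thought available for inspection while letting an application display only the final answer.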


🚀 How to Run the Model

You can run the model using the transformers library.

1️⃣ Install Dependencies

pip install transformers torch accelerate

2️⃣ Run the Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Ishant06/Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "Explain why the sky is blue."

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": prompt}
]

# Apply chat template (important for Qwen models)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,  # required for temperature/top_p to take effect
        temperature=0.7,
        top_p=0.9
    )

# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True
)

print(response)

Tested with:

  • transformers >= 4.40
  • torch >= 2.0

🧠 Example Reasoning Pattern

The model follows a structured reasoning scaffold such as:

Let me analyze this request carefully:

1. Understand the problem.
2. Break it into smaller steps.
3. Analyze each step logically.
4. Combine the reasoning.
5. Produce the final answer.
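One way to encourage this scaffold at inference time is to spell the steps out in the system message. A hedged sketch (the prompt wording below is illustrative, not the prompt used during training):

```python
# Illustrative system prompt restating the five-step scaffold.
REASONING_SYSTEM_PROMPT = (
    "You are a helpful AI assistant. Before answering, reason step by step: "
    "1) understand the problem, 2) break it into smaller steps, "
    "3) analyze each step logically, 4) combine the reasoning, "
    "5) produce the final answer. Put your reasoning inside <think> tags."
)

def build_messages(question: str) -> list[dict]:
    """Build a chat-format message list that nudges the model
    toward the numbered reasoning scaffold above."""
    return [
        {"role": "system", "content": REASONING_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
```

The resulting list can be passed directly to `tokenizer.apply_chat_template` as in the run example above.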

πŸ—ΊοΈ Training Pipeline Overview

Base Model (Qwen3.5-0.8B)
 │
 ▼
Supervised Fine-Tuning (SFT)
 │
 ▼
Final Model (Claude-4.6-Opus-Reasoning-Distilled)

📋 Training Details

🔹 Supervised Fine-Tuning (SFT)

Framework: Hugging Face Transformers + TRL

Training Strategy: Instruction → Response SFT

Goal: Teach the model structured reasoning and step-by-step problem solving.

Format Used During Training:

<think>
internal reasoning
</think>
final answer
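A minimal sketch of how a training example might be rendered into this target format (the field names and prompt/completion split are assumptions about the dataset schema, e.g. for TRL's SFTTrainer; the exact columns used for this model are not documented):

```python
def format_sft_example(instruction: str, reasoning: str, answer: str) -> dict:
    """Render one training example into the <think>-block target format:
    the reasoning trace wrapped in <think> tags, followed by the answer."""
    completion = f"<think>\n{reasoning}\n</think>\n{answer}"
    return {"prompt": instruction, "completion": completion}
```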

📚 Dataset Used

Dataset: crownelius/Opus-4.6-Reasoning-3300x
Description: Claude-4.6 Opus style reasoning dataset containing structured chain-of-thought examples

🌟 Capabilities

The model performs well on tasks that require reasoning, such as:

  • Logical problem solving
  • Mathematical reasoning
  • Coding explanations
  • Step-by-step analysis
  • Instruction following

⚠️ Limitations

  • The model may still hallucinate factual information.
  • Performance is limited by the relatively small 0.8B parameter size.
  • Best suited for experimentation, lightweight reasoning tasks, and research.

πŸ™ Acknowledgements

  • Qwen Team for the base model.
  • The open-source community for providing reasoning datasets.

📖 Citation

@misc{ishant_qwen35_opus_reasoning,
  title        = {Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled},
  author       = {Ishant Dere},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ishant06/Qwen3.5-0.8B-Claude-4.6-Opus-Reasoning-Distilled}}
}