Qwen3-0.6B-Reasoning-Opus

This is a fine-tuned version of Qwen3-0.6B optimized for multi-step reasoning. It was trained using QLoRA on a filtered dataset of reasoning traces distilled from Claude 4.6 Opus. The goal of this project was to induce "System 2" (deliberate) thinking in a sub-1B parameter model.

Model Details

  • Developed by: Shreyansh Pathak
  • Institution: Dayananda Sagar College of Engineering (DSCE), Bangalore
  • Model type: Causal Language Model with Chain-of-Thought (CoT) capabilities.
  • Base Model: Qwen/Qwen3-0.6B
  • Language(s): English
  • Fine-tuning Technique: QLoRA (Unsloth)
  • Rank (r): 16
  • Alpha: 16
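In peft terms, the adapter settings above correspond to roughly the following configuration. The target modules, dropout, and task type are assumptions (common defaults for Qwen-style models); the card states only the rank and alpha:

```python
from peft import LoraConfig

# Sketch of the adapter config implied above (r=16, alpha=16).
# target_modules is an assumption (typical attention/MLP projections);
# the card does not say which modules were adapted.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```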

Performance & Evaluation

The model was evaluated against the base Qwen3-0.6B model on a 50-sample random subset of the GSM8K (Grade School Math) benchmark to measure logical consistency and arithmetic accuracy.

Model                        GSM8K Accuracy (n=50)   Improvement
Base Qwen3-0.6B              26.0%                   Baseline
Qwen3-0.6B-Reasoning-Opus    32.0%                   +6.0 points (absolute)

Key Findings

  • Reasoning Activation: The fine-tuned model successfully triggers a <think> block for complex queries, whereas the base model typically provides direct, often incorrect, answers.
  • Alignment Tax: While math accuracy increased, the model exhibits some "overthinking" on simple logic riddles, a common trade-off in small-parameter reasoning models.
  • Relative Gain: The fine-tuned model showed a ~23% relative improvement on GSM8K (from 26.0% to 32.0%) over the base model.
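The relative-gain figure follows directly from the two accuracy numbers in the evaluation table; the arithmetic can be checked in a few lines:

```python
base_acc = 0.26    # base Qwen3-0.6B on the 50-sample GSM8K subset
tuned_acc = 0.32   # fine-tuned model on the same subset

absolute_gain = tuned_acc - base_acc        # 0.06 -> +6.0 points
relative_gain = absolute_gain / base_acc    # ~0.231 -> ~23%
print(f"{relative_gain:.1%}")               # → 23.1%
```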

Training Procedure

  • Hardware: NVIDIA Tesla T4 (via Google Colab)
  • Optimizer: AdamW (8-bit)
  • Learning Rate: 2e-4
  • Batch Size: 2 (Gradient Accumulation: 4)
  • Training Steps: 60
  • Dataset: nohurry/Opus-4.6-Reasoning-3000x-filtered
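Assuming the standard Hugging Face Trainer interface that Unsloth wraps, the hyperparameters above translate to roughly this configuration (output_dir is a placeholder; this is a sketch, not the exact training script):

```python
from transformers import TrainingArguments

# Reconstructed from the card's listed hyperparameters.
args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size of 8
    learning_rate=2e-4,
    max_steps=60,
    optim="adamw_8bit",              # 8-bit AdamW (requires bitsandbytes)
    output_dir="outputs",            # placeholder
)
```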

Usage

This model uses the standard Qwen3 chat template but is optimized to generate reasoning traces inside <think> tags.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Shreyansh327/Qwen3-0.6B-Reasoning-Opus"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "If I have 10 oranges and give 3 to John and 2 to Mary, how many are left?"
messages = [{"role": "user", "content": prompt}]

# Build the Qwen3 chat prompt; the model emits its reasoning inside <think> tags.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
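Since the final answer follows the reasoning trace, it can be useful to separate the two. A minimal sketch (the split_reasoning helper is illustrative, not part of the model's API):

```python
import re

def split_reasoning(text):
    """Split a generation into its <think> trace and the final answer.

    Assumes at most one <think>...</think> block, as produced by this model.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>10 - 3 - 2 = 5</think>\nFive oranges are left.")
print(answer)  # → Five oranges are left.
```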