# Qwen3.5-4b-Opus-4.6-Reasoning-Distilled-v1
First attempt at improving the reasoning abilities of the Qwen3.5 4b model, trained on the open crownelius/Opus-4.6-Reasoning-3300x dataset.
## Evals
I didn't have the patience to run many evals, but vibes-wise the model felt a lot more "Opus"-like (in a good way) in its reasoning and responses; base Qwen3.5 4b tends to ramble. It seemed smarter overall. The one eval I did run was the latest LiveBench Reasoning benchmark, and here are the results:
| Model | Spatial | Zebra Puzzle | Reasoning Avg |
|---|---|---|---|
| qwen3.5-4b (Base) | 4.0 | 18.75 | 11.4 |
| Qwen3.5-4b-Opus-4.6-Reasoning-Distilled-v1 | 24.0 | 19.0 | 21.5 |
| Δ Improvement | +20.0 | +0.25 | +10.1 |
| % Improvement | +500% | +1.3% | +88.6% |
## Notes
For v2 of this model, I need to fix the thinking template. Because of how I templated the dataset, the model ALWAYS does reasoning, so I'm doing another training run with explicit think and no-think rows (hopefully that works).
Also, I don't know much about training models or ML, I'm a Software Engineer who uses a lot of AI. I just started, and this was pretty much the first real model I've ever trained, so please be nice!
## Training
I trained this on a single RTX 4060 Ti with 16GB of VRAM; training took around two to three hours.
## Acknowledgements
- crownelius for the cleaned dataset
- unsloth for the training arch
- pewdiepie for inspiring me to try training models (seriously, lol)