Srikri7/qwen3.5-2b-reasoning
Release Note: Build Environment Upgrades
- Fine-tuning Framework: Unsloth 2026.3.7
- Core Dependencies: Transformers 5.3.0, Torch 2.10.0+cu128
- Hardware: Optimized for Tesla T4 (16GB VRAM) using 4-bit NormalFloat (NF4) quantization.
- Developer Role: Native support for the "developer" chat role, ensuring compatibility with modern coding agents (Claude Code, OpenCode).
- Continuous Thinking: Optimized to run autonomously for over 9 minutes without stalling.
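A quick back-of-envelope calculation shows why NF4 quantization makes a 2B-parameter model comfortable on a 16GB T4. The sketch below assumes roughly 4.5 bits per weight (4-bit NF4 values plus block-wise scale overhead); actual usage also depends on activations and KV cache, which are excluded here.

```python
def nf4_weight_gib(n_params: float, bits_per_weight: float = 4.5) -> float:
    """Rough VRAM footprint of NF4-quantized weights, in GiB.

    Assumes 4-bit NF4 values plus ~0.5 bits/weight of block-wise
    scale overhead. Activations and KV cache are NOT included.
    """
    return n_params * bits_per_weight / 8 / 2**30

# A 2B-parameter model: weights alone come to roughly 1 GiB,
# leaving the bulk of the T4's 16 GB for activations and KV cache.
print(round(nf4_weight_gib(2e9), 2))  # -> 1.05
```

Even doubling this estimate for optimizer-free inference overhead leaves ample headroom on a 16GB card.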
Model Introduction
qwen3.5-2b-reasoning is a compact reasoning model fine-tuned from the Qwen3.5-2B architecture. Despite having only 2 billion parameters, it leverages high-density Chain-of-Thought (CoT) distillation sourced primarily from Claude-4.6 Opus trajectories.
The model is specifically trained to avoid the "repetitive loop" failure mode common in small models by enforcing a strict hierarchy of analytical steps within <think> tags.
Learned Reasoning Scaffold
The model adopts a streamlined structured thinking pattern to ensure deep analytical capacity without redundant cognitive loops:
```
<think>
1. [Understanding]: Restate the core objective and identify key numerical constraints (e.g., "252 students", "41-seater bus").
2. [Plan]: Identify necessary strategies or math rules (e.g., Product Rule, Rounding-up logic).
3. [Step-by-step Reasoning]: Execute transformations with intermediate justifications.
4. [Verification]: Cross-check the final result against the initial constraints.
</think>
[Final Answer]
```
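Downstream code typically needs to strip the reasoning block and keep only the final answer. A minimal parsing sketch, assuming the model emits exactly one `<think>...</think>` block followed by the answer (the helper name `split_reasoning` and the sample string are illustrative, not part of the model's API):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think> block from the final answer.

    Assumes a single <think>...</think> block followed by the
    answer text, per the scaffold described above.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        # No reasoning block found: treat everything as the answer.
        return "", text.strip()
    thinking = m.group(1).strip()
    answer = text[m.end():].strip()
    return thinking, answer

# Hypothetical output for the bus-seating example from the scaffold:
sample = (
    "<think>1. [Understanding]: 252 students, 41-seater buses.\n"
    "4. [Verification]: 7 * 41 = 287 >= 252.</think>\n"
    "7 buses"
)
thinking, answer = split_reasoning(sample)
print(answer)  # -> 7 buses
```

Using `re.DOTALL` lets the reasoning block span multiple lines, which the numbered scaffold always does.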