## ⚠️ CRITICAL: Ollama Inference Flag Required

If you serve this model via Ollama with the qwen3.5 renderer (the standard recommended setup), you MUST pass `"think": false` in the `/api/chat` request body for chat, instruction following, and tool use:

```shell
curl -X POST http://localhost:11434/api/chat \
  -d '{"model": "...", "think": false, "messages": [...], "stream": false}'
```

Without this flag, the renderer auto-injects `<think>` tags into every chat completion. On longer prompts the model can stay inside the `<think>` block past the response budget, never emit `</think>`, and produce zero answer tokens on 25-46% of requests.

Set `think: true` (or omit the field) only when you DO want chain-of-thought reasoning (math, planning, complex multi-step tasks). This is Qwen3 dual-mode operation per https://qwenlm.github.io/blog/qwen3/.

See the dataset `cudabenchmarktest/r9-research-framework_OLLAMA_INFERENCE_WARNING.md` for the full explanation.
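For programmatic clients, the same request can be sketched with the Python standard library. This is a minimal illustration, not an official client: the model name is a placeholder, and the response shape follows the general Ollama `/api/chat` JSON format.

```python
import json
import urllib.request

def build_chat_request(model, messages, think=False):
    """Build an Ollama /api/chat request body with thinking disabled by default."""
    payload = {
        "model": model,
        "think": think,        # False suppresses auto-injected <think> blocks
        "messages": messages,
        "stream": False,
    }
    return json.dumps(payload).encode("utf-8")

def chat(model, messages, host="http://localhost:11434"):
    """POST the request to a locally running Ollama server and return the parsed JSON."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=build_chat_request(model, messages),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The key point is that `think` sits at the top level of the request body, alongside `model` and `messages`, not inside an options object.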
# Qwen3.5-9B Reasoning Distilled GGUF (R3 Crown)
GGUF quantized version of fine-tuned Qwen3.5-9B with distilled Opus 4.6 reasoning traces. Early iteration (R3) — superseded by R7 (86.8% diverse eval).
## Training
- Base model: Qwen/Qwen3.5-9B
- Method: LoRA SFT (r=32, alpha=64, LR=2e-4)
- Data: Crownelius/Opus-4.6-Reasoning-3300x (2160 samples)
- Training suite: robit-man/fine_tuning_suite
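The LoRA hyperparameters above can be expressed as a `peft` configuration. This is a hedged sketch: only `r=32`, `lora_alpha=64`, and the 2e-4 learning rate come from the card; `target_modules`, dropout, and the task type are illustrative assumptions.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                      # LoRA rank, per the card
    lora_alpha=64,             # scaling alpha, per the card
    lora_dropout=0.05,         # assumption: not stated on the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

# The trainer's learning rate (per the card) would be set separately, e.g.
# TrainingArguments(learning_rate=2e-4, ...)
```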
## Note
This early iteration regressed on instruction following because its training data was a monoculture (reasoning traces only). See the training suite for the improved R5/R7 approach, which mixed in diverse instruction data.
## Successors
| Model | Eval | Link |
|---|---|---|
| R7 Research | 86.8% | cudabenchmarktest/qwen3.5-9b-r7-research |
| R7 Vision | 86.8% | cudabenchmarktest/qwen3.5-9b-r7-research-vision |
## License
Apache 2.0 (inherited from Qwen3.5-9B).