⚠️ CRITICAL: Ollama Inference Flag Required

If you serve this model via Ollama with the qwen3.5 renderer (the standard recommended setup), you MUST pass "think": false in the /api/chat request body for chat, instruction-following, and tool-use workloads.

curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "...", "think": false, "messages": [...], "stream": false}'

Without this flag, the renderer auto-injects <think> tags into every chat completion. On longer prompts the model can stay inside the <think> block past the response budget, never emit </think>, and produce zero answer tokens on 25-46% of requests.
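A quick client-side check for this failure mode: a completion that opened a <think> block but never closed it got stuck reasoning. A minimal sketch (the helper name is ours, not part of Ollama's API):

```python
def stuck_in_think(response_text: str) -> bool:
    """Return True when a completion opened a <think> block but never
    closed it, i.e. the reasoning ran past the response budget."""
    return "<think>" in response_text and "</think>" not in response_text

# A healthy thinking response closes its block before answering:
stuck_in_think("<think>plan...</think>The answer is 4.")  # False
# A runaway one never does:
stuck_in_think("<think>step 1... step 2... step 3")       # True
```

Retrying flagged requests with "think": false is a pragmatic mitigation when you cannot change the default server config.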

Set "think": true (or omit the field) only when you DO want chain-of-thought reasoning (math, planning, complex multi-step tasks). This is Qwen3's dual-mode operation; see https://qwenlm.github.io/blog/qwen3/.
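The same request can be issued from Python with only the standard library. A minimal sketch, assuming the default Ollama endpoint; the model tag in the usage comment is a placeholder, not this model's actual tag:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default Ollama endpoint

def build_chat_body(model, messages, think=False, stream=False):
    """Build the /api/chat JSON body; think=False disables <think> injection."""
    return json.dumps({
        "model": model,
        "think": think,        # False = plain chat; True = chain-of-thought
        "messages": messages,
        "stream": stream,
    }).encode("utf-8")

def chat(model, messages, think=False):
    """POST a non-streaming chat request and return the assistant's reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_chat_body(model, messages, think=think),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Usage (requires a running Ollama server; model tag is a placeholder):
# print(chat("qwen3.5-9b-reasoning-distilled",
#            [{"role": "user", "content": "List three prime numbers."}]))
```

Setting think explicitly per request, rather than relying on the server default, keeps behavior predictable if the renderer's default ever changes.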

See _OLLAMA_INFERENCE_WARNING.md in the cudabenchmarktest/r9-research-framework dataset for the full explanation.


Qwen3.5-9B Reasoning Distilled GGUF (R3 Crown)

GGUF-quantized version of a fine-tuned Qwen3.5-9B with distilled Opus 4.6 reasoning traces. Early iteration (R3), superseded by R7 (86.8% on the diverse eval).

Training

Note

This early iteration had regressions in instruction following due to monoculture training data. See the training suite for the improved R5/R7 approach.

Successors

License

Apache 2.0 (inherited from Qwen3.5-9B).

GGUF
Model size: 10B params
Architecture: qwen35


Model tree for cudabenchmarktest/qwen3.5-9b-qwen3.6-reasoning-distilled-GGUF

Finetuned from base model Qwen/Qwen3.5-9B.

Dataset used to train cudabenchmarktest/qwen3.5-9b-qwen3.6-reasoning-distilled-GGUF