⚠️ CRITICAL: Ollama Inference Flag Required

If you serve this model via Ollama with the qwen3.5 renderer (the standard recommended setup), you MUST pass "think": false in the /api/chat request body for chat, instruction-following, and tool-use workloads.

curl -X POST http://localhost:11434/api/chat \
  -d '{"model": "...", "think": false, "messages": [...], "stream": false}'

Without this flag, the renderer auto-injects <think> tags into every chat completion. On longer prompts the model can stay inside the <think> block past the response budget, never emit </think>, and produce zero answer tokens on 25-46% of requests.

Set think: true (or omit the field) only when you DO want chain-of-thought reasoning (math, planning, complex multi-step tasks). This is Qwen3's dual-mode operation; see https://qwenlm.github.io/blog/qwen3/.
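The curl call above can also be built from Python. A minimal sketch, assuming a local Ollama server on the default port; the helper name and its defaults are ours, not part of the Ollama API:

```python
import json

def build_chat_payload(model, messages, think=False, stream=False):
    """Construct an Ollama /api/chat request body with thinking disabled
    by default (hypothetical helper for illustration). POST the resulting
    JSON to http://localhost:11434/api/chat with your HTTP client of choice.
    """
    return {"model": model, "think": think, "messages": messages, "stream": stream}

payload = build_chat_payload(
    "qwen3.5-9b-r7-research",
    [{"role": "user", "content": "What is the capital of France?"}],
)
body = json.dumps(payload)  # send this string as the POST body
```

Defaulting think to False means a caller has to opt in to reasoning mode explicitly, which avoids the truncated-<think> failure described above.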

See _OLLAMA_INFERENCE_WARNING.md in the cudabenchmarktest/r9-research-framework dataset for the full explanation.


Qwen3.5-9B R7 Research (FP16)

Fine-tuned Qwen3.5-9B with distilled reasoning from research-backed datasets. Text-only model (no vision encoder). For the vision-capable version, see cudabenchmarktest/qwen3.5-9b-r7-research-vision.

This is the pre-GGUF FP16 safetensors checkpoint. For quantized GGUF versions ready for Ollama, see robit/qwen3.5-9b-r7-research on Ollama.

Capabilities

  • Thinking — produces structured reasoning in <think> blocks
  • Tool calling — structured function calls when given tool definitions
  • Instruction following — concise answers, format constraints, system prompt adherence
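When thinking mode is on, downstream code typically wants the reasoning separated from the final answer. A minimal post-processing sketch, using our own helper (not part of this model's tooling) and assuming the standard <think>...</think> framing; note that an unclosed <think> block, the truncation failure mode described in the warning above, yields an empty answer:

```python
import re

def split_thinking(text):
    """Split raw model output into (reasoning, answer).

    Reasoning is the body of a leading <think>...</think> block, if present;
    the answer is everything after it. An unclosed <think> block (truncated
    generation) leaves the answer empty.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    stripped = text.lstrip()
    if stripped.startswith("<think>"):  # block never closed: no answer tokens
        return stripped[len("<think>"):].strip(), ""
    return "", text.strip()

reasoning, answer = split_thinking("<think>Paris is the capital.</think>Paris.")
```

Checking for an empty answer after this split is a cheap way to detect and retry the truncated-reasoning requests server-side.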

Eval Results

Benchmark                                          Score
Diverse stochastic eval (38 tests, 9 categories)   86.8%
Base qwen3.5:9b on same eval                       79.0%

Training Details

Training Suite

Full training pipeline, evaluation scripts, and documentation: robit-man/fine_tuning_suite

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint; torch_dtype="auto" uses the dtype stored in the weights.
model = AutoModelForCausalLM.from_pretrained(
    "cudabenchmarktest/qwen3.5-9b-r7-research",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("cudabenchmarktest/qwen3.5-9b-r7-research")

messages = [{"role": "user", "content": "What is the capital of France?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

License

Apache 2.0 (inherited from Qwen3.5-9B). Training data licenses vary by source.
