Uploaded finetuned model
- Developed by: ykarout
- License: apache-2.0
This Qwen3.5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Qwen3.5-9B-Opus-OpenClaw-Distilled
Qwen3.5-9B-Opus-OpenClaw-Distilled is a reasoning-first, agentically-tuned derivative of Qwen3.5-9b, built to fuse two strengths into one model identity:
- OpenClaw & CoPaw’s operational / agentic instincts
- Claude Opus-style structured reasoning distillation
The goal is simple: take a strong base model --> tune it for agentic harnesses like OpenClaw and AgentScope --> distill Opus-4.6-level reasoning --> get the best of both worlds.
TL;DR
It is designed for users who want:
- preserved chat ability plus the strong agentic usefulness of the OpenClaw / CoPaw lineage
- a model that feels more “planner + operator” than just “chatbot”
Recommended sampling parameters:
temperature=0.6, top_p=0.95, min_p=0.0, top_k=20, repeat_penalty=1.0, presence_penalty=0.0
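As a convenience, the recommended settings can be bundled into a request payload for an OpenAI-compatible endpoint (llama.cpp server, Ollama's `/v1/chat/completions`, etc.). This is a hedged sketch: the model name is a placeholder, and `min_p`, `top_k`, and `repeat_penalty` are llama.cpp-style extensions rather than part of the strict OpenAI schema, so whether they are honored depends on your server.

```python
# Recommended sampling parameters from the model card.
SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "min_p": 0.0,
    "top_k": 20,
    "repeat_penalty": 1.0,
    "presence_penalty": 0.0,
}

def build_request(prompt: str, model: str = "qwen3.5-9b-opus-openclaw:Q6_K") -> dict:
    """Assemble a chat-completion payload carrying the recommended sampling settings."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        **SAMPLING,
    }
```

Send the returned dict as the JSON body of a POST to your server's chat-completions route.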
Special Instructions for Ollama Only
The GGUF that works correctly with Ollama is the vision-merged file, since Ollama only accepts a single GGUF as the FROM target in a Modelfile:
Qwen3.5-9b-Opus-Openclaw-Distilled-vision-merged-Q6_K.gguf
This file already has the multimodal weights merged into the main GGUF, so you should use it directly as the FROM target in your Modelfile.
1. Download the GGUF
If you want to download it manually from this repo:
huggingface-cli download ykarout/Qwen3.5-9b-Opus-Openclaw-Distilled-GGUF Qwen3.5-9b-Opus-Openclaw-Distilled-vision-merged-Q6_K.gguf --local-dir .
2. Create a Modelfile
Save the following as Modelfile in the same folder as the GGUF:
FROM ./Qwen3.5-9b-Opus-Openclaw-Distilled-vision-merged-Q6_K.gguf
TEMPLATE {{ .Prompt }}
RENDERER qwen3.5
PARSER qwen3.5
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER min_p 0.0
PARAMETER top_k 20
PARAMETER repeat_penalty 1.0
PARAMETER presence_penalty 0.0
3. Create the Ollama model
ollama create qwen3.5-9b-opus-openclaw:Q6_K -f Modelfile
4. Run it
ollama run qwen3.5-9b-opus-openclaw:Q6_K
Notes
- The vision-merged GGUF is for Ollama only.
- All other GGUFs work out of the box with LM Studio and llama.cpp, with both text and vision.
Why this model exists
CoPaw-Flash-9B is already a highly interesting member of the Qwen3.5-based model family, with explicit optimization for agentic behavior such as tool invocation, command execution, memory management, and multi-step planning. This model builds on that foundation instead of starting from a plain base model. The idea is to preserve that practical "gets things done" behavior while injecting denser and more structured reasoning traces through supervised fine-tuning.
At the same time, the inspiration for the reasoning side of this model comes from recent Qwen3.5 reasoning distillations trained with Opus-derived trajectories. In particular, models like Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled emphasize <think>-structured reasoning, response-only training, and normalized reasoning/answer formatting.
Model identity
The model is intended to sit at the intersection of:
- agentic chat utility
- structured reasoning
- practical local deployment
- Qwen3.5 ecosystem compatibility
The intended vibe is:
Qwen3.5 + OpenClaw reflexes + Opus-style reasoning scaffolds
In practice, that means the model is aimed at tasks like:
- analytical QA
- coding support
- workflow planning
- terminal / tool-oriented prompting
- multi-step decomposition
- logic-heavy conversations
- “think first, answer second” style interactions
Base model
- Base family: Qwen3.5-9B
- Immediate base: CoPaw-Flash-9B, a fine-tune that exhibits much stronger agentic capabilities in harnesses like OpenClaw and AgentScope
Training concept
This model was trained as a text-only reasoning SFT derivative focused on preserving and reinforcing the format:
<think>
...
</think>
final answer
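Consumers of this format often want to separate the reasoning trace from the final answer, e.g. to show only the answer in a UI. A minimal sketch, assuming the response follows the `<think> ... </think>` + answer layout above (the function name is illustrative, not part of any library):

```python
import re

# Matches the <think>...</think> reasoning block, including newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is present."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```

For example, `split_reasoning("<think>\nstep 1\n</think>\n42")` yields `("step 1", "42")`.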
The overall training philosophy is aligned with reasoning-distillation approaches that emphasize:
- response-only loss masking
- explicit <think> formatting
- structured step-by-step reasoning before the final answer
Datasets
The training recipe is centered on a unified reasoning dataset built from:
- Roman1111111/claude-opus-4.6-10000x
- Crownelius/Opus-4.6-Reasoning-3300x
These were normalized into a single conversational SFT dataset:
ykarout/Opus-4.6-reasoning-sft-12k
Dataset processing highlights
The dataset was cleaned and unified so both sources follow the same final structure:
- messages-based conversational schema
- assistant output normalized into <think> ... </think> followed by the final answer
- generic repeated system prompts removed where appropriate
- token-length profile measured after rendering with the target tokenizer chat template
This makes the training corpus more consistent and more directly usable in TRL / Unsloth conversational SFT pipelines.
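The unification step can be sketched as a small record-level transform. This is a hedged illustration: the source field names (`prompt`, `reasoning`, `response`) are assumptions, and the real datasets may use different schemas.

```python
def normalize_record(record: dict) -> dict:
    """Convert one source record into a messages-based SFT example whose
    assistant turn follows the <think> ... </think> + final-answer format."""
    assistant = (
        f"<think>\n{record['reasoning'].strip()}\n</think>\n"
        f"{record['response'].strip()}"
    )
    return {
        "messages": [
            {"role": "user", "content": record["prompt"]},
            {"role": "assistant", "content": assistant},
        ]
    }
```

Mapping a function like this over both source datasets and concatenating the results yields a single corpus in one conversational schema, ready for chat-template rendering.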
Training recipe
High-level recipe:
- Framework: Unsloth
- Method: LoRA SFT
- Objective: improve structured reasoning while retaining CoPaw-style usefulness
- Loss behavior: train on assistant responses / completions only
- Format target: explicit <think> reasoning followed by the answer
- Text-only setup: no vision-layer fine-tuning path used
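The "train on assistant responses only" point can be illustrated with a toy label-masking function. This is a hedged sketch, not the actual training code: real pipelines (e.g. TRL's response-only training helpers) derive the assistant-token mask from the chat template, whereas here the mask is supplied directly, and -100 is the conventional ignore index for cross-entropy loss in common frameworks.

```python
IGNORE_INDEX = -100  # tokens with this label contribute no loss

def mask_labels(input_ids: list[int], assistant_mask: list[bool]) -> list[int]:
    """Build training labels that keep only assistant-response tokens;
    everything else (system/user/prompt tokens) is masked out."""
    return [
        tok if is_assistant else IGNORE_INDEX
        for tok, is_assistant in zip(input_ids, assistant_mask)
    ]
```

With this masking, gradient updates come only from the model's own completions, which is what keeps the prompt distribution from leaking into the loss.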
What changed versus CoPaw-Flash-9B
This is not presented as a replacement for CoPaw-Flash-9B’s original design goals.
Instead, it pushes the model further toward:
- more explicit reasoning traces
- more deliberate planning language
- cleaner internal decomposition on complex tasks
- stronger “reason-then-answer” behavior
Intended use
Model is best suited for:
- agentic harnesses like OpenClaw, Claude Code, OpenCode, AgentScope, etc.
- deep analytical prompting
- code and debugging assistance
- local agent workflows
- logic / math / structured breakdown tasks
It is a particularly natural fit for prompts where you want the model to:
- parse the task carefully
- build a plan
- utilize different tools
- then produce a clean answer or action
Limitations
- This is still an autoregressive language model and can hallucinate.
- Strong reasoning style does not guarantee factual correctness.
- More visible reasoning can sometimes increase verbosity.
- Distillation can improve structure without perfectly reproducing frontier-model judgment.
- Depending on the prompt mix, some behaviors may lean more “reasoning-first” than “tool-first.”
Acknowledgements
Huge credit goes to the upstream work that made this possible:
- agentscope-ai/CoPaw-Flash-9B
- Roman1111111/claude-opus-4.6-10000x
- Crownelius/Opus-4.6-Reasoning-3300x
- Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
- the broader Qwen / Unsloth ecosystem
Citation / lineage notes
If you use this model, please also acknowledge the upstream projects and datasets it builds on.
Qwen3.5-9B-Opus-OpenClaw-Distilled is for people who want a model that doesn’t just answer — it locks in, thinks cleanly, and then strikes.