Uploaded finetuned model

  • Developed by: ykarout
  • License: apache-2.0

This qwen3_5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Qwen3.5-9B-Opus-OpenClaw-Distilled

Qwen3.5-9B-Opus-OpenClaw-Distilled is a reasoning-first, agentically-tuned derivative of Qwen3.5-9b, built to fuse two strengths into one model identity:

  • OpenClaw & CoPaw’s operational / agentic instincts
  • Claude Opus-style structured reasoning distillation

The goal is simple: take a strong base model --> tune it for agentic harnesses like OpenClaw and AgentScope --> distill Opus-4.6-level reasoning on top --> get the best of both worlds.

TL;DR

It is designed for users who want:

  • preserved chat + strong agentic usefulness from the Openclaw / CoPaw lineage
  • a model that feels more “planner + operator” than just “chatbot”

Recommended sampling parameters:

  • temperature=0.6
  • top_p=0.95
  • min_p=0.0
  • top_k=20
  • repeat_penalty=1.0
  • presence_penalty=0.0
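The recommended sampling parameters above map directly onto a request payload for an OpenAI-compatible local server (llama.cpp's `llama-server` or Ollama both expose one). A minimal sketch — the model tag and prompt are placeholders, and note that `min_p` is a llama.cpp-style extension, not part of the OpenAI spec:

```python
import json

# Recommended sampling parameters from this card, expressed as a chat-completions
# payload for an OpenAI-compatible local endpoint. Model tag is a placeholder.
payload = {
    "model": "qwen3.5-9b-opus-openclaw:Q6_K",
    "messages": [{"role": "user", "content": "Plan a refactor in three steps."}],
    "temperature": 0.6,
    "top_p": 0.95,
    "min_p": 0.0,          # llama.cpp-style extension, not in the OpenAI spec
    "top_k": 20,
    "repeat_penalty": 1.0,
    "presence_penalty": 0.0,
}
print(json.dumps(payload, indent=2))
```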

Special Instructions for Ollama Only

The GGUF that works correctly with Ollama is the vision-merged file, as Ollama only accepts a single GGUF when loading from a Modelfile:

  • Qwen3.5-9b-Opus-Openclaw-Distilled-vision-merged-Q6_K.gguf

This file already has the multimodal weights merged into the main GGUF, so you should use it directly as the FROM target in your Modelfile.

1. Download the GGUF

If you want to download it manually from this repo:

huggingface-cli download ykarout/Qwen3.5-9b-Opus-Openclaw-Distilled-GGUF Qwen3.5-9b-Opus-Openclaw-Distilled-vision-merged-Q6_K.gguf --local-dir .

2. Create a Modelfile

Save the following as Modelfile in the same folder as the GGUF:

FROM ./Qwen3.5-9b-Opus-Openclaw-Distilled-vision-merged-Q6_K.gguf
TEMPLATE {{ .Prompt }}
RENDERER qwen3.5
PARSER qwen3.5
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER min_p 0.0
PARAMETER top_k 20
PARAMETER repeat_penalty 1.0
PARAMETER presence_penalty 0.0

3. Create the Ollama model

ollama create qwen3.5-9b-opus-openclaw:Q6_K -f Modelfile

4. Run it

ollama run qwen3.5-9b-opus-openclaw:Q6_K

Notes

  • The vision-merged GGUF is needed for Ollama only.
  • All other GGUFs work out of the box with LM Studio and llama.cpp, with both text and vision.

Why this model exists

CoPaw-Flash-9B is already a highly interesting Qwen3.5-based model family member with explicit optimization for agentic behavior such as tool invocation, command execution, memory management, and multi-step planning. Opus builds on top of that foundation instead of starting from a plain base model. The idea is to preserve that practical “gets things done” behavior while injecting denser and more structured reasoning traces through supervised fine-tuning.

At the same time, the inspiration for the reasoning side of this model comes from recent Qwen3.5 reasoning distillations trained with Opus-derived trajectories. In particular, models like Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled emphasize <think>-structured reasoning, response-only training, and normalized reasoning/answer formatting.

Model identity

The model is intended to sit at the intersection of:

  • agentic chat utility
  • structured reasoning
  • practical local deployment
  • Qwen3.5 ecosystem compatibility

The intended vibe is:

Qwen3.5 + OpenClaw reflexes + Opus-style reasoning scaffolds

In practice, that means the model is aimed at tasks like:

  • analytical QA
  • coding support
  • workflow planning
  • terminal / tool-oriented prompting
  • multi-step decomposition
  • logic-heavy conversations
  • “think first, answer second” style interactions

Base model

  • Base family: Qwen3.5-9b
  • Immediate base: CoPaw-Flash-9B --> a fine-tune that exhibits much higher agentic capability in harnesses like OpenClaw and AgentScope

Training concept

This model was trained as a text-only reasoning SFT derivative focused on preserving and reinforcing the format:

<think>
...
</think>

final answer
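Any consumer of this format needs to separate the reasoning trace from the final answer. A minimal sketch (the function name is chosen here for illustration):

```python
import re

def split_think(text: str) -> tuple[str, str]:
    """Split a '<think>...</think> answer' completion into (reasoning, answer).

    Falls back to an empty reasoning string when no <think> block is present.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

reasoning, answer = split_think("<think>\n2+2 is 4\n</think>\n\nThe answer is 4.")
```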

The overall training philosophy is aligned with reasoning-distillation approaches that emphasize:

  • response-only loss masking
  • explicit <think> formatting
  • structured step-by-step reasoning before the final answer
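Response-only loss masking means prompt tokens contribute no gradient: their label positions are set to the ignore index (conventionally -100, which cross-entropy implementations skip) while assistant tokens keep their IDs. A toy sketch on made-up token IDs — real pipelines like TRL/Unsloth do this per chat-template segment, not by a flat prefix length:

```python
# Response-only masking on toy token IDs: the first prompt_len positions are
# masked with -100 (ignored by the loss), assistant tokens are trained on.
IGNORE_INDEX = -100

def mask_prompt(input_ids: list[int], prompt_len: int) -> list[int]:
    """Return labels with the prompt prefix masked out of the loss."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]

labels = mask_prompt([11, 12, 13, 21, 22], prompt_len=3)
```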

Datasets

The training recipe is centered on a unified reasoning dataset built from:

  • Roman1111111/claude-opus-4.6-10000x
  • Crownelius/Opus-4.6-Reasoning-3300x

These were normalized into a single conversational SFT dataset:

  • ykarout/Opus-4.6-reasoning-sft-12k

Dataset processing highlights

The dataset was cleaned and unified so both sources follow the same final structure:

  • messages-based conversational schema
  • assistant output normalized into:
    • <think> ... </think>
    • followed by the final answer
  • generic repeated system prompts removed where appropriate
  • token-length profile measured after rendering with the target tokenizer chat template

This makes the training corpus more consistent and more directly usable in TRL / Unsloth conversational SFT pipelines.

Training recipe

High-level recipe:

  • Framework: Unsloth
  • Method: LoRA SFT
  • Objective: improve structured reasoning while retaining CoPaw-style usefulness
  • Loss behavior: train on assistant responses / completions only
  • Format target: explicit <think> reasoning followed by answer
  • Text-only setup: no vision-layer fine-tuning path used
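The LoRA part of the recipe is, at its core, a low-rank additive delta on frozen weights: W' = W + (alpha / r) * (B @ A), where only the small factors A and B are trained. A toy numeric sketch with 2x2 matrices and rank r=1 (pure Python, no framework assumed):

```python
# LoRA update on a frozen weight: W' = W + (alpha / r) * (B @ A).
# A is (r x d_in), B is (d_out x r); only A and B receive gradients in training.
def matmul(X, Y):
    """Naive matrix product for small nested-list matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2x2 identity, toy)
A = [[1.0, 2.0]]               # r=1, d_in=2
B = [[0.5], [0.25]]            # d_out=2, r=1
alpha, r = 2.0, 1

delta = matmul(B, A)           # rank-1 matrix: [[0.5, 1.0], [0.25, 0.5]]
W_prime = [[W[i][j] + (alpha / r) * delta[i][j] for j in range(2)]
           for i in range(2)]
```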

What changed versus CoPaw-Flash-9B

This is not presented as a replacement for CoPaw-Flash-9B’s original design goals.

Instead, it pushes the model further toward:

  • more explicit reasoning traces
  • more deliberate planning language
  • cleaner internal decomposition on complex tasks
  • stronger “reason-then-answer” behavior

Intended use

The model is best suited for:

  • agentic harnesses such as OpenClaw, Claude Code, OpenCode, and AgentScope
  • deep analytical prompting
  • code and debugging assistance
  • local agent workflows
  • logic / math / structured breakdown tasks

It is a particularly natural fit for prompts where you want the model to:

  1. parse the task carefully
  2. build a plan
  3. utilize different tools
  4. then produce a clean answer or action
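The parse -> plan -> tool -> answer loop above can be sketched as a toy dispatcher. The tool registry and the "plan" are stand-ins (no real LLM call involved); a real harness like OpenClaw or AgentScope would produce the plan from model output:

```python
# Toy "plan -> use tools -> clean answer" loop. TOOLS and the hard-coded plan
# are illustrative stand-ins for what an agent harness derives from the model.
TOOLS = {"add": lambda a, b: a + b}

def run_step(action: dict):
    """Dispatch one planned action of the form {'tool': name, 'args': [...]}."""
    return TOOLS[action["tool"]](*action["args"])

plan = [{"tool": "add", "args": [19, 23]}]        # step 2: build a plan
results = [run_step(step) for step in plan]        # step 3: utilize tools
answer = f"The result is {results[0]}."            # step 4: clean final answer
```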

Limitations

  • This is still an autoregressive language model and can hallucinate.
  • Strong reasoning style does not guarantee factual correctness.
  • More visible reasoning can sometimes increase verbosity.
  • Distillation can improve structure without perfectly reproducing frontier-model judgment.
  • Depending on the prompt mix, some behaviors may lean more “reasoning-first” than “tool-first.”

Acknowledgements

Huge credit goes to the upstream work that made this possible:

  • agentscope-ai/CoPaw-Flash-9B
  • Roman1111111/claude-opus-4.6-10000x
  • Crownelius/Opus-4.6-Reasoning-3300x
  • Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
  • the broader Qwen / Unsloth ecosystem

Citation / lineage notes

If you use this model, please also acknowledge the upstream projects and datasets it builds on.

Qwen3.5-9B-Opus-OpenClaw-Distilled is for people who want a model that doesn’t just answer — it locks in, thinks cleanly, and then strikes.
