Uploaded finetuned model

  • Developed by: ykarout
  • License: apache-2.0

This Qwen3.5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Qwen3.5-9B-Opus-OpenClaw-Distilled

Qwen3.5-9B-Opus-OpenClaw-Distilled is a reasoning-first, agentically-tuned derivative of Qwen3.5-9b, built to fuse two strengths into one model identity:

  • OpenClaw & CoPaw’s operational / agentic instincts
  • Claude Opus-style structured reasoning distillation

The goal is simple: start from a strong base model --> tune it for agentic harnesses like OpenClaw and AgentScope --> distill Opus-4.6-level reasoning --> get the best of both worlds.

GGUF quantizations are available at ykarout/Qwen3.5-9b-Opus-Openclaw-Distilled-GGUF.

TL;DR

It is designed for users who want:

  • preserved chat quality + strong agentic usefulness from the OpenClaw / CoPaw lineage
  • a model that feels more “planner + operator” than just “chatbot”

Why this model exists

CoPaw-Flash-9B is already a highly interesting member of the Qwen3.5-based model family, with explicit optimization for agentic behavior such as tool invocation, command execution, memory management, and multi-step planning. This model builds on top of that foundation instead of starting from a plain base model. The idea is to preserve that practical “gets things done” behavior while injecting denser and more structured reasoning traces through supervised fine-tuning.

At the same time, the inspiration for the reasoning side of this model comes from recent Qwen3.5 reasoning distillations trained with Opus-derived trajectories. In particular, models like Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled emphasize <think>-structured reasoning, response-only training, and normalized reasoning/answer formatting.

Model identity

The model is intended to sit at the intersection of:

  • agentic chat utility
  • structured reasoning
  • practical local deployment
  • Qwen3.5 ecosystem compatibility

The intended vibe is:

Qwen3.5 + OpenClaw reflexes + Opus-style reasoning scaffolds

In practice, that means the model is aimed at tasks like:

  • analytical QA
  • coding support
  • workflow planning
  • terminal / tool-oriented prompting
  • multi-step decomposition
  • logic-heavy conversations
  • “think first, answer second” style interactions

Base model

  • Base family: Qwen3.5-9b
  • Immediate base: CoPaw-Flash-9B --> a fine-tune that exhibits much stronger agentic capabilities in harnesses like OpenClaw and AgentScope

Training concept

This model was trained as a text-only reasoning SFT derivative focused on preserving and reinforcing the format:

<think>
...
</think>

final answer

The overall training philosophy is aligned with reasoning-distillation approaches that emphasize:

  • response-only loss masking
  • explicit <think> formatting
  • structured step-by-step reasoning before the final answer
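
As an illustration of the response-only loss masking listed above, the sketch below (toy token IDs, not the actual training code) sets prompt-token labels to -100 so that the cross-entropy loss, and therefore the gradient, comes only from the assistant completion:

```python
IGNORE_INDEX = -100  # label value that cross-entropy loss implementations skip

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking out the first prompt_len tokens."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Toy sequence: 4 prompt tokens followed by 3 completion tokens.
ids = [101, 2023, 2003, 1996, 7592, 2088, 102]
print(mask_prompt_labels(ids, prompt_len=4))
# [-100, -100, -100, -100, 7592, 2088, 102]
```

In TRL/Unsloth pipelines this masking is normally handled by the framework; the sketch only shows what "train on responses only" means at the label level.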

Datasets

The training recipe is centered on a unified reasoning dataset built from:

  • Roman1111111/claude-opus-4.6-10000x
  • Crownelius/Opus-4.6-Reasoning-3300x

These were normalized into a single conversational SFT dataset:

  • ykarout/Opus-4.6-reasoning-sft-12k

Dataset processing highlights

The dataset was cleaned and unified so both sources follow the same final structure:

  • messages-based conversational schema
  • assistant output normalized into:
    • <think> ... </think>
    • followed by the final answer
  • generic repeated system prompts removed where appropriate
  • token-length profile measured after rendering with the target tokenizer chat template

This makes the training corpus more consistent and more directly usable in TRL / Unsloth conversational SFT pipelines.
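
A minimal sketch of that normalization step (function and field names here are illustrative, not the actual processing script): each raw pair is coerced into the messages schema, with the assistant turn rewritten into `<think> ... </think>` followed by the final answer.

```python
import re

# Capture the reasoning block and whatever follows it as the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)

def normalize_example(prompt, raw_response):
    """Coerce one (prompt, response) pair into the messages schema with the
    assistant turn normalized to <think>...</think> + final answer."""
    match = THINK_RE.search(raw_response.strip())
    if match:
        reasoning, answer = match.group(1).strip(), match.group(2).strip()
    else:
        reasoning, answer = "", raw_response.strip()
    assistant = f"<think>\n{reasoning}\n</think>\n\n{answer}"
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": assistant},
    ]}

example = normalize_example("What is 2+2?", "<think>Add.</think> 4")
print(example["messages"][1]["content"])
```

Token-length profiling would then be done on these examples after rendering them with the target tokenizer's chat template, as described above.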

Training recipe

High-level recipe:

  • Framework: Unsloth
  • Method: LoRA SFT
  • Objective: improve structured reasoning while retaining CoPaw-style usefulness
  • Loss behavior: train on assistant responses / completions only
  • Format target: explicit <think> reasoning followed by answer
  • Text-only setup: no vision-layer fine-tuning path used
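
For intuition on the LoRA method named in the recipe: instead of updating the full weight matrix W, LoRA learns a low-rank pair (A, B) and applies W + (alpha/r) * B @ A at forward time. A toy pure-Python sketch of that idea (illustrative only, not the training code):

```python
def lora_forward(W, A, B, x, alpha, r):
    """Compute (W + (alpha / r) * B @ A) @ x on small nested lists.
    W: frozen d_out x d_in weight; A: r x d_in; B: d_out x r."""
    def matvec(M, v):
        return [sum(m * val for m, val in zip(row, v)) for row in M]
    base = matvec(W, x)                 # frozen path
    delta = matvec(B, matvec(A, x))     # low-rank trainable path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy 2x2 example with rank r = 1 (identity base weight).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]            # r x d_in
B = [[0.5], [0.5]]          # d_out x r
print(lora_forward(W, A, B, [2.0, 3.0], alpha=1.0, r=1))
# [4.5, 5.5]
```

Only A and B are trained, which is why LoRA SFT is practical on a 9B model with modest hardware.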

What changed versus CoPaw-Flash-9B

This is not presented as a replacement for CoPaw-Flash-9B’s original design goals.

Instead, it pushes the model further toward:

  • more explicit reasoning traces
  • more deliberate planning language
  • cleaner internal decomposition on complex tasks
  • stronger “reason-then-answer” behavior

Intended use

The model is best suited for:

  • agentic harnesses such as OpenClaw, Claude Code, OpenCode, and AgentScope
  • deep analytical prompting
  • code and debugging assistance
  • local agent workflows
  • logic / math / structured breakdown tasks

It is a particularly natural fit for prompts where you want the model to:

  1. parse the task carefully
  2. build a plan
  3. utilize different tools
  4. then produce a clean answer or action
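
The four steps above can be sketched as a minimal reason-then-act loop. Everything here is hypothetical scaffolding: `call_model` is a stub standing in for a real inference backend, and the `TOOL:` / `FINAL:` conventions are illustrative, not the model's actual harness protocol.

```python
def call_model(prompt):
    # Stub standing in for real inference; a deployment would query the model.
    if "[tool" in prompt:
        return "<think>\nThe tool result answers the task.\n</think>\nFINAL:42"
    return "<think>\nPlan: delegate arithmetic to the calc tool.\n</think>\nTOOL:calc:6*7"

# A single illustrative tool: a sandboxed arithmetic evaluator.
TOOLS = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def run_agent(task, max_steps=4):
    transcript = task
    for _ in range(max_steps):
        reply = call_model(transcript)
        # Discard the <think> block; act on what follows it.
        action = reply.split("</think>", 1)[-1].strip()
        if action.startswith("TOOL:"):
            _, name, arg = action.split(":", 2)
            transcript += f"\n[tool {name} -> {TOOLS[name](arg)}]"
        elif action.startswith("FINAL:"):
            return action[len("FINAL:"):]
    return None

print(run_agent("What is 6 * 7?"))  # 42
```

Real harnesses like OpenClaw or AgentScope implement this loop with proper tool schemas and sandboxing; the sketch only shows the parse-plan-tool-answer shape.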

Limitations

  • This is still an autoregressive language model and can hallucinate.
  • Strong reasoning style does not guarantee factual correctness.
  • More visible reasoning can sometimes increase verbosity.
  • Distillation can improve structure without perfectly reproducing frontier-model judgment.
  • Depending on the prompt mix, some behaviors may lean more “reasoning-first” than “tool-first.”

Acknowledgements

Huge credit goes to the upstream work that made this possible:

  • agentscope-ai/CoPaw-Flash-9B
  • Roman1111111/claude-opus-4.6-10000x
  • Crownelius/Opus-4.6-Reasoning-3300x
  • Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
  • the broader Qwen / Unsloth ecosystem

Citation / lineage notes

If you use this model, please also acknowledge the upstream projects and datasets it builds on.

Qwen3.5-9B-Opus-OpenClaw-Distilled is for people who want a model that doesn’t just answer — it locks in, thinks cleanly, and then strikes.
