Uploaded finetuned model

  • Developed by: ykarout
  • License: apache-2.0

This Qwen3.5 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Qwen3.5-9B-Opus-OpenClaw-Distilled

Qwen3.5-9B-Opus-OpenClaw-Distilled is a reasoning-first, agentically-tuned derivative of Qwen3.5-9b, built to fuse two strengths into one model identity:

  • OpenClaw & CoPaw’s operational / agentic instincts
  • Claude Opus-style structured reasoning distillation

The goal is simple: start from a strong base model --> tune it for agentic harnesses like OpenClaw and AgentScope --> distill Opus-4.6-level reasoning --> get the best of both worlds.

GGUF quantizations are available at ykarout/Qwen3.5-9b-Opus-Openclaw-Distilled-GGUF.

TL;DR

It is designed for users who want:

  • preserved chat quality + strong agentic usefulness from the OpenClaw / CoPaw lineage
  • a model that feels more “planner + operator” than just “chatbot”

Why this model exists

CoPaw-Flash-9B is already a highly interesting member of the Qwen3.5-based model family, with explicit optimization for agentic behavior such as tool invocation, command execution, memory management, and multi-step planning. This model builds on top of that foundation instead of starting from a plain base model. The idea is to preserve that practical “gets things done” behavior while injecting denser and more structured reasoning traces through supervised fine-tuning.

At the same time, the inspiration for the reasoning side of this model comes from recent Qwen3.5 reasoning distillations trained with Opus-derived trajectories. In particular, models like Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled emphasize <think>-structured reasoning, response-only training, and normalized reasoning/answer formatting.

Model identity

The model is intended to sit at the intersection of:

  • agentic chat utility
  • structured reasoning
  • practical local deployment
  • Qwen3.5 ecosystem compatibility

The intended vibe is:

Qwen3.5 + OpenClaw reflexes + Opus-style reasoning scaffolds

In practice, that means the model is aimed at tasks like:

  • analytical QA
  • coding support
  • workflow planning
  • terminal / tool-oriented prompting
  • multi-step decomposition
  • logic-heavy conversations
  • “think first, answer second” style interactions

Base model

  • Base family: Qwen3.5-9b
  • Immediate base: CoPaw-Flash-9B --> a fine-tune that exhibits much stronger agentic capabilities in harnesses like OpenClaw and AgentScope

Training concept

This model was trained as a text-only reasoning SFT derivative focused on preserving and reinforcing the format:

<think>
...
</think>

final answer

The overall training philosophy is aligned with reasoning-distillation approaches that emphasize:

  • response-only loss masking
  • explicit <think> formatting
  • structured step-by-step reasoning before the final answer
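
As an illustration of the response-only loss masking listed above, the sketch below (toy token IDs, not the actual training code) sets prompt-token labels to -100 so that the cross-entropy loss, and therefore the gradient, comes only from the assistant completion:

```python
IGNORE_INDEX = -100  # label value that cross-entropy loss implementations skip

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking out the first prompt_len tokens."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Toy sequence: 4 prompt tokens followed by 3 completion tokens.
ids = [101, 2023, 2003, 1996, 7592, 2088, 102]
print(mask_prompt_labels(ids, prompt_len=4))
# [-100, -100, -100, -100, 7592, 2088, 102]
```

In TRL/Unsloth pipelines this masking is normally handled by the framework; the sketch only shows what "train on responses only" means at the label level.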

Datasets

The training recipe is centered on a unified reasoning dataset built from:

  • Roman1111111/claude-opus-4.6-10000x
  • Crownelius/Opus-4.6-Reasoning-3300x

These were normalized into a single conversational SFT dataset:

  • ykarout/Opus-4.6-reasoning-sft-12k

Dataset processing highlights

The dataset was cleaned and unified so both sources follow the same final structure:

  • messages-based conversational schema
  • assistant output normalized into:
    • <think> ... </think>
    • followed by the final answer
  • generic repeated system prompts removed where appropriate
  • token-length profile measured after rendering with the target tokenizer chat template

This makes the training corpus more consistent and more directly usable in TRL / Unsloth conversational SFT pipelines.
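
A minimal sketch of that normalization step (function and field names here are illustrative, not the actual processing script): each raw pair is coerced into the messages schema, with the assistant turn rewritten into `<think> ... </think>` followed by the final answer.

```python
import re

# Capture the reasoning block and whatever follows it as the answer.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*(.*)", re.DOTALL)

def normalize_example(prompt, raw_response):
    """Coerce one (prompt, response) pair into the messages schema with the
    assistant turn normalized to <think>...</think> + final answer."""
    match = THINK_RE.search(raw_response.strip())
    if match:
        reasoning, answer = match.group(1).strip(), match.group(2).strip()
    else:
        reasoning, answer = "", raw_response.strip()
    assistant = f"<think>\n{reasoning}\n</think>\n\n{answer}"
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": assistant},
    ]}

example = normalize_example("What is 2+2?", "<think>Add.</think> 4")
print(example["messages"][1]["content"])
```

Token-length profiling would then be done on these examples after rendering them with the target tokenizer's chat template, as described above.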

Training recipe

High-level recipe:

  • Framework: Unsloth
  • Method: LoRA SFT
  • Objective: improve structured reasoning while retaining CoPaw-style usefulness
  • Loss behavior: train on assistant responses / completions only
  • Format target: explicit <think> reasoning followed by answer
  • Text-only setup: no vision-layer fine-tuning path used
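
For intuition on the LoRA method named in the recipe: instead of updating the full weight matrix W, LoRA learns a low-rank pair (A, B) and applies W + (alpha/r) * B @ A at forward time. A toy pure-Python sketch of that idea (illustrative only, not the training code):

```python
def lora_forward(W, A, B, x, alpha, r):
    """Compute (W + (alpha / r) * B @ A) @ x on small nested lists.
    W: frozen d_out x d_in weight; A: r x d_in; B: d_out x r."""
    def matvec(M, v):
        return [sum(m * val for m, val in zip(row, v)) for row in M]
    base = matvec(W, x)                 # frozen path
    delta = matvec(B, matvec(A, x))     # low-rank trainable path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Toy 2x2 example with rank r = 1 (identity base weight).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]            # r x d_in
B = [[0.5], [0.5]]          # d_out x r
print(lora_forward(W, A, B, [2.0, 3.0], alpha=1.0, r=1))
# [4.5, 5.5]
```

Only A and B are trained, which is why LoRA SFT is practical on a 9B model with modest hardware.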

What changed versus CoPaw-Flash-9B

This is not presented as a replacement for CoPaw-Flash-9B’s original design goals.

Instead, it pushes the model further toward:

  • more explicit reasoning traces
  • more deliberate planning language
  • cleaner internal decomposition on complex tasks
  • stronger “reason-then-answer” behavior

Intended use

The model is best suited for:

  • agentic harnesses such as OpenClaw, Claude Code, OpenCode, and AgentScope
  • deep analytical prompting
  • code and debugging assistance
  • local agent workflows
  • logic / math / structured breakdown tasks

It is a particularly natural fit for prompts where you want the model to:

  1. parse the task carefully
  2. build a plan
  3. utilize different tools
  4. then produce a clean answer or action
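
The four steps above can be sketched as a minimal reason-then-act loop. Everything here is hypothetical scaffolding: `call_model` is a stub standing in for a real inference backend, and the `TOOL:` / `FINAL:` conventions are illustrative, not the model's actual harness protocol.

```python
def call_model(prompt):
    # Stub standing in for real inference; a deployment would query the model.
    if "[tool" in prompt:
        return "<think>\nThe tool result answers the task.\n</think>\nFINAL:42"
    return "<think>\nPlan: delegate arithmetic to the calc tool.\n</think>\nTOOL:calc:6*7"

# A single illustrative tool: a sandboxed arithmetic evaluator.
TOOLS = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def run_agent(task, max_steps=4):
    transcript = task
    for _ in range(max_steps):
        reply = call_model(transcript)
        # Discard the <think> block; act on what follows it.
        action = reply.split("</think>", 1)[-1].strip()
        if action.startswith("TOOL:"):
            _, name, arg = action.split(":", 2)
            transcript += f"\n[tool {name} -> {TOOLS[name](arg)}]"
        elif action.startswith("FINAL:"):
            return action[len("FINAL:"):]
    return None

print(run_agent("What is 6 * 7?"))  # 42
```

Real harnesses like OpenClaw or AgentScope implement this loop with proper tool schemas and sandboxing; the sketch only shows the parse-plan-tool-answer shape.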

Limitations

  • This is still an autoregressive language model and can hallucinate.
  • Strong reasoning style does not guarantee factual correctness.
  • More visible reasoning can sometimes increase verbosity.
  • Distillation can improve structure without perfectly reproducing frontier-model judgment.
  • Depending on the prompt mix, some behaviors may lean more “reasoning-first” than “tool-first.”

Acknowledgements

Huge credit goes to the upstream work that made this possible:

  • agentscope-ai/CoPaw-Flash-9B
  • Roman1111111/claude-opus-4.6-10000x
  • Crownelius/Opus-4.6-Reasoning-3300x
  • Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
  • the broader Qwen / Unsloth ecosystem

Citation / lineage notes

If you use this model, please also acknowledge the upstream projects and datasets it builds on.

Qwen3.5-9B-Opus-OpenClaw-Distilled is for people who want a model that doesn’t just answer — it locks in, thinks cleanly, and then strikes.
