CodeClawd — Qwen3.5-9B Claude Code (GGUF)

GGUF quantizations of empero-ai/Qwen3.5-9B-Claude-Code, a supervised fine-tune of empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill on real Claude Code / Codex agentic sessions from the DataClaw dataset collection.

The model is trained to behave like a capable coding agent: thinking through problems, calling tools, interpreting results, and responding clearly — mirroring the full agentic loop of a real Claude Code session.

The model reasons inside <think> tags and calls tools with XML-style <tool_call> blocks.

Available Quantizations

Filename	Quant	Size	RAM Required	Quality
`Qwen3.5-9B-Claude-Code-Q2_K.gguf`	Q2_K	~3.6 GB	~6 GB	Lowest — not recommended
`Qwen3.5-9B-Claude-Code-Q3_K_M.gguf`	Q3_K_M	~4.5 GB	~7 GB	Low
`Qwen3.5-9B-Claude-Code-Q4_0.gguf`	Q4_0	~5.2 GB	~8 GB	Acceptable
`Qwen3.5-9B-Claude-Code-Q4_K_M.gguf`	Q4_K_M	~5.4 GB	~8 GB	Recommended — best size/quality tradeoff
`Qwen3.5-9B-Claude-Code-Q5_K_M.gguf`	Q5_K_M	~6.4 GB	~9 GB	High
`Qwen3.5-9B-Claude-Code-Q6_K.gguf`	Q6_K	~7.3 GB	~10 GB	Very high
`Qwen3.5-9B-Claude-Code-Q8_0.gguf`	Q8_0	~9.5 GB	~12 GB	Near-lossless
`Qwen3.5-9B-Claude-Code-F16.gguf`	F16	~18 GB	~22 GB	Lossless

Recommended: Q4_K_M for most users. Choose Q5_K_M or Q6_K if you have extra RAM or VRAM and want higher-quality reasoning.
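As a rough illustration of how the table can guide a choice, the sketch below picks the largest quantization whose listed RAM requirement fits a given budget. The figures are copied from the table above and are approximate; `pick_quant` and `filename_for` are hypothetical helper names, not part of any tool.

```python
# Approximate file size and RAM requirement in GB, copied from the table above.
QUANTS = [
    ("Q2_K", 3.6, 6),
    ("Q3_K_M", 4.5, 7),
    ("Q4_0", 5.2, 8),
    ("Q4_K_M", 5.4, 8),
    ("Q5_K_M", 6.4, 9),
    ("Q6_K", 7.3, 10),
    ("Q8_0", 9.5, 12),
    ("F16", 18.0, 22),
]


def pick_quant(available_ram_gb):
    """Return the largest quantization whose listed RAM requirement fits, or None."""
    fitting = [q for q in QUANTS if q[2] <= available_ram_gb]
    if not fitting:
        return None
    # Among the candidates that fit, the largest file is the highest listed quality.
    return max(fitting, key=lambda q: q[1])[0]


def filename_for(quant):
    """Build the GGUF filename using the naming pattern from the table."""
    return f"Qwen3.5-9B-Claude-Code-{quant}.gguf"
```

For example, `filename_for(pick_quant(8))` selects the Q4_K_M file, which matches the recommendation above. Actual memory use also depends on context length and GPU offload, so treat the result as a starting point.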

Model Details

Property	Value
Original model	`empero-ai/Qwen3.5-9B-Claude-Code`
Base model	`empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill`
Fine-tune type	SFT (LoRA r=64/alpha=128, merged)
Parameters	9B
Training data	DataClaw — 29 datasets of real Claude Code / Codex sessions
Context length	4096 tokens
License	Apache 2.0

Usage with llama.cpp

# Basic inference
./llama-cli \
  -m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
  -n 1024 \
  --temp 0.7 \
  --top-p 0.9 \
  -p "<|im_start|>system
You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses.<|im_end|>
<|im_start|>user
Read the file main.py and explain what it does.<|im_end|>
<|im_start|>assistant
"

# As an OpenAI-compatible server
./llama-server \
  -m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -c 4096 \
  -ngl 99
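Once the server is running, a client can talk to it over the OpenAI-style chat completions route. The following is a minimal sketch using only the Python standard library. It assumes the server is reachable at http://localhost:8080 (the port used above) and reuses the sampling settings from the llama-cli example; `build_payload` and `chat` are hypothetical helper names. With this route the server should apply the model's chat template itself, so the `<|im_start|>` markers are not written by hand.

```python
import json
import urllib.request


def build_payload(system_prompt, user_message, max_tokens=1024):
    """Build a chat completions request body with the sampling settings used above."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": max_tokens,
    }


def chat(payload, base_url="http://localhost:8080"):
    """POST the payload to the server and return the assistant's text."""
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires the llama-server command above to be running):
# payload = build_payload(
#     "You are an expert AI coding assistant.",
#     "Read the file main.py and explain what it does.",
# )
# print(chat(payload))
```

The returned text may still contain <think> and <tool_call> blocks, which the caller has to interpret. This sketch does not stream tokens or handle connection errors.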

Usage with Ollama

Create a Modelfile:

FROM Qwen3.5-9B-Claude-Code-Q4_K_M.gguf

SYSTEM """You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks."""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

ollama create codeclawd -f Modelfile
ollama run codeclawd

Usage with LM Studio

Download the .gguf file of your chosen quantization
Open LM Studio → Load Model → select the file
In the system prompt field, paste:

You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks.

Set temperature to 0.7, top-p to 0.9, and context length to 4096

Agentic Format

This model is trained on the full agentic loop:

user → <think>...</think> → <tool_call> → <tool_result> → response → repeat

Tool Call Format

<tool_call>
<tool>Read</tool>
<input>
{"file_path": "/path/to/file.py"}
</input>
</tool_call>

<tool_result>
<tool>Read</tool>
<output>
... file contents ...
</output>
</tool_result>
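One way to drive the loop with this format is sketched below. It is an illustration under several assumptions, not the authors' harness: `generate` is a hypothetical callable that returns the model's next chunk of text, `tools` maps tool names to ordinary Python functions, and the transcript is kept as a plain list of strings because the source does not say which chat role carries the tool result. Text generated after the first `</tool_call>` is discarded so that an invented `<tool_result>` never enters the transcript.

```python
import json
import re

# Matches one <tool_call> block in the format shown above.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<tool>(.*?)</tool>\s*<input>\s*(.*?)\s*</input>\s*</tool_call>",
    re.DOTALL,
)


def parse_tool_call(text):
    """Return (tool_name, arguments, end_index) for the first tool call, or None."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    # json.loads raises ValueError if the model emitted malformed JSON.
    return match.group(1).strip(), json.loads(match.group(2)), match.end()


def format_tool_result(tool_name, output):
    """Render a tool's output in the <tool_result> format shown above."""
    return (
        "<tool_result>\n"
        f"<tool>{tool_name}</tool>\n"
        "<output>\n"
        f"{output}\n"
        "</output>\n"
        "</tool_result>"
    )


def run_agent(generate, tools, user_message, max_turns=8):
    """Alternate between model output and real tool results until a final answer."""
    transcript = [user_message]
    for _ in range(max_turns):
        reply = generate(transcript)
        call = parse_tool_call(reply)
        if call is None:
            # No tool call: treat the reply as the final response.
            transcript.append(reply)
            return reply, transcript
        name, args, end = call
        # Keep the reply only up to the end of the call.
        transcript.append(reply[:end])
        if name in tools:
            output = tools[name](**args)
        else:
            output = f"Error: unknown tool {name}"
        transcript.append(format_tool_result(name, output))
    raise RuntimeError("max_turns reached without a final response")
```

A real integration would also need to catch malformed JSON and exceptions raised by tools, and to feed the transcript back to the model in whatever message structure the serving stack expects.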

Thinking Format

<think>
The user wants to refactor the auth module. I should first read the existing
implementation before suggesting changes.
</think>
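A front end will often want to hide or log the reasoning separately from the visible reply. A minimal sketch of that split, assuming the reasoning always appears in well-formed `<think>...</think>` pairs (`split_thinking` is a hypothetical helper name):

```python
import re

# Non-greedy match so that several <think> blocks are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (list of reasoning strings, text with the <think> blocks removed)."""
    thoughts = [thought.strip() for thought in THINK_RE.findall(text)]
    visible = THINK_RE.sub("", text).strip()
    return thoughts, visible
```

An unterminated `<think>` block, for instance when generation stops at the token limit, is left in the visible text by this sketch and would need separate handling.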

Intended Use

Local agentic coding assistants on consumer hardware
IDE integrations (Continue, Cursor, etc.) running locally
Offline / air-gapped development environments
Research into agentic SFT distillation from real sessions

Limitations

Tool format is XML-based — not OpenAI function-call JSON
SFT only (no RLHF or DPO)
May hallucinate tool outputs if not connected to a real tool executor
Quantization below Q4 may degrade reasoning quality noticeably
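Because the tool format is XML-based, clients that expect OpenAI-style function calls need a small adapter. The sketch below converts tool calls that have already been extracted from `<tool_call>` blocks (as a tool name plus an arguments dict) into the assistant message shape used by OpenAI-style chat APIs. The helper names and the `call_0`, `call_1` id scheme are hypothetical choices made for illustration.

```python
import json


def to_openai_tool_call(tool_name, arguments, call_id):
    """Wrap one parsed tool call in the OpenAI-style function-call structure."""
    return {
        "id": call_id,
        "type": "function",
        "function": {
            "name": tool_name,
            # OpenAI-style tool calls carry arguments as a JSON-encoded string.
            "arguments": json.dumps(arguments),
        },
    }


def to_openai_assistant_message(calls):
    """Build an assistant message from a list of (tool_name, arguments) pairs."""
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            to_openai_tool_call(name, args, f"call_{index}")
            for index, (name, args) in enumerate(calls)
        ],
    }
```

For the Read example above, `to_openai_assistant_message([("Read", {"file_path": "/path/to/file.py"})])` yields a single entry in `tool_calls` whose function name is `Read`.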

Original Model

HF (safetensors): empero-ai/Qwen3.5-9B-Claude-Code

License

Apache 2.0 — see base model for details.


