CodeClawd — Qwen3.5-9B Claude Code (GGUF)
GGUF quantizations of empero-ai/Qwen3.5-9B-Claude-Code, a supervised fine-tune of empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill on real Claude Code / Codex agentic sessions from the DataClaw dataset collection.
The model is trained to behave like a capable coding agent: thinking through problems, calling tools, interpreting results, and responding clearly — mirroring the full agentic loop of a real Claude Code session.
The model thinks inside <think> tags and calls tools with XML-style <tool_call> blocks.
Available Quantizations
| Filename | Quant | Size | RAM Required | Quality |
|---|---|---|---|---|
| Qwen3.5-9B-Claude-Code-Q2_K.gguf | Q2_K | ~3.6 GB | ~6 GB | Lowest (not recommended) |
| Qwen3.5-9B-Claude-Code-Q3_K_M.gguf | Q3_K_M | ~4.5 GB | ~7 GB | Low |
| Qwen3.5-9B-Claude-Code-Q4_0.gguf | Q4_0 | ~5.2 GB | ~8 GB | Acceptable |
| Qwen3.5-9B-Claude-Code-Q4_K_M.gguf | Q4_K_M | ~5.4 GB | ~8 GB | Recommended (best size/quality tradeoff) |
| Qwen3.5-9B-Claude-Code-Q5_K_M.gguf | Q5_K_M | ~6.4 GB | ~9 GB | High |
| Qwen3.5-9B-Claude-Code-Q6_K.gguf | Q6_K | ~7.3 GB | ~10 GB | Very high |
| Qwen3.5-9B-Claude-Code-Q8_0.gguf | Q8_0 | ~9.5 GB | ~12 GB | Near-lossless |
| Qwen3.5-9B-Claude-Code-F16.gguf | F16 | ~18 GB | ~22 GB | Lossless |
Recommended: Q4_K_M for most users. Q5_K_M or Q6_K if you have extra VRAM and want higher quality reasoning.
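If you want to choose a file programmatically, the table can be turned into a small lookup. The sketch below is only an illustration: the `pick_quant` helper is hypothetical, the numbers are the approximate figures from the table, and file size is used as a stand-in for quality.

```python
# Approximate file size and RAM figures (GB), copied from the table above.
QUANTS = [
    ("Q2_K", 3.6, 6),
    ("Q3_K_M", 4.5, 7),
    ("Q4_0", 5.2, 8),
    ("Q4_K_M", 5.4, 8),
    ("Q5_K_M", 6.4, 9),
    ("Q6_K", 7.3, 10),
    ("Q8_0", 9.5, 12),
    ("F16", 18.0, 22),
]


def pick_quant(available_ram_gb):
    """Return the largest GGUF file whose listed RAM requirement fits, or None."""
    fitting = [q for q in QUANTS if q[2] <= available_ram_gb]
    if not fitting:
        return None
    # Larger file size is treated as a proxy for higher quality (an assumption).
    name, _size_gb, _ram_gb = max(fitting, key=lambda q: q[1])
    return f"Qwen3.5-9B-Claude-Code-{name}.gguf"


if __name__ == "__main__":
    print(pick_quant(8))
```

With 8 GB available, both Q4_0 and Q4_K_M fit, and the helper prefers Q4_K_M, which matches the recommendation above.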
Model Details
| Property | Value |
|---|---|
| Original model | empero-ai/Qwen3.5-9B-Claude-Code |
| Base model | empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill |
| Fine-tune type | SFT (LoRA r=64/alpha=128, merged) |
| Parameters | 9B |
| Training data | DataClaw — 29 datasets of real Claude Code / Codex sessions |
| Context length | 4096 tokens |
| License | Apache 2.0 |
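The prompt and the generated tokens share the same context window, so with a 4096-token context and `-n 1024` roughly 3072 tokens remain for the system prompt, history, and tool results. The sketch below shows one way to keep an agentic session inside that budget. It assumes the first message is the system prompt and uses a crude four-characters-per-token estimate, not the model's real tokenizer; all function names are hypothetical.

```python
CONTEXT_LENGTH = 4096  # context length listed in the table above


def prompt_budget(max_new_tokens, context_length=CONTEXT_LENGTH):
    """Tokens left for the prompt once room for the reply is reserved."""
    return max(context_length - max_new_tokens, 0)


def estimate_tokens(text):
    """Very rough estimate (about 4 characters per token).

    This is an assumption for illustration, not the model's tokenizer.
    """
    return len(text) // 4 + 1


def trim_history(messages, max_new_tokens):
    """Drop the oldest non-system messages until the estimate fits the budget."""
    budget = prompt_budget(max_new_tokens)
    kept = list(messages)
    while len(kept) > 1 and sum(estimate_tokens(m["content"]) for m in kept) > budget:
        # Index 0 is assumed to be the system prompt, so the oldest turn is index 1.
        del kept[1]
    return kept
```

For accurate counts, replace `estimate_tokens` with a call to the tokenizer of whichever runtime you use.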
Usage with llama.cpp
# Basic inference
./llama-cli \
  -m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
  -n 1024 \
  --temp 0.7 \
  --top-p 0.9 \
-p "<|im_start|>system
You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses.<|im_end|>
<|im_start|>user
Read the file main.py and explain what it does.<|im_end|>
<|im_start|>assistant
"
# As an OpenAI-compatible server
./llama-server \
  -m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -c 4096 \
  -ngl 99
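Once the server is running, any OpenAI-style client can talk to it. The following is a minimal standard-library sketch, assuming the server is reachable at `http://localhost:8080` as in the command above and exposes the `/v1/chat/completions` route. The helper names and the `model` value are placeholders chosen for this example.

```python
import json
import urllib.request


def build_chat_request(system_prompt, user_message, base_url="http://localhost:8080"):
    """Build a POST request for the server's /v1/chat/completions endpoint."""
    payload = {
        "model": "local",  # placeholder name; assumed not to matter for a single-model server
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        # Sampling settings mirror the llama-cli example above.
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(system_prompt, user_message):
    """Send the request and return the assistant text (needs a running server)."""
    request = build_chat_request(system_prompt, user_message)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Pass the same system prompt used in the `llama-cli` example so the model sees the tool list it was trained with.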
Usage with Ollama
Create a Modelfile:
FROM Qwen3.5-9B-Claude-Code-Q4_K_M.gguf
SYSTEM """You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks."""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
ollama create codeclawd -f Modelfile
ollama run codeclawd
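If you switch between quantizations often, the Modelfile can be generated rather than edited by hand. This is a small sketch that emits the same directives as the example above. The `render_modelfile` helper is hypothetical and not part of Ollama.

```python
def render_modelfile(gguf_path, system_prompt, temperature=0.7, top_p=0.9, num_ctx=4096):
    """Return Modelfile text using the same directives as the example above."""
    if '"""' in system_prompt:
        # The SYSTEM block is delimited by triple quotes, so they cannot appear inside it.
        raise ValueError("system prompt must not contain triple quotes")
    lines = [
        f"FROM {gguf_path}",
        f'SYSTEM """{system_prompt}"""',
        f"PARAMETER temperature {temperature}",
        f"PARAMETER top_p {top_p}",
        f"PARAMETER num_ctx {num_ctx}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_modelfile("Qwen3.5-9B-Claude-Code-Q5_K_M.gguf", "You are an expert AI coding assistant.")
    with open("Modelfile", "w", encoding="utf-8") as handle:
        handle.write(text)
```

After writing the file, `ollama create codeclawd -f Modelfile` works as shown above.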
Usage with LM Studio
- Download the .gguf file of your chosen quantization
- Open LM Studio → Load Model → select the file
- In the system prompt field, paste:
You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks.
- Set temperature: 0.7, top-p: 0.9, context: 4096
Agentic Format
This model is trained on the full agentic loop:
user → <think>...</think> → <tool_call> → <tool_result> → response → repeat
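The loop can be driven by a few lines of host code. The sketch below is one possible shape, not the training harness: `generate` is a hypothetical callable that wraps whatever runtime you use, `tools` maps tool names to ordinary Python functions, and the plain-text transcript concatenation is an assumption (a real integration would go through the chat template).

```python
import json
import re

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<tool>(.*?)</tool>\s*<input>(.*?)</input>\s*</tool_call>",
    re.DOTALL,
)


def run_agent(generate, tools, user_message, max_steps=8):
    """Run generate -> tool -> generate until the model stops requesting tools.

    generate: hypothetical callable taking the transcript (str) and returning
              the model's next chunk of text.
    tools:    dict mapping tool names to callables that accept keyword arguments.
    """
    transcript = user_message
    for _ in range(max_steps):
        output = generate(transcript)
        transcript += "\n" + output
        match = TOOL_CALL_RE.search(output)
        if match is None:
            return output  # no tool requested: treat this as the final response
        name = match.group(1).strip()
        args = json.loads(match.group(2))
        tool = tools.get(name)
        result = tool(**args) if tool else f"error: unknown tool {name}"
        # Feed the real result back in the <tool_result> shape the model expects.
        transcript += (
            f"\n<tool_result>\n<tool>{name}</tool>\n"
            f"<output>\n{result}\n</output>\n</tool_result>"
        )
    raise RuntimeError("max_steps reached without a final response")
```

The `max_steps` cap is a safety choice for this sketch, so a model that keeps calling tools cannot loop forever.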
Tool Call Format
<tool_call>
<tool>Read</tool>
<input>
{"file_path": "/path/to/file.py"}
</input>
</tool_call>
<tool_result>
<tool>Read</tool>
<output>
... file contents ...
</output>
</tool_result>
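Because the blocks are XML-style wrappers around a JSON payload, a regular expression plus `json.loads` is enough to extract them. The sketch below parses every `<tool_call>` in a piece of model output and, for clients that expect OpenAI-style function calls, maps the result onto that list shape. The function names and the `call_N` id scheme are assumptions made for this example.

```python
import json
import re

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<tool>(?P<name>.*?)</tool>\s*"
    r"<input>(?P<input>.*?)</input>\s*</tool_call>",
    re.DOTALL,
)


def extract_tool_calls(text):
    """Return a list of (tool_name, arguments_dict) found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            args = json.loads(match.group("input"))
        except json.JSONDecodeError:
            continue  # skip malformed input rather than guessing at a repair
        calls.append((match.group("name").strip(), args))
    return calls


def to_openai_tool_calls(text):
    """Map the XML blocks onto an OpenAI-style tool_calls list."""
    return [
        {
            "id": f"call_{index}",  # hypothetical id scheme
            "type": "function",
            "function": {"name": name, "arguments": json.dumps(args)},
        }
        for index, (name, args) in enumerate(extract_tool_calls(text))
    ]
```

Malformed JSON is skipped silently here. A stricter integration might instead return the parse error to the model as a `<tool_result>`.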
Thinking Format
<think>
The user wants to refactor the auth module. I should first read the existing
implementation before suggesting changes.
</think>
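Front ends usually hide or collapse the reasoning before showing the reply. A minimal sketch for separating the two, assuming the tags are well formed and not nested (the `split_thinking` name is hypothetical):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, visible): joined <think> bodies and the remaining text."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    visible = THINK_RE.sub("", text).strip()
    return reasoning, visible
```

An unclosed `<think>` tag, for example when generation is cut off by the token limit, is left in the visible text by this version.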
Intended Use
- Local agentic coding assistants on consumer hardware
- IDE integrations (Continue, Cursor, etc.) running locally
- Offline / air-gapped development environments
- Research into agentic SFT distillation from real sessions
Limitations
- Tool format is XML-based — not OpenAI function-call JSON
- SFT only (no RLHF or DPO)
- May hallucinate tool outputs if not connected to a real tool executor
- Quantization below Q4 may degrade reasoning quality noticeably
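One way to reduce hallucinated tool outputs is to cut the generation right after the first closing `</tool_call>` tag, run the tool yourself, and append the real `<tool_result>`. The helper below is a sketch of that idea with a hypothetical name. Where the runtime supports stop sequences, using one would achieve a similar effect.

```python
STOP_MARKER = "</tool_call>"


def truncate_after_tool_call(text):
    """Keep everything up to and including the first closing </tool_call> tag.

    Returns (kept_text, was_truncated), where was_truncated is True only if
    non-whitespace text followed the tag and was discarded.
    """
    end = text.find(STOP_MARKER)
    if end == -1:
        return text, False
    cut = end + len(STOP_MARKER)
    return text[:cut], text[cut:].strip() != ""
```

Anything the model wrote after the tag, including an invented `<tool_result>`, is discarded before the real result is fed back.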
Original Model
- HF (safetensors): empero-ai/Qwen3.5-9B-Claude-Code
License
Apache 2.0 — see base model for details.
Model tree for empero-ai/Qwen3.5-9B-Claude-Code-GGUF
Base model
Qwen/Qwen3.5-9B-Base