CodeClawd — Qwen3.5-9B Claude Code (GGUF)

GGUF quantizations of empero-ai/Qwen3.5-9B-Claude-Code, a supervised fine-tune of empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill on real Claude Code / Codex agentic sessions from the DataClaw dataset collection.

The model is trained to behave like a capable coding agent: thinking through problems, calling tools, interpreting results, and responding clearly — mirroring the full agentic loop of a real Claude Code session.

The model reasons inside <think> tags and calls tools with XML-style <tool_call> blocks.


Available Quantizations

| Filename | Quant | Size | RAM Required | Quality |
|---|---|---|---|---|
| Qwen3.5-9B-Claude-Code-Q2_K.gguf | Q2_K | ~3.6 GB | ~6 GB | Lowest (not recommended) |
| Qwen3.5-9B-Claude-Code-Q3_K_M.gguf | Q3_K_M | ~4.5 GB | ~7 GB | Low |
| Qwen3.5-9B-Claude-Code-Q4_0.gguf | Q4_0 | ~5.2 GB | ~8 GB | Acceptable |
| Qwen3.5-9B-Claude-Code-Q4_K_M.gguf | Q4_K_M | ~5.4 GB | ~8 GB | Recommended (best size/quality tradeoff) |
| Qwen3.5-9B-Claude-Code-Q5_K_M.gguf | Q5_K_M | ~6.4 GB | ~9 GB | High |
| Qwen3.5-9B-Claude-Code-Q6_K.gguf | Q6_K | ~7.3 GB | ~10 GB | Very high |
| Qwen3.5-9B-Claude-Code-Q8_0.gguf | Q8_0 | ~9.5 GB | ~12 GB | Near-lossless |
| Qwen3.5-9B-Claude-Code-F16.gguf | F16 | ~18 GB | ~22 GB | Lossless |

Recommended: Q4_K_M for most users. Choose Q5_K_M or Q6_K if you have extra RAM/VRAM and want higher-quality reasoning.
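
The recommendation above boils down to a simple rule: take the highest-quality file whose RAM estimate fits your machine. Below is a minimal sketch of that rule, assuming the approximate "RAM Required" figures from the table. `QUANTS` and `pick_quant` are hypothetical names introduced only for illustration, and the sketch ignores context size and GPU offload.

```python
# Approximate RAM requirements (GB) copied from the table above,
# ordered from lowest to highest quality.
QUANTS = [
    ("Q2_K", 6),
    ("Q3_K_M", 7),
    ("Q4_0", 8),
    ("Q4_K_M", 8),
    ("Q5_K_M", 9),
    ("Q6_K", 10),
    ("Q8_0", 12),
    ("F16", 22),
]


def pick_quant(available_ram_gb):
    """Return the highest-quality quant whose RAM estimate fits, or None."""
    best = None
    for name, ram_gb in QUANTS:
        if ram_gb <= available_ram_gb:
            best = name  # later entries are higher quality
    return best
```

With these figures, `pick_quant(8)` returns `"Q4_K_M"` and `pick_quant(4)` returns `None`, meaning no listed file is expected to fit.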


Model Details

| Property | Value |
|---|---|
| Original model | empero-ai/Qwen3.5-9B-Claude-Code |
| Base model | empero-ai/Qwen3.5-9B-Claude-Opus-4.6-Distill |
| Fine-tune type | SFT (LoRA r=64/alpha=128, merged) |
| Parameters | 9B |
| Training data | DataClaw: 29 datasets of real Claude Code / Codex sessions |
| Context length | 4096 tokens |
| License | Apache 2.0 |

Usage with llama.cpp

# Basic inference
./llama-cli \
  -m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
  -n 1024 \
  --temp 0.7 \
  --top-p 0.9 \
  -p "<|im_start|>system
You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses.<|im_end|>
<|im_start|>user
Read the file main.py and explain what it does.<|im_end|>
<|im_start|>assistant
"

# As an OpenAI-compatible server
./llama-server \
  -m Qwen3.5-9B-Claude-Code-Q4_K_M.gguf \
  --host 0.0.0.0 \
  --port 8080 \
  -c 4096 \
  -ngl 99
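
Once the server is running, it can be queried over HTTP. The following is a minimal standard-library sketch, assuming the server is reachable at `http://localhost:8080` and serves the OpenAI-style `/v1/chat/completions` route. `build_payload` and `chat` are hypothetical helper names, and the system prompt is shortened here for brevity.

```python
import json
import urllib.request

# Shortened for the example; in practice use the full system prompt from this card.
SYSTEM_PROMPT = (
    "You are an expert AI coding assistant. You think through problems "
    "carefully inside <think> tags before acting."
)


def build_payload(user_message, max_tokens=1024):
    """Build a chat-completions request body using the card's sampling settings."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": max_tokens,
    }


def chat(user_message, base_url="http://localhost:8080"):
    """POST the request to a running server and return the reply text.

    Assumes the response follows the OpenAI chat-completions layout.
    """
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("Read the file main.py and explain what it does.")` sends one request and returns the raw reply, which may still contain `<think>` and `<tool_call>` blocks for the caller to handle.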

Usage with Ollama

Create a Modelfile:

FROM Qwen3.5-9B-Claude-Code-Q4_K_M.gguf

SYSTEM """You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks."""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

Then create and run the model:

ollama create codeclawd -f Modelfile
ollama run codeclawd

Usage with LM Studio

  1. Download the .gguf file of your chosen quantization
  2. Open LM Studio → Load Model → select the file
  3. In the system prompt field, paste:
You are an expert AI coding assistant. You help users with software engineering tasks including writing code, debugging, refactoring, explaining code, and more. You think through problems carefully inside <think> tags before acting. You have access to tools like Read, Edit, Write, Bash, Grep, Glob, and others to interact with the user's codebase. You use tools when needed and provide clear, concise responses. You show your reasoning process and use tools methodically to accomplish tasks.
  4. Set temperature: 0.7, top-p: 0.9, context: 4096

Agentic Format

This model is trained on the full agentic loop:

user → <think>...</think> → <tool_call> → <tool_result> → response → repeat
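
The loop above can be driven by a small controller that keeps asking the model for the next turn until it answers without requesting a tool. This is only a sketch under stated assumptions: `run_agent`, `generate`, and `handle_tool_call` are hypothetical names, the model backend and tool executor are injected as plain callables, and the two `scripted_*` functions are stand-ins so the example runs without a model.

```python
def run_agent(generate, handle_tool_call, user_message, max_steps=8):
    """Drive the loop until the model replies without a tool call.

    generate:         callable(transcript) -> the model's next turn as text.
    handle_tool_call: callable(turn) -> a <tool_result> string, or None when
                      the turn contains no tool call.
    """
    transcript = [("user", user_message)]
    for _ in range(max_steps):
        turn = generate(transcript)
        transcript.append(("assistant", turn))
        result = handle_tool_call(turn)
        if result is None:
            return turn, transcript  # a plain response ends the loop
        transcript.append(("tool", result))
    raise RuntimeError("agent did not finish within max_steps")


def scripted_generate(transcript):
    """Stand-in for the model: request one tool call, then answer."""
    if transcript[-1][0] == "user":
        return (
            "<think>I should read the file first.</think>\n"
            "<tool_call>\n<tool>Read</tool>\n<input>\n"
            '{"file_path": "main.py"}\n</input>\n</tool_call>'
        )
    return "main.py prints a greeting."


def scripted_handler(turn):
    """Stand-in for a tool executor: return a canned result for any tool call."""
    if "<tool_call>" not in turn:
        return None
    return (
        "<tool_result>\n<tool>Read</tool>\n<output>\n"
        "print('hello')\n</output>\n</tool_result>"
    )
```

The `max_steps` cap is a design choice rather than part of the format: it keeps a model that never stops calling tools from looping indefinitely.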

Tool Call Format

<tool_call>
<tool>Read</tool>
<input>
{"file_path": "/path/to/file.py"}
</input>
</tool_call>

<tool_result>
<tool>Read</tool>
<output>
... file contents ...
</output>
</tool_result>
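
A tool executor has to read the `<tool_call>` block out of the model's text and hand back a matching `<tool_result>`. A minimal sketch of both directions follows, assuming the layout shown above with a JSON object inside `<input>`. `ToolCall`, `parse_tool_calls`, and `format_tool_result` are hypothetical names, and the regex is a simplification that does not handle nested or escaped tags.

```python
import json
import re
from dataclasses import dataclass

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<tool>(?P<tool>.*?)</tool>\s*"
    r"<input>(?P<input>.*?)</input>\s*</tool_call>",
    re.DOTALL,
)


@dataclass
class ToolCall:
    tool: str
    arguments: dict


def parse_tool_calls(text):
    """Extract every well-formed <tool_call> block from model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            arguments = json.loads(match.group("input"))
        except json.JSONDecodeError:
            continue  # skip blocks whose <input> is not valid JSON
        if isinstance(arguments, dict):
            calls.append(ToolCall(match.group("tool").strip(), arguments))
    return calls


def format_tool_result(tool, output):
    """Wrap a tool's output in the <tool_result> layout shown above."""
    return (
        "<tool_result>\n"
        f"<tool>{tool}</tool>\n"
        "<output>\n"
        f"{output}\n"
        "</output>\n"
        "</tool_result>"
    )
```

Malformed blocks are skipped silently here. A real executor might instead report the parse error back to the model so it can retry.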

Thinking Format

<think>
The user wants to refactor the auth module. I should first read the existing
implementation before suggesting changes.
</think>
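
Front ends usually want to hide or log the reasoning separately from the visible reply. A minimal sketch of that split, assuming `<think>` blocks are properly closed and not nested; `split_thinking` is a hypothetical helper name.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (list of reasoning strings, text with <think> blocks removed)."""
    thoughts = [block.strip() for block in THINK_RE.findall(text)]
    visible = THINK_RE.sub("", text).strip()
    return thoughts, visible
```

An unclosed `<think>` tag, for example from a generation cut off by the token limit, is left in the visible text by this sketch.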

Intended Use

  • Local agentic coding assistants on consumer hardware
  • IDE integrations (Continue, Cursor, etc.) running locally
  • Offline / air-gapped development environments
  • Research into agentic SFT distillation from real sessions

Limitations

  • Tool format is XML-based — not OpenAI function-call JSON
  • SFT only (no RLHF or DPO)
  • May hallucinate tool outputs if not connected to a real tool executor
  • Quantization below Q4 may degrade reasoning quality noticeably
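
One way to guard against hallucinated tool outputs is to discard everything the model writes after its first closing `</tool_call>` tag, then append the executor's real `<tool_result>` before continuing. This is an assumed mitigation, not something this card prescribes; `truncate_after_tool_call` is a hypothetical helper name.

```python
def truncate_after_tool_call(text):
    """Drop anything the model wrote after its first </tool_call>."""
    end_tag = "</tool_call>"
    index = text.find(end_tag)
    if index == -1:
        return text  # no tool call: keep the reply unchanged
    return text[: index + len(end_tag)]
```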

Original Model

empero-ai/Qwen3.5-9B-Claude-Code

License

Apache 2.0 — see base model for details.
