CoPaw-Flash-9B GGUF

GGUF quantizations of agentscope-ai/CoPaw-Flash-9B for use with llama.cpp and compatible tools.

Available Quantizations

File                        Quant   Size    BPW   Description
CoPaw-Flash-9B-BF16.gguf    BF16    17 GB   16.0  Full precision, no quality loss
CoPaw-Flash-9B-Q8_0.gguf    Q8_0    8.9 GB  8.5   Near-lossless quantization
CoPaw-Flash-9B-Q5_K_M.gguf  Q5_K_M  6.1 GB  5.7   Good balance of quality and size
CoPaw-Flash-9B-Q4_K_M.gguf  Q4_K_M  5.3 GB  5.0   Best for constrained hardware
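The sizes above follow directly from the bits-per-weight (BPW) column. A minimal sketch of the arithmetic, using an effective parameter count of ~8.5B derived from the BF16 row (17 GB at 16 BPW); real GGUF files also carry metadata and keep a few tensors at higher precision, so treat this as a ballpark only:

```python
# Back-of-the-envelope GGUF file-size estimate from bits-per-weight.
# The 8.5e9 effective parameter count is inferred from the BF16 row
# of the table above (17 GB at 16.0 BPW), not an official figure.

def approx_size_gb(n_params: float, bpw: float) -> float:
    """Approximate file size in decimal gigabytes: params * bits / 8 bits-per-byte."""
    return n_params * bpw / 8 / 1e9

for quant, bpw in [("Q8_0", 8.5), ("Q5_K_M", 5.7), ("Q4_K_M", 5.0)]:
    print(f"{quant}: ~{approx_size_gb(8.5e9, bpw):.1f} GB")
```

The same formula lets you check whether a given quantization fits your VRAM budget before downloading.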

Model Details

  • Architecture: Qwen3.5 (mixed linear/full attention, 32 layers)
  • Parameters: ~9B
  • Context: 262,144 tokens
  • Base model: Qwen/Qwen3.5-9B
  • License: Apache 2.0

Usage with llama.cpp

# Download a quantization
huggingface-cli download heiertech/CoPaw-Flash-9B-GGUF CoPaw-Flash-9B-Q4_K_M.gguf

# Run with llama-server
llama-server -m CoPaw-Flash-9B-Q4_K_M.gguf -ngl 99 -fa
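Once llama-server is up, it exposes an OpenAI-compatible HTTP API. A minimal stdlib-only sketch of a chat completion request, assuming the server's default address of http://localhost:8080 (adjust the URL if you passed `--port` or `--host`):

```python
# Build a chat-completion request for a locally running llama-server.
# The URL assumes llama-server defaults; change it to match your setup.
import json
import urllib.request

def build_chat_request(
    prompt: str,
    url: str = "http://localhost:8080/v1/chat/completions",
) -> urllib.request.Request:
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 256,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Example usage (requires the server to be running):
# with urllib.request.urlopen(build_chat_request("Hello!")) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```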

Quantized by

Heiervang Technologies using ht-llama.cpp
