# CoPaw-Flash-9B GGUF

GGUF quantizations of agentscope-ai/CoPaw-Flash-9B for use with llama.cpp and compatible tools.
## Available Quantizations

| File | Quant | Size | BPW | Description |
|---|---|---|---|---|
| CoPaw-Flash-9B-BF16.gguf | BF16 | 17 GB | 16.0 | Full precision, no quality loss |
| CoPaw-Flash-9B-Q8_0.gguf | Q8_0 | 8.9 GB | 8.5 | Near-lossless quantization |
| CoPaw-Flash-9B-Q5_K_M.gguf | Q5_K_M | 6.1 GB | 5.7 | Good balance of quality and size |
| CoPaw-Flash-9B-Q4_K_M.gguf | Q4_K_M | 5.3 GB | 5.0 | Best for constrained hardware |
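To confirm which quantization a downloaded file actually contains, the `gguf-dump` script from the `gguf` Python package (maintained alongside llama.cpp) prints a file's metadata without loading the model. A minimal sketch, using the Q4_K_M file as an example:

```bash
# Install the gguf package, which ships the gguf-dump script
pip install gguf

# Print metadata only (quant type, architecture, context length);
# --no-tensors skips the long per-tensor listing
gguf-dump --no-tensors CoPaw-Flash-9B-Q4_K_M.gguf
```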
## Model Details

- Architecture: Qwen3.5 (mixed linear/full attention, 32 layers)
- Parameters: ~9B
- Context: 262,144 tokens
- Base model: Qwen/Qwen3.5-9B
- License: Apache 2.0
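Note that the full 262,144-token context allocates a correspondingly large KV cache when the model loads. If memory is tight, a smaller window can be requested at launch; the sketch below assumes the Q4_K_M file and an arbitrary 32k cap:

```bash
# -c caps the context window at 32,768 tokens, shrinking the KV cache
llama-server -m CoPaw-Flash-9B-Q4_K_M.gguf -ngl 99 -fa -c 32768
```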
## Usage with llama.cpp

```bash
# Download a quantization
huggingface-cli download heiertech/CoPaw-Flash-9B-GGUF CoPaw-Flash-9B-Q4_K_M.gguf

# Run with llama-server
llama-server -m CoPaw-Flash-9B-Q4_K_M.gguf -ngl 99 -fa
```
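Once llama-server is running (it listens on port 8080 by default), it serves an OpenAI-compatible HTTP API. A minimal request, with a placeholder prompt:

```bash
# Send a chat request to the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128
      }'
```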