# Qwen3-0.6B Quantized Models
This repository contains three quantized versions of the Qwen3-0.6B model, optimized for different use cases and hardware requirements.
## Models Included

### 1. GGUF Q4_K_M (462 MB)

- Format: GGUF (llama.cpp compatible)
- Quantization: 4-bit K-quant (Q4_K_M)
- Best for: CPU inference, llama.cpp/prima.cpp, resource-constrained environments
- File: `Qwen3-0.6B-GGUF/Qwen3-0.6B.Q4_K_M.gguf`
### 2. GPTQ-Int4 (517 MB)

- Format: Safetensors (HuggingFace Transformers)
- Quantization: 4-bit GPTQ (group_size=128, symmetric)
- Best for: GPU inference with AutoGPTQ or Transformers
- Quantizer: gptqmodel 4.0.0
- Directory: `Qwen3-0.6B-GPTQ-Int4/`
### 3. GPTQ-Int8 (727 MB)

- Format: Safetensors (HuggingFace Transformers)
- Quantization: 8-bit GPTQ (group_size=128, symmetric)
- Best for: higher accuracy than the 4-bit variants at a moderate size increase
- Quantizer: gptqmodel 2.2.0
- Directory: `Qwen3-0.6B-GPTQ-Int8/`
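All three variants live in a single repo, so you can mirror everything locally in one call with `huggingface_hub` (repo id taken from the usage examples below):

```python
from huggingface_hub import snapshot_download

# Downloads all three quantized variants (~1.7 GB total) into the local HF cache.
local_dir = snapshot_download(repo_id="Bopalv/Qwen3-0.6B-quantized")
print(local_dir)
```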
## Model Specifications
| Feature | Value |
|---|---|
| Base Model | Qwen3-0.6B |
| Parameters | 0.6B |
| Architecture | Qwen3ForCausalLM |
| Hidden Size | 1024 |
| Layers | 28 |
| Attention Heads | 16 |
| KV Heads | 8 |
| Max Context | 40,960 tokens |
| Vocab Size | 151,936 |
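The table above also implies the KV-cache footprint at long context. A quick back-of-the-envelope sketch, assuming an FP16 cache and a head dimension of 128 (Qwen3 sets `head_dim` explicitly in its config rather than deriving it as hidden_size / attention heads; treat that constant as an assumption):

```python
# Rough KV-cache size estimate from the spec table (assumptions noted inline).
layers = 28          # from the spec table
kv_heads = 8         # from the spec table
head_dim = 128       # assumption: Qwen3's configured head_dim, not hidden/heads
bytes_per_value = 2  # FP16 cache

# K and V are each cached per layer, per KV head, per token.
per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
print(f"{per_token / 1024:.0f} KiB per token")            # ~112 KiB
print(f"{per_token * 40_960 / 2**30:.1f} GiB at 40,960")  # ~4.4 GiB
```

Under those assumptions, a full 40,960-token context costs far more memory than the quantized weights themselves, which is worth keeping in mind when sizing hardware.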
## Usage

### GGUF (llama.cpp / prima.cpp)

```bash
# Using prima.cpp
./llama-server -m Qwen3-0.6B-GGUF/Qwen3-0.6B.Q4_K_M.gguf --port 8080

# Using ollama
ollama run qwen3:0.6b
```
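If you would rather drive the GGUF file from Python, a minimal sketch with llama-cpp-python follows; the repo id matches the Transformers example below, but the context size, prompt, and token limit are arbitrary choices:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # pip install llama-cpp-python

# Fetch the GGUF file from the Hub (cached locally after the first call).
model_path = hf_hub_download(
    repo_id="Bopalv/Qwen3-0.6B-quantized",
    filename="Qwen3-0.6B-GGUF/Qwen3-0.6B.Q4_K_M.gguf",
)

llm = Llama(model_path=model_path, n_ctx=4096)  # n_ctx chosen arbitrarily
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```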
### GPTQ (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Bopalv/Qwen3-0.6B-quantized",
    subfolder="Qwen3-0.6B-GPTQ-Int4",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "Bopalv/Qwen3-0.6B-quantized",
    subfolder="Qwen3-0.6B-GPTQ-Int4",
)
```
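Generation then follows the usual Transformers pattern. A minimal sketch continuing from the snippet above; the prompt and `max_new_tokens` are arbitrary:

```python
messages = [{"role": "user", "content": "Summarize GPTQ in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that Qwen3's chat template also supports a thinking mode; see the upstream Qwen3-0.6B card for the `enable_thinking` switch.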
## Quantization Details
| Model | Bits | Group Size | Symmetric | Format | Size |
|---|---|---|---|---|---|
| GGUF Q4_K_M | 4 | N/A | Yes | GGUF | 462 MB |
| GPTQ-Int4 | 4 | 128 | Yes | Safetensors | 517 MB |
| GPTQ-Int8 | 8 | 128 | Yes | Safetensors | 727 MB |
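The GPTQ variants were produced with the gptqmodel library (versions listed above). The exact calibration setup isn't recorded in this card, so the sketch below only illustrates the general recipe against gptqmodel's documented API, with a placeholder calibration set:

```python
from gptqmodel import GPTQModel, QuantizeConfig  # pip install gptqmodel

# Placeholder calibration texts -- the real calibration data used for this
# repo is not documented here; a few hundred representative samples are typical.
calibration = ["Qwen3 is a family of large language models."]

config = QuantizeConfig(bits=4, group_size=128, sym=True)  # matches the table
model = GPTQModel.load("Qwen/Qwen3-0.6B", config)
model.quantize(calibration)
model.save("Qwen3-0.6B-GPTQ-Int4")
```

Swapping `bits=4` for `bits=8` yields the Int8 variant under the same group size and symmetric settings.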
## Original Model

This is a quantized version of [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) by the Qwen team.
## License

Apache 2.0 (same as the base model)