# Qwen2.5-0.5B-PreSINQ GGUF

Pre-SINQ (Sinkhorn Normalization) applied to Qwen2.5-0.5B, converted to GGUF and quantized.

## What is Pre-SINQ?

Pre-SINQ applies Sinkhorn-inspired weight reparameterization to make model weights easier to quantize. The reparameterization itself is exact: the model's output is mathematically identical to the original, so the transform introduces no accuracy loss before quantization.
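The sketch below illustrates the idea, not the released Pre-SINQ code: alternately rescale a weight matrix's rows and columns (Sinkhorn-style) so their spreads are balanced, keep the scale vectors in full precision so the factorization is exact, and compare round-trip int4 error with and without the transform. The function names and the toy per-tensor quantizer are ours, chosen for illustration.

```python
import numpy as np

def presinq_reparameterize(W, iters=10, eps=1e-12):
    """Balance row/column spreads of W with Sinkhorn-style iterations.

    Returns (W_norm, r, c) such that W == r[:, None] * W_norm * c[None, :]
    up to floating-point error, so the reparameterization is lossless as
    long as r and c are kept in high precision.
    """
    W_norm = W.astype(np.float64).copy()
    r = np.ones(W.shape[0])
    c = np.ones(W.shape[1])
    for _ in range(iters):
        s = W_norm.std(axis=1) + eps      # per-row spread
        W_norm /= s[:, None]
        r *= s
        s = W_norm.std(axis=0) + eps      # per-column spread
        W_norm /= s[None, :]
        c *= s
    return W_norm, r, c

def fake_quant(W, bits=4):
    """Toy symmetric per-tensor round-trip quantizer, for error comparison."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.round(W / scale).clip(-qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
# Synthetic weights with outlier rows, the case that hurts naive quantization.
W = rng.normal(size=(64, 64)) * rng.lognormal(0.0, 1.5, size=(64, 1))

W_norm, r, c = presinq_reparameterize(W)
assert np.allclose(W, r[:, None] * W_norm * c[None, :])  # exact round trip

err_plain = np.mean((fake_quant(W) - W) ** 2)
err_sinq = np.mean((r[:, None] * fake_quant(W_norm) * c[None, :] - W) ** 2)
print(f"int4 MSE without Pre-SINQ: {err_plain:.5f}, with: {err_sinq:.5f}")
```

Because the balanced matrix has no dominant rows or columns, a single quantization scale fits all of its entries well, which is why the normalized weights survive low-bit quantization better.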

## Available Quantizations

| File | Size | Quality |
|------|------|---------|
| qwen25-0.5b-presinq-f16.gguf | 949M | Perfect (reference) |
| qwen25-0.5b-presinq-q8_0.gguf | 507M | Perfect |
| qwen25-0.5b-presinq-q6_k.gguf | 483M | Perfect |
| qwen25-0.5b-presinq-q5_k_m.gguf | 401M | Perfect |
| qwen25-0.5b-presinq-q5_k_s.gguf | 394M | Perfect |
| qwen25-0.5b-presinq-q5_1.gguf | 400M | Perfect |
| qwen25-0.5b-presinq-q5_0.gguf | 379M | Perfect |
| qwen25-0.5b-presinq-q4_k_m.gguf | 380M | Perfect |
| qwen25-0.5b-presinq-q4_k_s.gguf | 368M | Perfect |
| qwen25-0.5b-presinq-q4_1.gguf | 358M | Good |
| qwen25-0.5b-presinq-q4_0.gguf | 336M | Good |

Every quantization rated Perfect above (Q4_K_S and up) produces output identical to F16; the legacy Q4_1 and Q4_0 variants show only minimal degradation. You can spot-check this yourself, as in the sketch below.
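A minimal sketch of such a spot check: greedy-decode the same prompt from the F16 reference and a quantized file and compare the text. It assumes the llama-cpp-python bindings are installed (`pip install llama-cpp-python`) and that both GGUF files from the table above have been downloaded to the working directory.

```python
from llama_cpp import Llama

PROMPT = "Hello"

def greedy(path):
    """Load a GGUF file and greedy-decode 64 tokens from PROMPT."""
    llm = Llama(model_path=path, n_ctx=512, verbose=False)
    out = llm(PROMPT, max_tokens=64, temperature=0.0)  # temperature 0 = greedy
    return out["choices"][0]["text"]

ref = greedy("qwen25-0.5b-presinq-f16.gguf")
q5 = greedy("qwen25-0.5b-presinq-q5_0.gguf")
print("identical output:", ref == q5)
```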

## Usage with prima.cpp

```bash
# Download a quantization, then run:
llama-cli -m qwen25-0.5b-presinq-q4_k_m.gguf -p "Hello" -n 64
```

## Based On

Qwen2.5-0.5B

## License

Apache 2.0
