# Qwen2.5-0.5B-PreSINQ GGUF

Pre-SINQ (Sinkhorn Normalization) applied to Qwen2.5-0.5B, converted to GGUF and quantized.

## What is Pre-SINQ?

Pre-SINQ applies Sinkhorn-inspired weight reparameterization to make model weights easier to quantize. The reparameterization itself is exact: the model's output is mathematically identical to the original, so the transform introduces no accuracy loss before quantization.
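The sketch below illustrates the idea, not the released Pre-SINQ code: alternately rescale a weight matrix's rows and columns (Sinkhorn-style) so their spreads are balanced, keep the scale vectors in full precision so the factorization is exact, and compare round-trip int4 error with and without the transform. The function names and the toy per-tensor quantizer are ours, chosen for illustration.

```python
import numpy as np

def presinq_reparameterize(W, iters=10, eps=1e-12):
    """Balance row/column spreads of W with Sinkhorn-style iterations.

    Returns (W_norm, r, c) such that W == r[:, None] * W_norm * c[None, :]
    up to floating-point error, so the reparameterization is lossless as
    long as r and c are kept in high precision.
    """
    W_norm = W.astype(np.float64).copy()
    r = np.ones(W.shape[0])
    c = np.ones(W.shape[1])
    for _ in range(iters):
        s = W_norm.std(axis=1) + eps      # per-row spread
        W_norm /= s[:, None]
        r *= s
        s = W_norm.std(axis=0) + eps      # per-column spread
        W_norm /= s[None, :]
        c *= s
    return W_norm, r, c

def fake_quant(W, bits=4):
    """Toy symmetric per-tensor round-trip quantizer, for error comparison."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max() / qmax
    return np.round(W / scale).clip(-qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
# Synthetic weights with outlier rows, the case that hurts naive quantization.
W = rng.normal(size=(64, 64)) * rng.lognormal(0.0, 1.5, size=(64, 1))

W_norm, r, c = presinq_reparameterize(W)
assert np.allclose(W, r[:, None] * W_norm * c[None, :])  # exact round trip

err_plain = np.mean((fake_quant(W) - W) ** 2)
err_sinq = np.mean((r[:, None] * fake_quant(W_norm) * c[None, :] - W) ** 2)
print(f"int4 MSE without Pre-SINQ: {err_plain:.5f}, with: {err_sinq:.5f}")
```

Because the balanced matrix has no dominant rows or columns, a single quantization scale fits all of its entries well, which is why the normalized weights survive low-bit quantization better.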

## Available Quantizations

| File | Size | Quality |
|------|------|---------|
| qwen25-0.5b-presinq-f16.gguf | 949M | Perfect (reference) |
| qwen25-0.5b-presinq-q8_0.gguf | 507M | Perfect |
| qwen25-0.5b-presinq-q6_k.gguf | 483M | Perfect |
| qwen25-0.5b-presinq-q5_k_m.gguf | 401M | Perfect |
| qwen25-0.5b-presinq-q5_k_s.gguf | 394M | Perfect |
| qwen25-0.5b-presinq-q5_1.gguf | 400M | Perfect |
| qwen25-0.5b-presinq-q5_0.gguf | 379M | Perfect |
| qwen25-0.5b-presinq-q4_k_m.gguf | 380M | Perfect |
| qwen25-0.5b-presinq-q4_k_s.gguf | 368M | Perfect |
| qwen25-0.5b-presinq-q4_1.gguf | 358M | Good |
| qwen25-0.5b-presinq-q4_0.gguf | 336M | Good |

Every quantization rated Perfect above (Q4_K_S and up) produces output identical to F16; the legacy Q4_1 and Q4_0 variants show only minimal degradation. You can spot-check this yourself, as in the sketch below.
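A minimal sketch of such a spot check: greedy-decode the same prompt from the F16 reference and a quantized file and compare the text. It assumes the llama-cpp-python bindings are installed (`pip install llama-cpp-python`) and that both GGUF files from the table above have been downloaded to the working directory.

```python
from llama_cpp import Llama

PROMPT = "Hello"

def greedy(path):
    """Load a GGUF file and greedy-decode 64 tokens from PROMPT."""
    llm = Llama(model_path=path, n_ctx=512, verbose=False)
    out = llm(PROMPT, max_tokens=64, temperature=0.0)  # temperature 0 = greedy
    return out["choices"][0]["text"]

ref = greedy("qwen25-0.5b-presinq-f16.gguf")
q5 = greedy("qwen25-0.5b-presinq-q5_0.gguf")
print("identical output:", ref == q5)
```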

## Usage with prima.cpp

```bash
# Download a quantization, then run:
llama-cli -m qwen25-0.5b-presinq-q4_k_m.gguf -p "Hello" -n 64
```

## Based On

Qwen2.5-0.5B

## License

Apache 2.0
