
πŸ™ Github   |   πŸ“„ Paper

PreSINQ GGUF Quantized Qwen3-1.7B Model

This repository contains the official PreSINQ GGUF-quantized versions of the Qwen3-1.7B model. For a detailed explanation of the PreSINQ strategy, please refer to the official SINQ repository. SINQ is a fast, high-quality quantization technique designed to significantly reduce Large Language Model size while preserving accuracy.

If you find this project useful, please consider giving a ⭐ to the official SINQ repository.


Model Details

  • Model Name: Qwen3-1.7B-PreSINQ-GGUF
  • Base Model: Qwen/Qwen3-1.7B
  • Task: Text Generation
  • Framework: PyTorch / Transformers
  • License: Apache-2.0
  • Quantized By: Huawei – Computing Systems Lab

How to Obtain the PreSINQ Model

The PreSINQ Qwen3-1.7B models are produced using the PreSINQ GGUF script available in the official SINQ repository.

The models provided here correspond to the best-performing configurations for each quantization type.

📊 Best PreSINQ Quantization Results (Qwen3-1.7B)

Results below are measured on the WikiText-2 test set.

| Method            | Bits  | Size (GB) | Perplexity ↓ |
|-------------------|-------|-----------|--------------|
| Baseline (FP16)   | FP16  | 3.79      | 17.1294      |
| Baseline + Q4_K_S | 4-bit | 1.15      | 19.5454      |
| PreSINQ + Q4_K_S  | 4-bit | 1.01      | 17.4544      |
| Baseline + Q3_K_S | 3-bit | 0.95      | 24.0242      |
| PreSINQ + Q3_K_S  | 3-bit | 0.83      | 18.8032      |
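As a quick sanity check, the headline numbers above can be turned into relative savings. A minimal sketch (all values copied from the table):

```python
# Values copied from the results table above (Qwen3-1.7B, WikiText-2).
fp16_size, fp16_ppl = 3.79, 17.1294           # FP16 baseline
presinq_q4_size, presinq_q4_ppl = 1.01, 17.4544  # PreSINQ + Q4_K_S

size_reduction = 1 - presinq_q4_size / fp16_size  # fraction of FP16 size saved
ppl_increase = presinq_q4_ppl / fp16_ppl - 1      # relative perplexity increase

print(f"Q4_K_S + PreSINQ: {size_reduction:.1%} smaller, {ppl_increase:.1%} higher perplexity")
```

In other words, the PreSINQ Q4_K_S model is roughly 73% smaller than the FP16 baseline at the cost of under 2% higher perplexity, whereas plain Q4_K_S quantization raises perplexity by about 14%.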

However, you can generate good PreSINQ models (though not the best-performing ones) faster by reducing the number of configurations explored during the PreSINQ script execution. The table below shows perplexity for different PreSINQ parameter configurations using Q4_K_S quantization. Evaluation is performed on a 5k-line subset of the Pile validation dataset.

| Group Size | Iterations | Repetitions | Perplexity |
|------------|------------|-------------|------------|
| 32         | 2          | 1           | 11.7196    |
| 32         | 4          | 1           | 11.7238    |
| 32         | 8          | 1           | 11.6885    |
| 32         | 16         | 1           | 11.6909    |
| 64         | 2          | 1           | 11.7421    |
| 64         | 4          | 1           | 11.7240    |
| 64         | 8          | 1           | 11.6975    |
| 64         | 16         | 1           | 11.7001    |
| 128        | 2          | 1           | 11.7129    |
| 128        | 4          | 1           | 11.7118    |
| 128        | 8          | 1           | 11.7149    |
| 128        | 16         | 1           | 11.7208    |
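To see how little the configurations differ, the table can be scanned programmatically. A small sketch (values copied from the table; repetitions fixed at 1):

```python
# Perplexities from the Q4_K_S configuration table above, keyed by (group_size, iterations).
results = {
    (32, 2): 11.7196, (32, 4): 11.7238, (32, 8): 11.6885, (32, 16): 11.6909,
    (64, 2): 11.7421, (64, 4): 11.7240, (64, 8): 11.6975, (64, 16): 11.7001,
    (128, 2): 11.7129, (128, 4): 11.7118, (128, 8): 11.7149, (128, 16): 11.7208,
}

# Pick the configuration with the lowest perplexity.
(group_size, iterations), best_ppl = min(results.items(), key=lambda kv: kv[1])
spread = max(results.values()) - min(results.values())

print(group_size, iterations, best_ppl)  # group_size=32, iterations=8 wins here
print(f"spread across all configs: {spread:.4f}")
```

The spread between the best and worst configuration is about 0.05 perplexity, i.e. under 0.5%, which is why a reduced search is a reasonable trade-off.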

🚀 Usage

Usage Example

You can load and run the PreSINQ GGUF models using:

  • 🤗 Transformers
  • llama.cpp
  • Any GGUF-compatible inference framework
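For example, with llama.cpp's CLI tools. The GGUF filename below is a placeholder, not the actual file name — check the repository's file list and substitute accordingly:

```shell
# Download a PreSINQ GGUF file from the Hub
# (<q4_k_s-file>.gguf is a placeholder; use the actual filename from the repo)
huggingface-cli download huawei-csl/Qwen3-1.7B-PreSINQ-GGUF \
    <q4_k_s-file>.gguf --local-dir .

# Run a quick generation with llama.cpp
llama-cli -m <q4_k_s-file>.gguf \
    -p "Explain quantization in one sentence." -n 64
```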

🧾 How to Cite This Work

If you find SINQ useful in your research or applications:

  • Please give a ⭐ to the official SINQ repository
  • Cite our paper:
@misc{muller2025sinq,
      title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights}, 
      author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
      year={2025},
      eprint={2509.22944},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={http://arxiv.org/abs/2509.22944}
}