
πŸ™ Github   |   πŸ“„ Paper

PreSINQ GGUF Quantized Qwen3-1.7B Model

This repository contains the official PreSINQ GGUF-quantized versions of the Qwen3-1.7B model. For a detailed explanation of the PreSINQ strategy, please refer to the official SINQ repository. SINQ is a fast, high-quality quantization technique designed to significantly reduce Large Language Model size while preserving accuracy.

If you find this project useful, please consider giving a ⭐ to the official SINQ repository.


Model Details

  • Model Name: Qwen3-1.7B-PreSINQ-GGUF
  • Base Model: Qwen/Qwen3-1.7B
  • Task: Text Generation
  • Framework: PyTorch / Transformers
  • License: Apache-2.0
  • Quantized By: Huawei – Computing Systems Lab

How to Obtain the PreSINQ Model

The PreSINQ Qwen3-1.7B models are produced using the PreSINQ GGUF script available in the official SINQ repository.

The models provided here correspond to the best-performing configurations for each quantization type.

📊 Best PreSINQ Quantization Results (Qwen3-1.7B)

Results below are measured on the WikiText-2 test set.

| Method            | Bits  | Size (GB) | Perplexity ↓ |
|-------------------|-------|-----------|--------------|
| Baseline (FP16)   | FP16  | 3.79      | 17.1294      |
| Baseline + Q4_K_S | 4-bit | 1.15      | 19.5454      |
| PreSINQ + Q4_K_S  | 4-bit | 1.01      | 17.4544      |
| Baseline + Q3_K_S | 3-bit | 0.95      | 24.0242      |
| PreSINQ + Q3_K_S  | 3-bit | 0.83      | 18.8032      |
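As a quick sanity check, the headline numbers above can be turned into relative savings. A minimal sketch (all values copied from the table):

```python
# Values copied from the results table above (Qwen3-1.7B, WikiText-2).
fp16_size, fp16_ppl = 3.79, 17.1294           # FP16 baseline
presinq_q4_size, presinq_q4_ppl = 1.01, 17.4544  # PreSINQ + Q4_K_S

size_reduction = 1 - presinq_q4_size / fp16_size  # fraction of FP16 size saved
ppl_increase = presinq_q4_ppl / fp16_ppl - 1      # relative perplexity increase

print(f"Q4_K_S + PreSINQ: {size_reduction:.1%} smaller, {ppl_increase:.1%} higher perplexity")
```

In other words, the PreSINQ Q4_K_S model is roughly 73% smaller than the FP16 baseline at the cost of under 2% higher perplexity, whereas plain Q4_K_S quantization raises perplexity by about 14%.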

However, you can generate good PreSINQ models (though not the best-performing ones) faster by reducing the number of configurations explored during the PreSINQ script execution. The table below shows perplexity for different PreSINQ parameter configurations using Q4_K_S quantization. Evaluation is performed on a 5k-line subset of the Pile validation dataset.

| Group Size | Iterations | Repetitions | Perplexity |
|------------|------------|-------------|------------|
| 32         | 2          | 1           | 11.7196    |
| 32         | 4          | 1           | 11.7238    |
| 32         | 8          | 1           | 11.6885    |
| 32         | 16         | 1           | 11.6909    |
| 64         | 2          | 1           | 11.7421    |
| 64         | 4          | 1           | 11.7240    |
| 64         | 8          | 1           | 11.6975    |
| 64         | 16         | 1           | 11.7001    |
| 128        | 2          | 1           | 11.7129    |
| 128        | 4          | 1           | 11.7118    |
| 128        | 8          | 1           | 11.7149    |
| 128        | 16         | 1           | 11.7208    |
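To see how little the configurations differ, the table can be scanned programmatically. A small sketch (values copied from the table; repetitions fixed at 1):

```python
# Perplexities from the Q4_K_S configuration table above, keyed by (group_size, iterations).
results = {
    (32, 2): 11.7196, (32, 4): 11.7238, (32, 8): 11.6885, (32, 16): 11.6909,
    (64, 2): 11.7421, (64, 4): 11.7240, (64, 8): 11.6975, (64, 16): 11.7001,
    (128, 2): 11.7129, (128, 4): 11.7118, (128, 8): 11.7149, (128, 16): 11.7208,
}

# Pick the configuration with the lowest perplexity.
(group_size, iterations), best_ppl = min(results.items(), key=lambda kv: kv[1])
spread = max(results.values()) - min(results.values())

print(group_size, iterations, best_ppl)  # group_size=32, iterations=8 wins here
print(f"spread across all configs: {spread:.4f}")
```

The spread between the best and worst configuration is about 0.05 perplexity, i.e. under 0.5%, which is why a reduced search is a reasonable trade-off.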

🚀 Usage

Usage Example

You can load and run the PreSINQ GGUF models using:

  • 🤗 Transformers
  • llama.cpp
  • Any GGUF-compatible inference framework
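For example, with llama.cpp's CLI tools. The GGUF filename below is a placeholder, not the actual file name — check the repository's file list and substitute accordingly:

```shell
# Download a PreSINQ GGUF file from the Hub
# (<q4_k_s-file>.gguf is a placeholder; use the actual filename from the repo)
huggingface-cli download huawei-csl/Qwen3-1.7B-PreSINQ-GGUF \
    <q4_k_s-file>.gguf --local-dir .

# Run a quick generation with llama.cpp
llama-cli -m <q4_k_s-file>.gguf \
    -p "Explain quantization in one sentence." -n 64
```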

🧾 How to Cite This Work

If you find SINQ useful in your research or applications:

  • Please give a ⭐ to the official SINQ repository
  • Cite our paper:
@misc{muller2025sinq,
      title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights}, 
      author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
      year={2025},
      eprint={2509.22944},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={http://arxiv.org/abs/2509.22944}
}