🌟 Qwen3.5-9B-GLM5.1-Distill-v1-INT8-FOEM
This is an unofficial quantized version of Qwen3.5-9B-GLM5.1-Distill-v1.
🧠 Quantization Framework
GPTQModel
🗺️ Quantization Method
FOEM is a quantization method that improves on GPTQ by compensating for the first-order quantization error (see the citation below). The resulting model preserves the same inference structure as GPTQ, ensuring compatibility with existing deployment pipelines while achieving better accuracy.
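For intuition, the sketch below shows plain symmetric per-channel INT8 weight quantization, the general round-to-nearest scheme that GPTQ-format INT8 checkpoints build on. This is an illustrative simplification only: the actual FOEM/GPTQ algorithms additionally compensate quantization error using calibration data, which is not shown here.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-row (per-output-channel) INT8 quantization."""
    # One scale per output channel, chosen so the largest weight maps to 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float weight matrix from INT8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)   # toy weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
# Per-element reconstruction error is bounded by half the row's scale.
err = np.abs(w - w_hat).max()
```

Methods like GPTQ and FOEM keep this storage format (INT8 codes plus per-channel scales) but choose the codes more carefully to minimize the layer's output error rather than the per-weight rounding error.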
📚 Calibration Dataset
We randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered as the calibration set.
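The sampling step can be sketched as below. This is a minimal stand-in: the real pipeline would load nohurry/Opus-4.6-Reasoning-3000x-filtered (e.g. via the Hugging Face `datasets` library) rather than the synthetic corpus used here, and the seed value is an illustrative assumption.

```python
import random

NUM_CALIBRATION_SAMPLES = 512

# Stand-in for the ~3000-row reasoning dataset; replace with the
# actual dataset rows when reproducing the calibration step.
corpus = [f"example-{i}" for i in range(3000)]

rng = random.Random(42)  # fixed seed so the calibration set is reproducible
calibration_set = rng.sample(corpus, NUM_CALIBRATION_SAMPLES)  # without replacement
```

Sampling without replacement keeps the 512 calibration examples distinct, which avoids over-weighting any single prompt during error compensation.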
📋 Usage Example
This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.
Example evaluation command:
lm-eval --model vllm --model_args pretrained=models/gptqmodel/Qwen3.5-9B-GLM5.1-Distill-v1-INT8-FOEM,tensor_parallel_size=1,gpu_memory_utilization=0.45 --tasks wikitext --batch_size 1
⚠️ Limitations & Intended Use
(Adapted from the original repository of Jackrong/Qwopus3.5-27B-v3)
- Hallucination Risk: Although its reasoning is strong, the model remains an autoregressive LLM; factual claims made during the thinking sequence may contain hallucinations, especially when verifying real-world events.
- Intended Scenario: Best suited for offline analytical tasks, coding, math, and logic-heavy prompting where the user needs to transparently follow the model's internal reasoning.
- Experimental Release: This model is a test version intended for learning, demonstration, academic research, and technical exploration only.
- Developer Disclaimer: This is an independent, personal project. Because the developer lacks the specialized resources and infrastructure of a large industrial lab, the model's chain of thought (CoT) may occasionally exhibit instability, logic loops, or reasoning drift. Please keep these experimental limitations in mind when using the model.
🙏 Acknowledgements
Special thanks to Jackrong for providing the original model: Qwen3.5-9B-GLM5.1-Distill-v1.
📖 Citation
If you use this model in your research or projects, please cite:
@misc{jackrong_qwen35_9b_glm51_distill_v1,
  title = {Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1},
  author = {Jackrong},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwen3.5-9B-GLM5.1-Distill-v1}}
}
@misc{qubitium2024gptqmodel,
  author = {ModelCloud.ai and qubitium@modelcloud.ai},
  title = {GPTQModel},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/modelcloud/gptqmodel}},
  note = {Contact: qubitium@modelcloud.ai},
  year = {2024}
}
@inproceedings{zheng2026first,
  title = {First-order error matters: Accurate compensation for quantized large language models},
  author = {Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {40},
  number = {34},
  pages = {28883--28891},
  year = {2026}
}