FOEM Quantization
This is an unofficial quantized version of Qwopus3.5-9B-v3.
FOEM is an improved quantization method over GPTQ. The resulting model preserves the same inference structure as GPTQ, ensuring compatibility with existing deployment pipelines while achieving better accuracy.
For calibration, we randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.
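For example, the model can be served with vLLM's OpenAI-compatible server. The local path below is a hypothetical download location; substitute your own model directory or Hub ID. The parallelism and memory settings mirror the evaluation command further down and are only a starting point:

```shell
# Serve the quantized model with vLLM (path is illustrative).
# GPTQ-format checkpoints are detected automatically by vLLM.
vllm serve models/gptqmodel/Qwopus3.5-9B-v3-INT8-FOEM \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.45
```

Once running, the server exposes standard `/v1/chat/completions` and `/v1/completions` endpoints, so any OpenAI-compatible client can query it.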
Example evaluation command:
lm-eval --model vllm \
  --model_args pretrained=models/gptqmodel/Qwopus3.5-9B-v3-INT8-FOEM,tensor_parallel_size=1,gpu_memory_utilization=0.45 \
  --tasks wikitext \
  --batch_size 1
(Adapted from the original repository of Jackrong/Qwopus3.5-9B-v3)
Special thanks to Jackrong for providing the original model: Qwopus3.5-9B-v3.
If you use this model in your research or projects, please cite:
@misc{jackrong_qwen35_9b_v3,
  title        = {Jackrong/Qwopus3.5-9B-v3},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwopus3.5-9B-v3}}
}
@misc{qubitium2024gptqmodel,
  author       = {ModelCloud.ai and qubitium@modelcloud.ai},
  title        = {GPTQModel},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/modelcloud/gptqmodel}},
  note         = {Contact: qubitium@modelcloud.ai},
  year         = {2024}
}
@inproceedings{zheng2026first,
  title     = {First-Order Error Matters: Accurate Compensation for Quantized Large Language Models},
  author    = {Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume    = {40},
  number    = {34},
  pages     = {28883--28891},
  year      = {2026}
}
Base model: Qwen/Qwen3.5-9B-Base