🌟 Gemopus-4-E4B-it-INT8-FOEM

This is an unofficial quantized version of Gemopus-4-E4B-it.

🧠 Quantization Framework

GPTQModel

๐Ÿ—บ๏ธ Quantization Method

FOEM (AAAI 2026)

FOEM is a quantization method that improves on GPTQ by compensating quantization error more accurately. The resulting model preserves the same inference structure as a GPTQ-quantized model, ensuring compatibility with existing deployment pipelines while achieving better accuracy.
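
For context, a rough sketch (in our own notation, not the paper's) of the layer-wise reconstruction objective that GPTQ-style methods, FOEM included, approximately minimize over calibration inputs X:

\min_{\widehat{W}} \; \lVert W X - \widehat{W} X \rVert_F^2, \quad \widehat{W} \text{ quantized to INT8}

GPTQ compensates this error column by column using second-order (Hessian) information; FOEM, as its title suggests, additionally corrects for the first-order error term while keeping the same quantized-weight layout, which is why the checkpoint loads like any other GPTQ model.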

📚 Calibration Dataset

We randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
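
For reference, a minimal sketch of how such a calibration sample could be drawn with the Hugging Face datasets library; the split name, seed, and text field are illustrative assumptions rather than the exact recipe used here:

from datasets import load_dataset

# Load the calibration source and draw 512 random examples.
ds = load_dataset("nohurry/Opus-4.6-Reasoning-3000x-filtered", split="train")
calib = ds.shuffle(seed=0).select(range(512))
calibration_texts = [row["text"] for row in calib]  # "text" is a hypothetical field name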

📋 Usage Example

This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.

Example evaluation command:

lm-eval --model vllm --model_args pretrained=models/gptqmodel/Gemopus-4-E4B-it-INT8-FOEM,tensor_parallel_size=1,gpu_memory_utilization=0.45 --tasks wikitext --batch_size 1
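
Beyond evaluation, a minimal offline-inference sketch using vLLM's Python API (prompt, sampling parameters, and memory settings are illustrative assumptions):

from vllm import LLM, SamplingParams

# vLLM typically detects the GPTQ-format quantization from the checkpoint's config.
llm = LLM(model="Xingyu-Zheng/Gemopus-4-E4B-it-INT8-FOEM", gpu_memory_utilization=0.45)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the benefits of on-device inference."], params)
print(outputs[0].outputs[0].text)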

โš ๏ธ Limitations & Usage Recommendations

(Adapted from the original repository of Jackrong/Gemopus-4-E4B-it)

  • Compute & Knowledge Boundaries: This model is designed for fast local inference on edge devices such as thin-and-light laptops and smartphones. Because of its smaller parameter count, its breadth of world knowledge and depth of multi-step reasoning cannot match those of much larger cloud-hosted models.
  • Potential Hallucinations: Hallucinations may still occur on very obscure domains, niche knowledge, or complex math problems that require long chains of multi-step calculation.
  • Best Practices: The model is best used as a local assistant for high-frequency text processing: everyday copywriting, code completion, formatting, and summarization, especially in privacy-sensitive or latency-sensitive scenarios.
  • Disclaimer: These are experimental weights, independently optimized for on-device interaction. They are provided for local deployment testing and academic exchange.

๐Ÿ™ Acknowledgements

Special thanks to Jackrong for providing the original model: Gemopus-4-E4B-it.

📖 Citation

If you use this model in your research or projects, please cite:

@misc{jackrong_gemopus_4_e4b_it,
  title        = {Jackrong/Gemopus-4-E4B-it},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Gemopus-4-E4B-it}}
}

@misc{qubitium2024gptqmodel,
  title        = {GPTQModel},
  author       = {ModelCloud.ai and qubitium@modelcloud.ai},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/modelcloud/gptqmodel}},
  note         = {Contact: qubitium@modelcloud.ai}
}

@inproceedings{zheng2026first,
  title        = {First-order error matters: Accurate compensation for quantized large language models},
  author       = {Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle    = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume       = {40},
  number       = {34},
  pages        = {28883--28891},
  year         = {2026}
}