🌟 Gemopus-4-E4B-it-INT8-FOEM

This is an unofficial quantized version of Gemopus-4-E4B-it.

🧠 Quantization Framework

GPTQModel

๐Ÿ—บ๏ธ Quantization Method

FOEM (AAAI 2026)

FOEM is a quantization method that improves on GPTQ by compensating quantization error more accurately. The resulting model preserves the same inference structure as a GPTQ-quantized model, ensuring compatibility with existing deployment pipelines while achieving better accuracy.
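
For context, a rough sketch (in our own notation, not the paper's) of the layer-wise reconstruction objective that GPTQ-style methods, FOEM included, approximately minimize over calibration inputs X:

\min_{\widehat{W}} \; \lVert W X - \widehat{W} X \rVert_F^2, \quad \widehat{W} \text{ quantized to INT8}

GPTQ compensates this error column by column using second-order (Hessian) information; FOEM, as its title suggests, additionally corrects for the first-order error term while keeping the same quantized-weight layout, which is why the checkpoint loads like any other GPTQ model.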

📚 Calibration Dataset

We randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
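
For reference, a minimal sketch of how such a calibration sample could be drawn with the Hugging Face datasets library; the split name, seed, and text field are illustrative assumptions rather than the exact recipe used here:

from datasets import load_dataset

# Load the calibration source and draw 512 random examples.
ds = load_dataset("nohurry/Opus-4.6-Reasoning-3000x-filtered", split="train")
calib = ds.shuffle(seed=0).select(range(512))
calibration_texts = [row["text"] for row in calib]  # "text" is a hypothetical field name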

📋 Usage Example

This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.

Example evaluation command:

lm-eval --model vllm --model_args pretrained=models/gptqmodel/Gemopus-4-E4B-it-INT8-FOEM,tensor_parallel_size=1,gpu_memory_utilization=0.45 --tasks wikitext --batch_size 1
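
Beyond evaluation, a minimal offline-inference sketch using vLLM's Python API (prompt, sampling parameters, and memory settings are illustrative assumptions):

from vllm import LLM, SamplingParams

# vLLM typically detects the GPTQ-format quantization from the checkpoint's config.
llm = LLM(model="Xingyu-Zheng/Gemopus-4-E4B-it-INT8-FOEM", gpu_memory_utilization=0.45)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize the benefits of on-device inference."], params)
print(outputs[0].outputs[0].text)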

โš ๏ธ Limitations & Usage Recommendations

(Adapted from the original repository of Jackrong/Gemopus-4-E4B-it)

  • Compute & Knowledge Boundaries: This model is designed for fast local inference on edge devices such as thin-and-light laptops and smartphones. Because of its smaller parameter count, its breadth of world knowledge and depth of multi-step reasoning cannot match those of much larger cloud-hosted models.
  • Potential Hallucinations: Hallucinations may still occur on very obscure domains, niche knowledge, or complex math problems that require long chains of multi-step calculation.
  • Best Practices: The model is best used as a local assistant for high-frequency text processing: everyday copywriting, code completion, formatting, and summarization, especially in privacy-sensitive or latency-sensitive scenarios.
  • Disclaimer: These are experimental weights, independently optimized for on-device interaction. They are provided for local deployment testing and academic exchange.

๐Ÿ™ Acknowledgements

Special thanks to Jackrong for providing the original model: Gemopus-4-E4B-it.

📖 Citation

If you use this model in your research or projects, please cite:

@misc{jackrong_gemopus_4_e4b_it,
  title        = {Jackrong/Gemopus-4-E4B-it},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Gemopus-4-E4B-it}}
}

@misc{qubitium2024gptqmodel,
  title        = {GPTQModel},
  author       = {ModelCloud.ai and qubitium@modelcloud.ai},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/modelcloud/gptqmodel}},
  note         = {Contact: qubitium@modelcloud.ai}
}

@inproceedings{zheng2026first,
  title        = {First-order error matters: Accurate compensation for quantized large language models},
  author       = {Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle    = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume       = {40},
  number       = {34},
  pages        = {28883--28891},
  year         = {2026}
}