🌟 Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-INT4-FOEM

This is an unofficial quantized version of Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled.

🧠 Quantization Framework

GPTQModel

🗺️ Quantization Method

FOEM (AAAI 2026)

FOEM improves on GPTQ while preserving the same inference structure, so the quantized model remains compatible with existing GPTQ deployment pipelines while achieving better accuracy.
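As background, GPTQ-style methods minimize a layer-wise loss via a second-order (Taylor) approximation. The sketch below is our reading of the general setup, with our own notation; it is not taken from the FOEM paper and omits FOEM's specific compensation formula.

```latex
% Effect of a weight perturbation \Delta w (quantization error) on the loss:
%   \Delta \mathcal{L} \approx g^\top \Delta w + \tfrac{1}{2}\,\Delta w^\top H\,\Delta w
% GPTQ keeps only the second-order (Hessian) term, implicitly assuming the
% gradient g \approx 0. FOEM's premise, per its title, is that this first-order
% term is not negligible and should be compensated as well, while keeping the
% same per-column quantization structure as GPTQ.
\Delta \mathcal{L}(\Delta w) \;\approx\; g^{\top} \Delta w \;+\; \tfrac{1}{2}\, \Delta w^{\top} H \, \Delta w
```

Because the compensation happens entirely at quantization time, the exported checkpoint is structurally identical to a GPTQ model.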

📚 Calibration Dataset

We randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
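For reference, calibration sampling of this kind is typically just a seeded random draw of distinct examples. The sketch below is illustrative only, not the exact script used: the dataset is mocked with a small list (in practice you would load nohurry/Opus-4.6-Reasoning-3000x-filtered via `datasets.load_dataset`).

```python
import random

def sample_calibration(dataset, n=512, seed=0):
    """Draw n distinct calibration examples from a list-like dataset."""
    rng = random.Random(seed)              # seeded for reproducibility
    idx = rng.sample(range(len(dataset)), n)  # n distinct indices
    return [dataset[i] for i in idx]

# Stand-in for the real filtered reasoning dataset (~3000 examples).
mock_dataset = [{"text": f"example {i}"} for i in range(3000)]
calib = sample_calibration(mock_dataset, n=512)
print(len(calib))  # 512
```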

📋 Usage Example

This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.

Example evaluation command:

lm-eval --model vllm \
  --model_args pretrained=models/gptqmodel/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-int4-foem,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.45 \
  --tasks wikitext \
  --batch_size 1
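Once the model is served (e.g. with `vllm serve <model-path>`), it can be queried through vLLM's OpenAI-compatible API. The sketch below only builds and prints the request payload; the actual HTTP call is left commented out because it requires a running server, and the endpoint URL and model path are illustrative assumptions.

```python
import json
import urllib.request

# Illustrative values: adjust to your local checkpoint path and server address.
MODEL = "models/gptqmodel/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-int4-foem"
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt, model=MODEL, max_tokens=512):
    """Assemble an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Prove that sqrt(2) is irrational.")
print(json.dumps(payload, indent=2))

# With a server running, send the request like this:
# req = urllib.request.Request(
#     ENDPOINT,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# resp = json.loads(urllib.request.urlopen(req).read())
# print(resp["choices"][0]["message"]["content"])
```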

Note: Running the latest version of vLLM directly may lead to errors. Please refer to the following discussion for a workaround: https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled/discussions/68

⚠️ Limitations & Intended Use

(Adapted from the original repository of Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled)

  • Hallucination Risk: Strong reasoning does not eliminate hallucination; the model remains an autoregressive LLM, and factual claims made within its thinking traces may be wrong, especially when it appears to verify real-world events.
  • Intended Scenario: Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
  • Preview Version Notice: Because this model is new and intentionally lightweight, the surrounding ecosystem (inference templates, fine-tuning pipelines, routing configurations, and tooling integrations) is not yet fully mature or standardized. Users may therefore encounter occasional bugs, compatibility inconsistencies, or integration edge cases. Treat this release as a preview build while the supporting stack continues to stabilize.

🙏 Acknowledgements

Special thanks to Jackrong for providing the original model: Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

📖 Citation

If you use this model in your research or projects, please cite:

@misc{jackrong_qwen35_opus_distilled,
  title        = {Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled},
  author       = {Jackrong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled}}
}
@misc{qubitium2024gptqmodel,
  title        = {GPTQModel},
  author       = {ModelCloud.ai and qubitium@modelcloud.ai},
  year         = {2024},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/modelcloud/gptqmodel}},
  note         = {Contact: qubitium@modelcloud.ai}
}
@inproceedings{zheng2026first,
  title        = {First-order error matters: Accurate compensation for quantized large language models},
  author       = {Zheng, Xingyu and Qin, Haotong and Li, Yuye and Chu, Haoran and Wang, Jiakai and Guo, Jinyang and Magno, Michele and Liu, Xianglong},
  booktitle    = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume       = {40},
  number       = {34},
  pages        = {28883--28891},
  year         = {2026}
}