FOEM Quantization
The 31B version may raise an error at language_model.model.layers.5.self_attn.qkv_proj when deployed with vLLM.
This issue appears to originate from GPTQModel, since it does not occur in the E2B version. We are investigating and working on a fix.
FOEM is an improved quantization method built on GPTQ. The resulting model keeps the same inference structure as a GPTQ model, so it remains compatible with existing deployment pipelines while achieving better accuracy.
For calibration, we randomly sampled 512 examples from nohurry/Opus-4.6-Reasoning-3000x-filtered.
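The calibration-set selection described above can be sketched in plain Python. This is an illustrative reimplementation, not the actual script used for this release; the corpus contents and the seed are assumptions made for the example.

```python
import random

def sample_calibration(examples, n=512, seed=0):
    """Draw n calibration examples uniformly at random, without replacement.

    A fixed seed makes the selection reproducible across runs.
    """
    rng = random.Random(seed)
    return rng.sample(examples, k=min(n, len(examples)))

# Stand-in for the real dataset (nohurry/Opus-4.6-Reasoning-3000x-filtered),
# which would normally be loaded with a dataset library.
corpus = [f"example-{i}" for i in range(3000)]
calib = sample_calibration(corpus)
print(len(calib))  # 512
```

Sampling without replacement keeps the calibration set free of duplicates, and a fixed seed makes the quantization run reproducible.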
This model can be deployed using standard frameworks such as vLLM, just like other GPTQModel-quantized models.
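As a deployment sketch, a GPTQ-format checkpoint like this one can be served with vLLM's OpenAI-compatible server. The model ID placeholder and the port below are illustrative; substitute the actual quantized repository name.

```shell
# Serve the quantized checkpoint with vLLM (OpenAI-compatible API).
# <quantized-model-repo> is a placeholder for the actual model ID.
vllm serve <quantized-model-repo> \
  --quantization gptq \
  --port 8000
```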
Base model
google/gemma-4-31B-it