Comparing with Official GPTQ-Int4 quantized model?
#6
by haili-tian - opened
This quantization approach is essentially the same as the officially released GPTQ-Int4: only the routed experts are quantized, while the rest of the weights remain in BF16/FP16.
May I ask:
- Where do the original weights come from: Qwen3.5's BF16 model, the GPTQ-Int4 model, or something else (e.g., one of Unsloth's quantized GGUFs)?
- Have any benchmarks been run comparing these quants (MXFP4_MOE_BF16/MXFP4_MOE_FP16) with the Qwen3.5 GPTQ-Int4 model?
I used Unsloth's BF16 model as the source. AFAIK there are no benchmarks against GPTQ; I only run llama.cpp on my machine.