Requested 122B-A10B-FP8-abliterated please and thanks
will do it later this week, thx for asking!
Hi Steve, thanks for the GGUF – really appreciate the work on abliterix.
I'm also looking for an FP8 safetensors version specifically for vLLM serving (need parallel tool calling + continuous batching for production).
I've spent a few weeks trying to quantize both heretic and abliterix to FP8 on 2×H100 80GB. Here's what I found:
- vLLM's on-the-fly `--quantization fp8` works perfectly with heretic (good quality, full inference), but not with abliterix
- llm-compressor offline FP8 produces a checkpoint, but it gives garbled output: the MoE gates, GatedDeltaNet attention, and shared experts have to stay in BF16 (an 8-pattern ignore list is required). There's also a transformers version conflict: llm-compressor pins ≤4.57, but qwen3_5_moe needs ≥5.2
- `save_sharded_state` can't persist the on-the-fly FP8 to disk: vLLM doesn't save the FP8 attention scales (`q_scale`, `k_scale`, `v_scale`)
- Qwen has no official self-quantization guide; they only provide a pre-quantized base (censored) model
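For reference, the on-the-fly path that works with heretic is just a standard `vllm serve` invocation; the model ID and tool-call parser below are placeholders, not the exact values from my runs:

```shell
# On-the-fly FP8: weights are quantized at load time, nothing is written to disk.
# Model ID and parser name are illustrative -- substitute your actual checkpoint.
vllm serve some-org/heretic-checkpoint \
  --quantization fp8 \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```

This is the setup that gives the parallel tool calling + continuous batching combination I need in production, which is why losing it on abliterix hurts.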
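To illustrate the ignore-list point: llm-compressor takes an `ignore` list where plain strings match module names and `re:`-prefixed entries are regexes. The patterns below are hypothetical stand-ins (the real 8-pattern list for this model differs, and the matching semantics are approximated here); they just show the idea of carving MoE gates, GatedDeltaNet attention, and shared experts out of FP8:

```python
import re

# Hypothetical ignore patterns in llm-compressor's convention:
# "re:" prefix = regex, plain string = exact module name / name suffix.
# NOT the actual 8-pattern list -- illustrative only.
IGNORE = [
    "lm_head",
    "re:.*mlp\\.gate$",      # MoE router gates stay BF16
    "re:.*shared_expert.*",  # shared experts stay BF16
    "re:.*linear_attn.*",    # GatedDeltaNet attention stays BF16
]

def is_ignored(module_name: str) -> bool:
    """Return True if a module would be excluded from FP8 quantization."""
    for pat in IGNORE:
        if pat.startswith("re:"):
            if re.match(pat[3:], module_name):
                return True
        elif module_name == pat or module_name.endswith("." + pat):
            return True
    return False

print(is_ignored("model.layers.0.mlp.gate"))               # True: router kept in BF16
print(is_ignored("model.layers.0.mlp.experts.3.up_proj"))  # False: expert weight gets FP8
```

Without entries like these, the router/attention weights get quantized too, which is exactly where the garbled output comes from.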
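The `save_sharded_state` failure is easy to detect after the fact: just scan the saved tensor names for the attention scales. A minimal sketch (the dummy key list is illustrative of what a sharded save actually contains):

```python
# vLLM's FP8 attention scales live in tensors named *.q_scale / *.k_scale / *.v_scale.
# If a saved checkpoint has none of them, it can't be reloaded as full FP8.
SCALE_SUFFIXES = (".q_scale", ".k_scale", ".v_scale")

def has_attn_scales(keys):
    """True if any FP8 attention scale tensors are present in the key list."""
    return any(k.endswith(SCALE_SUFFIXES) for k in keys)

# Illustrative keys, mimicking what save_sharded_state emitted in my runs:
# weight scales survive, attention scales do not.
saved_keys = [
    "model.layers.0.self_attn.qkv_proj.weight",
    "model.layers.0.self_attn.qkv_proj.weight_scale",
]
print(has_attn_scales(saved_keys))  # False: the q/k/v scales were dropped
```

Running this over the real sharded output is how I confirmed the scales were silently missing rather than misnamed.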
I have working scripts, patches, and a full write-up of everything I tried. Would you be open to connecting to discuss this? I'd love to collaborate on getting a proper FP8 abliterix checkpoint out there.
Happy to share everything I have β just let me know the best way to reach you.
BTW, I got it working myself, so there's no need anymore.