smol-IQ2_XS

#3
by Garf - opened

Would you consider making a llama.cpp compatible one? The 397B version really has excellent performance.

@Garf

What kind of CPU/RAM/GPU(s) rig are you targeting?

Thanks, I've been considering adding a mainline-compatible mix using only legacy quants (e.g. q8_0/q4_0/q4_1), which would likely give the best speed on AMD backends and possibly better quality than MXFP4.

Otherwise, check out AesSedai's quants at https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF; he makes basically the same MoE-optimized recipes as me, but using mainline llama.cpp quantization types.

24G GPU (Nvidia) + 96GB RAM (Zen3). I did find AesSedai's and am using that.

AesSedai removed their IQ2_XS quant, so I'm going to repeat this ask.

@Garf

Hey, sorry, I'm confused: are you looking for Qwen3.5-122B-A10B or Qwen3.5-397B-A17B to fit your 120GB rig?

If you mean 397B, I already have one, right? https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF#smol-iq2_xs-11341-gib-246-bpw
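For a rough sense of whether a quant fits a rig like this, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: the 113.41 GiB and 2.46 BPW figures are read off the link anchor above, the parameter count is taken as a nominal 397B, the rig's "24G" and "96GB" are treated as GiB, and the helper names are made up for illustration. KV cache, compute buffers, and OS memory are not modeled.

```python
GIB = 1024 ** 3  # bytes per GiB


def quant_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GiB at an average bits-per-weight."""
    return n_params * bits_per_weight / 8 / GIB


def headroom_gib(model_gib: float, vram_gib: float, ram_gib: float) -> float:
    """Memory left over (VRAM + RAM combined) after loading the weights."""
    return vram_gib + ram_gib - model_gib


# Assumed values: nominal 397B parameters at 2.46 BPW (from the link anchor).
estimate = quant_size_gib(397e9, 2.46)

# Listed size of 113.41 GiB against an assumed 24 GiB GPU + 96 GiB RAM.
spare = headroom_gib(113.41, vram_gib=24, ram_gib=96)

print(f"estimated size: {estimate:.1f} GiB")
print(f"headroom left:  {spare:.2f} GiB")
```

The estimate lands near the listed size, and the leftover headroom is what has to cover context, buffers, and everything else running on the machine, so it is tight.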

If you mean a mainline-compatible 122B, you can find some of AesSedai's older ones by looking through the history, as I'm pretty sure he hasn't super-squashed the repo yet, e.g.: https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF/tree/c615dde4fb7f7be2e9ec20aef9d29f985bf6554f/IQ2_XXS

Also, bartowski recently re-uploaded a bunch that seem to be pretty good here: https://huggingface.co/bartowski/Qwen_Qwen3.5-122B-A10B-GGUF. They're available in many sizes, and almost all would fit your rig.

Or you can use ik_llama.cpp to run the ones in this repo. I'm using the IQ4_KSS as my "daily driver" for quick questions, limited simple vibe-coding scripts with opencode, etc.

Hopefully I'll have access to my remote quanting rig again soon; it's down for maintenance tonight.
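If it helps, here is a minimal Python sketch for pulling a folder from a pinned commit like the one linked above. `parse_tree_url` and `download_pinned` are hypothetical helper names; the only library call is `snapshot_download` from `huggingface_hub` with its `repo_id`, `revision`, `allow_patterns`, and `local_dir` arguments. The parsing assumes the revision is a commit hash or a branch name without slashes.

```python
from urllib.parse import urlparse


def parse_tree_url(url: str) -> dict:
    """Split a Hugging Face .../tree/<revision>/<subfolder> URL into download arguments."""
    parts = urlparse(url).path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/tree/<revision>[/<subfolder>...]
    if len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"not a repo tree URL: {url}")
    subfolder = "/".join(parts[4:])
    return {
        "repo_id": f"{parts[0]}/{parts[1]}",
        "revision": parts[3],
        # Restrict the download to the linked folder, if one was given.
        "allow_patterns": [f"{subfolder}/*"] if subfolder else None,
    }


def download_pinned(url: str, local_dir: str) -> str:
    """Download only the linked folder at the pinned revision (needs network access)."""
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(local_dir=local_dir, **parse_tree_url(url))


if __name__ == "__main__":
    link = (
        "https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF/tree/"
        "c615dde4fb7f7be2e9ec20aef9d29f985bf6554f/IQ2_XXS"
    )
    print(parse_tree_url(link))
    # download_pinned(link, "./models")  # uncomment to actually fetch the files
```

Pinning the revision to the commit hash means the files stay reachable even if they are later deleted from `main`, as long as the repo history is not squashed.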

The 122B; I already have the 397B. Didn't realize I could get the AesSedai one from history!

@Garf I did upload a new Qwen3.5-122B-A10B IQ2_XXS fused gate+up last night by request (https://huggingface.co/AesSedai/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF/discussions/2#69b3647d52c1a73738445cc5)

It's available here: https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF/tree/main/IQ2_XXS so no need to go digging through the history :)
