smol-IQ2_XS

by Garf - opened Feb 25

Discussion

Garf

Feb 25

Would you consider making a llama.cpp compatible one? The 397B version really has excellent performance.

ubergarm

Owner Feb 25

@Garf

What kind of CPU/RAM/GPU(s) rig are you targeting?

Thanks! I've been considering adding a mainline-compatible mix using only legacy quants, e.g. q8_0/q4_0/q4_1, which would likely give the best speed on AMD backends and possibly better quality than MXFP4.

Otherwise, check out https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF, who makes basically the same MoE-optimized recipes as I do, but using mainline llama.cpp quantization types.

Garf

Feb 25

24GB GPU (Nvidia) + 96GB RAM (Zen3). I did find AesSedai's and am using it.

Garf

Mar 12

AesSedai removed their IQ2_XS quant, so I'm repeating my request.

ubergarm

Owner Mar 12

@Garf

Hey, sorry, I'm confused: are you looking for Qwen3.5-122B-A10B or Qwen3.5-397B-A17B to fit your 120GB rig?

If you mean 397B, I already have one, right? https://huggingface.co/ubergarm/Qwen3.5-397B-A17B-GGUF#smol-iq2_xs-11341-gib-246-bpw

If you mean a mainline-compatible 122B, you can find some of AesSedai's older ones by looking through the repo history, since I'm pretty sure he hasn't squashed it yet, e.g.: https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF/tree/c615dde4fb7f7be2e9ec20aef9d29f985bf6554f/IQ2_XXS

Also, bartowski recently re-uploaded a bunch that seem pretty good: https://huggingface.co/bartowski/Qwen_Qwen3.5-122B-A10B-GGUF. They're available in many sizes, and almost all of them would fit your rig.

Or you can use ik_llama.cpp to run the ones in this repo. I'm using the IQ4_KSS as my "daily driver" for quick questions, simple vibe-coded scripts with opencode, etc.

Hopefully I'll have access to my remote quanting rig again soon; it's down for maintenance tonight.

Garf

Mar 12

The 122B; I already have the 397B. I didn't realize I could get the AesSedai one from the history!

AesSedai

Mar 13 · edited Mar 15

@Garf I did upload a new Qwen3.5-122B-A10B IQ2_XXS fused gate+up last night by request (https://huggingface.co/AesSedai/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF/discussions/2#69b3647d52c1a73738445cc5)

It's available here: https://huggingface.co/AesSedai/Qwen3.5-122B-A10B-GGUF/tree/main/IQ2_XXS so no need to go digging through the history :)
