Run Kimi K2.6 Guide!

#5
by danielhanchen - opened

Hey guys! Kimi K2.6 can now run on CPU, GPU and SSD setups! πŸ”₯ We may upload 1-bit and 3-bit quants later depending on their KLD (KL divergence) scores.
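For anyone unfamiliar with KLD scores: they measure how far a quantized model's next-token probability distribution drifts from the full-precision model's. A minimal sketch of the metric itself, using made-up toy probabilities rather than real model outputs:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats between two next-token probability distributions.

    Lower is better: 0 means the quantized model matches the reference exactly.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Toy distributions over a 3-token vocabulary (illustrative only).
full  = [0.70, 0.20, 0.10]   # reference (e.g. BF16) probabilities
quant = [0.65, 0.23, 0.12]   # quantized model's probabilities

print(kl_divergence(full, full))    # 0.0 (identical distributions)
print(kl_divergence(full, quant))   # a small positive number
```

In practice this is averaged over many token positions on a calibration text; a quant whose average KLD stays near zero is considered safe to ship.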

We shrank the SOTA 1T-parameter model to 340 GB via Dynamic GGUFs, where important layers are selectively upcast.
Expect >40 tok/s on setups with 350 GB of combined RAM/VRAM; full precision needs 610 GB.

UD-Q8_K_XL is lossless because Kimi ships its MoE weights in int4 and everything else in BF16, and Q8_K_XL preserves exactly that layout. UD-Q4_K_XL is similar, except the remaining tensors are stored in Q8_0, so it is near full precision and requires 600 GB of RAM/VRAM. Note that other (non-Unsloth) Q8 GGUFs may follow the UD-Q4_K_XL approach rather than the truly lossless UD-Q8_K_XL one.
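As a rough sanity check on these sizes: a GGUF's on-disk footprint is approximately parameters × average bits per weight / 8. A minimal sketch, where the 1T parameter count comes from the post above but the ~2.72 bpw average is back-derived from the 340 GB figure (an illustration, not an official number):

```python
def quant_size_gb(n_params: float, avg_bits_per_weight: float) -> float:
    """Approximate GGUF file size in decimal GB for a given average bit-width."""
    return n_params * avg_bits_per_weight / 8 / 1e9

def avg_bpw(n_params: float, size_gb: float) -> float:
    """Back out the average bits per weight implied by a file size."""
    return size_gb * 1e9 * 8 / n_params

# A 1T-parameter model at ~2.72 bpw lands at roughly the 340 GB quoted above.
print(round(quant_size_gb(1e12, 2.72)))   # 340
print(round(avg_bpw(1e12, 340), 2))       # 2.72
```

The "average" matters because Dynamic GGUFs mix precisions per layer, so the effective bpw sits between the nominal quant levels.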

Guide: https://unsloth.ai/docs/models/kimi-k2.6

[Kimi K2.6 infographic]

Thanks again for staying on top of all the model releases! (Although I'm curious why you never released anything for Step 3.5 Flash?)

Kimi K2.6 is an interesting model in terms of quantization sizes for Mac Studio users: both Q4 and Q8 won't fit on the 512 GB model, while Q2 won't fit on the 256 GB model but leaves lots of headroom on the 512 GB one.

Really curious to see what Q3 looks like.

Also, is there anything that could be done to make UD-Q2 "bigger" (i.e. more accurate), or is UD-Q2_K_XL as good as it gets for a Q2?
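The fit question above comes down to simple arithmetic against unified memory, leaving headroom for the OS, KV cache and compute buffers. A sketch using the sizes mentioned in this thread; the 30 GB overhead is an assumed placeholder, not a measured figure:

```python
# Quant sizes in GB; only these two figures appear in the thread above.
QUANTS = {"UD-Q2_K_XL": 340, "UD-Q4_K_XL": 600}

def fits(size_gb: float, unified_ram_gb: float, overhead_gb: float = 30) -> bool:
    """Does the model fit, leaving headroom for OS, KV cache and buffers?"""
    return size_gb + overhead_gb <= unified_ram_gb

for name, size in QUANTS.items():
    for ram in (256, 512):
        verdict = "fits" if fits(size, ram) else "too big"
        print(f"{name} on a {ram} GB Mac Studio: {verdict}")
```

This reproduces the commenter's observation: Q2 overflows 256 GB but leaves plenty of room on 512 GB, while Q4 overflows 512 GB, which is what makes a hypothetical Q3 interesting for that machine.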
