Guide to run Kimi K2.6 on CPU, GPU and SSD setups! 🔥

#17
by danielhanchen - opened

Hey guys, Kimi K2.6 can now run on CPU, GPU and SSD setups! 🔥 We may upload 1-bit and 3-bit quants later depending on KLD scores.

Kimi K2.6 GGUFs to run: https://huggingface.co/unsloth/Kimi-K2.6-GGUF
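
If you only want one quant from the repo rather than everything, something like the sketch below should work. It assumes `huggingface_hub` is installed and that each quant's files carry the quant name somewhere in their path; the exact repo layout is not confirmed here, so adjust the pattern if needed. `quant_pattern` and `download_quant` are made-up helper names for illustration.

```python
REPO_ID = "unsloth/Kimi-K2.6-GGUF"  # repo from the link above


def quant_pattern(quant_name):
    # Assumption: files for a quant contain its name in the path.
    return f"*{quant_name}*"


def download_quant(quant_name, local_dir):
    # Imported here so the helper above works without the package installed.
    from huggingface_hub import snapshot_download

    # Fetch only the files matching the pattern instead of the whole repo.
    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=[quant_pattern(quant_name)],
    )


# Example call (downloads several hundred GB, so check disk space first):
# download_quant("UD-Q4_K_XL", "Kimi-K2.6-GGUF")
```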

We shrank the SOTA 1T-parameter model to 340GB via Dynamic GGUFs, where important layers are upcast to higher precision.
Runs at >40 tok/s on setups with 350GB of RAM/VRAM. Full precision needs 610GB.

UD-Q8_K_XL is lossless: Kimi uses int4 for the MoE weights and BF16 for everything else, and UD-Q8_K_XL keeps exactly that layout. UD-Q4_K_XL is the same except the remaining (non-MoE) tensors are Q8_0 instead of BF16, so it is near full precision and requires 600GB RAM/VRAM. Other non-Unsloth Q8 GGUFs may follow the UD-Q4_K_XL approach rather than the 'truly lossless' UD-Q8_K_XL.
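
To make the sizing concrete, here is a rough sketch that picks the largest option fitting in a given amount of combined RAM/VRAM. The GB figures are the ones quoted in this post. Pairing UD-Q8_K_XL with the 610GB "full precision" figure is my reading of the post, and the label for the 340GB quant is a placeholder because the post does not name it. `pick_quant` is a hypothetical helper, not part of any tool.

```python
# (label, GB of RAM/VRAM needed), largest first. Figures come from the post.
OPTIONS = [
    ("UD-Q8_K_XL", 610),           # lossless; assumed to be the 610GB case
    ("UD-Q4_K_XL", 600),           # near full precision
    ("340GB dynamic quant", 350),  # placeholder label, name not given in post
]


def pick_quant(total_memory_gb):
    """Return the largest option that fits in memory, or None if none do."""
    for label, needed_gb in OPTIONS:
        if total_memory_gb >= needed_gb:
            return label
    # Nothing fits fully in RAM/VRAM; SSD offload would be the fallback.
    return None


print(pick_quant(384))
```

A `None` result only means the weights will not fit entirely in RAM/VRAM; it says nothing about how fast an SSD-backed setup would run.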

Guide: https://unsloth.ai/docs/models/kimi-k2.6

[Image: Kimi K2.6 infographic]

Interesting, thanks!
