How to Run GLM-5 Locally Guide!
Hey guys, we've made many tutorials covering how to run our dynamic GGUFs and FP8 quants, how to use the model in Claude Code and OpenAI Codex, how to tool-call with the model, and more!
Guide: https://unsloth.ai/docs/models/glm-5
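If you just want a quick starting point before opening the guide, here's a minimal Python sketch of the GGUF flow using huggingface_hub and llama-cpp-python. The repo id, shard filename, and download pattern below are placeholders, not confirmed names — check the actual model page for the real ones.

```python
# Minimal sketch: download a Q4_K_XL GGUF and run it locally.
# "unsloth/GLM-5-GGUF" and the shard filename are hypothetical placeholders.
from huggingface_hub import snapshot_download
from llama_cpp import Llama

# Download only the Q4_K_XL shards (pattern is an assumption; check the repo).
local_dir = snapshot_download(
    repo_id="unsloth/GLM-5-GGUF",
    allow_patterns=["*Q4_K_XL*"],
)

# Load the first shard; llama.cpp picks up the remaining split files itself.
llm = Llama(
    model_path=f"{local_dir}/GLM-5-Q4_K_XL-00001-of-00005.gguf",  # placeholder name
    n_ctx=60_000,      # large context, like the 60k setup mentioned below
    n_gpu_layers=-1,   # offload every layer that fits onto the GPUs
)

print(llm("Explain what a dynamic GGUF quant is.", max_tokens=128)["choices"][0]["text"])
```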

Getting a solid 12 tok/s on my server with Q4_K_XL at 60k context! As always, thanks for the quick quants. This model is great!
What kind of hardware are you running?
3x 3090, 1x 4090, Intel QYFS, ASUS Sage W790E motherboard with 512GB DDR5
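For anyone with a similar mixed-GPU box, here's a hedged llama-cpp-python sketch of splitting the model across four cards. The path and ratios are illustrative assumptions, not a tested config for this exact machine.

```python
# Sketch of a multi-GPU load with llama-cpp-python; the model path is a
# placeholder and the split is an assumption, not a measured setting.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-5-Q4_K_XL-00001-of-00005.gguf",  # placeholder path
    n_ctx=60_000,
    n_gpu_layers=-1,                     # offload as many layers as the cards hold
    tensor_split=[1.0, 1.0, 1.0, 1.0],   # even split: 3090s and the 4090 are all 24GB
    # Whatever doesn't fit in VRAM stays in the 512GB of system DDR5.
)
```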
There's also GLM-5-NVFP4 at under 440GB, which is about the same size but may run much faster than 12 tok/s.
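A rough sketch of serving an NVFP4 checkpoint with vLLM, for comparison. The repo id is a hypothetical placeholder, and the fast NVFP4 path generally needs newer NVIDIA hardware, so treat this as an illustration rather than a drop-in config for the box above.

```python
# Hedged sketch: serving an NVFP4 checkpoint with vLLM.
# "zai-org/GLM-5-NVFP4" is a placeholder repo id, not a confirmed name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-5-NVFP4",  # placeholder repo id
    tensor_parallel_size=4,       # spread the weights across your GPUs
    # vLLM reads the quantization method from the checkpoint's config,
    # so no explicit quantization flag should be needed here.
)

out = llm.generate(["Hello, GLM-5!"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```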