How to Run GLM-5 Locally Guide! πŸ”₯

#4 Β· by danielhanchen (Unsloth AI org) Β· edited Feb 15

Hey guys, we made many tutorials on running our dynamic GGUFs and FP8 quants, plus how to use the model in Claude Code and OpenAI Codex, how to tool-call with the model, and more!

Guide: https://unsloth.ai/docs/models/glm-5
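For reference, a minimal sketch of pulling one of the dynamic GGUFs and serving it with llama.cpp. The repo name, quant tag, and shard filename below are placeholders/assumptions; check the guide for the exact names.

```shell
# Sketch: download a dynamic GGUF quant and serve it with llama.cpp.
# Repo and quant names are assumptions; see the guide for the real ones.
pip install -U "huggingface_hub[cli]"
huggingface-cli download unsloth/GLM-5-GGUF \
    --include "*Q4_K_XL*" \
    --local-dir GLM-5-GGUF

# Serve it; point --model at the first shard of the downloaded quant
# (placeholder filename below). Offload as many layers as your VRAM
# allows with --n-gpu-layers, and set only the context you need.
./llama-server \
    --model GLM-5-GGUF/<first-shard>.gguf \
    --ctx-size 60000 \
    --n-gpu-layers 99 \
    --port 8080
```

Once llama-server is up, it exposes an OpenAI-compatible endpoint on the chosen port, which is what tools like Claude Code and Codex can point at.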


danielhanchen pinned discussion

Getting a solid 12 tk/s on my server with Q4_K_XL at 60k context! As always, thanks for the quick quants. This model is great!

> Getting a solid 12 tk/s on my server with Q4_K_XL at 60k context! As always, thanks for the quick quants. This model is great!

What kind of hardware are you running?

> What kind of hardware are you running?

3x3090, 1x4090, Intel QYFS / ASUS Sage W790E Motherboard with 512GB DDR5

> Getting a solid 12 tk/s on my server with Q4_K_XL at 60k context! As always, thanks for the quick quants. This model is great!

There is a GLM-5-NVFP4 quant under 440 GB, which is about the same size and may run much faster than 12 tk/s.
