IQ4_XS fits a 64GB Mac when max VRAM is set to 61440 MB
Hello. Thank you for the quants.
Just to inform those choosing the right quantization: on a 64GB M1 Max MacBook, I was able to run IQ4_XS after setting `sudo sysctl iogpu.wired_limit_mb=61440` and closing all other high-RAM apps. Context size was set to 65536. llama.cpp crashed when I closed it with Ctrl+C, but up to that moment it worked just fine.
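For reference, the steps above look roughly like this (the model filename is a placeholder; adjust it to the actual GGUF you downloaded):

```shell
# Raise the GPU wired-memory limit to 61440 MB (~61 GB of the 64 GB unified memory).
# Note: this setting resets on reboot, so re-run it after restarting.
sudo sysctl iogpu.wired_limit_mb=61440

# Run llama.cpp with a 65536-token context (-c) on the IQ4_XS quant (-m).
./llama-cli -m model-IQ4_XS.gguf -c 65536
```

Closing other memory-hungry apps first matters here, since the wired limit leaves only a few GB of headroom for the rest of the system.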
Thank you, and thanks to Cerebras, z.ai, Gerganov, and everyone who makes this possible.
Great model.
It would have been really awesome if the Q4 quants other than IQ were under 50GB (for Vulkan inference), but yes, thank you Cerebras, z.ai, Gerganov, and bartowski.
Thanks also to the people who suggested that ggerganov bundle model metadata (the Jinja chat template and sampling parameters such as temp, top_k, etc.) into his revised file format!
(that's me! :)
We might also expect a higher context size to fit once TurboQuant lands in llama.cpp.