any love for 16gb?

#3
by iucpxleps - opened

Any smaller versions for those of us in limbo with 16GB VRAM? :)

This is a MoE model with only ~4B active parameters per token. Instead of offloading layers, offload some experts to the CPU with llama.cpp's `--cpu-moe` or `--n-cpu-moe` flags. I can easily get 40 t/s with the normal gemma4 at Q6 with just 16GB VRAM. (I had other issues slowing me down, so I dropped to IQ4_XS and now get around 80 t/s. It was literally internal VRAM bus bound, not regular VRAM capacity bound.) Still, an IQ4_XS quant would be nice. Going to give this a shot soon with my 5070 Ti.

https://huggingface.co/blog/Doctor-Shotgun/llamacpp-moe-offload-guide
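The approach from the guide above boils down to one flag on the server command line. A minimal sketch, assuming a llama.cpp build recent enough to have `--n-cpu-moe`; the model path, layer count, and context size here are placeholders you'd tune for your own setup:

```shell
# Offload everything to the GPU, but keep the MoE expert tensors of the
# first N layers in system RAM -- experts are the bulk of the weights,
# so this is what lets a big MoE fit in 16GB of VRAM.
llama-server \
  -m ./model-IQ4_XS.gguf \
  -ngl 99 \
  --n-cpu-moe 12 \
  -c 8192
```

Start with a high `--n-cpu-moe` value and lower it until VRAM is nearly full; alternatively, `--cpu-moe` keeps all expert tensors on the CPU.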

Thanks, I will try it that way.
