Just squeeze 4-5 GB more.

#2
by emircanerkul - opened

Why not squeeze a bit more so it fits on 16 GB VRAM GPUs? I tested gemma-4-26B-A4B (the lowest i4 quant) on a 6800 XT and got 70 tps. I'm wondering whether this full model or the MoE one is the better choice.

This quant targets Blackwell FP4 tensor cores (RTX 5090, PRO 6000, etc.), so it wouldn't benefit AMD GPUs anyway. I've already quantized the layers that tolerate it with the least loss. Pushing further might get the weights closer to 16 GB, but you wouldn't have enough VRAM left for the KV cache.
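To see why the KV cache eats the remaining headroom, here is a minimal back-of-the-envelope sketch. The model dimensions below are hypothetical placeholders for illustration, not the actual config of this checkpoint:

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Rough KV-cache size: 2x (keys + values) per layer, per token.

    Assumes FP16/BF16 cache (2 bytes/element) and no cache quantization.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical example config (NOT this model's real dimensions):
size = kv_cache_bytes(num_layers=48, num_kv_heads=8, head_dim=128,
                      context_len=32768)
print(f"{size / 1024**3:.1f} GiB")  # 6.0 GiB for a single 32k-token sequence
```

Even with grouped-query attention keeping the KV-head count low, a long context can claim several GiB on top of the weights, which is why shaving the quant down to exactly 16 GB wouldn't leave usable room.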

I see, thank you.

emircanerkul changed discussion status to closed
