Possible future 32 GB single-GPU variant?

#31
by vanbukin - opened

Your model delivers phenomenal results for its size. If possible, I’d love to see a future version that fits on a single RTX 5090 (32 GB) together with the KV cache. An NVFP4 version would be especially valuable if that is feasible.
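Whether a model "fits on a single RTX 5090 together with the KV cache" comes down to simple arithmetic: quantized weight size plus KV-cache size must stay under the card's 32 GB. Below is a rough sketch of that budget. It assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-weight block, and it ignores per-tensor scales, activations and runtime overhead. The model dimensions used (30B parameters, 48 layers, 8 KV heads, head_dim 128, 32k context) are hypothetical placeholders, not this model's actual configuration.

```python
GB = 1e9  # decimal gigabytes, for readable printing


def nvfp4_weight_bytes(num_params: float, block_size: int = 16) -> float:
    """Approximate weight storage under NVFP4 (assumption: 4-bit values
    plus one 8-bit scale per `block_size` weights; per-tensor scale ignored)."""
    bits_per_weight = 4 + 8 / block_size
    return num_params * bits_per_weight / 8


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """KV cache size: K and V (factor 2) per layer, per KV head, per token.
    bytes_per_elem=2 assumes an FP16/BF16 cache."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical example model, NOT the model discussed in this thread.
weights = nvfp4_weight_bytes(30e9)
kv = kv_cache_bytes(num_layers=48, num_kv_heads=8, head_dim=128, seq_len=32768)
total = weights + kv

print(f"weights: {weights / GB:.2f} GB, KV cache: {kv / GB:.2f} GB, total: {total / GB:.2f} GB")
print("fits in 32 GB:", total < 32 * GB)
```

Under these placeholder numbers the weights take about 16.9 GB and a 32k-token BF16 KV cache about 6.4 GB. Real headroom is smaller once activations and framework overhead are counted.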

Thank you for the great work!

Why a future version rather than this one? As of today there are at least 24 NVFP4 quantizations of this model.

To check: open the "Model Card" tab, click the "Quantizations 254 models" link, and filter by "NVFP4". That gives 24 results.
