GGUF version please

#1
by ryg81 - opened

Please add a GGUF version (e.g., Q4 or Q5) for consumer GPUs.

Thanks a lot for the feedback and for trying out the model. I agree that the ~60 GB VRAM peak is a significant bottleneck for consumer hardware. While GGUF quantization is technically feasible and highly effective at reducing VRAM, my focus has shifted to other projects, so I don't have the bandwidth to maintain GGUF quants right now.
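As a rough back-of-envelope illustration (assuming the ~60 GB peak is dominated by fp16 weights, and ignoring activations and runtime overhead), the weight footprint shrinks roughly in proportion to bits per weight; the effective-bit figures below are approximate and include block-scale overhead:

```python
# Rough estimate of weight memory after GGUF-style quantization.
# Assumption: the ~60 GB fp16 peak is mostly weights; activations and
# runtime buffers are ignored, so treat these as lower bounds.
FP16_GB = 60.0
BITS_FP16 = 16

def quantized_gb(bits_per_weight: float, fp16_gb: float = FP16_GB) -> float:
    """Scale the fp16 weight footprint by the new bits-per-weight ratio."""
    return fp16_gb * bits_per_weight / BITS_FP16

# Approximate effective bits/weight for common GGUF quant types.
for name, bits in [("Q8_0", 8.5), ("Q5_K_M", 5.5), ("Q4_K_M", 4.5)]:
    print(f"{name}: ~{quantized_gb(bits):.1f} GB")
# Q4/Q5 land in the ~17-21 GB range -> within reach of 24 GB consumer GPUs.
```

Actual peak usage depends on the runtime and on which layers stay unquantized, so real numbers will be somewhat higher.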

Since the model weights are fully open source, I highly encourage and welcome anyone in the community to quantize it. If you or anyone else manages to create a GGUF version (e.g., using city96's conversion tools), please let me know; I would be more than happy to feature it and link to your repo on our page.