Thanks for quantizing this model! Could you further quantize it to 3.0bpw?
#1
by blackcat1402 - opened
Hi, thanks for your quick response on this model. To fit it into 32 GB of VRAM, it would be kind of you to quantize a 3.0bpw version in exllamav2 format. Thanks in advance!
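For anyone following along, here is a rough sketch of what the conversion step looks like, driving exllamav2's convert.py from Python. The -i/-o/-cf/-b flags come from the exllamav2 repo's converter; all paths are placeholders, not this repo's actual layout:

```python
# Hypothetical invocation of exllamav2's convert.py for a 3.0bpw EXL2 quant.
# Paths are placeholders; adjust to your local exllamav2 checkout and model dirs.
import subprocess

subprocess.run(
    [
        "python", "convert.py",
        "-i", "/models/source-fp16",    # input: HF-format model dir (assumed path)
        "-o", "/tmp/exl2-work",         # scratch dir for the measurement pass
        "-cf", "/models/model-3.0bpw",  # compiled output dir for the finished quant
        "-b", "3.0",                    # target average bits per weight
    ],
    check=True,  # raise if the converter exits nonzero
)
```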
Would a 2.4 or 2.2 bpw quant fit on a 24 GB card? Would love to try this.
No rush!
Understood. I had mixed results with 2.3bpw LoneStriker quants earlier this year. I need more VRAM for sure.
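To put rough numbers on the VRAM question: weight memory for an EXL2 quant is approximately params × bpw / 8 bytes, plus KV cache and runtime overhead. A minimal sketch, assuming a 70B-class model with Llama-style GQA dimensions (80 layers, 8 KV heads, head dim 128), since the thread doesn't state the parameter count:

```python
# Back-of-the-envelope VRAM estimate for an EXL2 quant. The 70B parameter
# count and the cache dimensions below are assumptions, not read from this repo.

def weight_gib(n_params: float, bpw: float) -> float:
    """Approximate weight memory in GiB at a given bits-per-weight."""
    return n_params * bpw / 8 / 2**30

def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_el=2):
    """KV cache: one K and one V tensor per layer, FP16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_el / 2**30

params = 70e9  # assumed 70B-class model
for bpw in (2.2, 2.4, 3.0):
    total = weight_gib(params, bpw) + kv_cache_gib(80, 8, 128, 8192)
    print(f"{bpw} bpw: ~{total:.1f} GiB + runtime overhead")
```

Under these assumptions 3.0bpw lands around 27 GiB (fits 32 GB, not 24 GB), while 2.2-2.4bpw sits at roughly 20-22 GiB, right at the edge of a 24 GB card once overhead is counted.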
@DTechNation if you'd like, I have an Open WebUI endpoint I run for some friends; it serves the model at 7.0bpw with 90k context.
I could give you access for a week to experiment.
Chat.bigstorm.ai to sign up.
Just let me know!
Closing for inactivity
bigstorm changed discussion status to closed