Looking forward to seeing what size GGUF you can make
#1
by infinityai
I don't know if you've seen this, but it's supposed to let you quantise weights from 16-bit down to 2-bit with next to no loss. Supposedly it's a way to compress the weights almost losslessly.
Can you have a look at it and let me know what you think?
Official repos:
- QuIP (original): https://github.com/Cornell-RelaxML/QuIP
- QuIP# (improved + CUDA kernels): https://github.com/Cornell-RelaxML/quip-sharp
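From what I understand, the core trick is "incoherence processing": rotate the weight matrix with random orthogonal matrices so outliers get spread out evenly, round in the rotated basis, then rotate back. Here's a rough numpy toy of just that idea (my own sketch, not code from the repos; the real methods also do adaptive rounding with Hessian info in QuIP and lattice codebooks / Hadamard transforms in QuIP#, which this skips):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n, rng):
    # QR of a Gaussian matrix (with a sign fix) gives a random rotation
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def quantize_2bit(w):
    # Plain symmetric 2-bit rounding: 4 levels at {-1.5, -0.5, 0.5, 1.5} * scale
    scale = np.abs(w).max() / 1.5
    return (np.clip(np.round(w / scale - 0.5), -2, 1) + 0.5) * scale

# Toy weight matrix with a few large outliers, like real LLM weights have
W = rng.standard_normal((64, 64))
W[rng.random(W.shape) < 0.01] *= 20.0

# Naive 2-bit rounding: the outliers blow up the scale, small weights suffer
W_naive = quantize_2bit(W)

# Incoherence processing: rotate, quantize in the rotated basis, rotate back
U = random_orthogonal(64, rng)
V = random_orthogonal(64, rng)
W_incoh = U.T @ quantize_2bit(U @ W @ V.T) @ V

err = lambda A: np.linalg.norm(W - A) / np.linalg.norm(W)
print("naive 2-bit relative error:     ", err(W_naive))
print("incoherent 2-bit relative error:", err(W_incoh))
```

On my machine the rotated version comes out with a much lower reconstruction error, since the rotation smears the outliers so the quantisation scale isn't dominated by a handful of huge weights. The papers get the rest of the way with smarter rounding, so don't read too much into the toy numbers.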
It would be really cool if we could compress some of the top models by applying this QuIP quantisation technique on top of the REAP pruning.