Can I use this model with TurboQuant 4bit KV cache in your llama.cpp runtime fork?

#2
by Gavin-chen - opened

If I can do this, I can use 128K context in my workflow. Thanks in advance!

Hi, it's on my roadmap, but I'm currently away for some family matters. Turbo4 and Turbo3 from Tom will be incorporated. You can also open an MR if you'd like to contribute.
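In the meantime, upstream llama.cpp already supports quantized KV cache types via the `--cache-type-k` / `--cache-type-v` flags, which may cover the 128K-context use case until TurboQuant lands in the fork. A hedged sketch (flag names are upstream llama.cpp's; whether this fork accepts them, and whether `q4_0` matches TurboQuant's 4-bit scheme, are assumptions):

```shell
# Run llama-cli with a 4-bit-quantized KV cache and 128K context.
# --cache-type-k/-v q4_0 : quantize the K and V caches to 4 bits (upstream llama.cpp option)
# -c 131072              : request a 128K-token context window
# -fa                    : flash attention, typically required for quantized V cache
./llama-cli -m model.gguf -c 131072 -fa --cache-type-k q4_0 --cache-type-v q4_0
```

Note that upstream's `q4_0` KV cache is a generic 4-bit quantization, not TurboQuant specifically, so quality may differ from what the fork eventually ships.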
