GGUF/llama.cpp support
#1
by tcpmux - opened
Would be awesome!
It may already be supported since it's just the Llama architecture. There are GGUFs of the base model uploaded. As long as it doesn't mirror/echo the prompt in the absence of the instruction tuning, it should be a good one.
"It may already be supported since it's just llama architecture"
Sadly, it's not. The model can be converted and quantized, but llama.cpp does not accept the resulting file: it crashes with errors at load time.
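For reference, this is roughly the workflow that fails at the last step. A sketch only: the paths are placeholders, and the script/binary names (`convert_hf_to_gguf.py`, `llama-quantize`, `llama-cli`) assume a recent llama.cpp checkout; older versions used `convert.py` and `quantize`.

```shell
# Convert the Hugging Face checkpoint to GGUF (runs without error).
# /path/to/model is a placeholder for the local model directory.
python convert_hf_to_gguf.py /path/to/model --outfile model-f16.gguf

# Quantize, e.g. to Q4_K_M (also succeeds).
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# Loading is where it breaks: llama.cpp rejects the file with errors
# here rather than at conversion or quantization time.
./llama-cli -m model-q4_k_m.gguf -p "Hello" -n 16
```

Conversion succeeding while loading fails usually points at an architecture or tensor-layout mismatch rather than a corrupt file, i.e. llama.cpp would need explicit support for this model's variant of the architecture.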