gguf?
#7
by deniiiiiij - opened
Hi. Will there be a GGUF? If anyone is able to quantize it, please do :)
Quantization isn't the issue here. For llama.cpp to support this model, someone would have to port its specific inference algorithm and framework. Markovian RSA is not a simple mechanism: it runs multiple inference streams in parallel and then aggregates them recursively. Put simply, you can think of it as a dynamic "Best-of-N" inference system. This approach is both slow and memory-intensive, which makes it poorly suited to personal deployment. Although the model itself is only 8B, its actual inference overhead is substantial.
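To give a rough feel for why the overhead adds up, here is a toy Python sketch of the "several streams, then recursive aggregation" idea described above. It is not the model's actual algorithm: `toy_generate`, `toy_aggregate`, `recursive_aggregation`, `n_streams`, and `group_size` are all hypothetical names made up for illustration, the "model calls" are string stubs, and the streams run sequentially here rather than in parallel.

```python
from typing import Callable, List, Tuple


def toy_generate(prompt: str, stream_id: int) -> str:
    # Hypothetical stand-in for one sampled inference stream.
    return f"draft-{stream_id}"


def toy_aggregate(prompt: str, group: List[str]) -> str:
    # Hypothetical stand-in for a model call that merges several candidates.
    return "(" + "+".join(group) + ")"


def recursive_aggregation(
    prompt: str,
    generate: Callable[[str, int], str],
    aggregate: Callable[[str, List[str]], str],
    n_streams: int = 8,
    group_size: int = 2,
) -> Tuple[str, int]:
    """Return the final merged answer and the number of model calls made."""
    if n_streams < 1 or group_size < 2:
        raise ValueError("need n_streams >= 1 and group_size >= 2")

    calls = 0

    # Step 1: produce several independent candidates (one call per stream).
    candidates = []
    for stream_id in range(n_streams):
        candidates.append(generate(prompt, stream_id))
        calls += 1

    # Step 2: merge candidates group by group until a single answer remains.
    while len(candidates) > 1:
        merged = []
        for start in range(0, len(candidates), group_size):
            group = candidates[start:start + group_size]
            merged.append(aggregate(prompt, group))
            calls += 1
        candidates = merged

    return candidates[0], calls


if __name__ == "__main__":
    answer, calls = recursive_aggregation("question", toy_generate, toy_aggregate)
    print(answer)
    print("model calls:", calls)
```

With 8 streams merged in pairs, this toy version makes 15 model calls for a single answer (8 generations plus 4 + 2 + 1 aggregations), and every aggregation call has to read several full candidates as input. That is the kind of cost meant by "substantial inference overhead", even before considering memory for running the streams side by side.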