Please add support for MLX and llama.cpp inference engines
#3
by Narutoouz - opened
I never expected long-context inference could be made faster so soon! Thanks for your great work. Please publish MLX and GGUF versions of this model, and add support for it to the llama.cpp and mlx-lm inference engines, just as was done for vLLM.
Hi @Narutoouz, the GGUF version is available here: https://huggingface.co/audreyt/Brumby-14B-Base-GGUF
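For anyone who wants to try that GGUF right away, here is a minimal sketch using llama.cpp's `llama-cli` with its Hugging Face download flag. This assumes you have a recent build of llama.cpp installed, and note that the model will only generate correctly if llama.cpp already supports this architecture; the prompt and token count are just placeholders.

```shell
# Sketch: pull the GGUF straight from the Hub and run a quick generation.
# -hf downloads from the named Hugging Face repo (a llama.cpp feature);
# add ":<quant>" after the repo name to pick a specific quantization,
# e.g. audreyt/Brumby-14B-Base-GGUF:Q4_K_M -- check the repo's file list.
llama-cli -hf audreyt/Brumby-14B-Base-GGUF \
  -p "Once upon a time" \
  -n 64
```

If `llama-cli` exits with an unknown-architecture error, that is exactly the missing llama.cpp support this thread is requesting.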