Please add support for MLX and llama.cpp inference engines
#3
by Narutoouz - opened
I never expected long-context inference could be made faster so soon! Thanks for your great work. Please publish MLX and GGUF versions of this model, and add support for it to the llama.cpp and mlx-lm inference engines, just as was done for vLLM.
Hi @Narutoouz, the GGUF version is available here: https://huggingface.co/audreyt/Brumby-14B-Base-GGUF
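For anyone who wants to try that GGUF right away, here is a minimal sketch using llama.cpp's `llama-cli` with its Hugging Face download flag. This assumes you have a recent build of llama.cpp installed, and note that the model will only generate correctly if llama.cpp already supports this architecture; the prompt and token count are just placeholders.

```shell
# Sketch: pull the GGUF straight from the Hub and run a quick generation.
# -hf downloads from the named Hugging Face repo (a llama.cpp feature);
# add ":<quant>" after the repo name to pick a specific quantization,
# e.g. audreyt/Brumby-14B-Base-GGUF:Q4_K_M -- check the repo's file list.
llama-cli -hf audreyt/Brumby-14B-Base-GGUF \
  -p "Once upon a time" \
  -n 64
```

If `llama-cli` exits with an unknown-architecture error, that is exactly the missing llama.cpp support this thread is requesting.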