These are AMD GFX906-focused GGUF quantizations (4-bit and 16-bit) of Kimi-Linear-48B-A3B-Instruct.
For GFX906 users: Kimi-Linear support has been merged into the llama.cpp-gfx906 fork.
You can clone it and compile it locally with the following commands:
```bash
git clone https://github.com/iacopPBK/llama.cpp-gfx906.git
cd llama.cpp-gfx906
./SCRIPT_compile_MI50.sh  # edit ROCM_PATH in the script if ROCm is not installed at /opt/rocm
```
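Once the build completes, a typical workflow is to pull one of the GGUF files from this repo and point llama.cpp at it. Below is a minimal sketch, assuming the binaries land in `build/bin` (the usual llama.cpp CMake output) and that a 4-bit file matching `*Q4*` exists; check the repo's file list for the exact filename and adjust the GPU-offload flag for your card:

```bash
# Fetch a 4-bit quant (the *Q4* filename pattern is an assumption; check the file list).
# huggingface-cli ships with the huggingface_hub package: pip install -U huggingface_hub
huggingface-cli download Kamali-Lab/Kimi-Linear-48B-A3B-Instruct-GGUF \
  --include "*Q4*" --local-dir ./models

# Smoke-test on the MI50: -ngl 99 offloads all layers to the GPU.
./build/bin/llama-cli -m ./models/<your-gguf-file>.gguf -ngl 99 \
  -p "Hello from GFX906."
```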
Full credit for the Kimi-Linear implementation goes to ymcki! See their GitHub repo here.
Base model: moonshotai/Kimi-Linear-48B-A3B-Instruct