command to create GGUF MXFP4 mixed with BF16

#5
by ghit72 - opened

Hello noctrex,
I'm new to this topic. Could you share the command or workflow you used to mix MXFP4_MOE with BF16 tensors? I played around with different variations of the llama-quantize command using MXFP4_MOE and "COPY", but had no success.
Thank you.

Owner

I made the change in the source code. Go to src/llama-quant.cpp and change this section:

    } else if (ftype == LLAMA_FTYPE_MOSTLY_MXFP4_MOE) {
        // MoE   tensors -> MXFP4
        // other tensors -> Q8_0
        if (tensor->ne[2] > 1) {
            new_type = GGML_TYPE_MXFP4;
        } else {
            new_type = GGML_TYPE_Q8_0;
        }

Replace GGML_TYPE_Q8_0 with GGML_TYPE_BF16 in the else branch, then recompile.
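
For anyone who wants to see the effect of that one-line change in isolation, here is a minimal standalone sketch of the selection rule before and after the patch. It is not the llama.cpp code itself: the enum and function names (`sketch_type`, `pick_stock`, `pick_patched`) are hypothetical stand-ins, and the only thing taken from the section above is the `ne[2] > 1` test and the three type names.

```cpp
#include <cstdint>

// Hypothetical stand-ins for GGML_TYPE_MXFP4 / GGML_TYPE_Q8_0 / GGML_TYPE_BF16.
// In llama.cpp the real constants come from ggml.h.
enum sketch_type { SKETCH_MXFP4, SKETCH_Q8_0, SKETCH_BF16 };

// Stock MXFP4_MOE rule from the quoted section:
// tensors whose third dimension is > 1 (the "MoE tensors" in the source
// comment) go to MXFP4, everything else goes to Q8_0.
constexpr sketch_type pick_stock(int64_t ne2) {
    return ne2 > 1 ? SKETCH_MXFP4 : SKETCH_Q8_0;
}

// Patched rule: same test, but the non-MoE tensors are stored as BF16
// instead of Q8_0.
constexpr sketch_type pick_patched(int64_t ne2) {
    return ne2 > 1 ? SKETCH_MXFP4 : SKETCH_BF16;
}
```

Here `ne2` plays the role of `tensor->ne[2]`. With the patch, a tensor with `ne2 == 1` maps to BF16 and one with `ne2 > 1` still maps to MXFP4, which is the mix asked about in the question.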

noctrex changed discussion status to closed
