nan kld
Hey, just wanted to let you know that I am getting the nan kld, too.
I did my baseline from a custom q8 that is basically the full precision model (q8 vs f8 for the experts).
Did you use the full bf16 precision as baseline?
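For anyone following along, the usual two-step KLD workflow in llama.cpp looks roughly like the sketch below. The flag names come from llama-perplexity's help output, but verify against your own build, and the file paths are placeholders:

```shell
# 1) Run the baseline (e.g. BF16) once and save its logits to a file
#    (flags per `llama-perplexity --help`; check your build)
./build/bin/llama-perplexity \
    --model /path/to/baseline-bf16.gguf \
    --file /path/to/wiki.test.raw \
    --kl-divergence-base /path/to/base-logits.bin

# 2) Score the quant against those saved logits to get KLD stats
./build/bin/llama-perplexity \
    --model /path/to/quant.gguf \
    --file /path/to/wiki.test.raw \
    --kl-divergence-base /path/to/base-logits.bin \
    --kl-divergence
```

If either pass prints nan rows, the problem can be in the quant or in the compute path, which is what makes this annoying to isolate.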
I am also getting nan for Unsloth's UD-Q4K_XL, and I'm actually using a quant that gives nan in llama-perplexity.
So it might not be quant related but something else?
Curious what you found out so far.
I used the BF16 for the baseline, yes.
The same thing happened with the Q4-ish quants of the Mistral Small 120B model: it also produced nan's during testing.
I think there's something wrong in lcpp, some numerical issue happening somewhere, but I don't have any further info at the moment. My GPUs are tied up doing some other testing today, so I can explore this more tomorrow.
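For intuition on how a numerical issue turns into nan KLD: a single activation that overflows to inf in the quantized model's logits poisons the entire softmax row, and the KLD computed against it then comes out nan. A toy sketch in plain Python (illustrative only, not lcpp internals):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability. If one logit is inf,
    # inf - inf = nan, and the nan spreads to the whole row.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # KL(p || q). Any nan probability in q propagates straight through.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

baseline = softmax([1.0, 2.0, 3.0])             # healthy distribution
broken   = softmax([1.0, 2.0, float("inf")])    # one overflowed activation
print(kl_divergence(baseline, broken))          # nan
```

That's why a single bad tensor in a single layer can make every KLD/PPL output row show nan rather than just degrading the score.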
Hi @AesSedai
I used the BF16 for the baseline, yes.
Can you link to where you obtained the BF16 weights? https://huggingface.co/MiniMaxAI/MiniMax-M2.7 only has about 230GB of tensors, which I assume must be fp8.
I found a (seemingly) functioning ablated version of M2.7 -- Youssofal/MiniMax-M2.7-abliterated-BF16 -- with its respective GGUFs at -- Youssofal/MiniMax-M2.7-Abliterated-Heretic-GGUF.
Q3K_M is working, apart from a few sus things, like rare errors in names (e.g., outputting Amano instead of Amane) - no idea if it's due to Q3 quantization itself or anything else.
It's not clear whether Q4K_M has the issues you guys were talking about, and if it does, I'm not sure whether the author will attempt to address them.
Question is, if you have time in your schedule and it's not too burdensome, could you please take a look at it later? I'm not demanding new quants, of course.
late edit: just minor stuff, nevermind it
I don't really quant finetunes if that's what you're asking, but I do plan on looking more into the Q4_K_M nan issue. I've got my rig crunching some lineage bench testing for someone else at the moment but should be able to look into it further tomorrow evening I hope.
If you wanted to test if the Youssofal/MiniMax-M2.7-Abliterated-Heretic-GGUF had the nan issue, you should be able to reproduce it by trying to run perplexity on the model, eg:
./build/bin/llama-perplexity \
--file /path/to/wiki.test.raw \
--model /path/to/MiniMax-M2.7-abliterated-Q4_K_M.gguf-00001-of-00004.gguf
Any file path for testing should work, but wiki.test.raw is a pretty common one to use for PPL. If the quant has the issue, it'll show as nan on some of the output rows.
Got it! Oh, and no, not really - I'm just being too cautious rather than asking for new GGUFs, like I mentioned.
I'm currently doing some initial conversational tests with Q4K_M of that specific M2.7; it seems to be doing well so far.
Will attempt to check it for the NaN issue later (if my last functioning brain cell doesn't give up on me, lol).
Interesting comment by bartowski about CUDA and NaN https://old.reddit.com/r/LocalLLaMA/comments/1slk4di/minimax_m27_gguf_investigation_fixes_benchmarks/og8kk8x/
I bumped layer 61's ffn_down_exps up to Q6_K and I'm not getting nan anymore in KLD / PPL, so that will be uploading shortly. The issue seems to be Q4_K or Q5_K being used for that layer's ffn_down_exps, maybe an activation overflow or something in lcpp. Not sure exactly, but swapping that one layer's quantization level did indeed resolve it.
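For anyone wanting to reproduce the single-layer bump themselves: recent llama.cpp builds let llama-quantize override the type of individual tensors by name pattern. A sketch, assuming a `--tensor-type` override flag is present in your build (check `llama-quantize --help`) and with placeholder paths:

```shell
# Re-quantize to Q4_K_M overall, but force layer 61's ffn_down_exps to Q6_K
./build/bin/llama-quantize \
    --tensor-type "blk.61.ffn_down_exps=q6_k" \
    /path/to/MiniMax-M2.7-BF16.gguf \
    /path/to/MiniMax-M2.7-Q4_K_M-fixed.gguf \
    Q4_K_M
```

The tensor name pattern (`blk.61.ffn_down_exps`) follows the GGUF naming convention; confirm the exact name with `gguf-dump` or the quantize log before relying on it.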
@blankreg I tried Bart's trick but it doesn't work; they edited the comment themselves and it still NaNs. The only solution that seems to have worked is the Q6_K trick, which I think Aes also applied to the Q4_K_M today.