GGUF not working.
@Flexan
Hey, I'm DQN Labs. I trained dqnCode v0.3, and I usually publish all my models in GGUF too, but for some reason the LFM-style models I train can't be converted to GGUF; they only work in MLX or HF format. I see you've converted it to GGUF, but when I try to load your GGUF it doesn't work for me either, and the model fails to load. Can you let me know if it works for you?
Hello, thanks for opening this issue. I don't have much experience with MLX models, but from what I can tell from a bit of research just now, MLX is an Apple Silicon specific format. It seems to be incompatible with GGUF unless the model is dequantized first, though I'm not sure exactly how that works.
I'd love to convert the model, but I don't have an Apple device, so it doesn't seem possible for me. According to Gemini, you'd have to run these commands to dequantize it and convert it to GGUF:
pip install mlx-lm
python -m mlx_lm.convert --hf-path <path> --export-gguf
The GGUF indeed doesn't work for me, I must've forgotten to test it. Sorry about that! :<
@Flexan I've tried this on my Mac, and yeah, that's the normal conversion process, but there's a problem with it:
The --export-gguf flag in that command only works if the model architecture is Mistral or Llama, so I usually have to avoid that flag, convert the model to HF, and use llama.cpp to convert it instead.
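Roughly, that workaround looks like this (paths and output names are placeholders, and exact flag names may vary between mlx-lm and llama.cpp versions):

```shell
# 1) Dequantize the MLX checkpoint and write plain safetensors (HF-compatible)
python -m mlx_lm.convert --hf-path <mlx-model-path> --mlx-path ./hf-out --dequantize

# 2) Convert the HF-format weights to an f16 GGUF with llama.cpp's converter script
python convert_hf_to_gguf.py ./hf-out --outfile model-f16.gguf --outtype f16

# 3) Optionally quantize the GGUF (e.g. to Q4_K_M)
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```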
That workflow works for all my other models, but every single time I try to convert the LFM models that I fine-tune, the result gets corrupted in some way and the GGUF fails to load in LM Studio, Ollama, or anywhere else I try. The MLX and HF versions, though, are perfectly fine.
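In case it helps with debugging: a quick header check can at least show whether a broken GGUF is truncated or structurally off, versus corrupted deeper in the file. Here's a minimal sketch (field layout follows the GGUF v3 spec; the example bytes at the bottom are synthetic, standing in for the start of a real .gguf file):

```python
import struct

GGUF_MAGIC = b"GGUF"

def check_gguf_header(data: bytes) -> dict:
    """Sanity-check the first bytes of a GGUF file.

    GGUF v3 layout: 4-byte magic "GGUF", uint32 version,
    uint64 tensor count, uint64 metadata key/value count,
    all little-endian.
    """
    if len(data) < 24:
        raise ValueError("file too short to be a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"bad magic {data[:4]!r}; not a GGUF file")
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensors": tensor_count, "metadata_kvs": kv_count}

if __name__ == "__main__":
    # Synthetic header; in practice you'd pass open("model.gguf", "rb").read(24)
    fake = GGUF_MAGIC + struct.pack("<IQQ", 3, 201, 24)
    print(check_gguf_header(fake))
```

If the magic or version reads back wrong, the converter wrote a malformed file; if the header is fine, the corruption is likely in the tensor data or metadata further in.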
Anyway, thanks a lot for your support and for quantizing my models! I'm publishing a new model called dqnMath v0.2 in a few minutes. If you're free, I'd love it if you could quantize that one too (no worries if not). I'll be publishing it in GGUF format (f16 and q4km). Thanks :D
I see. The configuration seems correct, so unfortunately I don't think there's much I can do. I try my best to make all models I lay eyes on convert, but I guess it might be an LFM2.5 specific issue.
No problem! Is the model DQN-Labs/dqnMath-v0.2? I'll convert them as soon as the model files become available ^~^
@Flexan Yeah, that's the repo! Working on some issues right now (not related to GGUF luckily) and drafting the model card so it'll take some time, but I'd love it if you could just send in a PR with the new quants! I'll check for your PR and merge when available. Thanks so much for doing this :D
@DQN-Labs I just finished quantizing dqnMath-v0.1-3B-HF and dqnMath-v0.2-3.8B-HF to all quants. If those weren't the repos you meant, I'm happy to convert another one. I did not see any updates in the repo I mentioned, so I decided you must've meant these ^-^
Did you mean that you want me to make a pull request so they're available in your repos? Your repo DQN-Labs/dqnMath-v0.2 doesn't exist anymore, but I can make one for DQN-Labs/dqnMath-v0.1 :]
I've also confirmed both of those GGUFs work correctly. Although while testing v0.2, I found that when asked "What is 42 + 3" it sometimes says the answer is 75 (I'm not sure if that's expected; v0.2 does not think, while v0.1 does). I'm assuming the model is supposed to be better at math.
@Flexan I'd actually appreciate it if you reverted the PR for dqnMath-v0.2-3.8B-HF, because I found a critical flaw in the model's behavior; I've just retrained it and am evaluating now. That explains why the model was acting up in your testing. Nothing to worry about, as I found the same issue and I'm in the process of fixing it :D
P.S. Can I get your email/Discord to stay in contact with you?
Thanks for quantizing some of the other models too, much appreciated!
And yes, for dqnMath v0.2 I'd love it if you could PR your quants to the main repo, so they're available for download there as well as in your own.